Sometimes you have to work with HTML that isn't structured in semantically meaningful ways. Here’s a quick code snippet for finding the HTML between two DIVs using PHP’s DOMDocument and DOMXPath.
Suppose you have the following HTML structure:
<html>
<div id="some-div">...</div>
<div>...</div>
<div>...</div>
<div>...</div>
<div id="some-other-div">...</div>
</html>
Let’s say you want to grab the three DIVs between the DIV with ID “some-div” and the DIV with ID “some-other-div”. Let’s also assume that your HTML is available in a variable named $html. Here’s how you’d get the HTML between those two DIVs:
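// Parse the HTML, collecting libxml warnings about malformed markup
// instead of silencing every error with the @ operator.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);
$snippet = '';
// Find the DIV with ID "some-div" (null if it isn't in the document).
$node = $xpath->query('//div[@id="some-div"]')->item(0);
// Walk forward through its siblings; the loop is skipped if the DIV wasn't found.
while ($node !== null && ($node = $node->nextSibling) !== null) {
    // Skip non-element nodes, such as whitespace (#text) and comments.
    if (!($node instanceof DOMElement)) {
        continue;
    }
    // Stop once we reach the closing DIV.
    if ($node->getAttribute('id') === 'some-other-div') {
        break;
    }
    // Append this element's outer HTML. saveHTML() keeps HTML syntax,
    // whereas saveXML() would turn an empty <div></div> into <div/>.
    $snippet .= $dom->saveHTML($node);
}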
That’s it!
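If you’d rather let XPath do the filtering, the following-sibling axis can express the same selection in a single query: take every sibling after the start element that still has the end element somewhere after it. This is a sketch of that alternative rather than the loop above. The function name htmlBetweenIds() is hypothetical, it matches elements of any tag (not only DIVs), and it assumes neither ID contains a double quote, since both are interpolated straight into the XPath string.

```php
<?php
// Hypothetical helper: returns the outer HTML of every element that sits
// between the element with ID $startId and the element with ID $endId.
// Assumes both elements are siblings and neither ID contains a double quote.
function htmlBetweenIds(string $html, string $startId, string $endId): string
{
    $dom = new DOMDocument();
    // Collect libxml warnings about malformed markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Siblings after the start element that are themselves followed by the
    // end element, i.e. exactly the elements between the two.
    $query = sprintf(
        '//*[@id="%s"]/following-sibling::*[following-sibling::*[@id="%s"]]',
        $startId,
        $endId
    );

    $snippet = '';
    foreach ($xpath->query($query) as $node) {
        // saveHTML() with a node argument returns that node's outer HTML.
        $snippet .= $dom->saveHTML($node);
    }
    return $snippet;
}
```

With the HTML from the example, calling htmlBetweenIds($html, 'some-div', 'some-other-div') should return the three middle DIVs. One behavioral difference from the loop: if the closing element is missing, this query selects nothing, whereas the loop keeps collecting siblings until it runs out of them.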