How to get the first level of DOM elements with PHP DOMDocument?
Example code that does not work, taken from the Q&A "How to get nodes in first level using PHP DOMDocument?":
<?php
$str=<<< EOD
<div id="header">
</div>
<div id="content">
<div id="sidebar">
</div>
<div id="info">
</div>
</div>
<div id="footer">
</div>
EOD;
$doc = new DOMDocument();
$doc->loadHTML($str);
$xpath = new DOMXpath($doc);
$entries = $xpath->query("/");
foreach ($entries as $entry) {
var_dump($entry->firstChild->nodeValue);
}
?>
The first level of elements below the root node can be accessed with
$dom->documentElement->childNodes
The childNodes property contains a DOMNodeList, which you can iterate with foreach.
See DOMDocument::documentElement:
"This is a convenience attribute that allows direct access to the child node that is the document element of the document."
and DOMNode::childNodes:
"A DOMNodeList that contains all children of this node. If there are no children, this is an empty DOMNodeList."
Since childNodes is a property of DOMNode, any class extending DOMNode (which is most of the classes in DOM) has this property. So to get the first level of elements below a DOMElement, access that DOMElement's childNodes property.
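As a minimal sketch of that point, the snippet below reads the first-level children of a single DOMElement instead of the document element. It assumes the markup from the question and uses getElementById() to pick the div with id "content"; the variable names ($html, $content, $ids) are only illustrative.

```php
<?php
// Markup from the question (a partial HTML document).
$html = <<<EOD
<div id="header">
</div>
<div id="content">
<div id="sidebar">
</div>
<div id="info">
</div>
</div>
<div id="footer">
</div>
EOD;

$dom = new DOMDocument();
$dom->loadHTML($html);

// Any DOMElement exposes childNodes, not just the document element.
$content = $dom->getElementById('content');

$ids = [];
foreach ($content->childNodes as $child) {
    // childNodes also contains whitespace text nodes, so keep elements only.
    if ($child->nodeType === XML_ELEMENT_NODE) {
        $ids[] = $child->getAttribute('id');
    }
}

echo implode(', ', $ids), PHP_EOL; // sidebar, info
```

Only the direct children of the "content" div are listed; anything nested deeper would need another pass over that child's own childNodes.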
Note that if you use DOMDocument::loadHTML() on invalid HTML or partial documents, the HTML parser module will add an HTML skeleton with html and body tags, so in the DOM tree, the HTML in your example will be
<!DOCTYPE html … ">
<html><body><div id="header">
</div>
<div id="content">
<div id="sidebar">
</div>
<div id="info">
</div>
</div>
<div id="footer">
</div></body></html>
which you have to take into account when traversing or using XPath. Consequently, using
$dom = new DOMDocument;
$dom->loadHTML($str);
foreach ($dom->documentElement->childNodes as $node) {
echo $node->nodeName; // body
}
will only iterate the <body> DOMElement node. Knowing that libxml will add the skeleton, you have to iterate over the childNodes of the <body> element to get the div elements from your example code, e.g.
$dom->getElementsByTagName('body')->item(0)->childNodes
However, doing so will also take into account any whitespace nodes, so you either have to set preserveWhiteSpace to false or check the nodeType if you only want to get DOMElement nodes, e.g.
foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $node) {
if ($node->nodeType === XML_ELEMENT_NODE) {
echo $node->nodeName;
}
}
or use XPath
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('/html/body/*') as $node) {
echo $node->nodeName;
}
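For a complete, runnable version of the XPath variant, here is a rough sketch applied to the markup from the question. The function name firstLevelIds() is hypothetical (it is not part of the DOM extension), and the sketch assumes every first-level element carries an id attribute.

```php
<?php
/**
 * Hypothetical helper: returns the id attributes of the elements that sit
 * directly below <body> after loadHTML() has added the html/body skeleton.
 */
function firstLevelIds(string $html): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $xpath = new DOMXPath($dom);
    $ids = [];
    // "*" matches element nodes only, so whitespace text nodes are skipped.
    foreach ($xpath->query('/html/body/*') as $node) {
        $ids[] = $node->getAttribute('id');
    }
    return $ids;
}

$str = <<<EOD
<div id="header">
</div>
<div id="content">
<div id="sidebar">
</div>
<div id="info">
</div>
</div>
<div id="footer">
</div>
EOD;

$result = firstLevelIds($str);
echo implode(', ', $result), PHP_EOL; // header, content, footer
```

The nested sidebar and info divs are not returned, because the path stops one step below body.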
Additional information: