How does one retrieve the text of a node without also selecting the text of its children?
<div id="comment">
<div class="title">Editor's Description</div>
<div class="changed">Last updated: </div>
<br class="clear">
Lorem ipsum dolor sit amet.
</div>
In other words, I want "Lorem ipsum dolor sit amet." rather than "Editor's DescriptionLast updated: Lorem ipsum dolor sit amet."
In the provided XML document:
<div id="comment">
<div class="title">Editor's Description</div>
<div class="changed">Last updated: </div>
<br class="clear">
Lorem ipsum dolor sit amet.
</div>
the top element, /div, has four text-node children (in addition to its three element children). The first three of these text nodes are whitespace-only; the last one is the text you want.
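To make the four text-node children concrete, here is a minimal sketch that lists them using Python's standard-library xml.dom.minidom. This is only an inspection aid, not part of the XPath answer. It assumes the br tag is written as <br class="clear"/> so the snippet parses as well-formed XML; the XML variable name is illustrative.

```python
from xml.dom import minidom, Node

# The question's markup; <br class="clear"> is written as <br class="clear"/>
# here (an assumption) so that it parses as well-formed XML.
XML = """<div id="comment">
<div class="title">Editor's Description</div>
<div class="changed">Last updated: </div>
<br class="clear"/>
Lorem ipsum dolor sit amet.
</div>"""

root = minidom.parseString(XML).documentElement  # the top-level <div id="comment">

# Keep only the text-node children; the element children (two divs and a br) are skipped.
text_children = [n for n in root.childNodes if n.nodeType == Node.TEXT_NODE]

for i, node in enumerate(text_children, start=1):
    # str.strip() approximates "whitespace-only" (it treats slightly more characters
    # as whitespace than XML does, which makes no difference for this input).
    kind = "whitespace-only" if not node.data.strip() else "content"
    print(i, kind, repr(node.data))
```

With this input it prints four lines: three whitespace-only nodes that each hold a single newline, then the node holding the sentence (with its surrounding newlines).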
Use:
/div/text()[last()]
This is different from:
/div/text()
The latter may select all four text nodes (depending on whether the XML parser preserves whitespace-only text nodes), but you only want the last of them.
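The difference between the two expressions can be sketched with the same standard-library parser. This does not run an XPath engine; the hypothetical helper text_nodes simply mimics what the text() step selects, and list slicing stands in for the [last()] predicate. As before, <br class="clear"/> is assumed to be self-closed.

```python
from xml.dom import minidom, Node

# Same markup as in the question, with <br/> self-closed (assumption) for well-formed XML.
XML = """<div id="comment">
<div class="title">Editor's Description</div>
<div class="changed">Last updated: </div>
<br class="clear"/>
Lorem ipsum dolor sit amet.
</div>"""

def text_nodes(element):
    """Hypothetical helper mimicking the XPath step text(): text-node children in document order."""
    return [n for n in element.childNodes if n.nodeType == Node.TEXT_NODE]

root = minidom.parseString(XML).documentElement

all_text = text_nodes(root)   # analogous to /div/text()
last_text = all_text[-1:]     # analogous to /div/text()[last()]; empty if there are no text nodes

print(len(all_text))                        # 4: minidom keeps whitespace-only text nodes
print([n.data.strip() for n in last_text])  # ['Lorem ipsum dolor sit amet.']
```

If an XPath 1.0 engine such as lxml is available, the expression can be evaluated directly, for example root.xpath('/div/text()[last()]') on an lxml element. The selected text node still contains the surrounding newlines, so you may want to wrap the expression in normalize-space() or strip the result.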
An alternative, when you don't know exactly which text node you want, is:
/div/text()[normalize-space()]
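This selects every text-node child of /div whose normalize-space() value is non-empty, that is, every text node that is not whitespace-only (an empty string converts to false in a predicate).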
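The normalize-space() filter can be approximated the same way. In this sketch the hypothetical normalize_space function trims and collapses whitespace runs roughly as XPath's normalize-space() does (str.split() recognizes a slightly broader set of whitespace characters than XPath), and an empty result drops the node, mirroring how an empty string is false inside a predicate. The self-closed <br class="clear"/> is again an assumption for well-formedness.

```python
from xml.dom import minidom, Node

# Same markup as in the question, with <br/> self-closed (assumption) for well-formed XML.
XML = """<div id="comment">
<div class="title">Editor's Description</div>
<div class="changed">Last updated: </div>
<br class="clear"/>
Lorem ipsum dolor sit amet.
</div>"""

def normalize_space(s):
    """Approximation of XPath normalize-space(): trim, then collapse whitespace runs to one space."""
    return " ".join(s.split())

root = minidom.parseString(XML).documentElement

# Analogous to /div/text()[normalize-space()]: whitespace-only text nodes normalize
# to "", which is falsy, so only non-blank text nodes are kept.
non_blank = [
    n for n in root.childNodes
    if n.nodeType == Node.TEXT_NODE and normalize_space(n.data)
]

for n in non_blank:
    print(normalize_space(n.data))  # Lorem ipsum dolor sit amet.
```

Unlike text()[last()], this filter does not depend on the wanted text being the last text node; if more non-blank text appeared between the child elements, it would return every such node rather than just one.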