How to find only nodes with at least one similar/equal sibling node using XPath?
For example:
<root>
<parent>
<node>...</node>
<node_unique>...</node_unique>
<node>...</node>
<another_one>...</another_one>
<another_one>...</another_one>
</parent>
</root>
In the example, the XPath should select only <node> and <another_one>, because they appear more than once.
I have been trying to find a solution for hours without success (now I think it is not possible with XPath...).
These are impossible to select with a single XPath 1.0 expression (due to lack of range variables in XPath 1.0).
One possible solution is to select all /*/*/* elements, then get the name of each element by calling name() on it, then evaluate /*/*/*[name() = $currentName][2] (where $currentName should be substituted with the name just obtained). If the last expression selects an element, then currentName is a name that occurs at least twice, so you keep that element. Do this for all elements and their names. As an auxiliary step, one might dedup the names (and selected elements) by placing them in a hash table.
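The multi-step procedure above can be sketched in a host language. This is only a rough illustration in Python, using the standard library's xml.etree.ElementTree, whose limited XPath subset stands in for a full XPath 1.0 engine: the path ./*/NAME[2] plays the role of /*/*/*[name() = $currentName][2]. The function name repeated_children and the dictionary seen are hypothetical names chosen for this sketch.

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <parent>
    <node>...</node>
    <node_unique>...</node_unique>
    <node>...</node>
    <another_one>...</another_one>
    <another_one>...</another_one>
  </parent>
</root>"""


def repeated_children(root):
    """Return the grandchildren of root whose name has a second occurrence."""
    seen = {}      # hash table: element name -> does a second occurrence exist?
    selected = []
    for elem in root.findall("./*/*"):   # stands in for /*/*/*
        name = elem.tag                  # stands in for name()
        if name not in seen:
            # NAME[2] is the second child with that name under a parent,
            # mirroring /*/*/*[name() = $currentName][2]
            seen[name] = root.find(f"./*/{name}[2]") is not None
        if seen[name]:
            selected.append(elem)
    return selected


root = ET.fromstring(XML)
print([e.tag for e in repeated_children(root)])
# ['node', 'node', 'another_one', 'another_one']
```

The dictionary means the "second occurrence" check runs once per distinct name rather than once per element.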
In XPath 2.0 it is trivial to select, with a single XPath expression, the children of a given parent that have at least one other sibling with the same name (the expression below returns the first element of each such group of same-named siblings):
/*/*/*
[name() = following-sibling::*/name()
and
not(name() = preceding-sibling::*/name())
]
A much more compact expression:
/*/*/*[index-of(/*/*/*/name(), name())[2]]
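To see why the compact expression works, it may help to emulate it outside XPath. The sketch below is a plain-Python imitation, not an XPath engine: index_of is a hypothetical helper that mimics the 1-based positions returned by the XPath 2.0 index-of() function, and the list names stands for /*/*/*/name() on the example document. It assumes a single parent element, so that an element's position() matches its index in that list. Because a numeric predicate [N] is shorthand for position() = N, an element is kept only when its own position equals the second position at which its name occurs.

```python
def index_of(seq, item):
    """Mimic XPath 2.0 index-of(): 1-based positions where item occurs in seq."""
    return [i for i, value in enumerate(seq, start=1) if value == item]


# Stands for /*/*/*/name() on the example document (single parent assumed).
names = ["node", "node_unique", "node", "another_one", "another_one"]

selected = []
for position, name in enumerate(names, start=1):
    hits = index_of(names, name)
    # index-of(...)[2]: the second hit, or an empty result if there is none
    second = hits[1] if len(hits) >= 2 else None
    # A numeric predicate [N] keeps the element only when position() = N
    if second == position:
        selected.append((position, name))

print(selected)
# [(3, 'node'), (5, 'another_one')]
```

Under these assumptions the compact expression picks the second occurrence of each repeated name, and an element with a unique name is dropped because index-of(...)[2] is empty.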
XSLT 2.0-based verification:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select=
"/*/*/*[index-of(/*/*/*/name(), name())[2]]"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<root>
<parent>
<node>...</node>
<node_unique>...</node_unique>
<node>...</node>
<another_one>...</another_one>
<another_one>...</another_one>
</parent>
</root>
the above XPath expression is evaluated and the elements it selects are copied to the output:
<node>...</node>
<another_one>...</another_one>
Note: For a related question/answer, see this.