Find duplicate siblings with XPath

Enrique · Sep 23, 2012 · Viewed 8.2k times

How can I select only the nodes that have at least one sibling with the same name, using XPath?

For example:

<root>
  <parent>
    <node>...</node>
    <node_unique>...</node_unique>
    <node>...</node>
    <another_one>...</another_one>
    <another_one>...</another_one>
  </parent>
</root>

In the example, the XPath should select only <node> and <another_one>, because they appear more than once.

I have been trying to find a solution for hours without success (now I think it is not possible with XPath...).

Answer

Dimitre Novatchev · Sep 23, 2012

Such elements cannot be selected with a single XPath 1.0 expression, because XPath 1.0 lacks range variables.

One possible solution is to select all /*/*/* elements, get the name of each element by evaluating name() on it, and then evaluate /*/*/*[name() = $currentName][2] (where $currentName is replaced with the name just obtained). If this last expression selects an element, then that name occurs at least twice, so you keep the element. Do this for all elements and their names. As an auxiliary step, one might deduplicate the names (and the selected elements) by placing them in a hash table.
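The loop described above could look roughly like this in a host language. This is only a sketch in Python using the standard library's xml.etree.ElementTree, which supports just a subset of XPath: it has no name() function, so the path ./*/NAME[2] (evaluated from the root element) is used here as a stand-in for /*/*/*[name() = $currentName][2]. The function name first_of_repeated_names is made up for illustration, and the sketch assumes a single parent, as in the question's document.

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <parent>
    <node>...</node>
    <node_unique>...</node_unique>
    <node>...</node>
    <another_one>...</another_one>
    <another_one>...</another_one>
  </parent>
</root>"""


def first_of_repeated_names(root):
    """Return the first element seen for each name that occurs at least twice."""
    kept = {}  # hash table: element name -> first element seen with that name
    # "./*/*" relative to the root element plays the role of /*/*/*
    for elem in root.findall("./*/*"):
        name = elem.tag
        if name in kept:
            continue  # this name has already been handled
        # Stand-in for /*/*/*[name() = $currentName][2]:
        # is there a second child with this name under some parent?
        if root.find(f"./*/{name}[2]") is not None:
            kept[name] = elem
    return list(kept.values())


root = ET.fromstring(XML)
selected = first_of_repeated_names(root)
print([e.tag for e in selected])  # ['node', 'another_one']
```

With a full XPath 1.0 engine, the original expression could be evaluated directly instead of the ElementTree path.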

In XPath 2.0 it is trivial to select, with a single XPath expression, the children of a given parent that have at least one other sibling with the same name (the expression below keeps only the first element of each such group):

/*/*/*
   [name() = following-sibling::*/name()
    and
    not(name() = preceding-sibling::*/name())
   ]

A much more compact expression, which selects the second occurrence of each repeated name:

/*/*/*[index-of(/*/*/*/name(), name())[2]]
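Why this works may not be obvious: index-of(...)[2] yields either an empty sequence (treated as false) or a single integer, and a predicate whose value is a single number is true only when that number equals the context position. The following Python sketch mimics that logic on the element names of the example document. The helper index_of is a hand-written imitation of the XPath 2.0 function, not a library call, and the sketch assumes a single parent so that position among siblings and position in /*/*/* coincide.

```python
def index_of(seq, item):
    """1-based positions at which item occurs in seq (imitates XPath 2.0 index-of)."""
    return [i for i, value in enumerate(seq, start=1) if value == item]


# Names of the /*/*/* elements in the example document, in document order
names = ["node", "node_unique", "node", "another_one", "another_one"]

selected = []
for position, name in enumerate(names, start=1):
    hits = index_of(names, name)
    # index-of(...)[2]: the second occurrence, or nothing if there is none
    second = hits[1] if len(hits) > 1 else None
    # A numeric predicate value is compared with the context position
    if second == position:
        selected.append((position, name))

print(selected)  # [(3, 'node'), (5, 'another_one')]
```

So the element kept for each repeated name is the one sitting at the position of that name's second occurrence.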

XSLT 2.0-based verification:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <xsl:copy-of select=
  "/*/*/*[index-of(/*/*/*/name(), name())[2]]"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<root>
  <parent>
    <node>...</node>
    <node_unique>...</node_unique>
    <node>...</node>
    <another_one>...</another_one>
    <another_one>...</another_one>
  </parent>
</root>

the above XPath expression is evaluated and the elements it selects are copied to the output:

<node>...</node>
<another_one>...</another_one>

Note: For a related question/answer, see this.