XPath query with descendant and descendant text() predicates

juan234 picture juan234 · Oct 13, 2010 · Viewed 35.2k times · Source

I would like to construct an XPath query that will return a "div" or "table" element, so long as it has a descendant containing the text "abc". The one caveat is that it can not have any div or table descendants.

<div>
  <table>
    <form>
      <div>
        <span>
          <p>abcdefg</p>
        </span>
      </div>
      <table>
        <span>
          <p>123456</p>
        </span>
      </table>
    </form>
  </table>
</div>

So the only correct result of this query would be:

/div/table/form/div 

My best attempt looks something like this:

//div[contains(//text(), "abc") and not(descendant::div or descendant::table)] | //table[contains(//text(), "abc") and not(descendant::div or descendant::table)]

but does not return the correct result.

Thanks for your help.

Answer

Dimitre Novatchev picture Dimitre Novatchev · Oct 13, 2010

Something different: :)

//text()[contains(.,'abc')]/ancestor::*[self::div or self::table][1]

Seems a lot shorter than the other solutions, doesn't it? :)

Translated to simple English: For any text node in the document that contains the string "abc" select its first ancestor that is either a div or a table.

This is more efficient, as only one full scan of the document tree (and not any other) is required, and the ancestor::* traversal is very cheap compared to a descendent:: (tree) scan.

To verify that this solution "really works":

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
  <xsl:copy-of select=
  "//text()[contains(.,'abc')]/ancestor::*[self::div or self::table][1] "/>
 </xsl:template>
</xsl:stylesheet>

when this transformation is performed on the provided XML document:

<div>
  <table>
    <form>
      <div>
        <span>
          <p>abcdefg</p>
        </span>
      </div>
      <table>
        <span>
          <p>123456</p>
        </span>
      </table>
    </form>
  </table>
</div>

the wanted, correct result is produced:

<div>
   <span>
      <p>abcdefg</p>
   </span>
</div>

Note: It isn't necessary to use XSLT -- any XPath 1.0 host -- such as DOM, must obtain the same result.