How to query XML using namespaces in Java with XPath?

Inez picture Inez · Jun 17, 2011 · Viewed 91.1k times · Source

When my XML looks like this (no xmlns) then I can easly query it with XPath like /workbook/sheets/sheet[1]

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook>
  <sheets>
    <sheet name="Sheet1" sheetId="1" r:id="rId1"/>
  </sheets>
</workbook>

But when it looks like this then I can't

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <sheets>
    <sheet name="Sheet1" sheetId="1" r:id="rId1"/>
  </sheets>
</workbook>

Any ideas?

Answer

Mads Hansen picture Mads Hansen · Jun 18, 2011

In the second example XML file the elements are bound to a namespace. Your XPath is attempting to address elements that are bound to the default "no namespace" namespace, so they don't match.

The preferred method is to register the namespace with a namespace-prefix. It makes your XPath much easier to develop, read, and maintain.

However, it is not mandatory that you register the namespace and use the namespace-prefix in your XPath.

You can formulate an XPath expression that uses a generic match for an element and a predicate filter that restricts the match for the desired local-name() and the namespace-uri(). For example:

/*[local-name()='workbook'
    and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main']
  /*[local-name()='sheets'
      and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main']
  /*[local-name()='sheet'
      and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main'][1]

As you can see, it produces an extremely long and verbose XPath statement that is very difficult to read (and maintain).

You could also just match on the local-name() of the element and ignore the namespace. For example:

/*[local-name()='workbook']/*[local-name()='sheets']/*[local-name()='sheet'][1]

However, you run the risk of matching the wrong elements. If your XML has mixed vocabularies (which may not be an issue for this instance) that use the same local-name(), your XPath could match on the wrong elements and select the wrong content: