Find all namespace declarations in an XML document - XPath 1.0 vs. XPath 2.0

james.garriss · Apr 18, 2012

As part of a Java 6 application, I want to find all namespace declarations in an XML document, including any duplicates.

Edit: Per Martin's request, here's the Java code I am using:

XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xPath = xPathFactory.newXPath();
XPathExpression xPathExpression = xPath.compile("//namespace::*");
NodeList nodeList = (NodeList) xPathExpression.evaluate(xmlDomDocument, XPathConstants.NODESET);

Suppose I have this XML document:

<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:ele="element.com" xmlns:att="attribute.com" xmlns:txt="textnode.com">
    <ele:one>a</ele:one>
    <two att:c="d">e</two>
    <three>txt:f</three>
</root>

To find all namespace declarations, I evaluated this XPath expression against the XML document using XPath 1.0:

//namespace::*

It finds 4 namespace declarations, which is what I expect (and desire):

/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace

But if I switch to XPath 2.0, I get 16 namespace declarations (each of the previous 4 declarations repeated 4 times), which is not what I expect (or desire):

/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com
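
For what it's worth, the number 16 is consistent with the XPath data model, in which every element carries its own namespace node for each namespace in scope on it: the sample has 4 elements (root, ele:one, two, three) and the same 4 bindings (xml, ele, att, txt) are in scope on each, so 4 × 4 = 16. Below is a minimal Java sketch that reproduces that count by walking the DOM instead of evaluating XPath. The class name, method names, and SAMPLE constant are made up for illustration, and default namespaces (xmlns="...") are ignored because the sample has none.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
public class InScopeNamespaceCount {

    // The first sample document from the question (single quotes avoid escaping).
    static final String SAMPLE =
        "<root xmlns:ele='element.com' xmlns:att='attribute.com' xmlns:txt='textnode.com'>"
        + "<ele:one>a</ele:one><two att:c='d'>e</two><three>txt:f</three></root>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Prefixes in scope on one element: "xml" (always bound) plus every
    // xmlns:prefix declared on the element itself or on an ancestor.
    // Simplification: default-namespace declarations are not considered.
    static Set<String> inScopePrefixes(Element element) {
        Set<String> prefixes = new TreeSet<String>();
        prefixes.add("xml");
        for (Node n = element; n instanceof Element; n = n.getParentNode()) {
            NamedNodeMap attrs = n.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                String name = attrs.item(i).getNodeName();
                if (name.startsWith("xmlns:")) {
                    prefixes.add(name.substring("xmlns:".length()));
                }
            }
        }
        return prefixes;
    }

    // Sums the in-scope bindings over every element in the document.
    static int countNamespaceNodes(Document doc) {
        NodeList elements = doc.getElementsByTagName("*");
        int total = 0;
        for (int i = 0; i < elements.getLength(); i++) {
            total += inScopePrefixes((Element) elements.item(i)).size();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countNamespaceNodes(parse(SAMPLE)));
    }
}
```

If that reading is right, the 16 results are one namespace node per element per in-scope binding, not 16 declarations.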

The same difference appears when I use the unabbreviated form of the XPath expression:

/descendant-or-self::node()/namespace::*

And it is seen across a variety of XML parsers (LIBXML, MSXML.NET, Saxon) as tested in oXygen. (Edit: As I mention later in the comments, this statement is not true. Though I thought I was testing a variety of XML parsers, I really wasn't.)

Question #1: Why the difference between XPath 1.0 and XPath 2.0?

Question #2: Is it possible/reasonable to get the desired results using XPath 2.0?

Hint: Using the distinct-values() function in XPath 2.0 will not return the desired results, as I want all namespace declarations, even if the same namespace is declared twice. For example, consider this XML document:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <bar:one xmlns:bar="http://www.bar.com">alpha</bar:one>
    <bar:two xmlns:bar="http://www.bar.com">bravo</bar:two>
</root>

The desired result is:

/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
/root[1]/bar:one[1]/@xmlns:bar - http://www.bar.com
/root[1]/bar:two[1]/@xmlns:bar - http://www.bar.com
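
One possible way to get a listing like this without relying on the namespace axis at all is to walk the DOM and report every xmlns attribute where it is written. This is a different technique from the XPath expressions above, sketched under a few assumptions: the class and method names are hypothetical, the path format only imitates the listing shown here, and the implicit xml binding is not reported because it is never declared in the document itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
public class NamespaceDeclarations {

    // Returns one "path/@xmlns:prefix - uri" entry per declaration, in
    // document order by element, keeping duplicates.
    static List<String> find(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> found = new ArrayList<String>();
            collect(doc.getDocumentElement(), "", found);
            return found;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // 1-based position among preceding siblings with the same qualified name.
    static int position(Element element) {
        int pos = 1;
        for (Node s = element.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (s instanceof Element && s.getNodeName().equals(element.getNodeName())) {
                pos++;
            }
        }
        return pos;
    }

    static void collect(Element element, String parentPath, List<String> found) {
        String path = parentPath + "/" + element.getNodeName() + "[" + position(element) + "]";
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            String name = attrs.item(i).getNodeName();
            // Namespace declarations are the attributes named xmlns or xmlns:prefix.
            if (name.equals("xmlns") || name.startsWith("xmlns:")) {
                found.add(path + "/@" + name + " - " + attrs.item(i).getNodeValue());
            }
        }
        for (Node c = element.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) {
                collect((Element) c, path, found);
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<root><bar:one xmlns:bar='http://www.bar.com'>alpha</bar:one>"
                + "<bar:two xmlns:bar='http://www.bar.com'>bravo</bar:two></root>";
        for (String line : find(xml)) {
            System.out.println(line);
        }
    }
}
```

Because the walk reads the attributes as written, the same namespace declared on two elements shows up twice. The order of several declarations on a single element depends on the DOM implementation's attribute ordering.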

Answer

Roger Costello · May 2, 2012

I think this will get all namespaces, without any duplicates:

for $i in 1 to count(//namespace::*) return 
if (empty(index-of((//namespace::*)[position() = (1 to ($i - 1))][name() = name((//namespace::*)[$i])], (//namespace::*)[$i]))) 
then (//namespace::*)[$i] 
else ()