How to get the tag name of a TEXT_NODE in Java's org.w3c.dom.Node

Lucas Ou-Yang · Jul 31, 2013 · Viewed 8.4k times

The documentation for this interface states that text nodes all return "#text" for their names instead of the actual tag name. But for what I'm doing, the tag name is necessary.

// I'm using the following imports
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;


// In the .xml input file
<country>US</country>  // This is a "text node" .getTextContent()
                       // returns "US", I need "country" and .getNodeName() 
                       // only returns "#text"

How can I access the tag name? This must be possible somehow; I don't mind a hackish solution.

Docs:

http://www.w3schools.com/dom/dom_nodetype.asp

http://www.w3.org/2003/01/dom2-javadoc/org/w3c/dom/Node.html

Thank you.

Answer

Jon Skeet · Jul 31, 2013

I think you've misunderstood what nodes are involved. This XML:

<country>US</country>

... contains two nodes:

  • The country element
  • The text node, with content of US

The element is not a text node, and the text node doesn't have an element name, because it's not an element. It's important to understand that these are different nodes. That's the source of all your confusion, I believe.
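To make the distinction concrete, here is a minimal sketch that parses the one-line document and reports the name and type of the element and of its first child. The class name NodeKinds and the helper describe are hypothetical names chosen for illustration, and the XML is read from a string rather than from your input file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeKinds {
    // Hypothetical helper: parses an XML string and describes the root
    // element and its first child as "name (type N)".
    static String describe(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            Node child = root.getFirstChild();
            return root.getNodeName() + " (type " + root.getNodeType() + ") -> "
                    + child.getNodeName() + " (type " + child.getNodeType() + ")";
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Node.ELEMENT_NODE is 1 and Node.TEXT_NODE is 3.
        // prints: country (type 1) -> #text (type 3)
        System.out.println(describe("<country>US</country>"));
    }
}
```

The element reports its tag name, while the text node nested inside it reports "#text", which matches what you are seeing.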

If you're currently looking at the text node, you could use node.getParentNode().getNodeName() to get the element name. Or from the element node, you could call getTextContent().
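
Both directions can be sketched roughly as follows. This assumes the text node sits directly inside the element whose name you want. TagNameOfText, parse and enclosingTagName are hypothetical names used only for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TagNameOfText {
    // Hypothetical helper: name of the node that directly contains the
    // given text node, or null if the node has no parent.
    static String enclosingTagName(Node textNode) {
        Node parent = textNode.getParentNode();
        return parent == null ? null : parent.getNodeName();
    }

    // Hypothetical helper: parses XML from a string instead of a file.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<country>US</country>");
        Element country = doc.getDocumentElement();
        Node text = country.getFirstChild();

        System.out.println(text.getNodeName());        // #text
        System.out.println(enclosingTagName(text));    // country
        System.out.println(country.getTextContent());  // US
    }
}
```

If your code walks the tree and only stops at text nodes, going up one level with getParentNode() is usually the smaller change. If you can stop at the element instead, getTextContent() avoids handling the text node at all.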