Clean namespace handling with dom4j

Antoine Claval · Sep 14, 2009

We are using dom4j 1.6.1 to parse XML coming from an external source. Sometimes the tags are qualified with a namespace and sometimes they are not, and this makes calls to Element.selectSingleNode(String s) fail.
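
To see why the unprefixed lookup fails, here is a minimal sketch that reproduces the behavior with the JDK's built-in javax.xml.xpath API instead of dom4j. This is a swapped-in stand-in, on the assumption that dom4j follows the same XPath 1.0 rule: an unprefixed name in an XPath expression only matches elements in no namespace, so once the document declares a default namespace, /root/item stops matching. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DefaultNamespaceDemo {

    // Hypothetical stand-in for dom4j's selectSingleNode: evaluates the
    // unprefixed path "/root/item" with the JDK XPath API and reports
    // whether it found a node
    static boolean matches(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Object node = XPathFactory.newInstance().newXPath()
                    .evaluate("/root/item", document, XPathConstants.NODE);
            return node != null;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(matches("<root><item>1</item></root>"));                        // true
        // Same elements, but now in a default namespace: the unprefixed path no longer matches
        System.out.println(matches("<root xmlns=\"urn:example\"><item>1</item></root>")); // false
    }
}
```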

So far we have three workarounds, and we are not happy with any of them:

1 - Remove every namespace occurrence from the XML string before doing anything else with the document:

xml = xml.replaceAll("xmlns=\"[^\"]*\"", "");
xml = xml.replaceAll("ds:", "");
xml = xml.replaceAll("etm:", "");
[...] // and so on for each kind of namespace
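
If approach 1 is kept, the per-prefix replaceAll calls could at least be collapsed into a few patterns that only touch markup. This is a rough sketch, assuming double-quoted namespace declarations and no prefixes inside comments, CDATA sections or attribute values. NamespaceStripper and strip are hypothetical names, and regex-based XML rewriting stays fragile (for example, two attributes that differ only by prefix would collide).

```java
import java.util.regex.Pattern;

public class NamespaceStripper {

    // xmlns="..." and xmlns:prefix="..." declarations, plus the whitespace before them
    private static final Pattern DECLARATION =
            Pattern.compile("\\s+xmlns(:[\\w.-]+)?=\"[^\"]*\"");

    // A prefix right after "<" or "</", e.g. "<ds:Signature" or "</ds:Signature"
    private static final Pattern TAG_PREFIX =
            Pattern.compile("<(/?)[\\w.-]+:");

    // A prefixed attribute name, e.g. " xsi:schemaLocation="
    private static final Pattern ATTRIBUTE_PREFIX =
            Pattern.compile("(\\s)[\\w.-]+:([\\w.-]+\\s*=)");

    static String strip(String xml) {
        String result = DECLARATION.matcher(xml).replaceAll("");
        result = TAG_PREFIX.matcher(result).replaceAll("<$1");
        return ATTRIBUTE_PREFIX.matcher(result).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        String xml = "<ds:Signature xmlns:ds=\"http://www.w3.org/2000/09/xmldsig#\">"
                + "<ds:SignedInfo Id=\"x\">records: 3</ds:SignedInfo></ds:Signature>";
        System.out.println(strip(xml));
        // <Signature><SignedInfo Id="x">records: 3</SignedInfo></Signature>
    }
}
```

Unlike xml.replaceAll("ds:", ""), which would also turn the text records: 3 into recor 3, these patterns leave the text content of this example alone.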

2 - Remove the namespace just before getting a node, by calling

Element.remove(Namespace ns)

But this only works for the node itself and its first level of children.

3 - Clutter the code with fallbacks like:

Node node = rootElement.selectSingleNode(NameWithoutNameSpace);
if (node == null)
    node = rootElement.selectSingleNode(NameWithNameSpace);
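
The fallback in approach 3 can at least be written once instead of at every call site. A minimal sketch (Java 8+) with a hypothetical selectFirst helper: it accepts any lookup function, so with dom4j it could presumably be called as selectFirst(rootElement::selectSingleNode, NameWithoutNameSpace, NameWithNameSpace). The main method uses a plain Map as a stand-in so the example runs without dom4j.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class FirstMatch {

    // Tries each expression in order and returns the first non-null result,
    // or null if none of them matches
    static <T> T selectFirst(Function<String, T> select, String... expressions) {
        for (String expression : expressions) {
            T result = select.apply(expression);
            if (result != null) {
                return result;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Stand-in for rootElement::selectSingleNode: only the prefixed name is known
        Map<String, String> nodes = new HashMap<>();
        nodes.put("ds:Signature", "signature node");
        System.out.println(selectFirst(nodes::get, "Signature", "ds:Signature")); // signature node
    }
}
```

This keeps the "try without the prefix, then with it" policy in one place, but it still requires knowing every prefix up front.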

So... what do you think? Which one is the least bad? Do you have any other solution to propose?

Answer

mestachs · Aug 18, 2011

I wanted to remove all namespace information (declarations and tag prefixes) to simplify XPath evaluation. I ended up with this solution:

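String xml = ...
SAXReader reader = new SAXReader();
// A StringReader avoids the platform-default charset that xml.getBytes() would use
Document document = reader.read(new StringReader(xml));
// Walk the whole tree and strip namespace information (see the visitor below)
document.accept(new NameSpaceCleaner());
return document.asXML();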

where NameSpaceCleaner is a dom4j visitor:

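private static final class NameSpaceCleaner extends VisitorSupport {
    @Override
    public void visit(Document document) {
        // Put the root element in no namespace and drop the extra
        // xmlns:prefix declarations it carries
        ((DefaultElement) document.getRootElement())
                .setNamespace(Namespace.NO_NAMESPACE);
        document.getRootElement().additionalNamespaces().clear();
    }

    @Override
    public void visit(Namespace namespace) {
        // Remove namespace declaration nodes wherever they appear
        namespace.detach();
    }

    @Override
    public void visit(Attribute node) {
        // Test the qualified name only: toString() also includes the value,
        // so an attribute whose value merely contains "xmlns" or "xsi:"
        // would be removed by mistake
        String name = node.getQualifiedName();
        if (name.equals("xmlns") || name.startsWith("xmlns:")
                || name.startsWith("xsi:")) {
            node.detach();
        }
    }

    @Override
    public void visit(Element node) {
        // Move every element into no namespace so that unprefixed XPath matches it
        if (node instanceof DefaultElement) {
            ((DefaultElement) node).setNamespace(Namespace.NO_NAMESPACE);
        }
    }
}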
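
For readers who cannot add dom4j, the same strip-every-namespace idea as the visitor above can be sketched with the JDK's built-in org.w3c.dom API. This is a swapped-in API, not the answer's code: Document.renameNode moves an element or attribute into no namespace, and in a namespace-aware DOM the xmlns declarations appear as ordinary attributes, so they are removed explicitly (along with xsi:* attributes, mirroring the visitor). DomNamespaceCleaner, parseAndClean and hasNode are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomNamespaceCleaner {

    // Parses with namespace awareness, then moves every element into no namespace
    static Document parseAndClean(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            clean(document, document.getDocumentElement());
            return document;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    private static void clean(Document document, Element element) {
        // Copy the attributes first: removing or renaming them changes the live map
        NamedNodeMap map = element.getAttributes();
        List<Attr> attributes = new ArrayList<>();
        for (int i = 0; i < map.getLength(); i++) {
            attributes.add((Attr) map.item(i));
        }
        for (Attr attribute : attributes) {
            String uri = attribute.getNamespaceURI();
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(uri)
                    || XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI.equals(uri)) {
                // xmlns / xmlns:prefix declarations and xsi:* attributes
                element.removeAttributeNode(attribute);
            } else if (uri != null) {
                document.renameNode(attribute, null, attribute.getLocalName());
            }
        }
        // renameNode may return a different node, so keep working with its result
        Element renamed = (Element) document.renameNode(element, null, element.getLocalName());
        // Snapshot the child elements before recursing into them
        NodeList children = renamed.getChildNodes();
        List<Element> childElements = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                childElements.add((Element) children.item(i));
            }
        }
        for (Element child : childElements) {
            clean(document, child);
        }
    }

    // True when the (unprefixed) XPath expression finds a node
    static boolean hasNode(Document document, String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expression, document, XPathConstants.NODE) != null;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After parseAndClean, an unprefixed expression such as /root/item matches even when the input declared xmlns="urn:a". As with the visitor, two attributes that differ only by prefix would collide once their namespaces are dropped.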