We are using dom4j 1.6.1 to parse XML coming from an external source. Sometimes the tags carry a namespace (e.g. ) and sometimes they do not ( ). This makes calls to Element.selectSingleNode(String s) fail, because an unprefixed name in an XPath expression only matches elements that are in no namespace.
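A minimal sketch of the failure, assuming namespace-aware parsing. It uses the JDK's own javax.xml.xpath API instead of dom4j so that it runs without extra jars. The class name NamespaceMismatchDemo and the sample documents are made up for illustration. The same unprefixed expression matches the plain document but not the one that declares a default namespace:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceMismatchDemo {

    // Parses the XML with namespace processing switched on.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates an XPath expression and returns the string value of the result
    // (an empty string when nothing matches).
    static String eval(String xml, String expression) throws Exception {
        return XPathFactory.newInstance().newXPath().evaluate(expression, parse(xml));
    }

    public static void main(String[] args) throws Exception {
        String plain = "<order><id>42</id></order>";
        String namespaced = "<order xmlns=\"urn:example:orders\"><id>42</id></order>";

        // Unprefixed name tests in XPath 1.0 only match elements in no namespace.
        System.out.println("[" + eval(plain, "/order/id") + "]");      // [42]
        System.out.println("[" + eval(namespaced, "/order/id") + "]"); // []
    }
}
```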
For now we have three workarounds, and we are not happy with any of them:
1 - Remove every namespace occurrence from the raw XML string before doing anything else with the document:
xml = xml.replaceAll("xmlns=\"[^\"]*\"", "");
xml = xml.replaceAll("ds:", "");
xml = xml.replaceAll("etm:", "");
[...] // and so on for each kind of namespace
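To illustrate why option 1 is fragile: replaceAll works on the raw text, so a prefix-stripping pattern also rewrites matching text content and attribute values, not only tag names. The class RegexStripPitfall and the sample document below are hypothetical; the two patterns are the ones from option 1:

```java
public class RegexStripPitfall {

    // The approach from option 1: strip the default namespace declaration and a known prefix.
    static String strip(String xml) {
        xml = xml.replaceAll("xmlns=\"[^\"]*\"", "");
        xml = xml.replaceAll("ds:", "");
        return xml;
    }

    public static void main(String[] args) {
        String xml = "<ds:Reference xmlns:ds=\"urn:example\">see ids:7</ds:Reference>";
        // Prints <Reference xmlns:ds="urn:example">see i7</Reference>
        System.out.println(strip(xml));
    }
}
```

The tags lose their prefix as intended, but the text "see ids:7" silently becomes "see i7".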
2 - Remove the namespace just before getting a node, by calling
Element.remove(Namespace ns)
But this only works for that node and the first level of its children, not for deeper descendants.
3 - Clutter the code with a fallback lookup everywhere:
Node node = rootElement.selectSingleNode(NameWithoutNameSpace);
if (node == null) {
    node = rootElement.selectSingleNode(NameWithNameSpace);
}
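Instead of the fallback in option 3, one common XPath 1.0 technique is to match on local-name(), which ignores the namespace of the element. A minimal sketch, again using the JDK's javax.xml.xpath rather than dom4j. The helper namespaceAgnostic is hypothetical and only handles plain element names separated by slashes (no predicates, attributes or quotes in names):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class LocalNamePath {

    // Hypothetical helper: turns "order/id" into
    // "/*[local-name()='order']/*[local-name()='id']", so every step matches
    // by local name whatever namespace (if any) the element is in.
    static String namespaceAgnostic(String simplePath) {
        StringBuilder expression = new StringBuilder();
        for (String step : simplePath.split("/")) {
            if (step.isEmpty()) {
                continue; // tolerate a leading or doubled slash
            }
            expression.append("/*[local-name()='").append(step).append("']");
        }
        return expression.toString();
    }

    // Parses namespace-aware and returns the string value of the expression.
    static String eval(String xml, String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document document = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        return XPathFactory.newInstance().newXPath().evaluate(expression, document);
    }

    public static void main(String[] args) throws Exception {
        String path = namespaceAgnostic("order/id");
        System.out.println(path);
        // Both documents yield "42" with the same expression.
        System.out.println(eval("<order><id>42</id></order>", path));
        System.out.println(eval("<o:order xmlns:o=\"urn:example:orders\"><o:id>42</o:id></o:order>", path));
    }
}
```

Since dom4j evaluates XPath through Jaxen, which implements the XPath 1.0 function library including local-name(), the generated string should also work as the argument to selectSingleNode, though that combination is an assumption here.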
So... what do you think? Which one is the least bad? Do you have another solution to propose?
I wanted to remove all namespace information (declarations and prefixes) to make XPath evaluation easier. I ended up with this solution:
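import java.io.StringReader;
import org.dom4j.Document;
import org.dom4j.io.SAXReader;

String xml = ...
SAXReader reader = new SAXReader();
// Read from a Reader so the string is not re-encoded with the platform default charset.
Document document = reader.read(new StringReader(xml));
document.accept(new NameSpaceCleaner());
return document.asXML();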
where NameSpaceCleaner is a dom4j visitor:
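// Imports needed in the enclosing source file:
import org.dom4j.Attribute;
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.Namespace;
import org.dom4j.VisitorSupport;
import org.dom4j.tree.DefaultElement;

private static final class NameSpaceCleaner extends VisitorSupport {
    // Move the root element into no namespace and clear its extra namespace declarations.
    public void visit(Document document) {
        ((DefaultElement) document.getRootElement())
                .setNamespace(Namespace.NO_NAMESPACE);
        document.getRootElement().additionalNamespaces().clear();
    }

    // Detach each namespace node the traversal visits.
    public void visit(Namespace namespace) {
        namespace.detach();
    }

    // Drop xmlns* and xsi:* attributes, judged by the attribute name only
    // (toString() would also match attributes whose value merely contains "xmlns").
    public void visit(Attribute node) {
        String name = node.getQualifiedName();
        if (name.startsWith("xmlns") || name.startsWith("xsi:")) {
            node.detach();
        }
    }

    // Move every element into no namespace so unprefixed XPath steps match it.
    public void visit(Element node) {
        if (node instanceof DefaultElement) {
            ((DefaultElement) node).setNamespace(Namespace.NO_NAMESPACE);
        }
    }
}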
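For comparison, the same cleaning idea can be sketched with the JDK's DOM API, where Document.renameNode moves an element into no namespace. This is a swapped-in technique, not dom4j: it assumes namespace-aware parsing, mirrors the visitor by dropping namespace declarations and xsi attributes, and the class and method names are hypothetical. It recurses through the whole tree, which is what option 2 was missing:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomNamespaceStripper {

    private static final String XMLNS_URI = "http://www.w3.org/2000/xmlns/";
    private static final String XSI_URI = "http://www.w3.org/2001/XMLSchema-instance";

    // Parses namespace-aware, then strips namespaces from the whole tree.
    static Document parseWithoutNamespaces(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document document = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        strip(document, document.getDocumentElement());
        return document;
    }

    // Recursively moves each element into no namespace and drops xmlns and xsi attributes.
    private static void strip(Document document, Element element) {
        NamedNodeMap attributes = element.getAttributes();
        // Walk backwards: removing an attribute shifts the remaining indexes.
        for (int i = attributes.getLength() - 1; i >= 0; i--) {
            Attr attribute = (Attr) attributes.item(i);
            String uri = attribute.getNamespaceURI();
            if (XMLNS_URI.equals(uri) || XSI_URI.equals(uri)) {
                element.removeAttributeNode(attribute);
            }
        }
        // renameNode may return a replacement node, so continue with its result.
        Element renamed = (Element) document.renameNode(element, null, element.getLocalName());
        NodeList children = renamed.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i) instanceof Element) {
                strip(document, (Element) children.item(i));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<o:order xmlns:o=\"urn:example:orders\""
                + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
                + " xsi:schemaLocation=\"urn:example:orders orders.xsd\"><o:id>42</o:id></o:order>";
        Document document = parseWithoutNamespaces(xml);
        // A plain, unprefixed XPath now matches.
        System.out.println(XPathFactory.newInstance().newXPath().evaluate("/order/id", document));
    }
}
```

Prefixed attributes in any other namespace are left untouched by this sketch.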