Ignoring DTD when parsing XML

LordDoskias picture LordDoskias · Nov 10, 2011 · Viewed 10.5k times · Source

How can I ignore the DTD declaration when parsing file with XOM xml library. My file has the following line :

<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd">
//rest of stuff here 

And when I try to build() my document I get a filenotfound exception for the DTD file. I know I don't have this file and I don't care about it, so how can it be removed when using XOM?

Here is a code snippet:

public BlastXMLParser(String filePath) {
    Builder b = new Builder(false);
     //not a good idea to have exception-throwing code in constructor
    try {

        _document = b.build(filePath);
    } catch (ParsingException ex) {
        Logger.getLogger(BlastXMLParser.class.getName()).log(Level.SEVERE,"err", ex);
    } catch (IOException ex) {
        //
    }

private Elements getBlastReads() {
    Element root = _document.getRootElement();
    Elements rootChildren = root.getChildElements();

    for (int i = 0; i < rootChildren.size(); i++) {
        Element child = rootChildren.get(i);
        if (child.getLocalName().equals("BlastOutput_iterations")) {

            return child.getChildElements();
        }
    }

    return null;
}
}

I get a NullPointerException at this line:

Element root = _document.getRootElement();

With the DTD line removed from the source XML file I can successfully parse it, but this is not an option in the final production system.

Answer

J&#246;rn Horstmann picture Jörn Horstmann · Nov 10, 2011

The preferred solution would be to implement an EntityResolver that intercepts requests for the DTD and redirects these to an embedded copy. If you

  1. don't have access to the DTD and
  2. are absolutely sure you won't need it (apart from validation it might also declare character entities that are used in the document) and
  3. you are using the Xerces XML Parser implementation

you can disable fetching of DTD by setting the corresponding SAX feature. In XOM this should be possible by passing an XMLReader to the Builder constructor like this:

import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

...

XMLReader xmlreader = XMLReaderFactory.createXMLReader();
xmlreader.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
Builder builder = new Builder(xmlreader);