Is DocumentBuilder thread safe?

CKing · Sep 17, 2012 · Viewed 31.8k times

The code base that I am looking at uses the DOM parser. The following code fragment is duplicated in five methods:

 DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
 DocumentBuilder builder = factory.newDocumentBuilder();

If a method that contains the above code is called in a loop, or is called many times in the application, we bear the overhead of creating a new DocumentBuilderFactory instance and a new DocumentBuilder instance on every call.

Would it be a good idea to create a singleton wrapper around the DocumentBuilderFactory and DocumentBuilder instances, as shown below:

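import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public final class DOMParser {
   private static final DOMParser instance = new DOMParser();
   private final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
   private final DocumentBuilder builder; // single builder shared by every caller

   private DOMParser() {
      try {
         builder = factory.newDocumentBuilder();
      } catch (ParserConfigurationException e) {
         throw new IllegalStateException(e);
      }
   }

   public static DOMParser getInstance() { return instance; }

   public Document parse(InputSource xml) throws SAXException, IOException {
       return builder.parse(xml);
   }
}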

Are there any problems that can arise if the above singleton is shared across multiple threads? If not, will there be any performance gain by using the above approach of creating the DocumentBuilderFactory and the DocumentBuilder instances only once throughout the lifetime of the application?

Edit:

The only time we can face a problem is if DocumentBuilder saves some state information while parsing an XML file which can affect the parsing of the next XML file.

Answer

Denis Tulskiy · Sep 17, 2012

See the comments section for other questions about the same matter. Short answer to your question: no, it's not OK to put these classes in a singleton. Neither DocumentBuilderFactory nor DocumentBuilder is guaranteed to be thread safe. If you have several threads parsing XML, make sure each thread has its own instance of DocumentBuilder. You only need one per thread, since you can reuse a DocumentBuilder after you reset it.
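One way to give each thread its own builder is a ThreadLocal. The following is only a minimal sketch of that idea, not code from the original post: the class name PerThreadDomParser and its parse(String) helper are made up for illustration, and it assumes Java 8 or later for ThreadLocal.withInitial.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper (not from the original post): one DocumentBuilder per thread.
final class PerThreadDomParser {
    // Each thread lazily creates its own factory and builder on first use,
    // so no parser instance is ever shared between threads.
    private static final ThreadLocal<DocumentBuilder> BUILDER =
        ThreadLocal.withInitial(() -> {
            try {
                return DocumentBuilderFactory.newInstance().newDocumentBuilder();
            } catch (ParserConfigurationException e) {
                throw new IllegalStateException("Cannot create DocumentBuilder", e);
            }
        });

    private PerThreadDomParser() {}

    static Document parse(String xml) throws SAXException, IOException {
        DocumentBuilder builder = BUILDER.get();
        try {
            return builder.parse(new InputSource(new StringReader(xml)));
        } finally {
            // Restore the builder's original configuration before the next reuse.
            builder.reset();
        }
    }
}
```

With a fixed thread pool this creates at most one builder per worker thread, which keeps the creation overhead the question worries about small without sharing any parser state.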

EDIT: A small snippet to show that sharing the same DocumentBuilder between threads is bad. With Java 1.6_u32 and 1.7_u05 this code fails with org.xml.sax.SAXException: FWK005 parse may not be called while parsing. Uncomment the synchronization on builder, and it works fine:

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        final DocumentBuilder builder = factory.newDocumentBuilder();

        ExecutorService exec = Executors.newFixedThreadPool(10);
        for (int i = 0; i < 10; i++) {
            exec.submit(new Runnable() {
                public void run() {
                    try {
//                        synchronized (builder) {
                            InputSource is = new InputSource(new StringReader("<?xml version=\"1.0\" encoding=\"UTF-8\" ?><俄语>данные</俄语>"));
                            builder.parse(is);
                            builder.reset();
//                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        exec.shutdown();

So here's your answer: do not call DocumentBuilder.parse() from multiple threads at the same time. This behavior might be JRE-specific. If you're using IBM Java or JRockit, or you give it a different DocumentBuilderImpl, it might work fine, but with the default Xerces implementation it does not.
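If a single shared builder has to stay, the synchronized variant of the snippet above can be wrapped in a small class. This is a sketch under that assumption only: SynchronizedDomParser and its parse(String) method are hypothetical names, not part of the original post or of the JAXP API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical wrapper (not from the original post): one shared builder guarded by a lock.
final class SynchronizedDomParser {
    private final DocumentBuilder builder;

    SynchronizedDomParser() throws ParserConfigurationException {
        builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // Only one thread at a time can be inside this method, so the shared
    // builder never sees overlapping parse() calls.
    synchronized Document parse(String xml) throws SAXException, IOException {
        try {
            return builder.parse(new InputSource(new StringReader(xml)));
        } finally {
            // Restore the builder's original configuration before the next caller.
            builder.reset();
        }
    }
}
```

The trade-off is that all parsing is serialized on one lock, so threads queue up behind each other. One builder per thread avoids that contention at the cost of a few extra instances.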