How to convert a Jsoup Document to a W3C Document?

chaosguru · Jul 23, 2013 · Viewed 8k times

I have built a Jsoup Document by parsing an in-house HTML page:

public Document newDocument(String path) throws IOException {
    Document doc = Jsoup.connect(path).timeout(0).get();
    return new HtmlDocument<Document>(doc);
}

I want to convert the Jsoup document to an org.w3c.dom.Document. I used an available library, DOMBuilder, for this, but the conversion returns null instead of an org.w3c.dom.Document. I can't work out what the problem is; I tried searching but couldn't find an answer.

Code to generate the W3C DOM Document:

Document jsoupDoc = factory.newDocument("http:localhost/testcases/test_2.html");
org.w3c.dom.Document docu = DOMBuilder.jsoup2DOM(jsoupDoc);

Can anyone please help me with this?

Answer

Stephan · May 15, 2015

Alternatively, Jsoup provides the W3CDom class with the method fromJsoup. This method transforms a Jsoup Document into a W3C document.

Document jsoupDoc = ...
W3CDom w3cDom = new W3CDom();
org.w3c.dom.Document w3cDoc = w3cDom.fromJsoup(jsoupDoc);
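
To confirm that the conversion actually produced a document (rather than null, as in the question), one option is to serialize the W3C document to a string and inspect it. The following is only a minimal sketch using JDK classes: the class name W3cDump and the helpers toXmlString and sampleDocument are hypothetical names chosen for illustration, and sampleDocument builds a stand-in document so the snippet runs without Jsoup on the classpath. In real use you would pass the result of w3cDom.fromJsoup(jsoupDoc) instead.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document; // the W3C Document, not org.jsoup.nodes.Document
import org.w3c.dom.Element;

// Hypothetical helper class, not part of Jsoup or the JDK.
class W3cDump {

    // Serializes a W3C Document to a string; fails fast if the document is null.
    static String toXmlString(Document doc) throws TransformerException {
        if (doc == null) {
            throw new IllegalArgumentException("W3C document is null; the conversion produced no result");
        }
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Force XML output and skip the <?xml ...?> declaration to keep the dump short.
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Stand-in for the document you would get from w3cDom.fromJsoup(jsoupDoc) (assumption).
    static Document sampleDocument() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element title = doc.createElement("title");
        title.setTextContent("Test");
        html.appendChild(title);
        doc.appendChild(html);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXmlString(sampleDocument()));
    }
}
```

With a real Jsoup conversion, the call would be W3cDump.toXmlString(w3cDoc), where w3cDoc is the variable from the snippet above.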

UPDATE: