Getting cleaned HTML as text from HtmlCleaner

Nayn · Aug 25, 2011

I want to see the cleaned HTML that HtmlCleaner produces. I see there is a method called serialize on TagNode, but I don't know how to use it. Does anybody have sample code for it?

Thanks, Nayn

Answer

Rahul Sainani · Jul 29, 2012

Here's the sample code:

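import java.net.URL;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;

HtmlCleaner htmlCleaner = new HtmlCleaner();
// url is a java.net.URL; clean(String) would treat its argument as HTML markup, not as an address
TagNode root = htmlCleaner.clean(url);
// getInnerHtml is an instance method; wrap its result in the root tag (root attributes are not reproduced)
String html = "<" + root.getName() + ">" + htmlCleaner.getInnerHtml(root) + "</" + root.getName() + ">";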
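Since the question asks about serialize on TagNode, here is a minimal sketch of that route. It assumes HtmlCleaner 2.x on the classpath, with TagNode.serialize(Serializer, Writer) and the SimpleHtmlSerializer(CleanerProperties) constructor. The class name CleanedHtmlExample and the helper cleanToString are hypothetical, and it cleans an inline string instead of a URL so it runs without network access.

```java
import java.io.IOException;
import java.io.StringWriter;

import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.SimpleHtmlSerializer;
import org.htmlcleaner.TagNode;

// Hypothetical example class; assumes the HtmlCleaner 2.x jar is on the classpath
public class CleanedHtmlExample {

    // Hypothetical helper: cleans raw markup and serializes the whole tree,
    // root element and its attributes included, back into a String
    static String cleanToString(String rawHtml) throws IOException {
        HtmlCleaner cleaner = new HtmlCleaner();
        // clean(String) parses its argument as HTML content
        TagNode root = cleaner.clean(rawHtml);

        // Reuse the cleaner's properties so the serializer matches how the tree was built
        CleanerProperties props = cleaner.getProperties();
        SimpleHtmlSerializer serializer = new SimpleHtmlSerializer(props);

        // TagNode.serialize(Serializer, Writer) writes this node and all of its children
        StringWriter out = new StringWriter();
        root.serialize(serializer, out);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // The unclosed <b> and <p> tags are closed by the cleaner
        System.out.println(cleanToString("<p>Hello <b>world"));
    }
}
```

Unlike the string concatenation in the answer, this keeps any attributes on the root element. Swapping in PrettyHtmlSerializer or PrettyXmlSerializer (both also constructed from CleanerProperties) should give indented output instead of compact HTML.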