Best Java lib for programmatically converting a HTML page to an Image/PDF

empire29 · Apr 24, 2012

I am looking for the best Java library that I can pass a URL to and have it create an image of the web page as it would look in a browser. I tried Flying Saucer; however, it seems like almost every web page breaks it. It won't even render www.google.com or yahoo.com, and the only site I could get it to render is www.w3c.org!

Any thoughts on a better tool to use, or a way to make Flying Saucer more lenient about the XHTML it accepts?

Answer

ollo · Aug 27, 2012

Flying Saucer fails on many pages because it only accepts well-formed XHTML (see the manual).
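Roughly speaking, Flying Saucer reads its input with an XML parser, so typical "tag soup" HTML (unclosed `<br>` or `<p>` tags, for example) is rejected before any layout happens. The sketch below reproduces that failure using only the JDK's own JAXP parser, not Flying Saucer itself; the class name `XhtmlCheck` and the sample strings are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XhtmlCheck {

    /** Returns true if the markup parses as well-formed XML, false otherwise. */
    public static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and stays silent otherwise,
            // so malformed input fails without printing "[Fatal Error]" to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed XML
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser setup or I/O problem
        }
    }

    public static void main(String[] args) {
        String tagSoup = "<html><body><p>Hello<br>world</body></html>";    // typical real-world HTML
        String xhtml = "<html><body><p>Hello<br/>world</p></body></html>"; // same content, well-formed
        System.out.println(isWellFormedXml(tagSoup)); // false
        System.out.println(isWellFormedXml(xhtml));   // true
    }
}
```

Pages like google.com are served as HTML, not XHTML, so they hit exactly this kind of error.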

But you can use an HTML library to "clean" your input first and then pass the result to FS:

Website -> "Cleaner" -> Flying Saucer

Some good and free libs are:

  1. JSoup (personal recommendation)
  2. HtmlCleaner
  3. JTidy (sometimes more strict than needed)
  4. Jericho HTML
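A minimal sketch of the Website -> JSoup -> Flying Saucer pipeline, assuming jsoup (1.8 or later, for `Document.OutputSettings.Syntax.xml`) and the `flying-saucer-pdf` artifact are on the classpath. The class and method names (`HtmlToPdf`, `toXhtml`, `renderPdf`) and the output file `page.pdf` are hypothetical. It renders to PDF; Flying Saucer also ships a Java2D-based renderer for images, which is not shown here. Note that neither library executes JavaScript, so script-generated content will be missing from the output.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities;
import org.xhtmlrenderer.pdf.ITextRenderer;

public class HtmlToPdf {

    /** Step 1 ("cleaner"): parse arbitrary HTML with JSoup and re-serialize it as XHTML. */
    public static String toXhtml(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri); // lenient parser: repairs unclosed tags
        doc.outputSettings()
           .syntax(Document.OutputSettings.Syntax.xml) // void elements become <br /> etc.
           .escapeMode(Entities.EscapeMode.xhtml)      // only XML-safe entity names
           .charset("UTF-8");
        return doc.html();
    }

    /** Step 2: hand the well-formed XHTML to Flying Saucer and write a PDF file. */
    public static void renderPdf(String xhtml, String baseUri, File out) throws Exception {
        ITextRenderer renderer = new ITextRenderer();
        renderer.setDocumentFromString(xhtml, baseUri); // baseUri resolves relative CSS/image URLs
        renderer.layout();
        try (OutputStream os = new FileOutputStream(out)) {
            renderer.createPDF(os);
        }
    }

    public static void main(String[] args) throws Exception {
        String url = args.length > 0 ? args[0] : "https://www.google.com/";
        String rawHtml = Jsoup.connect(url).execute().body(); // fetch the page as a raw string
        renderPdf(toXhtml(rawHtml, url), url, new File("page.pdf"));
    }
}
```

The other cleaners in the list could replace step 1; the only requirement is that their output is well-formed XHTML before it reaches Flying Saucer.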