How do you Programmatically Download a Webpage in Java

jjnguy picture jjnguy · Oct 26, 2008 · Viewed 196.6k times · Source

I would like to be able to fetch a web page's html and save it to a String, so I can do some processing on it. Also, how could I handle various types of compression.

How would I go about doing that using Java?

Answer

BalusC picture BalusC · Dec 31, 2010

I'd use a decent HTML parser like Jsoup. It's then as easy as:

String html = Jsoup.connect("http://stackoverflow.com").get().html();

It handles GZIP and chunked responses and character encoding fully transparently. It offers more advantages as well, like HTML traversing and manipulation by CSS selectors like as jQuery can do. You only have to grab it as Document, not as a String.

Document document = Jsoup.connect("http://google.com").get();

You really don't want to run basic String methods or even regex on HTML to process it.

See also: