Java HTML Parsing

Richard Walton · Oct 26, 2008 · Viewed 109k times

I'm working on an app that scrapes data from a website, and I was wondering how I should go about extracting the data. Specifically, I need the data contained in a number of div tags that use a specific CSS class. Currently (for testing purposes) I'm just checking for

div class = "classname"

in each line of HTML. This works, but I can't help feeling there is a better solution out there.

Is there a nice way to hand a line of HTML to a class and get back some convenient methods like:

boolean usesClass(String CSSClassname);
String getText();
String getLink();

Answer

rajsite · May 18, 2011

Another library that might be useful for HTML processing is jsoup. It tries to clean up malformed HTML and lets you parse HTML in Java using a jQuery-like selector syntax.

http://jsoup.org/
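
As a rough sketch of how this could map onto the methods asked for in the question, something like the following might work. The DivScraper and Item names, the extract helper, and the sample HTML are made up for illustration; only Jsoup.parse, select, hasClass, text, first and attr are jsoup calls. It assumes jsoup is on the classpath and that the class name is a plain identifier that needs no escaping in a CSS selector.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class DivScraper {

    /** Hypothetical wrapper exposing the three methods from the question. */
    public static final class Item {
        private final Element div;

        Item(Element div) {
            this.div = div;
        }

        public boolean usesClass(String cssClassName) {
            // True even when the div carries several classes.
            return div.hasClass(cssClassName);
        }

        public String getText() {
            // Combined text of the div and its children, tags stripped.
            return div.text();
        }

        public String getLink() {
            // href of the first link inside the div, or "" if there is none.
            Element a = div.select("a[href]").first();
            return a == null ? "" : a.attr("href");
        }
    }

    /** Parses the HTML and wraps every div that carries the given class. */
    public static List<Item> extract(String html, String cssClassName) {
        Document doc = Jsoup.parse(html);
        List<Item> items = new ArrayList<>();
        // Assumes a simple class name that is safe to splice into a selector.
        for (Element div : doc.select("div." + cssClassName)) {
            items.add(new Item(div));
        }
        return items;
    }

    public static void main(String[] args) {
        // Sample input invented for this sketch.
        String html = "<div class=\"classname other\"><a href=\"/page\">Read more</a></div>"
                + "<div class=\"unrelated\">skip me</div>";
        for (Item item : extract(html, "classname")) {
            System.out.println(item.getText() + " -> " + item.getLink());
        }
    }
}
```

Because the selector works on the parsed document rather than on raw lines, it should not matter whether the div spans several lines, has extra attributes, or lists more than one class.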