How to Parse Only Text from HTML

Jesvin · Aug 18, 2010 · Viewed 17.9k times

How can I extract only the text from a web page using jsoup in Java?

Answer

Ryan Berger · Aug 18, 2010

From the jsoup cookbook: http://jsoup.org/cookbook/extracting-data/attributes-text-html

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String html = "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>";
Document doc = Jsoup.parse(html);      // parse the HTML string into a DOM
String text = doc.body().text();       // "An example link."
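
Since the question asks about a web page rather than an HTML string, here is a minimal sketch of the same idea applied to a fetched page. It assumes jsoup is on the classpath; the class name PageText and the helpers textOf and textOfUrl are hypothetical names chosen for illustration, not part of jsoup.

```java
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class PageText {

    // Hypothetical helper: parse an HTML string and return the text of its body.
    static String textOf(String html) {
        Document doc = Jsoup.parse(html);
        return doc.body().text();
    }

    // Hypothetical helper: fetch a page over HTTP and return the text of its body.
    static String textOfUrl(String url) throws IOException {
        Document doc = Jsoup.connect(url).get();
        return doc.body().text();
    }

    public static void main(String[] args) throws IOException {
        // Same snippet as in the cookbook example above.
        String html = "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>";
        System.out.println(textOf(html));

        // Optionally fetch a live page if a URL is passed on the command line.
        if (args.length > 0) {
            System.out.println(textOfUrl(args[0]));
        }
    }
}
```

Run without arguments, this only parses the in-memory string and makes no network request. Pass a URL as the first argument to print the text of a live page.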