Web scraping with Java

java web-scraping frameworks

NoneType · Jul 8, 2010 · Viewed 134.4k times

I can't find a good Java-based web scraping API. The site I need to scrape doesn't provide an API either; I want to iterate over all of its pages using a page ID and extract the HTML titles and other content from their DOM trees.

Are there ways other than web scraping?

Answer

jsoup

Extracting the title is not difficult, and you have many options; search here on Stack Overflow for "Java HTML parsers". One of them is jsoup.
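As a rough sketch of title extraction with jsoup (assuming the jsoup jar is on your classpath): `Jsoup.parse` and `Document.title()` handle an HTML string, and `Jsoup.connect(url).get()` fetches a live page. The class name, the method names, and the `baseUrl + pageId` URL pattern are made up for illustration; adapt them to however your site builds its page URLs.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.io.IOException;

class TitleScraper {

    // Parse an HTML string and return the text of its <title> element
    // (jsoup returns an empty string when there is no title).
    static String titleOf(String html) {
        Document doc = Jsoup.parse(html);
        return doc.title();
    }

    // Fetch a page over HTTP and return its title.
    // Throws IOException on network errors or non-2xx responses.
    static String fetchTitle(String url) throws IOException {
        Document doc = Jsoup.connect(url).get();
        return doc.title();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // Offline demo: no network access needed.
            String html = "<html><head><title>Page 42</title></head><body></body></html>";
            System.out.println(titleOf(html));
            return;
        }
        // Hypothetical URL pattern: args[0] is a base URL that the page ID
        // is appended to, e.g. "http://example.com/page?id=".
        String baseUrl = args[0];
        for (int pageId = 1; pageId <= 3; pageId++) {
            System.out.println(pageId + ": " + fetchTitle(baseUrl + pageId));
        }
    }
}
```

If you loop over many page IDs, you will probably want to catch the `IOException` per page so that one missing ID does not stop the whole run.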

You can navigate the page using the DOM if you know the page structure; see http://jsoup.org/cookbook/extracting-data/dom-navigation
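Here is a minimal sketch of that kind of DOM navigation, using jsoup's `getElementById`, `getElementsByTag` and `attr` methods. The sample HTML, the `content` id, and the `linksInside` helper are invented for the example; your page will have its own structure.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

class DomNavigation {

    // Collect the href attribute of every <a> inside the element with the given id.
    // Returns an empty list if no element has that id.
    static List<String> linksInside(String html, String containerId) {
        Document doc = Jsoup.parse(html);
        Element container = doc.getElementById(containerId);
        List<String> hrefs = new ArrayList<>();
        if (container == null) {
            return hrefs;
        }
        Elements links = container.getElementsByTag("a");
        for (Element link : links) {
            hrefs.add(link.attr("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // Made-up sample markup: two links inside the container, one outside it.
        String html = "<div id=\"content\">"
                + "<a href=\"/page/1\">First</a> <a href=\"/page/2\">Second</a>"
                + "</div>"
                + "<a href=\"/elsewhere\">Outside</a>";
        System.out.println(linksInside(html, "content"));
    }
}
```

Only the links inside the `content` element are collected; the one outside it is skipped.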

It's a good library, and I've used it in my most recent projects.
