Web Scraping with Scala

Michael Tingley · Feb 7, 2013

Just wondering if anyone knows of a web-scraping library that takes advantage of Scala's succinct syntax. So far, I've found Chafe, but it seems poorly documented and poorly maintained. I'm wondering if anyone out there has done scraping with Scala and has advice. (I'm trying to integrate scraping into an existing Scala framework rather than use a scraper written in, say, Python.)

Answer

Adam Gent · Feb 7, 2013

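First, there is a plethora of HTML scraping libraries for the JVM; all you need to do is pimp one of them (the "pimp my library" pattern, i.e. adding Scala-friendly methods to an existing Java class through an implicit conversion).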

The four I have used are:

  • HtmlUnit - emulates a browser and can even run JavaScript
  • Jericho - preserves the original formatting, so it is ideal if you want to edit the scraped HTML
  • NekoHTML
  • JSoup - not sure how well it works with Scala, but as a plain Java library it should work

I have used Selenium, but never for scraping. There is also a Scala wrapper around Selenium.

I would recommend pimping an existing Java library over some half-baked Scala library.
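A minimal sketch of the "pimp my library" approach, assuming Scala 2.13 or Scala 3 run as a script. To stay dependency-free it enriches the JDK's built-in `javax.swing.text.html.parser.ParserDelegator` (an old HTML 3.2 parser), which is not one of the libraries listed above; the names `HtmlEnrichment`, `RichParserDelegator` and `links` are hypothetical, chosen only for illustration. The same wrapping should carry over to HtmlUnit, Jericho, or any other Java parser.

```scala
import java.io.StringReader
import javax.swing.text.MutableAttributeSet
import javax.swing.text.html.{HTML, HTMLEditorKit}
import javax.swing.text.html.parser.ParserDelegator
import scala.collection.mutable.ListBuffer

object HtmlEnrichment {
  // Hypothetical enrichment: the Java class ParserDelegator is left untouched;
  // this implicit class only adds a Scala-friendly `links` method to it.
  implicit class RichParserDelegator(parser: ParserDelegator) {

    /** Returns the href of every <a> tag, in document order. */
    def links(html: String): List[String] = {
      val found = ListBuffer.empty[String]

      // The Java API is callback-based; we collect results into a buffer.
      val callback = new HTMLEditorKit.ParserCallback {
        override def handleStartTag(tag: HTML.Tag, attrs: MutableAttributeSet, pos: Int): Unit =
          if (tag == HTML.Tag.A) {
            // getAttribute returns Object, or null when the attribute is absent
            Option(attrs.getAttribute(HTML.Attribute.HREF)).foreach(h => found += h.toString)
          }
      }

      parser.parse(new StringReader(html), callback, /* ignoreCharSet = */ true)
      found.toList
    }
  }
}

import HtmlEnrichment._

val page = """<html><body><a href="/a">A</a> <p>text</p> <a href="/b">B</a></body></html>"""
val hrefs = new ParserDelegator().links(page)
println(hrefs)
```

The implicit conversion only adds `links` at the call site, so the full Java API of the underlying library is still there whenever the wrapper is not enough.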