What is the current state of libraries for scraping websites with Haskell?
I'm trying to make myself do more of my quick one-off tasks in Haskell, in order to help increase my comfort level with the language.
In Python, I tend to use the excellent PyQuery library for this. Is there something similarly simple and easy in Haskell? I've looked into TagSoup, and while the parser itself seems nice, actually traversing pages doesn't seem as pleasant as it is in other languages.
Is there a better option out there?
http://hackage.haskell.org/package/shpider
Shpider is a web automation library for Haskell. It allows you to quickly write crawlers, and for simple cases (like following links) even without reading the page source.
It has useful features such as turning relative links from a page into absolute links, options to authorize transactions only on a given domain, and the option to only download HTML documents.
It also provides a nice syntax for filling out forms.
An example:
runShpider $ do
    download "http://apage.com"
    theForm : _ <- getFormsByAction "http://anotherpage.com"
    sendForm $ fillOutForm theForm $ pairs $ do
        "occupation" =: "unemployed Haskell programmer"
        "location"   =: "mother's house"
(Edit in 2018: shpider is deprecated; these days https://hackage.haskell.org/package/scalpel might be a good replacement.)
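To give a feel for scalpel, here is a minimal sketch of its declarative selector style, using `scrapeStringLike` to run a `Scraper` over an in-memory HTML string (the HTML snippet is an invented example; scraping a live page would use `scrapeURL` instead):

    import Text.HTML.Scalpel

    -- Collect the text of every <a> tag in the document.
    main :: IO ()
    main = print (scrapeStringLike html links)
      where
        html = "<a href=\"/one\">first</a><a href=\"/two\">second</a>"

        links :: Scraper String [String]
        links = texts "a"

Running this should print `Just ["first", "second"]` — the `Maybe` wrapper signals whether the scraper matched at all, which replaces much of the manual tag-walking you would otherwise do with TagSoup directly.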