How do screen scrapers work?

Micah · Oct 1, 2008

I hear about people writing these programs all the time, and I know what they do, but how do they actually do it? I'm looking for general concepts.

Answer

bmdhacks · Oct 1, 2008

Technically, a screen scraper is any program that grabs the display data of another program and ingests it for its own use.
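As a minimal sketch of that general idea (not tied to any particular tool), the Python snippet below runs another program, captures the text it would normally show on screen, and parses that text into records. The "other program" is a hypothetical stand-in: a tiny inline Python script that prints a fixed-layout report. The column names and the scrape_report function are made up for illustration.

```python
import subprocess
import sys

# Hypothetical "other program": it only prints a human-readable report
# and offers no machine-readable interface.
OTHER_PROGRAM = (
    "print('NAME      QTY  PRICE');"
    "print('widget      3   4.50');"
    "print('gadget     10  12.00')"
)


def scrape_report() -> list[dict]:
    """Run the other program, capture its text output, and parse it into records."""
    # Capture the output the program would otherwise display to a user.
    result = subprocess.run(
        [sys.executable, "-c", OTHER_PROGRAM],
        capture_output=True,
        text=True,
        check=True,
    )
    lines = result.stdout.splitlines()
    records = []
    # Skip the header row, then split each data row on whitespace.
    for line in lines[1:]:
        name, qty, price = line.split()
        records.append({"name": name, "qty": int(qty), "price": float(price)})
    return records


if __name__ == "__main__":
    for record in scrape_report():
        print(record)
```

The parser depends entirely on the display layout, so if the other program changes its column order or formatting, the scraper breaks. That fragility is the usual trade-off of screen scraping compared with a proper API.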

Quite often, screen scraping refers to a web client that parses the HTML pages of a target website to extract data that was formatted for human readers. This is done when a website does not offer an RSS feed or a REST API for accessing the data programmatically.
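Here is a rough sketch of the web case. It uses Python's standard-library html.parser instead of a dedicated scraping library, and it is illustrative only. The page markup, the "name"/"price" class names, PriceScraper and scrape are all hypothetical. A real scraper would fetch the page over HTTP (for example with urllib.request.urlopen); the page is inlined here so the sketch runs offline.

```python
from html.parser import HTMLParser

# Hypothetical page markup, inlined instead of fetched over HTTP.
PAGE = """
<html><body>
  <table id="prices">
    <tr><th>Product</th><th>Price</th></tr>
    <tr><td class="name">widget</td><td class="price">4.50</td></tr>
    <tr><td class="name">gadget</td><td class="price">12.00</td></tr>
  </table>
</body></html>
"""


class PriceScraper(HTMLParser):
    """Collects (name, price) pairs from <td class="name"> and <td class="price"> cells."""

    def __init__(self):
        super().__init__()
        self.current_class = None  # class attribute of the <td> we are inside, if any
        self.pending_name = None   # name seen in this row, waiting for its price
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.current_class = dict(attrs).get("class")

    def handle_endtag(self, tag):
        if tag == "td":
            self.current_class = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.current_class == "name":
            self.pending_name = text
        elif self.current_class == "price" and self.pending_name is not None:
            self.rows.append((self.pending_name, float(text)))
            self.pending_name = None


def scrape(html: str) -> list[tuple[str, float]]:
    """Parse the HTML and return the extracted (name, price) rows."""
    parser = PriceScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    print(scrape(PAGE))
```

The scraper relies on the page's markup (tag names and class attributes) to locate the data. Libraries like the one mentioned below let you express the same lookups more concisely with CSS- or XPath-style selectors.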

One example of a library used for this purpose is Hpricot for Ruby, which is one of the better-architected HTML parsers used for screen scraping.