Top "Screen-scraping" questions

Screen-scraping, also known as web-scraping or data-scraping, is a software technique used to collect and parse information from user interfaces.
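
As a rough illustration of the "collect and parse" step, here is a minimal sketch that pulls link targets and link text out of an HTML fragment using only Python's standard-library `html.parser`. The names `LinkCollector`, `extract_links`, and `SAMPLE_HTML` are made up for this example, and the markup is invented sample data rather than output from any real site. A real scraper would fetch the page first and would probably use a more forgiving parser.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) tuples
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...>
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Parse an HTML string and return its links as (href, text) tuples."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup, standing in for a downloaded page.
SAMPLE_HTML = """
<ul>
  <li><a href="/questions/1">First question</a></li>
  <li><a href="/questions/2">Second <b>question</b></a></li>
</ul>
"""

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, text)
```

Text nested in inline tags such as `<b>` is still captured, because the collector keeps accumulating data until the closing `</a>`. This approach only sees the HTML as delivered; content generated by JavaScript in the browser, the subject of several questions below, would not appear in it.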

Programmatic Python Browser with JavaScript

I want to screen-scrape a website that uses JavaScript. There is mechanize, the programmatic web browser for Python. However, it (…

javascript python browser screen-scraping mechanize
Crawling and Scraping iTunes App Store

I noticed that iTunes preview allows you to crawl and scrape pages via the http:// protocol. However, many of the …

language-agnostic itunes screen-scraping web-crawler
Scrapy: how to set the referer URL

I need to set the referer URL before scraping a site. The site uses referring-URL-based authentication, so it …

screen-scraping scrapy
How to convert HTML page to plain text in node.js?

I know this has been asked before, but I can't find a good answer for node.js. I need server-side …

javascript node.js screen-scraping
Rotating Proxies for web scraping

I've got a Python web crawler and I want to distribute the download requests among many different proxy servers, probably …

python proxy screen-scraping web-crawler squid
Scrape HTML generated by JavaScript with Python

I need to scrape a site with Python. I obtain the source HTML code with the urllib module, but I …

javascript python browser screen-scraping
What's the best screen-scraping language?

Hi, I want to create a desktop app (probably C#) that scrapes or manipulates a form on a 3rd party …

programming-languages screen-scraping web-scraping
Beautiful Soup cannot find a CSS class if the object has other classes, too

If a page has <div class="class1"> and <p class="class1">, then soup.findAll(True, 'class1…

python screen-scraping beautifulsoup
How to download any (!) webpage with the correct charset in Python?

Problem: When screen-scraping a webpage using Python, one has to know the character encoding of the page. If you get …

python character-encoding screen-scraping urllib2 urllib
Is there a PHP equivalent of Perl's WWW::Mechanize?

I'm looking for a library that has functionality similar to Perl's WWW::Mechanize, but for PHP. Basically, it should allow …

php automation screen-scraping mechanize www-mechanize