Top "Screen-scraping" questions

Screen-scraping, also known as web-scraping or data-scraping, is a software technique used to collect and parse information from user interfaces.
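As a rough illustration of the "collect and parse" idea in the description above, here is a minimal sketch that pulls link targets out of a page using only Python's standard-library html.parser. The names LinkCollector, extract_links, and SAMPLE_PAGE are hypothetical, and the HTML is an inline sample rather than a fetched page. Real scrapers usually add fetching, error handling, and a more forgiving parser.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Return the href values of all <a> tags in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_PAGE = """
<html><body>
  <a href="/questions/1">First question</a>
  <a href="/questions/2">Second question</a>
  <a name="anchor-without-href">No link</a>
</body></html>
"""

if __name__ == "__main__":
    for link in extract_links(SAMPLE_PAGE):
        print(link)
```

Anchors without an href are skipped, so only the two question links are returned for the sample markup.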

BeautifulSoup and ASP.NET/C#

Has anyone integrated BeautifulSoup with ASP.NET/C# (possibly using IronPython or otherwise)? Is there a BeautifulSoup alternative or a …

c# asp.net screen-scraping ironpython beautifulsoup
Getting attributes in PyQuery?

I'm using PyQuery and want to print a list of links, but can't figure out how to get the href …

python screen-scraping pyquery
Page scraping to get prices from Google Finance

I am trying to get stock prices by scraping Google Finance pages. I am doing this in Python, using urllib …

python screen-scraping urllib stockquotes google-finance
View Generated Source (After AJAX/JavaScript) in C#

Is there a way to view the generated source of a web page (the code after all AJAX calls and …

c# .net screen-scraping
Alternative to HtmlUnit

I have been researching the headless browsers available to date and found that HtmlUnit is used pretty extensively. Do …

screen-scraping web-crawler htmlunit headless-browser
Using Ruby with Mechanize to log into a website

I need to scrape data from a site, but it requires my login first. I've been using hpricot to successfully …

ruby login screen-scraping mechanize hpricot
Retrieve multiple urls at once/in parallel

Possible Duplicate: How can I speed up fetching pages with urllib2 in Python? I have a Python script that download …

python parallel-processing screen-scraping
Taking reliable screenshots of websites? PhantomJS and CasperJS both return empty screenshots on some websites

Open a web page and take a screenshot. Using ONLY phantomjs: (this is a simple script, in fact it is …

javascript phantomjs screen-scraping casperjs
Grabbing each frame of an HTML5 canvas

These palette cycle images are breathtaking: http://www.effectgames.com/demos/canvascycle/?sound=0 I'd like to make some (or all) …

javascript animation html screen-scraping
How can I grab CDATA out of BeautifulSoup?

I have a website that I'm scraping that has a structure similar to the following. I'd like to be able to …

python screen-scraping beautifulsoup cdata