Top "Screen-scraping" questions

Screen-scraping, also known as web-scraping or data-scraping, is a software technique used to collect and parse information from user interfaces.

Scrape web pages in real time with Node.js

What's a good way to scrape website content using Node.js? I'd like to build something very, very fast that …

javascript jquery node.js screen-scraping web-scraping
Options for web scraping - C++ version only

I'm looking for a good C++ library for web scraping. It has to be C/C++ and nothing else so …

c++ screen-scraping
Scrapy Python Set up User Agent

I tried to override the user-agent of my crawlspider by adding an extra line to the project configuration file. Here …

python scrapy web-crawler screen-scraping user-agent
Screen scraping: getting around "HTTP Error 403: request disallowed by robots.txt"

Is there a way to get around the following? httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt Is …

python screen-scraping beautifulsoup mechanize http-status-code-403
Headless Browser for Python (Javascript support REQUIRED!)

I need a headless browser which is fairly easy to use (I am still fairly new to Python and programming …

javascript python screen-scraping headless-browser
Scraping ajax pages using python

I've already seen this question about scraping ajax, but Python isn't mentioned there. I considered using Scrapy, I believe they …

python ajax web-scraping screen-scraping scrapy
How to scroll down with Phantomjs to load dynamic content

I am trying to scrape links from a page that generates content dynamically as the user scrolls down to the …

javascript dom web-scraping screen-scraping phantomjs
How to run multiple Tor processes at once with different exit IPs?

I am brand new to Tor and I feel like multiple Tors should be considered. The multiple Tors I mentioned …

linux proxy screen-scraping socks tor
PDF Data and Table Scraping to Excel

I'm trying to figure out a good way to increase the productivity of my data entry job. What I am …

excel pdf ocr screen-scraping pdf-parsing