Top "Web-crawler" questions

A web crawler (also known as a web spider) is a computer program that browses the World Wide Web in a methodical, automated manner.
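
As a rough illustration of "methodical, automated" browsing, here is a minimal sketch of a breadth-first crawl loop in Python. It is only one possible approach, not a reference implementation. The names `crawl`, `fetch`, `LinkExtractor` and `max_pages` are hypothetical, and fetching is left to a caller-supplied function so the loop can be tried without network access. It does not handle robots.txt, rate limiting or error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl starting at start_url.

    fetch is an assumed caller-supplied function that maps a URL to its
    HTML text, or returns None if the page cannot be retrieved.
    Returns the list of URLs visited, in visit order.
    """
    queue = deque([start_url])
    seen = {start_url}  # URLs already queued, so no page is fetched twice
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links against the current page and drop #fragments.
            link, _ = urldefrag(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

To try it offline, pass a dictionary lookup as `fetch`, for example `crawl("http://example.test/", pages.get)` where `pages` maps made-up URLs to HTML strings. A real crawler would replace this with an HTTP client.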

HTTPWebResponse + StreamReader Very Slow

I'm trying to implement a limited web crawler in C# (for a few hundred sites only) using HttpWebResponse.GetResponse() and …

c# performance web-crawler httpwebresponse streamreader
Robots.txt: allow only major SE

Is there a way to configure the robots.txt so that the site accepts visits ONLY from Google, Yahoo! and …

web-crawler robots.txt
Scrapy: how to stop redirect (302)

I'm trying to crawl a URL using Scrapy, but it redirects me to a page that doesn't exist. Redirecting (302) to <…

web-scraping web-crawler scrapy
Difference between find and filter in jQuery

I'm working on fetching data from wiki pages. I'm using a combination of PHP and jQuery to do this. First …

jquery find web-crawler
Can I block search crawlers for every site on an Apache web server?

I have somewhat of a staging server on the public internet running copies of the production code for a few …

apache search web-crawler httpd.conf
How to fix "HTTP error fetching URL. Status=500" in Java while crawling?

I am trying to crawl the user's ratings of movies on IMDb from the review page: (number of movies …

java web-crawler jsoup http-error
How to use Goutte

Issue: Cannot fully understand the Goutte web scraper. Request: Can someone please help me understand or provide code to help …

web-crawler screen-scraping goutte
What is the easiest way to run Python scripts on a cloud server?

I have a web-crawling Python script that takes hours to complete and is infeasible to run in its entirety …

python cloud web-crawler virtual server
How to give a URL to Scrapy for crawling?

I want to use Scrapy for crawling web pages. Is there a way to pass the start URL from the …

scrapy web-crawler
What are some good Ruby-based web crawlers?

I am looking at writing my own, but I am wondering if there are any good web crawlers out there …

ruby web-crawler