Top "Web-crawler" questions

A Web crawler (also known as a Web spider) is a computer program that browses the World Wide Web in a methodical, automated manner.

How is an aggregator built?

Let's say I want to aggregate information related to a specific niche from many sources (could be travel, technology, or …

web-services aggregation web-crawler nutch
How do I prevent Bing from swamping my site with traffic irregularly?

Bingbot will hit my site pretty hard for a couple of hours each day, and will be extremely light for …

web-crawler robots.txt bing bingbot
How to get all links from the DOM?

According to https://github.com/GoogleChrome/puppeteer/issues/628, I should be able to get all links from <a href="…

javascript node.js web-crawler puppeteer headless-browser
TypeError: coercing to Unicode: need string or buffer, User found

I have to crawl last.fm for users (university exercise). I'm new to Python and get the following error: Traceback (most …

python loops web-crawler typeerror last.fm
Nightmare conditional wait()

I'm trying to crawl a webpage using Nightmare, but want to wait for #someelem to be present, only if it …

javascript node.js web-crawler nightmare
How do web crawlers handle JavaScript?

Today a lot of content on the Internet is generated using JavaScript (specifically by background AJAX calls). I was wondering how …

javascript web-crawler
How to improve SEO for single page application

We have built a search engine for vacancies. For reasons of speed and a good user experience, we used the architecture …

knockout.js seo web-crawler single-page-application pushstate
Running multiple threads in Python simultaneously: is it possible?

I'm writing a little crawler that should fetch a URL multiple times. I want all of the threads to run …

python multithreading web-crawler gil
An alternative web crawler to Nutch

I'm trying to build a specialised search engine web site that indexes a limited number of web sites. The solution …

search-engine web-crawler nutch
PyPI download counts seem unrealistic

I put a package on PyPI for the first time ~2 months ago, and have made some version updates since then. …

python web-crawler pypi