Top "Web-crawler" questions

A Web crawler (also known as a Web spider) is a computer program that browses the World Wide Web in a methodical, automated manner.

How to request Google to re-crawl my website?

Does anyone know a way to ask Google to re-crawl a website? If possible, this shouldn't take months. My site …

seo web-crawler
How to find all links / pages on a website

Is it possible to find all the pages and links on ANY given website? I'd like to enter a URL …

directory web-crawler
Get a list of URLs from a site

I'm deploying a replacement site for a client but they don't want all their old pages to end in 404s. …

web-crawler
Sending "User-agent" using Requests library in Python

I want to send a value for "User-agent" while requesting a webpage using Python Requests. I am not sure is …

python web-crawler python-requests
How do I make a simple crawler in PHP?

I have a web page with a bunch of links. I want to write a script which would dump all …

php web-crawler
TypeError: can't use a string pattern on a bytes-like object in re.findall()

I am trying to learn how to automatically fetch URLs from a page. In the following code I am trying …

python python-3.x web-crawler
How to find sitemap.xml path on websites?

How can I find the sitemap.xml file of a website? For example, going to stackoverflow/sitemap.xml gets me a 404. In …

web-crawler sitemap
Python: maximum recursion depth exceeded while calling a Python object

I've built a crawler that had to run on about 5M pages (by increasing the URL ID) and then parses …

python algorithm recursion web-crawler depth
How to detect search engine bots with PHP?

How can one detect search engine bots using PHP?

php web-crawler bots
How to get a web page's source code from Java

I just want to retrieve any web page's source code from Java. I found lots of solutions so far, but …

java web web-crawler web-content