Top "Scrapy" questions

Scrapy is a fast, open-source, high-level screen-scraping and web-crawling framework written in Python, used to crawl websites and extract structured data from their pages.

Geopy: catch timeout error

I am using geopy to geocode some addresses and I want to catch the timeout errors and print them out …

python scrapy geopy

How to handle a 302 redirect in Scrapy

I am receiving a 302 response from a server while scraping a website: 2014-04-01 21:31:51+0200 [ahrefs-h] DEBUG: Redirecting (302) to <GET …

python scrapy http-status-code-302

Scrapy: Follow link to get additional Item data?

I don't have a specific code issue; I'm just not sure how to approach the following problem logistically with the …

hyperlink callback scrapy

Force my Scrapy spider to stop crawling

Is there a way to stop crawling when a specific condition is true (like scrap_item_id == predefine_value)? My …

python scrapy

How can I use different pipelines for different spiders in a single Scrapy project?

I have a scrapy project which contains multiple spiders. Is there any way I can define which pipelines to use …

python scrapy web-crawler

Scrapy: only follow internal URLs but extract all links found

I want to get all external links from a given website using Scrapy. Using the following code the spider crawls …

python scrapy web-crawler scrape scrapy-spider

Is it possible for Scrapy to get plain text from raw HTML data?

For example: scrapy shell http://scrapy.org/ content = hxs.select('//*[@id="content"]').extract()[0] print content Then, I get …

python html web-scraping scrapy web-crawler

Scrapy - Reactor not Restartable

With: from twisted.internet import reactor from scrapy.crawler import CrawlerProcess I've always run this process successfully: process = CrawlerProcess(get_…

python scrapy web-crawler

Best way for a beginner to learn screen scraping with Python

This might be one of those questions that are difficult to answer, but here goes: I don't consider myself …

python screen-scraping beautifulsoup lxml scrapy

Scrapy: run spider from script

I want to run my spider from a script rather than with scrapy crawl. I found this page http://doc.…

python python-2.7 scrapy