Top "Scrapy" questions

Scrapy is a fast, open-source, high-level screen-scraping and web-crawling framework written in Python, used to crawl websites and extract structured data from their pages.

Strip \n \t \r in scrapy

I'm trying to strip \r, \n, and \t characters with a Scrapy spider and then write the result to a JSON file. I have a "…

python unicode scrapy
ReactorNotRestartable error in while loop with scrapy

I get a twisted.internet.error.ReactorNotRestartable error when I execute the following code: from time import sleep from scrapy import signals …

python python-2.7 scrapy twisted
Python Scrapy: Convert relative paths to absolute paths

I have amended the code based on solutions offered below by the great folks here; I get the error shown …

python scrapy imagesource
command 'gcc' failed with exit status 1 error while installing scrapy

When I try to install Scrapy, I get this error: warning: no previously-included files found matching '*.py' Requirement already …

python scrapy centos6
Running Scrapy spiders in a Celery task

I have a Django site where a scrape happens when a user requests it, and my code kicks off a …

python django scrapy celery
scrapy how to set referer url

I need to set the referer URL before scraping a site; the site uses referring-URL-based authentication, so it …

screen-scraping scrapy
ImportError: No module named win32api while using Scrapy

I am new to Scrapy. I installed Python 2.7 and all other engines needed. Then I tried to build …

python scrapy scrapy-spider
How to force scrapy to crawl duplicate url?

I am learning Scrapy, a web crawling framework. By default it does not crawl duplicate URLs or URLs which Scrapy …

python web-crawler scrapy
Scrapy css selector: get text of all inner tags

I have a tag and I want to get all the text available inside it. I am doing this: response.css(…

html css scrapy
How do I set up Scrapy to deal with a captcha

I'm trying to scrape a site that requires the user to enter the search value and a captcha. I've got …

python web-scraping scrapy captcha