Getting "Forbidden by robots.txt" in Scrapy

deepak kumar · May 17, 2016 · Viewed 37k times

While crawling a website like https://www.netflix.com, I'm getting "Forbidden by robots.txt: https://www.netflix.com/"

ERROR: No response downloaded for: https://www.netflix.com/

Answer

Rafael Almeida · May 17, 2016

In the new version (Scrapy 1.1, released 2016-05-11), the crawler first downloads robots.txt before crawling and, by default, refuses to fetch URLs that robots.txt disallows. To change this behavior, set ROBOTSTXT_OBEY in your settings.py:

ROBOTSTXT_OBEY = False

The release notes describe the change.