I use Scrapy shell without problems on several websites, but I run into trouble when robots.txt disallows access to the site.
How can I make Scrapy ignore robots.txt entirely (act as if it doesn't exist)?
Thank you in advance.
I'm not talking about a project created by Scrapy, but about the Scrapy shell command: scrapy shell 'www.example.com'
If you run scrapy shell from a project directory, it will use the project's settings.py. If you run it outside of a project, Scrapy will use its default settings. Either way, you can override and add settings via the --set flag.

So to turn off the ROBOTSTXT_OBEY setting you can simply run:

scrapy shell http://stackoverflow.com --set="ROBOTSTXT_OBEY=False"
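If you want this to apply permanently to a whole project rather than a single shell session, a minimal sketch (using Scrapy's documented ROBOTSTXT_OBEY setting) is to set it in the project's settings.py:

```python
# settings.py
# Tell Scrapy not to download or honor robots.txt for any request.
# This applies to all crawls in the project, and to `scrapy shell`
# when it is launched from inside the project directory.
ROBOTSTXT_OBEY = False
```

With this in place you no longer need the --set flag each time you open the shell from the project directory.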