How to disable robots.txt when you launch scrapy shell?

DARDAR SAAD · Nov 26, 2016

I use Scrapy shell without problems on several websites, but I run into trouble when a site's robots.txt disallows access. How can I make Scrapy ignore robots.txt? I'm not talking about a project created with Scrapy, but about the Scrapy shell command: scrapy shell 'www.example.com'. Thank you in advance.

Answer

Granitosaurus · Nov 27, 2016

If you run scrapy shell from a project directory, it uses the project's settings.py. If you run it outside of a project, Scrapy uses its default settings. In either case, you can override or add settings with the --set flag.
So to turn off the ROBOTSTXT_OBEY setting, you can simply run:

scrapy shell http://stackoverflow.com --set="ROBOTSTXT_OBEY=False"
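
For the project case mentioned above, the same setting can instead be changed in the project's settings.py, so that scrapy shell picks it up whenever it is launched from that project directory. A minimal sketch, assuming a standard project layout (the file path in the comment is hypothetical):

```python
# myproject/settings.py  (hypothetical path; use your own project's settings module)

# Tell Scrapy not to fetch and honor robots.txt before crawling.
# This is the same setting that --set="ROBOTSTXT_OBEY=False" overrides on the command line.
ROBOTSTXT_OBEY = False
```

A value passed with --set on the command line takes precedence over the one in settings.py, so the flag remains useful for one-off shell sessions.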