I use Scrapy shell without problems on several websites, but I run into trouble when robots.txt disallows access to the site.
How can I make Scrapy ignore robots.txt entirely (act as if it doesn't exist)?
Thank you in advance.
I'm not talking about a project created by Scrapy, but about the Scrapy shell command: scrapy shell 'www.example.com'
If you run scrapy shell from a project directory, it will use the project's settings.py. If you run it outside of a project, Scrapy will use its default settings. Either way, you can override and add settings via the --set flag.

So to turn off the ROBOTSTXT_OBEY setting you can simply run:

scrapy shell http://stackoverflow.com --set="ROBOTSTXT_OBEY=False"
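If you want this to apply permanently to a whole project rather than a single shell session, a minimal sketch (using Scrapy's documented ROBOTSTXT_OBEY setting) is to set it in the project's settings.py:

```python
# settings.py
# Tell Scrapy not to download or honor robots.txt for any request.
# This applies to all crawls in the project, and to `scrapy shell`
# when it is launched from inside the project directory.
ROBOTSTXT_OBEY = False
```

With this in place you no longer need the --set flag each time you open the shell from the project directory.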