I need help setting up Tor on Ubuntu and using it within the Scrapy framework.
I did some research and found this guide:
class RetryChangeProxyMiddleware(RetryMiddleware):
    def _retry(self, request, reason, spider):
        log.msg('Changing proxy')
        tn = telnetlib.Telnet('127.0.0.1', 9051)
        tn.read_until("Escape character is '^]'.", 2)
        tn.write('AUTHENTICATE "267765"\r\n')
        tn.read_until("250 OK", 2)
        tn.write("signal NEWNYM\r\n")
        tn.read_until("250 OK", 2)
        tn.write("quit\r\n")
        tn.close()
        time.sleep(3)
        log.msg('Proxy changed')
        return RetryMiddleware._retry(self, request, reason, spider)
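For reference, the control-port exchange in the middleware above (authenticate, then signal NEWNYM) can also be written with a plain socket, which avoids `telnetlib` on newer Python 3 versions. This is only a sketch: the helper name `renew_tor_identity` is made up for illustration, and it assumes Tor's control port is enabled on 127.0.0.1:9051 with password authentication, as in the guide.

```python
import socket


def renew_tor_identity(password, host="127.0.0.1", port=9051, timeout=5.0):
    """Ask Tor for a new circuit (hypothetical helper, not part of Scrapy).

    Sends AUTHENTICATE and SIGNAL NEWNYM over the control port and returns
    True only if Tor answers with a 250 status to both commands.
    Assumption: the password contains no double quotes or backslashes.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock, \
            sock.makefile("rb") as reader:
        for command in ('AUTHENTICATE "%s"' % password, "SIGNAL NEWNYM"):
            sock.sendall(command.encode("ascii") + b"\r\n")
            # Tor replies with one status line per command, e.g. "250 OK".
            reply = reader.readline().decode("ascii", "replace")
            if not reply.startswith("250"):
                return False
        sock.sendall(b"QUIT\r\n")
    return True
```

Inside `_retry` this could stand in for the telnet calls, e.g. `renew_tor_identity("267765")`, with the `time.sleep(3)` kept so that Tor has time to build the new circuit.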
Then enable it in settings.py (note that the setting name is DOWNLOADER_MIDDLEWARES, plural):
DOWNLOADER_MIDDLEWARES = {
    'spider.middlewares.RetryChangeProxyMiddleware': 600,
}
After that, you just need to send the requests through the local Tor proxy (polipo), which can be done with:
tsocks scrapy crawl spider
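As far as I can tell, the same routing could also be done without tsocks, because Scrapy honours a `proxy` key in `request.meta`. A minimal sketch, assuming polipo listens on its default address http://127.0.0.1:8123 and forwards to Tor; the class name `TorProxyMiddleware` is hypothetical and not part of the guide:

```python
class TorProxyMiddleware:
    """Downloader middleware that points every request at polipo.

    Hypothetical example class. The proxy URL is an assumption
    (8123 is polipo's default port); adjust it to the local setup.
    """

    proxy_url = "http://127.0.0.1:8123"

    def process_request(self, request, spider):
        # Scrapy reads the proxy to use for this request from request.meta.
        request.meta["proxy"] = self.proxy_url
        # Returning None lets Scrapy continue handling the request normally.
        return None
```

It would be registered in settings.py in the same way as the retry middleware.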
Can anyone confirm that this method works and that you actually get different IPs?
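One way to check this would be to ask an IP echo service for the public address through the proxy, once before and once after the NEWNYM signal. A rough sketch using only the standard library; the function name `fetch_exit_ip` is made up, and both the proxy address (polipo's default port 8123) and the echo service https://api.ipify.org are assumptions that can be swapped out:

```python
import urllib.request


def fetch_exit_ip(proxy_url="http://127.0.0.1:8123",
                  check_url="https://api.ipify.org",
                  timeout=30):
    """Return the public IP that check_url sees when reached via proxy_url.

    Hypothetical helper: check_url is assumed to answer with the caller's
    IP address as plain text.
    """
    # Route both http and https requests through the given proxy.
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    )
    with opener.open(check_url, timeout=timeout) as response:
        return response.read().decode("ascii", "replace").strip()
```

Comparing the value returned before and after a NEWNYM signal shows whether the exit address changed. A new circuit can end at the same exit node, so a single identical result does not prove that the method fails.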
I was using this snippet: http://snipplr.com/view/66992/use-a-random-user-agent-for-each-request/
Update: broken link fixed