I get a twisted.internet.error.ReactorNotRestartable error when I execute the following code:
    from time import sleep

    from scrapy import signals
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings
    from scrapy.xlib.pydispatch import dispatcher

    result = None

    def set_result(item):
        result = item

    while True:
        process = CrawlerProcess(get_project_settings())
        dispatcher.connect(set_result, signals.item_scraped)

        process.crawl('my_spider')
        process.start()

        if result:
            break
        sleep(3)
The first iteration works, but on the second one I get the error. I create a new process variable each time, so what's the problem?
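By default, CrawlerProcess's .start() stops the Twisted reactor it creates once all crawlers have finished. A Twisted reactor cannot be started again after it has been stopped, and the reactor is global to the Python process, so creating a fresh CrawlerProcess in each iteration does not give you a fresh reactor. That is why the second call to .start() raises ReactorNotRestartable.
You should call process.start(stop_after_crawl=False) if you create process in each iteration. Keep in mind that .start() then blocks until something else stops the reactor.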
Another option is to handle the Twisted reactor yourself and use CrawlerRunner. The docs have an example on doing that.
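
As a rough illustration of the CrawlerRunner route, here is a minimal sketch that repeats the crawl inside a single reactor run until an item shows up. It is not the example from the docs. The names `results`, `collect_item`, `main` and `crawl_until_result` are made up for this sketch, `'my_spider'` and the 3-second delay come from the question, and it assumes a Scrapy 2.x release where `scrapy.utils.reactor.install_reactor` and the `TWISTED_REACTOR` setting are available.

```python
results = []  # items collected across all crawl attempts


def collect_item(item):
    """item_scraped handler: remember every scraped item."""
    results.append(item)


def main(spider_name="my_spider", delay=3):
    """Crawl repeatedly inside ONE reactor run until an item is scraped."""
    # Imports are local on purpose: importing twisted.internet.reactor
    # installs the default reactor as a side effect, so the reactor named
    # in the settings (if any) has to be installed first.
    from scrapy import signals
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.project import get_project_settings
    from scrapy.utils.reactor import install_reactor  # assumes Scrapy 2.x

    settings = get_project_settings()
    reactor_path = settings.get("TWISTED_REACTOR")
    if reactor_path:
        install_reactor(reactor_path)

    from twisted.internet import defer, reactor, task

    runner = CrawlerRunner(settings)

    @defer.inlineCallbacks
    def crawl_until_result():
        try:
            while not results:
                # A fresh crawler per attempt; the reactor keeps running.
                crawler = runner.create_crawler(spider_name)
                crawler.signals.connect(collect_item, signal=signals.item_scraped)
                yield runner.crawl(crawler)
                if not results:
                    # Non-blocking replacement for time.sleep(delay).
                    yield task.deferLater(reactor, delay, lambda: None)
        finally:
            # Stop the reactor exactly once, after the last attempt.
            reactor.stop()

    crawl_until_result()
    reactor.run()  # blocks until reactor.stop() is called above
```

Calling `main()` from the project directory (so that `get_project_settings()` can find the project) runs the loop and returns once `results` is non-empty. Appending to a module-level list also sidesteps a second issue in the question's code: `result = item` inside `set_result` only binds a local name, so the module-level `result` would stay `None` without a `global result` statement.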