ReactorNotRestartable error in while loop with scrapy

python python-2.7 scrapy twisted

k_wit · Oct 9, 2016 · Viewed 16.8k times

I get a twisted.internet.error.ReactorNotRestartable error when I execute the following code:

from time import sleep
from scrapy import signals
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from scrapy.xlib.pydispatch import dispatcher

result = None

def set_result(item):
    # Without "global", the assignment would only create a local variable
    global result
    result = item

while True:
    process = CrawlerProcess(get_project_settings())
    dispatcher.connect(set_result, signals.item_scraped)

    process.crawl('my_spider')
    process.start()

    if result:
        break
    sleep(3)

It works the first time, then I get the error. I create a new process variable on each iteration, so what's the problem?

Answer

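By default, CrawlerProcess's .start() stops the Twisted reactor it creates once all crawlers have finished. A Twisted reactor cannot be started again after it has been stopped, so the second call to .start() in your loop raises ReactorNotRestartable, even though you create a new CrawlerProcess each time.

If you create a process in each iteration, you should call process.start(stop_after_crawl=False) so the reactor is left running. Keep in mind that .start() blocks while the reactor is running, so with stop_after_crawl=False it will not return until something else stops the reactor.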

Another option is to handle the Twisted reactor yourself and use CrawlerRunner. The docs have an example on doing that.
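
As a rough sketch of the CrawlerRunner approach (this is not the example from the docs): the reactor is started exactly once, and the crawl is repeated on it until an item has been scraped. The helper names crawl_until_item and main and the retry_delay parameter are made up for illustration; 'my_spider' is the spider name from the question. Scrapy and Twisted are imported inside the functions so that merely importing the module does not install a reactor. Depending on your Scrapy version and TWISTED_REACTOR setting, you may need to install the matching reactor first.

```python
def crawl_until_item(runner, spider_name, retry_delay=3):
    """Crawl repeatedly on the running reactor until an item is scraped.

    Returns a Deferred that fires with the list of scraped items.
    (Hypothetical helper, not part of Scrapy.)
    """
    from twisted.internet import reactor, task
    from scrapy import signals

    items = []

    def on_item(item, response, spider):
        # Called for every item the spider yields
        items.append(item)

    def crawl_once():
        # A fresh Crawler per attempt; the reactor itself is never restarted
        crawler = runner.create_crawler(spider_name)
        crawler.signals.connect(on_item, signal=signals.item_scraped)
        d = runner.crawl(crawler)  # fires when this crawl has finished
        d.addCallback(check)
        return d

    def check(_):
        if items:
            return items
        # Non-blocking replacement for sleep(3): retry after a delay
        return task.deferLater(reactor, retry_delay, crawl_once)

    return crawl_once()


def main(spider_name="my_spider"):
    """Start the reactor once, crawl until an item arrives, then stop it."""
    from twisted.internet import reactor
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.log import configure_logging
    from scrapy.utils.project import get_project_settings

    configure_logging()
    runner = CrawlerRunner(get_project_settings())

    results = []
    d = crawl_until_item(runner, spider_name)
    d.addCallback(results.extend)
    # Stop the reactor on success or failure so reactor.run() returns
    d.addBoth(lambda _: reactor.stop())

    reactor.run()  # blocks until reactor.stop() is called
    return results
```

Calling main() from inside the Scrapy project directory should block until the first item arrives and then return the scraped items. The sleep(3) from the question becomes task.deferLater here, because time.sleep would block the reactor while it waits.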
