Tornado blocking asynchronous requests

JeffG · Oct 24, 2012 · Viewed 13.4k times

Using Tornado, I have a GET request that takes a long time because it makes many requests to another web service and processes the data; it can take minutes to complete. I don't want this to block the entire web server from responding to other requests, which it currently does.

As I understand it, Tornado is single-threaded and executes each request synchronously, even though it handles them asynchronously (I'm still confused on that bit). There are points in the long process where it could pause to let the server handle other requests (a possible solution?). I'm running it on Heroku with a single worker, so I'm not sure how that translates into spawning a new thread or using multiprocessing, neither of which I have experience with in Python.

Here is what I'm trying to do: the client makes the GET call to start the process, then polls another GET endpoint every 5 seconds to check the status and update the page with new information (long polling would also work, but I run into the same issue). The problem is that starting the long process blocks all new GET requests (or new long-polling sessions) until it completes.

Is there an easy way to kick off this long GET call without blocking the entire web server? Is there anything I can put in the code to say, "pause, go handle pending requests, then continue"?

I need to initiate a GET request on ProcessHandler, and then still be able to query StatusHandler while ProcessHandler is running.

Example:

class StatusHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        self.render("status.html")

class ProcessHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        self.updateStatus("0")
        result1 = self.function1()
        self.updateStatus("1")
        result2 = self.function2(result1)
        self.updateStatus("2")
        result3 = self.function3(result2)
        self.updateStatus("3")
        self.finish()

Answer

koblas · Oct 24, 2012

Here's a complete sample Tornado app that uses the async HTTP client (AsyncHTTPClient) and gen.Task from the tornado.gen module to keep things simple.

If you read more about gen.Task in the docs, you'll see that you can dispatch multiple requests at the same time. This uses the core idea of Tornado, where everything is non-blocking while still running in a single process.
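To make the "multiple requests at the same time" idea concrete, here is a minimal sketch. It uses only the standard library's asyncio (the event loop that Tornado 5 and later run on under Python 3) rather than gen.Task itself. The name fake_fetch and the 0.3 second delays are made up for illustration; a real handler would await an HTTP client call instead.

```python
import asyncio
import time


async def fake_fetch(name, delay):
    """Hypothetical stand-in for an HTTP fetch: waits without blocking the loop."""
    await asyncio.sleep(delay)
    return "%s done" % name


async def fetch_all():
    # Start all three waits together; gather returns results in call order.
    return await asyncio.gather(
        fake_fetch("a", 0.3),
        fake_fetch("b", 0.3),
        fake_fetch("c", 0.3),
    )


def run_demo():
    start = time.monotonic()
    results = asyncio.run(fetch_all())
    elapsed = time.monotonic() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_demo()
    print(results, "in %.2fs" % elapsed)
```

Because the three waits overlap on a single thread, the total time should be close to the longest single wait rather than the sum of all three.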

Update: I've added a Thread handler to demonstrate how you could dispatch work into a second thread and receive the callback() when it's done.

import os
import threading
import tornado.options
import tornado.ioloop
import tornado.httpserver
import tornado.httpclient
import tornado.web
from tornado import gen
from tornado.web import asynchronous

tornado.options.define('port', type=int, default=9000, help='server port number (default: 9000)')
tornado.options.define('debug', type=bool, default=False, help='run in debug mode with autoreload (default: False)')

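class Worker(threading.Thread):
    def __init__(self, callback=None, *args, **kwargs):
        super(Worker, self).__init__(*args, **kwargs)
        self.callback = callback
        # Remember the IOLoop of the thread that created the worker.
        self.io_loop = tornado.ioloop.IOLoop.instance()

    def run(self):
        import time
        time.sleep(10)  # stand-in for the long-running work
        # Hand the result back on the IOLoop thread: add_callback is the
        # only IOLoop method that is safe to call from another thread.
        self.io_loop.add_callback(lambda: self.callback('DONE'))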

class Application(tornado.web.Application):
    def __init__(self):
        handlers = [
            (r"/", IndexHandler),
            (r"/thread", ThreadHandler),
        ]
        settings = dict(
            static_path = os.path.join(os.path.dirname(__file__), "static"),
            template_path = os.path.join(os.path.dirname(__file__), "templates"),
            debug = tornado.options.options.debug,
        )
        tornado.web.Application.__init__(self, handlers, **settings)

class IndexHandler(tornado.web.RequestHandler):
    client = tornado.httpclient.AsyncHTTPClient()

    @asynchronous
    @gen.engine
    def get(self):
        response = yield gen.Task(self.client.fetch, "http://google.com")

        self.finish("Google's homepage is %d bytes long" % len(response.body))

class ThreadHandler(tornado.web.RequestHandler):
    @asynchronous
    def get(self):
        Worker(self.worker_done).start()

    def worker_done(self, value):
        self.finish(value)

def main():
    tornado.options.parse_command_line()
    http_server = tornado.httpserver.HTTPServer(Application())
    http_server.listen(tornado.options.options.port)
    tornado.ioloop.IOLoop.instance().start()

if __name__ == "__main__":
    main()
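For readers on newer Tornado and Python versions, the same "run the slow part in a thread, keep answering status requests" pattern can be sketched with the standard library alone. This is an illustration under assumptions, not the code from the answer above: long_process, process_handler and status_poller are hypothetical names standing in for function1..function3, ProcessHandler and repeated StatusHandler requests, and the sleeps stand in for real work.

```python
import asyncio
import time

# Shared progress marker, playing the role of updateStatus() in the question.
status = {"step": "0"}


def long_process():
    """Blocking work (hypothetical stand-in for function1..function3)."""
    for step in ("1", "2", "3"):
        time.sleep(0.2)  # blocks only the worker thread, not the event loop
        status["step"] = step


async def process_handler():
    loop = asyncio.get_running_loop()
    # Run the blocking function in the default thread pool and wait for it
    # without blocking the event loop.
    await loop.run_in_executor(None, long_process)
    return status["step"]


async def status_poller(seen):
    # Plays the role of repeated status requests arriving during the process.
    while status["step"] != "3":
        seen.append(status["step"])
        await asyncio.sleep(0.05)


async def demo():
    status["step"] = "0"
    seen = []
    final, _ = await asyncio.gather(process_handler(), status_poller(seen))
    return final, seen


def run_demo():
    return asyncio.run(demo())


if __name__ == "__main__":
    final, seen = run_demo()
    print("final status:", final, "| polls answered meanwhile:", len(seen))
```

The poller keeps getting turns on the event loop while the worker thread sleeps, which is the behaviour the question asks for: the long request is in flight, yet status checks are still served.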