How to avoid blocking code in Python with gevent?

Martin · Aug 20, 2012

I am playing around with gevent, and I am trying to understand why my code is blocking and how I can fix it.

I have a pool of greenlets, and each of them talks to a thrift client which gathers data from a remote thrift server. For the purpose of the exercise, the thrift server always takes more than 1s to return any data. When I spawn the greenlets and call join, they don't all execute in parallel, but one after the other. My understanding is that this happens because my code is "blocking", since when I run monkey.patch_all(), all greenlets magically run in parallel.

So how do I make the code non-blocking myself, rather than monkey patching everything without understanding what it's doing?

Here is an example of what I don't understand:

import time

from gevent.pool import Pool

def hello():
    print 'Hello %d' % time.time()
    time.sleep(1) 

def main():
    pool = Pool(5)
    for _ in xrange(5):
        pool.spawn(hello)

    pool.join()

if __name__ == '__main__':
    main()

Output

Hello 1345477112
Hello 1345477113
Hello 1345477114
Hello 1345477115
Hello 1345477116

I know I could use gevent.sleep, but how can I make that function non-blocking with the regular time.sleep?

Thanks

Answer

lvella · Aug 20, 2012

Greenlets never run in parallel. They all share the same process and the same thread, so at most one of them is running at any given time.

Greenlets are "green" because they are coroutines ("co" as in cooperation), so it cannot even be said that they run concurrently, because you have to coordinate when each one runs. Gevent does most of this work for you behind the scenes, and learns from libevent (or libev) which greenlets are ready to run. There is no preemption at all.
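
To make the "cooperative, no preemption" point concrete, here is a toy round-robin scheduler built on plain Python generators. It is only an illustration of the idea, not how gevent is implemented; the names worker, run_round_robin and demo are made up for this sketch, and it is written for Python 3.

```python
from collections import deque


def worker(name, steps, log):
    # Each `yield` is a voluntary switch point. Nothing can interrupt the
    # code that runs between two yields.
    for step in range(steps):
        log.append((name, step))
        yield


def run_round_robin(tasks):
    # Toy scheduler: resume each task in turn until every task has finished.
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)  # run the task up to its next yield
        except StopIteration:
            continue  # the task finished, so drop it
        ready.append(task)


def demo():
    log = []
    run_round_robin([worker("a", 2, log), worker("b", 2, log)])
    return log


if __name__ == "__main__":
    print(demo())
```

If a worker never reached a yield (for example because it was stuck in a blocking call), the scheduler loop would never get control back, which is the situation described for time.sleep in this answer.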

In the example you gave, time.sleep(1) puts the whole process to sleep inside the operating system, so gevent's scheduler doesn't run and cannot switch to another greenlet.

So, concerning your question: if you don't want to monkey patch existing code, you will have to manually replace every blocking call with its gevent equivalent, so that gevent can schedule the calling greenlet away and choose another one to run.
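
As a minimal sketch of such a manual replacement, here is the example from the question with time.sleep swapped for gevent.sleep. It is written for Python 3 (unlike the Python 2 code in the question), and the started list and the delay parameter are additions made only so the timing can be inspected.

```python
import time

import gevent
from gevent.pool import Pool


def hello(started, delay):
    started.append(time.time())
    print('Hello %d' % started[-1])
    # gevent.sleep hands control back to gevent's scheduler instead of
    # blocking the whole thread the way time.sleep does.
    gevent.sleep(delay)


def main(delay=1.0):
    started = []  # start time of each greenlet, kept for inspection
    pool = Pool(5)
    for _ in range(5):
        pool.spawn(hello, started, delay)
    pool.join()
    return started


if __name__ == '__main__':
    main()
```

With this change all five greenlets should print their greeting at nearly the same time, and the whole run should take roughly one delay rather than five.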

EDIT: Regarding using gevent with thrift without monkey patching everything: I don't know if it is worth it.

If you want to modify (fork) thrift's library, you just need to edit the file TSocket.py and change:

import socket

to:

from gevent import socket

But then your thrift library will depend on gevent, and you will need to reapply the patch if you ever update thrift.

You may also subclass TSocket, override the method open() to use gevent's socket, and use the subclass in place of the original, but that seems more complicated to me.

I am actually using Thrift with Gevent, and I chose to monkey patch the whole thing for the sake of simplicity.
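
For the sleep example in the question, a narrower patch than monkey.patch_all() is enough to see what monkey patching does. The sketch below, written for Python 3, uses monkey.patch_time(), which replaces time.sleep with gevent's cooperative sleep while leaving the call site untouched. The run helper and its parameters are made up for this illustration.

```python
from gevent import monkey

# Patch only the time module: time.sleep becomes gevent's cooperative sleep.
monkey.patch_time()

import time

from gevent.pool import Pool


def hello(delay):
    print('Hello %d' % time.time())
    time.sleep(delay)  # same call as in the question, now cooperative


def run(delay=1.0, count=5):
    # Returns the total wall-clock time taken by `count` greenlets.
    begin = time.time()
    pool = Pool(count)
    for _ in range(count):
        pool.spawn(hello, delay)
    pool.join()
    return time.time() - begin


if __name__ == '__main__':
    print('Total: %.1fs' % run())
```

The total should be close to a single delay instead of count times the delay. The same mechanism is what monkey.patch_all() applies to sockets and other blocking modules.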