gevent: downside to spawning large number of greenlets?

ARF · Nov 4, 2013

Following on from my question in the comment to this answer to the question "Gevent pool with nested web requests":

Assuming one has a large number of tasks, is there any downside to using gevent.spawn(...) to spawn all of them simultaneously rather than using a gevent pool and pool.spawn(...) to limit the number of concurrent greenlets?

Formulated differently: is there any advantage to "limiting concurrency" with a gevent.Pool even if not required by the problem to be solved?

Any idea what would constitute a "large number" for this issue?

Answer

Jordan Shaw · Nov 7, 2013

Using a pool is just cleaner, and it's good practice when you're dealing with a lot of tasks. I ran into this a few weeks ago when I was using gevent.spawn to verify a bunch of email addresses against DNS, on the order of 30k :).

import logging

from gevent.pool import Pool

logging.basicConfig(level=logging.INFO)  # without this, INFO messages are not shown

# Placeholder for a large list of stuff; assumed here to hold (param1, param2) pairs.
rows = [...]
CONCURRENCY = 200  # run 200 greenlets at once, or whatever you want
pool = Pool(CONCURRENCY)

def do_work_function(param1, param2):
    print(param1 + param2)

# enumerate() provides a running count for logging progress
for count, (param1, param2) in enumerate(rows, start=1):
    logging.info(count)
    # blocks here while the pool is full (CONCURRENCY greenlets running)
    pool.spawn(do_work_function, param1, param2)

pool.join()  # blocks here until all remaining greenlets have finished

In my testing, a CONCURRENCY of around 200 kept the load average hovering around 1 on an EC2 m1.small. I did it a little naively, though: if I were to do it again, I'd run multiple pools and sleep for some time in between them, to try to distribute the load on the NIC and CPU more evenly.
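
One way to read the "multiple pools with a sleep in between" idea is sketched below. This is not something I have benchmarked; the helper names (chunks, run_in_batches) and the batch_size and pause defaults are made up for illustration, and only Pool, Pool.map and gevent.sleep come from gevent itself.

```python
import gevent
from gevent.pool import Pool


def chunks(items, size):
    """Yield successive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_batches(items, work, concurrency=200, batch_size=1000, pause=1.0):
    """Run work(item) for every item, one batch (and one pool) at a time.

    Hypothetical helper: batch_size and pause are illustrative defaults,
    not tuned values.
    """
    results = []
    for index, batch in enumerate(chunks(items, batch_size)):
        if index:
            gevent.sleep(pause)  # pause between batches, not before the first
        pool = Pool(concurrency)  # at most `concurrency` greenlets at once
        results.extend(pool.map(work, batch))  # blocks until the batch is done
    return results
```

Calling run_in_batches(rows, some_function) would then process the list in slices of 1000, pausing for a second between slices. Whether that actually smooths out NIC and CPU load better than a single pool is something you'd have to measure.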

One last thing to keep in mind: keep an eye on your open-file limit and increase it if need be: http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files. The greenlets I was running were taking up around 5 file descriptors each, so you can run out pretty quickly if you aren't careful. Raising the limit may not help if your system load is already above 1, though, as you'll start seeing diminishing returns regardless.
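
As a rough sanity check, you can read the current limit from Python with the standard resource module (Unix only) and estimate how many greenlets fit under it. The function names and the reserve of 64 descriptors below are my own assumptions for illustration; the 5 descriptors per greenlet is just the figure I saw in my case.

```python
import resource


def current_soft_limit():
    """Return the soft limit on open file descriptors for this process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def max_greenlets(soft_limit, fds_per_greenlet=5, reserve=64):
    """Estimate how many greenlets fit under `soft_limit`.

    `reserve` (an assumed value) leaves room for descriptors the process
    already holds, such as stdio, log files and sockets.
    """
    return max(0, (soft_limit - reserve) // fds_per_greenlet)


if __name__ == "__main__":
    limit = current_soft_limit()
    if limit == resource.RLIM_INFINITY:
        print("no soft limit on open files")
    else:
        print("soft limit:", limit, "-> roughly", max_greenlets(limit), "greenlets")
```

With a soft limit of 1024 and these assumptions, the estimate comes out at 192 greenlets, which is in the same ballpark as the CONCURRENCY of 200 above.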