Asyncio vs. Gevent

Dan Gittik · Jan 18, 2019 · Viewed 10.7k times

Background

I once worked on a Python 2 system that had a lot of custom I/O code written synchronously, and was scaled using threads. At some point, we couldn't scale it any further, and realised we had to switch to asynchronous programming.

  • Twisted was the popular choice, but we wanted to avoid its callback hell.
  • It did have the @inlineCallbacks decorator, which effectively implemented coroutines using generator magic, as did some other libraries. That was more tolerable, but felt a bit flaky.
  • And then we found gevent. All you had to do was:
from gevent import monkey
monkey.patch_all()

And just like that, all your standard I/O - sockets, database transactions, everything written in pure Python, really - was asynchronous, yielding and switching behind the scenes using greenlets.

It wasn't perfect:

  • Back then, it didn't work well on Windows (and it still has some limitations today). Luckily, we were running on Linux.
  • It couldn't monkey-patch C extensions, so we couldn't use MySQLdb, for example. Luckily, there were many pure Python alternatives, like PyMySQL.

Question

Nowadays, Python 3 is much more popular, and with it - asyncio. Personally, I think it's great, but I was recently asked in what ways it differs from what we implemented with gevent, and couldn't come up with a good enough answer.

This might sound subjective, but I'm actually looking for real use-cases where one would significantly outperform the other, or allow something that the other does not. Here are the considerations I've gathered so far:

  1. Like I said, gevent is rather limited on Windows. Then again, most production code I know of runs on Linux.

    If you need to run on Windows, use asyncio.

  2. Gevent can't monkey-patch C extensions. But, asyncio can't monkey-patch anything.

    Imagine that a new DB technology comes up, and you'd like to use it, but there isn't a pure Python library for it, so you can't integrate it with Gevent. The thing is, you're just as stuck when there isn't an aio* library that you can integrate with asyncio! There are worker threads and executors, of course, but that's not the point, and they work just as well in both cases anyway.

  3. Some people say it's a matter of personal taste, but I think it's fair to say that synchronous programming is inherently easier than asynchronous programming (think about it: have you ever met a novice programmer that can work with sockets, but has a hard time understanding how to properly select/poll them, or thinking in futures/promises? And have you ever met the reverse?).

    Anyway, let's not go there. I wanted to address this point because it comes up frequently (here's a discussion on reddit), but what I'm really after is scenarios where you have a practical reason to use one or the other.

  4. Asyncio is part of the standard library. That's huge: it means it's well maintained, well documented, and everybody knows about it and uses it by default.

    But, considering how little of Gevent you need to know to use it (and that it's pretty well maintained and documented as well), it doesn't seem as crucial. So while there are multiple answers on StackOverflow for even the most complicated scenarios involving futures, the possibility to not use futures at all seems just as viable.
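
To make the "worker threads and executors" fallback from point 2 concrete, here is a minimal sketch of the asyncio side, using only the standard library. `blocking_query` is a hypothetical stand-in for a driver call that has no async version; it is not a real library function.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_query(sql):
    """Hypothetical stand-in for a blocking driver call (not a real API)."""
    time.sleep(0.1)  # pretend to wait on the network
    return f"rows for {sql!r}"


async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each call runs in a worker thread, so the event loop stays free
        # to serve other coroutines while the threads are blocked.
        futures = [
            loop.run_in_executor(pool, blocking_query, f"SELECT {i}")
            for i in range(4)
        ]
        # gather() returns the results in the order the calls were submitted.
        return await asyncio.gather(*futures)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

This works, but it brings back threads for that part of the code, which is why the question treats it as a fallback rather than a real integration.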

Surely Guido and the Python community had a good reason to put so much effort into Asyncio, and even introduce new keywords into the language - I just can't seem to find it.

What are the key differences between the two, and in what scenarios do they become apparent?

Answer

Slam · Jan 18, 2019

"Simple" answer from real-world usage:

  1. Good thing about gevent: you can patch things, which means that you can [theoretically] use synchronous libraries as they are. E.g. you can patch Django.
  2. Bad thing about gevent: not everything can be patched. If you must use some DB driver that can't be patched, you're doomed.
  3. Worst thing about gevent: it's "magical". The amount of effort required to understand what happens with "patch_all" is enormous, and the same goes for finding/hiring new people for your dev team. What is even worse, debugging gevent-based code is hell. I'd say pretty much the same hell as callbacks, if not worse.

The latter point is key, I think. The most underestimated thing in software engineering is that code is meant to be read, not written or run efficiently (if the latter is what you care about, you'd rather switch from Python to a systems-level language). Asyncio came with the missing part for async programming: pre-defined and controlled context switch points. You are effectively writing sync code (i.e. you're not thinking about sudden thread switches, locks, queues, etc.), and using await ... when you know a call blocks on I/O, so you let the event loop pick up something else that is ready for the CPU, and resume the current state later.
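
A small sketch of what "controlled context switch points" means in practice, using only the standard library. The `worker` and `main` names are made up for illustration. Between two `await` expressions nothing else can run on the event loop, so the first two log entries of each worker always stay together.

```python
import asyncio


async def worker(name, log):
    # No await between these two appends, so no other task can run in between.
    log.append(f"{name}: step 1")
    log.append(f"{name}: step 2")
    # Explicit switch point: the event loop may run another task here.
    await asyncio.sleep(0)
    log.append(f"{name}: step 3")


async def main():
    log = []
    await asyncio.gather(worker("a", log), worker("b", log))
    return log


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

With gevent, the equivalent switch can happen inside any patched call, however deep in a library it sits, so the code itself does not show where it happens.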

This is what makes asyncio so good: it's easy to maintain. The downside is that pretty much the whole "world" must be async too: DB drivers, HTTP tools, file handlers. And sometimes you'll be missing libraries, that's pretty much guaranteed.
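
To illustrate why the whole "world" has to be async, here is a rough sketch (standard library only, all function names hypothetical) that measures how long a periodic task is starved when another coroutine does its I/O with a blocking call versus an awaitable one. `time.sleep` and `asyncio.sleep` stand in for a sync and an async driver.

```python
import asyncio
import time


async def ticker(ticks):
    # Records a timestamp roughly every 50 ms while the loop is free to run it.
    for _ in range(4):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)


async def sync_io():
    time.sleep(0.3)  # blocking call: freezes the whole event loop


async def async_io():
    await asyncio.sleep(0.3)  # cooperative: other tasks keep running


async def longest_gap(io_call):
    """Return the longest pause (in seconds) between two ticks."""
    ticks = []
    await asyncio.gather(ticker(ticks), io_call())
    return max(b - a for a, b in zip(ticks, ticks[1:]))


if __name__ == "__main__":
    print("with blocking call:", round(asyncio.run(longest_gap(sync_io)), 2))
    print("with awaitable call:", round(asyncio.run(longest_gap(async_io)), 2))
```

A single synchronous driver call inside a coroutine stalls every other task for its full duration, which is the practical cost of a missing async library.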