I just wrote a simple piece of code to performance-test Redis + gevent, to see how async helps performance, and I was surprised to find worse performance. Here is my code. If you remove the first two lines (the monkey patching), you will see the "normal execution" timing.
On an Ubuntu 12.04 LTS VM, I am seeing the following timings:

Without monkey patch: 54 seconds
With monkey patch: 61 seconds
Is there something wrong with my code / approach? Is there a perf issue here?
#!/usr/bin/python
from gevent import monkey
monkey.patch_all()

import timeit
import redis
from redis.connection import UnixDomainSocketConnection

def UxDomainSocket():
    pool = redis.ConnectionPool(connection_class=UnixDomainSocketConnection, path='/var/redis/redis.sock')
    r = redis.Redis(connection_pool=pool)
    r.set("testsocket", 1)
    for i in range(100):
        r.incr('testsocket', 10)
    r.get('testsocket')
    r.delete('testsocket')

print timeit.Timer(stmt='UxDomainSocket()',
                   setup='from __main__ import UxDomainSocket').timeit(number=1000)
This is expected.
You are running this benchmark on a VM, where the cost of system calls is higher than on physical hardware. When gevent is activated, it tends to generate extra system calls (to handle the epoll device), so you end up with lower performance.
You can easily check this point by using strace on the script.
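For instance (assuming the benchmark is saved as bench.py; the filename is just for illustration), a per-syscall summary makes the extra calls easy to spot:

strace -c -f python bench.py

The -c option prints a count/time summary per system call, and -f follows child processes and threads.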
Without gevent, the inner loop generates:
recvfrom(3, ":931\r\n", 4096, 0, NULL, NULL) = 6
sendto(3, "*3\r\n$6\r\nINCRBY\r\n$10\r\ntestsocket\r"..., 41, 0, NULL, 0) = 41
recvfrom(3, ":941\r\n", 4096, 0, NULL, NULL) = 6
sendto(3, "*3\r\n$6\r\nINCRBY\r\n$10\r\ntestsocket\r"..., 41, 0, NULL, 0) = 41
With gevent, you will have occurrences of:
recvfrom(3, ":221\r\n", 4096, 0, NULL, NULL) = 6
sendto(3, "*3\r\n$6\r\nINCRBY\r\n$10\r\ntestsocket\r"..., 41, 0, NULL, 0) = 41
recvfrom(3, 0x7b0f04, 4096, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
epoll_ctl(5, EPOLL_CTL_ADD, 3, {EPOLLIN, {u32=3, u64=3}}) = 0
epoll_wait(5, {{EPOLLIN, {u32=3, u64=3}}}, 32, 4294967295) = 1
clock_gettime(CLOCK_MONOTONIC, {2469, 779710323}) = 0
epoll_ctl(5, EPOLL_CTL_DEL, 3, {EPOLLIN, {u32=3, u64=3}}) = 0
recvfrom(3, ":231\r\n", 4096, 0, NULL, NULL) = 6
sendto(3, "*3\r\n$6\r\nINCRBY\r\n$10\r\ntestsocket\r"..., 41, 0, NULL, 0) = 41
When the recvfrom call would block (EAGAIN), gevent goes back to the event loop, so additional system calls are required to wait for file descriptor events (epoll_ctl and epoll_wait).
Please note this kind of benchmark is a worst case for any event loop system, because you only have one file descriptor, so the wait operations cannot be amortized across several descriptors. Furthermore, asynchronous I/O cannot improve anything here, since everything is synchronous.
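By contrast, an event loop pays off when many connections are in flight at once. Here is a minimal sketch of that situation (the socket path, key names, and client count are illustrative, not part of the original benchmark): each greenlet opens its own connection, so epoll_wait can multiplex many descriptors instead of blocking on a single one.

#!/usr/bin/python
from gevent import monkey
monkey.patch_all()

import gevent
import redis

def client(n):
    # Each greenlet gets its own connection, hence its own descriptor.
    r = redis.Redis(unix_socket_path='/var/redis/redis.sock')
    for i in range(100):
        r.incr('counter:%d' % n)

# 50 concurrent clients: the event loop now waits on many descriptors
# at once, which is the workload it is designed for.
gevent.joinall([gevent.spawn(client, n) for n in range(50)])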
It is also a worst case for Redis because:
it generates many roundtrips to the server
it systematically connects/disconnects (1000 times), because the pool is declared inside the UxDomainSocket function.
Actually, your benchmark does not test gevent, redis, or redis-py: it exercises the capability of a VM to sustain a ping-pong game between two processes.
If you want to increase performance, you need to:
use pipelining to decrease the number of roundtrips
make the pool persistent across the whole benchmark
For instance, consider the following script:
#!/usr/bin/python
from gevent import monkey
monkey.patch_all()

import timeit
import redis
from redis.connection import UnixDomainSocketConnection

pool = redis.ConnectionPool(connection_class=UnixDomainSocketConnection, path='/tmp/redis.sock')

def UxDomainSocket():
    r = redis.Redis(connection_pool=pool)
    p = r.pipeline(transaction=False)
    p.set("testsocket", 1)
    for i in range(100):
        p.incr('testsocket', 10)
    p.get('testsocket')
    p.delete('testsocket')
    p.execute()

print timeit.Timer(stmt='UxDomainSocket()', setup='from __main__ import UxDomainSocket').timeit(number=1000)
With this script, I get about 3x better performance and almost no overhead with gevent.