Python Global Interpreter Lock (GIL) workaround on multi-core systems using taskset on Linux?

Kurt picture Kurt · Jun 13, 2009 · Viewed 24.5k times · Source

So I just finished watching this talk on the Python Global Interpreter Lock (GIL) http://blip.tv/file/2232410.

The gist of it is that the GIL is a pretty good design for single core systems (Python essentially leaves the thread handling/scheduling up to the operating system). But that this can seriously backfire on multi-core systems and you end up with IO intensive threads being heavily blocked by CPU intensive threads, the expense of context switching, the ctrl-C problem[*] and so on.

So since the GIL limits us to basically executing a Python program on one CPU my thought is why not accept this and simply use taskset on Linux to set the affinity of the program to a certain core/cpu on the system (especially in a situation with multiple Python apps running on a multi-core system)?

So ultimately my question is this: has anyone tried using taskset on Linux with Python applications (especially when running multiple applications on a Linux system so that multiple cores can be used with one or two Python applications bound to a specific core) and if so what were the results? is it worth doing? Does it make things worse for certain workloads? I plan to do this and test it out (basically see if the program takes more or less time to run) but would love to hear from others as to your experiences.

Addition: David Beazley (the guy giving the talk in the linked video) pointed out that some C/C++ extensions manually release the GIL lock and if these extensions are optimized for multi-core (i.e. scientific or numeric data analysis/etc.) then rather than getting the benefits of multi-core for number crunching the extension would be effectively crippled in that it is limited to a single core (thus potentially slowing your program down significantly). On the other hand if you aren't using extensions such as this

The reason I am not using the multiprocessing module is that (in this case) part of the program is heavily network I/O bound (HTTP requests) so having a pool of worker threads is a GREAT way to squeeze performance out of a box since a thread fires off an HTTP request and then since it's waiting on I/O gives up the GIL and another thread can do it's thing, so that part of the program can easily run 100+ threads without hurting the CPU much and let me actually use the network bandwidth that is available. As for stackless Python/etc I'm not overly interested in rewriting the program or replacing my Python stack (availability would also be a concern).

[*] Only the main thread can receive signals so if you send a ctrl-C the Python interpreter basically tries to get the main thread to run so it can handle the signal, but since it doesn't directly control which thread is run (this is left to the operating system) it basically tells the OS to keep switching threads until it eventually hits the main thread (which if you are unlucky may take a while).

Answer

ynimous picture ynimous · Jun 13, 2009

Another solution is: http://docs.python.org/library/multiprocessing.html

Note 1: This is not a limitation of the Python language, but of CPython implementation.

Note 2: With regard to affinity, your OS shouldn't have a problem doing that itself.