Python hangs in futex calls

Helmut Grohne · Oct 11, 2010 · Viewed 7.1k times

I have a Python daemon running in production. Depending on the instance, it uses between 7 and 120 threads. Recently the smallest instance (7 threads) started to hang, while none of the other instances has ever shown this problem. Attaching strace to the Python process shows that all threads are blocked in futex calls with FUTEX_WAIT_PRIVATE, so they are probably waiting to acquire a lock.

How would you debug such a problem?

Note that this is a production system running from flash memory, so disk writes are constrained, too.
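
One possible way to see where each thread is blocked without writing anything to disk is to have the daemon print every thread's Python stack to stderr when it receives a signal. The following is only a minimal sketch: the function names are made up for illustration, SIGUSR1 is an arbitrary choice, and it assumes a Unix system where the daemon's stderr is visible somewhere.

```python
import signal
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a text report with the current Python stack of every thread."""
    # Map thread identifiers to names so the report is easier to read.
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append("Thread %s (%s):" % (ident, names.get(ident, "unknown")))
        lines.extend(entry.rstrip("\n") for entry in traceback.format_stack(frame))
    return "\n".join(lines)


def dump_stacks(signum, frame):
    # Write to stderr only, so the flash disk is not touched.
    sys.stderr.write(format_thread_stacks() + "\n")
    sys.stderr.flush()


def install_stack_dump_handler(signum=signal.SIGUSR1):
    # signal.signal() must be called from the main thread.
    signal.signal(signum, dump_stacks)
```

After calling `install_stack_dump_handler()` at startup, sending the chosen signal to the process (for example with `kill -USR1 <pid>`) should print the report. Python-level signal handlers only run in the main thread once it can acquire the GIL, so a report that never appears is itself a hint that some thread is holding the GIL.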

Answer

Helmut Grohne · Oct 12, 2010

The observation was slightly incorrect. One thread was not calling futex; it was swapping while holding the GIL. Since the machine in question has low-end hardware, the swapping took very long and looked like a deadlock. The underlying problem is a memory leak. :-(
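
To tell this situation apart from a real deadlock, it may help to check how much of the process is resident and how much has been swapped out. The sketch below reads the `VmRSS` and `VmSwap` fields from `/proc/<pid>/status`. It assumes Linux with a kernel that reports `VmSwap`, and the helper names are hypothetical. Reading from /proc does not write to disk.

```python
def parse_status_kb(text, fields=("VmRSS", "VmSwap")):
    """Extract the selected fields (in kB) from /proc/<pid>/status content."""
    result = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep or key not in fields:
            continue
        parts = rest.split()
        # Expected form: "VmRSS:     123456 kB"
        if parts and parts[0].isdigit():
            result[key] = int(parts[0])
    return result


def read_memory_kb(pid="self"):
    """Return resident and swapped-out sizes in kB for the given process."""
    with open("/proc/%s/status" % pid) as status_file:
        return parse_status_kb(status_file.read())
```

Calling `read_memory_kb(pid)` a few times over several minutes gives a rough trend: steadily growing values, especially a growing `VmSwap`, would point toward a leak rather than a lock problem.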