Celery worker hangs without any error

Maddy · May 16, 2015 · Viewed 14.8k times

I have a production setup running Celery workers that make POST/GET requests to a remote service and store the result. It handles a load of around 20k tasks per 15 minutes.

The problem is that the workers go numb for no apparent reason: no errors, no warnings.

I have also tried adding multiprocessing, with the same result.

In the log I see the task execution times increasing (the "succeeded in s" lines).

For more details look at https://github.com/celery/celery/issues/2621

Answer

Gary Gauh · Nov 26, 2015

If your Celery worker sometimes gets stuck, you can use strace and lsof to find out which system call it is stuck on.

For example:

$ strace -p 10268 -s 10000
Process 10268 attached - interrupt to quit
recvfrom(5,

10268 is the PID of the Celery worker; recvfrom(5 means the worker is blocked receiving data from file descriptor 5.

Then you can use lsof to check what file descriptor 5 refers to in this worker process.

lsof -p 10268
COMMAND   PID USER   FD   TYPE    DEVICE SIZE/OFF      NODE NAME
......
celery  10268 root    5u  IPv4 828871825      0t0       TCP 172.16.201.40:36162->10.13.244.205:wap-wsp (ESTABLISHED)
......

This indicates that the worker is stuck on a TCP connection (note the 5u in the FD column).
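The hang is easy to reproduce with a plain socket. Here is a minimal sketch: a local "server" that accepts a connection but never sends anything (purely illustrative) stands in for a remote service that goes silent; without a timeout, recv() blocks forever, which is exactly the recvfrom shown in the strace output.

```python
import socket
import threading

# A local server that accepts a connection but never sends any data,
# mimicking a remote service that stops responding. Illustration only.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

held = []  # keep the accepted connection alive so it is not closed
threading.Thread(target=lambda: held.append(server.accept()), daemon=True).start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))

# Without this line, recv() below blocks indefinitely -- the worker
# would sit in recvfrom(fd, ... just like in the strace output.
client.settimeout(1.0)

timed_out = False
try:
    client.recv(1024)
except socket.timeout:
    timed_out = True  # the timeout turned a silent hang into an error

client.close()
server.close()
print(timed_out)  # True
```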

Some Python packages, like requests, block while waiting for data from the peer, and this can cause the Celery worker to hang. If you are using requests, make sure to set the timeout argument.
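A sketch of what that looks like in a task body; the function name, URL, and timeout values here are hypothetical, not from the original setup:

```python
import requests

# Hypothetical task body; URL, payload shape, and timeouts are illustrative.
def fetch_result(url, payload):
    try:
        # timeout=(connect timeout, read timeout). Without a timeout, a peer
        # that goes silent leaves requests blocked in recvfrom indefinitely.
        resp = requests.post(url, json=payload, timeout=(3.05, 30))
        resp.raise_for_status()
        return resp.json()
    except requests.exceptions.Timeout:
        # Fail fast so the worker stays responsive; retry logic goes here.
        return None
```

This way a dead remote service produces a Timeout exception (which Celery can log and retry) instead of a silently stuck worker.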


Have you seen this page:

https://www.caktusgroup.com/blog/2013/10/30/using-strace-debug-stuck-celery-tasks/