How do I run long-term (infinite) Python processes?

Hubro · Dec 31, 2011

I've recently started experimenting with using Python for web development. So far I've had some success using Apache with mod_wsgi and the Django web framework for Python 2.7. However, I have run into some issues with keeping processes running constantly to update information and the like.

I have written a script I call "daemonManager.py" that can start and stop all or individual Python update loops (should I call them daemons?). It does that by forking, then loading the module for the specific functions it should run and starting an infinite loop. It saves a PID file in /var/run to keep track of the process. So far so good. The problems I've encountered are:

  • Now and then one of the processes will just quit. I check ps in the morning and the process is just gone. No errors were logged (I'm using the logging module), and I'm catching and logging every exception I can think of. I also don't think these quitting processes have anything to do with my code, because all my processes run completely different code and exit at pretty similar intervals. I could be wrong, of course. Is it normal for Python processes to just die after they've run for days or weeks? How should I tackle this problem? Should I write another daemon that periodically checks whether the other daemons are still running? What if that daemon stops? I'm at a loss on how to handle this.

  • How can I programmatically know if a process is still running or not? I'm saving the PID files in /var/run and checking if the PID file is there to determine whether or not the process is running. But if the process just dies of unexpected causes, the PID file will remain. I therefore have to delete these files every time a process crashes (a couple of times per week), which sort of defeats the purpose. I guess I could check if a process is running at the PID in the file, but what if another process has started and was assigned the PID of the dead process? My daemon would think that the process is running fine even if it's long dead. Again I'm at a loss just how to deal with this.
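
For the stale-file and PID-reuse worries in the second bullet, one commonly used approach is to have each daemon keep an exclusive lock on its PID file for as long as it lives. The kernel releases the lock when the process exits for any reason, so "is the lock held?" is a better liveness test than "does the file exist?" or "is some process using that PID?". The following is only a minimal sketch, assuming a Unix system; the function names (pid_exists, hold_pid_file, daemon_is_alive) are made up for illustration and are not part of daemonManager.py.

```python
import errno
import fcntl
import os


def pid_exists(pid):
    """True if some process currently has this PID (it may be an unrelated one)."""
    try:
        os.kill(pid, 0)  # signal 0 runs the checks but delivers nothing
    except OSError as exc:
        if exc.errno == errno.ESRCH:  # no such process
            return False
        if exc.errno == errno.EPERM:  # exists, but belongs to another user
            return True
        raise
    return True


def hold_pid_file(path):
    """Daemon side: write our PID and keep the file locked.

    The returned file object must stay open for the life of the process;
    the lock disappears when the process exits, however it exits.
    """
    handle = open(path, "a+")
    fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)  # raises if already held
    handle.seek(0)
    handle.truncate()
    handle.write("%d\n" % os.getpid())
    handle.flush()
    return handle


def daemon_is_alive(path):
    """Manager side: the daemon counts as alive only while the lock is held."""
    try:
        handle = open(path, "r")
    except IOError:
        return False  # PID file missing or unreadable
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except (IOError, OSError) as exc:
        if exc.errno in (errno.EAGAIN, errno.EACCES):
            return True  # another open handle still holds the lock
        raise
    else:
        return False  # we obtained the lock, so the previous holder is gone
    finally:
        handle.close()  # closing also releases any lock we just took
```

pid_exists alone is the plain "is something running at that PID" check and is fooled by PID reuse; daemon_is_alive is not, because a recycled PID does not inherit the lock. A leftover PID file then does no harm and never needs deleting by hand.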

I will accept any useful answer on how best to run infinite Python processes, ideally one that also sheds some light on the problems above.


I'm using Apache 2.2.14 on an Ubuntu machine.
My Python version is 2.7.2.

Answer

Owen Nelson · Dec 31, 2011

I'll open by stating that this is one way to manage a long running process (LRP) -- not the de facto standard by any stretch.

In my experience, the best possible product comes from concentrating on the specific problem you're dealing with, while delegating supporting tech to other libraries. In this case, I'm referring to the act of backgrounding processes (the art of the double fork), monitoring, and log redirection.

My favorite solution is http://supervisord.org/

Using a system like supervisord, you basically write a conventional Python script that performs a task while stuck in an "infinite" loop.

#!/usr/bin/env python

import sys
import time

def main_loop():
    while True:
        # do your stuff...
        time.sleep(0.1)

if __name__ == '__main__':
    try:
        main_loop()
    except KeyboardInterrupt:
        # Ctrl-C in a terminal: exit cleanly instead of dumping a traceback.
        sys.stderr.write('\nExiting by user request.\n')
        sys.exit(0)

Writing your script this way makes it simple and convenient to develop and debug (you can easily start/stop it in a terminal, watching the log output as events unfold). When it comes time to throw it into production, you simply define a supervisor config that calls your script (here's the full example for defining a "program", much of which is optional: http://supervisord.org/configuration.html#program-x-section-example).
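
As a rough illustration, a small "program" section might look like the fragment below. The program name (updater) and both paths are placeholders rather than anything from the question, and the option values are just one plausible choice; the linked documentation lists the full set.

```ini
; hypothetical program name and paths, adjust to your setup
[program:updater]
command=/usr/bin/python /path/to/updater.py
; start with supervisord and restart whenever the script exits
autostart=true
autorestart=true
; must stay up this many seconds to count as successfully started
startsecs=5
; fold stderr into stdout and write both to one log file
redirect_stderr=true
stdout_logfile=/var/log/updater.log
```

After adding the section, `supervisorctl reread` followed by `supervisorctl update` should pick it up, and `supervisorctl status` shows whether the program is running.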

Supervisor has a bunch of configuration options so I won't enumerate them, but I will say that it specifically solves the problems you describe:

  • Backgrounding/Daemonizing
  • PID tracking (can be configured to restart a process should it terminate unexpectedly)
  • Log redirection (log normally in your script, using a stream handler if you're on the logging module rather than printing, and let supervisor redirect the output to a file for you)
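
To make the logging point concrete, here is a minimal sketch of the loop script with a stream handler and a SIGTERM handler (TERM is the signal supervisor sends by default when stopping a program). The logger name, the `work` callable, and the `max_iterations` parameter are assumptions added for illustration and testing, not part of the script above.

```python
import logging
import signal
import time

log = logging.getLogger("updater")  # hypothetical logger name
_state = {"stop": False}


def request_stop(signum, frame):
    """Signal handler: ask the loop to finish after the current iteration."""
    _state["stop"] = True


def configure_logging():
    # StreamHandler writes to stderr by default; supervisor can capture
    # that stream and write it to a file.
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    log.addHandler(handler)
    log.setLevel(logging.INFO)


def main_loop(work, delay=0.1, max_iterations=None):
    """Call work() repeatedly; log failures instead of dying on them."""
    done = 0
    while not _state["stop"]:
        if max_iterations is not None and done >= max_iterations:
            break
        try:
            work()
        except Exception:
            # Logs the full traceback, then carries on with the next round.
            log.exception("update failed; continuing")
        done += 1
        time.sleep(delay)
    return done


def main(work):
    configure_logging()
    signal.signal(signal.SIGTERM, request_stop)
    try:
        main_loop(work)
    except KeyboardInterrupt:
        pass  # Ctrl-C during development
    log.info("stopping")
```

In a real script you would call `main(your_update_function)` from the usual `if __name__ == '__main__':` guard. Anything that kills the process outright (SIGKILL, for example) still bypasses this, which is where supervisor's automatic restart comes in.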