Start background process/daemon from CGI script

Mitch Lindgren picture Mitch Lindgren · May 17, 2011 · Viewed 11.9k times · Source

I'm trying to launch a background process from a CGI scripts. Basically, when a form is submitted the CGI script will indicate to the user that his or her request is being processed, while the background script does the actual processing (because the processing tends to take a long time.) The problem I'm facing is that Apache won't send the output of the parent CGI script to the browser until the child script terminates.

I've been told by a colleague that what I want to do is impossible because there is no way to prevent Apache from waiting for the entire process tree of a CGI script to die. However, I've also seen numerous references around the web to a "double fork" trick which is supposed to do the job. The trick is described succinctly in this Stack Overflow answer, but I've seen similar code elsewhere.

Here's a short script I wrote to test the double-fork trick in Python:

import os
import sys

if os.fork():
    print 'Content-type: text/html\n\n Done'
    sys.exit(0)

if os.fork():
    os.setsid()
    sys.exit(0)

# Second child
os.chdir("/")
sys.stdout.close()
sys.stderr.close()
sys.stdin.close()

f = open('/tmp/lol.txt', 'w')

while 1:
     f.write('test\n')

If I run this from the shell, it does exactly what I'd expect: the original script and first descendant die, and the second descendant keeps running until it's killed manually. But if I access it through CGI, the page won't load until I kill the second descendant or Apache kills it because of the CGI timeout. I've also tried replacing the second sys.exit(0) with os._exit(0), but there is no difference.

What am I doing wrong?

Answer

Nas Banov picture Nas Banov · May 23, 2011

Don't fork - run batch separately

This double-forking approach is some kind of hack, which to me is indication it shouldn't be done :). For CGI anyway. Under the general principle that if something is too hard to accomplish, you are probably approaching it the wrong way.

Luckily you give the background info on what you need - a CGI call to initiate some processing that happens independently and to return back to the caller. Well sure - there are unix commands that do just that - schedule command to run at specific time (at) or whenever CPU is free (batch). So do this instead:

import os

os.system("batch <<< '/home/some_user/do_the_due.py'")
# or if you don't want to wait for system idle, 
#   os.system("at now <<< '/home/some_user/do_the_due.py'")

print 'Content-type: text/html\n'
print 'Done!'

And there you have it. Keep in mind that if there is some output to stdout/stderr, that will be mailed to the user (which is good for debugging but otherwise script probably should keep quiet).

PS. i just remembered that Windows also has version of at, so with minor modification of the invocation you can have that work under apache on windows too (vs fork trick that won't work on windows).

PPS. make sure the process running CGI is not excluded in /etc/at.deny from scheduling batch jobs