Is it safe to fork from within a thread?

Ælex picture Ælex · May 21, 2011 · Viewed 21k times · Source

Let me explain: I have already been developing an application on Linux which forks and execs an external binary and waits for it to finish. Results are communicated by shm files that are unique to the fork + process. The entire code is encapsulated within a class.

Now I am considering threading the process in order to speed things up. Having many different instances of class functions fork and execute the binary concurrently (with different parameters) and communicate results with their own unique shm files.

Is this thread safe? If I fork within a thread, apart from being safe, is there something I have to watch for? Any advice or help is much appreciated!

Answer

Kevin picture Kevin · May 21, 2011

The problem is that fork() only copies the calling thread, and any mutexes held in child threads will be forever locked in the forked child. The pthread solution was the pthread_atfork() handlers. The idea was you can register 3 handlers: one prefork, one parent handler, and one child handler. When fork() happens prefork is called prior to fork and is expected to obtain all application mutexes. Both parent and child must release all mutexes in parent and child processes respectively.

This isn't the end of the story though! Libraries call pthread_atfork to register handlers for library specific mutexes, for example Libc does this. This is a good thing: the application can't possibly know about the mutexes held by 3rd party libraries, so each library must call pthread_atfork to ensure it's own mutexes are cleaned up in the event of a fork().

The problem is that the order that pthread_atfork handlers are called for unrelated libraries is undefined (it depends on the order that the libraries are loaded by the program). So this means that technically a deadlock can happen inside of a prefork handler because of a race condition.

For example, consider this sequence:

  1. Thread T1 calls fork()
  2. libc prefork handlers are called in T1 (e.g. T1 now holds all libc locks)
  3. Next, in Thread T2, a 3rd party library A acquires its own mutex AM, and then makes a libc call which requires a mutex. This blocks, because libc mutexes are held by T1.
  4. Thread T1 runs prefork handler for library A, which blocks waiting to obtain AM, which is held by T2.

There's your deadlock and its unrelated to your own mutexes or code.

This actually happened on a project I once worked on. The advice I had found at that time was to choose fork or threads but not both. But for some applications that's probably not practical.