How to use pthread_atfork() and pthread_once() to reinitialize mutexes in child processes

Blair Zajac picture Blair Zajac · Apr 12, 2010 · Viewed 12k times · Source

We have a C++ shared library that uses ZeroC's Ice library for RPC and unless we shut down Ice's runtime, we've observed child processes hanging on random mutexes. The Ice runtime starts threads, has many internal mutexes and keeps open file descriptors to servers.

Additionally, we have a few of mutexes of our own to protect our internal state.

Our shared library is used by hundreds of internal applications so we don't have control over when the process calls fork(), so we need a way to safely shutdown Ice and lock our mutexes while the process forks.

Reading the POSIX standard on pthread_atfork() on handling mutexes and internal state:

Alternatively, some libraries might have been able to supply just a child routine that reinitializes the mutexes in the library and all associated states to some known value (for example, what it was when the image was originally executed). This approach is not possible, though, because implementations are allowed to fail *_init() and *_destroy() calls for mutexes and locks if the mutex or lock is still locked. In this case, the child routine is not able to reinitialize the mutexes and locks.

On Linux, the this test C program returns EPERM from pthread_mutex_unlock() in the child pthread_atfork() handler. Linux requires adding _NP to the PTHREAD_MUTEX_ERRORCHECK macro for it to compile.

This program is linked from this good thread.

Given that it's technically not safe or legal to unlock or destroy a mutex in the child, I'm thinking it's better to have pointers to mutexes and then have the child make new pthread_mutex_t on the heap and leave the parent's mutexes alone, thereby having a small memory leak.

The only issue is how to reinitialize the state of the library and I'm thinking of reseting a pthread_once_t. Maybe because POSIX has an initializer for pthread_once_t that it can be reset to its initial state.

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

static pthread_once_t once_control = PTHREAD_ONCE_INIT;

static pthread_mutex_t *mutex_ptr = 0;

static void
setup_new_mutex()
{
    mutex_ptr = malloc(sizeof(*mutex_ptr));
    pthread_mutex_init(mutex_ptr, 0);
}

static void
prepare()
{
    pthread_mutex_lock(mutex_ptr);
}

static void
parent()
{
    pthread_mutex_unlock(mutex_ptr);
}

static void
child()
{
    // Reset the once control.
    pthread_once_t once = PTHREAD_ONCE_INIT;
    memcpy(&once_control, &once, sizeof(once_control));
}

static void
init()
{
    setup_new_mutex();
    pthread_atfork(&prepare, &parent, &child);
}

int
my_library_call(int arg)
{
    pthread_once(&once_control, &init);

    pthread_mutex_lock(mutex_ptr);
    // Do something here that requires the lock.
    int result = 2*arg;
    pthread_mutex_unlock(mutex_ptr);

    return result;
}

In the above sample in the child() I only reset the pthread_once_t by making a copy of a fresh pthread_once_t initialized with PTHREAD_ONCE_INIT. A new pthread_mutex_t is only created when the library function is invoked in the child process.

This is hacky but maybe the best way of dealing with this skirting the standards. If the pthread_once_t contains a mutex then the system must have a way of initializing it from its PTHREAD_ONCE_INIT state. If it contains a pointer to a mutex allocated on the heap than it'll be forced to allocate a new one and set the address in the pthread_once_t. I'm hoping it doesn't use the address of the pthread_once_t for anything special which would defeat this.

Searching comp.programming.threads group for pthread_atfork() shows a lot of good discussion and how little the POSIX standards really provides to solve this problem.

There's also the issue that one should only call async-signal-safe functions from pthread_atfork() handlers, and it appears the most important one is the child handler, where only a memcpy() is done.

Does this work? Is there a better way of dealing with the requirements of our shared library?

Answer

Congratulations, you found a defect in the standard. pthread_atfork is fundamentally unable to solve the problem it was created to solve with mutexes, because the handler in the child is not permitted to perform any operations on them:

  • It cannot unlock them, because the caller would be the new main thread in the newly created child process, and that's not the same thread as the thread (in the parent) that obtained the lock.
  • It cannot destroy them, because they are locked.
  • It cannot re-initialize them, because they have not been destroyed.

One potential workaround is to use POSIX semaphores in place of mutexes here. A semaphore does not have an owner, so if the parent process locks it (sem_wait), both the parent and child processes can unlock (sem_post) their respective copies without invoking any undefined behavior.

As a nice aside, sem_post is async-signal-safe and thus definitely legal for the child to use.