waitpid - WIFEXITED returning 0 although child exited normally

Andreas Grapentin · Apr 24, 2014

I have been writing a program that spawns a child process and calls waitpid to wait for the child to terminate. The code is below:

  // fork & exec the child
  pid_t pid = fork();
  if (pid == -1)
    {
      // here is error handling code that is **not** triggered
    }

  if (!pid)
    {
      // binary_invocation is an array of the child process program and its arguments
      // (execv requires it to end with a NULL pointer)
      execv(args.binary_invocation[0], (char * const*)args.binary_invocation);
      // here is some error handling code that is **not** triggered
      _exit(127); // if execv returns, never fall through into the parent's code
    }
  else
    {
      int status = 0;
      pid_t res = waitpid(pid, &status, 0);

      // here I see res being a positive integer > 0
      // and status being 11, which means WIFEXITED(status) is 0.
      // this triggers a warning in my program's output.
    }
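For reference, here is a minimal, self-contained sketch of the same fork/execv/waitpid pattern. It assumes a POSIX system, and the helper name run_and_wait is hypothetical, not part of the original program. It restarts waitpid if a signal interrupts it (EINTR) and has the child call _exit(127) when execv fails, so a failed exec shows up as exit status 127 instead of the child running on into the parent's code.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with the NULL-terminated argument vector
 * argv and wait for it. Returns the child's pid and stores the wait status
 * in *status, or returns -1 if fork or waitpid fails in the parent. */
pid_t run_and_wait(char *const argv[], int *status)
{
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return -1;
    }

    if (pid == 0) {
        /* Child: execv only returns if it failed. */
        execv(argv[0], argv);
        perror("execv");
        /* _exit rather than exit: the child must not run the parent's atexit
         * handlers or flush its copy of the parent's stdio buffers. */
        _exit(127);
    }

    /* Parent: restart the wait if a signal handler interrupts it. */
    pid_t res;
    do {
        res = waitpid(pid, status, 0);
    } while (res == -1 && errno == EINTR);

    if (res == -1)
        perror("waitpid");
    return res;
}
```

Called with { "/bin/sh", "-c", "exit 3", NULL }, the stored status satisfies WIFEXITED, and WEXITSTATUS gives 3.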

The manpage of waitpid states for WIFEXITED:

WIFEXITED(status)
    returns true if the child terminated normally, that is, by calling exit(3) or
    _exit(2), or by returning from main().

I interpret this to mean that it should return a nonzero integer when the child exits normally, but that is not what happens in my program: I observe WIFEXITED(status) == 0.

However, running the same child program directly from the command line results in $? == 0, and starting it from gdb results in:

[Inferior 1 (process 31934) exited normally]
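The $? == 0 observation is meaningful because of how shells report signal deaths: POSIX requires the exit status of a command killed by a signal to be greater than 128, and bash and several other common shells report 128 plus the signal number, so a SIGSEGV crash would have shown up as 139. A small sketch of that mapping (shell_style_status is a hypothetical name):

```c
#include <sys/wait.h>

/* Hypothetical helper: the value a shell such as bash would store in $?
 * for a child with this wait status. */
int shell_style_status(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status);       /* normal exit: the exit code itself */
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);    /* killed by a signal: 128 + signal number */
    return -1;                            /* stopped/continued: not a termination */
}
```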

Apart from the warning, the program behaves normally, which makes me think something else is going on here that I am missing.

EDIT:
As suggested in the comments, I checked whether the child was terminated by a segfault, and indeed, WIFSIGNALED(status) returns 1 and WTERMSIG(status) returns 11, which is SIGSEGV.

What I don't understand, though, is why the child would crash with a segfault when started via execv, while the same invocation succeeds under gdb or from a shell.
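The WIFSIGNALED/WTERMSIG check can be reproduced in isolation. This sketch assumes a POSIX system and forks a child that deliberately raises SIGSEGV as a stand-in for the real crashing child; crash_child_status is a hypothetical name. The raw status may have bit 7 set if a core dump is produced anyway, but WTERMSIG reports SIGSEGV either way.

```c
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that dies from SIGSEGV, standing in for
 * the crashing child, and return its raw wait status (or -1 on error). */
int crash_child_status(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        struct rlimit no_core = { 0, 0 };
        setrlimit(RLIMIT_CORE, &no_core); /* try not to leave a core file behind */
        signal(SIGSEGV, SIG_DFL);         /* default action: terminate the process */
        raise(SIGSEGV);
        _exit(0);                         /* not reached */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return status;
}
```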

EDIT2:
The behaviour of my application depends heavily on the behaviour of the child process, in particular on a file the child writes in a function declared __attribute__ ((destructor)). After waitpid returns, this file exists and has been generated correctly, which means the segfault must occur somewhere in another destructor or somewhere else outside my control.
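This situation can be reproduced in miniature: output written during teardown proves that teardown started, not that the process finished cleanly. The sketch below assumes a POSIX system and uses atexit handlers as a stand-in for __attribute__ ((destructor)) functions (an assumption; it does not model how real destructors are ordered relative to atexit handlers). run_crashing_teardown, write_report and crashing_cleanup are hypothetical names. The child writes its report to a pipe, and a handler that runs afterwards raises SIGSEGV.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1; /* write end of the pipe, used in the child only */

/* Stand-in for the destructor that produces the child's output. */
static void write_report(void)
{
    const char *msg = "report complete";
    ssize_t written = write(report_fd, msg, strlen(msg));
    (void)written;
    close(report_fd);
}

/* Stand-in for some other teardown code that crashes afterwards. */
static void crashing_cleanup(void)
{
    signal(SIGSEGV, SIG_DFL);
    raise(SIGSEGV);
}

/* Hypothetical demo: returns the child's wait status (or -1) and copies what
 * the child reported into buf as a NUL-terminated string. */
int run_crashing_teardown(char *buf, size_t len)
{
    int fds[2];
    if (len == 0 || pipe(fds) == -1)
        return -1;

    fflush(NULL); /* exit() in the child flushes stdio; start with empty buffers */
    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        struct rlimit no_core = { 0, 0 };
        setrlimit(RLIMIT_CORE, &no_core); /* try not to leave a core file behind */
        close(fds[0]);
        report_fd = fds[1];
        /* atexit handlers run in reverse order of registration, so
         * write_report runs first and crashing_cleanup second. */
        atexit(crashing_cleanup);
        atexit(write_report);
        exit(0); /* a "normal" exit, like returning from main */
    }

    close(fds[1]);
    ssize_t n = read(fds[0], buf, len - 1);
    buf[n > 0 ? (size_t)n : 0] = '\0';
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return status;
}
```

The parent receives the complete report, yet WIFEXITED(status) is 0 and WTERMSIG(status) is SIGSEGV, the same combination described in the question.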

Answer

rob mayoff · Apr 24, 2014

On Unix and Linux systems, the status returned from wait or waitpid (or any of the other wait variants) has this structure:

bits   meaning
----   -------
0-6    signal number that caused the child to exit,
       or 0177 if the child stopped / continued,
       or 0 if the child exited without a signal

7      1 if core dumped, else 0

8-15   low 8 bits of the value passed to _exit/exit or returned by main,
       or the signal that caused the child to stop/continue

(Note that POSIX doesn't define the bit layout, only the macros, but these are the bit definitions used by at least Linux, Mac OS X/iOS, and Solaris. Also note that waitpid only returns for stop events if you pass it the WUNTRACED flag, and for continue events if you pass it the WCONTINUED flag.)
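The table can be checked with plain bit masks. This sketch assumes the Linux/Mac OS X/Solaris layout described above (it is not portable, since POSIX only specifies the macros), and raw_status and decode_raw are hypothetical names:

```c
/* Hypothetical decoder for the (non-POSIX) wait-status layout in the table.
 * Portable code should use the W* macros from <sys/wait.h> instead. */
struct raw_status {
    int low7;  /* bits 0-6: terminating signal, 0177 if stopped/continued, 0 if exited */
    int core;  /* bit 7: 1 if a core was dumped */
    int high8; /* bits 8-15: exit code, or the stop signal */
};

struct raw_status decode_raw(int status)
{
    struct raw_status r;
    r.low7 = status & 0177;          /* mask 0x7f */
    r.core = (status & 0200) != 0;   /* mask 0x80 */
    r.high8 = (status >> 8) & 0377;  /* mask 0xff after shifting */
    return r;
}
```

decode_raw(11) gives low7 == 11, core == 0 and high8 == 0: killed by signal 11, no core dump. A child that instead called exit(11) would produce the status 11 << 8 == 2816 (0x0B00), for which decode_raw gives low7 == 0 and high8 == 11.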

So a status of 11 means the child was terminated by signal 11, which is SIGSEGV (again, that number is conventional rather than defined by POSIX).

Either your program is passing invalid arguments to execv (which is a C library wrapper around execve or some other kernel-specific call), or the child runs differently when you execv it than when you run it from the shell or gdb.

If you are on a system that supports strace, run your (parent) program under strace -f to see whether execv is causing the signal.