I have been writing a program that spawns a child process, and calls waitpid
to wait for the termination of the child process. The code is below:
// fork & exec the child
pid_t pid = fork();
if (pid == -1)
{
    // here is error handling code that is **not** triggered
}
else if (!pid)
{
    // binary_invocation is an array of the child process program and its arguments
    execv(args.binary_invocation[0], (char * const*)args.binary_invocation);
    // here is some error handling code that is **not** triggered
}
else
{
    int status = 0;
    pid_t res = waitpid(pid, &status, 0);
    // here I see res being a positive integer > 0
    // and status being 11, which means WIFEXITED(status) is 0.
    // this triggers a warning in my program's output.
}
The manpage of waitpid states for WIFEXITED:
WIFEXITED(status)
returns true if the child terminated normally, that is, by calling exit(3) or
_exit(2), or by returning from main().
I interpret this to mean it should return an integer != 0 on success, which is not happening in the execution of my program, since I observe WIFEXITED(status) == 0.
However, executing the same program from the command line results in $? == 0, and starting it from gdb results in:
[Inferior 1 (process 31934) exited normally]
The program behaves normally, except for the triggered warning, which makes me think something else is going on here that I am missing.
EDIT:
As suggested below in the comments, I checked whether the child is terminated by a segfault, and indeed, WIFSIGNALED(status) returns 1 and WTERMSIG(status) returns 11, which is SIGSEGV.
What I don't understand, though, is why a call via execv would fail with a segfault while the same call via gdb or a shell would succeed.
EDIT2:
The behaviour of my application heavily depends on the behaviour of the child process, in particular on a file the child writes in a function declared __attribute__ ((destructor)). After the waitpid call returns, this file exists and is generated correctly, which means the segfault occurs somewhere in another destructor, or somewhere outside of my control.
On Unix and Linux systems, the status returned from wait or waitpid (or any of the other wait variants) has this structure:
bits   meaning
0-6    signal number that caused child to exit,
       or 0177 if child stopped / continued,
       or zero if child exited without a signal
7      1 if core dumped, else 0
8-15   low 8 bits of value passed to _exit/exit or returned by main,
       or signal that caused child to stop/continue
(Note that POSIX doesn't define the bits, just macros, but these are the bit definitions used by at least Linux, Mac OS X/iOS, and Solaris. Also note that waitpid only returns for stop events if you pass it the WUNTRACED flag and for continue events if you pass it the WCONTINUED flag.)
So a status of 11 means the child was terminated by signal 11, which is SIGSEGV (again, not POSIX but conventionally).
Either your program is passing invalid arguments to execv (which is a C library wrapper around execve or some other kernel-specific call), or the child runs differently when you execv it than when you run it from the shell or gdb.
If you are on a system that supports strace, run your (parent) program under strace -f to see whether execv is causing the signal.