How to timeout waitpid without killing the child?

matthias krull · Aug 30, 2011 · Viewed 9.4k times

I am aware of the many questions regarding waitpid and timeouts, but they all solve it by killing the child from within an alarm handler.

That is not what I want. I want to keep the process running but stop blocking on it in waitpid.

The underlying problem I am trying to solve is a daemon process with a main loop that processes a queue. The tasks are processed one at a time.

If a task hangs, the whole main loop hangs. To get around this, fork() and waitpid seemed an obvious choice. Still, if a task hangs, the loop hangs.

I can think of workarounds where I do not use waitpid at all, but then I would have to track running processes another way, since I still want to process one task at a time, in parallel with any tasks that may be hanging.

I could even kill the task, but I would like to keep it running to examine what exactly is going wrong. A kill handler that dumps some debug information is also possible.

Anyway, the most convenient way to solve this would be to time out waitpid, if possible.

Edit:

This is how I used fork() and waitpid; it may make clearer what I mean by child.

my $pid = fork();

if (!defined $pid) {
    # fork() returns undef on failure, so this must be checked first
    die "Could not fork(): $!";
}
elsif ($pid == 0) {
    # I am the child and I don't want to die
}
else {
    # I am the parent and I don't want to wait longer than $timeout
    # for the child to exit, but this call blocks until it does
    waitpid $pid, 0;
}

Edit:

Using waitpid with WNOHANG does what I want. Is this usage good practice, or would you do it differently?

use strict;
use warnings;
use 5.012;
use POSIX ':sys_wait_h';

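my $pid = fork();
die "Could not fork(): $!" unless defined $pid;

if ($pid == 0) {
    say "child will sleep";
    sleep 20;
    say "child slept";
    exit 0;
}
else {
    my $time = 10;    # number of one-second polls before giving up
    my $status;
    do {
        sleep 1;
        # 0 while the child is still running, its PID once it has exited
        $status = waitpid $pid, WNOHANG;
        $time--;
    } while ($time && !$status);

    say "bye";
}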

Answer

Blagovest Buyukliev · Aug 30, 2011

"If a task hangs, the whole main loop hangs. To get around this, fork() and waitpid seemed an obvious choice. Still, if a task hangs, the loop hangs."

Use waitpid with the WNOHANG option. This way it's not going to suspend the parent process and will immediately return 0 when the child has not yet exited. In your main loop you'll have to periodically poll all the children (tasks).
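
A minimal sketch of such a polling main loop might look like the following. The queue contents (each task is just a number of seconds to sleep), the $timeout value, and the reap_overdue helper are made up for illustration and are not from the question. Each child is polled with WNOHANG until a deadline. A child that misses the deadline is left running and remembered in %overdue, so later iterations can still reap it once it exits.

```perl
use strict;
use warnings;
use 5.012;
use POSIX ':sys_wait_h';
use Time::HiRes qw(time sleep);

$| = 1;    # unbuffered output, so forked children do not repeat pending text

my @queue   = (0, 2, 0);   # hypothetical tasks: seconds each child sleeps
my $timeout = 1;           # assumed per-task limit in seconds
my %overdue;               # pid => task number, children we stopped waiting on
my @results;               # 'done' or 'overdue', one entry per task

# Reap overdue children that have exited in the meantime, without blocking.
sub reap_overdue {
    for my $pid (keys %overdue) {
        next unless waitpid($pid, WNOHANG) == $pid;
        say "task $overdue{$pid} (pid $pid) finished late";
        delete $overdue{$pid};
    }
}

my $n = 0;
for my $seconds (@queue) {
    $n++;
    my $pid = fork();
    die "Could not fork(): $!" unless defined $pid;

    if ($pid == 0) {
        sleep $seconds;    # stand-in for the real work
        exit 0;
    }

    my $deadline = time + $timeout;
    my $state    = 'overdue';
    while (time < $deadline) {
        # waitpid returns the PID once the child has exited, 0 while it runs
        if (waitpid($pid, WNOHANG) == $pid) {
            $state = 'done';
            last;
        }
        reap_overdue();
        sleep 0.1;         # poll interval
    }

    # The child is not killed; it is only remembered for later reaping.
    $overdue{$pid} = $n if $state eq 'overdue';
    push @results, $state;
    say "task $n: $state";
}

say "still running: ", join(', ', sort values %overdue);
```

With these assumed values, the second task should be the one reported as overdue while the loop moves on to the third. In a real daemon the loop would keep running, so reap_overdue would eventually collect the late child and avoid leaving a zombie behind.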