Return code when OOM killer kills a process

betabandido · Aug 24, 2011

I am running a multiprogrammed workload (based on SPEC CPU2006 benchmarks) on a POWER7 system using SUSE SLES 11.

Sometimes, each application in the workload consumes a significant amount of memory and the total memory footprint exceeds the available memory installed in the system (32 GB).

I disabled swap because otherwise the measurements could be heavily affected for any process that ends up using it. I know that, as a result, the kernel may kill some of the processes through the OOM killer. That is totally fine. The problem is that I would expect a process killed by the kernel to exit with an error condition (e.g., to be reported as terminated by a signal).

I have a framework that launches all the processes and then waits for them using

waitpid(pid, &status, 0);

Even if a process is killed by the OOM killer (I know this because I get a message on the screen and in /var/log/messages), the call

WIFEXITED(status);

returns one, and the call

WEXITSTATUS(status);

returns zero. Therefore, I cannot distinguish a process that finished correctly from one that was killed by the OOM killer.

Am I doing anything wrong? Do you know of any way to detect when a process has been killed by the OOM killer?

I found this post asking pretty much the same question. However, since it is an old post and answers were not satisfactory, I decided to post a new question.

Answer

cnicutar · Aug 24, 2011

The Linux OOM killer works by sending SIGKILL. If your process is killed by the OOM killer, it's fishy that WIFEXITED returns 1.

From TLPI (The Linux Programming Interface):

To kill the selected process, the OOM killer delivers a SIGKILL signal.

So you should be able to test this using:

if (WIFSIGNALED(status)) {              /* child was terminated by a signal */
    if (WTERMSIG(status) == SIGKILL)    /* the OOM killer delivers SIGKILL */
        printf("Killed by SIGKILL\n");
}
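
To check the status decoding in isolation, here is a minimal sketch that forks a child, kills it with SIGKILL from the parent (standing in for the OOM killer), and classifies the status returned by waitpid. The names classify_status, run_and_kill_child and enum child_fate are hypothetical helpers invented for this illustration; they are not part of the asker's framework or of any library.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical classification of a waitpid() status (illustrative names). */
enum child_fate { CHILD_EXITED, CHILD_SIGKILLED, CHILD_OTHER_SIGNAL, CHILD_UNKNOWN };

/* Decode a status word that was filled in by waitpid(). */
enum child_fate classify_status(int status)
{
    if (WIFEXITED(status))
        return CHILD_EXITED;            /* normal exit; see WEXITSTATUS(status) */
    if (WIFSIGNALED(status))
        return WTERMSIG(status) == SIGKILL ? CHILD_SIGKILLED
                                           : CHILD_OTHER_SIGNAL;
    return CHILD_UNKNOWN;               /* e.g. stopped or continued */
}

/* Fork a child that only sleeps in pause(), send it SIGKILL from the parent
 * (assumption: this stands in for the OOM killer), and store the raw
 * waitpid() status in *status. Returns 0 on success, -1 on error. */
int run_and_kill_child(int *status)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();                    /* child: wait until it is killed */
    }
    if (kill(pid, SIGKILL) < 0)
        return -1;

    pid_t waited;
    do {
        waited = waitpid(pid, status, 0);   /* retry if interrupted */
    } while (waited < 0 && errno == EINTR);
    return waited == pid ? 0 : -1;
}
```

Calling run_and_kill_child(&status) and then classify_status(status) from a test program should report the child as killed by SIGKILL. Note that SIGKILL can also come from other sources (for example, a user running kill -9), so the status alone does not prove the OOM killer was responsible; the kernel log remains the place to confirm that.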