I am running a multiprogrammed workload (based on SPEC CPU2006 benchmarks) on a POWER7 system using SUSE SLES 11.
Sometimes, each application in the workload consumes a significant amount of memory and the total memory footprint exceeds the available memory installed in the system (32 GB).
I disabled swap, since otherwise the measurements could be heavily affected for any process that ends up swapping. I know that with swap disabled the kernel, through the OOM killer, may kill some of the processes. That is totally fine. The problem is that I would expect a process killed by the kernel to exit with an error condition (e.g., to be reported as terminated by a signal).
I have a framework that launches all the processes and then waits for them using
waitpid(pid, &status, 0);
Even if a process is killed by the OOM killer (I know that because I get a message on the screen and in /var/log/messages), the call
WIFEXITED(status);
returns one, and the call
WEXITSTATUS(status);
returns zero. Therefore, I cannot distinguish a process that finishes correctly from one that is killed by the OOM killer.
Am I doing anything wrong? Do you know of any way to detect that a process has been killed by the OOM killer?
I found this post asking pretty much the same question. However, since it is an old post and answers were not satisfactory, I decided to post a new question.
The Linux OOM killer works by sending SIGKILL. If your process is killed by the OOM killer, it's fishy that WIFEXITED returns 1.
From TLPI (The Linux Programming Interface):
"To kill the selected process, the OOM killer delivers a SIGKILL signal."
So you should be able to test this using:
if (WIFSIGNALED(status)) {
    if (WTERMSIG(status) == SIGKILL)
        printf("Killed by SIGKILL\n");
}
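To check this logic without having to provoke a real out-of-memory condition, here is a minimal sketch in which a child sends SIGKILL to itself and the parent classifies the wait status. The names `child_end`, `classify_status` and `run_child` are made up for this illustration and are not part of your framework. Keep in mind that the wait status only says the process died from SIGKILL. It cannot tell you who sent it, so the kernel log is still what confirms the OOM killer was responsible.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical labels for how a child ended (names invented for this sketch). */
enum child_end {
    ENDED_OK,            /* exited normally with status 0 */
    ENDED_ERROR,         /* exited normally with a nonzero status */
    ENDED_SIGKILL,       /* terminated by SIGKILL (what the OOM killer sends) */
    ENDED_OTHER_SIGNAL,  /* terminated by some other signal */
    ENDED_UNKNOWN        /* fork/waitpid failed or status not recognized */
};

/* Translate a status filled in by waitpid() into one of the labels above. */
enum child_end classify_status(int status)
{
    if (WIFSIGNALED(status))
        return WTERMSIG(status) == SIGKILL ? ENDED_SIGKILL
                                           : ENDED_OTHER_SIGNAL;
    if (WIFEXITED(status))
        return WEXITSTATUS(status) == 0 ? ENDED_OK : ENDED_ERROR;
    return ENDED_UNKNOWN;
}

/*
 * Fork a child and wait for it. If self_kill is nonzero the child sends
 * SIGKILL to itself, imitating what the OOM killer would do; otherwise it
 * exits with exit_code.
 */
enum child_end run_child(int self_kill, int exit_code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");
        return ENDED_UNKNOWN;
    }
    if (pid == 0) {
        if (self_kill)
            kill(getpid(), SIGKILL);  /* the child does not survive this */
        _exit(exit_code);
    }
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return ENDED_UNKNOWN;
    }
    return classify_status(status);
}
```

Calling `run_child(1, 0)` from a small `main` should report `ENDED_SIGKILL`, while `run_child(0, 0)` should report `ENDED_OK`. If the sketch behaves that way on your system but your framework still sees a normal exit, it is worth checking that the pid you pass to waitpid is the process that was actually killed.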