How to trace a process for system calls?

1der picture 1der · Jun 18, 2012 · Viewed 12k times · Source

I am trying to code a program that traces itself for system calls. I am having a difficult time making this work. I tried calling a fork() to create an instance of itself (the code), then monitor the resulting child process.

The goal is for the parent process to return the index of every system call made by the child process and output it to the screen. Somehow it is not working as planned.

Here is the code:

#include <unistd.h>     /* for read(), write(), close(), fork() */
#include <fcntl.h>      /* for open() */
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/reg.h>
#include <sys/wait.h>
#include <sys/types.h>


int main(int argc, char *argv[]) {
    pid_t child;
    long orig_eax;
    child = fork();

    if (0 == child) 
    {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        if (argc != 3) {
           fprintf(stderr, "Usage: copy <filefrom> <fileto>\n"); 
           return 1;
        }

        int c;
        size_t file1_fd, file2_fd; 
        if ((file1_fd = open(argv[1], O_RDONLY)) < 0) {
           fprintf(stderr, "copy: can't open %s\n", argv[1]);
           return 1;
        }

        if ((file2_fd = open(argv[2], O_WRONLY | O_CREAT)) < 0) {
            fprintf(stderr, "copy: can't open %s\n", argv[2]);
            return 1;
        }

        while (read(file1_fd, &c, 1) > 0) 
        write(file2_fd, &c, 1);
    }
    else
    {
        wait(NULL);
        orig_eax = ptrace (PTRACE_PEEKUSER, child, 4 * ORIG_EAX, NULL);
        printf("copy made a system call %ld\n", orig_eax);
        ptrace(PTRACE_CONT, child, NULL, NULL);
    }           
return 0;
}

This code was based on this code:

#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/user.h>   /* For constants
                               ORIG_EAX etc */
int main()
{   
    pid_t child;
    long orig_eax;
    child = fork();
    if(child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execl("/bin/ls", "ls", NULL);
    }
    else {
        wait(NULL);
        orig_eax = ptrace(PTRACE_PEEKUSER,
                          child, 4 * ORIG_EAX,
                          NULL);
        printf("The child made a "
               "system call %ld\n", orig_eax);
        ptrace(PTRACE_CONT, child, NULL, NULL);
    }
    return 0;
}

The output of this one is:

The child made a system call 11

which is the index for the exec system call.

According to the man pages for wait():

All of these system calls are used to wait for state changes in a child
of the calling process, and obtain information about  the  child  whose
state  has changed. A state change is considered to be: the child terminated; 
the child was stopped by a signal; or the child was resumed by
a  signal.

The way I understand it is that every time a system call is invoked by a user program, the kernel will first inspect if the process is being traced prior to executing the system call routine and pauses that process with a signal and returns control to the parent. Wouldn't that be a state change already?

Answer

Chris Dodd picture Chris Dodd · Jun 19, 2012

The problem is that when the child calls ptrace(TRACEME) it sets itself up for tracing but doesn't actually stop -- it keeps going until it calls exec (in which case it stops with a SIGTRAP), or it gets some other signal. So in order for you to have the parent see what it does WITHOUT an exec call, you need to arrange for the child to receive a signal. The easiest way to do that is probably to have the child call raise(SIGCONT); (or any other signal) immediately after calling ptrace(TRACEME)

Now in the parent you just wait (once) and assume that the child is now stopped at a system call. This won't be the case if it stopped at a signal, so you instead need to call wait(&status) to get the child status and call WIFSTOPPED(status) and WSTOPSIG(status) to see WHY it has stopped. If it has stopped due to a syscall, the signal will be SIGTRAP.

If you want to see multiple system calls in the client, you'll need to do all of this in a loop; something like:

while(1) {
    wait(&status);
    if (WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP) {
        // stopped before or after a system call -- query the child and print out info
    }
    if (WIFEXITED(status) || WIFSIGNALED(status)) {
        // child has exited or terminated
        break;
    }
    ptrace(PTRACE_SYSCALL, 0, 0, 0);  // ignore any signal and continue the child
}

Note that it will stop TWICE for each system call -- once before the system call and a second time just after the system call completes.