What is the purpose of fork()?

kar picture kar · Jun 12, 2009 · Viewed 57.8k times · Source

In many programs and man pages of Linux, I have seen code using fork(). Why do we need to use fork() and what is its purpose?

Answer

Todd Gamblin picture Todd Gamblin · Jun 12, 2009

fork() is how you create new processes in Unix. When you call fork, you're creating a copy of your own process that has its own address space. This allows multiple tasks to run independently of one another as though they each had the full memory of the machine to themselves.

Here are some example usages of fork:

  1. Your shell uses fork to run the programs you invoke from the command line.
  2. Web servers like apache use fork to create multiple server processes, each of which handles requests in its own address space. If one dies or leaks memory, others are unaffected, so it functions as a mechanism for fault tolerance.
  3. Google Chrome uses fork to handle each page within a separate process. This will prevent client-side code on one page from bringing your whole browser down.
  4. fork is used to spawn processes in some parallel programs (like those written using MPI). Note this is different from using threads, which don't have their own address space and exist within a process.
  5. Scripting languages use fork indirectly to start child processes. For example, every time you use a command like subprocess.Popen in Python, you fork a child process and read its output. This enables programs to work together.

Typical usage of fork in a shell might look something like this:

int child_process_id = fork();
if (child_process_id) {
    // Fork returns a valid pid in the parent process.  Parent executes this.

    // wait for the child process to complete
    waitpid(child_process_id, ...);  // omitted extra args for brevity

    // child process finished!
} else {
    // Fork returns 0 in the child process.  Child executes this.

    // new argv array for the child process
    const char *argv[] = {"arg1", "arg2", "arg3", NULL};

    // now start executing some other program
    exec("/path/to/a/program", argv);
}

The shell spawns a child process using exec and waits for it to complete, then continues with its own execution. Note that you don't have to use fork this way. You can always spawn off lots of child processes, as a parallel program might do, and each might run a program concurrently. Basically, any time you're creating new processes in a Unix system, you're using fork(). For the Windows equivalent, take a look at CreateProcess.

If you want more examples and a longer explanation, Wikipedia has a decent summary. And here are some slides here on how processes, threads, and concurrency work in modern operating systems.