Linux system call for creating process and thread

atoMerz · Feb 28, 2012

I read in a paper that the underlying system call to create processes and threads is actually the same, and thus the cost of creating processes over threads is not that great.

  • First, I wanna know what is the system call that creates processes/threads (possibly a sample code or a link?)
  • Second, is the author correct to assume that creating processes instead of threads is inexpensive?

EDIT:
Quoting article:

Replacing pthreads with processes is surprisingly inexpensive, especially on Linux where both pthreads and processes are invoked using the same underlying system call.

Answer

Damon · Feb 28, 2012

Processes are usually created with fork, threads (lightweight processes) are usually created with clone nowadays. However, anecdotally, there exist 1:N thread models, too, which don't do either.

Both fork and clone map to the same kernel function do_fork internally (newer kernels have renamed it, most recently to kernel_clone, but the idea is the same). This function can create a lightweight process that shares the address space with the old one, or a separate process (and many other options), depending on what flags you feed to it. The clone syscall is more or less a direct forwarding of that kernel function (and is used by the higher-level threading libraries), whereas fork wraps do_fork into the functionality of the 50-year-old traditional Unix function.

The important difference is that fork guarantees that a complete, separate copy of the address space is made. This, as Basil points out correctly, is done with copy-on-write nowadays and therefore is not nearly as expensive as one would think.
When you create a thread, it just reuses the original address space and the same memory.
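
To see the difference the flags make, here is a minimal sketch, assuming Linux with glibc's clone() wrapper. The names run_clone, child_fn and STACK_SIZE are made up for illustration. With CLONE_VM the child shares the parent's address space (thread-like), without it the child gets its own copy-on-write copy (fork-like). Real threading libraries pass many more flags than this.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define STACK_SIZE (1024 * 1024)  /* arbitrary stack size for the child */

/* Runs in the child: writes through the pointer handed over by the parent. */
static int child_fn(void *arg)
{
    *(int *)arg = 42;
    return 0;
}

/* Creates a child with clone() using the given extra flags, waits for it,
 * and returns the parent's view of the variable afterwards (-1 on error). */
int run_clone(int extra_flags)
{
    int value = 0;
    int status = 0;
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downward on common architectures, so pass its top.
     * SIGCHLD as the exit signal lets a plain waitpid() reap the child. */
    pid_t pid = clone(child_fn, stack + STACK_SIZE,
                      extra_flags | SIGCHLD, &value);
    if (pid == -1 || waitpid(pid, &status, 0) == -1) {
        free(stack);
        return -1;
    }

    free(stack);
    return value;
}
```

run_clone(CLONE_VM) hands back the value the child wrote, because both ran in the same address space. run_clone(0) hands back the parent's untouched 0, because the child only modified its private copy.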

However, one should not assume that creating processes is generally "lightweight" on Unix-like systems because of copy-on-write. It is somewhat less heavy than, for example, under Windows, but it is nowhere near free.
One reason is that although the actual pages are not copied, the new process still needs its own copy of the page tables. This can be several kilobytes to megabytes of memory for processes that use larger amounts of memory. Another reason is that although copy-on-write is invisible and a clever optimization, it is not free, and it cannot do magic. When data is modified by either process, which inevitably happens, the affected pages fault, and the kernel has to copy each one at that point.
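
If you want to get a feeling for the page-table part of that cost, a rough sketch like the following can help. It assumes a POSIX system with clock_gettime and CLOCK_MONOTONIC. time_fork is a made-up helper, not a proper benchmark: it times a single fork() in the parent while a given amount of touched memory is mapped.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Returns the wall-clock seconds one fork() call takes in the parent while
 * `bytes` of touched heap memory are mapped, or -1.0 on error. */
double time_fork(size_t bytes)
{
    if (bytes == 0)
        return -1.0;

    char *buf = malloc(bytes);
    if (buf == NULL)
        return -1.0;
    memset(buf, 1, bytes);  /* touch every page so it is really mapped */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: read one byte of the inherited buffer, then leave
         * without running the parent's atexit handlers. */
        _exit(buf[bytes - 1] == 1 ? 0 : 1);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (pid == -1) {
        free(buf);
        return -1.0;
    }
    waitpid(pid, NULL, 0);
    free(buf);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling it with, say, 1 MiB and then a few hundred MiB should show the trend. The exact numbers depend on the kernel, the hardware, and whether huge pages are in use, so treat them as an indication only. Note that it does not capture the later copy-on-write faults, since the child never writes.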

Redis is a good example where you can see that fork is anything but lightweight (it uses fork to do background saves).