Safe to have multiple processes writing to the same file at the same time? [CentOS 6, ext4]

Fixee · Oct 20, 2011 · Viewed 54.5k times

I'm building a system where multiple slave processes are communicating via unix domain sockets, and they are writing to the same file at the same time. I have never studied filesystems or this specific filesystem (ext4), but it feels like there might be some danger here.

Each process writes to a disjoint subset of the output file (i.e., there is no overlap in the blocks being written). For example, P1 writes only to the first 50% of the file and P2 writes only to the second 50%. Or perhaps P1 writes only the odd-numbered blocks while P2 writes only the even-numbered blocks.

Is it safe to have P1 and P2 (running simultaneously on separate threads) writing to the same file without using any locking? In other words, does the filesystem impose some kind of locking implicitly?

Note: I'm unfortunately not at liberty to output multiple files and join them later.

Note: My reading since posting this question does not agree with the only posted answer below. Everything I've read suggests that what I want to do is fine, whereas the respondent below insists what I am doing is unsafe, but I am unable to discern the described danger.

Answer

janneb · Oct 25, 2011

What you're doing seems perfectly OK, provided you're using the POSIX "raw" IO syscalls such as read(), write(), lseek() and so forth.
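As a rough illustration of the disjoint-region case, here is a minimal sketch in Python, whose os.pwrite() is a thin wrapper over the pwrite() syscall and so bypasses userspace buffering. The names write_blocks and run_workers, the 4096-byte BLOCK_SIZE, and the interleaved block layout are all assumptions made for the example, not anything taken from the question. Each forked worker writes only the blocks it owns, at explicit offsets, so no shared file position is involved.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def write_blocks(path, worker_id, num_workers, num_blocks):
    """Write every block owned by this worker with unbuffered pwrite()."""
    fd = os.open(path, os.O_WRONLY)
    try:
        # Worker k owns blocks k, k + num_workers, k + 2 * num_workers, ...
        for block in range(worker_id, num_blocks, num_workers):
            data = bytes([ord("A") + worker_id]) * BLOCK_SIZE
            offset = block * BLOCK_SIZE
            written = 0
            # pwrite() may write fewer bytes than requested, so loop.
            while written < len(data):
                written += os.pwrite(fd, data[written:], offset + written)
    finally:
        os.close(fd)


def run_workers(path, num_workers=2, num_blocks=8):
    """Fork one process per worker and wait for all of them to finish."""
    # Size the file up front so that no worker has to extend it.
    with open(path, "wb") as f:
        f.truncate(num_blocks * BLOCK_SIZE)
    pids = []
    for worker_id in range(num_workers):
        pid = os.fork()
        if pid == 0:  # child process
            status = 1
            try:
                write_blocks(path, worker_id, num_workers, num_blocks)
                status = 0
            finally:
                os._exit(status)  # leave without running parent cleanup
        pids.append(pid)
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
            raise RuntimeError("worker %d failed" % pid)


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "out.bin")
    run_workers(out)
    print(os.path.getsize(out), "bytes written")
```

Using pwrite() with an explicit offset, rather than lseek() followed by write(), is a design choice of the sketch: it avoids any dependence on a file offset that forked processes could end up sharing through an inherited descriptor.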

If you use C stdio (fread(), fwrite() and friends) or some other language runtime library that does its own userspace buffering, then the answer by "Tilo" is relevant: because the buffering is to some extent outside your control, the different processes might overwrite each other's data.
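To make the buffering point concrete, the following sketch uses Python's buffered file objects as a stand-in for C stdio (an assumption of the example; the behaviour of fwrite() is analogous but not identical). The helper names are hypothetical. It shows that a small buffered write does not reach the file until the library decides to flush, whereas an unbuffered write is passed straight to the OS.

```python
import os
import tempfile


def size_before_and_after_flush(path, payload=b"hello"):
    """Return the file size seen by the OS before and after a flush."""
    with open(path, "wb") as f:      # default mode: userspace-buffered
        f.write(payload)             # small write stays in the library buffer
        size_before = os.path.getsize(path)
        f.flush()                    # the library now issues the real write
        size_after = os.path.getsize(path)
    return size_before, size_after


def size_after_unbuffered_write(path, payload=b"hello"):
    """Return the file size right after a write with buffering disabled."""
    with open(path, "wb", buffering=0) as f:  # raw file object, no buffer
        f.write(payload)
        return os.path.getsize(path)


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    print(size_before_and_after_flush(os.path.join(tmp, "buffered.bin")))
    print(size_after_unbuffered_write(os.path.join(tmp, "raw.bin")))
```

When the buffer is flushed depends on its size and on the library, which is why two processes writing through buffered streams cannot easily reason about when, and in what chunks, their data lands in the file.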

Regarding OS locking: POSIX states that writes of at most PIPE_BUF bytes are atomic for some special files (pipes and FIFOs), but there is no such guarantee for regular files. In practice, I think it's likely that I/O operations within a single page are atomic, but nothing guarantees it. The OS only does locking internally to the extent that is necessary to protect its own internal data structures. One can use file locks, or some other interprocess communication mechanism, to serialize access to files. But all of this is relevant only if you have several processes doing I/O to the same region of a file. In your case, since your processes are doing I/O to disjoint sections of the file, none of this matters, and you should be fine.
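For the overlapping case, where serialization does matter, here is a hedged sketch of advisory byte-range locking with fcntl.lockf(), which on Linux is implemented on top of fcntl() record locks. The shared 8-byte counter, the function names, and the worker counts are assumptions invented for the example. Several processes perform a read-modify-write on the same bytes, and the lock keeps their updates from being lost.

```python
import fcntl
import os
import struct
import tempfile

COUNTER_FMT = "<Q"                           # one little-endian 64-bit counter
COUNTER_SIZE = struct.calcsize(COUNTER_FMT)


def locked_increment(fd, times):
    """Increment the counter at offset 0 under an exclusive range lock."""
    for _ in range(times):
        # Block until we hold an exclusive lock on bytes [0, COUNTER_SIZE).
        fcntl.lockf(fd, fcntl.LOCK_EX, COUNTER_SIZE, 0, os.SEEK_SET)
        try:
            raw = os.pread(fd, COUNTER_SIZE, 0)
            (value,) = struct.unpack(COUNTER_FMT, raw)
            os.pwrite(fd, struct.pack(COUNTER_FMT, value + 1), 0)
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, COUNTER_SIZE, 0, os.SEEK_SET)


def run_counters(path, num_workers=4, times=200):
    """Fork workers that all update the same bytes; return the final value."""
    with open(path, "wb") as f:
        f.write(struct.pack(COUNTER_FMT, 0))
    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:  # child process
            status = 1
            try:
                fd = os.open(path, os.O_RDWR)  # each worker opens its own fd
                locked_increment(fd, times)
                os.close(fd)
                status = 0
            finally:
                os._exit(status)
        pids.append(pid)
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
            raise RuntimeError("worker %d failed" % pid)
    with open(path, "rb") as f:
        return struct.unpack(COUNTER_FMT, f.read(COUNTER_SIZE))[0]


if __name__ == "__main__":
    counter_path = os.path.join(tempfile.mkdtemp(), "counter.bin")
    print(run_counters(counter_path))
```

These locks are advisory: they only protect against other processes that also take the lock. Without the lockf() calls, two workers could read the same value and both write back value + 1, losing an update. None of this is needed when the regions are disjoint, as in the question.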