c++ multiple processes writing to the same file - Interprocess mutex?

code_fodder · Mar 20, 2018 · Viewed 7k times

My question is this: what is the best way (or at least an effective way) to write to a file from multiple processes?

Note: I am using C++11 and I want this to run on any platform (i.e. pure C++ code only).

I have done some research and here is what I have concluded:

  1. In my processes I have multiple threads. This is easily handled within each process using a mutex to serialise access to the file (see the sketch after this list).
  2. A C++/C++11 mutex or condition variable cannot be used to serialise access between processes.
  3. I need some sort of external semaphore / lock file to act as a "mutex"... but I am not sure how to go about doing this.
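
For point 1, here is a minimal sketch of the intra-process part (the Logger/writeLine names are illustrative, not from the question); it serialises the threads of one process only and does nothing for other processes:

#include <fstream>
#include <mutex>
#include <string>

// Serialises writes between the threads of ONE process only.
class Logger {
public:
    explicit Logger(const std::string& path)
        : m_osFile(path, std::fstream::out | std::fstream::app) {}

    void writeLine(const std::string& data) {
        std::lock_guard<std::mutex> lock(m_mutex); // one thread at a time
        m_osFile << data << '\n';
        m_osFile.flush();
    }

private:
    std::mutex m_mutex;
    std::ofstream m_osFile;
};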

I have seen applications create a ".lock" file while the real file is in use. But for multiple rapid accesses this seems fragile: between one process checking that the lock file does not exist and creating it, another process can do the same, so both end up "holding" the lock. The test-and-create step is not atomic.
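To make the race concrete, here is a hedged sketch of the naive ".lock" approach (the function name is made up for illustration); the gap between the existence check and the creation is exactly where two processes can both "succeed":

#include <fstream>
#include <string>

// Naive, racy lock-file attempt - shown only to illustrate the problem.
bool tryAcquireNaiveLock(const std::string& lockPath) {
    std::ifstream probe(lockPath);
    if (probe.good()) {
        return false;                // someone else appears to hold the lock
    }
    // <-- race window: another process can also reach this point,
    //     because "test" and "create" are two separate operations.
    std::ofstream create(lockPath);  // both processes may "succeed" here
    return create.good();
}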

Note: Each process always writes one entire line at a time. I had thought this might be enough to make the operation "atomic" (in that a whole line would be buffered and written before the next one), but that does not appear to be the case (unless my code is wrong), since I occasionally get a mangled line. Here is a code snippet of how I am doing a write (in case it is relevant):

// in c'tor
m_osFile.open("test.txt", std::fstream::out | std::fstream::app);

// in write func (std::string data)
m_osFile << data << std::endl;
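
For reference, a single-call variant of the same write (a sketch, not something I have verified fixes the mangling, since the standard does not promise that one stream write maps to one OS-level write) would look like this:

#include <fstream>
#include <string>

// Sketch: build the complete line first, then pass it to the stream in
// one call. This reduces interleaving but is NOT guaranteed to be atomic.
void writeLine(std::ofstream& osFile, const std::string& data) {
    std::string line = data;
    line += '\n';
    osFile.write(line.data(), static_cast<std::streamsize>(line.size()));
    osFile.flush(); // push it out of the stream's own buffer promptly
}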

This must be a common-ish issue, but I have not yet found a workable solution to it. Any code snippets would be welcome.

Answer

Sigi · Mar 20, 2018

My question is this: what is the best way (or at least an effective way) to write to a file from multiple processes?

The best way is... don't do it!

This really looks like a log (append-only writes). I would just let every process write to its own file and then merge the files when needed (a sketch follows). That is the common approach at least, and here is the rationale.
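A minimal sketch of that approach (the naming scheme and the processTag parameter are assumptions; the tag could be a PID obtained from a platform API or simply passed on the command line, since pure C++11 has no portable way to query the process id):

#include <fstream>
#include <string>

// Each process opens its OWN file, so no cross-process locking is needed;
// a later step (script or program) merges/sorts the files when required.
std::ofstream openPerProcessLog(const std::string& processTag) {
    return std::ofstream("test_" + processTag + ".txt",
                         std::fstream::out | std::fstream::app);
}

// e.g. in a process started as "./app 3":  auto log = openPerProcessLog(argv[1]);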

Any kind of intra-process locking is not going to help across processes: open files are buffered at the OS level, and on some OSes (Windows) that buffering can persist even after the file has been closed.

You cannot rely on file locking if you want a portable solution ("I want this to run on any platform"): even where it exists, you may run into performance penalties or undefined behaviour depending on the filesystem being used (e.g. Samba, NFS).

Writing concurrently and reliably to a single file is in fact a system-dependent activity, today.

I don't mean that it is impossible - DB engines and other applications do it reliably - but it is a customised, system-specific operation.

As a good alternative, you can let one process act as a collector (as proposed by Gem Taylor) and all the rest act as producers, but on its own this is not a reliable alternative: logs need to get to disk simply and robustly, and if a bug can stop the logs from being written, the purpose of logging is lost.

However, you can still consider this approach if you decouple the processes and let the messages between them be exchanged reliably and efficiently: in that case you can use a messaging solution like RabbitMQ.

In this case all the processes publish their "lines" to the message broker, and one dedicated process consumes those messages and writes them to the file.
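
A hedged sketch of the collector side only; how the producers' lines actually arrive (a RabbitMQ consumer callback, a named pipe, a socket) is infrastructure-dependent, so standard input is used below purely as a stand-in for that transport:

#include <fstream>
#include <iostream>
#include <string>

// Single-writer collector: the only process that ever touches the file.
// Producers deliver complete lines to it; reading from std::cin here is
// just a placeholder for whatever transport (broker, pipe, socket) is used.
int main() {
    std::ofstream osFile("test.txt", std::fstream::out | std::fstream::app);
    std::string line;
    while (std::getline(std::cin, line)) {
        osFile << line << '\n';   // only one writer, so no interleaving
        osFile.flush();
    }
    return 0;
}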