How to do proper file locking on NFS?

Ant picture Ant · Feb 13, 2018 · Viewed 9.4k times · Source

I am trying to implement a "record manager" class in python 3x and linux/macOS. The class is relatively easy and straightforward, the only "hard" thing I want is to be able to access the same file (where results are saved) on multiple processes.

This seemed pretty easy, conceptually: when saving, acquire an exclusive lock on the file. Update your information, save the new information, release exclusive lock on the file. Easy enough.

I am using fcntl.lockf(file, fcntl.LOCK_EX) to acquire the exclusive lock. The problem is that, looking on the internet, I am finding a lot of different websites saying how this is not reliable, that it won't work on windows, that the support on NFS is shaky, and that things could change between macOS and linux.

I have accepted that the code won't work on windows, but I was hoping to be able to make it work on macOS (single machine) and on linux (on multiple servers with NFS).

The problem is that I can't seem to make this work; and after a while of debugging and after the tests passed on macOS, they failed once I tried them on the NFS with linux (ubuntu 16.04). The issue is an inconsistency between the informations saved by multiple processes - some processes have their modifications missing, which means something went wrong in the locking and saving procedure.

I am sure there is something I am doing wrong, and I suspect this may be related to the issues that I read about online. So, what is the proper way to deal multiple access to the same file that works on macOS and linux over NFS?

Edit

This is what the typical method that writes new informations to disk looks like:

sf = open(self._save_file_path, 'rb+')
try:
    fcntl.lockf(sf, fcntl.LOCK_EX)  # acquire an exclusive lock - only one writer
    self._raw_update(sf) #updates the records from file (other processes may have modified it)
    self._saved_records[name] = new_info
    self._raw_save() #does not check for locks (but does *not* release the lock on self._save_file_path)
finally:
    sf.flush()
    os.fsync(sf.fileno()) #forcing the OS to write to disk
    sf.close() #release the lock and close

While this is how a typical method that only read info from disk looks like:

sf = open(self._save_file_path, 'rb')
try:
    fcntl.lockf(sf, fcntl.LOCK_SH)  # acquire shared lock - multiple writers
    self._raw_update(sf) #updates the records from file (other processes may have modified it)
    return self._saved_records
finally:
    sf.close() #release the lock and close

Also, this is how _raw_save looks like:

def _raw_save(self):
    #write to temp file first to avoid accidental corruption of information. 
    #os.replace is guaranteed to be an atomic operation in POSIX
    with open('temp_file', 'wb') as p:
        p.write(self._saved_records)
    os.replace('temp_file', self._save_file_path) #pretty sure this does not release the lock

Error message

I have written a unit test where I create 100 different processes, 50 that read and 50 that write to the same file. Each process does some random waiting to avoid accessing the files sequentially.

The problem is that some of the records are not kept; at the end there are some 3-4 random records missing, so I only end up with 46-47 records rather than 50.

Edit 2

I have modified the code above and I acquire the lock not on the file itself, but on a separate lock file. This prevents the issue that closing the file would release the lock (as suggested by @janneb), and makes the code work correctly on mac. The same code fails on linux with NFS though.

Answer

janneb picture janneb · Feb 19, 2018

I don't see how the combination of file locks and os.replace() can make sense. When the file is replaced (that is, the directory entry is replaced), all the existing file locks (probably including file locks waiting for the locking to succeed, I'm not sure of the semantics here) and file descriptors will be against the old file, not the new one. I suspect this is the reason behind the race conditions causing you to lose some of the records in your tests.

os.replace() is a good technique to ensure that a reader doesn't read a partial update. But it doesn't work robustly in the face of multiple updaters (unless losing some of the updates is ok).

Another issues is that fcntl is a really really stupid API. In particular, the locks are bound to the process, not the file descriptor. Which means that e.g. a close() on ANY file descriptor pointing to the file will release the lock.

One way would be to use a "lock file", e.g. taking advantage of the atomicity of link(). From http://man7.org/linux/man-pages/man2/open.2.html:

Portable programs that want to perform atomic file locking using a lockfile, and need to avoid reliance on NFS support for O_EXCL, can create a unique file on the same filesystem (e.g., incorporating hostname and PID), and use link(2) to make a link to the lockfile. If link(2) returns 0, the lock is successful. Otherwise, use stat(2) on the unique file to check if its link count has increased to 2, in which case the lock is also successful.

If it's Ok to read slightly stale data then you can use this link() dance only for a temp file that you use when updating the file and then os.replace() the "main" file you use for reading (reading can then be lockless). If not, then you need to do the link() trick for the "main" file and forget about shared/exclusive locking, all locks are then exclusive.

Addendum: One tricky thing to deal with when using lock files is what to do when a process dies for whatever reason, and leaves the lock file around. If this is to run unattended, you might want to incorporate some kind of timeout and removal of lock files (e.g. check the stat() timestamps).