I'm working on something that uses ByteBuffers built from memory-mapped files (via FileChannel.map()) as well as in-memory direct ByteBuffers. I am trying to understand the concurrency and memory model constraints.
I have read all of the relevant Javadoc (and source) for things like FileChannel, ByteBuffer, MappedByteBuffer, etc. It seems clear that a particular ByteBuffer (and relevant subclasses) has a bunch of fields and the state is not protected from a memory model point of view. So, you must synchronize when modifying state of a particular ByteBuffer if that buffer is used across threads. Common tricks include using a ThreadLocal to wrap the ByteBuffer, duplicate (while synchronized) to get a new instance pointing to the same mapped bytes, etc.
Given this scenario:
B_all
for the entire file (say it's <2gb)B_1
that a chunk of the file and gives this to thread T1B_2
pointing to the same mapped bytes and gives this to thread T2My question is: Can T1 write to B_1 and T2 write to B_2 concurrently and be guaranteed to see each other's changes? Could T3 use B_all to read those bytes and be guaranteed to see the changes from both T1 and T2?
I am aware that writes in a mapped file are not necessarily seen across processes unless you use force() to instruct the OS to write the pages down to disk. I don't care about that. Assume for this question that this JVM is the only process writing a single mapped file.
Note: I am not looking for guesses (I can make those quite well myself). I would like references to something definitive about what is (or is not) guaranteed for memory-mapped direct buffers. Or if you have actual experiences or negative test cases, that could also serve as sufficient evidence.
Update: I have done some tests with having multiple threads write to the same file in parallel and so far it seems those writes are immediately visible from other threads. I'm not sure if I can rely on that though.
Memory mapping with the JVM is just a thin wrapper around CreateFileMapping (Windows) or mmap (posix). As such, you have direct access to the buffer cache of the OS. This means that these buffers are what the OS considers the file to contain (and the OS will eventually synch the file to reflect this).
So there is no need to call force() to sync between processes. The processes are already synched (via the OS - even read/write accesses the same pages). Forcing just synchs between the OS and the drive controller (there can be some delay between the drive controller and the physical platters, but you don't have hardware support to do anything about that).
Regardless, memory mapped files are an accepted form of shared memory between threads and/or processes. The only difference between this shared memory and, say, a named block of virtual memory in Windows is the eventual synchronization to disk (in fact mmap does the virtual memory without a file thing by mapping /dev/null).
Reading writing memory from multiple processes/threads does still need some synch, as processors are able to do out-of-order execution (not sure how much this interacts with JVMs, but you can't make presumptions), but writing a byte from one thread will have the same guarantees as writing to any byte in the heap normally. Once you have written to it, every thread, and every process, will see the update (even through an open/read operation).
For more info, look up mmap in posix (or CreateFileMapping for Windows, which was built almost the same way.