The Linux kernel uses `lock; addl $0,0(%%esp)` as a write barrier, while the RE2 library uses `xchgl (%0),%0` as a write barrier. What's the difference, and which is better?
Does x86 also require read barrier instructions? RE2 defines its read barrier function as a no-op on x86, while Linux defines it as either `lfence` or a no-op depending on whether SSE2 is available. When is `lfence` required?
Quoting from the IA32 manuals (Vol 3A, Chapter 8.2: Memory Ordering):
In a single-processor system for memory regions defined as write-back cacheable, the memory-ordering model respects the following principles [..]
- Reads are not reordered with other reads
- Writes are not reordered with older reads
- Writes to memory are not reordered with other writes, with the exception of
  - writes executed with the `CLFLUSH` instruction
  - streaming stores (writes) executed with the non-temporal move instructions ([list of instructions here])
  - string operations (see Section 8.2.4.1)
- Reads may be reordered with older writes to different locations but not with older writes to the same location.
- Reads or writes cannot be reordered with I/O instructions, locked instructions, or serializing instructions
- Reads cannot pass `LFENCE` and `MFENCE` instructions
- Writes cannot pass `SFENCE` and `MFENCE` instructions
Note: The phrase "In a single-processor system" above is slightly misleading. The same rules hold for each (logical) processor individually; the manual then goes on to describe the additional ordering rules between multiple processors. The only one of those that pertains to the question is that
- Locked instructions have a total order.
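One consequence of the quoted rules (reads are not reordered with other reads, writes are not reordered with other writes) is that a simple flag-based hand-off on write-back memory needs no barrier instruction at all; only the compiler has to be kept from reordering. The following is a minimal sketch of that idea, not code from the kernel or RE2. The names `COMPILER_BARRIER`, `publish` and `try_consume` are made up for illustration, and the empty `asm` statement assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro: emits no instruction, it only stops the compiler
 * from moving memory accesses across this point (GCC/Clang syntax). */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

static int payload;          /* data being handed over */
static volatile int ready;   /* flag that announces the data */

/* Writer: store the data, then the flag. On x86 write-back memory the
 * two stores become visible in this order without any fence. */
void publish(int value)
{
    payload = value;
    COMPILER_BARRIER();
    ready = 1;
}

/* Reader: load the flag, then the data. x86 does not reorder the two
 * loads, so a compiler barrier is enough here as well.
 * Returns 1 and fills *out if data was published, 0 otherwise. */
int try_consume(int *out)
{
    assert(out != NULL);
    if (!ready)
        return 0;
    COMPILER_BARRIER();
    *out = payload;
    return 1;
}
```

This only holds under the x86 rules for write-back memory; on an architecture with a weaker model, both barriers would have to become real fence instructions.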
In short, as long as you're writing to write-back memory (which is all memory you'll ever see as long as you're not a driver or graphics programmer), the x86 memory model is almost sequentially consistent: the only reordering an x86 CPU can perform is letting later (independent) reads execute before earlier writes. The main thing about the write barriers is that they have a `lock` prefix (explicit in the `addl` case, implicit for `xchg` with a memory operand, which is always locked), which forbids all reordering across the instruction and ensures that the operation is seen in the same order by all processors in a multi-processor system.
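To make the two barrier styles concrete, here is a rough sketch of both as GCC/Clang inline assembly, followed by the one pattern that actually needs them: a store followed by a load from a different location. The function names are hypothetical, the `xchgl` variant is written against a local dummy variable rather than copied from RE2, and the non-x86 branch falls back to the `__sync_synchronize()` builtin so the sketch still compiles elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Explicit lock prefix: a locked no-op add on the top of the stack. */
static inline void full_barrier_lock_add(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc");
#elif defined(__i386__)
    __asm__ __volatile__("lock; addl $0,0(%%esp)" ::: "memory", "cc");
#else
    __sync_synchronize();   /* assumption: generic full barrier builtin */
#endif
}

/* Implicit lock: xchg with a memory operand is always locked. */
static inline void full_barrier_xchg(void)
{
#if defined(__x86_64__) || defined(__i386__)
    int slot = 0;   /* dummy memory operand */
    int reg = 0;    /* dummy register operand */
    __asm__ __volatile__("xchgl %0, %1" : "+r"(reg), "+m"(slot) : : "memory");
#else
    __sync_synchronize();
#endif
}

/* Store to one location, then load from another. Without the barrier,
 * x86 may let the load execute before the store becomes visible. */
int store_then_load(volatile int *mine, volatile int *other)
{
    assert(mine != NULL && other != NULL);
    *mine = 1;
    full_barrier_lock_add();
    return *other;
}
```

Both helpers have the same ordering effect; they differ only in which locked instruction they issue and which registers or memory they touch.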
Also, in write-back memory, reads are never reordered with other reads, so there's no need for read barriers. Recent x86 processors have a weaker memory consistency model for streaming stores and write-combined memory (commonly used for mapped graphics memory). That's where the various `fence` instructions come into play; they're not necessary for any other memory type, but some drivers in the Linux kernel do deal with write-combined memory, so the kernel simply defines its read barrier that way. The ordering model for each memory type is listed in Section 11.3.1 of Vol. 3A of the IA-32 manuals. Short version: Write-Through, Write-Back and Write-Protected memory allow speculative reads (following the rules detailed above); Uncacheable and Strong Uncacheable memory have strong ordering guarantees (no processor reordering, reads/writes are executed immediately, used for MMIO); and Write Combined memory has weak ordering (i.e. relaxed ordering rules that need fences).
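As an illustration of the one case where ordinary user code can run into the weaker model, the sketch below fills a buffer with non-temporal (streaming) stores and then issues `sfence` before setting a flag. It assumes the SSE2 intrinsics `_mm_stream_si32` and `_mm_sfence` from `<emmintrin.h>`; the function name `fill_streaming` and the `ready` flag are invented for the example, and the non-SSE2 branch is only a portable fallback.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_stream_si32, _mm_sfence */
#endif

/* Fill dst[0..n) with value using streaming stores, then set *ready.
 * Streaming stores are exempt from the "writes are not reordered with
 * other writes" rule, so a store fence is needed before the flag. */
void fill_streaming(int *dst, size_t n, int value, volatile int *ready)
{
    assert(dst != NULL && ready != NULL);
    for (size_t i = 0; i < n; i++) {
#if defined(__SSE2__)
        _mm_stream_si32(&dst[i], value);   /* non-temporal store */
#else
        dst[i] = value;                    /* plain store fallback */
#endif
    }
#if defined(__SSE2__)
    _mm_sfence();            /* order the streaming stores before the flag */
#else
    __sync_synchronize();    /* assumption: generic full barrier builtin */
#endif
    *ready = 1;
}
```

With plain stores to write-back memory the fence would be unnecessary; it is only the non-temporal stores that make it required here.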