For any std::atomic<T> where T is a primitive type: if I blindly use std::memory_order_acq_rel for fetch_xxx operations, std::memory_order_acquire for the load operation and std::memory_order_release for the store operation (I mean just resetting the default memory ordering of those functions):

Will the results be the same as if I had used std::memory_order_seq_cst (which is the default) for all of the declared operations? And if the results are the same, how does this choice compare with std::memory_order_seq_cst in terms of efficiency?

The C++11 memory ordering parameters for atomic operations specify constraints on the ordering. If you do a store with std::memory_order_release, and a load from another thread reads the value with std::memory_order_acquire, then subsequent read operations from the second thread will see any values stored to any memory location by the first thread that were prior to the store-release, or a later store to any of those memory locations.
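To make the acquire/release pairing concrete, here is a minimal sketch (my own illustration, not code from the question or answer): a producer writes plain data and then publishes a flag with a store-release; once the consumer's load-acquire sees the flag, it is guaranteed to also see the data written before the store.

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> ready{false};
int payload = 0;                                    // plain, non-atomic data

void producer() {
    payload = 42;                                   // (1) write the data
    ready.store(true, std::memory_order_release);   // (2) publish the flag
}

void consumer() {
    while (!ready.load(std::memory_order_acquire))  // (3) spin until (2) is visible
        ;
    assert(payload == 42);                          // (4) guaranteed to see (1)
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}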
If both the store and the subsequent load are std::memory_order_seq_cst then the relationship between these two threads is the same. You need more threads to see the difference.
e.g. std::atomic<int> variables x and y, both initially 0.

Thread 1:

x.store(1, std::memory_order_release);

Thread 2:

y.store(1, std::memory_order_release);

Thread 3:

int a = x.load(std::memory_order_acquire); // x before y
int b = y.load(std::memory_order_acquire);

Thread 4:

int c = y.load(std::memory_order_acquire); // y before x
int d = x.load(std::memory_order_acquire);
As written, there is no relationship between the stores to x and y, so it is quite possible to see a==1, b==0 in thread 3, and c==1 and d==0 in thread 4.
If all the memory orderings are changed to std::memory_order_seq_cst then this enforces an ordering between the stores to x and y. Consequently, if thread 3 sees a==1 and b==0 then that means the store to x must be before the store to y, so if thread 4 sees c==1, meaning the store to y has completed, then the store to x must also have completed, so we must have d==1.
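The four-thread example above can be packaged into a runnable sketch like the one below (the thread harness and variable placement are my additions, not part of the original example). With the acquire/release orderings shown, the outcome a==1, b==0, c==1, d==0 is permitted by the standard, although actually observing it depends on the hardware memory model; switching every operation to std::memory_order_seq_cst rules it out.

#include <atomic>
#include <thread>

std::atomic<int> x{0}, y{0};
int a, b, c, d;

void write_x() { x.store(1, std::memory_order_release); }
void write_y() { y.store(1, std::memory_order_release); }

void read_x_then_y() {
    a = x.load(std::memory_order_acquire); // x before y
    b = y.load(std::memory_order_acquire);
}

void read_y_then_x() {
    c = y.load(std::memory_order_acquire); // y before x
    d = x.load(std::memory_order_acquire);
}

int main() {
    std::thread t1(write_x), t2(write_y), t3(read_x_then_y), t4(read_y_then_x);
    t1.join(); t2.join(); t3.join(); t4.join();
    // With acquire/release only, a==1 && b==0 && c==1 && d==0 is a permitted result.
    // With std::memory_order_seq_cst everywhere, it is not.
}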
In practice, using std::memory_order_seq_cst everywhere will add additional overhead to either loads or stores or both, depending on your compiler and processor architecture. e.g. a common technique for x86 processors is to use XCHG instructions rather than MOV instructions for std::memory_order_seq_cst stores, in order to provide the necessary ordering guarantees, whereas for std::memory_order_release a plain MOV will suffice. On systems with more relaxed memory architectures the overhead may be greater, since plain loads and stores have fewer guarantees.
Memory ordering is hard. I devoted almost an entire chapter to it in my book.