C++11 memory_order_acquire and memory_order_release semantics?

Cedomir Segulja picture Cedomir Segulja · Apr 24, 2013 · Viewed 9.9k times · Source

http://en.cppreference.com/w/cpp/atomic/memory_order, and other C++11 online references, define memory_order_acquire and memory_order_release as:

  • Acquire operation: no reads in the current thread can be reordered before this load.
  • Release operation: no writes in the current thread can be reordered after this store.

This seems to allow post-acquire-writes to be executed before the acquire operation, which seems weird too me (usual acquire/release operation semantics restrict movement of all memory operations).

Same online source (http://en.cppreference.com/w/cpp/atomic/atomic_flag) suggests that a spinlock mutex can be built using C++ atomics and the above mentioned relaxed memory ordering rules:

lock mutex: while (lock.test_and_set(std::memory_order_acquire))

unlock mutex: lock.clear(std::memory_order_release);               

With this definition of lock/unlock, wouldn't the simple code below be broken if memory_order_acquire/release are indeed defined this way (i.e., not forbidding reordering of post-acquire-writes):

Thread1:
  (0) lock
    (1) x = 1;
    (2) if (x != 1) PANIC
  (3) unlock

Thread2:
  (4) lock
    (5) x = 0;
  (6) unlock

Is the following execution possible: (0) lock, (1) x = 1, (5) x = 0, (2) PANIC ? What did I miss?

Answer

Wandering Logic picture Wandering Logic · Apr 24, 2013

The spinlock mutex implementation looks okay to me. I think they got the definitions of acquire and release completely wrong.

Here is the clearest explanation of acquire/release consistency models that I am aware of: Gharachorloo; Lenoski; Laudon; Gibbons; Gupta; Hennessy: Memory consistency and event ordering in scalable shared-memory multiprocessors, Int'l Symp Comp Arch, ISCA(17):15-26, 1990, doi 10.1145/325096.325102. (The doi is behind the ACM paywall. The actual link is to a copy not behind a paywall.)

Look at Condition 3.1 in Section 3.3 and the accompanying Figure 3:

  • before an ordinary load or store access is allowed to perform with respect to any other processor, all previous acquire accesses must be performed, and
  • before a release access is allowed to perform with respect to any other processor, all previous ordinary load and store accesses must be performed, and
  • special accesses are [sequentially] consistent with respect to one another.

The point is this: acquires and releases are sequentially consistent1 (all threads globally agree on the order in which acquires and releases happened.) All threads globally agree that the stuff that happens between an acquire and a release on a specific thread happened between the acquire and release. But normal loads and stores after a release are allowed to be moved (either by hardware or the compiler) above the release, and normal loads and stores before an acquire are allowed to be moved (either by hardware or the compiler) to after the acquire.

(Footnote 1: This is true for most implementations, but an overstatement for ISO C++ in general. Reader threads are allowed to disagree about the order of 2 stores done by 2 other threads. See Acquire/release semantics with 4 threads, and this answer for details of how C++ compiled for POWER CPUs demonstrates the difference in practice with release and acquire, but not seq_cst. But most CPUs do only get data between cores via coherent cache that means a global order does exist.)


In the C++ standard (I used the link to the Jan 2012 draft) the relevant section is 1.10 (pages 11 through 14).

The definition of happens-before is intended to be modeled after Lamport; Time, Clocks, and the Ordering of Events in a Distributed System, CACM, 21(7):558-565, Jul 1978. C++ acquires correspond to Lamport's receives, C++ releases correspond to Lamport's sends. Lamport placed a total order on the sequence of events within a single thread, where C++ has to allow a partial order (see Section 1.9, Paragraphs 13-15, page 10 for the C++ definition of sequenced-before.) Still, the sequenced-before ordering is pretty much what you would expect. Statements are sequenced in the order they are given in the program. Section 1.9, paragraph 14: "Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated."

The whole point of Section 1.10 is to say that a program that is data-race-free produces the same well defined value as if the program were run on a machine with a sequentially consistent memory and no compiler reordering. If there is a data race then the program has no defined semantics at all. If there is no data race then the compiler (or machine) is permitted to reorder operations that don't contribute to the illusion of sequential consistency.

Section 1.10, Paragraph 21 (page 14) says: A program is not data-race-free if there is a pair of accesses A and B from different threads to object X, at least one of those accesses has a side effect, and neither A happens-before B, nor B happens-before A. Otherwise the program is data-race-free.

Paragraphs 6-20 give a very careful definition of the happens-before relation. The key definition is Paragraph 12:

"An evaluation A happens before an evaluation B if:

  • A is sequenced before B, or
  • A inter-thread happens before B."

So if an acquire is sequenced before (in the same thread) pretty much any other statement, then the acquire must appear to happen before that statement. (Including if that statement performs a write.)

Likewise: if pretty much any statement is sequenced before (in the same thread) a release, then that statement must appear to happen before the release. (Including if that statement just does a value computation (read).)

The reason that the compiler is allowed to move other computations from after a release to before a release (or from before an acquire to after an acquire) is because of the fact that those operations specifically do not have an inter-thread happens before relationship (because they are outside the critical section). If they race the semantics are undefined, and if they don't race (because they aren't shared) then you can't tell exactly when they happened with regard to the synchronization.

Which is a very long way of saying: cppreference.com's definitions of acquire and release are dead wrong. Your example program has no data race condition, and PANIC can not occur.