bool compare_exchange_weak (T& expected, T val, ..);
compare_exchange_weak()
is one of compare-exchange primitives provided in C++11. It's weak in the sense that it returns false even if the value of the object is equal to expected
. This is due to spurious failure on some platforms where a sequence of instructions (instead of one as on x86) are used to implement it. On such platforms, context switch, reloading of the same address (or cache line) by another thread, etc can fail the primitive. It's spurious
as it's not the value of the object (not equal to expected
) that fails the operation. Instead, it's kind of timing issues.
But what puzzles me is what's said in C++11 Standard (ISO/IEC 14882),
29.6.5 .. A consequence of spurious failure is that nearly all uses of weak compare-and-exchange will be in a loop.
Why does it have to be in a loop in nearly all uses ? Does that mean we shall loop when it fails because of spurious failures? If that's the case, why do we bother use compare_exchange_weak()
and write the loop ourselves? We can just use compare_exchange_strong()
which I think should get rid of spurious failures for us. What are the common use cases of compare_exchange_weak()
?
Another question related. In his book "C++ Concurrency In Action" Anthony says,
//Because compare_exchange_weak() can fail spuriously, it must typically
//be used in a loop:
bool expected=false;
extern atomic<bool> b; // set somewhere else
while(!b.compare_exchange_weak(expected,true) && !expected);
//In this case, you keep looping as long as expected is still false,
//indicating that the compare_exchange_weak() call failed spuriously.
Why is !expected
there in the loop condition? Does it there to prevent that all threads may starve and make no progress for some time?
Edit: (one last question)
On platforms that no single hardware CAS instruction exist, both the weak and strong version are implemented using LL/SC (like ARM, PowerPC, etc). So is there any difference between the following two loops? Why, if any? (To me, they should have similar performance.)
// use LL/SC (or CAS on x86) and ignore/loop on spurious failures
while (!compare_exchange_weak(..))
{ .. }
// use LL/SC (or CAS on x86) and ignore/loop on spurious failures
while (!compare_exchange_strong(..))
{ .. }
I come up w/ this last question you guys all mention that there maybe a performance difference inside a loop. It's also mentioned by the C++11 Standard (ISO/IEC 14882):
When a compare-and-exchange is in a loop, the weak version will yield better performance on some platforms.
But as analyzed above, two versions in a loop should give the same/similar performance. What's the thing I miss?
Usually, you want your work to be done before you move on, thus, you put compare_exchange_weak
into a loop so that it tries to exchange until it succeeds (i.e., returns true
).
Note that also compare_exchange_strong
is often used in a loop. It does not fail due to spurious failure, but it does fail due to concurrent writes.
weak
instead of strong
?Quite easy: Spurious failure does not happen often, so it is no big performance hit. In constrast, tolerating such a failure allows for a much more efficient implementation of the weak
version (in comparison to strong
) on some platforms: strong
must always check for spurious failure and mask it. This is expensive.
Thus, weak
is used because it is a lot faster than strong
on some platforms
weak
and when strong
?The reference states hints when to use weak
and when to use strong
:
When a compare-and-exchange is in a loop, the weak version will yield better performance on some platforms. When a weak compare-and-exchange would require a loop and a strong one would not, the strong one is preferable.
So the answer seems to be quite simple to remember: If you would have to introduce a loop only because of spurious failure, don't do it; use strong
. If you have a loop anyway, then use weak
.
!expected
in the exampleIt depends on the situation and its desired semantics, but usually it is not needed for correctness. Omitting it would yield a very similar semantics. Only in a case where another thread might reset the value to false
, the semantics could become slightly different (yet I cannot find a meaningful example where you would want that). See Tony D.'s comment for a detailed explanation.
It is simply a fast track when another thread writes true
: Then the we abort instead of trying to write true
again.
But as analyzed above, two versions in a loop should give the same/similar performance. What's the thing I miss?
From Wikipedia:
Real implementations of LL/SC do not always succeed if there are no concurrent updates to the memory location in question. Any exceptional events between the two operations, such as a context switch, another load-link, or even (on many platforms) another load or store operation, will cause the store-conditional to spuriously fail. Older implementations will fail if there are any updates broadcast over the memory bus.
So, LL/SC will fail spuriously on context switch, for example. Now, the strong version would bring its "own small loop" to detect that spurious failure and mask it by trying again. Note that this own loop is also more complicated than a usual CAS loop, since it must distinguish between spurious failure (and mask it) and failure due to concurrent access (which results in a return with value false
). The weak version does not have such own loop.
Since you provide an explicit loop in both examples, it is simply not necessary to have the small loop for the strong version. Consequently, in the example with the strong
version, the check for failure is done twice; once by compare_exchange_strong
(which is more complicated since it must distinguish spurious failure and concurrent acces) and once by your loop. This expensive check is unnecessary and the reason why weak
will be faster here.
Also note that your argument (LL/SC) is just one possibility to implement this. There are more platforms that have even different instruction sets. In addition (and more importantly), note that std::atomic
must support all operations for all possible data types, so even if you declare a ten million byte struct, you can use compare_exchange
on this. Even when on a CPU that does have CAS, you cannot CAS ten million bytes, so the compiler will generate other instructions (probably lock acquire, followed by a non-atomic compare and swap, followed by a lock release). Now, think of how many things can happen while swapping ten million bytes. So while a spurious error may be very rare for 8 byte exchanges, it might be more common in this case.
So in a nutshell, C++ gives you two semantics, a "best effort" one (weak
) and a "I will do it for sure, no matter how many bad things might happen inbetween" one (strong
). How these are implemented on various data types and platforms is a totally different topic. Don't tie your mental model to the implementation on your specific platform; the standard library is designed to work with more architectures than you might be aware of. The only general conclusion we can draw is that guaranteeing success is usually more difficult (and thus may require additional work) than just trying and leaving room for possible failure.