Relative performance of swap vs compare-and-swap locks on x86

R.. GitHub STOP HELPING ICE · Mar 17, 2011 · Viewed 6.9k times

Two common locking idioms are:

if (!atomic_swap(lockaddr, 1)) /* got the lock */

and:

if (!atomic_compare_and_swap(lockaddr, 0, val)) /* got the lock */

where val could simply be a constant or an identifier for the new prospective owner of the lock.
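For concreteness, the two idioms might be sketched with C11 atomics roughly as follows (the function names, the `lockvar` variable, and the owner-id parameter are illustrative, not from the question; compilers typically lower these to xchg and lock cmpxchg on x86):

```c
#include <stdatomic.h>

static atomic_int lockvar = 0;   /* 0 = unlocked */

/* Swap idiom: atomically store 1 and test the previous value. */
int try_lock_swap(void)
{
    /* atomic_exchange returns the old value; 0 means we acquired the lock. */
    return atomic_exchange(&lockvar, 1) == 0;
}

/* Compare-and-swap idiom: succeed only if the lock was 0, storing
 * val (e.g. an owner id) as the new value. */
int try_lock_cas(int val)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&lockvar, &expected, val);
}
```

One practical difference visible here: the CAS version leaves an arbitrary value (such as an owner id) in the lock word on success, while the swap version can only store a fixed value.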

What I'd like to know is whether there tends to be any significant performance difference between the two on x86 (and x86_64) machines. I know this is a fairly broad question since the answer might vary a lot between individual cpu models, but that's part of the reason I'm asking SO rather than just doing benchmarks on a few cpus I have access to.

Answer

Gunther Piez · Mar 17, 2011

I assume atomic_swap(lockaddr, 1) gets translated to an xchg reg,mem instruction and atomic_compare_and_swap(lockaddr, 0, val) gets translated to a cmpxchg[8b|16b].

Some Linux kernel developers think cmpxchg is faster because, unlike xchg, it doesn't imply the lock prefix. So if you are on a uniprocessor machine, or can otherwise make sure the lock prefix isn't needed, you are probably better off with cmpxchg.
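If you did want that unlocked variant, a raw cmpxchg without the lock prefix can be emitted with GCC-style inline assembly. This is only a sketch, and it is not safe on SMP: without lock, the read-modify-write is not atomic with respect to other cores, so it belongs only in contexts like a uniprocessor kernel that disables interrupts around the operation:

```c
/* Sketch: x86 cmpxchg with NO lock prefix (GCC extended inline asm).
 * Correct only where cross-CPU atomicity is not required. */
static inline int cmpxchg_nolock(int *ptr, int oldval, int newval)
{
    int prev;
    __asm__ __volatile__(
        "cmpxchg %2, %1"          /* if (*ptr == eax) *ptr = newval  */
        : "=a"(prev), "+m"(*ptr)  /* eax receives the previous value */
        : "r"(newval), "0"(oldval)
        : "memory", "cc");
    return prev;                  /* equals oldval iff the swap happened */
}
```

The return value follows the hardware semantics: cmpxchg leaves the old memory contents in eax, so the caller checks `prev == oldval` to know whether the exchange took place.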

But chances are your compiler will translate it to a "lock cmpxchg", and in that case it doesn't really matter. Also note that while latencies for these instructions are low (1 cycle without lock and about 20 with lock), if you share a common sync variable between two threads, which is quite usual, some additional bus cycles will be incurred, and these last forever compared to the instruction latencies. The instruction latencies will most likely be completely hidden by a 200- or 500-cycle cache snoop/sync/memory access/bus lock/whatever.