Is synchronizing with `std::mutex` slower than with `std::atomic(memory_order_seq_cst)`?

jaybny picture jaybny · Apr 30, 2013 · Viewed 16.1k times · Source

The main reason for using atomics over mutexes, is that mutexes are expensive but with the default memory model for atomics being memory_order_seq_cst, isn't this just as expensive?

Question: Can concurrent a program using locks be as fast as concurrent lock-free program?

If so, it may not be worth the effort unless I want to use memory_order_acq_rel for atomics.


Edit: I may be missing something but lock-based cant be faster than lock-free because each lock will have to be a full memory barrier too. But with lock-free, it's possible to use techniques that are less restrictive then memory barriers.

So back to my question, is lock-free any faster than lock based in new C++11 standard with default memory_model?

Is "lock-free >= lock-based when measured in performance" true? Let's assume 2 hardware threads.


Edit 2: My question is not about progress guarantees, and maybe I'm using "lock-free" out of context.

Basically when you have 2 threads with shared memory, and the only guarantee you need is that if one thread is writing then the other thread can't read or write, my assumption is that a simple atomic compare_and_swap operation would be much faster than locking a mutex.

Because if one thread never even touches the shared memory, you will end up locking and unlocking over and over for no reason but with atomic operations you only use 1 CPU cycle each time.

In regards to the comments, a spin-lock vs a mutex-lock is very different when there is very little contention.

Answer

Kerrek SB picture Kerrek SB · Apr 30, 2013

Lockfree programming is about progress guarantees: From strongest to weakest, those are wait-free, lock-free, obstruction-free, and blocking.

A guarantee is expensive and comes at a price. The more guarantees you want, the more you pay. Generally, a blocking algorithm or datastructure (with a mutex, say) has the greatest liberties, and thus is potentially the fastest. A wait-free algorithm on the other extreme must use atomic operations at every step, which may be much slower.

Obtaining a lock is actually rather cheap, so you should never worry about that without a deep understanding of the subject. Moreover, blocking algorithms with mutexes are much easier to read, write and reason about. By contrast, even the simplest lock-free data structures are the result of long, focused research, each of them worth one or more PhDs.

In a nutshell, lock- or wait-free algorithms trade worst latency for mean latency and throughput. Everything is slower, but nothing is ever very slow. This is a very special characteristic that is only useful in very specific situations (like real-time systems).