Raymond Chen has been doing a huge series on lockfree algorithms. Beyond the simple cases of the InterlockedXxx
functions, it seems like the prevailing pattern with all of these is that they implement their own locks. Sure, there are not processor locks, but the concept of looping over and over on each CPU to ensure consistency is very much like a spinlock. And being a spinlock, they are going to be less efficient than the general locks that come with the operating system because they do not yield control of their quanta while waiting for other threads. Therefore, whenever someone comes to me and says "but my algorithm is lock-free", my general response is "so"?
I'm curious -- are there benchmarks available which show lock free algorithms to have an edge over their lock-full counterparts?
In general, lock free algorithms are less efficient per thread - you're doing more work, as you mentioned, in order to implement a lock free algorithm than a simple lock.
However, they do tend to dramatically improve the overall throughput of the algorithm as a whole in the face of contention. Thread switching latency and context switches, which fast, over many threads slow down the throughput of your application dramatically. Lock free algorithms effectively are implementing their own "locks," but they do so in a manner that prevents or reduces the number of context switches, which is why they tend to out perform their locking counterparts.
That being said - most of this depends on the algorithm (and implementation) in question. For example, I've got some routines that I managed to switch to .NET 4's new concurrent collections instead of using the previous locking mechanisms, and have measured improvements over near 30% in my total algorithm speed. That being said, there are many benchmarks you can find that show reduced performance using some of these same collections when compared to a basic lock. As with all performance optimizations - you really don't know until you measure.