Just curious to know: which CPU architectures support compare-and-swap atomic primitives?
PowerPC has more powerful primitives available: "lwarx" (load word and reserve indexed) and "stwcx." (store word conditional indexed).
lwarx loads a value from memory and places a reservation on that location. If any other thread or CPU writes to that location before the matching "stwcx.", the reservation is lost and the "stwcx.", a conditional store instruction, fails, so the code loops back and retries from the lwarx.
So the lwarx / stwcx. combo allows you to implement atomic increment / decrement, compare and swap, and more powerful atomic operations like "atomic increment circular buffer index".
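To make the "atomic increment circular buffer index" case concrete, here is a minimal portable sketch using C11 atomics rather than hand-written PowerPC assembly. The function name `ring_index_advance` and its parameters are made up for illustration, and it assumes the index always starts out below `size`. On PowerPC, compilers typically lower the compare-exchange loop to an lwarx / stwcx. retry loop, but that is up to the compiler and target.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper (not from the original post): atomically advance a
 * circular buffer index, wrapping back to 0 at `size`, and return the old
 * index, i.e. the slot the caller has just claimed. */
static uint32_t ring_index_advance(_Atomic uint32_t *index, uint32_t size)
{
    assert(size > 0);

    uint32_t old = atomic_load_explicit(index, memory_order_relaxed);
    uint32_t next;
    do {
        /* Compute the wrapped successor of the value we last observed. */
        next = (old + 1 == size) ? 0 : old + 1;
        /* If another thread changed *index in the meantime, the exchange
         * fails, `old` is refreshed with the current value, and we retry. */
    } while (!atomic_compare_exchange_weak_explicit(
                 index, &old, next,
                 memory_order_acq_rel, memory_order_relaxed));

    return old;
}
```

The weak form of compare-exchange is used because it is allowed to fail spuriously, which matches how a conditional store can fail on a load-reserve / store-conditional machine; the surrounding loop absorbs those failures.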