Assembly: does xadd instruction need lock?

becks picture becks · May 8, 2015 · Viewed 7.4k times · Source

I'm reading smphello.s code by allan cruse code

in the following part he's trying to setup stack segment for each processor.

the point is that he used xadd without using lock prefix while in the description of xadd as in here . there may be a lock prefix.

is this a bug or is it okay ? and why ?

# setup an exclusive stack-area for this processor
mov  $0x1000, %ax   # paragraphs in segment
xadd %ax, newSS     # 'atomic' xchg-and-add
mov  %ax, %ss       # segment-address in SS
xor  %esp, %esp     # top-of-stack into ESP

Answer

Peter Cordes picture Peter Cordes · Nov 28, 2020

xadd without lock is atomic wrt. interrupts on this core, but not wrt. code running on other cores (or DMA). Just like all other memory-destination RMWs other than xchg. See this answer which covers the same question for cmpxchg.

If this code runs simultaneously on multiple cores, 2 or more cores could read the same value for newSS, effectively losing an increment and assigning the same ss:esp to both cores. Or one store could be delayed across multiple xadd's by other cores that happen to be sequential, "rewinding" the counter to some previous value seen by later loads. Or any combination of problems. Can num++ be atomic for 'int num'?

Assuming newSS is aligned, the load and store are separately atomic, but do not form an atomic RMW.

If multiple cores are woken simultaneously (are broadcast IPIs possible?) it seems very possible for this to be a real problem. If not, it's likely that each xadd would finish before the next core gets to this code. (Including the store committing to L1d cache; becoming globally visible.) But that's just a "happens to work" behaviour unless the core-wakeup function waits to hear back from one core before waking another.

It definitely needs lock xadd if it wants to match the comment about an atomic increment. Being atomic wrt. interrupts is fine if threads never run concurrently, only via context switch on a single core. (e.g. atomicity between the main thread and a signal handler, or interrupt handler on the same core). But since this is titled smphello.s, uniprocessor assumptions seem unlikely.