The difference between asm, asm volatile and clobbering memory

jleahy · Jan 22, 2013

When implementing lock-free data structures and timing code, it's often necessary to suppress some of the compiler's optimisations. Normally people do this using asm volatile with "memory" in the clobber list, but you sometimes see just asm volatile, or a plain asm that clobbers memory.

What impact do these different statements have on code generation (particularly in GCC, as it's unlikely to be portable)?

Just for reference, these are the interesting variations:

asm ("");   // presumably this has no effect on code generation
asm volatile ("");
asm ("" ::: "memory");
asm volatile ("" ::: "memory");

Answer

Matthew Slattery · Jan 22, 2013

See the "Extended Asm" page in the GCC documentation.

You can prevent an asm instruction from being deleted by writing the keyword volatile after the asm. [...] The volatile keyword indicates that the instruction has important side-effects. GCC will not delete a volatile asm if it is reachable.

and

An asm instruction without any output operands will be treated identically to a volatile asm instruction.

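None of your examples specifies any output operands, so the asm and asm volatile forms behave identically: each creates a point in the code that may not be deleted (unless it is proved to be unreachable). The first one, asm (""), is a basic asm statement with no operands at all, and current GCC documentation states that basic asm statements are implicitly volatile as well.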

Even an empty asm of this kind is not quite the same as doing nothing. See this question for an example of a dummy asm that changes code generation: there, a loop that runs 1000 times gets vectorised into code that computes 16 iterations at once, but an asm inside the loop inhibits that optimisation, because the asm must be reached 1000 times.

The "memory" clobber makes GCC assume that the asm block may arbitrarily read or write any memory, so it prevents the compiler from reordering loads or stores across it:

This will cause GCC to not keep memory values cached in registers across the assembler instruction and not optimize stores or loads to that memory.

(The "memory" clobber does not prevent the CPU from reordering loads and stores as observed by other CPUs, though; you need real memory barrier instructions for that.)