What does "nop dword ptr [rax+rax]" x64 assembly instruction do?

c00000fd · May 16, 2017

I'm trying to understand the x64 assembly optimization that is done by the compiler.

I compiled a small C++ project as Release build with Visual Studio 2008 SP1 IDE on Windows 8.1.

And one of the lines contained the following assembly code:

B8 31 00 00 00   mov         eax,31h
0F 1F 44 00 00   nop         dword ptr [rax+rax]

As far as I know, nop by itself does nothing, but I've never seen it with an operand like that.

Can someone explain what it does?

Answer

Glenn Slayden · May 30, 2018

In a comment elsewhere on this page, Michael Petch points to a web page that describes the Intel x86 multi-byte NOP opcodes. The page has a table of useful information, but unfortunately its HTML is broken, so the table is unreadable. Here is some information from that page, plus the table presented in a readable form:

Multi-Byte NOP
http://www.felixcloutier.com/x86/NOP.html
The one-byte NOP instruction is an alias mnemonic for the XCHG (E)AX, (E)AX instruction.

The multi-byte NOP instruction performs no operation on supported processors and generates undefined opcode exception on processors that do not support the multi-byte NOP instruction.

The memory operand form of the instruction allows software to create a byte sequence of “no operation” as one instruction.

For situations where multiple-byte NOPs are needed, the recommended operations (32-bit mode and 64-bit mode) are:     [my edit: in 64-bit mode, write rax instead of eax.]

Length    Assembly                                     Byte Sequence
-------   ------------------------------------------   --------------------------
1 byte    nop                                          90
2 bytes   66 nop                                       66 90
3 bytes   nop dword ptr [eax]                          0F 1F 00
4 bytes   nop dword ptr [eax + 00h]                    0F 1F 40 00
5 bytes   nop dword ptr [eax + eax*1 + 00h]            0F 1F 44 00 00
6 bytes   66 nop word ptr [eax + eax*1 + 00h]          66 0F 1F 44 00 00
7 bytes   nop dword ptr [eax + 00000000h]              0F 1F 80 00 00 00 00
8 bytes   nop dword ptr [eax + eax*1 + 00000000h]      0F 1F 84 00 00 00 00 00
9 bytes   66 nop word ptr [eax + eax*1 + 00000000h]    66 0F 1F 84 00 00 00 00 00


Note that the technique for selecting the right byte sequence--and thus the desired total size--may differ according to which assembler you are using.

For example, the following two lines of assembly taken from the table are ostensibly similar:

nop dword ptr [eax + 00h]
nop dword ptr [eax + 00000000h]

These differ only in the number of leading zeros, and some assemblers may make it hard to disable their "helpful" feature of always encoding the shortest possible byte sequence, which could make the second, longer encoding impossible to produce.

For the multi-byte NOP situation, you don't want this "help" because you need to make sure that you actually get the desired number of bytes. So the issue is how to specify an exact combination of mod and r/m bits that ends up with the desired disp size--but via instruction mnemonics alone. This topic is complex, and certainly beyond the scope of my knowledge, but Scaled Indexing, MOD+R/M and SIB might be a starting place.
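
As a rough illustration of how the mod and r/m bits determine the size, here is a small Python sketch that computes the length of a NOP from its bytes. It assumes 32-bit or 64-bit addressing and only understands the 90 and 0F 1F forms with an optional 66 prefix; the function name nop_length is hypothetical, and this is nowhere near a full decoder.

```python
def nop_length(code: bytes) -> int:
    """Return the encoded length of the NOP that starts at code[0].

    Sketch only: handles 90 and 0F 1F with a ModR/M byte, each with an
    optional 66 prefix, and does not verify the reg field of ModR/M.
    """
    i = 0
    if code[i] == 0x66:            # optional operand-size prefix
        i += 1
    if code[i] == 0x90:            # one-byte NOP
        return i + 1
    if code[i:i + 2] != b"\x0f\x1f":
        raise ValueError("not a NOP encoding this sketch understands")
    i += 2
    modrm = code[i]
    i += 1
    mod, rm = modrm >> 6, modrm & 0b111
    if mod == 0b11:                # register operand: no SIB, no displacement
        return i
    base = None
    if rm == 0b100:                # r/m = 100 means a SIB byte follows
        base = code[i] & 0b111
        i += 1
    if mod == 0b01:
        i += 1                     # 8-bit displacement
    elif mod == 0b10 or rm == 0b101 or base == 0b101:
        i += 4                     # 32-bit displacement
    return i
```

For the instruction in the question, 0F 1F 44 00 00, the ModR/M byte 44h splits into mod = 01 (an 8-bit displacement follows) and r/m = 100 (a SIB byte follows). The SIB byte 00h selects base eax/rax and index eax/rax with scale 1, which is why the disassembler shows [rax+rax]. Two opcode bytes, ModR/M, SIB and one displacement byte add up to five.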

Now, as I know you were just thinking, if you find it difficult or impossible to coerce your assembler's cooperation via instruction mnemonics, you can always just resort to db ("define bytes") as a simple no-fuss alternative which is, um, guaranteed to work.
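
If you go the db route, a tiny script can produce the line for you. This Python sketch formats a byte sequence as a db directive using the NNh hex suffix; I'm assuming your assembler accepts that notation (MASM and NASM do, as far as I know), and the function name db_line is made up for this example.

```python
def db_line(seq: bytes) -> str:
    """Format raw bytes as a 'db' directive using NNh hex literals.

    Assumption: the target assembler accepts the h-suffix form. A literal
    that would start with a letter gets a leading 0 so that it is not
    parsed as an identifier.
    """
    parts = []
    for b in seq:
        text = f"{b:02X}h"
        if text[0] in "ABCDEF":
            text = "0" + text
        parts.append(text)
    return "db " + ", ".join(parts)


# The five-byte NOP from the question.
print(db_line(bytes.fromhex("0F1F440000")))
```

Pasting the printed line into your source emits those five bytes verbatim, so the assembler gets no opportunity to shorten the encoding.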