I have been a high-level coder, and architectures are pretty new to me, so I decided to read the tutorial on Assembly here:
http://en.wikibooks.org/wiki/X86_Assembly/Print_Version
Far down the tutorial, instructions on how to convert the Hello World! program
#include <stdio.h>

int main(void) {
    printf("Hello, world!\n");
    return 0;
}
into equivalent assembly code were given, and the following was generated:
        .text
LC0:
        .ascii "Hello, world!\12\0"
        .globl _main
_main:
        pushl %ebp
        movl %esp, %ebp
        subl $8, %esp
        andl $-16, %esp
        movl $0, %eax
        movl %eax, -4(%ebp)
        movl -4(%ebp), %eax
        call __alloca
        call ___main
        movl $LC0, (%esp)
        call _printf
        movl $0, %eax
        leave
        ret
For one of the lines,
andl $-16, %esp
the explanation was:
This code "and"s ESP with 0xFFFFFFF0, aligning the stack with the next lowest 16-byte boundary. An examination of Mingw's source code reveals that this may be for SIMD instructions appearing in the "_main" routine, which operate only on aligned addresses. Since our routine doesn't contain SIMD instructions, this line is unnecessary.
I do not understand this point. Can someone explain what it means to align the stack with the next 16-byte boundary, and why it is required? And how does the andl instruction achieve this?
Assume the stack looks like this on entry to _main (the address of the stack pointer is just an example):
| existing |
| stack content |
+-----------------+ <--- 0xbfff1230
Push %ebp, and subtract 8 from %esp to reserve some space for local variables:
| existing |
| stack content |
+-----------------+ <--- 0xbfff1230
| %ebp |
+-----------------+ <--- 0xbfff122c
: reserved :
: space :
+-----------------+ <--- 0xbfff1224
Now, the andl instruction zeroes the low 4 bits of %esp, which may decrease it (by anywhere from 0 to 15 bytes); in this particular example, it has the effect of reserving an additional 4 bytes:
| existing |
| stack content |
+-----------------+ <--- 0xbfff1230
| %ebp |
+-----------------+ <--- 0xbfff122c
: reserved :
: space :
+ - - - - - - - - + <--- 0xbfff1224
: extra space :
+-----------------+ <--- 0xbfff1220
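To see the arithmetic behind that last step, here is a minimal C sketch of the same masking operation. The helper names align_down and alignment_padding are made up for illustration, and the addresses are just the example values from the diagrams above; this is not what the compiler emits, only the equivalent computation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round addr down to a multiple of `alignment`,
 * which must be a power of two. For alignment == 16 the mask is
 * 0xFFFFFFF0, the 32-bit pattern of -16 used by `andl $-16, %esp`. */
uint32_t align_down(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & ~(alignment - 1);
}

/* Hypothetical helper: number of bytes skipped by the alignment,
 * i.e. the "extra space" in the last diagram. */
uint32_t alignment_padding(uint32_t addr, uint32_t alignment)
{
    return addr - align_down(addr, alignment);
}
```

With the example value, align_down(0xbfff1224, 16) gives 0xbfff1220 and alignment_padding(0xbfff1224, 16) gives 4. An address that is already a multiple of 16 is left unchanged.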
The point of this is that there are some "SIMD" (Single Instruction, Multiple Data) instructions (also known in x86-land as "SSE" for "Streaming SIMD Extensions") which can perform parallel operations on multiple words in memory, but require those multiple words to be a block starting at an address which is a multiple of 16 bytes.
In general, the compiler can't assume that particular offsets from %esp will result in a suitable address (because the state of %esp on entry to the function depends on the calling code). But, by deliberately aligning the stack pointer in this way, the compiler knows that adding any multiple of 16 bytes to the stack pointer will result in a 16-byte aligned address, which is safe for use with these SIMD instructions.
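As a rough illustration of that last point, the following C sketch checks offsets that are multiples of 16 from an aligned and from an unaligned base. The function names are hypothetical and the base addresses are again only the example values used earlier.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the low 4 bits of addr are zero, i.e. addr is a multiple of 16. */
bool is_aligned_16(uint32_t addr)
{
    return (addr & 0xFu) == 0;
}

/* Hypothetical helper: of the addresses base, base+16, base+32, ...
 * (`slots` of them), count how many are 16-byte aligned. */
unsigned count_aligned_slots(uint32_t base, unsigned slots)
{
    unsigned aligned = 0;
    for (unsigned i = 0; i < slots; i++) {
        if (is_aligned_16(base + 16u * i)) {
            aligned++;
        }
    }
    return aligned;
}
```

Starting from the aligned value 0xbfff1220, every one of the slots is aligned; starting from 0xbfff1224 (the stack pointer before the andl), none of them is, because adding a multiple of 16 never changes the low 4 bits.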