What is the meaning of align at the start of a section?
For example:
align 4
a: dw 0
How does it save memory accesses?
I always liked the comprehensive explanation by Samael in the following thread:
Explanation of the ALIGN MASM directive, How is this directive interpreted by the compiler?
Quote:
ALIGN X
The ALIGN directive is accompanied by a number (X).
This number (X) must be a power of 2, that is, 2, 4, 8, 16, and so on.
The directive allows you to enforce alignment of the instruction or data immediately after the directive, on a memory address that is a multiple of the value X.
The extra space between the previous instruction/data and the one after the ALIGN directive is padded with NOP instructions (or equivalents, such as MOV EAX,EAX) in code segments, and with NULLs in data segments.
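The rounding that ALIGN X performs can be sketched in Python. This is a minimal illustration, not MASM's implementation. `align_up` and `padding_needed` are hypothetical helper names. The bit trick relies on X being a power of 2, so that X - 1 is a mask of the low bits that must be zero.

```python
def is_power_of_two(x: int) -> bool:
    """True if x is a power of 2, the only kind of value ALIGN accepts."""
    return x > 0 and (x & (x - 1)) == 0


def align_up(addr: int, x: int) -> int:
    """Round addr up to the next multiple of x (x must be a power of 2)."""
    if not is_power_of_two(x):
        raise ValueError(f"alignment {x} is not a power of 2")
    return (addr + x - 1) & ~(x - 1)


def padding_needed(addr: int, x: int) -> int:
    """Number of filler bytes ALIGN x would emit at location addr."""
    return align_up(addr, x) - addr


# A 1-byte variable at 0x402000 leaves the next free byte at 0x402001.
print(hex(align_up(0x402001, 4)), padding_needed(0x402001, 4))  # 0x402004 3
```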
The number X cannot be greater than the default alignment of the segment in which the ALIGN directive is used; it must be less than or equal to that alignment. More on this to follow...
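A little arithmetic shows the reason for the upper limit. The assembler aligns offsets within the segment, but the absolute address is the segment base plus the offset. The base is only guaranteed to be a multiple of the segment alignment. The sketch below uses a hypothetical base of 0x1004 for a DWORD-aligned segment. The helper names are illustrative.

```python
def is_aligned(addr: int, x: int) -> bool:
    """True if addr is a multiple of x."""
    return addr % x == 0


def absolute_alignment_guaranteed(x: int, segment_align: int) -> bool:
    """ALIGN x holds for every possible segment base only if x divides the
    segment alignment. For powers of 2 this is the same as x <= segment_align.
    """
    return segment_align % x == 0


segment_base = 0x1004  # hypothetical base: a multiple of 4 (DWORD), not of 16
offset = 0x20          # label offset after ALIGN 16, relative to the segment start

print(is_aligned(offset, 16))                 # True: aligned within the segment
print(is_aligned(segment_base + offset, 16))  # False: 0x1024 is not a multiple of 16
print(is_aligned(segment_base + offset, 4))   # True: ALIGN 4 would still hold
print(absolute_alignment_guaranteed(16, 4))   # False
```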
A. Working with code
If the directive precedes code, the reason would be optimization (with respect to execution speed). Some instructions execute faster if they are aligned on a 4-byte (32-bit) boundary. This kind of optimization is usually applied in time-critical functions, such as loops that constantly manipulate large amounts of data. Apart from the speed improvement, though, there is no "necessity" to use the directive with code.
B. Working with data
The same holds true for data: we mainly use the directive to improve execution speed, as a means of optimization. There are situations where data misalignment can have a huge performance impact on our application.
But with data, there are situations where correct alignment is a necessity, not a luxury. This holds especially true on the Itanium platform and for the SSE/SSE2 instruction set, where data that is not aligned on a 128-bit boundary (X=16) may raise a general-protection exception.
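For SSE, an instruction such as MOVAPS (which the `_mm_load_ps` intrinsic corresponds to) expects a 16-byte-aligned memory operand, while MOVUPS accepts any address. Some allocators make no alignment promise. A common way to get an aligned address from them is to over-allocate by X - 1 bytes and round the start address up. The Python sketch below shows that technique with `ctypes`. `aligned_buffer` is a hypothetical helper. The raw buffer must stay alive for as long as the aligned address is used.

```python
import ctypes


def aligned_buffer(size: int, x: int = 16):
    """Return (raw_buffer, aligned_address) where aligned_address % x == 0.

    Over-allocates by x - 1 bytes and rounds the start address up, so at least
    `size` usable bytes remain after the aligned address. x must be a power of 2.
    """
    raw = ctypes.create_string_buffer(size + x - 1)
    base = ctypes.addressof(raw)
    aligned = (base + x - 1) & ~(x - 1)
    return raw, aligned


raw, addr = aligned_buffer(64)
print(addr % 16)  # 0: safe for an aligned 16-byte load such as MOVAPS
```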
An interesting and highly informative article on data alignment, though oriented toward the MS C/C++ compiler, is the following:
Windows Data Alignment on IPF, x86, and x64, by Kang Su Gatlin, MSDN
A. If you use the .386 processor directive and haven't explicitly declared the default alignment value for a segment, the default segment alignment is DWORD (4 bytes). So in this case, X = 4, and you can use the following values with the ALIGN directive: X = 2 and X = 4. Remember, X must be less than or equal to the segment alignment.
B. If you use the .486 processor directive or above and haven't explicitly declared the default alignment value for a segment, the default segment alignment is PARAGRAPH (16 bytes). In this case, X = 16, and you can use the following values with the ALIGN directive: X = 2, 4, 8, and 16.
C. You can declare a segment with non-default alignment in the following way:
;Here, we create a code segment named "JUNK", which starts aligned on a 256-byte boundary
JUNK SEGMENT PAGE PUBLIC FLAT 'CODE'
;Your code starts aligned on a PAGE boundary (X=256)
; Possible values that can be used with the ALIGN directive
; within this segment, are all the powers of 2, up to 256.
JUNK ENDS
Here are the aliases for segment alignment values:
Align type   Starting address
BYTE         Next available byte address
WORD         Next available word address (2 bytes per word)
DWORD        Next available double word address (4 bytes per double word)
PARA         Next available paragraph address (16 bytes per paragraph)
PAGE         Next available page address (256 bytes per page)
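The table can be combined with the rule that X must be less than or equal to the segment alignment to form a small lookup. This sketch lists ALIGN values starting at 2, as the quote does. `SEGMENT_ALIGN` and `valid_align_values` are hypothetical names.

```python
# Byte boundaries for the segment align types in the table above.
SEGMENT_ALIGN = {"BYTE": 1, "WORD": 2, "DWORD": 4, "PARA": 16, "PAGE": 256}


def valid_align_values(align_type: str) -> list:
    """ALIGN values (2, 4, 8, ...) allowed in a segment of the given align type."""
    limit = SEGMENT_ALIGN[align_type.upper()]
    values = []
    x = 2
    while x <= limit:  # X must not exceed the segment alignment
        values.append(x)
        x *= 2
    return values


print(valid_align_values("DWORD"))  # [2, 4]
print(valid_align_values("PARA"))   # [2, 4, 8, 16]
```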
Consider the following example (read the comments on the usage of the ALIGN directive).
.486
.MODEL FLAT,STDCALL
OPTION CASEMAP:NONE
INCLUDE \MASM32\INCLUDE\WINDOWS.INC
.DATA
var1 BYTE 01 ; This variable is 1 byte in size.
ALIGN 4
; Force the next variable to be aligned on the next memory
; address that is a multiple of 4.
; This means that the gap between the first variable
; and this one is padded with NULLs (3 bytes in total).
var2 BYTE 02 ; This variable is 1 byte in size.
ALIGN 2
; Force the next variable to be aligned on the next memory
; address that is a multiple of 2.
; This means that the gap between the second variable
; and this one is padded with NULLs (1 byte in total).
var3 BYTE 03 ; This variable is 1 byte in size.
.CODE
; Enforce the first instruction to be aligned on a memory address multiple of 4
ALIGN 4
EntryPoint:
; Each of the following 3 instructions is 7 bytes long,
; encoded as 0F B6 05 XX XX XX XX.
; In the following block, we do not enforce instruction
; alignment in memory...
MOVZX EAX, var1
MOVZX EAX, var2
MOVZX EAX, var3
; Each of the following 3 instructions is 7 bytes long,
; encoded as 0F B6 05 XX XX XX XX.
; In the following block, we enforce alignment of the
; third instruction on a memory address that is a multiple of 4.
; Since the second instruction ends on a memory address
; that is not a multiple of 4, NOPs are inserted before
; the third instruction, so that its first byte
; starts on a memory address that is a multiple of 4.
MOVZX EAX, var1
MOVZX EAX, var2
ALIGN 4
MOVZX EAX, var3
; Each of the following 3 instructions is 7 bytes long,
; encoded as 0F B6 05 XX XX XX XX.
; In the following block, we enforce alignment of all
; instructions on a memory address that is a multiple of 4.
; The extra space between instructions is padded with NOPs.
ALIGN 4
MOVZX EAX, var1
ALIGN 4
MOVZX EAX, var2
ALIGN 4
MOVZX EAX, var3
ALIGN 2
; The following instruction is 1 byte long (opcode CC).
; In the following block, we enforce alignment of the
; instruction on a memory address that is a multiple of 2.
; The extra space between this instruction
; and the previous one is padded with NOPs.
INT 3
END EntryPoint
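You can predict the padding that the assembler inserts in the .DATA block by walking a location counter through the items. This is a simplified model. It assumes the section starts at 0x402000 and treats each line as either a sized variable or an ALIGN directive. `layout` is a hypothetical helper.

```python
def align_up(addr, x):
    """Round addr up to the next multiple of x (x is a power of 2)."""
    return (addr + x - 1) & ~(x - 1)


# Model of the .DATA block: ("ALIGN", x) or (name, size_in_bytes).
DATA = [("var1", 1), ("ALIGN", 4), ("var2", 1), ("ALIGN", 2), ("var3", 1)]


def layout(items, base):
    """Return ({name: address}, total padding bytes) for the given items."""
    loc = base
    addresses = {}
    padding = 0
    for name, value in items:
        if name == "ALIGN":
            new_loc = align_up(loc, value)
            padding += new_loc - loc  # NULL bytes emitted by ALIGN
            loc = new_loc
        else:
            addresses[name] = loc
            loc += value
    return addresses, padding


addrs, pad = layout(DATA, 0x402000)
print({k: hex(v) for k, v in addrs.items()}, pad)
# {'var1': '0x402000', 'var2': '0x402004', 'var3': '0x402006'} 4
```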
If we assemble the program and view the output in a disassembler, here is what was generated:
.DATA
;------------SNIP-SNIP------------------------------
.data:00402000 var1 db 1
.data:00402001 db 0; This NULL was generated to enforce the alignment of the next variable on an address that is a multiple of 4
.data:00402002 db 0; This NULL was generated to enforce the alignment of the next variable on an address that is a multiple of 4
.data:00402003 db 0; This NULL was generated to enforce the alignment of the next variable on an address that is a multiple of 4
.data:00402004 var2 db 2
.data:00402005 db 0; This NULL was generated to enforce the alignment of the next variable on an address that is a multiple of 2
.data:00402006 var3 db 3
.data:00402007 db 0; The rest of the NULLs fill the memory page in which the section will be loaded
;------------SNIP-SNIP------------------------------
.CODE
;------------SNIP-SNIP------------------------------
.text:00401000 start:
.text:00401000 movzx eax, var1
.text:00401007 movzx eax, var2
.text:0040100E movzx eax, var3
.text:00401015 movzx eax, var1
.text:0040101C movzx eax, var2
.text:00401023 nop; This NOP was generated to enforce the alignment...
.text:00401024 movzx eax, var3
.text:0040102B nop; This NOP was generated to enforce the alignment...
.text:0040102C movzx eax, var1
.text:00401033 nop; This NOP was generated to enforce the alignment...
.text:00401034 movzx eax, var2
.text:0040103B nop; This NOP was generated to enforce the alignment...
.text:0040103C movzx eax, var3
.text:00401043 nop; This NOP was generated to enforce the alignment...
.text:00401044 int 3 ; Trap to Debugger
.text:00401044; ---------------------------------------------------------------------------
.text:00401045 db 0
.text:00401046 db 0
.text:00401047 db 0
.text:00401048 db 0
;------------SNIP-SNIP------------------------------
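The same location-counter model reproduces the .text addresses in the listing. It assumes each MOVZX EAX, varN is 7 bytes and INT 3 is 1 byte, as stated in the source comments. It fills each gap with 1-byte NOPs (90h). Every gap in this example is a single byte; for larger gaps MASM may emit longer filler instructions instead. `assemble` is a hypothetical helper.

```python
# Model of the .CODE block: ("ALIGN", x) or (mnemonic, length_in_bytes).
MOVZX = ("movzx", 7)  # 0F B6 05 + 32-bit address
CODE = [("ALIGN", 4),
        MOVZX, MOVZX, MOVZX,
        MOVZX, MOVZX, ("ALIGN", 4), MOVZX,
        ("ALIGN", 4), MOVZX, ("ALIGN", 4), MOVZX, ("ALIGN", 4), MOVZX,
        ("ALIGN", 2), ("int 3", 1)]


def assemble(items, base):
    """Return (address, mnemonic) pairs, including the 1-byte NOPs ALIGN adds."""
    loc = base
    out = []
    for name, value in items:
        if name == "ALIGN":
            while loc % value:  # pad until loc is a multiple of the alignment
                out.append((loc, "nop"))
                loc += 1
        else:
            out.append((loc, name))
            loc += value
    return out


for addr, mnemonic in assemble(CODE, 0x401000):
    print(f"{addr:08X}  {mnemonic}")
```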
As you can see, more bytes follow after the code and data of our application end. These are not produced by ALIGN. The linker pads each PE section in the executable file up to the file alignment (512 bytes, 0x200, by default). The loader then maps each section at a multiple of the section alignment, usually the 4 KB memory page size. This is why .text starts at 00401000 and .data at 00402000.
So the extra space up to the next 512-byte boundary is filled with filler bytes (usually INT 3 instructions, NOPs, or NULLs for code sections, and 0FFh or NULLs for data sections). This ensures that the layout of the loaded PE image is correct.
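The trailing padding is plain round-up arithmetic on section sizes. This sketch makes three assumptions. It uses the common linker defaults of FileAlignment = 0x200 and SectionAlignment = 0x1000; both are fields of the PE optional header, and actual values depend on linker options. It also assumes a .text section holding the 0x45 bytes of code from the listing (00401000 through the INT 3 at 00401044). `section_sizes` is a hypothetical helper.

```python
FILE_ALIGNMENT = 0x200      # assumed default PE FileAlignment (512 bytes)
SECTION_ALIGNMENT = 0x1000  # assumed default SectionAlignment (4 KB page)


def align_up(value, x):
    """Round value up to the next multiple of x (x is a power of 2)."""
    return (value + x - 1) & ~(x - 1)


def section_sizes(code_bytes: int):
    """(on-disk size, in-memory span) of a section holding code_bytes bytes.

    The on-disk size (SizeOfRawData) is a multiple of FileAlignment; the loader
    places sections at multiples of SectionAlignment, so the in-memory span
    rounds up to that value.
    """
    return align_up(code_bytes, FILE_ALIGNMENT), align_up(code_bytes, SECTION_ALIGNMENT)


raw, span = section_sizes(0x45)
print(hex(raw), hex(span), raw - 0x45)  # 0x200 0x1000 443
```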