Assembly: Using the Data Segment Register (DS)

Mykel Stone · Feb 5, 2011 · Viewed 29.1k times

I'm currently learning x86 assembly for fun. I love microcontroller programming, so I'm familiar with assembly.

I've been searching high and low for the answer to this question but can't seem to find it: the DS register. I know it's supposed to point to the global data in my program, but I don't know exactly how it works. I'm using NASM, and in most simple programs I see the following:

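[org 0x7C00]            ; the BIOS loads the boot sector at 0x7C00
[bits 16]               ; real-mode, 16-bit code

main:
mov ax, 0x0000
mov ds, ax              ; DS = 0 (segment registers cannot take an immediate)
mov al, [msg]           ; AL = byte at DS * 16 + msg
mov ah, 0x0E            ; BIOS teletype output function
mov bx, 0x0007          ; BH = page 0, BL = colour 7
int 0x10                ; call the BIOS video service
jmp $                   ; loop forever

msg db 'X'

times 510-($-$$) db 0   ; pad the sector to 510 bytes
dw 0xAA55               ; boot signature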

That works perfectly (even if I omit the two instructions that load DS), but how? Does the CPU automagically load the global variables starting at 0x0000, or is there something intrinsic here that I'm missing?

Answer

LocoDelAssembly · Feb 5, 2011

When the computer is in real mode (the mode the CPU is in when the BIOS executes the bootloader), the method the CPU uses to calculate an address is very simple: multiply the segment register value by 16 (shift it 4 bits to the left), then add the offset.

For instance, in an instruction like "mov ax, [0x1234]" the CPU uses "DS * 0x10 + 0x1234" as the physical address (the first term is zero in your case). For one like "mov ax, [BP+0x32]" the CPU uses "SS * 0x10 + BP + 0x32". Note that the CPU now uses a different segment register (the Stack Segment): whenever BP is the base register, the CPU assumes by default that you want to access the stack (you can override this with [DS:BP + 0x32]). If your program still works without the DS load, that is presumably because the BIOS happened to leave DS at zero; this isn't guaranteed, so set it explicitly.
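To make the BP case concrete, here is a minimal sketch based on the boot sector from the question. It assumes a legacy BIOS that loads the sector at 0x7C00; the only changes are that the offset of msg goes through BP and the read uses an explicit DS override. It is an illustration of the default-segment rule, not the only way to write it.

```nasm
[org 0x7C00]
[bits 16]

main:
    xor ax, ax
    mov ds, ax              ; DS = 0, so DS * 16 + offset = offset
    mov bp, msg             ; BP = offset of msg
    mov al, [ds:bp]         ; override: reads from DS * 16 + BP
                            ; a plain [bp] would read from SS * 16 + BP,
                            ; and SS is whatever the BIOS left in it
    mov ah, 0x0E            ; BIOS teletype output
    mov bx, 0x0007          ; BH = page 0, BL = colour 7
    int 0x10
    jmp $

msg db 'X'

times 510-($-$$) db 0
dw 0xAA55
```

Without the "ds:" prefix the read only lands on msg if SS happens to describe the same base as DS, which a boot sector should not rely on.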

More or less what I've explained, and more, can be found at http://wiki.osdev.org/Real_Mode and http://www.internals.com/articles/protmode/realmode.htm and in many other places.

BTW, "msg" should end up at address 0x7C11: the instructions before it assemble to 0x11 bytes, and the org directive adds 0x7C00 to that.
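Since the address is just segment * 16 + offset, the same byte can be reached with a different segment:offset pair. As a sketch (again assuming the BIOS loads the sector at 0x7C00), this variant sets the origin to 0 and moves the 0x7C00 into DS instead:

```nasm
[org 0x0000]                ; offsets now count from the start of the sector
[bits 16]

main:
    mov ax, 0x07C0
    mov ds, ax              ; DS = 0x07C0, base = 0x07C0 * 16 = 0x7C00
    mov al, [msg]           ; 0x7C00 + 0x0011 = 0x7C11, same byte as before
    mov ah, 0x0E            ; BIOS teletype output
    mov bx, 0x0007          ; BH = page 0, BL = colour 7
    int 0x10
    jmp $                   ; relative jump, unaffected by the org change

msg db 'X'

times 510-($-$$) db 0
dw 0xAA55
```

Both versions read the 'X' at physical address 0x7C11; 0x0000:0x7C11 and 0x07C0:0x0011 are two names for the same location. Unlike the original, this version cannot drop the DS load, because it depends on DS being 0x07C0.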