Assembly - .data, .code, and registers...?

Cam · Mar 1, 2010 · Viewed 42k times

So this morning I posted a confused question about assembly and I received some great genuine help, which I really appreciate.

And now I'm starting to get into assembly and am beginning to understand how it works.

Things I feel I understand alright include the stack, interrupts, binary/hex, and in general what most of the basic operations do (jmp, push, mov, etc).

Concepts that I'm struggling to understand and would like help with are below - it would be a huge help if you could address any of the following:

  1. What exactly is happening in the .data section? Are those variables we're declaring?
  2. If so, can we declare variables later in the code section? If not, why not? If so, how, and why do we use the data section then?
  3. What's a register? How does it compare to a variable? I mean I know it's a location that stores a small piece of information... but that sounds exactly like a variable to me.
  4. How do I make an array? I know this seems kind of random, but I'm curious as to how I'd go about doing something like this.
  5. Is there a list somewhere of common practices for what each register should be used for? I still don't get them completely, but have noticed some people saying, for example, that a certain register should be used to store 'return values' from procedures - is there a comprehensive or at least informative list of such practices?
  6. One of the reasons I'm learning assembly is to better understand what's going on behind my high level code. With that in mind - when I'm programming in c++, I'm often thinking about the stack and the heap. In assembly I know what the stack is - where's the 'heap'?

Some info: I'm using masm32 with WinAsm as an IDE, and I'm working on Windows 7. I have a lot of prior experience programming in higher level languages such as c++/java.


edit: Thanks for the help everyone, extremely informative as usual! Great stuff! One last thing though - I'm wondering what the difference is between the Stack Pointer, and the Base pointer, or ESP and EBP. Can someone help me out?

edit: I think I get it now... ESP always points to the top of the stack. However, you can point EBP at whatever you want. ESP is automatically handled but you can do whatever you want with EBP. For example:

push 6
push 5
push 4
mov EBP, ESP
push 3
push 2

In this scenario, EBP now points to the address holding 4, but ESP now points to the address holding 2.

In a real application 6, 5, and 4 could have been function arguments, whereas 3 and 2 could be local variables within that function.

Answer

Carl Norum · Mar 1, 2010

Let's try to answer in order!

  1. The data section contains anything that you want the system to initialize for you before it calls your program's entry point. You're right, global variables normally end up here. Zero-initialized (ZI) data is generally not stored in the executable file, since there's no reason to - a couple of directives to the program loader are all that's needed to reserve that space (in MASM, that's what the .data? section is for). Once your program starts running, the ZI and data regions are generally interchangeable. Wikipedia has a lot more information.

  2. Variables don't really exist in assembly programming, at least not in the sense they do when you're writing C code. All you have is the decisions you've made about how to lay out your memory. A variable can live on the stack, elsewhere in memory, or only in a register.

  3. Registers are the internal data storage of the processor. You can, in general, only do operations on values in processor registers. Loading values from memory into registers and storing them back is the basic operation of how your computer works. Here's a quick example. This C code:

    int a = 5;
    int b = 6;
    int *d = (int *)0x12345678; // assume 0x12345678 is a valid memory pointer
    *d = a + b;
    

    Might get translated to some (simplified) assembly along the lines of:

    load  r1, 5
    load  r2, 6
    load  r4, 0x12345678
    add   r3, r1, r2
    store r4, r3
    

    In this case, you can think of the registers as variables, but in general it's not necessary that any one variable always stay in the same register; depending on how complicated your routine is, it may not even be possible. You'll need to push some data onto the stack, pop other data off, and so on. A 'variable' is that logical piece of data, not where it lives in memory or registers, etc.

  4. An array is just a contiguous block of memory - for a local array, you can just decrement the stack pointer appropriately. For a global array, you can declare that block in the data section.

  5. There are a bunch of conventions about registers - check your platform's ABI or calling convention document for details about how to use them correctly. For example, under the common 32-bit x86 calling conventions, an integer return value comes back in EAX. Your assembler documentation might have information as well. Check the ABI article on Wikipedia.

  6. Your assembly program can call the same library functions and make the same system calls any C program could, so you can just call malloc() to get memory from the heap.