From what I understand, instructions and data in an object file all have addresses. The first data item starts at address 0, and the first instruction also starts at address 0.
The relocation table contains information about instructions that need to be updated if the addresses in the file change, for example if the file is linked together with another. Line A, in the example below, would be in the relocation table. I don't think B would be in the relocation table, since the address of label "equal" is relative to B. Are these correct assumptions?
I know the symbol table shows the labels the file defines and also the labels that haven't been resolved. But what other information does the symbol table contain?
Also, when the assembler translates the instructions to binary, what is placed in those instructions that have unresolved references? For instance, B in this example.
.data
TEXT: .asciiz "Foo"
.text
.global main
main:
li t0, 1
beq t0, 1, equal #B
equal:
la a0, TEXT
jal printf #A
Yes, your assumptions are correct. There are various types of relocations, and what the assembler emits into the instruction depends on the type. Generally it's an offset to be added. You can use objdump -dr to see relocations. For better illustration I have changed your code a little:
.data
.int 0
TEXT: .asciiz "Foo"
.text
.global main
main:
li $t0, 1
beq $t0, 1, equal #B
bne $t0, 42, foo #C
equal:
la $a0, TEXT
jal printf #A
Output of objdump:
00000000 <main>:
0: 24080001 li t0,1
4: 24010001 li at,1
8: 11010004 beq t0,at,1c <equal>
c: 00000000 nop
10: 2401002a li at,42
14: 1501ffff bne t0,at,14 <main+0x14>
14: R_MIPS_PC16 foo
18: 00000000 nop
0000001c <equal>:
1c: 3c040000 lui a0,0x0
1c: R_MIPS_HI16 .data
20: 0c000000 jal 0 <main>
20: R_MIPS_26 printf
24: 24840004 addiu a0,a0,4
24: R_MIPS_LO16 .data
As you said, there is no relocation for the beq, since that's a relative address within this object file. The bne I added (line marked with C) references an external symbol, so even though the address is relative, a relocation entry is needed. It will be of type R_MIPS_PC16 to produce a 16-bit signed word offset to symbol foo. As the instruction encoding requires an offset from the next word and not the current PC that the relocation uses, 1 has to be subtracted, and that's encoded as 2's complement ffff into the instruction itself.
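To make the branch arithmetic concrete, here is a rough Python sketch that decodes the two branch words from the objdump listing and then applies the PC-relative fix-up the way a linker might. The helper names and the address 0x40 chosen for foo are made up for illustration; the formula follows the usual S + A - P form for PC-relative relocations, so treat it as an approximation rather than the exact code in any particular linker.

```python
def branch_offset_words(insn: int) -> int:
    """Return the signed 16-bit word offset stored in a MIPS branch."""
    imm = insn & 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(pc: int, insn: int) -> int:
    """Branch target = address of the next word (pc + 4) + offset * 4."""
    return pc + 4 + 4 * branch_offset_words(insn)


def apply_pc16(insn: int, place: int, symbol: int) -> int:
    """Patch the low 16 bits: field = ((A << 2) + S - P) >> 2."""
    addend = branch_offset_words(insn) << 2   # -1 word becomes -4 bytes
    field = ((addend + symbol - place) >> 2) & 0xFFFF
    return (insn & 0xFFFF0000) | field


# beq at 0x8: fully resolved by the assembler, no relocation needed.
print(hex(branch_target(0x8, 0x11010004)))    # 0x1c, i.e. <equal>

# bne at 0x14: the stored addend is -1 until the linker patches it.
print(branch_offset_words(0x1501FFFF))        # -1

# Hypothetical: suppose the linker places foo at 0x40.
patched = apply_pc16(0x1501FFFF, place=0x14, symbol=0x40)
print(hex(patched), hex(branch_target(0x14, patched)))
```

With the made-up address, the patched word branches to 0x40 as expected, which shows why the assembler had to store -1 in the field beforehand.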
The la pseudoinstruction has been translated by the assembler into a lui/addiu pair (the latter in the delay slot of the jal). For the lui, a R_MIPS_HI16 relocation is created against the .data section, which will fill in the top 16 bits. Since the symbol TEXT is at address 4 in the .data section, the top 16 bits of the offset are 0. This means the instruction contains 0 offset. Similarly for the low 16 bits, except there the instruction contains an offset of 4.
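As an illustration of how the two halves fit together, the following Python sketch combines the addends stored in the lui and addiu words, adds a final address for .data, and splits the result back into the two immediates. The function names and the two section addresses are invented for the example. The carry adjustment is needed because addiu sign-extends its immediate; a real linker handles more cases than this.

```python
def sign16(x: int) -> int:
    """Interpret the low 16 bits of x as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def apply_hi_lo(lui: int, addiu: int, section_addr: int):
    """Patch a lui/addiu pair carrying R_MIPS_HI16 / R_MIPS_LO16."""
    # Combined addend: high half from lui, signed low half from addiu.
    addend = ((lui & 0xFFFF) << 16) + sign16(addiu)
    value = (addend + section_addr) & 0xFFFFFFFF
    lo = value & 0xFFFF
    # Compensate for addiu sign-extending lo when bit 15 is set.
    hi = ((value - sign16(lo)) >> 16) & 0xFFFF
    return (lui & 0xFFFF0000) | hi, (addiu & 0xFFFF0000) | lo


# Words from the listing: lui a0,0x0 and addiu a0,a0,4.
# Hypothetical: .data ends up at 0x00410000, so TEXT is at 0x00410004.
print([hex(w) for w in apply_hi_lo(0x3C040000, 0x24840004, 0x00410000)])

# Hypothetical: .data at 0x10008000; the low half is now negative as a
# signed 16-bit value, so the high half must be bumped by one.
print([hex(w) for w in apply_hi_lo(0x3C040000, 0x24840004, 0x10008000)])
```

In the second case the pair becomes lui a0,0x1001 followed by addiu a0,a0,-0x7ffc, which still adds up to 0x10008004.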
Finally, the jal printf is using yet another kind of relocation that is tailored for the encoding required by the instruction. The offset is zero because the jump is directly to the referenced symbol. Note that objdump is trying to be helpful by decoding that, but it doesn't process the relocation, so the <main> it outputs is of course nonsense.
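For completeness, here is a similar Python sketch for the jal case. It assumes the external-symbol form of R_MIPS_26, where the 26-bit field holds (addend + symbol address) >> 2, and it uses an invented address for printf. It also shows why objdump prints jal 0 for the unrelocated word.

```python
def jal_target(pc: int, insn: int) -> int:
    """Decode a jal: top 4 bits come from pc + 4, the rest from the field."""
    return ((pc + 4) & 0xF0000000) | ((insn & 0x03FFFFFF) << 2)


def apply_mips26(insn: int, symbol: int) -> int:
    """Patch the 26-bit field for a reference to an external symbol."""
    addend = (insn & 0x03FFFFFF) << 2
    if addend & 0x08000000:            # sign-extend the 28-bit addend
        addend -= 0x10000000
    field = ((addend + symbol) >> 2) & 0x03FFFFFF
    return (insn & 0xFC000000) | field


# Unrelocated word from the listing: the field is 0, so a disassembler
# that ignores the relocation decodes it as a jump to address 0.
print(hex(jal_target(0x20, 0x0C000000)))      # 0x0

# Hypothetical: suppose printf ends up at 0x00400a10 after linking.
patched = apply_mips26(0x0C000000, 0x00400A10)
print(hex(patched), hex(jal_target(0x20, patched)))
```

The jump target only matches the symbol when the jal and the symbol lie in the same 256 MB region, since the top four bits are taken from the address of the delay slot rather than from the instruction.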