gfortran for dummies: What does mcmodel=medium do exactly?

mgilson picture mgilson · Oct 16, 2012 · Viewed 26.4k times · Source

I have some code that is giving me relocation errors when compiling, below is an example which illustrates the problem:

  program main
  common/baz/a,b,c
  real a,b,c
  b = 0.0
  call foo()
  print*, b
  end

  subroutine foo()
  common/baz/a,b,c
  real a,b,c

  integer, parameter :: nx = 450
  integer, parameter :: ny = 144
  integer, parameter :: nz = 144
  integer, parameter :: nf = 23*3
  real :: bar(nf,nx*ny*nz)

  !real, allocatable,dimension(:,:) :: bar
  !allocate(bar(nf,nx*ny*nz))

  bar = 1.0
  b = bar(12,32*138*42)

  return
  end

Compiling this with gfortran -O3 -g -o test test.f, I get the following error:

relocation truncated to fit: R_X86_64_PC32 against symbol `baz_' defined in COMMON section in /tmp/ccIkj6tt.o

But it works if I use gfortran -O3 -mcmodel=medium -g -o test test.f. Also note that it works if I make the array allocatable and allocate it within the subroutine.

My question is what exactly does -mcmodel=medium do? I was under the impression that the two versions of the code (the one with allocatable arrays and the one without) were more or less equivalent ...

Answer

Hristo Iliev picture Hristo Iliev · Oct 18, 2012

Since bar is quite large the compiler generates static allocation instead of automatic allocation on the stack. Static arrays are created with the .comm assembly directive which creates an allocation in the so-called COMMON section. Symbols from that section are gathered, same-named symbols are merged (reduced to one symbol request with size equal to the largest size requested) and then what is rest is mapped to the BSS (uninitialised data) section in most executable formats. With ELF executables the .bss section is located in the data segment, just before the data segment part of the heap (there is another heap part managed by anonymous memory mappings which does not reside in the data segment).

With the small memory model 32-bit addressing instructions are used to address symbols on x86_64. This makes code smaller and also faster. Some assembly output when using small memory model:

movl    $bar.1535, %ebx    <---- Instruction length saving
...
movl    %eax, baz_+4(%rip) <---- Problem!!
...
.local  bar.1535
.comm   bar.1535,2575411200,32
...
.comm   baz_,12,16

This uses a 32-bit move instruction (5 bytes long) to put the value of the bar.1535 symbol (this value equals to the address of the symbol location) into the lower 32 bits of the RBX register (the upper 32 bits get zeroed). The bar.1535 symbol itself is allocated using the .comm directive. Memory for the baz COMMON block is allocated afterwards. Because bar.1535 is very large, baz_ ends up more than 2 GiB from the start of the .bss section. This poses a problem in the second movl instruction since a non-32bit (signed) offset from RIP should be used to address the b variable where the value of EAX has to be moved into. This is only detected during link time. The assembler itself does not know the appropriate offset since it doesn't know what the value of the instruction pointer (RIP) would be (it depends on the absolute virtual address where the code is loaded and this is determined by the linker), so it simply puts an offset of 0 and then creates a relocation request of type R_X86_64_PC32. It instructs the linker to patch the value of 0 with the real offset value. But it cannot do that since the offset value would not fit inside a signed 32-bit integer and hence bails out.

With the medium memory model in place things look like this:

movabsq $bar.1535, %r10
...
movl    %eax, baz_+4(%rip)
...
.local  bar.1535
.largecomm      bar.1535,2575411200,32
...
.comm   baz_,12,16

First a 64-bit immediate move instruction (10 bytes long) is used to put the 64-bit value which represents the address of bar.1535 into register R10. Memory for the bar.1535 symbol is allocated using the .largecomm directive and thus it ends in the .lbss section of the ELF exectuable. .lbss is used to store symbols which might not fit in the first 2 GiB (and hence should not be addressed using 32-bit instructions or RIP-relative addressing), while smaller things go to .bss (baz_ is still allocated using .comm and not .largecomm). Since the .lbss section is placed after the .bss section in the ELF linker script, baz_ would not end up being inaccessible using 32-bit RIP-related addressing.

All addressing modes are described in the System V ABI: AMD64 Architecture Processor Supplement. It is a heavy technical reading but a must read for anybody who really wants to understand how 64-bit code works on most x86_64 Unixes.

When an ALLOCATABLE array is used instead, gfortran allocates heap memory (most likely implemented as an anonymous memory map given the large size of the allocation):

movl    $2575411200, %edi
...
call    malloc
movq    %rax, %rdi

This is basically RDI = malloc(2575411200). From then on elements of bar are accessed by using positive offsets from the value stored in RDI:

movl    51190040(%rdi), %eax
movl    %eax, baz_+4(%rip)

For locations that are more than 2 GiB from the start of bar, a more elaborate method is used. E.g. to implement b = bar(12,144*144*450) gfortran emits:

; Some computations that leave the offset in RAX
movl    (%rdi,%rax), %eax
movl    %eax, baz_+4(%rip)

This code is not affected by the memory model since nothing is assumed about the address where the dynamic allocation would be made. Also, since the array is not passed around, no descriptor is being built. If you add another function that takes an assumed-shaped array and pass bar to it, a descriptor for bar is created as an automatic variable (i.e. on the stack of foo). If the array is made static with the SAVE attribute, the descriptor is placed in the .bss section:

movl    $bar.1580, %edi
...
; RAX still holds the address of the allocated memory as returned by malloc
; Computations, computations
movl    -232(%rax,%rdx,4), %eax
movl    %eax, baz_+4(%rip)

The first move prepares the argument of a function call (in my sample case call boo(bar) where boo has an interface that declares it as taking an assumed-shape array). It moves the address of the array descriptor of bar into EDI. This is a 32-bit immediate move so the descriptor is expected to be in the first 2 GiB. Indeed, it is allocated in the .bss in both small and medium memory models like this:

.local  bar.1580
.comm   bar.1580,72,32