What's sizeof(size_t) on 32-bit vs the various 64-bit data models?

anonymous picture anonymous · May 28, 2009 · Viewed 131.2k times · Source

On a 64-bit system, sizeof(unsigned long) depends on the data model implemented by the system, for example, it is 4 bytes on LLP64 (Windows), 8 bytes on LP64 (Linux, etc.). What's sizeof(size_t) supposed to be? Does it vary with data model like sizeof(long) does? If so, how?


References:

64-bit data models on Wikipedia

Answer

bdonlan picture bdonlan · May 28, 2009

size_t is defined by the C standard to be the unsigned integer return type of the sizeof operator (C99 6.3.5.4.4), and the argument of malloc and friends (C99 7.20.3.3 etc). The actual range is set such that the maximum (SIZE_MAX) is at least 65535 (C99 7.18.3.2).

However, this doesn't let us determine sizeof(size_t). The implementation is free to use any representation it likes for size_t - so there is no upper bound on size - and the implementation is also free to define a byte as 16-bits, in which case size_t can be equivalent to unsigned char.

Putting that aside, however, in general you'll have 32-bit size_t on 32-bit programs, and 64-bit on 64-bit programs, regardless of the data model. Generally the data model only affects static data; for example, in GCC:

`-mcmodel=small'
     Generate code for the small code model: the program and its
     symbols must be linked in the lower 2 GB of the address space.
     Pointers are 64 bits.  Programs can be statically or dynamically
     linked.  This is the default code model.

`-mcmodel=kernel'
     Generate code for the kernel code model.  The kernel runs in the
     negative 2 GB of the address space.  This model has to be used for
     Linux kernel code.

`-mcmodel=medium'
     Generate code for the medium model: The program is linked in the
     lower 2 GB of the address space but symbols can be located
     anywhere in the address space.  Programs can be statically or
     dynamically linked, but building of shared libraries are not
     supported with the medium model.

`-mcmodel=large'
     Generate code for the large model: This model makes no assumptions
     about addresses and sizes of sections.

You'll note that pointers are 64-bit in all cases; and there's little point to having 64-bit pointers but not 64-bit sizes, after all.