Struct Reordering by compiler

DarthRubik picture DarthRubik · Jul 7, 2016 · Viewed 8k times · Source

Suppose I have a struct like this:

struct MyStruct
{
  uint8_t var0;
  uint32_t var1;
  uint8_t var2;
  uint8_t var3;
  uint8_t var4;
};

This is possibly going to waste a bunch (well not a ton) of space. This is because of necessary alignment of the uint32_t variable.

In actuality (after aligning the structure so that it can actually use the uint32_t variable) it might look something like this:

struct MyStruct
{
  uint8_t var0;
  uint8_t unused[3];  //3 bytes of wasted space
  uint32_t var1;
  uint8_t var2;
  uint8_t var3;
  uint8_t var4;
};

A more efficient struct would be:

struct MyStruct
{
  uint8_t var0;
  uint8_t var2;
  uint8_t var3;
  uint8_t var4;
  uint32_t var1;
};

Now, the question is:

Why is the compiler forbidden (by the standard) from reordering the struct?

I don't see any way you could shoot your self in the foot if the struct was reordered.

Answer

Matthieu M. picture Matthieu M. · Jul 7, 2016

Why is the compiler forbidden (by the standard) from reordering the struct?

The basic reason is: for compatibility with C.

Remember that C is, originally, a high-level assembly language. It is quite common in C to view memory (network packets, ...) by reinterpreting the bytes as a specific struct.

This has led to multiple features relying on this property:

  • C guaranteed that the address of a struct and the address of its first data member are one and the same, so C++ does too (in the absence of virtual inheritance/methods).

  • C guaranteed that if you have two struct A and B and both start with a data member char followed by a data member int (and whatever after), then when you put them in a union you can write the B member and read the char and int through its A member, so C++ does too: Standard Layout.

The latter is extremely broad, and completely prevents any re-ordering of data members for most struct (or class).


Note that the Standard does allow some re-ordering: since C did not have the concept of access control, C++ specifies that the relative order of two data members with a different access control specifier is unspecified.

As far as I know, no compiler attempts to take advantage of it; but they could in theory.

Outside of C++, languages such as Rust allow compilers to re-order fields and the main Rust compiler (rustc) does so by default. Only historical decisions and a strong desire for backward compatibility prevent C++ from doing so.