Why can't C compilers rearrange struct members to eliminate alignment padding?

Halle Knast picture Halle Knast · Feb 28, 2012 · Viewed 18k times · Source

Possible Duplicate:
Why doesn't GCC optimize structs?
Why doesn't C++ make the structure tighter?

Consider the following example on a 32 bit x86 machine:

Due to alignment constraints, the following struct

struct s1 {
    char a;
    int b;
    char c;
    char d;
    char e;
}

could be represented more memory-efficiently (12 vs. 8 bytes) if the members were reordered as in

struct s2 {
    int b;
    char a;
    char c;
    char d;
    char e;
}

I know that C/C++ compilers are not allowed to do this. My question is why the language was designed this way. After all, we may end up wasting vast amounts of memory, and references such as struct_ref->b would not care about the difference.

EDIT: Thank you all for your extremely useful answers. You explain very well why rearranging doesn't work because of the way the language was designed. However, it makes me think: Would these arguments would still hold if rearrangement was part of the language? Let's say that there was some specified rearrangement rule, from which we required at least that

  1. we should only reorganize the struct if actually necessary (don't do anything if the struct is already "tight")
  2. the rule only looks at the definition of the struct, not inside inner structs. This ensures that a struct type has the same layout whether or not it is internal in another structure
  3. the compiled memory layout of a given struct is predictable given its definition (that is, the rule is fixed)

Adressing your arguments one by one I reason:

  • Low-level data mapping, "element of least surprise": Just write your structs in a tight style yourself (like in @Perry's answer) and nothing has changed (requirement 1). If, for some weird reason, you want internal padding to be there, you could insert it manually using dummy variables, and/or there could be keywords/directives.

  • Compiler differences: Requirement 3 eliminates this concern. Actually, from @David Heffernan's comments, it seems that we have this problem today because different compilers pad differently?

  • Optimization: The whole point of reordering is (memory) optimization. I see lots of potential here. We may not be able to remove padding all together, but I don't see how reordering could limit optimization in any way.

  • Type casting: It seems to me that this is the biggest problem. Still, there should be ways around this. Since the rules are fixed in the language, the compiler is able to figure out how the members were reordered, and react accordingly. As mentioned above, it will always be possible to prevent reordering in the cases where you want complete control. Also, requirement 2 ensures that type-safe code will never break.

The reason I think such a rule could make sense is because I find it more natural to group struct members by their contents than by their types. Also it is easier for the compiler to choose the best ordering than it is for me when I have a lot of inner structs. The optimal layout may even be one that I cannot express in a type-safe way. On the other hand, it would appear to make the language more complicated, which is of course a drawback.

Note that I am not talking about changing the language - only if it could(/should) have been designed differently.

I know my question is hypothetical, but I think the discussion provides deeper insight in the lower levels of the machine and language design.

I'm quite new here, so I don't know if I should spawn a new question for this. Please tell me if this is the case.

Answer

user811773 picture user811773 · Feb 28, 2012

There are multiple reasons why the C compiler cannot automatically reorder the fields:

  • The C compiler doesn't know whether the struct represents the memory structure of objects beyond the current compilation unit (for example: a foreign library, a file on disc, network data, CPU page tables, ...). In such a case the binary structure of data is also defined in a place inaccessible to the compiler, so reordering the struct fields would create a data type that is inconsistent with the other definitions. For example, the header of a file in a ZIP file contains multiple misaligned 32-bit fields. Reordering the fields would make it impossible for C code to directly read or write the header (assuming the ZIP implementation would like to access the data directly):

    struct __attribute__((__packed__)) LocalFileHeader {
        uint32_t signature;
        uint16_t minVersion, flag, method, modTime, modDate;
        uint32_t crc32, compressedSize, uncompressedSize;
        uint16_t nameLength, extraLength;
    };
    

    The packed attribute prevents the compiler from aligning the fields according to their natural alignment, and it has no relation to the problem of field ordering. It would be possible to reorder the fields of LocalFileHeader so that the structure has both minimal size and has all fields aligned to their natural alignment. However, the compiler cannot choose to reorder the fields because it does not know that the struct is actually defined by the ZIP file specification.

  • C is an unsafe language. The C compiler doesn't know whether the data will be accessed via a different type than the one seen by the compiler, for example:

    struct S {
        char a;
        int b;
        char c;
    };
    
    struct S_head {
        char a;
    };
    
    struct S_ext {
        char a;
        int b;
        char c;
        int d;
        char e;
    };
    
    struct S s;
    struct S_head *head = (struct S_head*)&s;
    fn1(head);
    
    struct S_ext ext;
    struct S *sp = (struct S*)&ext;
    fn2(sp);
    

    This is a widely used low-level programming pattern, especially if the header contains the type ID of data located just beyond the header.

  • If a struct type is embedded in another struct type, it is impossible to inline the inner struct:

    struct S {
        char a;
        int b;
        char c, d, e;
    };
    
    struct T {
        char a;
        struct S s; // Cannot inline S into T, 's' has to be compact in memory
        char b;
    };
    

    This also means that moving some fields from S to a separate struct disables some optimizations:

    // Cannot fully optimize S
    struct BC { int b; char c; };
    struct S {
        char a;
        struct BC bc;
        char d, e;
    };
    
  • Because most C compilers are optimizing compilers, reordering struct fields would require new optimizations to be implemented. It is questionable whether those optimizations would be able to do better than what programmers are able to write. Designing data structures by hand is much less time consuming than other compiler tasks such as register allocation, function inlining, constant folding, transformation of a switch statement into binary search, etc. Thus the benefits to be gained by allowing the compiler to optimize data structures appear to be less tangible than traditional compiler optimizations.