Is there a way to enforce specific endianness for a C or C++ struct?

vsz · Jul 18, 2011

I've seen a few questions and answers regarding the endianness of structs, but they were about detecting the endianness of a system, or converting data from one endianness to the other.

What I would like to know, however, is whether there is a way to enforce a specific endianness for a given struct. Are there good compiler directives or other simple solutions, besides rewriting the whole thing with a lot of macros manipulating bitfields?

A general solution would be nice, but I would be happy with a gcc-specific solution as well.

Edit:

Thank you for all the comments pointing out why it's not a good idea to enforce endianness, but in my case that's exactly what I need.

A large amount of data is generated by a specific processor (which will never change; it's an embedded system with custom hardware), and it has to be read by a program (which I am working on) running on an unknown processor. Byte-wise evaluation of the data would be horribly troublesome because it consists of hundreds of different types of structs, which are huge and deeply nested: most of them contain many layers of other huge structs.

Changing the software for the embedded processor is out of the question. The source is available, which is why I intend to use the structs from that system instead of starting from scratch and evaluating all the data byte-wise.

This is why I need to tell the compiler which endianness it should use; it doesn't matter how efficient the result is.

It does not have to be a real change in endianness. Even if it's just an interface, and physically everything is handled in the processor's own endianness, it's perfectly acceptable to me.

Answer

Nemo · Jul 18, 2011

The way I usually handle this is like so:

#include <arpa/inet.h> // for htons() / ntohs()
#include <stdint.h>

class be_uint16_t {
public:
    be_uint16_t() : be_val_(0) {}
    // Transparently convert from uint16_t (stored big-endian)
    be_uint16_t(const uint16_t &val) : be_val_(htons(val)) {}
    // Transparently convert to uint16_t (read back machine-endian)
    operator uint16_t() const {
        return ntohs(be_val_);
    }
private:
    uint16_t be_val_; // always held in big-endian byte order
} __attribute__((packed));

Similarly for be_uint32_t.
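Here is a minimal sketch of that 32-bit counterpart, following the same pattern with htonl()/ntohl():

#include <arpa/inet.h> // for htonl() / ntohl()
#include <stdint.h>

class be_uint32_t {
public:
    be_uint32_t() : be_val_(0) {}
    // Transparently convert from uint32_t (stored big-endian)
    be_uint32_t(const uint32_t &val) : be_val_(htonl(val)) {}
    // Transparently convert to uint32_t (read back machine-endian)
    operator uint32_t() const {
        return ntohl(be_val_);
    }
private:
    uint32_t be_val_; // always held in big-endian byte order
} __attribute__((packed));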

Then you can define your struct like this:

struct be_fixed64_t {
    be_uint32_t int_part;
    be_uint32_t frac_part;
} __attribute__((packed));

The point is that the compiler will almost certainly lay out the fields in the order you write them, so all you are really worried about is big-endian integers. The be_uint16_t object is a class that knows how to convert itself transparently between big-endian and machine-endian as required. Like this:

be_uint16_t x = 12;
x = x + 1; // Yes, this actually works
write(fd, &x, sizeof(x)); // writes 13 to file in big-endian form

In fact, if you compile that snippet with any reasonably good C++ compiler, you should find it emits a big-endian "13" as a constant.

With these objects, the in-memory representation is big-endian. So you can create arrays of them, put them in structures, etc. But when you go to operate on them, they magically cast to machine-endian. This is typically a single instruction on x86, so it is very efficient. There are a few contexts where you have to cast by hand:

be_uint16_t x = 37;
printf("x == %u\n", (unsigned)x); // needs the cast: varargs never invoke the conversion operator

...but for most code, you can just use them as if they were built-in types.
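To tie this back to the question: structs built from these types can be overlaid directly on the raw bytes produced by the embedded system. Here is a minimal sketch, assuming a hypothetical record layout (wire_record, its fields, and handle_record are made up for illustration):

#include <cstring> // for std::memcpy() and std::size_t

// Hypothetical on-the-wire record built from the types above.
struct wire_record {
    be_uint16_t id;
    be_uint16_t flags;
    be_fixed64_t value;
} __attribute__((packed));

// Copy raw big-endian bytes into the struct, then read the fields
// as ordinary machine-endian integers.
void handle_record(const unsigned char *buf, std::size_t len) {
    wire_record rec;
    if (len < sizeof(rec))
        return; // short buffer; real code would report an error
    std::memcpy(&rec, buf, sizeof(rec)); // bytes stay big-endian in memory
    uint16_t id = rec.id;                // conversion to machine-endian happens here
    uint32_t int_part = rec.value.int_part;
    (void)id; (void)int_part;            // ... process the values
}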