How to make GCC generate bswap instruction for big endian store without builtins?

Question 1

How to make GCC generate bswap instruction for big endian store without builtins?

c gcc x86 compiler-optimization endianness

nwellnhof · Apr 8, 2016 · Viewed 10.4k times · Source

Answer

Answer

This seems to do the trick:

void encode_bigend_u64(uint64_t value, void* dest)
{
  value =
      ((value & 0xFF00000000000000u) >> 56u) |
      ((value & 0x00FF000000000000u) >> 40u) |
      ((value & 0x0000FF0000000000u) >> 24u) |
      ((value & 0x000000FF00000000u) >>  8u) |
      ((value & 0x00000000FF000000u) <<  8u) |      
      ((value & 0x0000000000FF0000u) << 24u) |
      ((value & 0x000000000000FF00u) << 40u) |
      ((value & 0x00000000000000FFu) << 56u);
  memcpy(dest, &value, sizeof(uint64_t));
}

clang with `-O3`

encode_bigend_u64(unsigned long, void*):
        bswapq  %rdi
        movq    %rdi, (%rsi)
        retq

clang with `-O3 -march=native`

encode_bigend_u64(unsigned long, void*):
        movbeq  %rdi, (%rsi)
        retq

gcc with `-O3`

encode_bigend_u64(unsigned long, void*):
        bswap   %rdi
        movq    %rdi, (%rsi)
        ret

gcc with `-O3 -march=native`

encode_bigend_u64(unsigned long, void*):
        movbe   %rdi, (%rsi)
        ret

Tested with clang 3.8.0 and gcc 5.3.0 on http://gcc.godbolt.org/ (so I don't know exactly what processor is underneath (for the -march=native) but I strongly suspect a recent x86_64 processor)

If you want a function which works for big endian architectures too, you can use the answers from here to detect the endianness of the system and add an if. Both the union and the pointer casts versions work and are optimized by both gcc and clang resulting in the exact same assembly (no branches). Full code on godebolt:

int is_big_endian(void)
{
    union {
        uint32_t i;
        char c[4];
    } bint = {0x01020304};

    return bint.c[0] == 1;
}

void encode_bigend_u64_union(uint64_t value, void* dest)
{
  if (!is_big_endian())
    //...
  memcpy(dest, &value, sizeof(uint64_t));
}

Intel® 64 and IA-32 Architectures Instruction Set Reference (3-542 Vol. 2A):

MOVBE—Move Data After Swapping Bytes

Performs a byte swap operation on the data copied from the second operand (source operand) and store the result in the first operand (destination operand). [...]

The MOVBE instruction is provided for swapping the bytes on a read from memory or on a write to memory; thus providing support for converting little-endian values to big-endian format and vice versa.

Question 2

I'm working on a function that stores a 64-bit value into memory in big endian format. I was hoping that I could write portable C99 code that works on both little and big endian platforms and have modern x86 compilers generate a bswap instruction automatically without any builtins or intrinsics. So I started with the following function:

#include <stdint.h>

void
encode_bigend_u64(uint64_t value, void *vdest) {
    uint64_t bigend;
    uint8_t *bytes = (uint8_t*)&bigend;
    bytes[0] = value >> 56;
    bytes[1] = value >> 48;
    bytes[2] = value >> 40;
    bytes[3] = value >> 32;
    bytes[4] = value >> 24;
    bytes[5] = value >> 16;
    bytes[6] = value >> 8;
    bytes[7] = value;
    uint64_t *dest = (uint64_t*)vdest;
    *dest = bigend;
}

This works fine for clang which compiles this function to:

bswapq  %rdi
movq    %rdi, (%rsi)
retq

But GCC fails to detect the byte swap. I tried a couple of different approaches but they only made things worse. I know that GCC can detect byte swaps using bitwise-and, shift, and bitwise-or, but why doesn't it work when writing bytes?

Edit: I found the corresponding GCC bug.

How to make GCC generate bswap instruction for big endian store without builtins?

Answer

clang with -O3

clang with -O3 -march=native

gcc with -O3

gcc with -O3 -march=native

Related questions

clang with `-O3`

clang with `-O3 -march=native`

gcc with `-O3`

gcc with `-O3 -march=native`