gcc, strict-aliasing, and horror stories

Joseph Quinsey picture Joseph Quinsey · Jun 2, 2010 · Viewed 30.4k times · Source

In gcc-strict-aliasing-and-casting-through-a-union I asked whether anyone had encountered problems with union punning through pointers. So far, the answer seems to be No.

This question is broader: Do you have any horror stories about gcc and strict-aliasing?

Background: Quoting from AndreyT's answer in c99-strict-aliasing-rules-in-c-gcc:

"Strict aliasing rules are rooted in parts of the standard that were present in C and C++ since the beginning of [standardized] times. The clause that prohibits accessing object of one type through a lvalue of another type is present in C89/90 (6.3) as well as in C++98 (3.10/15). ... It is just that not all compilers wanted (or dared) to enforce it or rely on it."

Well, gcc is now daring to do so, with its -fstrict-aliasing switch. And this has caused some problems. See, for example, the excellent article http://davmac.wordpress.com/2009/10/ about a Mysql bug, and the equally excellent discussion in http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html.

Some other less-relevant links:

So to repeat, do you have a horror story of your own? Problems not indicated by -Wstrict-aliasing would, of course, be preferred. And other C compilers are also welcome.

Added June 2nd: The first link in Michael Burr's answer, which does indeed qualify as a horror story, is perhaps a bit dated (from 2003). I did a quick test, but the problem has apparently gone away.

Source:

#include <string.h>
struct iw_event {               /* dummy! */
    int len;
};
char *iwe_stream_add_event(
    char *stream,               /* Stream of events */
    char *ends,                 /* End of stream */
    struct iw_event *iwe,       /* Payload */
    int event_len)              /* Real size of payload */
{
    /* Check if it's possible */
    if ((stream + event_len) < ends) {
            iwe->len = event_len;
            memcpy(stream, (char *) iwe, event_len);
            stream += event_len;
    }
    return stream;
}

The specific complaint is:

Some users have complained that when the [above] code is compiled without the -fno-strict-aliasing, the order of the write and memcpy is inverted (which means a bogus len is mem-copied into the stream).

Compiled code, using gcc 4.3.4 on CYGWIN wih -O3 (please correct me if I am wrong--my assembler is a bit rusty!):

_iwe_stream_add_event:
        pushl       %ebp
        movl        %esp, %ebp
        pushl       %ebx
        subl        $20, %esp
        movl        8(%ebp), %eax       # stream    --> %eax
        movl        20(%ebp), %edx      # event_len --> %edx
        leal        (%eax,%edx), %ebx   # sum       --> %ebx
        cmpl        12(%ebp), %ebx      # compare sum with ends
        jae L2
        movl        16(%ebp), %ecx      # iwe       --> %ecx
        movl        %edx, (%ecx)        # event_len --> iwe->len (!!)
        movl        %edx, 8(%esp)       # event_len --> stack
        movl        %ecx, 4(%esp)       # iwe       --> stack
        movl        %eax, (%esp)        # stream    --> stack
        call        _memcpy
        movl        %ebx, %eax          # sum       --> retval
L2:
        addl        $20, %esp
        popl        %ebx
        leave
        ret

And for the second link in Michael's answer,

*(unsigned short *)&a = 4;

gcc will usually (always?) give a warning. But I believe a valid solution to this (for gcc) is to use:

#define CAST(type, x) (((union {typeof(x) src; type dst;}*)&(x))->dst)
// ...
CAST(unsigned short, a) = 4;

I've asked SO whether this is OK in gcc-strict-aliasing-and-casting-through-a-union, but so far nobody disagrees.

Answer

Michael Burr picture Michael Burr · Jun 2, 2010

No horror story of my own, but here are some quotes from Linus Torvalds (sorry if these are already in one of the linked references in the question):

http://lkml.org/lkml/2003/2/26/158:

Date Wed, 26 Feb 2003 09:22:15 -0800 Subject Re: Invalid compilation without -fno-strict-aliasing From Jean Tourrilhes <>

On Wed, Feb 26, 2003 at 04:38:10PM +0100, Horst von Brand wrote:

Jean Tourrilhes <> said:

It looks like a compiler bug to me... Some users have complained that when the following code is compiled without the -fno-strict-aliasing, the order of the write and memcpy is inverted (which mean a bogus len is mem-copied into the stream). Code (from linux/include/net/iw_handler.h) :

static inline char *
iwe_stream_add_event(char *   stream,     /* Stream of events */
                     char *   ends,       /* End of stream */
                    struct iw_event *iwe, /* Payload */
                     int      event_len)  /* Real size of payload */
{
  /* Check if it's possible */
  if((stream + event_len) < ends) {
      iwe->len = event_len;
      memcpy(stream, (char *) iwe, event_len);
      stream += event_len;
  }
  return stream;
}

IMHO, the compiler should have enough context to know that the reordering is dangerous. Any suggestion to make this simple code more bullet proof is welcomed.

The compiler is free to assume char *stream and struct iw_event *iwe point to separate areas of memory, due to strict aliasing.

Which is true and which is not the problem I'm complaining about.

(Note with hindsight: this code is fine, but Linux's implementation of memcpy was a macro that cast to long * to copy in larger chunks. With a correctly-defined memcpy, gcc -fstrict-aliasing isn't allowed to break this code. But it means you need inline asm to define a kernel memcpy if your compiler doesn't know how turn a byte-copy loop into efficient asm, which was the case for gcc before gcc7)

And Linus Torvald's comment on the above:

Jean Tourrilhes wrote: >

It looks like a compiler bug to me...

Why do you think the kernel uses "-fno-strict-aliasing"?

The gcc people are more interested in trying to find out what can be allowed by the c99 specs than about making things actually work. The aliasing code in particular is not even worth enabling, it's just not possible to sanely tell gcc when some things can alias.

Some users have complained that when the following code is compiled without the -fno-strict-aliasing, the order of the write and memcpy is inverted (which mean a bogus len is mem-copied into the stream).

The "problem" is that we inline the memcpy(), at which point gcc won't care about the fact that it can alias, so they'll just re-order everything and claim it's out own fault. Even though there is no sane way for us to even tell gcc about it.

I tried to get a sane way a few years ago, and the gcc developers really didn't care about the real world in this area. I'd be surprised if that had changed, judging by the replies I have already seen.

I'm not going to bother to fight it.

Linus

http://www.mail-archive.com/[email protected]/msg01647.html:

Type-based aliasing is stupid. It's so incredibly stupid that it's not even funny. It's broken. And gcc took the broken notion, and made it more so by making it a "by-the-letter-of-the-law" thing that makes no sense.

...

I know for a fact that gcc would re-order write accesses that were clearly to (statically) the same address. Gcc would suddenly think that

unsigned long a;

a = 5;
*(unsigned short *)&a = 4;

could be re-ordered to set it to 4 first (because clearly they don't alias - by reading the standard), and then because now the assignment of 'a=5' was later, the assignment of 4 could be elided entirely! And if somebody complains that the compiler is insane, the compiler people would say "nyaah, nyaah, the standards people said we can do this", with absolutely no introspection to ask whether it made any SENSE.