Accessing C union members via pointers

Dara Hazeghi · May 29, 2013

Does accessing union members via a pointer, as in the example below, result in undefined behavior in C99? The intent seems clear enough, but I know that there are some restrictions regarding aliasing and unions.

union { int i; char c; } u;

int  *ip = &u.i;
char *ic = &u.c;

*ip = 0;
*ic = 'a';
printf("%c\n", u.c);

Answer

paxdiablo · May 29, 2013

It is unspecified (subtly different from undefined) behaviour to read a union through any member other than the one that was last written. That's listed in C99 Annex J:

The following are unspecified:
   :
   The value of a union member other than the last one stored into (6.2.6.1).

However, since you are writing to c via the pointer, then reading c, this particular example is well defined. It does not matter how you write to the element:

u.c = 'a';        // direct write.
*(&(u.c)) = 'a';  // variation on yours, writing through element pointer.
(&u)->c = 'a';    // writing through union pointer.
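
As a minimal sketch of that point, the three forms can be put into one compilable function. The names union sample and write_then_read_c are mine, purely for illustration; the union has the same members as the one in the question.

```c
/* Same member layout as the union in the question; the tag is hypothetical. */
union sample { int i; char c; };

/* Stores to member c in three equivalent ways, then reads c back.
 * The last store is always to c, so reading c is well defined. */
char write_then_read_c(void)
{
    union sample u;
    char *cp = &u.c;

    u.i = 0;          /* earlier store to a different member */
    u.c = 'x';        /* direct write */
    *cp = 'y';        /* write through a pointer to the member */
    (&u)->c = 'a';    /* write through a pointer to the union */

    return u.c;       /* c was the last member stored into */
}
```

Calling write_then_read_c() returns 'a' whichever of the three forms performs the final store, because the member read is the member last written.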

One issue raised in the comments appears to contradict that. User davmac provides this sample code:

// Compile with "-O3 -std=c99" eg:
//  clang -O3 -std=c99 test.c
//  gcc -O3 -std=c99 test.c
// On clang v3.5.1, output is "123"
// On gcc 4.8.4, output is "1073741824"
//
// Different outputs, so either:
// * program invokes undefined behaviour; both compilers are correct OR
// * compiler vendors interpret standard differently OR
// * one compiler or the other has a bug

#include <stdio.h>

union u
{
    int i;
    float f;
};

int someFunc(union u * up, float *fp)
{
    up->i = 123;
    *fp = 2.0;     // does this set the union member?
    return up->i;  // then this should not return 123!
}

int main(int argc, char **argv)
{
    union u uobj;
    printf("%d\n", someFunc(&uobj, &uobj.f));
    return 0;
}

This outputs different values on different compilers. However, I believe that is because the code itself breaks the rules: it writes to member f and then reads member i which, as shown in Annex J, yields an unspecified value.

There is a footnote 82 in 6.5.2.3 which states:

If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type.

However, since this seems to go against the Annex J comment and it's a footnote to the section dealing with expressions of the form x.y, it may not apply to accesses via a pointer.

One of the major reasons the aliasing rules are strict is to give the compiler more scope for optimisation. To that end, the standard says that reading memory as a different type from the one it was written with is unspecified.

By way of example, consider the function provided:

int someFunc(union u * up, float *fp)
{
    up->i = 123;
    *fp = 2.0;     // does this set the union member?
    return up->i;  // then this should not return 123!
}

Because you're not supposed to alias memory, the implementation is free to assume that up->i and *fp are two distinct objects. It can therefore assume that up->i is unchanged after being set to 123, and simply return 123 without looking at the actual variable contents again.

If you instead changed the statement that writes through the pointer to:

up->f = 2.0;

then that would make footnote 82 applicable and the returned value would be a re-interpretation of the float as an integer.
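
A minimal sketch of that variant follows. The names someFuncViaMember and pun_two_as_int are hypothetical, and the value mentioned afterwards assumes a 32-bit int and an IEEE 754 single-precision float, neither of which the standard guarantees.

```c
union u
{
    int i;
    float f;
};

/* Variant of someFunc that stores through the union member itself,
 * so the compiler can see that both accesses touch the same union. */
int someFuncViaMember(union u *up)
{
    up->i = 123;
    up->f = 2.0f;    /* store via the member, not via a separate float pointer */
    return up->i;    /* bytes of the float reinterpreted as an int (footnote 82) */
}

/* Small wrapper so the function can be called without any setup. */
int pun_two_as_int(void)
{
    union u uobj;
    return someFuncViaMember(&uobj);
}
```

Under those assumptions, pun_two_as_int() returns 1073741824 (0x40000000, the bit pattern of 2.0f), which matches the gcc output quoted in the sample code.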

The reason I don't think that's an issue for the question is that you're writing and then reading the same type, so the aliasing rules don't come into play.


It's interesting to note that the unspecified behaviour is caused not by the function itself, but by calling it thus:

union u up;
int x = someFunc (&up, &(up.f)); // <- aliasing here

If you were instead to call it so:

union u up;
float down;
int x = someFunc (&up, &down); // <- no aliasing

that would not be a problem.
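
For completeness, here is the non-aliasing call as a self-contained sketch. someFunc is the function from the sample code; the wrapper call_without_aliasing is a hypothetical name added only so the call can be exercised.

```c
union u
{
    int i;
    float f;
};

/* The function from the sample code, unchanged in behaviour. */
int someFunc(union u *up, float *fp)
{
    up->i = 123;
    *fp = 2.0f;      /* writes to whatever fp points at */
    return up->i;
}

/* Passes a float that lives outside the union, so up->i and *fp
 * really are distinct objects, as the compiler is entitled to assume. */
int call_without_aliasing(void)
{
    union u up;
    float down;
    return someFunc(&up, &down);
}
```

Here call_without_aliasing() returns 123 at any optimisation level, since the store through fp cannot touch the union.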