union members may not have constructors, but `std::pair` okay?

Frank picture Frank · Dec 11, 2009 · Viewed 8.9k times · Source

union members may not have destructors or constructors. So I can't template the following class Foo on my own MyClass if MyClass has a constructor:

template<class T>
struct Foo {
  T val;
  Foo(T val_) : val(val_) {}
  size_t hash() const {
    union {T f; size_t s;} u = { val };
    return u.s;
  }
};
struct MyClass {
  bool a;
  double b;
  MyClass(bool a_, double b_) : a(a_), b(b_) {}
};

If I do it anyway I get this error:

member 'MyClass Foo<T>::hash() const 
[with T = MyClass]::<anonymous union>::f' with constructor 
not allowed in union

To get around it I created MyClass with an awkward construction function, which copies the thing around first:

struct MyClass {
  bool a;
  double b;
};
MyClass createMyClass(bool a, double b) {
  MyClass m;
  m.a = a;
  m.b = b;
  return m;
}

But I'm wondering if there is a better way than using this createMyClass function. A constructor would be more efficient, and as a critical component MyClass and Foo<MyClass> are constructed millions of times in my code.

std::pair

I'm also a bit surprised that it is possible to use std::pair in the union:

Foo<std::pair<bool, double> > f2(std::make_pair(true, 3.12));

To my knowledge, std::pair (see code) has a constructor?

Answer

Andreas Brinck picture Andreas Brinck · Dec 11, 2009

EDIT: My original stance on std::pair was wrong, it shouldn't be allowed in a union. For a class to be a valid member of a union it must have a trivial constructor according to standard 9.5.1. The definition of a trivial constructor is this, from paragraph 12.1.5:

If there is no user-declared constructor for class X, a default constructor is implicitly declared. An implicitly-declared default constructor is an inline public member of its class. A constructor is trivial if it is an implicitly-declared default constructor and if:

  • its class has no virtual functions and no virtual base classes, and
  • all the direct base classes of its class have trivial constructors, and
  • for all the nonstatic data members of its class that are of class type (or array thereof), each such class has a trivial constructor

Paragraph 20.2.2.2 states that the following constructor must be available in a pair:

pair(const T1& x, const T2& y);

as soon as this constructor is supplied no default constructor will be implicitly declared.

The funny thing here is that my compiler (Visual Studio 2008) seems to give std::pair special treatment. If I copy the code from the std::pair implementation and place it in my own namespace foo the unions don't work :)

namespace foo {
    template<class _Ty1, class _Ty2> struct pair {
        typedef _Ty1 first_type;
        typedef _Ty2 second_type;
        pair() : first(_Ty1()), second(_Ty2()) {
        }
    }
}

//This doesn't work in VC2008
union Baz {
    foo::pair<bool, double> a;
    int b;
}
//This works in VC2008
union Buz {
    std::pair<bool, double> a;
    int b;
}

Your solution is a common way of getting around this problem. I usually prepend the class name with a C (short for construct) to partially mimic the ordinary constructor syntax, this would in your case become CMyClass(a, b).

As Steve and Matthieu has pointed out you're not using a very good hash function though. Firstly there's no real guarantee (I think, please correct me if I'm wrong) that f and s in the union will even partially occupy the same memory space, and secondly even if they in practice will probably will share the first min(sizeof(s), sizeof(f)) bytes this means that for MyClass you're only hashing on part of the value. In this case you will hash on the value of the bool a, in this case there's two options:

  1. Your compiler uses int as the internal representation for the bool in which case your hash function will only return two values, one for true and one for false.
  2. Your compiler uses char as the internal representation for the bool. In this case the value will probably be padded to at least sizeof(int), either with zeroes in which case you have the same situation as 1. or with whatever random data is on the stack when MyClass is allocated which means you get random hash values for the same input.

If you need to hash by the entire value of T I would copy the data into a temporary buffer like Steve suggests and then use one of the variable-length hash functions discussed here.