Why is ValueType.GetHashCode() implemented like it is?

alh84001 picture alh84001 · Oct 1, 2010 · Viewed 8.6k times · Source

From ValueType.cs

**Action: Our algorithm for returning the hashcode is a little bit complex. We look 
**        for the first non-static field and get it's hashcode.  If the type has no 
**        non-static fields, we return the hashcode of the type. We can't take the
**        hashcode of a static member because if that member is of the same type as 
**        the original type, we'll end up in an infinite loop.

I got bitten by this today when I was using a KeyValuePair as a key in a Dictionary (it stored xml attribute name (enum) and it's value (string)), and expected for it to have it's hashcode computed based on all its fields, but according to implementation it only considered the Key part.

Example (c/p from Linqpad):

void Main()
{
    var kvp1 = new KeyValuePair<string, string>("foo", "bar");
    var kvp2 = new KeyValuePair<string, string>("foo", "baz");

    // true
    (kvp1.GetHashCode() == kvp2.GetHashCode()).Dump();
}

The first non-static field I guess means the first field in declaratin order, which could also cause trouble when changing variable order in source for whatever reason, and believing it doesn't change the code semantically.

Answer

Hans Passant picture Hans Passant · Oct 1, 2010

The actual implementation of ValueType.GetHashCode() doesn't quite match the comment. It has two versions of the algorithm, fast and slow. It first checks if the struct contains any members of a reference type and if there is any padding between the fields. Padding is empty space in a structure value, created when the JIT compiler aligns the fields. There's padding in a struct that contains bool and int (3 bytes) but no padding when it contains int and int, they fit snugly together.

Without a reference and without padding, it can do the fast version since every bit in the structure value is a bit that belongs to a field value. It simply xors 4 bytes at a time. You'll get a 'good' hash code that considers all the members. Many simple structure types in the .NET framework behave this way, like Point and Size.

Failing that test, it does the slow version, the moral equivalent of reflection. That's what you get, your KeyValuePair<> contains references. And this one only checks the first candidate field, like the comment says. This is surely a perf optimization, avoiding burning too much time.

Yes, nasty detail and not that widely known. It is usually discovered when somebody notices that their collection code sucks mud.

One more excruciating detail: the fast version has a bug that bytes when the structure contains a field of a type decimal. The values 12m and 12.0m are logically equal but they don't have the same bit pattern. GetHashCode() will say that they are not equal. Ouch.