What's a correct and good way to implement __hash__()?

user229898 picture user229898 · May 26, 2010 · Viewed 70.6k times · Source

What's a correct and good way to implement __hash__()?

I am talking about the function that returns a hashcode that is then used to insert objects into hashtables aka dictionaries.

As __hash__() returns an integer and is used for "binning" objects into hashtables I assume that the values of the returned integer should be uniformly distributed for common data (to minimize collisions). What's a good practice to get such values? Are collisions a problem? In my case I have a small class which acts as a container class holding some ints, some floats and a string.

Answer

John Millikin picture John Millikin · May 26, 2010

An easy, correct way to implement __hash__() is to use a key tuple. It won't be as fast as a specialized hash, but if you need that then you should probably implement the type in C.

Here's an example of using a key for hash and equality:

class A:
    def __key(self):
        return (self.attr_a, self.attr_b, self.attr_c)

    def __hash__(self):
        return hash(self.__key())

    def __eq__(self, other):
        if isinstance(other, A):
            return self.__key() == other.__key()
        return NotImplemented

Also, the documentation of __hash__ has more information, that may be valuable in some particular circumstances.