I was just wondering why is that primes are used in a class's hashCode()
method? For example, when using Eclipse to generate my hashCode()
method there is always the prime number 31
used:
public int hashCode() {
final int prime = 31;
//...
}
References:
Here is a good primer on Hashcode and article on how hashing works that I found (C# but the concepts are transferrable): Eric Lippert's Guidelines and rules for GetHashCode()
Prime numbers are chosen to best distribute data among hash buckets. If the distribution of inputs is random and evenly spread, then the choice of the hash code/modulus does not matter. It only has an impact when there is a certain pattern to the inputs.
This is often the case when dealing with memory locations. For example, all 32-bit integers are aligned to addresses divisible by 4. Check out the table below to visualize the effects of using a prime vs. non-prime modulus:
Input Modulo 8 Modulo 7
0 0 0
4 4 4
8 0 1
12 4 5
16 0 2
20 4 6
24 0 3
28 4 0
Notice the almost-perfect distribution when using a prime modulus vs. a non-prime modulus.
However, although the above example is largely contrived, the general principle is that when dealing with a pattern of inputs, using a prime number modulus will yield the best distribution.