Moving from Linear Probing to Quadratic Probing (hash collisons)

rfgamaral picture rfgamaral · Feb 27, 2010 · Viewed 10.2k times · Source

My current implementation of an Hash Table is using Linear Probing and now I want to move to Quadratic Probing (and later to chaining and maybe double hashing too). I've read a few articles, tutorials, wikipedia, etc... But I still don't know exactly what I should do.

Linear Probing, basically, has a step of 1 and that's easy to do. When searching, inserting or removing an element from the Hash Table, I need to calculate an hash and for that I do this:

index = hash_function(key) % table_size;

Then, while searching, inserting or removing I loop through the table until I find a free bucket, like this:

do {
    if(/* CHECK IF IT'S THE ELEMENT WE WANT */) {
        // FOUND ELEMENT

        return;
    } else {
        index = (index + 1) % table_size;
    }
while(/* LOOP UNTIL IT'S NECESSARY */);

As for Quadratic Probing, I think what I need to do is change how the "index" step size is calculated but that's what I don't understand how I should do it. I've seen various pieces of code, and all of them are somewhat different.

Also, I've seen some implementations of Quadratic Probing where the hash function is changed to accommodated that (but not all of them). Is that change really needed or can I avoid modifying the hash function and still use Quadratic Probing?

EDIT: After reading everything pointed out by Eli Bendersky below I think I got the general idea. Here's part of the code at http://eternallyconfuzzled.com/tuts/datastructures/jsw_tut_hashtable.aspx:

15   for ( step = 1; table->table[h] != EMPTY; step++ ) {
16     if ( compare ( key, table->table[h] ) == 0 )
17       return 1;
18 
19     /* Move forward by quadratically, wrap if necessary */
20     h = ( h + ( step * step - step ) / 2 ) % table->size;
21   }

There's 2 things I don't get... They say that quadratic probing is usually done using c(i)=i^2. However, in the code above, it's doing something more like c(i)=(i^2-i)/2

I was ready to implement this on my code but I would simply do:

index = (index + (index^index)) % table_size;

...and not:

index = (index + (index^index - index)/2) % table_size;

If anything, I would do:

index = (index + (index^index)/2) % table_size;

...cause I've seen other code examples diving by two. Although I don't understand why...

1) Why is it subtracting the step?
2) Why is it diving it by 2?

Answer

Matthew Slattery picture Matthew Slattery · Feb 28, 2010

There is a particularly simple and elegant way to implement quadratic probing if your table size is a power of 2:

step = 1;

do {
    if(/* CHECK IF IT'S THE ELEMENT WE WANT */) {
        // FOUND ELEMENT

        return;
    } else {
        index = (index + step) % table_size;
        step++;
    }
} while(/* LOOP UNTIL IT'S NECESSARY */);

Instead of looking at offsets 0, 1, 2, 3, 4... from the original index, this will look at offsets 0, 1, 3, 6, 10... (the ith probe is at offset (i*(i+1))/2, i.e. it's quadratic).

This is guaranteed to hit every position in the hash table (so you are guaranteed to find an empty bucket if there is one) provided the table size is a power of 2.


Here is a sketch of a proof:

  1. Given a table size of n, we want to show that we will get n distinct values of (i*(i+1))/2 (mod n) with i = 0 ... n-1.
  2. We can prove this by contradiction. Assume that there are fewer than n distinct values: if so, there must be at least two distinct integer values for i in the range [0, n-1] such that (i*(i+1))/2 (mod n) is the same. Call these p and q, where p < q.
  3. i.e. (p * (p+1)) / 2 = (q * (q+1)) / 2 (mod n)
  4. => (p2 + p) / 2 = (q2 + q) / 2 (mod n)
  5. => p2 + p = q2 + q (mod 2n)
  6. => q2 - p2 + q - p = 0 (mod 2n)
  7. Factorise => (q - p) (p + q + 1) = 0 (mod 2n)
  8. (q - p) = 0 is the trivial case p = q.
  9. (p + q + 1) = 0 (mod 2n) is impossible: our values of p and q are in the range [0, n-1], and q > p, so (p + q + 1) must be in the range [2, 2n-2].
  10. As we are working modulo 2n, we must also deal with the tricky case where both factors are non-zero, but multiply to give 0 (mod 2n):
    • Observe that the difference between the two factors (q - p) and (p + q + 1) is (2p + 1), which is an odd number - so one of the factors must be even, and the other must be odd.
    • (q - p) (p + q + 1) = 0 (mod 2n) => (q - p) (p + q + 1) is divisible by 2n. If n (and hence 2n) is a power of 2, this requires the even factor to be a multiple of 2n (because all of the prime factors of 2n are 2, whereas none of the prime factors of our odd factor are).
    • But (q - p) has a maximum value of n-1, and (p + q + 1) has a maximum value of 2n-2 (as seen in step 9), so neither can be a multiple of 2n.
    • So this case is impossible as well.
  11. Therefore the assumption that there are fewer than n distinct values (in step 2) must be false.

(If the table size is not a power of 2, this falls apart at step 10.)