What is the difference between chaining and probing in hash tables?

th3an0maly picture th3an0maly · Apr 10, 2016 · Viewed 14.5k times · Source

How do they work? What is their major differences? What are their respective trade-offs? What are their types (if any)? When is one preferred to another (if at all)?

PS: I've already gone through Anagrams - Hashing with chaining and probing in C and Why do we use linear probing in Hash tables when there is separate chaining linked with lists?, but neither seems to draw a contrast between the two methods.

Answer

Andrew picture Andrew · Apr 10, 2016

Chaining and open-addressing (a simple implementation of which is based on linear-probing) are used in Hashtables to resolve collisions. A collision happens whenever the hash function for two different keys points to the same location to store the value.

In order to store both values, with different keys that would have been stored in the same location, chaining and open-addressing take different approaches: while chaining resolves the conflict by created a linked list of values with the same hash; open-addressing tries to attempts to find a different location to store the values with the same hash.

An interesting alternative to linear-probing for open-addressing conflict resolution is what is known as double-hashing.

The main difference that arises is in the speed of retrieving the value being hashed under different conditions.

Let's start with chaining as collision resolution. Notice here that after calculating the hash function for Lisa, you need to get the first element from the list to get the value required. Therefore, you access the pointer to the head of the list and then the value: 2 operations.

On the other hand, with open-addressing, such as linear-probing, when there is no collision, you immediately obtain the value you are seeking. That is, you require only 1 operation, which is faster.

However, when your HashTable starts to get full, and you have a high load factor, due to collisions happening more often, probing will require you to check more Hashtable locations before you find the actual value that you want. At about a load factor of 0.8, chaining starts to become more efficient due to multiple collisions: you would have to probe a lot of empty cells in order to find the actual value you want with probing, while with chaining you have a list of values that have the same hash key.

This is just a quick overview, as the actual data, the distribution of the keys, the hash function used and the precise implementation of collision resolution will make a difference in your actual speed.