Why do we use linear probing in hash tables when there is separate chaining linked with lists?

Adilli Adil picture Adilli Adil · May 23, 2014 · Viewed 21.8k times · Source

I recently learned about different methods to deal with collisions in hash tables and saw that the separate chaining with linked lists is always more time efficient than linear probing. For space efficiency, we allocate a predefined memory for linear probing which later on we might not use, but for separate chaining we use memory dynamically.

Is separate chaining with linked list more efficient than linear probing? If so, why do we then use linear probing at all?

Answer

templatetypedef picture templatetypedef · May 23, 2014

I'm surprised that you saw chained hashing to be faster than linear probing - in practice, linear probing is typically significantly faster than chaining. This is primarily due to locality of reference, since the accesses performed in linear probing tend to be closer in memory than the accesses performed in chained hashing.

There are other wins in linear probing. For example, insertions into a linear probing hash table don't require any new allocations (unless you're rehashing the table), so in applications like network routers where memory is scarce, it's nice to know that once the table is set up, the elements can be placed into it with no risk of a malloc fail.

One weakness of linear probing is that, with a bad choice of hash function, primary clustering can cause the performance of the table to degrade significantly. While chained hashing can still suffer from bad hash functions, it's less sensitive to elements with nearby hash codes, which don't adversely impact the runtime. Theoretically, linear probing only gives expected O(1) lookups if the hash functions are 5-independent or if there's sufficient entropy in the keys. There are many ways to address this, since as using the Robin Hood hashing technique or hopscotch hashing, both of which have significantly better worst-cases than vanilla linear probing.

The other weakness of linear probing is that its performance significantly degrades as the load factor approaches 1. You can address this either by rehashing periodically or by using the Robin Hood hashing technique described above.

Hope this helps!