Hash tables v self-balancing search trees

Vaibhav Bajpai picture Vaibhav Bajpai · Jul 16, 2010 · Viewed 10.1k times · Source

I am curious to know what is the reasoning that could overweighs towards using a self-balancing tree technique to store items than using a hash table.

I see that hash tables cannot maintain the insertion-order, but I could always use a linked list on top to store the insertion-order sequence.

I see that for small number of values, there is an added cost of of the hash-function, but I could always save the hash-function together with the key for faster lookups.

I understand that hash tables are difficult to implement than the straight-forward implementation of a red-black tree, but in a practical implementation wouldn't one be willing to go an extra mile for the trouble?

I see that with hash tables it is normal for collisions to occur, but with open-addressing techniques like double hashing that allow to save the keys in the hash table itself, hasn't the problem been reduced to the effect of not tipping the favor towards red black trees for such implementations?

I am curious if I am strictly missing a disadvantage of hash table that still makes red black trees quite viable data structure in practical applications (like filesystems, etc.).

Answer

unbeli picture unbeli · Jul 16, 2010

Here is what I can think of:

  1. There are kinds of data which cannot be hashed (or is too expensive to hash), therefore cannot be stored in hash tables.
  2. Trees keep data in the order you need (sorted), not insertion order. You can't (effectively) do that with hash table, even if you run a linked list through it.
  3. Trees have better worst-case performace