Berkeleydb - B-Tree versus Hash Table

rakeshr picture rakeshr · Nov 9, 2010 · Viewed 8.8k times · Source

I am trying to understand what should drive the choice of the access method while using a BerkeleyDB : B-Tree versus HashTable. A Hashtable provides O(1) lookup but inserts are expensive (using Linear/Extensible hashing we get amortized O(1) for insert). But B-Trees provide log N (base B) lookup and insert times. A B-Tree can also support range queries and allow access in sorted order.

  1. Apart from these considerations what else should be factored in?
  2. If I don't need to support range queries can I just use a Hashtable access method?

Answer

hyc picture hyc · Aug 23, 2012

When your data sets get very large, B-trees are still better because the majority of the internal metadata may still fit in cache. Hashes, by their nature (uniform random distribution of data) are inherently cache-unfriendly. I.e., once the total size of the data set exceeds the working memory size, hash performance drops off a cliff while B-tree performance degrades gracefully (logarithmically, actually).