I am writing program that does alot of table lookups. As such, I was perusing the Haskell documentation when I stumbled upon Data.Map
(of course), but also Data.HashMap
and Data.Hashtable
. I am no expert on hashing algorithms and after inspecting the packages they all seem really similar. As such I was wondering:
1: what are the major differences, if any?
2: Which would be the most performant with a high volume of lookups on maps/tables of ~4000 key-value pairs?
1: What are the major differences, if any?
Data.Map.Map
is a balanced binary tree internally, so its time complexity for lookups is O(log n). I believe it's a "persistent" data structure, meaning it's implemented such that mutative operations yield a new copy with only the relevant parts of the structure updated.Data.HashMap.Map
is a Data.IntMap.IntMap
internally, which in turn is implemented as Patricia tree; its time complexity for lookups is O(min(n, W)) where W is the number of bits in an integer. It is also "persistent.". New versions (>= 0.2) use hash array mapped tries. According to the documentation: "Many operations have a average-case complexity of O(log n). The implementation uses a large base (i.e. 16) so in practice these operations are constant time."Data.HashTable.HashTable
is an actual hash table, with time complexity O(1) for lookups. However, it is a mutable data structure -- operations are done in-place -- so you're stuck in the IO
monad if you want to use it.2: Which would be the most performant with a high volume of lookups on maps/tables of ~4000 key-value pairs?
The best answer I can give you, unfortunately, is "it depends." If you take the asymptotic complexities literally, you get O(log 4000) = about 12 for Data.Map
, O(min(4000, 64)) = 64 for Data.HashMap
and O(1) = 1 for Data.HashTable
. But it doesn't really work that way... You have to try them in the context of your code.