How to implement a dictionary (Trie vs HashTable and important issues)?

Denys S. picture Denys S. · Jan 14, 2011 · Viewed 22.5k times · Source

I've ran across several questions and articles saying that dictionary implementation in java is done best using tries. But most of them didn't address important issues, as far as I saw it. So, next is a real world task:

Let's assume that I need to implement a dictionary (let's say something like Lingvo, but simpler) using java. For my particular task it is needed to store word definitions and to perform fast dictionary lookup.

Please, address next questions:

  • What data structure should I use then (Trie or HashTable)?
  • How should it(search, datastruct) be organised if I need the dictionary to be case insensitive?
  • What if I want it(search, dictionary) to be case sensitive?

P.S.: Code examples are highly appreciated. :)

Thanks for answers in advance.

UPDATE:If we're talking about standard DS implementations in java, is it true that HashTable will be the best one for this particular task? Why not HashMap, TreeMap or LinkedHashMap?

Answer

Konrad Rudolph picture Konrad Rudolph · Jan 14, 2011

I want to address just one point in your question:

A trie is not a general-purpose dictionary data structure. The reason is that a trie is a specialized search tree for (sub)string search. Generally, you will be more interested in general search trees, e.g. binary search trees or B-trees.

All these implementations rely on an ordering of the dictionary elements, and all of them have a logarithmic average-case and worst-case runtime for common operations.

A hash table, by contrast, does not require a relative ordering of the elements. Instead, it requires that elements are hashable and equality comparable. The worst-case characteristic of common hash table characteristics is much worse than for trees, namely linear in the number of elements.

However, with a bit of care the average case for hash tables operations can be made constant (i.e. independent of the container size). What’s more, it can be proven that slower operations are exceedingly rare.

In practice, this means that except for very specialized use-cases, hash tables beat tree-based dictionaries hands down.

The downside to this is that hash tables impose an arbitrary-seeming order on its elements. If you are interested in getting the items from your dictionary in sorted order, hash tables are not for you.

(There are other interesting implementations of dictionaries, e.g. skip lists which rival search trees and probabilistic implementations like the Bloom filter.)

A trie-based implementation can only be used if you are dealing with a dictionary of string values, in which case it is actually often a good choice, in particular if many strings in the dictionary share common prefixes and are rather short.