Build trie faster

Bruce picture Bruce · Sep 23, 2013 · Viewed 10.5k times · Source

I'm making an mobile app which needs thousands of fast string lookups and prefix checks. To speed this up, I made a Trie out of my word list, which has about 180,000 words.

Everything's great, but the only problem is that building this huge trie (it has about 400,000 nodes) takes about 10 seconds currently on my phone, which is really slow.

Here's the code that builds the trie.

public SimpleTrie makeTrie(String file) throws Exception {
    String line;
    SimpleTrie trie = new SimpleTrie();

    BufferedReader br = new BufferedReader(new FileReader(file));
    while( (line = br.readLine()) != null) {
        trie.insert(line);
    }
    br.close();

    return trie;
}

The insert method which runs on O(length of key)

public void insert(String key) {
    TrieNode crawler = root;
    for(int level=0 ; level < key.length() ; level++) {
        int index = key.charAt(level) - 'A';
        if(crawler.children[index] == null) {
            crawler.children[index] = getNode();
        }
        crawler = crawler.children[index];
    }
    crawler.valid = true;
}

I'm looking for intuitive methods to build the trie faster. Maybe I build the trie just once on my laptop, store it somehow to the disk, and load it from a file in the phone? But I don't know how to implement this.

Or are there any other prefix data structures which will take less time to build, but have similar lookup time complexity?

Any suggestions are appreciated. Thanks in advance.

EDIT

Someone suggested using Java Serialization. I tried it, but it was very slow with this code:

public void serializeTrie(SimpleTrie trie, String file) {
        try {
            ObjectOutput out = new ObjectOutputStream(new BufferedOutputStream(new FileOutputStream(file)));
            out.writeObject(trie);
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public SimpleTrie deserializeTrie(String file) {
        try {
            ObjectInput in = new ObjectInputStream(new BufferedInputStream(new FileInputStream(file)));
            SimpleTrie trie = (SimpleTrie)in.readObject();
            in.close();
            return trie;
        } catch (IOException | ClassNotFoundException e) {
            e.printStackTrace();
            return null;
        }
    }

Can this above code be made faster?

My trie: http://pastebin.com/QkFisi09

Word list: http://www.isc.ro/lists/twl06.zip

Android IDE used to run code: http://play.google.com/store/apps/details?id=com.jimmychen.app.sand

Answer

Mikhail Korobov picture Mikhail Korobov · Sep 27, 2013

Double-Array tries are very fast to save/load because all data is stored in linear arrays. They are also very fast to lookup, but the insertions can be costly. I bet there is a Java implementation somewhere.

Also, if your data is static (i.e. you don't update it on phone) consider DAFSA for your task. It is one of the most efficient data structures for storing words (must be better than "standard" tries and radix tries both for size and for speed, better than succinct tries for speed, often better than succinct tries for size). There is a good C++ implementation: dawgdic - you can use it to build DAFSA from command line and then use a Java reader for the resulting data structure (example implementation is here).