Split string into words

JP19 picture JP19 · Jan 21, 2011 · Viewed 10.8k times · Source

I am looking for the most efficient algorithm to form all possible combinations of words from a string. For example:

Input String: forevercarrot

Output:

forever carrot
forever car rot
for ever carrot
for ever car rot

(All words should be from a dictionary).

I can think of a brute force approach. (find all possible substrings and match) but what would be better ways?

Answer

9000 picture 9000 · Jan 21, 2011

Use a prefix tree for your list of known words. Probably libs like myspell already do so. Try using a ready-made one.

Once you found a match (e.g. 'car'), split your computation: one branch starts to look for a new word ('rot'), another continues to explore variants of current beginning ('carrot').

Effectively you maintain a queue of pairs (start_position, current_position) of offsets into your string every time you split the computation. Several threads can pop from this queue in parallel and try to continue a word that starts from start_position and is already known up to current_position of the pair, but does not end there. When a word is found, it is reported and another pair is popped from the queue. When it's impossible, no result is generated. When a split occurs, a new pair is added to the end of the queue. Initially the queue contains a (0,0).