I need to find a dynamic programming algorithm to solve this problem. I tried but couldn't figure it out. Here is the problem:
You are given a string of n characters s[1...n], which you believe to be a corrupted text document in which all punctuation has vanished (so that it looks something like "itwasthebestoftimes..."). You wish to reconstruct the document using a dictionary, which is available in the form of a Boolean function dict(*) such that, for any string w, dict(w) has value 1 if w is a valid word, and has value 0 otherwise.
Let the length of your compacted document be N.
Let b(n) be a boolean: true if the document can be split into words starting from position n in the document.
b(N) is true (since the empty string can be split into 0 words). Given b(N), b(N - 1), ... b(N - k), you can construct b(N - k - 1) by considering all words that start at character N - k - 1. If there's any such word, w, with b(N - k - 1 + len(w)) set, then set b(N - k - 1) to true. If there's no such word, then set b(N - k - 1) to false.
Eventually, you compute b(0) which tells you if the entire document can be split into words.
In pseudo-code:
def try_to_split(doc):
N = len(doc)
b = [False] * (N + 1)
b[N] = True
for i in range(N - 1, -1, -1):
for word starting at position i:
if b[i + len(word)]:
b[i] = True
break
return b
There's some tricks you can do to get 'word starting at position i' efficient, but you're asked for an O(N^2) algorithm, so you can just look up every string starting at i in the dictionary.
To generate the words, you can either modify the above algorithm to store the good words, or just generate it like this:
def generate_words(doc, b, idx=0):
length = 1
while true:
assert b(idx)
if idx == len(doc): return
word = doc[idx: idx + length]
if word in dictionary and b(idx + length):
output(word)
idx += length
length = 1
Here b is the boolean array generated from the first part of the algorithm.