I've tried my best reading most of the literature on this, and still haven't understood anything about how the failure function used in KMP algorithm is constructed. I've been referring mostly to http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=stringSearching tutorial which most of the people consider excellent. However, I still have not understood it. I'd be thankful if you could take the pain of giving me a simpler and easy to understand explanation on it.
The failure function actually tells us this: if you matched X characters of a string, what is the longest suffix of such string such that it's also a prefix of a search string.
You are asking how it's built, the approach is quite straightforward.
If you add a new character at the end of a string, that is you are building f[x], and if it matches with character at position f[x-1], then f[x] is simply f[x-1]+1.
In the other cases where it doesn't match, you try to find smaller and smaller suffixes and check if they match.
For example, you have a word "accadaccac"
for which you are building a failure function and you just added the letter 'c'
. Let's say you are building a failure function for the last letter, the letter 'c'
.
"acca"
with the prefix "acca"
, and now you add the letter 'c'
, it doesn't match with the letter 'd'
succeeding prefix "acca"
."acca"
which is also a prefix of "accadaccac"
, but is smaller than "acca". The answer to that question is f[length("acca")-1], or f[3], which is f[3] = 1, because suffix of length 1 (just the letter 'a'
) is also a prefix of a search string.'c'
matches with the character on the position 1, and voila, it matches, so now you know f[9] = f[f[8]-1]+1 = 2.I hope this will help you. Good luck! :)