How does the Failure function used in KMP algorithm work?

hytriutucx picture hytriutucx · Jul 1, 2012 · Viewed 8.9k times · Source

I've tried my best reading most of the literature on this, and still haven't understood anything about how the failure function used in KMP algorithm is constructed. I've been referring mostly to http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=stringSearching tutorial which most of the people consider excellent. However, I still have not understood it. I'd be thankful if you could take the pain of giving me a simpler and easy to understand explanation on it.

Answer

Dominik Gleich picture Dominik Gleich · Jul 1, 2012

The failure function actually tells us this: if you matched X characters of a string, what is the longest suffix of such string such that it's also a prefix of a search string.

You are asking how it's built, the approach is quite straightforward.

If you add a new character at the end of a string, that is you are building f[x], and if it matches with character at position f[x-1], then f[x] is simply f[x-1]+1.

In the other cases where it doesn't match, you try to find smaller and smaller suffixes and check if they match.

For example, you have a word "accadaccac" for which you are building a failure function and you just added the letter 'c'. Let's say you are building a failure function for the last letter, the letter 'c'.

  • First you check the failure function of the previous letter, its failure function was 4 because you can match suffix "acca" with the prefix "acca", and now you add the letter 'c', it doesn't match with the letter 'd' succeeding prefix "acca".
  • So you backtrack, to the last good suffix. You are now searching for a suffix of "acca" which is also a prefix of "accadaccac", but is smaller than "acca". The answer to that question is f[length("acca")-1], or f[3], which is f[3] = 1, because suffix of length 1 (just the letter 'a') is also a prefix of a search string.
  • And now you can try if the 'c' matches with the character on the position 1, and voila, it matches, so now you know f[9] = f[f[8]-1]+1 = 2.

I hope this will help you. Good luck! :)