Manacher's algorithm (algorithm to find longest palindrome substring in linear time)

user678392 picture user678392 · May 6, 2012 · Viewed 39.9k times · Source

After spending about 6-8 hours trying to digest the Manacher's algorithm, I am ready to throw in the towel. But before I do, here is one last shot in the dark: can anyone explain it? I don't care about the code. I want somebody to explain the ALGORITHM.

Here seems to be a place that others seemed to enjoy in explaining the algorithm: http://www.leetcode.com/2011/11/longest-palindromic-substring-part-ii.html

I understand why you would want to transform the string, say, 'abba' to #a#b#b#a# After than I'm lost. For example, the author of the previously mentioned website says the key part of the algorithm is:

                      if P[ i' ] ≤ R – i,
                      then P[ i ] ← P[ i' ]
                      else P[ i ] ≥ P[ i' ]. (Which we have to expand past 
                      the right edge (R) to find P[ i ])

This seems wrong, because he/she says at one point that P[i] equals 5 when P[i'] = 7 and P[i] is not less or equal to R - i.

If you are not familiar with the algorithm, here are some more links: http://tristan-interview.blogspot.com/2011/11/longest-palindrome-substring-manachers.html (I've tried this one, but the terminology is awful and confusing. First, some things are not defined. Also, too many variables. You need a checklist to recall what variable is referring to what.)

Another is: http://www.akalin.cx/longest-palindrome-linear-time (good luck)

The basic gist of the algorithm is to find the longest palindrome in linear time. It can be done in O(n^2) with a minimum to medium amount of effort. This algorithm is supposed to be quite "clever" to get it down to O(n).

Answer

Vaughn Cato picture Vaughn Cato · May 6, 2012

I agree that the logic isn't quite right in the explanation of the link. I give some details below.

Manacher's algorithm fills in a table P[i] which contains how far the palindrome centered at i extends. If P[5]=3, then three characters on either side of position five are part of the palindrome. The algorithm takes advantage of the fact that if you've found a long palindrome, you can fill in values of P on the right side of the palindrome quickly by looking at the values of P on the left side, since they should mostly be the same.

I'll start by explaining the case you were talking about, and then I'll expand this answer as needed.

R indicates the index of the right side of the palindrome centered at C. Here is the state at the place you indicated:

C=11
R=20
i=15
i'=7
P[i']=7
R-i=5

and the logic is like this:

if P[i']<=R-i:  // not true
else: // P[i] is at least 5, but may be greater

The pseudo-code in the link indicates that P[i] should be greater than or equal to P[i'] if the test fails, but I believe it should be greater than or equal to R-i, and the explanation backs that up.

Since P[i'] is greater than R-i, the palindrome centered at i' extends past the palindrome centered at C. We know the palindrome centered at i will be at least R-i characters wide, because we still have symmetry up to that point, but we have to search explicitly beyond that.

If P[i'] had been no greater than R-i, then the largest palindrome centered at i' is within the largest palindrome centered at C, so we would have known that P[i] couldn't be any larger than P[i']. If it was, we would have a contradiction. It would mean that we would be able to extend the palindrome centered at i beyond P[i'], but if we could, then we would also be able to extend the palindrome centered at i' due to the symmetry, but it was already supposed to be as large as possible.

This case is illustrated previously:

C=11
R=20
i=13
i'=9
P[i']=1
R-i=7

In this case, P[i']<=R-i. Since we are still 7 characters away from the edge of the palindrome centered at C, we know that at least 7 characters around i are the same as the 7 characters around i'. Since there was only a one character palindrome around i', there is a one character palindrome around i as well.

j_random_hacker noticed that the logic should be more like this:

if P[i']<R-i then
  P[i]=P[i']
else if P[i']>R-i then
  P[i]=R-i
else P[i]=R-i + expansion

If P[i'] < R-i, then we know that P[i]==P[i'], since we're still inside the palindrome centered at C.

If P[i'] > R-i, then we know that P[i]==R-i, because otherwise the palindrome centered at C would have extended past R.

So the expansion is really only necessary in the special case where P[i']==R-i, so we don't know if the palindrome at P[i] may be longer.

This is handled in the actual code by setting P[i]=min(P[i'],R-i) and then always expanding. This way of doing it doesn't increase the time complexity, because if no expansion is necessary, the time taken to do the expansion is constant.