I am trying to solve a dynamic programming problem from Cormen's Introduction to Algorithms, 3rd edition (pg 405), which asks the following:
A palindrome is a nonempty string over some alphabet that reads the same forward and backward. Examples of palindromes are all strings of length 1, civic, racecar, and aibohphobia (fear of palindromes). Give an efficient algorithm to find the longest palindrome that is a subsequence of a given input string. For example, given the input character, your algorithm should return carac.
Well, I could solve it in two ways:
First solution:
The Longest Palindrome Subsequence (LPS) of a string is simply the Longest Common Subsequence of itself and its reverse. (I built this solution after solving another related question, which asks for the Longest Increasing Subsequence of a sequence.) Since it's simply an LCS variant, it also takes O(n²) time and O(n²) memory.
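A minimal sketch of that first solution, assuming only the length is wanted (the function name is mine, not from the book). It returns the length rather than the string because a particular LCS read back from the table is not guaranteed to be a palindrome itself, even though its length matches the LPS length:

```python
def lps_length_via_lcs(s: str) -> int:
    """Length of the longest palindromic subsequence, as LCS(s, reversed s)."""
    t = s[::-1]
    n = len(s)
    # dp[i][j] = LCS length of the prefixes s[:i] and t[:j]
    dp = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][n]


print(lps_length_via_lcs("character"))
```

The full (n+1) x (n+1) table is kept here to mirror the O(n²) memory figure quoted above.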
Second solution:
The second solution is a bit more elaborate, but also follows the general LCS template. It comes from the following recurrence:
lps(s[i..j]) =
    s[i] + lps(s[i+1..j-1]) + s[j],            if s[i] == s[j];
    max(lps(s[i+1..j]), lps(s[i..j-1])),       otherwise
The pseudocode for calculating the length of the lps is the following:
compute-lps(s, n):
    // palindromes with length 1
    for i = 1 to n:
        c[i, i] = 1
    // palindromes with length up to 2
    for i = 1 to n-1:
        c[i, i+1] = (s[i] == s[i+1]) ? 2 : 1
    // palindromes with length up to j+1
    for j = 2 to n-1:
        for i = 1 to n-j:
            if s[i] == s[i+j]:
                c[i, i+j] = 2 + c[i+1, i+j-1]
            else:
                c[i, i+j] = max( c[i+1, i+j] , c[i, i+j-1] )
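For reference, here is one way the table-filling scheme above could look as runnable Python, using 0-based indices and with a walk over the finished table to recover one longest palindrome. The function name and the reconstruction step are my own additions, not part of the pseudocode:

```python
def longest_palindromic_subsequence(s: str) -> str:
    n = len(s)
    if n == 0:
        return ""
    # c[i][j] = LPS length of s[i..j] (inclusive); cells with i > j stay 0.
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        c[i][i] = 1
    for gap in range(1, n):            # gap = j - i
        for i in range(n - gap):
            j = i + gap
            if s[i] == s[j]:
                c[i][j] = 2 + c[i + 1][j - 1]
            else:
                c[i][j] = max(c[i + 1][j], c[i][j - 1])

    # Walk from the full range inwards, collecting the left half.
    left, middle = [], ""
    i, j = 0, n - 1
    while i <= j:
        if i == j:
            middle = s[i]              # single character left in the centre
            break
        if s[i] == s[j]:
            left.append(s[i])
            i, j = i + 1, j - 1
        elif c[i + 1][j] >= c[i][j - 1]:
            i += 1
        else:
            j -= 1
    half = "".join(left)
    return half + middle + half[::-1]


print(longest_palindromic_subsequence("character"))  # carac
```

The reconstruction reads cells from the whole table, which is why this version needs O(n²) memory.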
It still takes O(n²) time and memory if I want to actually construct the lps (because I'll need all cells of the table). Analysing related problems, such as LIS, which can be solved with approaches other than LCS-like ones and with less memory (LIS is solvable with O(n) memory), I was wondering if it's possible to solve this one with O(n) memory, too.
LIS achieves this bound by linking the candidate subsequences, but with palindromes it's harder because what matters here is not the previous element in the subsequence, but the first. Does anyone know if it is possible to do it, or are the previous solutions memory optimal?
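To make the memory question concrete: if only the length is needed, the same recurrence can be evaluated while keeping just two rows of the table. A sketch under that assumption (the function name is hypothetical), which does not recover the palindrome itself:

```python
def lps_length_linear_memory(s: str) -> int:
    n = len(s)
    if n == 0:
        return 0
    # prev[j] = LPS length of s[i+1..j]; cur[j] is filled in for s[i..j].
    prev = [0] * n
    for i in range(n - 1, -1, -1):
        cur = [0] * n
        cur[i] = 1
        for j in range(i + 1, n):
            if s[i] == s[j]:
                cur[j] = 2 + prev[j - 1]
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[n - 1]


print(lps_length_linear_memory("character"))
```

This is still O(n²) time. The open part of the question is constructing the subsequence within the same O(n) memory.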
Here is a very memory efficient version, but I haven't demonstrated that it is always O(n) memory. (With a preprocessing step it can do better than O(n²) CPU, though O(n²) is the worst case.)
Start from the left-most position. For each position, keep track of a table of the farthest out points at which you can generate reflected subsequences of length 1, 2, 3, etc. (Meaning that a subsequence to the left of our point is reflected to the right.) For each reflected subsequence we store a pointer to the next part of the subsequence.
As we work our way right, we search from the RHS of the string to the position for any occurrences of the current element, and try to use those matches to improve the bounds we previously had. When we finish, we look at the longest mirrored subsequence and we can easily construct the best palindrome.
Let's consider this for character.
We start with the bounds (0, 10), which are off the ends of the string.
c at position 1. The best mirrored subsequences, as (length, end, start), are now [(0, 10, 0), (1, 6, 1)]. (I'll leave out the linked list you need to generate to actually find the palindrome.)
h at position 2. We do not improve the bounds [(0, 10, 0), (1, 6, 1)].
a at position 3. We improve the bounds to [(0, 10, 0), (1, 6, 1), (2, 5, 3)].
r at position 4. We improve the bounds to [(0, 10, 0), (1, 9, 4), (2, 5, 3)]. (This is where the linked list would be useful.)
Working through the rest of the list, we do not improve that set of bounds.
So we wind up with the longest mirrored list being of length 2, and we'd follow the linked list (that I didn't record in this description) to find that it is ac. Since the ends of that list are at positions (5, 3), we can flip the list, insert character 4, then append the list to get carac.
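Here is a rough Python sketch of one possible reading of the procedure above. It is not a definitive implementation: the function name, the tuple layout (end, start, parent) and the 0-based positions are my assumptions, so the bounds are shifted by one relative to the 1-based walkthrough. It also records the best total length at the moment each pairing is formed. That is an addition on my part, since the farthest-out entry kept in the table is not always the one that leaves room for a middle character.

```python
from collections import defaultdict


def lps_mirrored(s: str) -> str:
    n = len(s)
    if n == 0:
        return ""
    positions = defaultdict(list)          # character -> sorted positions
    for idx, ch in enumerate(s):
        positions[ch].append(idx)

    # table[k] = (end, start, parent): mirrored subsequence of length k whose
    # right half starts farthest right. Level 0 is the empty one, bounded by
    # (n, -1), just off both ends of the string.
    table = [(n, -1, None)]
    best_len, best_node = 1, None          # any single character is a palindrome

    for i, ch in enumerate(s):
        pending = {}                       # level -> node, based on the table before i
        for j in reversed(positions[ch]):  # scan matches from the right-hand side
            if j <= i:
                break
            k = len(table) - 1
            while table[k][0] <= j:        # deepest level lying strictly outside j
                k -= 1
            node = (j, i, table[k])
            total = 2 * (k + 1) + (1 if j - i > 1 else 0)
            if total > best_len:
                best_len, best_node = total, node
            pending.setdefault(k + 1, node)  # first hit has the largest j
        for level, node in pending.items():
            if level == len(table):
                table.append(node)
            elif node[0] > table[level][0]:
                table[level] = node

    if best_node is None:
        return s[0]
    inner_end, inner_start, _ = best_node
    right = []
    node = best_node
    while node[2] is not None:             # stop at the level-0 sentinel
        right.append(s[node[0]])           # innermost first, so left-to-right order
        node = node[2]
    middle = s[inner_start + 1] if inner_end - inner_start > 1 else ""
    return "".join(reversed(right)) + middle + "".join(right)


print(lps_mirrored("character"))  # carac
```

The parent pointers play the role of the linked lists in the description. Each table row holds one node, and older nodes stay alive only while some chain still references them. I have not tried to bound the total number of live nodes.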
In general the maximum memory that it will require is to store all of the lengths of the maximal mirrored subsequences plus the memory to store the linked lists of said subsequences. Typically this will be a very small amount of memory.
As a classic memory/CPU tradeoff, you can preprocess the list once in time O(n) to generate an O(n)-sized hash of arrays of where specific sequence elements appear. This lets you scan for "improve mirrored subsequence with this pairing" without having to consider the whole string, which should generally be a major saving on CPU for longer strings.
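A small sketch of that preprocessing step, assuming a dict of sorted position lists plus a binary search to pull out only the matches inside a given window. Both helper names are made up for illustration:

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_position_index(s):
    """Map each character to the sorted list of positions where it occurs."""
    index = defaultdict(list)
    for pos, ch in enumerate(s):
        index[ch].append(pos)      # appended in increasing order, so already sorted
    return index


def occurrences_between(index, ch, lo, hi):
    """Positions p of ch with lo < p < hi, listed from right to left."""
    plist = index.get(ch, [])
    a = bisect_right(plist, lo)
    b = bisect_left(plist, hi)
    return plist[a:b][::-1]


index = build_position_index("character")
print(occurrences_between(index, "a", 0, 9))  # [4, 2]
```

Building the index is one pass over the string, and each lookup touches only the occurrences of the one character being paired rather than the whole suffix.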