I am trying to find the longest common subsequence of 3 or more strings. The Wikipedia article has a great description of how to do this for 2 strings, but I'm a little unsure of how to extend this to 3 or more strings.
There are plenty of libraries for finding the LCS of 2 strings, so I'd like to use one of them if possible. If I have 3 strings A, B and C, is it valid to find the LCS of A and B as X, and then find the LCS of X and C, or is this the wrong way to do it?
I've implemented it in Python as follows:
import difflib
def lcs(str1, str2):
sm = difflib.SequenceMatcher()
sm.set_seqs(str1, str2)
matching_blocks = [str1[m.a:m.a+m.size] for m in sm.get_matching_blocks()]
return "".join(matching_blocks)
print reduce(lcs, ['abacbdab', 'bdcaba', 'cbacaa'])
This outputs "ba", however it should be "baa".
Just generalize the recurrence relation.
For three strings:
dp[i, j, k] = 1 + dp[i - 1, j - 1, k - 1] if A[i] = B[j] = C[k]
max(dp[i - 1, j, k], dp[i, j - 1, k], dp[i, j, k - 1]) otherwise
Should be easy to generalize to more strings from this.