I try to score the already-aligned sequences. Let say
seq1 = 'PAVKDLGAEG-ASDKGT--SHVVY----------TI-QLASTFE'
seq2 = 'PAVEDLGATG-ANDKGT--LYNIYARNTEGHPRSTV-QLGSTFE'
with given parameters
substitution matrix : blosum62
gap open penalty : -5
gap extension penalty : -1
I did look through the biopython cookbook but all i can get is substitution matrix blogsum62 but I feel that it must have someone already implemented this kind of library.
So can anyone suggest any libraries or shortest code that can solve my problem?
Thx in advance
Jessada,
The Blosum62 matrix (note the spelling ;) is in Bio.SubsMat.MatrixInfo and is a dictionary with tuples resolving to scores (so ('A', 'A')
is worth 4 pts). It doesn't have the gaps, and it's only one triangle of the matrix (so it might ahve ('T', 'A') but not ('A', 'T'). There are some helper functions in Biopython, including some in Bio.Pairwise, but this is what I came up with as an answer:
from Bio.SubsMat import MatrixInfo
def score_match(pair, matrix):
if pair not in matrix:
return matrix[(tuple(reversed(pair)))]
else:
return matrix[pair]
def score_pairwise(seq1, seq2, matrix, gap_s, gap_e):
score = 0
gap = False
for i in range(len(seq1)):
pair = (seq1[i], seq2[i])
if not gap:
if '-' in pair:
gap = True
score += gap_s
else:
score += score_match(pair, matrix)
else:
if '-' not in pair:
gap = False
score += score_match(pair, matrix)
else:
score += gap_e
return score
seq1 = 'PAVKDLGAEG-ASDKGT--SHVVY----------TI-QLASTFE'
seq2 = 'PAVEDLGATG-ANDKGT--LYNIYARNTEGHPRSTV-QLGSTFE'
blosum = MatrixInfo.blosum62
score_pairwise(seq1, seq2, blosum, -5, -1)
Which returns 82 for your alignment. There's almost certianly prettier ways to do all of this, but that should be a good start.