Overview of Pairwise Sequence Alignment
Presenter: 林哲鋒
• Dynamic Programming
  • Applied to optimization problems
  • Useful when
    • The problem can be recursively divided into sub-problems
    • The sub-problems are not independent
• Needleman-Wunsch is a global alignment technique that fills its score matrix iteratively. In its basic form it uses no gap penalty, and it can be extended to a fixed gap penalty.
• Smith-Waterman is a local alignment technique that uses a recursive formulation. It extends the Longest Common Subsequence (LCS) problem and can be generalized to solve both local and global alignment.
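The Longest Common Subsequence (LCS) Problem
• First, what is a subsequence? A subsequence is the sequence obtained by deleting some (possibly zero) characters from a sequence. For example, pred, sdn, and predent are all subsequences of "president".
• Given two sequences, the LCS problem is to find a sequence such that (1) it is a subsequence of both sequences, and (2) it is as long as possible.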
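The definition of a subsequence can be checked mechanically. The following is a minimal sketch; the function name `is_subsequence` is chosen here for illustration and does not come from the slides.

```python
def is_subsequence(s: str, t: str) -> bool:
    """Return True if s can be obtained from t by deleting zero or more characters."""
    remaining = iter(t)
    # `ch in remaining` consumes the iterator up to and including the first match,
    # so each character of s must be found after the previous one.
    return all(ch in remaining for ch in s)


# The three examples from the slide, checked against "president".
for candidate in ("pred", "sdn", "predent"):
    print(candidate, is_subsequence(candidate, "president"))
```

Because the iterator is shared across the whole check, the characters must appear in the same order in both strings, which is exactly what distinguishes a subsequence from an arbitrary selection of letters.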
LCS
• Example:
  Sequence 1: president
  Sequence 2: providence
• One LCS is priden (PResIDENt, PRovIDENce).
LCS
• Another example:
  Sequence 1: algorithm
  Sequence 2: alignment
• One LCS is algm or algt (ALGorithM, ALiGnMent).
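How to compute LCS?
• Given two sequences $A = a_1 a_2 \ldots a_n$ and $B = b_1 b_2 \ldots b_m$, let $len(i, j)$ denote the length of an LCS of $a_1 a_2 \ldots a_i$ and $b_1 b_2 \ldots b_j$. The following recurrence can be used to compute $len(i, j)$:
$$
len(i, j) =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0, \\
len(i-1, j-1) + 1 & \text{if } i, j > 0 \text{ and } a_i = b_j, \\
\max\{\, len(i, j-1),\ len(i-1, j) \,\} & \text{otherwise.}
\end{cases}
$$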
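As a rough illustration of how a table of LCS lengths can be filled and then traced back, here is a short sketch. It uses the standard LCS recurrence (empty prefix gives 0, equal last characters extend the diagonal by 1, otherwise take the larger neighbour); the function name `lcs` and the tie-breaking rule in the traceback are assumptions made for this example.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # length[i][j] holds the LCS length of a[:i] and b[:j]; row 0 and column 0 stay 0.
    length = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                length[i][j] = length[i - 1][j - 1] + 1
            else:
                length[i][j] = max(length[i - 1][j], length[i][j - 1])

    # Walk back from the bottom-right corner, collecting matched characters.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif length[i - 1][j] >= length[i][j - 1]:
            i -= 1  # skip a character of a
        else:
            j -= 1  # skip a character of b
    return "".join(reversed(out))


print(lcs("president", "providence"))      # priden
print(len(lcs("algorithm", "alignment")))  # 4
```

When several LCSs exist, as for algorithm and alignment, which one is returned depends on how ties are broken during the traceback; only the length is guaranteed.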
[Figure labels: insertion, deletion]
Identification of Common Molecular Subsequences
T. F. Smith and M. S. Waterman
J. Mol. Biol. (1981) 147, 195-197
ABSTRACT
• The identification of maximally homologous subsequences among sets of long sequences is an important problem.
• The goal is to find a pair of segments, one from each of two long sequences, such that no other pair of segments has greater similarity.
Algorithm
• The two molecular sequences are $A = a_1 a_2 \ldots a_n$ and $B = b_1 b_2 \ldots b_m$.
• A similarity $s(a, b)$ is given between sequence elements $a$ and $b$.
• Deletions of length $k$ are given weight $W_k$.
• Set up a matrix $H$. First set $H_{k0} = H_{0l} = 0$ for $0 \le k \le n$ and $0 \le l \le m$.
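Algorithm cont.
• $H_{ij}$ is the maximum similarity of two segments ending in $a_i$ and $b_j$.
• These values are obtained from the relationship
$$
H_{ij} = \max\Bigl\{\, H_{i-1,j-1} + s(a_i, b_j),\ \max_{k \ge 1}\{ H_{i-k,j} - W_k \},\ \max_{l \ge 1}\{ H_{i,j-l} - W_l \},\ 0 \,\Bigr\}, \qquad 1 \le i \le n,\ 1 \le j \le m.
$$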
The formula for $H_{ij}$ follows by considering the possibilities for ending the segments at any $a_i$ and $b_j$.
• (1) If $a_i$ and $b_j$ are associated, the similarity is $H_{i-1,j-1} + s(a_i, b_j)$.
• (2) If $a_i$ is at the end of a deletion of length $k$, the similarity is $H_{i-k,j} - W_k$.
• (3) If $b_j$ is at the end of a deletion of length $l$, the similarity is $H_{i,j-l} - W_l$.
• (4) Finally, a zero is included to prevent calculated negative similarity, indicating no similarity up to $a_i$ and $b_j$.
The pair of segments with maximum similarity is found by first locating the maximum element of $H$.
• The other matrix elements leading to this maximum value are then sequentially determined with a traceback procedure ending with an element of $H$ equal to zero.
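The matrix fill and traceback can be sketched in a few lines. This is an illustrative implementation only: it takes the similarity and the gap weight as callables `s` and `w` (names assumed here), tries every deletion length at each cell as in cases (2) and (3), and so runs in cubic time. The usage lines borrow the scores +8, -5, and -3 per gap symbol from the local alignment example later in the deck.

```python
def smith_waterman(a, b, s, w):
    """Return (best score, aligned segment of a, aligned segment of b)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # row 0 and column 0 stay 0
    move = {}                                   # (i, j) -> predecessor cell, or None
    best, best_cell = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score, prev = 0, None               # case (4): zero, no similarity
            cand = H[i - 1][j - 1] + s(a[i - 1], b[j - 1])
            if cand > score:                    # case (1): a_i and b_j associated
                score, prev = cand, (i - 1, j - 1)
            for k in range(1, i + 1):           # case (2): a_i ends a deletion of length k
                cand = H[i - k][j] - w(k)
                if cand > score:
                    score, prev = cand, (i - k, j)
            for l in range(1, j + 1):           # case (3): b_j ends a deletion of length l
                cand = H[i][j - l] - w(l)
                if cand > score:
                    score, prev = cand, (i, j - l)
            H[i][j] = score
            move[(i, j)] = prev
            if score > best:
                best, best_cell = score, (i, j)

    # Traceback from the maximum element until an element equal to zero is reached.
    top, bottom = [], []
    cell = best_cell
    while move.get(cell) is not None:
        i, j = cell
        pi, pj = move[cell]
        if pi < i and pj < j:                   # aligned pair
            top.append(a[i - 1])
            bottom.append(b[j - 1])
        elif pj == j:                           # characters of a against gap symbols
            top.append(a[pi:i])
            bottom.append("-" * (i - pi))
        else:                                   # gap symbols against characters of b
            top.append("-" * (j - pj))
            bottom.append(b[pj:j])
        cell = (pi, pj)
    return best, "".join(reversed(top)), "".join(reversed(bottom))


score, seg_a, seg_b = smith_waterman(
    "CTTAACT", "CGGATCAT",
    s=lambda x, y: 8 if x == y else -5,  # match / mismatch
    w=lambda k: 3 * k,                   # assumed linear weight: 3 per gap symbol
)
print(score)
print(seg_a)
print(seg_b)
```

With a linear weight the inner loops over `k` and `l` are unnecessary, since a gap can always be extended one symbol at a time; they are kept here only to mirror the general recurrence with arbitrary $W_k$.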
An example is shown in Figure 1 of the paper.
• A match, $a_i = b_j$, scores $s(a_i, b_j) = 1$; a mismatch scores $-1/3$.
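Global Alignment vs. Local Alignment
• global alignment: aligns the two sequences over their entire lengths.
• local alignment: aligns a pair of segments, one from each sequence, chosen to have maximum similarity.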
[Figure: Global Alignment vs. Local Alignment, with panels labeled global and local]
Local alignment example
• Match: 8
• Mismatch: -5
• Gap symbol: -3
• Sequences: CTTAACT and CGGATCAT
• Local alignment:
  A - C - T
  A T C A T
  8 - 3 + 8 - 3 + 8 = 18
Global alignment
• Needleman-Wunsch (1970)
• Three steps in dynamic programming
  • Initialization
  • Matrix fill (scoring)
  • Traceback (alignment)
• Match: +8 (w(x, y) = 8, if x = y)
• Mismatch: -5 (w(x, y) = -5, if x ≠ y)
• Each gap symbol: -3 (w(-, x) = w(x, -) = -3)
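The three steps can be sketched as follows, using the scores listed on this slide as defaults. The function name `needleman_wunsch`, the matrix name `S`, and the order in which ties are resolved during traceback are assumptions of this sketch, so the alignment it returns may differ from another equally good one.

```python
def needleman_wunsch(a, b, match=8, mismatch=-5, gap=-3):
    """Return (optimal global score, aligned a, aligned b) under a per-symbol gap score."""
    n, m = len(a), len(b)

    # Step 1: initialization. Aligning a prefix with nothing costs one gap per symbol.
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap

    # Step 2: matrix fill. Each cell takes the best of pair, gap in b, gap in a.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + w,
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)

    # Step 3: traceback from the bottom-right corner to the top-left corner.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        w = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + w:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return S[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = needleman_wunsch("CTTAACT", "CGGATCAT")
print(score)
print(top)
print(bottom)
```

Unlike the local case, the traceback here always runs from the last cell back to the origin, so every character of both sequences appears in the result.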
Global alignment example 1
• Sequences: CTTAACT and CGGATCAT
  C T T A A C - T
  C G G A T C A T
  8 - 5 - 5 + 8 - 5 + 8 - 3 + 8 = 14
Global alignment example 2
• Sequences: CAATTGA and GAATCTGC
  C A A T - T G A
  G A A T C T G C
  -5 + 8 + 8 + 8 - 3 + 8 + 8 - 5 = 27
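The column-by-column arithmetic in the two examples can be reproduced with a small scoring helper. This is only a sketch; `score_alignment` is a name introduced here, and it scores an alignment that is already given rather than searching for the best one.

```python
def score_alignment(top, bottom, match=8, mismatch=-5, gap=-3):
    """Sum the column scores of a gapped alignment written as two equal-length rows."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += gap        # a gap symbol in either row
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


print(score_alignment("CTTAAC-T", "CGGATCAT"))  # 14, global alignment example 1
print(score_alignment("CAAT-TGA", "GAATCTGC"))  # 27, global alignment example 2
```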
Affine gap penalties
• A gap of length k is penalized x + k·y, where x is the gap-open penalty and y is the gap-symbol penalty.
• Three cases for alignment endings:
  • ...x over ...x: an aligned pair
  • ...x over ...-: a deletion
  • ...- over ...x: an insertion
Affine gap penalties
• Let $D(i, j)$ denote the maximum score of any alignment between $a_1 a_2 \ldots a_i$ and $b_1 b_2 \ldots b_j$ ending with a deletion.
• Let $I(i, j)$ denote the maximum score of any alignment between $a_1 a_2 \ldots a_i$ and $b_1 b_2 \ldots b_j$ ending with an insertion.
• Let $S(i, j)$ denote the maximum score of any alignment between $a_1 a_2 \ldots a_i$ and $b_1 b_2 \ldots b_j$.
Affine gap penalties (A gap of length k is penalized x + k·y.)
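The recurrences for this slide did not survive extraction. A common three-matrix formulation that matches the definitions of $D$, $I$, and $S$ is given below; it is an assumption, not a recovery of the slide's own formulas:

$$
\begin{aligned}
D(i, j) &= \max\{\, D(i-1, j) - y,\ S(i-1, j) - x - y \,\} \\
I(i, j) &= \max\{\, I(i, j-1) - y,\ S(i, j-1) - x - y \,\} \\
S(i, j) &= \max\{\, S(i-1, j-1) + w(a_i, b_j),\ D(i, j),\ I(i, j) \,\}
\end{aligned}
$$

A minimal sketch of the score computation under that assumed formulation follows. The function name and the boundary conditions are choices made for this example, with $x$ and $y$ passed as positive penalties.

```python
NEG_INF = float("-inf")


def affine_global_score(a, b, match=8, mismatch=-5, x=4, y=3):
    """Best global alignment score when a gap of length k costs x + k*y."""
    n, m = len(a), len(b)
    S = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    D = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # alignments ending with a deletion
    I = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # alignments ending with an insertion

    # Boundaries: a prefix aligned against nothing is one gap of that length.
    S[0][0] = 0
    for i in range(1, n + 1):
        D[i][0] = -(x + i * y)
        S[i][0] = D[i][0]
    for j in range(1, m + 1):
        I[0][j] = -(x + j * y)
        S[0][j] = I[0][j]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either extend an existing gap (pay y) or open a new one (pay x + y).
            D[i][j] = max(D[i - 1][j] - y, S[i - 1][j] - x - y)
            I[i][j] = max(I[i][j - 1] - y, S[i][j - 1] - x - y)
            w = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + w, D[i][j], I[i][j])
    return S[n][m]


print(affine_global_score("CTTAACT", "CGGATCAT"))
```

Setting `x=0` removes the gap-open charge, in which case the computation reduces to the per-symbol gap scoring used in the global alignment slides.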
Affine gap penalties
• Match: +8 (w(x, y) = 8, if x = y)
• Mismatch: -5 (w(x, y) = -5, if x ≠ y)
• Each gap symbol: -3 (w(-, x) = w(x, -) = -3)
• Each gap is charged an extra gap-open penalty: -4.
  C - - - T T A A C T
  C G G A T C A - - T
  +8 -3 -3 -3 +8 -5 +8 -3 -3 +8 = +12
• Two gaps, each charged -4. Alignment score: 12 - 4 - 4 = 4
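The bookkeeping in this example (column scores first, then one gap-open charge per run of gap symbols) can be checked with a short helper. This is a sketch under the slide's scores; the name `affine_alignment_score` and the use of a regular expression to count gap runs are choices made here.

```python
import re


def affine_alignment_score(top, bottom, match=8, mismatch=-5, gap_symbol=-3, gap_open=-4):
    """Score a given gapped alignment, charging gap_open once per run of '-'."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += gap_symbol
        elif x == y:
            total += match
        else:
            total += mismatch
    # Each maximal run of '-' in either row is one gap.
    gaps = len(re.findall(r"-+", top)) + len(re.findall(r"-+", bottom))
    return total + gaps * gap_open


print(affine_alignment_score("C---TTAACT", "CGGATCA--T"))  # 4
```

The alignment has one gap of length 3 in the top row and one of length 2 in the bottom row, so the column total of 12 is reduced by two gap-open charges.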