Overview of Pairwise Sequence Alignment

Overview of Pairwise Sequence Alignment. Presenter: 林哲鋒. Dynamic programming is applied to optimization problems. It is useful when the problem can be recursively divided into sub-problems and the sub-problems are not independent.

Presentation Transcript


  1. Overview of Pairwise Sequence Alignment • Presenter: 林哲鋒 • Dynamic Programming • Applied to optimization problems • Useful when • the problem can be recursively divided into sub-problems • the sub-problems are not independent (they overlap) • Needleman-Wunsch is a global alignment technique that uses an iterative algorithm and no gap penalty (it can be extended to a fixed gap penalty). • Smith-Waterman is a local alignment technique that uses a recursive algorithm. The Smith-Waterman algorithm is an extension of the Longest Common Subsequence (LCS) problem and can be generalized to solve both local and global alignment.

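  2. The Longest Common Subsequence (LCS) problem • First, what is a subsequence? A subsequence is the sequence obtained by removing some (possibly zero) characters from a sequence. For example, pred, sdn, and predent are all subsequences of "president". • Given two sequences, the LCS problem is to find a sequence such that (1) it is a subsequence of both sequences and (2) its length is as large as possible.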
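The definition of a subsequence can be checked mechanically. The sketch below is only an illustration; the function name `is_subsequence` is an assumption, not something from the slides. It walks through the longer sequence once and confirms that the candidate's characters appear in the same order.

```python
def is_subsequence(candidate: str, sequence: str) -> bool:
    """Return True if `candidate` can be obtained from `sequence` by deleting characters."""
    remaining = iter(sequence)
    # `ch in remaining` consumes the iterator up to and including the first
    # occurrence of ch, so later characters must be found further to the right.
    return all(ch in remaining for ch in candidate)


# The three examples from the slide.
for word in ("pred", "sdn", "predent"):
    print(word, is_subsequence(word, "president"))
```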

  3. LCS example • Sequence 1: president • Sequence 2: providence • One LCS is priden ( PResIDENt / PRovIDENce )

  4. LCS, another example • Sequence 1: algorithm • Sequence 2: alignment • One LCS is algm or algt ( ALGorithM / ALiGnMent )

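  5. How to compute LCS? • Given two sequences, let len(i, j) denote the length of an LCS of the first i characters of one sequence and the first j characters of the other. The following recurrence can be used to compute len(i, j):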
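The recurrence itself did not survive in this transcript. The sketch below assumes the standard textbook form: len(i, j) = 0 when i = 0 or j = 0; len(i-1, j-1) + 1 when the i-th and j-th characters match; otherwise max(len(i-1, j), len(i, j-1)). The names `a`, `b`, and `lcs_length_table` are illustrative assumptions.

```python
def lcs_length_table(a: str, b: str) -> list:
    """Fill the table length[i][j] = LCS length of a[:i] and b[:j]."""
    m, n = len(a), len(b)
    # Row 0 and column 0 stay 0: an empty prefix has an empty LCS.
    length = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                # Matching characters extend the LCS of the two shorter prefixes.
                length[i][j] = length[i - 1][j - 1] + 1
            else:
                # Otherwise drop one character from either sequence.
                length[i][j] = max(length[i - 1][j], length[i][j - 1])
    return length


table = lcs_length_table("president", "providence")
print(table[-1][-1])  # 6, the length of "priden"
```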

  6. [Figure: labels "insertion" and "deletion"]

  7. Output : priden
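One way to obtain this output is to fill the length table and then walk back from the bottom-right corner, collecting a character on every diagonal (match) step. This is a minimal sketch under the same assumed standard recurrence; the function name `lcs` and the tie-breaking rule are illustrative choices, and other tie-breaking rules can return a different LCS of the same length.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    m, n = len(a), len(b)
    length = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                length[i][j] = length[i - 1][j - 1] + 1
            else:
                length[i][j] = max(length[i - 1][j], length[i][j - 1])

    # Traceback from the bottom-right corner to the top or left edge.
    i, j = m, n
    picked = []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            picked.append(a[i - 1])  # diagonal step: this character is in the LCS
            i -= 1
            j -= 1
        elif length[i - 1][j] >= length[i][j - 1]:
            i -= 1  # skip a character of a
        else:
            j -= 1  # skip a character of b
    return "".join(reversed(picked))


print(lcs("president", "providence"))  # priden
```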

  8. Identification of Common Molecular Subsequences • T. F. Smith and M. S. Waterman • J. Mol. Biol. (1981), 147, 195-197

  9. ABSTRACT • The identification of maximally homologous subsequences among sets of long sequences is an important problem. • The goal is to find a pair of segments, one from each of two long sequences, such that there is no other pair of segments with greater similarity.

  10. Algorithm • The two molecular sequences are $A = a_1 a_2 \ldots a_n$ and $B = b_1 b_2 \ldots b_m$. • A similarity $s(a, b)$ is given between sequence elements $a$ and $b$. • Deletions of length $k$ are given weight $W_k$. • Set up a matrix $H$. First set $H_{k0} = H_{0l} = 0$ for $0 \le k \le n$ and $0 \le l \le m$.

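  11. Algorithm (cont.) • $H_{ij}$ is the maximum similarity of two segments ending in $a_i$ and $b_j$. • These values are obtained from the relationship $H_{ij} = \max\{\, H_{i-1,j-1} + s(a_i, b_j),\ \max_{k \ge 1}\{ H_{i-k,j} - W_k \},\ \max_{l \ge 1}\{ H_{i,j-l} - W_l \},\ 0 \,\}$ for $1 \le i \le n$ and $1 \le j \le m$.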

  12. The formula for $H_{ij}$ follows by considering the possibilities for ending the segments at any $a_i$ and $b_j$. • (1) If $a_i$ and $b_j$ are associated, the similarity is $H_{i-1,j-1} + s(a_i, b_j)$. • (2) If $a_i$ is at the end of a deletion of length $k$, the similarity is $H_{i-k,j} - W_k$. • (3) If $b_j$ is at the end of a deletion of length $l$, the similarity is $H_{i,j-l} - W_l$. • (4) Finally, a zero is included to prevent calculated negative similarity, indicating no similarity up to $a_i$ and $b_j$.
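The four cases above translate directly into a matrix fill. The sketch below is an illustration, not the authors' code: the function name and the callables `s` (similarity) and `w` (weight of a deletion of length k) are assumed names. It tries every deletion length explicitly, so it is slower than the simplified recurrence that is possible when the weight is linear in k.

```python
def smith_waterman_matrix(a, b, s, w):
    """Fill H, where H[i][j] is the best similarity of two segments ending at a[i-1] and b[j-1]."""
    n, m = len(a), len(b)
    # H[k][0] = H[0][l] = 0 for all k and l.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # (1) a_i and b_j are associated.
            associated = H[i - 1][j - 1] + s(a[i - 1], b[j - 1])
            # (2) a_i is at the end of a deletion of length k.
            deletion_a = max(H[i - k][j] - w(k) for k in range(1, i + 1))
            # (3) b_j is at the end of a deletion of length l.
            deletion_b = max(H[i][j - l] - w(l) for l in range(1, j + 1))
            # (4) zero prevents negative similarity.
            H[i][j] = max(associated, deletion_a, deletion_b, 0)
    return H


# Assumed example values: match 8, mismatch -5, and W_k = 3k.
H = smith_waterman_matrix(
    "CTTAACT", "CGGATCAT",
    s=lambda x, y: 8 if x == y else -5,
    w=lambda k: 3 * k,
)
print(max(max(row) for row in H))  # 18
```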

  13. The pair of segments with maximum similarity is found by first locating the maximum element of H. • The other matrix elements leading to this maximum value are then sequentially determined with a traceback procedure ending with an element of H equal to zero.
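The traceback can be sketched as follows. To keep the example short it assumes a linear gap weight (a fixed score per gap symbol) rather than a general $W_k$, and the default scores 8, -5, and -3 are assumed illustrative values. The function name `smith_waterman` is likewise an assumption.

```python
def smith_waterman(a, b, match=8, mismatch=-5, gap=-3):
    """Return (best score, aligned segment of a, aligned segment of b)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)  # locate the maximum element of H

    # Trace back from the maximum until an element equal to zero is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("CTTAACT", "CGGATCAT"))
```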

  14. Example in Figure 1. • For a match, $a_i = b_j$, $s(a_i, b_j) = 1$; a mismatch scores $-1/3$.

  15. Local vs. global alignment

  16. Global Alignment vs. Local Alignment • global alignment: • local alignment:

  17. Global Alignment vs. Local Alignment global local

  18. Local alignment example • Match: 8 • Mismatch: -5 • Gap symbol: -3 • Sequences: CTTAACT and CGGATCAT • Local alignment: A - C - T / A T C A T • Score: 8 - 3 + 8 - 3 + 8 = 18
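The arithmetic on this slide can be reproduced by scoring the alignment column by column. The helper below is a small sketch; the name `score_alignment` and the use of "-" as the gap symbol are assumptions made for illustration.

```python
def score_alignment(top, bottom, match=8, mismatch=-5, gap=-3):
    """Sum the column scores of an alignment written as two equal-length rows."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += gap        # a gap symbol in either row
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


print(score_alignment("A-C-T", "ATCAT"))  # 8 - 3 + 8 - 3 + 8 = 18
```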

  19. Global alignment • Needleman-Wunsch (1970) • Three steps in dynamic programming • Initialization • Matrix fill (scoring) • Traceback (alignment) • Match: +8 (w(x, y) = 8, if x = y) • Mismatch: -5 (w(x, y) = -5, if x ≠ y) • Each gap symbol: -3 (w(-,x)=w(x,-)=-3)

  20. Global alignment example 1 • Sequences: CTTAACT and CGGATCAT • Alignment: C T T A A C - T / C G G A T C A T • Score: 8 - 5 - 5 + 8 - 5 + 8 - 3 + 8 = 14
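The three dynamic programming steps (initialization, matrix fill, traceback) can be sketched for this example as follows. This is an illustrative implementation with a fixed per-symbol gap score, not a reference one; the function name and the tie-breaking order in the traceback are assumptions, and a different order may return another alignment with the same score.

```python
def needleman_wunsch(a, b, match=8, mismatch=-5, gap=-3):
    """Return (global score, aligned a, aligned b)."""
    n, m = len(a), len(b)

    # Step 1: initialization. Aligning a prefix with nothing costs one gap per symbol.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap

    # Step 2: matrix fill.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + w, F[i - 1][j] + gap, F[i][j - 1] + gap)

    # Step 3: traceback from the bottom-right corner to the origin.
    i, j = n, m
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            w = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + w:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = needleman_wunsch("CTTAACT", "CGGATCAT")
print(score)  # 14
print(top)
print(bottom)
```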

  21. Global alignment example 2 • Sequences: CAATTGA and GAATCTGC • Alignment: C A A T - T G A / G A A T C T G C • Score: -5 + 8 + 8 + 8 - 3 + 8 + 8 - 5 = 27

  22. Affine gap penalties • A gap of length k is penalized x + k·y, where x is the gap-open penalty and y is the gap-symbol penalty. • Three cases for alignment endings: • ...x / ...x: an aligned pair • ...x / ...-: a deletion • ...- / ...x: an insertion

  23. Affine gap penalties • Let D(i, j) denote the maximum score of any alignment between $a_1 a_2 \ldots a_i$ and $b_1 b_2 \ldots b_j$ ending with a deletion. • Let I(i, j) denote the maximum score of any alignment between $a_1 a_2 \ldots a_i$ and $b_1 b_2 \ldots b_j$ ending with an insertion. • Let S(i, j) denote the maximum score of any alignment between $a_1 a_2 \ldots a_i$ and $b_1 b_2 \ldots b_j$.

  24. Affine gap penalties (A gap of length k is penalized x + k·y.)
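The recurrences on this slide are missing from the transcript. The sketch below assumes the usual three-matrix formulation that matches the definitions of D, I, and S: D(i, j) = max(D(i-1, j) - y, S(i-1, j) - x - y), I(i, j) = max(I(i, j-1) - y, S(i, j-1) - x - y), and S(i, j) = max(S(i-1, j-1) + w(a_i, b_j), D(i, j), I(i, j)). The function and parameter names are illustrative assumptions, with x as `gap_open` and y as `gap_symbol`, both given as positive penalties.

```python
NEG_INF = float("-inf")


def affine_global_score(a, b, match=8, mismatch=-5, gap_open=4, gap_symbol=3):
    """Best global alignment score when a gap of length k costs gap_open + k * gap_symbol."""
    n, m = len(a), len(b)
    S = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    D = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # alignments ending with a deletion
    I = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # alignments ending with an insertion

    S[0][0] = 0
    for i in range(1, n + 1):
        # The first i symbols of a face a single gap of length i.
        D[i][0] = -(gap_open + i * gap_symbol)
        S[i][0] = D[i][0]
    for j in range(1, m + 1):
        I[0][j] = -(gap_open + j * gap_symbol)
        S[0][j] = I[0][j]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend an existing gap (pay y) or open a new one (pay x + y).
            D[i][j] = max(D[i - 1][j] - gap_symbol, S[i - 1][j] - gap_open - gap_symbol)
            I[i][j] = max(I[i][j - 1] - gap_symbol, S[i][j - 1] - gap_open - gap_symbol)
            w = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + w, D[i][j], I[i][j])
    return S[n][m]


print(affine_global_score("CTTAACT", "CGGATCAT"))
```

Setting `gap_open=0` reduces this to the fixed per-symbol gap penalty used in the earlier global alignment examples.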

  25. Affine gap penalties • Match: +8 (w(x, y) = 8, if x = y) • Mismatch: -5 (w(x, y) = -5, if x ≠ y) • Each gap symbol: -3 (w(-,x)=w(x,-)=-3) • Each gap is charged an extra gap-open penalty: -4. • Alignment: C - - - T T A A C T / C G G A T C A - - T (two gaps, each charged -4) • Score before gap-open penalties: +8 -3 -3 -3 +8 -5 +8 -3 -3 +8 = +12 • Alignment score: 12 - 4 - 4 = 4
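The score of this particular alignment can be checked by adding the gap-open charge once per run of consecutive gap symbols. The helper below is a sketch with an assumed name, `affine_alignment_score`, and it scores a given alignment rather than searching for the best one.

```python
def affine_alignment_score(top, bottom, match=8, mismatch=-5, gap_symbol=-3, gap_open=-4):
    """Score an alignment given as two equal-length rows, using '-' as the gap symbol."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    total = 0
    previous_gap_row = None  # row that held the gap symbol in the previous column
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            gap_row = "top" if x == "-" else "bottom"
            total += gap_symbol
            if gap_row != previous_gap_row:
                total += gap_open  # a new gap starts in this column
            previous_gap_row = gap_row
        else:
            total += match if x == y else mismatch
            previous_gap_row = None
    return total


print(affine_alignment_score("C---TTAACT", "CGGATCA--T"))  # 12 - 4 - 4 = 4
```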

  26. END
