
Alignment II Dynamic Programming

zwi

Presentation Transcript


  1. Alignment II: Dynamic Programming

  2. Pair-wise sequence alignments
       A: C A T - T C A - C
          |   |     | |   |
       B: C - T C G C A G C
     Idea: display one sequence above the other, with spaces inserted in both, to reveal their similarity.

  3. Two types of alignment, for S = CTGTCGCTGCACG and T = TGCCGTG
     Global alignment:
       CTGTCG-CTGCACG
       -TGC-CG-TG----
     Local alignment:
       CTGTCGCTGCACG--
       -------TGC-CGTG

  4. Global alignment: Scoring
       CTGTCG-CTGCACG
       -TGC-CG-TG----
     Reward for matches: $\alpha$; mismatch penalty: $\beta$; space penalty: $\gamma$.
     $\text{score}(A) = \alpha w - \beta x - \gamma y$, where $w$ = number of matches, $x$ = number of mismatches, $y$ = number of spaces.

  5. Global alignment: Scoring example. Reward for matches: 10; mismatch penalty: 2; space penalty: 5.
        C   T   G   T   C   G   -   C   T   G   C
        -   T   G   C   -   C   G   -   T   G   -
       -5  10  10  -2  -5  -2  -5  -5  10  10  -5
     Total = 11
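
A minimal Python sketch of the scoring rule on slides 4 and 5, assuming an alignment is given as two equal-length strings with `-` marking a space. Following the slides, penalties are passed as positive numbers and subtracted; the function name `alignment_score` and its parameter names are illustrative only.

```python
def alignment_score(a: str, b: str, match: int = 10, mismatch: int = 2, space: int = 5) -> int:
    """Score two aligned rows: +match per match, -mismatch per mismatch, -space per space.

    Penalties are positive numbers that get subtracted, as on the slides.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    score = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            raise ValueError("a column may not contain two spaces")
        if x == "-" or y == "-":
            score -= space  # a character lined up with a space
        elif x == y:
            score += match  # identical characters
        else:
            score -= mismatch  # different characters
    return score


print(alignment_score("CTGTCG-CTGC", "-TGC-CG-TG-"))  # 11
```

Under the same scheme, the full 14-column global alignment from slide 4 scores -4, because its three trailing spaces cost another 15.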

  6. Optimum Alignment
     • The score of an alignment is a measure of its quality.
     • Optimum alignment problem: given a pair of sequences X and Y, find an alignment (global or local) with maximum score.
     • The similarity between X and Y, denoted sim(X,Y), is the maximum score of an alignment of X and Y.

  7. Alignment algorithms
     • Global: Needleman-Wunsch
     • Local: Smith-Waterman
     • NW and SW use dynamic programming
     • Variations:
       • Gap penalty functions
       • Scoring matrices

  8. Global Alignment: Algorithm

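  9. Theorem. Let $C(i,j)$ be the score of an optimal alignment of $S_1 \dots S_i$ with $T_1 \dots T_j$, let $w(S_i,T_j)$ be the match reward if $S_i = T_j$ and minus the mismatch penalty otherwise, and let $\gamma$ be the space penalty. Then $C(i,j)$ satisfies the following relationships.
     Initial conditions:
     $$C(i,0) = -\gamma i \ \ (0 \le i \le n), \qquad C(0,j) = -\gamma j \ \ (0 \le j \le m)$$
     Recurrence relation: for $1 \le i \le n$, $1 \le j \le m$,
     $$C(i,j) = \max\begin{cases} C(i-1,j-1) + w(S_i,T_j) \\ C(i-1,j) - \gamma \\ C(i,j-1) - \gamma \end{cases}$$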

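  10. Justification. The last column of an optimal alignment of $S_1 \dots S_i$ with $T_1 \dots T_j$ takes one of three forms:
     • $S_i$ lined up with $T_j$: the other columns align $S_1 \dots S_{i-1}$ with $T_1 \dots T_{j-1}$, giving $C(i-1,j-1) + w(S_i,T_j)$.
     • $S_i$ lined up with a space: the other columns align $S_1 \dots S_{i-1}$ with $T_1 \dots T_j$, giving $C(i-1,j)$ minus the space penalty.
     • $T_j$ lined up with a space: the other columns align $S_1 \dots S_i$ with $T_1 \dots T_{j-1}$, giving $C(i,j-1)$ minus the space penalty.
     The other columns must themselves form an optimal alignment, so $C(i,j)$ is the largest of the three values.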

  11. Example (Si is the last character of S, Tj the last character of T)
     Case 1: Line up Si with Tj
       S: C A T T C A C
       T: C - T T C A G
     Case 2: Line up Si with a space
       S: C A T T C A - C
       T: C - T T C A G -
     Case 3: Line up Tj with a space
       S: C A T T C A C -
       T: C - T T C A - G

  12. Computation Procedure [figure: the table is filled from C(0,0) to C(n,m); each entry C(i,j) is computed from its neighbours C(i-1,j-1), C(i-1,j) and C(i,j-1)]

  13. Example: +10 for a match, -2 for a mismatch, -5 for a space (first entries of the table)
          λ    C    T    C    G    C    A    G    C
     λ    0   -5  -10  -15  -20  -25  -30  -35  -40
     C   -5   10    5
     A  -10
     T  -15
     T  -20
     C  -25
     A  -30
     C  -35

  14. Traceback [figure: the completed table, with rows λ C A T T C A C and columns λ C T C G C A G C]. Traceback can yield both optimum alignments.
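
A Python sketch of the global (Needleman-Wunsch) procedure from slides 9 to 14, assuming the signed scores of slide 13 (+10, -2, -5). It fills the table with the recurrence, then traces back from C(n,m), following every predecessor that attains the maximum, which is how both optimum alignments can be recovered. The function names are illustrative.

```python
def nw_table(s, t, match=10, mismatch=-2, space=-5):
    """Fill the global-alignment table C, where C[i][j] scores s[:i] against t[:j]."""
    n, m = len(s), len(t)
    C = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        C[i][0] = i * space  # s[:i] against spaces only
    for j in range(1, m + 1):
        C[0][j] = j * space  # t[:j] against spaces only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            C[i][j] = max(C[i - 1][j - 1] + w,  # line up s_i with t_j
                          C[i - 1][j] + space,  # s_i against a space
                          C[i][j - 1] + space)  # t_j against a space
    return C


def all_optimal_alignments(s, t, match=10, mismatch=-2, space=-5):
    """Return the optimal score and every optimal alignment, found by traceback."""
    C = nw_table(s, t, match, mismatch, space)
    results = []

    def trace(i, j, a, b):
        # a and b collect alignment columns backwards, from (n, m) toward (0, 0)
        if i == 0 and j == 0:
            results.append(("".join(reversed(a)), "".join(reversed(b))))
            return
        if i > 0 and j > 0:
            w = match if s[i - 1] == t[j - 1] else mismatch
            if C[i][j] == C[i - 1][j - 1] + w:
                trace(i - 1, j - 1, a + [s[i - 1]], b + [t[j - 1]])
        if i > 0 and C[i][j] == C[i - 1][j] + space:
            trace(i - 1, j, a + [s[i - 1]], b + ["-"])
        if j > 0 and C[i][j] == C[i][j - 1] + space:
            trace(i, j - 1, a + ["-"], b + [t[j - 1]])

    trace(len(s), len(t), [], [])
    return C[len(s)][len(t)], results


score, alignments = all_optimal_alignments("CATTCAC", "CTCGCAGC")
print(score)  # 33
for top, bottom in alignments:
    print(top)
    print(bottom)
```

For the slide 13 sequences this computes C(7,8) = 33 and returns two alignments: CAT-TCA-C over C-TCGCAGC (the one on slide 2) and CATT-CA-C over C-TCGCAGC. Enumerating every optimal path can grow exponentially on long inputs, so in practice a single traceback is usually enough.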

  15. End-gap free alignment
     • Gaps at the start or end of the alignment are not penalized.
     • Match: +2; mismatch and space: -1.
     • Best global alignment: score = 1. Best end-gap free alignment: score = 9.

  16. Motivation: Shotgun assembly
     • Shotgun assembly produces a large set of partially overlapping subsequences from many copies of one unknown DNA sequence.
     • Problem: use the overlapping sections to "paste" the subsequences together.
     • Overlapping pairs have a low global alignment score but a high end-gap free score, because of the overlap.

  17. Motivation: Shotgun assembly

  18. Algorithm: same as global alignment, except:
     • Initialize with zeros (free gaps at the start).
     • Locate the maximum in the last row or last column (free gaps at the end).

  19. Example: +10 for a match, -2 for a mismatch, -5 for a gap
          λ    C    T    C    G    C    A    G    C
     λ    0    0    0    0    0    0    0    0    0
     C    0   10    5   10    5   10    5    0   10
     A    0    5    8    5    8    5   20   15   10
     T    0    0   15   10    5    6   15   18   13
     T    0   -2   10   13    8    3   10   13   16
     C    0   10    5   20   15   18   13    8   23
     A    0    5    8   15   18   13   28   23   18
     G    0    0    3   10   25   20   23   38   33
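
A Python sketch of the end-gap free algorithm on slide 18, reading the slide 19 example as S = CATTCAG (rows) and T = CTCGCAGC (columns) with +10, -2 and -5. It uses the global recurrence but leaves the borders at zero and takes the best entry of the last row or last column. The function name is illustrative; under these assumptions it reproduces the slide 19 table, and the best end-gap free score is 38.

```python
def end_gap_free(s, t, match=10, mismatch=-2, space=-5):
    """End-gap free alignment: returns (best score, full table).

    Same recurrence as global alignment, but row 0 and column 0 stay zero
    (free leading gaps) and the answer is the best entry in the last row
    or last column (free trailing gaps).
    """
    n, m = len(s), len(t)
    C = [[0] * (m + 1) for _ in range(n + 1)]  # zero borders: free leading gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            C[i][j] = max(C[i - 1][j - 1] + w, C[i - 1][j] + space, C[i][j - 1] + space)
    best = max(max(C[n]), max(row[m] for row in C))  # free trailing gaps
    return best, C


best, table = end_gap_free("CATTCAG", "CTCGCAGC")
print(best)  # 38
print(table[7])  # [0, 0, 3, 10, 25, 20, 23, 38, 33]
```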

  20. Local Alignment: Motivation
     • Ignoring stretches of non-coding DNA:
       • Non-coding regions are more likely to be subjected to mutations than coding regions.
       • A local alignment between two sequences is likely to be between two exons.
     • Locating protein domains:
       • Proteins of different kinds and of different species often exhibit local similarities.
       • Local similarities may indicate "functional subunits".

  21. Local alignment: Example
     S = ggtctgag, T = aaacga
     Match: +2; mismatch and space: -1
     Best local alignment, shown inside the full sequences:
       g g t c t g a g
       a a a c - g a -
     The aligned region is c t g a against c - g a. Score = 5

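  22. Local Alignment: Algorithm
     C[i, j] = score of optimally aligning a suffix of $s_1 \dots s_i$ with a suffix of $t_1 \dots t_j$ (empty suffixes, with score 0, are allowed).
     Initialize the top row and leftmost column to zero. With $w(s_i,t_j)$ the match or mismatch score and $\gamma$ the space penalty, the remaining entries are
     $$C[i,j] = \max\begin{cases} 0 \\ C[i-1,j-1] + w(s_i,t_j) \\ C[i-1,j] - \gamma \\ C[i,j-1] - \gamma \end{cases}$$
     The best local alignment score is the largest entry anywhere in the table.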

  23. Example table: rows λ C A T T C A C, columns λ C T C G C A G C; +1 for a match, -1 for a mismatch, -5 for a space.
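
A Python sketch of the local (Smith-Waterman) recurrence: the same table as global alignment, except every entry is floored at zero, the answer is the largest entry anywhere, and the traceback stops at the first zero. Default scores follow slide 23 (+1, -1, -5); the function name and the tie-breaking rule (first maximum in row-major order) are assumptions of this sketch.

```python
def smith_waterman(s, t, match=1, mismatch=-1, space=-5):
    """Best local alignment of s and t: returns (score, aligned part of s, aligned part of t)."""
    n, m = len(s), len(t)
    C = [[0] * (m + 1) for _ in range(n + 1)]  # top row and left column stay zero
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            C[i][j] = max(0,  # start a fresh local alignment here
                          C[i - 1][j - 1] + w,
                          C[i - 1][j] + space,
                          C[i][j - 1] + space)
            if C[i][j] > best:  # keep the first maximum in row-major order
                best, best_pos = C[i][j], (i, j)
    # Trace back from the best cell until a zero entry ends the local alignment.
    i, j = best_pos
    a, b = [], []
    while C[i][j] > 0:
        w = match if s[i - 1] == t[j - 1] else mismatch
        if C[i][j] == C[i - 1][j - 1] + w:
            a.append(s[i - 1])
            b.append(t[j - 1])
            i, j = i - 1, j - 1
        elif C[i][j] == C[i - 1][j] + space:
            a.append(s[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(t[j - 1])
            j -= 1
    return best, "".join(reversed(a)), "".join(reversed(b))


print(smith_waterman("ggtctgag", "aaacga", match=2, mismatch=-1, space=-1))
# (5, 'ctga', 'c-ga')
```

With the slide 21 scores this recovers the score-5 alignment of ctga against c-ga; on the slide 23 sequences with +1, -1 and -5, the best local score it finds is 2.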

  24. Some Results
     • Most pairwise sequence alignment problems can be solved in O(mn) time.
     • The space requirement can be reduced to O(m+n) without increasing the O(mn) running time [Myers88].
     • Highly similar sequences can be aligned in O(dn) time, where d measures the distance between the sequences [Landau86].
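
The [Landau86] method is not reproduced here. As a simpler, related illustration of how a distance bound saves time, the following Python sketch is a banded global alignment: it fills only the cells with |i - j| <= d, so it touches O(dn) cells, and it returns the true optimum whenever some optimal alignment stays inside the band. The band parameter `d` and the function name are assumptions of this sketch.

```python
def banded_global_score(s, t, d, match=10, mismatch=-2, space=-5):
    """Global alignment score restricted to cells with |i - j| <= d (O(d*n) cells)."""
    n, m = len(s), len(t)
    if abs(n - m) > d:
        raise ValueError("band too narrow to reach the corner cell (n, m)")
    C = {}  # only in-band cells are stored
    for i in range(n + 1):
        for j in range(max(0, i - d), min(m, i + d) + 1):
            if i == 0 and j == 0:
                C[0, 0] = 0
                continue
            candidates = []
            if i > 0 and j > 0:  # the diagonal predecessor is always inside the band
                w = match if s[i - 1] == t[j - 1] else mismatch
                candidates.append(C[i - 1, j - 1] + w)
            if (i - 1, j) in C:  # s_i against a space
                candidates.append(C[i - 1, j] + space)
            if (i, j - 1) in C:  # t_j against a space
                candidates.append(C[i, j - 1] + space)
            C[i, j] = max(candidates)
    return C[n, m]


print(banded_global_score("CATTCAC", "CTCGCAGC", d=1))  # 33
```

On the slide 13 sequences a band of d = 1 already gives the optimal score 33, because both optimal alignments stay within one diagonal of the main one.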

  25. Reducing space requirements
     • O(mn) tables are often the limiting factor in computing large alignments.
     • There is a linear-space technique that only doubles the time required [Hirschberg77].

  26. First rows of the slide 19 table:
          λ    C    T    C    G    C    A    G    C
     λ    0    0    0    0    0    0    0    0    0
     C    0   10    5   10    5   10    5    0   10
     A    0    5    8    5    8    5   20   15   10
     T
     T
     C
     A
     G
     IDEA: We only need the previous row to calculate the next.
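
Following the IDEA on this slide, a Python sketch that computes the end-gap free score of the slide 19 example (S = CATTCAG, T = CTCGCAGC, +10, -2, -5) while keeping only two rows in memory; the function name is illustrative. It returns only the score: recovering the alignment itself in linear space needs an extra technique such as Hirschberg's [Hirschberg77].

```python
def end_gap_free_score_two_rows(s, t, match=10, mismatch=-2, space=-5):
    """End-gap free score using O(len(t)) memory: only the previous row is kept."""
    m = len(t)
    prev = [0] * (m + 1)  # row 0 is all zeros (free leading gaps)
    best_last_col = 0  # best entry seen so far in the last column
    for i in range(1, len(s) + 1):
        curr = [0] * (m + 1)  # column 0 is zero as well
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + w, prev[j] + space, curr[j - 1] + space)
        best_last_col = max(best_last_col, curr[m])
        prev = curr  # the older row can now be discarded
    return max(best_last_col, max(prev))  # free trailing gaps: last row or last column


print(end_gap_free_score_two_rows("CATTCAG", "CTCGCAGC"))  # 38
```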

  27. Linear-space Alignments
     $$mn + \tfrac{1}{2}mn + \tfrac{1}{4}mn + \tfrac{1}{8}mn + \tfrac{1}{16}mn + \dots = 2mn$$
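
A Python sketch of Hirschberg-style linear-space global alignment [Hirschberg77], assuming the linear scores of slide 13 (+10, -2, -5). It computes the last table row for the first half of S going forward and for the second half going backward (on reversed strings), splits T where the two rows sum to the maximum, and recurses on both halves. Each level of recursion fills about half as many cells as the level above, which is the series on this slide. The function names and the single-character base case are choices of this sketch.

```python
def nw_last_row(s, t, match=10, mismatch=-2, space=-5):
    """Last row of the global-alignment table for s vs t, keeping only two rows."""
    prev = [j * space for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * space] + [0] * len(t)
        for j in range(1, len(t) + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + w, prev[j] + space, curr[j - 1] + space)
        prev = curr
    return prev


def hirschberg(s, t, match=10, mismatch=-2, space=-5):
    """One optimal global alignment of s and t, built in linear space."""
    if not s:
        return "-" * len(t), t
    if not t:
        return s, "-" * len(s)
    if len(s) == 1:
        # Base case: the single character pairs with its best partner in t, or with a space.
        scores = [match if s == c else mismatch for c in t]
        j = scores.index(max(scores))
        if scores[j] + (len(t) - 1) * space >= (len(t) + 1) * space:
            return "-" * j + s + "-" * (len(t) - j - 1), t
        return s + "-" * len(t), "-" + t
    mid = len(s) // 2
    # Scores of s[:mid] against every prefix of t, and of s[mid:] against every suffix of t.
    left = nw_last_row(s[:mid], t, match, mismatch, space)
    right = nw_last_row(s[mid:][::-1], t[::-1], match, mismatch, space)
    k = max(range(len(t) + 1), key=lambda x: left[x] + right[len(t) - x])
    top1, bottom1 = hirschberg(s[:mid], t[:k], match, mismatch, space)
    top2, bottom2 = hirschberg(s[mid:], t[k:], match, mismatch, space)
    return top1 + top2, bottom1 + bottom2


print(hirschberg("CATTCAC", "CTCGCAGC"))
```

On the slide 13 sequences this returns one of the two optimum alignments (score 33) while holding only O(|T|) table entries at a time, plus the recursion stack and the output strings.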

  28. Affine Gap Penalty Functions
     Gap penalty $= h + gk$, where $k$ = length of the gap, $h$ = gap opening penalty, $g$ = gap continuation penalty.
     This can also be solved in O(nm) time using dynamic programming.
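
A Python sketch of an O(nm) dynamic program for affine gaps, using three tables in the style of Gotoh's algorithm, where a gap of length k costs h + gk as defined on this slide. The defaults h = 10 and g = 2 are made up for illustration; match and mismatch scores follow slide 13, and the function name is hypothetical.

```python
NEG = float("-inf")


def affine_global_score(s, t, match=10, mismatch=-2, h=10, g=2):
    """Global alignment score where a gap of length k costs h + g*k.

    M[i][j]: best score ending with s_i lined up with t_j
    X[i][j]: best score ending with s_i against a space
    Y[i][j]: best score ending with t_j against a space
    """
    n, m = len(s), len(t)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(h + g * i)  # one leading gap of length i
    for j in range(1, m + 1):
        Y[0][j] = -(h + g * j)  # one leading gap of length j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            M[i][j] = w + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # opening a new gap costs h + g; extending an open gap costs only g
            X[i][j] = max(M[i - 1][j] - (h + g), X[i - 1][j] - g, Y[i - 1][j] - (h + g))
            Y[i][j] = max(M[i][j - 1] - (h + g), Y[i][j - 1] - g, X[i][j - 1] - (h + g))
    return max(M[n][m], X[n][m], Y[n][m])


print(affine_global_score("AAAA", "AA"))  # 6
```

With the defaults, aligning AAAA against AA scores 6, since one gap of length 2 (cost 14) beats two separate gaps (cost 24). With h = 0 and g = 5 the penalty becomes 5k, so the sketch reduces to the linear scheme and gives 33 on the slide 13 sequences.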
