
Using Dynamic Programming To Align Sequences


Presentation Transcript


  1. Using Dynamic Programming To Align Sequences Cédric Notredame

  2. Our Scope
     - Understanding the DP concept
     - Coding a global and a local algorithm
     - Aligning with affine gap penalties
     - Saving memory
     - Sophisticated variants…

  3. Outline
     - Coding dynamic programming with non-affine penalties
     - Turning a global algorithm into a local algorithm
     - Adding affine penalties
     - Using a divide-and-conquer strategy
     - Tailoring DP to your needs:
       - The repeated matches algorithm
       - Double dynamic programming

  4. Global Alignments Without Affine Gap Penalties: Dynamic Programming

  5. How To Align Two Sequences With a Gap Penalty, a Substitution Matrix and Not Too Much Time: Dynamic Programming

  6. A bit of History…
     - DP was invented in the 1950s by Bellman
     - "Programming" here means tabulation
     - Re-invented in 1970 by Needleman and Wunsch
     - It took 10 years to find out…

  7. The Foolish Assumption
     The score of each column of the alignment is independent of the rest of the alignment.
     It is possible to model the relationship between two sequences with:
     - A substitution matrix
     - A simple gap penalty

  8. The Principle of DP
     An alignment can be extended by one column in three ways:
     - Deletion: X over -
     - Alignment: X over X
     - Insertion: - over X
     If you optimally extend an optimal alignment of two sub-sequences, the result remains an optimal alignment.

  9. Finding the score of i,j
     - Sequence 1: [1…i]
     - Sequence 2: [1…j]
     - The optimal alignment of [1…i] vs [1…j] can finish in three different ways:
         -   X   X
         X   X   -

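  10. Finding the score of i,j
      Three ways to build the alignment of 1…i vs 1…j:
      - the alignment of 1…i vs 1…j-1, plus a column pairing a gap with j
      - the alignment of 1…i-1 vs 1…j-1, plus a column pairing i with j
      - the alignment of 1…i-1 vs 1…j, plus a column pairing i with a gap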

  11. Finding the score of i,j
      In order to compute the score of 1…i vs 1…j, all we need are the scores of:
      - 1…i vs 1…j-1
      - 1…i-1 vs 1…j
      - 1…i-1 vs 1…j-1

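  12. Formalizing the algorithm
      \[
      F(i,j) = \mathrm{best}
      \begin{cases}
        F(i-1,j-1) + \mathrm{Mat}[i,j] & \text{($1 \dots i-1$ vs $1 \dots j-1$, then X over X)} \\
        F(i-1,j) + \mathrm{Gep}        & \text{($1 \dots i-1$ vs $1 \dots j$, then X over -)} \\
        F(i,j-1) + \mathrm{Gep}        & \text{($1 \dots i$ vs $1 \dots j-1$, then - over X)}
      \end{cases}
      \]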

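  13. Arranging Everything in a Table
      One sequence (FAT) labels the columns and the other (FAST) labels the rows; each is preceded by a gap entry "-".
      The cell for 1…I vs 1…J is computed from three neighbouring cells: 1…I-1 vs 1…J-1, 1…I vs 1…J-1 and 1…I-1 vs 1…J.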

  14. Taking Care of the Limits
      The DP strategy relies on the idea that ALL the cells in your table have the same environment… This is NOT true of ALL the cells!
      In a dynamic programming strategy, the most delicate part is to take care of the limits:
      - what happens when you start
      - what happens when you finish

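  15. Taking Care of the Limits
      Match=2, MisMatch=-1, Gap=-1

             -    F    A    T
        -    0   -1   -2   -3
        F   -1
        A   -2
        S   -3
        T   -4

      Border cells score a prefix aligned against gaps only: F/- = -1, FA/-- = -2, FAT/--- = -3 along the first row; F/- = -1, FA/-- = -2, FAS/--- = -3, FAST/---- = -4 down the first column.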
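
The border values on this slide follow directly from the gap penalty: cell k of the first row or column is k times Gap. A minimal sketch, assuming a linear gap penalty of -1 as on the slide; the name `init_borders` is hypothetical and not part of the original deck.

```perl
use strict;
use warnings;

# Build the first row and first column of a global-alignment table.
# $len_i = number of rows (length of the row sequence),
# $len_j = number of columns (length of the column sequence).
sub init_borders {
    my ($len_i, $len_j, $gep) = @_;
    my @F;
    $F[0][0] = 0;
    $F[$_][0] = $_ * $gep for 1 .. $len_i;   # row prefix aligned to gaps only
    $F[0][$_] = $_ * $gep for 1 .. $len_j;   # column prefix aligned to gaps only
    return \@F;
}

my $F = init_borders(4, 3, -1);               # FAST (rows) vs FAT (columns)
print join(' ', @{ $F->[0] }), "\n";          # first row:    0 -1 -2 -3
print join(' ', map { $_->[0] } @$F), "\n";   # first column: 0 -1 -2 -3 -4
```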

  16. Filling Up The Matrix

  17. Filled matrix for FAST (rows) vs FAT (columns) with Match=2, MisMatch=-1, Gap=-1. Each cell keeps the best of its three candidate scores (the slide also displayed the candidates):

             -    F    A    T
        -    0   -1   -2   -3
        F   -1   +2   +1    0
        A   -2   +1   +4   +3
        S   -3    0   +3   +3
        T   -4   -1   +2   +5
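
Filling the table is a direct application of the recurrence F(i,j) = best of F(i-1,j-1)+Mat[i,j], F(i-1,j)+Gep and F(i,j-1)+Gep. The sketch below assumes the slide's scoring scheme (Match=2, MisMatch=-1, Gap=-1) in place of a full substitution matrix; `nw_fill` is a hypothetical name.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Fill a global-alignment (Needleman-Wunsch) table with a linear gap penalty.
sub nw_fill {
    my ($seq_i, $seq_j, $match, $mismatch, $gep) = @_;
    my @I = split //, $seq_i;
    my @J = split //, $seq_j;
    my ($n, $m) = (scalar @I, scalar @J);
    my @F;
    $F[0][0] = 0;
    $F[$_][0] = $_ * $gep for 1 .. $n;       # borders
    $F[0][$_] = $_ * $gep for 1 .. $m;
    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            my $s = $I[$i-1] eq $J[$j-1] ? $match : $mismatch;
            $F[$i][$j] = max(
                $F[$i-1][$j-1] + $s,         # residue i aligned with residue j
                $F[$i-1][$j]   + $gep,       # residue i against a gap
                $F[$i][$j-1]   + $gep,       # residue j against a gap
            );
        }
    }
    return \@F;
}

my $F = nw_fill('FAST', 'FAT', 2, -1, -1);
print join("\t", @$_), "\n" for @$F;         # one table row per line
print "score: $F->[4][3]\n";                 # bottom-right cell: 5
```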

  18. Delivering the alignment: Trace-back

        F A S T
        F A - T

      The score of 1…3 vs 1…4 (the bottom-right cell) is the optimal alignment score.

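  19. Trace-back: possible implementation

      # Walk back from the last cell to the origin, emitting one column per step.
      # Columns are collected last-to-first, so reverse @alnI and @alnJ afterwards.
      while (!($i==0 && $j==0)) {
        if ($tb[$i][$j]==$sub) {        # SUBSTITUTION
          $alnI[$aln_len]=$seqI[--$i];
          $alnJ[$aln_len]=$seqJ[--$j];
        }
        elsif ($tb[$i][$j]==$del) {     # DELETION
          $alnI[$aln_len]='-';
          $alnJ[$aln_len]=$seqJ[--$j];
        }
        elsif ($tb[$i][$j]==$ins) {     # INSERTION
          $alnI[$aln_len]=$seqI[--$i];  # same indexing as the SUBSTITUTION branch
          $alnJ[$aln_len]='-';
        }
        $aln_len++;
      }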
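
To see the trace-back end to end, the sketch below fills the table, records which move produced each cell, then walks back from the last cell. It assumes the slide's scoring scheme (Match=2, MisMatch=-1, Gap=-1); the name `nw_align` and the numeric trace-back codes are hypothetical choices for this example.

```perl
use strict;
use warnings;

my ($SUB, $DEL, $INS) = (0, 1, 2);   # trace-back codes (assumed values)

sub nw_align {
    my ($seq_i, $seq_j, $match, $mismatch, $gep) = @_;
    my @I = split //, $seq_i;
    my @J = split //, $seq_j;
    my ($n, $m) = (scalar @I, scalar @J);
    my (@F, @tb);
    $F[0][0] = 0;
    for my $i (1 .. $n) { $F[$i][0] = $i * $gep; $tb[$i][0] = $INS; }
    for my $j (1 .. $m) { $F[0][$j] = $j * $gep; $tb[0][$j] = $DEL; }
    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            my $s   = $I[$i-1] eq $J[$j-1] ? $match : $mismatch;
            my $sub = $F[$i-1][$j-1] + $s;
            my $del = $F[$i][$j-1]   + $gep;   # gap in I, consumes J
            my $ins = $F[$i-1][$j]   + $gep;   # gap in J, consumes I
            if    ($sub >= $del && $sub >= $ins) { $F[$i][$j] = $sub; $tb[$i][$j] = $SUB; }
            elsif ($del >= $ins)                 { $F[$i][$j] = $del; $tb[$i][$j] = $DEL; }
            else                                 { $F[$i][$j] = $ins; $tb[$i][$j] = $INS; }
        }
    }
    # Trace-back: columns come out last-to-first, hence the reverse below.
    my ($i, $j) = ($n, $m);
    my (@aln_i, @aln_j);
    while ($i > 0 || $j > 0) {
        my $code = $tb[$i][$j];
        if    ($code == $SUB) { push @aln_i, $I[--$i]; push @aln_j, $J[--$j]; }
        elsif ($code == $DEL) { push @aln_i, '-';      push @aln_j, $J[--$j]; }
        else                  { push @aln_i, $I[--$i]; push @aln_j, '-';      }
    }
    return ($F[$n][$m], join('', reverse @aln_i), join('', reverse @aln_j));
}

my ($score, $aln_i, $aln_j) = nw_align('FAST', 'FAT', 2, -1, -1);
print "$aln_i\n$aln_j\nscore=$score\n";   # FAST / FA-T / score=5
```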

  20. Local Alignments Without Affine Gap Penalties: Smith and Waterman

  21. Getting rid of the pieces of junk between the interesting bits: Smith and Waterman

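  22. The Smith and Waterman Algorithm
      \[
      F(i,j) = \mathrm{best}
      \begin{cases}
        F(i-1,j-1) + \mathrm{Mat}[i,j] & \text{($1 \dots i-1$ vs $1 \dots j-1$, then X over X)} \\
        F(i-1,j) + \mathrm{Gep}        & \text{($1 \dots i-1$ vs $1 \dots j$, then X over -)} \\
        F(i,j-1) + \mathrm{Gep}        & \text{($1 \dots i$ vs $1 \dots j-1$, then - over X)} \\
        0
      \end{cases}
      \]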

  23. The Smith and Waterman Algorithm
      0 → ignore the rest of the matrix → terminate a local alignment

  24. Filling Up a SW Matrix

  25. Filling up a SW matrix: borders
      Easy: local alignments NEVER start/end with a gap…

           -  A  N  I  C  E  C  A  T
        -  0  0  0  0  0  0  0  0  0
        C  0
        A  0
        T  0
        A  0
        N  0
        D  0
        O  0
        G  0

  26. Filling up a SW matrix

           -  A  N  I  C  E  C  A  T
        -  0  0  0  0  0  0  0  0  0
        C  0  0  0  0  2  0  2  0  0
        A  0  2  0  0  0  0  0  4  0
        T  0  0  0  0  0  0  0  2  6
        A  0  2  0  0  0  0  0  0  4
        N  0  0  4  2  0  0  0  0  2
        D  0  0  2  2  0  0  0  0  0
        O  0  0  0  0  0  0  0  0  0
        G  0  0  0  0  0  0  0  0  0

      Best local score (6) → beginning of the trace-back

  27. Turning NW into SW and preparing the trace-back

      for ($i=1; $i<=$len0; $i++) {
        for ($j=1; $j<=$len1; $j++) {
          # Match/mismatch score for residues i and j
          if ($res0[0][$i-1] eq $res1[0][$j-1]) {$s=2;}
          else {$s=-1;}

          # Read from the same table that is written below ($smat)
          $sub=$smat[$i-1][$j-1]+$s;
          $del=$smat[$i  ][$j-1]+$gep;
          $ins=$smat[$i-1][$j  ]+$gep;

          # Turning NW into SW: a cell never drops below zero
          if   ($sub>$del && $sub>$ins && $sub>0) {$smat[$i][$j]=$sub; $tb[$i][$j]=$subcode;}
          elsif($del>$ins && $del>0)              {$smat[$i][$j]=$del; $tb[$i][$j]=$delcode;}
          elsif($ins>0)                           {$smat[$i][$j]=$ins; $tb[$i][$j]=$inscode;}
          else                                    {$smat[$i][$j]=$zero;$tb[$i][$j]=$stopcode;}

          # Prepare the trace-back: remember the best-scoring cell
          if ($smat[$i][$j]>$best_score) {
            $best_score=$smat[$i][$j];
            $best_i=$i;
            $best_j=$j;
          }
        }
      }
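
A self-contained version of the loop above, with zero borders and a trace-back that starts at the best cell and stops at the first zero, might look like the sketch below. The scoring values (match 2, mismatch -1, gap -1), the name `sw_align` and the numeric trace-back codes are assumptions made for this example.

```perl
use strict;
use warnings;

my ($STOP, $SUB, $DEL, $INS) = (0, 1, 2, 3);   # trace-back codes (assumed values)

sub sw_align {
    my ($seq_i, $seq_j, $match, $mismatch, $gep) = @_;
    my @I = split //, $seq_i;
    my @J = split //, $seq_j;
    my ($n, $m) = (scalar @I, scalar @J);
    my (@F, @tb);
    # Borders: a local alignment never starts with a gap, so everything is 0.
    for my $i (0 .. $n) { $F[$i][0] = 0; $tb[$i][0] = $STOP; }
    for my $j (0 .. $m) { $F[0][$j] = 0; $tb[0][$j] = $STOP; }
    my ($best, $best_i, $best_j) = (0, 0, 0);
    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            my $s = $I[$i-1] eq $J[$j-1] ? $match : $mismatch;
            my ($score, $code) = (0, $STOP);               # the extra SW option
            my $sub = $F[$i-1][$j-1] + $s;
            my $del = $F[$i][$j-1]   + $gep;
            my $ins = $F[$i-1][$j]   + $gep;
            ($score, $code) = ($sub, $SUB) if $sub > $score;
            ($score, $code) = ($del, $DEL) if $del > $score;
            ($score, $code) = ($ins, $INS) if $ins > $score;
            $F[$i][$j]  = $score;
            $tb[$i][$j] = $code;
            ($best, $best_i, $best_j) = ($score, $i, $j) if $score > $best;
        }
    }
    # Trace back from the best cell until a STOP (zero) cell is reached.
    my ($i, $j) = ($best_i, $best_j);
    my (@aln_i, @aln_j);
    while ($tb[$i][$j] != $STOP) {
        my $code = $tb[$i][$j];
        if    ($code == $SUB) { push @aln_i, $I[--$i]; push @aln_j, $J[--$j]; }
        elsif ($code == $DEL) { push @aln_i, '-';      push @aln_j, $J[--$j]; }
        else                  { push @aln_i, $I[--$i]; push @aln_j, '-';      }
    }
    return ($best, join('', reverse @aln_i), join('', reverse @aln_j));
}

my ($score, $aln_i, $aln_j) = sw_align('CATANDOG', 'ANICECAT', 2, -1, -1);
print "$aln_i\n$aln_j\nscore=$score\n";   # CAT / CAT / score=6
```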

  28. A few things to remember
      Chance should not pay when it comes to local alignments!
      SW only works if the substitution matrix has been normalized to give a negative score to a random alignment.
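
One way to check this condition is to compute the expected score of pairing two residues drawn at random: the sum over all residue pairs of p(x) * p(y) * s(x,y) should be negative. The sketch below assumes a uniform four-letter alphabet and a simple match/mismatch scheme in place of a real substitution matrix; `expected_score` is a hypothetical helper.

```perl
use strict;
use warnings;

# Expected score of one aligned column when both residues are drawn at random.
# $freq is a hashref residue => background frequency.
sub expected_score {
    my ($freq, $match, $mismatch) = @_;
    my $expected = 0;
    for my $x (keys %$freq) {
        for my $y (keys %$freq) {
            my $s = $x eq $y ? $match : $mismatch;
            $expected += $freq->{$x} * $freq->{$y} * $s;
        }
    }
    return $expected;
}

my %uniform_dna = map { $_ => 0.25 } qw(A C G T);
my $e_ok  = expected_score(\%uniform_dna, 2, -1);     # 0.5 - 0.75  = -0.25
my $e_bad = expected_score(\%uniform_dna, 2, -0.5);   # 0.5 - 0.375 = +0.125
print "match=2, mismatch=-1:   $e_ok\n";
print "match=2, mismatch=-0.5: $e_bad\n";
```

With the second scheme the expected score is positive, so random extensions would tend to raise local scores and the best local alignment would carry little meaning.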

  29. More than One Match…
      - SW delivers only the best-scoring match
      - If you need more than one match:
        - SIM (Huang and Miller), or
        - Waterman and Eggert (Durbin, p91)

  30. Waterman and Eggert
      - Iterative algorithm:
        1. Identify the best match
        2. Redo SW with the used pairs forbidden
        3. Finish when the last interesting local alignment has been extracted
      - Delivers a collection of non-overlapping local alignments
      - Avoids trivial variations of the optimal alignment
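
The three steps can be sketched as a loop around a Smith-Waterman pass in which cells for already-used residue pairs are forced to zero. This is a simplified illustration, not the published algorithm in full: it recomputes the whole table at every round, and the scoring values, the `$min_score` cut-off used to decide what counts as "interesting", and the names `sw_pass` and `local_matches` are all assumptions.

```perl
use strict;
use warnings;

# One SW pass; cells whose "i,j" key is in %$used are forced to zero.
# Returns (score, aligned I, aligned J, arrayref of aligned [i,j] pairs).
sub sw_pass {
    my ($I, $J, $used, $match, $mismatch, $gep) = @_;
    my ($n, $m) = (scalar @$I, scalar @$J);
    my (@F, @tb);                       # tb: 0 stop, 1 diagonal, 2 left, 3 up
    for my $i (0 .. $n) { $F[$i][0] = 0; $tb[$i][0] = 0; }
    for my $j (0 .. $m) { $F[0][$j] = 0; $tb[0][$j] = 0; }
    my ($best, $bi, $bj) = (0, 0, 0);
    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            my ($score, $code) = (0, 0);
            unless ($used->{"$i,$j"}) {
                my $s = $I->[$i-1] eq $J->[$j-1] ? $match : $mismatch;
                my @cand = ([$F[$i-1][$j-1] + $s, 1],
                            [$F[$i][$j-1] + $gep, 2],
                            [$F[$i-1][$j] + $gep, 3]);
                for my $c (@cand) { ($score, $code) = @$c if $c->[0] > $score; }
            }
            $F[$i][$j] = $score; $tb[$i][$j] = $code;
            ($best, $bi, $bj) = ($score, $i, $j) if $score > $best;
        }
    }
    my ($i, $j) = ($bi, $bj);
    my (@ai, @aj, @pairs);
    while ($tb[$i][$j]) {
        my $code = $tb[$i][$j];
        if ($code == 1) { push @pairs, [$i, $j]; push @ai, $I->[--$i]; push @aj, $J->[--$j]; }
        elsif ($code == 2) { push @ai, '-'; push @aj, $J->[--$j]; }
        else               { push @ai, $I->[--$i]; push @aj, '-'; }
    }
    return ($best, join('', reverse @ai), join('', reverse @aj), \@pairs);
}

# Extract local alignments until the best remaining score is below $min_score.
sub local_matches {
    my ($seq_i, $seq_j, $min_score) = @_;
    my @I = split //, $seq_i;
    my @J = split //, $seq_j;
    my (%used, @found);
    while (1) {
        my ($score, $x, $y, $pairs) = sw_pass(\@I, \@J, \%used, 2, -1, -1);
        last if $score <= 0 || $score < $min_score;
        push @found, [$score, $x, $y];
        $used{"$_->[0],$_->[1]"} = 1 for @$pairs;   # forbid the used pairs
    }
    return @found;
}

my @found = local_matches('CATANDOG', 'ANICECAT', 3);
print "$_->[0]\t$_->[1]\t$_->[2]\n" for @found;   # 6 CAT CAT, then 4 AN AN
```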

  31. Adding Affine Gap Penalties: The Gotoh Algorithm

  32. Forcing a bit of biology into your alignment: The Gotoh Formulation

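  33. Why Affine Gap Penalties are Biologically Better
      Cost = gop + L*gep, or Cost = gop + (L-1)*gep
      (gop: gap opening penalty, GOP; gep: gap extension penalty, GEP; L: gap length)
      Parsimony: evolution takes the simplest path (so we think…)
      [Figure: affine gap penalty, cost as a function of gap length L]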
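
A short numeric illustration of the two cost formulas and of the parsimony argument: with an affine penalty, one long gap is cheaper than several short ones of the same total length. The values gop=-10 and gep=-1 are arbitrary assumptions for this example, and the function names are hypothetical.

```perl
use strict;
use warnings;

# Cost = gop + L*gep (every gap column pays gep, the gap pays gop once)
sub gap_cost_all_extended {
    my ($len, $gop, $gep) = @_;
    return $len > 0 ? $gop + $len * $gep : 0;
}

# Cost = gop + (L-1)*gep (the first gap column pays gop only)
sub gap_cost_first_opens {
    my ($len, $gop, $gep) = @_;
    return $len > 0 ? $gop + ($len - 1) * $gep : 0;
}

my ($gop, $gep) = (-10, -1);
my $one_long    = gap_cost_all_extended(3, $gop, $gep);       # -10 + 3*-1 = -13
my $three_short = 3 * gap_cost_all_extended(1, $gop, $gep);   # 3 * -11    = -33
my $alt_long    = gap_cost_first_opens(3, $gop, $gep);        # -10 + 2*-1 = -12

print "one gap of length 3:    $one_long\n";
print "three gaps of length 1: $three_short\n";
print "second convention, L=3: $alt_long\n";
```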

  34. But Harder To Compute…
      More than 3 ways to extend an alignment: besides the alignment column (X over X), each gap column (deletion, X over -, or insertion, - over X) is either a gap opening or a gap extension.

  35. More Questions Need to be Asked
      For instance, what is the cost of an insertion? It depends on how the alignment being extended ends:
      - GOP if it ends with an aligned pair (??X over ??X)
      - GEP if it already ends with a gap (??- over ??X)

  36. Solution: Maintain 3 Tables
      - Ix: table that contains the score of every optimal alignment 1…i vs 1…j that finishes with an insertion in sequence X.
      - Iy: table that contains the score of every optimal alignment 1…i vs 1…j that finishes with an insertion in sequence Y.
      - M: table that contains the score of every optimal alignment 1…i vs 1…j that finishes with an alignment between sequence X and Y.

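  37. The Algorithm
      \[
      M(i,j) = \mathrm{best}
      \begin{cases}
        M(i-1,j-1) + \mathrm{Mat}(i,j) \\
        I_x(i-1,j-1) + \mathrm{Mat}(i,j) \\
        I_y(i-1,j-1) + \mathrm{Mat}(i,j)
      \end{cases}
      \]
      \[
      I_x(i,j) = \mathrm{best}
      \begin{cases}
        M(i-1,j) + \mathrm{gop} \\
        I_x(i-1,j) + \mathrm{gep}
      \end{cases}
      \qquad
      I_y(i,j) = \mathrm{best}
      \begin{cases}
        M(i,j-1) + \mathrm{gop} \\
        I_y(i,j-1) + \mathrm{gep}
      \end{cases}
      \]
      M extends 1…i-1 vs 1…j-1 with an aligned column (X over X); Ix extends 1…i-1 vs 1…j with X over -; Iy extends 1…i vs 1…j-1 with - over X.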
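
The three recurrences translate into three tables filled in the same double loop. The sketch below computes only the score of a global alignment. Its assumptions are not from the slides: a match/mismatch scheme instead of a substitution matrix, a large negative sentinel `$NEG` standing in for minus infinity in cells that cannot be reached, border gaps charged gop + (L-1)*gep, and the name `gotoh_score`.

```perl
use strict;
use warnings;
use List::Util qw(max);

my $NEG = -1e9;   # assumed sentinel for "impossible" cells

sub gotoh_score {
    my ($seq_x, $seq_y, $match, $mismatch, $gop, $gep) = @_;
    my @X = split //, $seq_x;
    my @Y = split //, $seq_y;
    my ($n, $m) = (scalar @X, scalar @Y);
    my (@M, @Ix, @Iy);
    for my $i (0 .. $n) {
        for my $j (0 .. $m) {
            $M[$i][$j] = $Ix[$i][$j] = $Iy[$i][$j] = $NEG;
        }
    }
    # Borders: only gap-only alignments are possible there.
    $M[0][0] = 0;
    $Ix[$_][0] = $gop + ($_ - 1) * $gep for 1 .. $n;
    $Iy[0][$_] = $gop + ($_ - 1) * $gep for 1 .. $m;
    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            my $s = $X[$i-1] eq $Y[$j-1] ? $match : $mismatch;
            # Aligned column: may follow any of the three states.
            $M[$i][$j]  = max($M[$i-1][$j-1], $Ix[$i-1][$j-1], $Iy[$i-1][$j-1]) + $s;
            # Gap columns: open from M (gop) or extend the same gap (gep).
            $Ix[$i][$j] = max($M[$i-1][$j] + $gop, $Ix[$i-1][$j] + $gep);
            $Iy[$i][$j] = max($M[$i][$j-1] + $gop, $Iy[$i][$j-1] + $gep);
        }
    }
    return max($M[$n][$m], $Ix[$n][$m], $Iy[$n][$m]);   # start from the BEST table
}

my $score = gotoh_score('FAST', 'FAT', 2, -1, -2, -1);
print "score=$score\n";   # three matches (6) plus one gap opening (-2) = 4
```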

  38. Trace-back?
      Three tables: M, Ix, Iy. Start from the BEST of M(i,j), Ix(i,j) and Iy(i,j).

  39. Trace-back?
      Navigate from one table (M, Ix, Iy) to the next, knowing that a gap always finishes with an aligned column…

  40. Going Further?
      With the affine gap penalties, we have increased the number of possibilities when building our alignment.
      Computer scientists talk of states and represent this as a Finite State Automaton (FSAs are cousins of HMMs).

  41. Going Further ?

  42. Going Further ? In Theory, there is no Limit on the number of states one may consider when doing such a computation.

  43. Going Further ? Imagine a pairwise alignment algorithm where the gap penalty depends on the length of the gap. Can you simplify it realistically so that it can be efficiently implemented?

  44. Lx Ly

  45. A Divide and Conquer Strategy: The Myers and Miller Strategy

  46. Remember Not To Run Out of Memory: The Myers and Miller Strategy

  47. A Score in Linear Space You never Need More Than The Previous Row To Compute the optimal score

  48. A Score in Linear Space
      R1: previous row, R2: current row

      For i
        For j
          R2[j] = best( R2[j-1] + gep, R1[j-1] + mat, R1[j] + gep )
        For j
          R1[j] = R2[j]
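
The two-row scheme can be written out as follows. This sketch returns only the optimal global score, not the alignment, and assumes a match/mismatch scheme (2, -1) with a linear gap penalty (-1); `nw_score_linear` is a hypothetical name.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Global alignment score using two rows instead of the full table.
sub nw_score_linear {
    my ($seq_i, $seq_j, $match, $mismatch, $gep) = @_;
    my @I = split //, $seq_i;
    my @J = split //, $seq_j;
    my ($n, $m) = (scalar @I, scalar @J);
    my @R1 = map { $_ * $gep } 0 .. $m;        # previous row, starts as the border
    for my $i (1 .. $n) {
        my @R2 = ($i * $gep);                  # current row, first-column border
        for my $j (1 .. $m) {
            my $mat = $I[$i-1] eq $J[$j-1] ? $match : $mismatch;
            $R2[$j] = max(
                $R2[$j-1] + $gep,              # from the left (current row)
                $R1[$j-1] + $mat,              # from the diagonal (previous row)
                $R1[$j]   + $gep,              # from above (previous row)
            );
        }
        @R1 = @R2;                             # current row becomes previous row
    }
    return $R1[$m];
}

my $score = nw_score_linear('FAST', 'FAT', 2, -1, -1);
print "score=$score\n";   # 5, same as the full-table computation
```

Memory use is proportional to the length of the column sequence, so passing the shorter sequence as `$seq_j` keeps the rows small.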
