
Tools for multiple sequence alignment

Bioinformatics Methods Course: Multiple Sequence Alignment. Burkhard Morgenstern, University of Göttingen, Institute of Microbiology and Genetics, Department of Bioinformatics. Göttingen, October/November 2006.


Presentation Transcript


  1. Bioinformatics Methods Course: Multiple Sequence Alignment. Burkhard Morgenstern, University of Göttingen, Institute of Microbiology and Genetics, Department of Bioinformatics. Göttingen, October/November 2006

  2. Tools for multiple sequence alignment
     T Y I M R E A Q Y E
     T C I V M R E A Y E

  3. Tools for multiple sequence alignment
     T Y I - M R E A Q Y E
     T C I V M R E A - Y E

  4. Tools for multiple sequence alignment
     T Y I M R E A Q Y E
     T C I V M R E A Y E
     Y I M Q E V Q Q E
     Y I A M R E Q Y E

  5. Tools for multiple sequence alignment
     T Y I - M R E A Q Y E
     T C I V M R E A - Y E
     Y - I - M Q E V Q Q E
     Y - I A M R E - Q Y E

  6. Tools for multiple sequence alignment
     T Y I - M R E A Q Y E
     T C I V M R E A - Y E
     - Y I - M Q E V Q Q E
     Y - I A M R E - Q Y E
     Astronomical number of possible alignments!

  7. Tools for multiple sequence alignment
     T Y I - M R E A Q Y E
     T C I V - M R E A Y E
     - Y I - M Q E V Q Q E
     Y - I A M R E - Q Y E
     Astronomical number of possible alignments!

  8. Tools for multiple sequence alignment
     T Y I - M R E A Q Y E
     T C I V M R E A - Y E
     - Y I - M Q E V Q Q E
     Y - I A M R E - Q Y E
     Which one is the best?
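To make the "astronomical number of possible alignments" concrete, here is a minimal sketch that counts the alignments of just two sequences. It assumes that two alignments count as different whenever their sequences of columns differ; under that assumption each alignment ends in a residue/residue, residue/gap, or gap/residue column, which gives a three-way recursion (the Delannoy numbers). The function name `count_alignments` is chosen for illustration only.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(m: int, n: int) -> int:
    """Number of distinct column sequences aligning sequences of length m and n."""
    if m == 0 or n == 0:
        # One sequence is used up: the rest can only be paired with gaps.
        return 1
    return (
        count_alignments(m - 1, n - 1)  # last column: residue / residue
        + count_alignments(m - 1, n)    # last column: residue / gap
        + count_alignments(m, n - 1)    # last column: gap / residue
    )


# The two example sequences on the slides each have 10 residues.
print(count_alignments(10, 10))
```

The count grows exponentially with sequence length even for two sequences, and adding further sequences multiplies the possibilities again.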

  9. Tools for multiple sequence alignment
     Questions in the development of alignment programs:
     (1) What is a good alignment? → objective function ("score")
     (2) How to find a good alignment? → optimization algorithm
     The first question is far more important!

  10. Tools for multiple sequence alignment
      Before defining an objective function (scoring scheme):
      • What is a biologically good alignment?

  11. Tools for multiple sequence alignment
      Criteria for alignment quality:
      • 3D structure: align residues at corresponding positions in the 3D structure of the protein!
      • Evolution: align residues that derive from a common ancestor!

  12. Tools for multiple sequence alignment
      T Y I - M R E A Q Y E
      T C I V - M R E A Y E
      - Y I - M Q E V Q Q E
      - Y I A M R E - Q Y E
      An alignment is a hypothesis about sequence evolution. Search for the most plausible hypothesis!

  13. Tools for multiple sequence alignment
      Compute for amino acids a and b:
      • Probability $p_{a,b}$ of substitution a → b (or b → a)
      • Frequency $q_a$ of a
      Define $s(a,b) = \log \frac{p_{a,b}}{q_a \, q_b}$
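The log-odds definition of s(a,b) can be sketched in a few lines. The function name `log_odds` and all numeric values below are made up for illustration; real substitution matrices estimate these quantities from curated alignments and usually scale and round the result.

```python
import math


def log_odds(p_ab: float, q_a: float, q_b: float) -> float:
    """s(a,b) = log(p_ab / (q_a * q_b)).

    p_ab : probability of seeing a and b paired by substitution
    q_a, q_b : background frequencies of a and b
    """
    return math.log(p_ab / (q_a * q_b))


# Hypothetical numbers, not taken from any real matrix:
q_a, q_b = 0.1, 0.1
print(log_odds(0.02, q_a, q_b))   # pair seen more often than chance: positive score
print(log_odds(0.005, q_a, q_b))  # pair seen less often than chance: negative score
```

A score of zero means the pair occurs exactly as often as expected if the two residues were drawn independently.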

  14. Tools for multiple sequence alignment

  15. Tools for multiple sequence alignment
      Traditional objective functions: define the score of an alignment as
      • the sum of the individual similarity scores s(a,b)
      • minus a gap penalty g for each gap in the alignment
      Needleman-Wunsch scoring system (1970) for pairwise alignment (= alignment of two sequences)

  16. T Y W I V
      T - - L V
      Example: Score = s(T,T) + s(I,L) + s(V,V) - 2g
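The example score can be reproduced with a short sketch. It assumes that the penalty g is charged once per gap character, which matches the "2g" in the example; the similarity function `toy_s` is a hypothetical stand-in for a real substitution matrix.

```python
def alignment_score(row1: str, row2: str, s, g: int) -> int:
    """Sum of s(a,b) over residue pairs minus g for every gap character."""
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have equal length")
    total = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            total -= g          # column contains a gap
        else:
            total += s(a, b)    # column pairs two residues
    return total


def toy_s(a: str, b: str) -> int:
    """Hypothetical similarity: +2 for identical residues, -1 otherwise."""
    return 2 if a == b else -1


# s(T,T) + s(I,L) + s(V,V) - 2g with the toy values above
print(alignment_score("TYWIV", "T--LV", toy_s, g=1))
```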

  17. T Y W I V
      T - - L V
      Idea: the alignment with optimal (maximal) score is probably biologically meaningful. A dynamic programming algorithm finds the optimal alignment for two sequences efficiently (Needleman and Wunsch, 1970).
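A minimal sketch of the dynamic programming idea, assuming a linear gap penalty g per gap character and integer scores. The function name, the return format, and the `match_mismatch` scoring are illustrative choices, not the original 1970 formulation.

```python
def needleman_wunsch(x: str, y: str, s, g: int):
    """Return (optimal score, aligned row for x, aligned row for y)."""
    m, n = len(x), len(y)
    # F[i][j] = best score of any alignment of x[:i] with y[:j]
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = -i * g
    for j in range(1, n + 1):
        F[0][j] = -j * g
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # pair x[i-1] with y[j-1]
                F[i - 1][j] - g,                          # x[i-1] against a gap
                F[i][j - 1] - g,                          # y[j-1] against a gap
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    a, b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            a.append(x[i - 1]); b.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] - g:
            a.append(x[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(y[j - 1]); j -= 1
    return F[m][n], "".join(reversed(a)), "".join(reversed(b))


def match_mismatch(a: str, b: str) -> int:
    """Hypothetical similarity: +1 for identical residues, -1 otherwise."""
    return 1 if a == b else -1


score, row_x, row_y = needleman_wunsch("TYIMREAQYE", "TCIVMREAYE", match_mismatch, g=1)
print(score)
print(row_x)
print(row_y)
```

The table has (m+1)(n+1) cells and each cell is filled in constant time, which is why the pairwise case is efficient.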

  18. Tools for multiple sequence alignment
      Traditional objective functions can be generalized to multiple alignment (e.g. sum-of-pairs score, tree alignment).
      The Needleman-Wunsch algorithm can also be generalized to find an optimal multiple alignment, but it is very time- and memory-consuming!
      → Heuristic algorithm needed, i.e. a fast but sub-optimal solution
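The sum-of-pairs generalization can be sketched as follows. This version scores every pair of rows with the pairwise scheme and, as an assumed convention, ignores columns in which both rows of the pair have a gap; `match_mismatch_sp` is a hypothetical similarity function.

```python
from itertools import combinations


def sum_of_pairs(rows, s, g: int) -> int:
    """Sum the pairwise alignment scores over all pairs of rows."""
    total = 0
    for r1, r2 in combinations(rows, 2):
        for a, b in zip(r1, r2):
            if a == "-" and b == "-":
                continue            # gap/gap column: skipped (assumed convention)
            if a == "-" or b == "-":
                total -= g          # residue against gap
            else:
                total += s(a, b)    # residue against residue
    return total


def match_mismatch_sp(a: str, b: str) -> int:
    """Hypothetical similarity: +1 for identical residues, -1 otherwise."""
    return 1 if a == b else -1


rows = ["TYI-MREAQYE", "TCIVMREA-YE", "-YI-MQEVQQE", "Y-IAMRE-QYE"]
print(sum_of_pairs(rows, match_mismatch_sp, g=1))
```

Scoring a given multiple alignment this way is cheap; the expensive part is searching for the best one, since the dynamic programming table gains one dimension per sequence.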

  19. Tools for multiple sequence alignment Most commonly used heuristic for multiple alignment: Progressive alignment (mid 1980s)

  20. “Progressive” Alignment
      WCEAQTKNGQGWVPSNYITPVN
      WWRLNDKEGYVPRNLLGLYP
      AVVIQDNSDIKVVPKAKIIRD
      YAVESEAHPGSFQPVAALERIN
      WLNYNETTGERGDFPGTYVEYIGRKKISP

  21. “Progressive” Alignment
      WCEAQTKNGQGWVPSNYITPVN
      WWRLNDKEGYVPRNLLGLYP
      AVVIQDNSDIKVVPKAKIIRD
      YAVESEAHPGSFQPVAALERIN
      WLNYNETTGERGDFPGTYVEYIGRKKISP
      Guide tree

  22. “Progressive” Alignment
      WCEAQTKNGQGWVPSNYITPVN
      WW--RLNDKEGYVPRNLLGLYP-
      AVVIQDNSDIKVVP--KAKIIRD
      YAVESEASFQPVAALERIN
      WLNYNEERGDFPGTYVEYIGRKKISP
      Profile alignment: “once a gap, always a gap”

  23. “Progressive” Alignment
      WCEAQTKNGQGWVPSNYITPVN
      WW--RLNDKEGYVPRNLLGLYP-
      AVVIQDNSDIKVVP--KAKIIRD
      YAVESEASVQ--PVAALERIN------
      WLN-YNEERGDFPGTYVEYIGRKKISP
      Profile alignment: “once a gap, always a gap”

  24. “Progressive” Alignment
      WCEAQTKNGQGWVPSNYITPVN-
      WW--RLNDKEGYVPRNLLGLYP-
      AVVIQDNSDIKVVP--KAKIIRD
      YAVESEASVQ--PVAALERIN------
      WLN-YNEERGDFPGTYVEYIGRKKISP
      Profile alignment: “once a gap, always a gap”

  25. “Progressive” Alignment
      WCEAQTKNGQGWVPSNYITPVN--------
      WW--RLNDKEGYVPRNLLGLYP--------
      AVVIQDNSDIKVVP--KAKIIRD-------
      YAVESEA---SVQ--PVAALERIN------
      WLN-YNE---ERGDFPGTYVEYIGRKKISP
      Profile alignment: “once a gap, always a gap”
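The "once a gap, always a gap" rule concerns the merge step of progressive alignment: two already-aligned groups are joined column by column, and existing gaps are never removed. The sketch below shows only that merge step. It assumes the column pairing (`path`) has already been computed, for example by a profile-to-profile dynamic programming step that is not shown; the function name and the input format are illustrative.

```python
def merge_profiles(p1, p2, path):
    """Merge two aligned groups of sequences along a given column pairing.

    p1, p2 : lists of equal-length aligned rows
    path   : list of (i, j) pairs; i is a column index in p1, j in p2,
             and None means "insert a gap column into that group here".
    """
    rows = [[] for _ in range(len(p1) + len(p2))]
    for i, j in path:
        # A missing column becomes a gap in every row of that group.
        col1 = [r[i] for r in p1] if i is not None else ["-"] * len(p1)
        col2 = [r[j] for r in p2] if j is not None else ["-"] * len(p2)
        for k, ch in enumerate(col1 + col2):
            rows[k].append(ch)
    return ["".join(r) for r in rows]


group1 = ["AC-T", "ACGT"]   # already aligned; contains a gap
group2 = ["AT"]
path = [(0, 0), (1, None), (2, None), (3, 1)]
for row in merge_profiles(group1, group2, path):
    print(row)
```

Because whole columns are copied, the gap that already exists in `group1` survives the merge unchanged, which is exactly why early mistakes in a progressive alignment cannot be corrected later.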

  26. CLUSTAL W
      Most important software program: CLUSTAL W. J. Thompson, T. Gibson, D. Higgins (1994), CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment … Nucleic Acids Res. 22, 4673-4680 (~20,000 citations in the literature)

  27. Tools for multiple sequence alignment
      Problems with the traditional approach:
      • Results depend on the gap penalty
      • A heuristic guide tree determines the alignment, yet the alignment is in turn used for phylogeny reconstruction
      • The algorithm produces global alignments

  28. Tools for multiple sequence alignment
      Problems with the traditional approach:
      But: many sequence families share only local similarity, e.g. the sequences share one conserved motif

  29. Local sequence alignment
      EYENS
      ERYENS
      ERYAS
      Find common motif in sequences; ignore the rest

  30. Local sequence alignment
      E-YENS
      ERYENS
      ERYA-S
      Find common motif in sequences; ignore the rest

  31. Local sequence alignment
      E-YENS
      ERYENS
      ERYA-S
      Find common motif in sequences; ignore the rest: local alignment
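For two sequences, "find the common motif and ignore the rest" is classically done with the Smith-Waterman algorithm. That method is not named on the slides; it is shown here only as a familiar pairwise illustration of local alignment. The sketch assumes integer scores and a linear gap penalty, and `sw_score` is a hypothetical similarity function.

```python
def smith_waterman(x: str, y: str, s, g: int):
    """Return (best local score, aligned segment of x, aligned segment of y)."""
    m, n = len(x), len(y)
    # H[i][j] = best score of a local alignment ending at x[i-1], y[j-1]
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            H[i][j] = max(
                0,                                        # start a new local alignment
                H[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                H[i - 1][j] - g,
                H[i][j - 1] - g,
            )
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the best cell until the score drops to zero.
    a, b = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            a.append(x[i - 1]); b.append(y[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] - g:
            a.append(x[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(y[j - 1]); j -= 1
    return best, "".join(reversed(a)), "".join(reversed(b))


def sw_score(a: str, b: str) -> int:
    """Hypothetical similarity: +2 for identical residues, -1 otherwise."""
    return 2 if a == b else -1


# The flanking residues differ; only the shared motif should be reported.
print(smith_waterman("AAAERYENSTTT", "GGGERYENSCCC", sw_score, g=2))
```

The only change from the global recursion is the extra option 0, which lets an alignment start anywhere, together with a traceback that starts at the best-scoring cell instead of the corner.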

  32. Gibbs Motif Sampler
      Local multiple alignment without gaps: C.E. Lawrence et al. (1993), Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262, 208-214

  33. Traditional alignment approaches: Either global or local methods!

  34. New question: sequence families with multiple local similarities. Neither local nor global methods are applicable

  35. New question: sequence families with multiple local similarities Alignment possible if order conserved

  36. The DIALIGN approach
      Morgenstern, Dress, Werner (1996), PNAS 93, 12098-12103
      • Combination of global and local methods
      • Assemble multiple alignment from gap-free local pairwise alignments ("fragments")
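A fragment in this sense is a pair of equal-length segments, one from each sequence, aligned without gaps. The sketch below simply enumerates candidate fragments between two sequences and keeps those with enough identical positions. The thresholds and the plain match count are stand-ins chosen for illustration; they are not DIALIGN's actual fragment weighting.

```python
def find_fragments(x: str, y: str, min_len: int = 4, min_matches: int = 4):
    """List gap-free segment pairs as (i, j, length, matches).

    Each tuple means x[i:i+length] is aligned with y[j:j+length].
    """
    found = []
    for i in range(len(x)):
        for j in range(len(y)):
            matches = 0
            max_len = min(len(x) - i, len(y) - j)
            for length in range(1, max_len + 1):
                # Extend the fragment by one column and update the match count.
                if x[i + length - 1] == y[j + length - 1]:
                    matches += 1
                if length >= min_len and matches >= min_matches:
                    found.append((i, j, length, matches))
    return found


seq1 = "atctaatagttaaactcccccgtgcttag"
seq2 = "cagtgcgtgtattactaacggttcaatcgcg"
for frag in find_fragments(seq1, seq2, min_len=6, min_matches=6)[:5]:
    print(frag)
```

A multiple alignment is then assembled from a subset of such fragments, chosen so that they do not contradict one another.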

  37. The DIALIGN approach
      atctaatagttaaactcccccgtgcttag
      cagtgcgtgtattactaacggttcaatcgcg
      caaagagtatcacccctgaattgaataa

  43. The DIALIGN approach
      atc------taatagttaaactcccccgtgcttag
      cagtgcgtgtattactaacggttcaatcgcg
      caaagagtatcacccctgaattgaataa

  44. The DIALIGN approach
      atc------taatagttaaactcccccgtgcttag
      cagtgcgtgtattactaacggttcaatcgcg
      caaa--gagtatcacccctgaattgaataa

  45. The DIALIGN approach
      atc------taatagttaaactcccccgtgcttag
      cagtgcgtgtattactaacggttcaatcgcg
      caaa--gagtatcacc----------cctgaattgaataa

  46. The DIALIGN approach
      atc------taatagttaaactcccccgtgc-ttag
      cagtgcgtgtattactaac----------gg-ttcaatcgcg
      caaa--gagtatcacc----------cctgaattgaataa

  47. The DIALIGN approach
      atc------taatagttaaactcccccgtgc-ttag
      cagtgcgtgtattactaac----------gg-ttcaatcgcg
      caaa--gagtatcacc----------cctgaattgaataa
      Consistency!
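Consistency means that all chosen fragments fit into one alignment. A minimal sketch for the simplest case, two fragments between the same pair of sequences: they are compatible if they lie on the same diagonal, or if one ends before the other begins in both sequences. This is a simplification assumed for illustration; with more than two sequences, consistency must also hold transitively across all sequence pairs, which this check does not cover.

```python
def pairwise_consistent(f1, f2) -> bool:
    """Check two fragments between the same two sequences.

    A fragment is (i, j, length): x[i:i+length] aligned with y[j:j+length].
    """
    (i1, j1, l1), (i2, j2, _l2) = sorted([f1, f2])  # order by start in x
    if i1 - j1 == i2 - j2:
        # Same diagonal: both fragments pair every position identically.
        return True
    # Different diagonals: the first must end before the second starts
    # in BOTH sequences (no overlap, no crossing).
    return i1 + l1 <= i2 and j1 + l1 <= j2


print(pairwise_consistent((0, 0, 3), (5, 6, 3)))  # same order in both sequences
print(pairwise_consistent((0, 5, 3), (5, 0, 3)))  # the fragments cross
```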

  48. The DIALIGN approach
      atc------TAATAGTTAaactccccCGTGC-TTag
      cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg
      caaa--GAGTATCAcc----------CCTGaaTTGAATaa

  49. The DIALIGN approach
      Multiple alignment:
      atctaatagttaaactcccccgtgcttag
      cagtgcgtgtattactaacggttcaatcgcg
      caaagagtatcacccctgaattgaataa
