Lecture: Fundamentals of Bioinformatics (Vorlesung Grundlagen der Bioinformatik) gobics.de/lectures/ss07/grundlagen

  1. Lecture: Fundamentals of Bioinformatics (Vorlesung Grundlagen der Bioinformatik) http://gobics.de/lectures/ss07/grundlagen

  2. Sequence alignment in molecular data analysis: Information from a Single Sequence Alone

  3. Sequence alignment in molecular data analysis: Information from a Single Sequence Alone Multi-Organism High Quality Sequences (M. Brudno)

  4. Tools for multiple sequence alignment
       seq1  T Y I M R E A Q Y E
       seq2  T C I V M R E A Y E
       seq3  Y I M Q E V Q Q E
       seq4  Y I A M R E Q Y E

  5. Tools for multiple sequence alignment
       seq1  T Y I - M R E A Q Y E
       seq2  T C I V M R E A - Y E
       seq3  Y - I - M Q E V Q Q E
       seq4  Y - I A M R E - Q Y E

  9. Tools for multiple sequence alignment
       seq1  T Y I - M R E A Q Y E
       seq2  T C I V M R E A - Y E
       seq3  Y - I - M Q E V Q Q E
       seq4  Y - I A M R E - Q Y E
     • Functionally important regions more conserved than non-functional regions

  10. Tools for multiple sequence alignment
       seq1  T Y I - M R E A Q Y E
       seq2  T C I V M R E A - Y E
       seq3  Y - I - M Q E V Q Q E
       seq4  Y - I A M R E - Q Y E
      • Functionally important regions more conserved than non-functional regions
      • Local sequence conservation indicates functionality!

  11. Tools for multiple sequence alignment
       seq1  T Y I - M R E A Q Y E
       seq2  T C I V M R E A - Y E
       seq3  - Y I - M Q E V Q Q E
       seq4  Y - I A M R E - Q Y E
      Astronomical number of possible alignments!

  12. Tools for multiple sequence alignment
       seq1  T Y I - M R E A Q Y E
       seq2  T C I V - M R E A Y E
       seq3  - Y I - M Q E V Q Q E
       seq4  Y - I A M R E - Q Y E
      Astronomical number of possible alignments!

  13. Tools for multiple sequence alignment
       seq1  T Y I - M R E A Q Y E
       seq2  T C I V M R E A - Y E
       seq3  - Y I - M Q E V Q Q E
       seq4  Y - I A M R E - Q Y E
      Which one is the best?
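
To give a feel for "astronomical": a minimal sketch that counts the possible alignments of just two sequences. It assumes the usual convention that every column pairs two residues or one residue with a gap, and that differently ordered adjacent gaps count as different alignments; under that convention the counts are the Delannoy numbers. The function name is illustrative, and the count for multiple alignments grows faster still.

```python
def count_pairwise_alignments(m, n):
    """Number of alignments of two sequences of lengths m and n.

    Each alignment column is a match/mismatch, a gap in the first
    sequence, or a gap in the second, which gives the recurrence
    D(m, n) = D(m-1, n) + D(m, n-1) + D(m-1, n-1), D(0, n) = D(m, 0) = 1.
    """
    row = [1] * (n + 1)              # D(0, j) = 1 for all j
    for _ in range(m):
        new = [1]                    # D(i, 0) = 1
        for j in range(1, n + 1):
            new.append(row[j] + new[j - 1] + row[j - 1])
        row = new
    return row[n]


if __name__ == "__main__":
    for length in (5, 10, 20, 50):
        print(length, count_pairwise_alignments(length, length))
```

The table is filled row by row, so only two rows are held in memory at any time.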

  14. Tools for multiple sequence alignment
      Questions in development of alignment programs:
      (1) What is a good alignment? → objective function ("score")
      (2) How to find a good alignment? → optimization algorithm
      First question far more important!

  15. Tools for multiple sequence alignment Most important scoring scheme for multiple alignment: Sum-of-pairs score for global alignment.
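
A minimal sketch of a sum-of-pairs score, assuming a toy scoring scheme that the slides do not specify (match +1, mismatch 0, residue against gap -1, gap against gap 0). Real programs use substitution matrices and affine gap penalties; the function names are illustrative.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=0, gap=-1):
    """Score one pair of symbols from the same alignment column."""
    if a == "-" and b == "-":
        return 0                     # gap against gap is ignored
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sum_of_pairs(alignment, **scoring):
    """Sum the pair scores over all columns and all pairs of rows."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have equal length")
    return sum(
        pair_score(x, y, **scoring)
        for column in zip(*alignment)
        for x, y in combinations(column, 2)
    )


if __name__ == "__main__":
    rows = [
        "TYI-MREAQYE",
        "TCIVMREA-YE",
        "Y-I-MQEVQQE",
        "Y-IAMRE-QYE",
    ]
    print(sum_of_pairs(rows))
```

Because the score is a sum over pairs of rows, an optimization algorithm can compare candidate alignments of the same sequences simply by comparing these totals.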

  16. Divide-and-Conquer Alignment (DCA)
      J. Stoye, A. Dress (Bielefeld)
      Approximate optimal global multiple alignment
      • Divide sequences into small sub-sequences
      • Use MSA to calculate optimal alignment for sub-sequences
      • Concatenate sub-alignments

  19. Tools for multiple sequence alignment
      Problems with traditional approach:
      • Results depend on gap penalty
      • Heuristic guide tree determines alignment; alignment used for phylogeny reconstruction
      • Algorithm produces global alignments.

  20. First step in sequence comparison: alignment
      • global alignment (Needleman and Wunsch, 1970; Clustal W)
          atctaatagttaatactcgtccaagtat
          atctgtattactaaacaactggtgctacta

  21. First step in sequence comparison: alignment
      • global alignment (Needleman and Wunsch, 1970; Clustal W)
          atc--taatagttaat--actcgtccaagtat
          |||  || || | ||   ||| || | | ||
          atctgtattact-aaacaactggtgctacta-

  22. First step in sequence comparison: alignment
      • global alignment (Needleman and Wunsch, 1970; Clustal W)
          atc--taatagttaat--actcgtccaagtat
          |||  || || | ||   ||| || | | ||
          atctgtattact-aaacaactggtgctacta-
      • local alignment (Smith and Waterman, 1981)
          atctaatagttaatactcgtccaagtat
          gcgtgtattactaaacggttcaatctaacat

  24. First step in sequence comparison: alignment
      • global alignment (Needleman and Wunsch, 1970; Clustal W)
          atc--taatagttaat--actcgtccaagtat
          |||  || || | ||   ||| || | | ||
          atctgtattact-aaacaactggtgctacta-
      • local alignment (Smith and Waterman, 1981)
          atc--taatagttaatactcgtccaagtat
               || || | ||
          gcgtgtattact-aaacggttcaatctaacat
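
The difference between the two alignment types can be sketched with the score recurrences alone. The scoring values below (match +1, mismatch -1, linear gap -1) are assumptions chosen for illustration, not the parameters of Clustal W or any other program; the function names are likewise illustrative.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch style score: both sequences are aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, 1):
        cur = [i * gap]
        for j, y in enumerate(b, 1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman style score: best-scoring pair of substrings."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # The extra 0 lets an alignment start afresh at any cell.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    s1 = "atctaatagttaatactcgtccaagtat"
    s2 = "gcgtgtattactaaacggttcaatctaacat"
    print("global:", global_score(s1, s2))
    print("local: ", local_score(s1, s2))
```

The only structural changes in the local variant are the floor at zero and taking the maximum over the whole table instead of the last cell.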

  25. New question: sequence families with multiple local similarities
      Neither local nor global methods applicable

  26. New question: sequence families with multiple local similarities
      Alignment possible if order conserved

  27. The DIALIGN approach
      Morgenstern, Dress, Werner (1996), PNAS 93, 12098-12103
      • Combination of global and local methods
      • Assemble multiple alignment from gap-free local pair-wise alignments ("fragments")

  28. The DIALIGN approach
          atctaatagttaaactcccccgtgcttag
          cagtgcgtgtattactaacggttcaatcgcg
          caaagagtatcacccctgaattgaataa

  34. The DIALIGN approach
          atc------taatagttaaactcccccgtgcttag
          cagtgcgtgtattactaacggttcaatcgcg
          caaagagtatcacccctgaattgaataa

  35. The DIALIGN approach
          atc------taatagttaaactcccccgtgcttag
          cagtgcgtgtattactaacggttcaatcgcg
          caaa--gagtatcacccctgaattgaataa

  36. The DIALIGN approach
          atc------taatagttaaactcccccgtgcttag
          cagtgcgtgtattactaacggttcaatcgcg
          caaa--gagtatcacc----------cctgaattgaataa

  37. The DIALIGN approach
          atc------taatagttaaactcccccgtgc-ttag
          cagtgcgtgtattactaac----------gg-ttcaatcgcg
          caaa--gagtatcacc----------cctgaattgaataa

  38. The DIALIGN approach
          atc------taatagttaaactcccccgtgc-ttag
          cagtgcgtgtattactaac----------gg-ttcaatcgcg
          caaa--gagtatcacc----------cctgaattgaataa
      Consistency!

  39. The DIALIGN approach
          atc------TAATAGTTAaactccccCGTGC-TTag
          cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg
          caaa--GAGTATCAcc----------CCTGaaTTGAATaa

  40. The DIALIGN approach
      Score of an alignment:
      • Define score of fragment f:
          l(f) = length of f
          s(f) = sum of matches (similarity values)
          P(f) = probability to find a fragment with length l(f) and at least s(f) matches in random sequences that have the same length as the input sequences
          Score w(f) = -ln P(f)
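
A rough sketch of how such a weight could be computed for DNA fragments. Two assumptions go beyond the slide: each position matches independently with probability 0.25, and the dependence on the sequence lengths is approximated by multiplying the binomial tail by the number of possible start-position pairs (capped at 1). This is an illustration of the idea w(f) = -ln P(f), not the formula implemented in DIALIGN; all names are hypothetical.

```python
from math import comb, log


def fragment_tail_probability(length, matches, p_match=0.25):
    """P(at least `matches` matches among `length` independent positions)."""
    return sum(
        comb(length, k) * p_match**k * (1 - p_match) ** (length - k)
        for k in range(matches, length + 1)
    )


def fragment_weight(length, matches, len1, len2, p_match=0.25):
    """w(f) = -ln P(f), with P(f) approximated as described above.

    len1 and len2 are the lengths of the two input sequences. Multiplying
    by len1 * len2 is a crude union bound over start positions (assumption).
    """
    p = min(1.0, len1 * len2 * fragment_tail_probability(length, matches, p_match))
    return max(0.0, -log(p))


if __name__ == "__main__":
    for matches in (6, 8, 10):
        print(matches, round(fragment_weight(10, matches, 30, 30), 3))
```

With this definition, fragments that could easily arise by chance in sequences of the given length receive weights near zero, while unlikely ones receive large weights.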

  41. The DIALIGN approach
      Score of an alignment:
      • Define score of fragment f:
      • Define score of alignment as sum of scores of involved fragments
      No gap penalty!

  42. The DIALIGN approach
      Score of an alignment:
      Goal in fragment-based alignment approach: find consistent collection of fragments with maximum sum of weight scores

  43. The DIALIGN approach
      Pair-wise alignment:
          atctaatagttaaaccccctcgtgcttagagatccaaac
          cagtgcgtgtattactaacggttcaatcgcgcacatccgc

  44. The DIALIGN approach
      Pair-wise alignment:
          atctaatagttaaaccccctcgtgcttagagatccaaac
          cagtgcgtgtattactaacggttcaatcgcgcacatccgc
      • recursive algorithm finds optimal chain of fragments.

  45. The DIALIGN approach
      Pair-wise alignment:
          ------atctaatagttaaaccccctcgtgcttag-------agatccaaac
          cagtgcgtgtattactaac----------ggttcaatcgcgcacatccgc--
      • recursive algorithm finds optimal chain of fragments.

  46. The DIALIGN approach
          ------atctaatagttaaaccccctcgtgcttag-------agatccaaac
          cagtgcgtgtattactaac----------ggttcaatcgcgcacatccgc--
      Optimal pairwise alignment: chain of fragments with maximum sum of weights found by dynamic programming:
      • Standard fragment-chaining algorithm
      • Space-efficient algorithm
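
Fragment chaining can be sketched as a small dynamic program. The version below is a plain quadratic-time chaining over a given list of candidate fragments, written as an illustration; it is not the space-efficient algorithm mentioned on the slide. The tuple layout (start in sequence 1, start in sequence 2, length, weight) and the function name are assumptions.

```python
def best_chain(fragments):
    """Return (total weight, chain) for the heaviest chain of fragments.

    Each fragment is (start1, start2, length, weight) with length >= 1.
    Fragment f may precede g only if f ends before g starts in BOTH
    sequences, so chained fragments never overlap or cross.
    """
    if not fragments:
        return 0.0, []
    # Sorting by end position in sequence 1 guarantees that every
    # possible predecessor of a fragment appears before it in the list.
    frags = sorted(fragments, key=lambda f: (f[0] + f[2], f[1] + f[2]))
    best = []   # best[i]: weight of the heaviest chain ending in frags[i]
    prev = []   # prev[i]: index of the predecessor in that chain, or None
    for i, (s1, s2, _, w) in enumerate(frags):
        best_w, best_p = w, None
        for j in range(i):
            t1, t2, t_len, _ = frags[j]
            if t1 + t_len <= s1 and t2 + t_len <= s2 and best[j] + w > best_w:
                best_w, best_p = best[j] + w, j
        best.append(best_w)
        prev.append(best_p)
    i = max(range(len(frags)), key=best.__getitem__)
    total, chain = best[i], []
    while i is not None:
        chain.append(frags[i])
        i = prev[i]
    return total, chain[::-1]


if __name__ == "__main__":
    candidates = [(0, 6, 9, 5.0), (3, 0, 4, 2.0), (10, 16, 5, 3.0), (8, 2, 6, 4.0)]
    print(best_chain(candidates))
```

Since there is no gap penalty, the positions between chained fragments contribute nothing to the score; only the fragment weights are summed.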

  47. The DIALIGN approach
      Multiple alignment:
          atctaatagttaaactcccccgtgcttag
          cagtgcgtgtattactaacggttcaatcgcg
          caaagagtatcacccctgaattgaataa

  48. The DIALIGN approach
      Multiple alignment:
          atctaatagttaaactcccccgtgcttag
          cagtgcgtgtattactaacggttcaatcgcg
          caaccctgaattgaagagtatcacataa
      (1) Calculate all optimal pair-wise alignments

  49. The DIALIGN approach
      Multiple alignment:
          atctaatagttaaactcccccgtgcttag
          cagtgcgtgtattactaacggttcaatcgcg
          caaagagtatcacccctgaattgaataa
      (1) Calculate all optimal pair-wise alignments
