
Contents

This presentation gives an overview of algorithms for exact string matching and sequence alignment in the context of genome analysis. It covers dealing with long sequences, comparing and analyzing genomes, and suffix data structures, in particular the construction and applications of suffix trees.

retas




Presentation Transcript


  1. Contents
  • First week: algorithms for exact string matching.
      • One pattern: the algorithm depends on |p| (pattern length) and |Σ| (alphabet size).
      • k patterns: the algorithm depends on k, |p| and |Σ|.
  • Second week: alignment of sequences.
      • Edit distance between two strings: dynamic programming.
      • Alignment of sequences: 2 sequences; 3 or more sequences.
  • Third week: dealing with long sequences.

  2. Dealing with genomes
  What can be done with a genome or a chromosome?
  • Compare it with other genomes.
  • Find the distribution of patterns of a given length.
  • Find the most frequent patterns of a given length.
  • Look for repeats (short and long).
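As a minimal sketch of the second and third items, the Python below counts every overlapping substring of length k (a "k-mer"), which gives the distribution of patterns of that length, and then reports the most frequent ones. The function names `kmer_counts` and `most_frequent_kmers` are hypothetical, and the sequence in the usage example is an arbitrary illustration, not data from the slides.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Distribution of patterns of length k: count every overlapping k-mer of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def most_frequent_kmers(seq, k):
    """Return (sorted list of the most frequent k-mers, their count)."""
    counts = kmer_counts(seq, k)
    if not counts:                      # seq shorter than k: no patterns at all
        return [], 0
    top = max(counts.values())
    return sorted(p for p, c in counts.items() if c == top), top

if __name__ == "__main__":
    print(most_frequent_kmers("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4))
```

On that sequence the call returns `(['CATG', 'GCAT'], 3)`. Counting every window costs O(k) per position; for a whole chromosome one would typically encode k-mers as integers or use a suffix structure instead.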

  3. Comparison of genomes: what does it mean?

  4. Comparison of genomes: 15 microbial genomes.

  5. Comparison of genomes: two Pyrococcus genomes.

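  6. Maximal Unique Matches (MUMs)
  (Figure: fragments … a a t g … c t g … and … c g t g … c c c … of two genomes, with the matching segments marked MUM.)
  A MUM (Maximal Unique Match) is a match between the two sequences that occurs exactly once in each of them and cannot be extended in either direction; parallel MUMs form a CLUSTER.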

  7. Suffix data structures
  • Part 1: Suffix trees. Reference: Dan Gusfield, Algorithms on Strings, Trees and Sequences, Cambridge University Press.
  • Part 2: Suffix arrays. Reference: U. Manber and G. Myers, "Suffix arrays: a new method for on-line string searches".

  8. Suffix trees
  Given the string ababaas, its suffixes are 1: ababaas, 2: babaas, 3: abaas, 4: baas, 5: aas, 6: as, 7: s.
  Suffix tree (leaf edges written as label,suffix number):
      a
          ba
              baas,1
              as,3
          as,5
          s,6
      ba
          baas,2
          as,4
      s,7
  What kind of queries can it answer?

  9. Applications of suffix trees
  1. Exact string matching: does the sequence ababaas contain any occurrence of the patterns abab, aab and ab? (Figure: the suffix tree of ababaas, as on slide 8.)
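The query can be sketched as follows, assuming the tree is built by naive suffix insertion (the quadratic algorithm of the next slides) and that the last character of the text occurs nowhere else, as `s` does in ababaas. To test a pattern, spell it from the root; if that succeeds, every leaf below the point reached is the start of an occurrence. `Node`, `build_suffix_tree` and `occurrences` are hypothetical names, and edge labels are stored as Python substrings for readability, so this is not the linear-space representation.

```python
class Node:
    def __init__(self, leaf=None):
        self.edges = {}     # first character -> (edge label, child node)
        self.leaf = leaf    # 1-based suffix number for leaves, None for internal nodes

def build_suffix_tree(s):
    """Naive insertion of suffixes 1..n; s must end with a unique end marker."""
    assert s.count(s[-1]) == 1, "last character must be a unique end marker"
    root = Node()
    for i in range(len(s)):
        node, rest = root, s[i:]
        while True:
            if rest[0] not in node.edges:          # no matching edge: hang a new leaf
                node.edges[rest[0]] = (rest, Node(i + 1))
                break
            label, child = node.edges[rest[0]]
            k = 0
            while k < len(label) and label[k] == rest[k]:
                k += 1
            if k == len(label):                    # whole edge matched: go down
                node, rest = child, rest[k:]
                continue
            mid = Node()                           # mismatch inside the edge: split it
            mid.edges[label[k]] = (label[k:], child)
            mid.edges[rest[k]] = (rest[k:], Node(i + 1))
            node.edges[rest[0]] = (label[:k], mid)
            break
    return root

def occurrences(root, pattern):
    """Spell pattern from the root and return the sorted starts of its occurrences."""
    node, i = root, 0
    while i < len(pattern):
        if pattern[i] not in node.edges:
            return []
        label, child = node.edges[pattern[i]]
        for ch in label[:len(pattern) - i]:        # compare at most the rest of the pattern
            if ch != pattern[i]:
                return []
            i += 1
        node = child
    found, stack = [], [node]                      # every leaf below is an occurrence
    while stack:
        n = stack.pop()
        if n.leaf is not None:
            found.append(n.leaf)
        stack.extend(child for _, child in n.edges.values())
    return sorted(found)

if __name__ == "__main__":
    tree = build_suffix_tree("ababaas")
    for p in ["abab", "aab", "ab"]:
        print(p, occurrences(tree, p))
```

For ababaas this reports abab at [1], no occurrence of aab, and ab at [1, 3]. Once the tree exists, a query costs O(|p|) to spell the pattern plus time proportional to the number of occurrences.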

  10. Quadratic insertion algorithm and the suffix tree. Invariant property: given the string …, P1: the leaves of the suffixes from … have been inserted.
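The insertion step can be sketched in Python as below, under stated assumptions: suffixes are inserted in the order 1, 2, …, n; each insertion walks down from the root and either hangs a new leaf from a node or splits an edge at the first mismatching character; and the last character of the string is a unique end marker (`s` in ababaabbs). `dump` prints the tree in the label,leaf notation of the slides. All names are hypothetical, and walking down character by character is what makes the whole construction quadratic.

```python
class Node:
    def __init__(self, leaf=None):
        self.edges = {}     # first character -> (edge label, child node)
        self.leaf = leaf    # 1-based suffix number for leaves, None for internal nodes

def insert_suffix(root, s, i):
    """Insert suffix s[i:] (suffix number i + 1) by walking down from the root."""
    node, rest = root, s[i:]
    while True:
        if rest[0] not in node.edges:            # no edge starts with this char: new leaf
            node.edges[rest[0]] = (rest, Node(i + 1))
            return
        label, child = node.edges[rest[0]]
        k = 0
        while k < len(label) and label[k] == rest[k]:
            k += 1
        if k == len(label):                      # edge fully matched: continue below
            node, rest = child, rest[k:]
            continue
        mid = Node()                             # mismatch at offset k: split the edge
        mid.edges[label[k]] = (label[k:], child)
        mid.edges[rest[k]] = (rest[k:], Node(i + 1))
        node.edges[rest[0]] = (label[:k], mid)
        return

def build_suffix_tree(s):
    # The unique end marker guarantees that no suffix is a prefix of another,
    # so every suffix ends at its own leaf and rest[k] never runs off the end.
    assert s.count(s[-1]) == 1, "last character must be a unique end marker"
    root = Node()
    for i in range(len(s)):
        insert_suffix(root, s, i)
    return root

def dump(node, depth=0):
    """Return the tree as indented lines; leaves are written as 'label,suffix'."""
    lines = []
    for label, child in sorted(node.edges.values(), key=lambda e: e[0]):
        text = label if child.leaf is None else f"{label},{child.leaf}"
        lines.append("  " * depth + text)
        lines.extend(dump(child, depth + 1))
    return lines

if __name__ == "__main__":
    s = "ababaabbs"
    root = Node()
    for i in range(len(s)):                      # one snapshot per inserted suffix
        insert_suffix(root, s, i)
        print(f"after inserting suffix {i + 1}:")
        print("\n".join(dump(root)))
```

Applied to ababaas, `dump(build_suffix_tree("ababaas"))` reproduces the tree of slide 8.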

  11. Quadratic insertion algorithm. Given the string ababaabbs (figure; extracted edge labels: ababaabbs,1)

  12. Quadratic insertion algorithm. Given the string ababaabbs (figure; extracted edge labels: babaabbs,2 ababaabbs,1)

  13. Quadratic insertion algorithm. Given the string ababaabbs (figure; extracted edge labels: aba baabbs,1 ababaabbs,1 babaabbs,2)

  14. Quadratic insertion algorithm. Given the string ababaabbs (figure; extracted edge labels: abbs,3 aba baabbs,1 babaabbs,2)

  15. Quadratic insertion algorithm. Given the string ababaabbs (figure; extracted edge labels: abbs,3 aba baabbs,1 ba baabbs,2 babaabbs,2)

  16. Quadratic insertion algorithm. Given the string ababaabbs (figure; extracted edge labels: abbs,3 aba baabbs,1 ba abbs,4 baabbs,2)

  17. Quadratic insertion algorithm. Given the string ababaabbs (figure; extracted edge labels: abbs,3 aba a baabbs,1 abbs,3 ba baabbs,1 abbs,4 abbs,4 ba baabbs,2)

  18. Quadratic insertion algorithm. Given the string ababaabbs (figure; extracted edge labels: abbs,5 a abbs,3 ba baabbs,1 abbs,4 abbs,4 ba baabbs,2)

  19. Quadratic insertion algorithm (same as slide 18).

  20. Quadratic insertion algorithm. Given the string ababaabbs (figure; extracted edge labels: abbs,5 a b abbs,3 a abbs,4 abbs,4 baabbs,1 ba ba baabbs,2)

  21. Quadratic insertion algorithm. Given the string ababaabbs (figure; extracted edge labels: abbs,5 a bs,6 b abbs,3 a abbs,4 abbs,4 baabbs,1 ba baabbs,2)

  22. Quadratic insertion algorithm (same as slide 21).

  23. Quadratic insertion algorithm. Given the string ababaabbs (figure; extracted edge labels: abbs,5 a bs,6 b abbs,3 a bs,7 b baabbs,1 a abbs,4 baabbs,2)

  24. Quadratic insertion algorithm. Given the string ababaabbs (figure; extracted edge labels: abbs,5 a bs,6 b abbs,3 a bs,7 b baabbs,1 a abbs,4 s,7 baabbs,2)

  25. Quadratic insertion algorithm. Given the string ababaabbs (figure; extracted edge labels: abbs,5 a bs,6 b abbs,3 a bs,7 b baabbs,1 s,7 a abbs,4 s,7 baabbs,2)

  26. Generalized suffix tree
  The suffix tree of several strings … is called the generalized suffix tree; it is the suffix tree of the concatenation of the strings, each followed by its own end marker. For instance, the generalized suffix tree of ababaabb and aabaat is the suffix tree of ababaabbαaabaatβ.
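Following the slide, a generalized suffix tree can be sketched by giving each string its own end marker (α, β, …) and inserting the suffixes of each marked string into one tree. This gives the suffix tree of the concatenation with every suffix cut after its first marker, which matches leaf labels such as baabbα,1 on the following slides. Each leaf carries a pair (string number, start position). This is a naive quadratic sketch with hypothetical names; it assumes the markers are distinct and never occur inside the strings.

```python
ALPHA, BETA = "\u03b1", "\u03b2"   # end markers α and β, as on the slides

class Node:
    def __init__(self, leaf=None):
        self.edges = {}    # first character -> (edge label, child node)
        self.leaf = leaf   # (string number, 1-based start) for leaves, None otherwise

def add_string(root, text, sid):
    """Insert every suffix of text (which ends with its own marker) into the tree."""
    for i in range(len(text)):
        node, rest = root, text[i:]
        while True:
            if rest[0] not in node.edges:          # new leaf tagged with its string
                node.edges[rest[0]] = (rest, Node((sid, i + 1)))
                break
            label, child = node.edges[rest[0]]
            k = 0
            while k < len(label) and label[k] == rest[k]:
                k += 1
            if k == len(label):                    # edge fully matched: go down
                node, rest = child, rest[k:]
                continue
            mid = Node()                           # split the edge at the mismatch
            mid.edges[label[k]] = (label[k:], child)
            mid.edges[rest[k]] = (rest[k:], Node((sid, i + 1)))
            node.edges[rest[0]] = (label[:k], mid)
            break

def generalized_suffix_tree(strings, markers):
    assert len(set(markers)) == len(markers), "markers must be distinct"
    root = Node()
    for sid, (s, m) in enumerate(zip(strings, markers), start=1):
        assert not set(markers) & set(s), "markers must not occur inside the strings"
        add_string(root, s + m, sid)               # string 1 first, then string 2, ...
    return root

def leaves(node):
    """All (string number, start) pairs in the subtree of node, sorted."""
    out, stack = [], [node]
    while stack:
        n = stack.pop()
        if n.leaf is not None:
            out.append(n.leaf)
        stack.extend(child for _, child in n.edges.values())
    return sorted(out)

if __name__ == "__main__":
    tree = generalized_suffix_tree(["ababaabb", "aabaa"], [ALPHA, BETA])
    print(leaves(tree))
```

Inserting the first string's suffixes and then the second string's mirrors the slides, which start from the tree of ababaabbα and add the suffixes of aabaaβ.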

  27. Generalized suffix tree. Construction of the suffix tree of ababaabbαaabaaβ, given the suffix tree of ababaabbα (figure; extracted edge labels: abbα,5 a bα,6 b abbα,3 a bα,7 b baabbα,1 α,7 a abbα,4 α,7 baabbα,2)

  28. Generalized suffix tree. Construction of the suffix tree of ababaabbαaabaaβ (figure; same edge labels as slide 27)

  29. Generalized suffix tree. Construction of the suffix tree of ababaabbαaabaaβ (figure; extracted edge labels: aaβ,1 bα,6 b abbα,3 a bα,7 b baabbα,1 α,7 a abbα,4 α,7 baabbα,2 ab a bα,5)

  30. Generalized suffix tree (same as slide 29).

  31. Generalized suffix tree. Construction of the suffix tree of ababaabbαaabaaβ (figure; extracted edge labels: β,2 bα,6 bα,7 b α,7 a abbα,4 α,7 baabbα,2 aaβ,1 ab a bα,5 b a bbα,3 a baabbα,1)

  32. Generalized suffix tree (same as slide 31).

  33. Generalized suffix tree. Construction of the suffix tree of ababaabbαaabaaβ (figure; extracted edge labels: bα,6 bα,7 α,7 β,3 α,7 aaβ,1 ab a bα,5 β,2 b a bbα,3 a b baabbα,1 a a bbα,4 baabbα,2)

  34. Generalized suffix tree (same as slide 33).

  35. Generalized suffix tree. Construction of the suffix tree of ababaabbαaabaaβ (figure; extracted edge labels: β,4 bα,6 bα,7 α,7 α,7 aaβ,1 a b a bα,5 β,2 b a bbα,3 a b baabbα,1 β,3 a a bbα,4 baabbα,2)

  36. Generalized suffix tree (same as slide 35).

  37. Generalized suffix tree. Construction of the suffix tree of ababaabbαaabaaβ (figure; extracted edge labels: bα,6 bα,7 α,7 α,7 β,4 β,4 aaβ,1 a b a bα,5 β,2 b a bbα,3 a b baabbα,1 β,3 a a bbα,4 baabbα,2)

  38. Generalized suffix tree (same as slide 37).

  39. Generalized suffix tree. Construction of the suffix tree of ababaabbαaabaaβ (figure; extracted edge labels: bα,6 bα,7 α,7 α,7 β,4 β,4 β,4 aaβ,1 a b a bα,5 β,2 b a bbα,3 a b baabbα,1 β,3 a a bbα,4 baabbα,2)

  40. Generalized suffix tree of ababaabbαaabaaβ (figure; extracted edge labels: β,4 β,4 β,4 aaβ,1 a b a bα,5 β,2 bα,6 b a bbα,3 a bα,7 b baabbα,1 α,7 β,3 a a bbα,4 α,7 baabbα,2)

  41. Applications of generalized suffix trees
  1. The substring problem for a database of strings DB: does DB contain any occurrence of the patterns abab, aab and ab? (Figure: the generalized suffix tree of ababaabbαaabaaβ, as on slide 40.)
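A sketch of the substring query on a generalized suffix tree: spell the pattern from the root and report every leaf below the point reached, each leaf being a pair (string number, start position). It uses the naive construction idea from the earlier slides, the database strings ababaabb and aabaa of slide 40 as an example, and α and β as end markers; all names are hypothetical.

```python
ALPHA, BETA = "\u03b1", "\u03b2"   # end markers α and β

class Node:
    def __init__(self, leaf=None):
        self.edges = {}    # first character -> (edge label, child node)
        self.leaf = leaf   # (string number, 1-based start) for leaves

def add_string(root, text, sid):
    """Naively insert every suffix of text; text ends with a marker used nowhere else."""
    for i in range(len(text)):
        node, rest = root, text[i:]
        while True:
            if rest[0] not in node.edges:
                node.edges[rest[0]] = (rest, Node((sid, i + 1)))
                break
            label, child = node.edges[rest[0]]
            k = 0
            while k < len(label) and label[k] == rest[k]:
                k += 1
            if k == len(label):
                node, rest = child, rest[k:]
                continue
            mid = Node()
            mid.edges[label[k]] = (label[k:], child)
            mid.edges[rest[k]] = (rest[k:], Node((sid, i + 1)))
            node.edges[rest[0]] = (label[:k], mid)
            break

def build_database(strings, markers):
    root = Node()
    for sid, (s, m) in enumerate(zip(strings, markers), start=1):
        add_string(root, s + m, sid)
    return root

def occurrences(root, pattern):
    """All (string number, start) pairs where pattern occurs in the database."""
    node, i = root, 0
    while i < len(pattern):
        if pattern[i] not in node.edges:
            return []
        label, child = node.edges[pattern[i]]
        for ch in label[:len(pattern) - i]:
            if ch != pattern[i]:
                return []
            i += 1
        node = child
    found, stack = [], [node]                      # collect every leaf below
    while stack:
        n = stack.pop()
        if n.leaf is not None:
            found.append(n.leaf)
        stack.extend(c for _, c in n.edges.values())
    return sorted(found)

if __name__ == "__main__":
    db = build_database(["ababaabb", "aabaa"], [ALPHA, BETA])
    for p in ["abab", "aab", "ab"]:
        print(p, occurrences(db, p))
```

Here ab is reported as [(1, 1), (1, 3), (1, 6), (2, 2)]: three occurrences in the first string and one in the second, all found by a single walk from the root.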

  42. Applications of generalized suffix trees
  2. The longest common substring of two strings. (Figure: the generalized suffix tree of ababaabbαaabaaβ, as on slide 40.)
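One standard way to read the longest common substring off a generalized suffix tree is to mark each node with the set of strings that have a leaf below it and to take the node of greatest string depth marked by both strings. The sketch below does this with a recursive traversal. It assumes that neither string contains the markers α or β and returns one answer when several substrings tie; all names are hypothetical.

```python
ALPHA, BETA = "\u03b1", "\u03b2"   # end markers α and β

class Node:
    def __init__(self, leaf=None):
        self.edges = {}    # first character -> (edge label, child node)
        self.leaf = leaf   # (string number, 1-based start) for leaves

def add_string(root, text, sid):
    """Naively insert every suffix of text; text ends with a marker used nowhere else."""
    for i in range(len(text)):
        node, rest = root, text[i:]
        while True:
            if rest[0] not in node.edges:
                node.edges[rest[0]] = (rest, Node((sid, i + 1)))
                break
            label, child = node.edges[rest[0]]
            k = 0
            while k < len(label) and label[k] == rest[k]:
                k += 1
            if k == len(label):
                node, rest = child, rest[k:]
                continue
            mid = Node()
            mid.edges[label[k]] = (label[k:], child)
            mid.edges[rest[k]] = (rest[k:], Node((sid, i + 1)))
            node.edges[rest[0]] = (label[:k], mid)
            break

def longest_common_substring(s1, s2):
    root = Node()
    add_string(root, s1 + ALPHA, 1)
    add_string(root, s2 + BETA, 2)
    best = ""

    def visit(node, path):
        """Return the set of string numbers that have a leaf below node."""
        nonlocal best
        ids = {node.leaf[0]} if node.leaf is not None else set()
        for label, child in node.edges.values():
            ids |= visit(child, path + label)
        if ids == {1, 2} and len(path) > len(best):
            best = path            # deepest node shared by both strings so far
        return ids

    visit(root, "")
    return best

if __name__ == "__main__":
    print(longest_common_substring("ababaabb", "aabaa"))
```

For ababaabb and aabaa this prints abaa. A path shared by both strings can never contain α or β, so the answer never includes a marker.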

  43. Applications of generalized suffix trees
  3. Finding MUMs. (Figure: the generalized suffix tree of ababaabbαaabaaβ, as on slide 40.)
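A MUM (slide 6) occurs exactly once in each of the two strings and cannot be extended to the left or to the right. In a generalized suffix tree of the two strings, a candidate is an internal node with exactly two leaves, one from each string: the substring is then unique in both, and the branching means it cannot be extended to the right. The candidate is a MUM when the characters preceding the two occurrences differ or when one occurrence starts its string. The sketch below applies that test. `min_len` is a hypothetical parameter for discarding short matches, and the tuple layout (substring, start in s1, start in s2) is an illustrative choice.

```python
ALPHA, BETA = "\u03b1", "\u03b2"   # end markers α and β

class Node:
    def __init__(self, leaf=None):
        self.edges = {}    # first character -> (edge label, child node)
        self.leaf = leaf   # (string number, 1-based start) for leaves

def add_string(root, text, sid):
    """Naively insert every suffix of text; text ends with a marker used nowhere else."""
    for i in range(len(text)):
        node, rest = root, text[i:]
        while True:
            if rest[0] not in node.edges:
                node.edges[rest[0]] = (rest, Node((sid, i + 1)))
                break
            label, child = node.edges[rest[0]]
            k = 0
            while k < len(label) and label[k] == rest[k]:
                k += 1
            if k == len(label):
                node, rest = child, rest[k:]
                continue
            mid = Node()
            mid.edges[label[k]] = (label[k:], child)
            mid.edges[rest[k]] = (rest[k:], Node((sid, i + 1)))
            node.edges[rest[0]] = (label[:k], mid)
            break

def find_mums(s1, s2, min_len=1):
    """Return (substring, start in s1, start in s2) for each MUM; positions are 1-based."""
    root = Node()
    add_string(root, s1 + ALPHA, 1)
    add_string(root, s2 + BETA, 2)
    mums, stack = [], [(root, "")]
    while stack:
        node, path = stack.pop()
        kids = [child for _, child in node.edges.values()]
        # Exactly two leaves below, one per string: unique in both strings, and
        # the branching means the two occurrences cannot be extended to the right.
        if node is not root and len(kids) == 2 and all(c.leaf is not None for c in kids):
            (sa, pa), (sb, pb) = kids[0].leaf, kids[1].leaf
            if sa != sb and len(path) >= min_len:
                if sa == 2:
                    pa, pb = pb, pa            # pa: start in s1, pb: start in s2
                left1 = s1[pa - 2] if pa > 1 else None
                left2 = s2[pb - 2] if pb > 1 else None
                # Left-maximal: preceding characters differ, or a string starts here.
                if left1 is None or left2 is None or left1 != left2:
                    mums.append((path, pa, pb))
        stack.extend((child, path + label) for label, child in node.edges.values())
    return sorted(mums, key=lambda m: m[1])

if __name__ == "__main__":
    print(find_mums("abcxdef", "abcydef"))
```

For abcxdef and abcydef the call returns [('abc', 1, 1), ('def', 5, 5)]: two parallel MUMs of the kind that slide 6 groups into a cluster.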

  44. Quadratic insertion algorithm and the suffix tree. Invariant property: given the string …, P1: the leaves of the suffixes from … have been inserted.

  45. Linear insertion algorithm and the suffix tree. Invariant properties: given the string …, P1: the leaves of the suffixes from … have been inserted; P2: the string … is the longest string that can be spelt through the tree.
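The invariants P1 and P2 resemble those of Ukkonen's online algorithm, in which an "active point" records the longest suffix of the text read so far that can still be spelt through the tree. The Python below is a sketch of Ukkonen's algorithm, not necessarily the exact construction on these slides: all leaves share one end index, so each step extends every leaf at once; suffix links let the next extension start without returning to the root; and the text is assumed to end with a unique marker. `leaf_paths` is a hypothetical helper used only to check the result against the naive list of suffixes.

```python
class Node:
    def __init__(self, start, end):
        self.start = start      # edge into this node spells text[start : end[0] + 1]
        self.end = end          # one-element list; all leaves share the same list
        self.children = {}      # first character of the edge -> child node
        self.link = None        # suffix link (used for internal nodes)

def ukkonen(text):
    """Online construction, linear for a fixed alphabet; text ends with a unique marker."""
    assert text and text.count(text[-1]) == 1, "text must end with a unique marker"
    leaf_end = [-1]
    root = Node(-1, [-1])
    active_node, active_edge, active_length = root, 0, 0
    remaining = 0                     # suffixes not yet made explicit
    for i, ch in enumerate(text):
        leaf_end[0] = i               # extends every existing leaf by one character
        remaining += 1
        last_new = None               # internal node still waiting for its suffix link
        while remaining > 0:
            if active_length == 0:
                active_edge = i
            first = text[active_edge]
            if first not in active_node.children:          # new leaf from a node
                active_node.children[first] = Node(i, leaf_end)
                if last_new is not None:
                    last_new.link = active_node
                    last_new = None
            else:
                nxt = active_node.children[first]
                edge_len = nxt.end[0] - nxt.start + 1
                if active_length >= edge_len:              # skip/count: walk down
                    active_edge += edge_len
                    active_length -= edge_len
                    active_node = nxt
                    continue
                if text[nxt.start + active_length] == ch:  # already in the tree
                    if last_new is not None and active_node is not root:
                        last_new.link = active_node
                        last_new = None
                    active_length += 1
                    break
                split = Node(nxt.start, [nxt.start + active_length - 1])
                active_node.children[first] = split        # split the edge
                split.children[ch] = Node(i, leaf_end)
                nxt.start += active_length
                split.children[text[nxt.start]] = nxt
                split.link = root
                if last_new is not None:
                    last_new.link = split
                last_new = split
            remaining -= 1
            if active_node is root and active_length > 0:
                active_length -= 1
                active_edge = i - remaining + 1
            elif active_node is not root:
                active_node = active_node.link
    return root

def leaf_paths(text, root):
    """Spell every root-to-leaf path, sorted (used to check the tree)."""
    out, stack = [], [(root, "")]
    while stack:
        node, path = stack.pop()
        if not node.children:
            out.append(path)
        for child in node.children.values():
            stack.append((child, path + text[child.start:child.end[0] + 1]))
    return sorted(out)

if __name__ == "__main__":
    print(leaf_paths("ababaabbs", ukkonen("ababaabbs")))
```

Because the leaves share `leaf_end`, a leaf is never revisited after it is created; together with suffix links and the skip/count walk, this is what removes the quadratic cost of the naive insertion.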

  46. Linear insertion algorithm: example. Given the string ababaababb... (figure; extracted edge labels: ababb...,5 a ababb...,3 ba baababb...,1 ba ababb...,4 baababb...,2)

  47. Linear insertion algorithm: example. Given the string ababaababb... (figure; same edge labels as slide 46, with markers 6 7 8)

  48. Linear insertion algorithm: example (same as slide 47).

  49. Linear insertion algorithm: example. Given the string ababaababb... (figure; same edge labels as slide 46, with markers 6 7 8 9)

  50. Linear insertion algorithm: example. Given the string ababaababb... (figure; markers 6 7 8 9; extracted edge labels: ababb...,5 a ababb...,3 ba ababb...,1 b baababb...,1 baababb...,1 ababb...,4 ba b...,6 baababb...,2)
