
CSC 448: Bioinformatics Algorithms

CSC 448: Bioinformatics Algorithms. Ukkonen’s Algorithm for Generalized Suffix Trees. Alex Dekhtyar. Example for two DNA sequences: T and T’ = reverse(complement(T)), with T = AATGTT and T’ = AACATT. Steps: create SuffixTree(T$) using Ukkonen’s algorithm; keep suffix links.


Presentation Transcript


  1. CSC 448: Bioinformatics Algorithms. Ukkonen’s Algorithm for Generalized Suffix Trees. Alex Dekhtyar

  2. Example for two DNA sequences: T and T’ = reverse(complement(T)), with T = AATGTT and T’ = AACATT.
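A minimal sketch of how T’ is derived from T, assuming uppercase input over A, C, G, T; the function name is ours, not from the slides.

```python
# Watson-Crick complement of each DNA base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(t: str) -> str:
    """Return reverse(complement(t)) for an uppercase DNA string."""
    return t.translate(COMPLEMENT)[::-1]

print(reverse_complement("AATGTT"))  # AACATT
```

Applying the function twice gives back the original string, so T is in turn the reverse complement of T’.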

  3. Steps
      1. Create SuffixTree(T$) using Ukkonen’s algorithm. Keep suffix links.
      2. Add “T:” to all leaf labels (designate current labels).
      3. Traverse SuffixTree(T$) using the prefix of T’. The stopping point is the new active point.
      4. Use Ukkonen’s algorithm to insert the remainder of T’.
         4.1. Label leaves “T’: [x, ∞]”.
         4.2. Modification: traverse to existing leaves to leave a label.
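The steps above build the generalized tree on-line. To have a reference to check a finished tree against, here is a brute-force generalized suffix trie instead (quadratic size, no suffix links, not Ukkonen’s algorithm). Every node records which input strings contain its substring. The dict layout, the ids "T" and "T'", and the function names are our own assumptions.

```python
def generalized_suffix_trie(strings: dict[str, str]) -> dict:
    """Brute-force generalized suffix trie: insert every suffix of every string.

    Each node is {"ids": set of string ids, "next": {char: child node}}.
    """
    root = {"ids": set(), "next": {}}
    for sid, s in strings.items():
        for i in range(len(s)):              # insert suffix s[i:]
            node = root
            for c in s[i:]:
                node = node["next"].setdefault(c, {"ids": set(), "next": {}})
                node["ids"].add(sid)         # this substring occurs in string sid
    return root

def occurs_in(trie: dict, pattern: str) -> set:
    """Ids of the strings that contain pattern (empty set if none)."""
    node = trie
    for c in pattern:
        if c not in node["next"]:
            return set()
        node = node["next"][c]
    return node["ids"]

trie = generalized_suffix_trie({"T": "AATGTT$", "T'": "AACATT"})
print(sorted(occurs_in(trie, "AT")))  # ['T', "T'"]
```

A generalized suffix tree answers the same query, but with linear size and linear construction time.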

  4. T = AATGTT, T’ = AACATT. [Figure: panels “Tree” and “Trie”, each holding only the auxiliary state ⊥ and the root ε.]

  5. Step 1: insert first string. T = AATGTT, T’ = AACATT. [Figure: panels “Tree” and “Trie”; the trie holds ε and the substrings of T: A, T, G, AA, AT, TT, TG, GT, AAT, ATG, TGT, GTT, AATG, ATGT, TGTT, AATGT, ATGTT, AATGTT.]

  6. Step 1: insert first string. [Figure: the trie of T, with the last boundary path and the last active point marked.]
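Slides 5 and 6 show the trie version of the construction, with ⊥, the root ε, and the boundary path. A sketch of that on-line suffix-trie procedure, following our reading of Ukkonen’s trie algorithm (class and function names are ours). It builds the uncompressed trie, so its size is quadratic, unlike the suffix tree on the later slides.

```python
class State:
    """A suffix-trie state: outgoing transitions plus a suffix link."""
    def __init__(self):
        self.goto = {}
        self.link = None

def build_strie(t: str) -> State:
    """On-line suffix-trie construction; returns the root (ε)."""
    bottom, root = State(), State()        # bottom plays the role of ⊥
    for c in set(t):
        bottom.goto[c] = root              # ⊥ moves to the root on every symbol
    root.link = bottom
    top = root                             # state for the whole prefix read so far
    for c in t:
        r, prev = top, None
        # Walk the boundary path top, f(top), f(f(top)), ... until a state that
        # already has a c-transition (the end point). top itself never has one,
        # so the loop body runs at least once and prev is set.
        while c not in r.goto:
            new = State()
            r.goto[c] = new
            if prev is not None:
                prev.link = new            # link consecutive new states
            prev = new
            r = r.link
        prev.link = r.goto[c]
        top = top.goto[c]
    return root

def count_states(root: State) -> int:
    """Number of trie states reachable from the root (⊥ is not reachable)."""
    stack, seen = [root], 0
    while stack:
        s = stack.pop()
        seen += 1
        stack.extend(s.goto.values())
    return seen

print(count_states(build_strie("AATGTT")))  # 19: ε plus the 18 substrings listed on slide 5
```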

  7. Step 1: insert first string. [Figure: the suffix tree of T next to the trie; the tree’s leaf edges are labeled 2,∞, 3,∞, 4,∞ (twice) and 6,∞; the last boundary path and the last active point are marked.]
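The tree on slide 7 has five leaves for the six suffixes of T. A brute-force check of why, assuming 1-based positions as on the slides (the function name is ours): a suffix that is also a prefix of a longer suffix gets no leaf of its own when T has no terminator.

```python
def implicit_suffixes(t: str) -> list[int]:
    """1-based starts of suffixes of t that are proper prefixes of another suffix.

    Without a unique terminator these suffixes end inside the tree, not at a leaf.
    """
    sufs = [t[i:] for i in range(len(t))]
    return [i + 1 for i, x in enumerate(sufs)
            if any(y != x and y.startswith(x) for y in sufs)]

print(implicit_suffixes("AATGTT"))   # [6]: "T" is a prefix of "TGTT" and "TT"
print(implicit_suffixes("AATGTT$"))  # []: with $, every suffix ends at a leaf
```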

  8. T = AATGTT$, T’ = AACATT. Step 1: insert first string. Step 1.5: finish the tree. [Figure: the suffix tree of T$, now with leaf edges 2,∞, 3,∞, 4,∞ (twice), 6,∞ and 7,∞ (twice, for $); the last boundary path and the last active point are marked.]
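To check the leaf labels after step 1.5, here is a brute-force builder that inserts the suffixes of T$ one at a time into a compacted tree. This is O(n²) and not Ukkonen’s on-line method; class and function names are ours. Edge labels are stored as 0-based slices and reported 1-based, to match the slide’s x,∞ notation.

```python
class Node:
    def __init__(self):
        # first character -> (start, end, child); the edge spells s[start:end]
        self.edges = {}

def naive_suffix_tree(s: str) -> Node:
    """Insert the suffixes of s one by one; s must end with a unique '$'."""
    root = Node()
    for i in range(len(s)):
        node, j = root, i                    # j: next unmatched position of suffix i
        while True:
            c = s[j]
            if c not in node.edges:          # no edge starts with c: new leaf edge
                node.edges[c] = (j, len(s), Node())
                break
            start, end, child = node.edges[c]
            k = 0
            while start + k < end and s[start + k] == s[j + k]:
                k += 1
            if start + k == end:             # whole edge matched: descend
                node, j = child, j + k
                continue
            mid = Node()                     # mismatch inside the edge: split it
            node.edges[c] = (start, start + k, mid)
            mid.edges[s[start + k]] = (start + k, end, child)
            mid.edges[s[j + k]] = (j + k, len(s), Node())
            break
    return root

def leaf_labels(node: Node) -> list[int]:
    """1-based start x of every leaf edge, i.e. the x in 'x,∞'."""
    out = []
    for start, end, child in node.edges.values():
        out.extend([start + 1] if not child.edges else leaf_labels(child))
    return out

print(sorted(leaf_labels(naive_suffix_tree("AATGTT$"))))  # [2, 3, 4, 4, 6, 7, 7]
```

These are the seven leaf labels shown on slide 8.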

  9. Step 2: Traverse the prefix of T’. [Figure: the suffix tree of T$ and the trie, marking the last boundary path, the last active point, and the new active point.]
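Step 2 walks down SuffixTree(T$) along T’ until the walk falls off the tree. The string read at the stopping point is the longest prefix of T’ that is a substring of T$. A naive version of that computation, using substring tests in place of a tree walk (the function name is ours):

```python
def traverse_prefix(t: str, t2: str) -> str:
    """Longest prefix of t2 that occurs in t (naive substring tests)."""
    k = 0
    # If t2[:k+1] is absent, every longer prefix is absent too, so stop there.
    while k < len(t2) and t2[:k + 1] in t:
        k += 1
    return t2[:k]

print(traverse_prefix("AATGTT$", "AACATT"))  # AA: the walk stops before C
```

Ukkonen’s insertion of T’ then resumes from this point instead of from position 1 of T’.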

  10. Step 2: Traverse the prefix of T’. Step 3: Start inserting the rest of T’. [Figure: the trie gains AC and AAC; the active point is marked.]
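Step 3 resumes Ukkonen’s phases at position 3 of T’. The effect of each phase can be modelled on the trie with plain sets, as a quadratic sketch (function names are ours). The model walks the suffixes of the current prefix from longest to shortest until it reaches the end point, an extension that already exists. Suffixes that are trie leaves grow on their own through their open-ended x,∞ labels, and only the others need a new branch, i.e. a new leaf edge labeled i,∞ in phase i.

```python
def substrings(s: str) -> set[str]:
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}

def phases(t: str, t2: str, start: int):
    """Model phases start..len(t2) (1-based) of inserting t2 into the trie of t.

    Yields (i, t2[i], branches), where branches are the strings that need a
    new branch in phase i.
    """
    present = substrings(t) | substrings(t2[:start - 1])
    for i in range(start, len(t2) + 1):
        p, c = t2[:i - 1], t2[i - 1]
        branches = []
        for k in range(len(p) + 1):               # suffixes of p, longest first
            s = p[k:]
            if s + c in present:                  # end point: stop the phase
                break
            is_leaf = not any(len(x) == len(s) + 1 and x.startswith(s)
                              for x in present)
            if not is_leaf:                       # leaves grow via their open end
                branches.append(s + c)
        present |= {p[k:] + c for k in range(len(p) + 1)}
        yield i, c, branches

for i, c, branches in phases("AATGTT$", "AACATT", 3):
    print(i, c, branches)
# 3 C ['AAC', 'AC', 'C']
# 4 A []
# 5 T []
# 6 T ['ATT']
```

The output matches the following slides: three new leaves in phase 3, “Nothing to do!” in phases 4 and 5, and one new branch (ATT) in phase 6.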

  11. Step 2: Traverse the prefix of T’. Step 3: Start inserting the rest of T’. Make leaf nodes “generalized”: every leaf label of the tree of T$ gets the prefix “T:” (T:2,∞, T:3,∞, T:4,∞ twice, T:6,∞, T:7,∞ twice). [Figure: tree and trie with the active point marked.]
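One possible representation of a “generalized” leaf is sketched below. It is our assumption, not taken from the slides: each leaf keeps a list of (string id, x) pairs, printed in the slide’s id:x,∞ form, so that a later step can attach a label from a second string to the same leaf.

```python
from dataclasses import dataclass, field

@dataclass
class Leaf:
    """A generalized leaf that may carry labels from several strings."""
    labels: list[tuple[str, int]] = field(default_factory=list)  # (string id, x)

    def __str__(self) -> str:
        return " ".join(f"{sid}:{x},∞" for sid, x in self.labels)

# Tag the existing leaves of SuffixTree(T$) with the string they came from.
leaves = [Leaf([("T", x)]) for x in [2, 3, 4, 4, 6, 7, 7]]
print([str(leaf) for leaf in leaves])
```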

  12. Step 2: Traverse the prefix of T’. Step 3: Start inserting the rest of T’. [Figure: three new leaves labeled T’:3,∞ branch off on C; the trie gains C; the active point is marked.]

  13. Step 2: Traverse the prefix of T’. Step 3: Start inserting the rest of T’. Nothing to do! [Figure: the trie gains CA, ACA and AACA; the end point and the active point are marked.]
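“Nothing to do!” holds because leaf labels are open-ended: a label x,∞ always reads up to the current position, so a phase that only extends leaves leaves the tree itself untouched. A common way to implement this, assumed here rather than taken from the slides, is a single shared end value that every open leaf edge points to (class names are ours).

```python
class End:
    """Shared, mutable current end read by every open leaf label x,∞."""
    def __init__(self):
        self.value = 0

class LeafEdge:
    def __init__(self, start: int, end: End):
        self.start, self.end = start, end   # 1-based, inclusive

    def label(self, s: str) -> str:
        return s[self.start - 1:self.end.value]

t2 = "AACATT"
end = End()
end.value = 3                               # phase 3: three T':3,∞ leaves are created
leaves = [LeafEdge(3, end) for _ in range(3)]
print([e.label(t2) for e in leaves])        # ['C', 'C', 'C']
end.value = 4                               # phase 4: only the shared end moves
print([e.label(t2) for e in leaves])        # ['CA', 'CA', 'CA']
```

Advancing the shared end is constant work per phase, however many leaves exist.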

  14. Step 2: Traverse the prefix of T’. Step 3: Start inserting the rest of T’. Nothing to do! [Figure: the trie gains CAT, ACAT and AACAT; the end point and the active point are marked.]

  15. Step 2: Traverse the prefix of T’. Step 3: Start inserting the rest of T’. [Figure: the trie gains ATT; in the tree, a new leaf labeled T’:6,∞ appears; the end point and the active point are marked.]
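When the active point lies inside an edge, Ukkonen’s algorithm splits that edge and hangs the new leaf from the new internal node. A minimal sketch of the label arithmetic for the split, assuming 1-based inclusive (x, y) labels with ∞ read as the current end (the function name is ours):

```python
def split_edge(label: tuple[int, int], k: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Split edge label (x, y), 1-based and inclusive, after its first k symbols.

    Returns the upper label (parent -> new internal node) and the lower label
    (new internal node -> old child).
    """
    x, y = label
    if not 0 < k <= y - x:
        raise ValueError("split point must fall strictly inside the edge")
    return (x, x + k - 1), (x + k, y)

# An edge spelling T[3..7] = "TGTT$" (∞ read as 7), split after one symbol:
upper, lower = split_edge((3, 7), 1)
print(upper, lower)  # (3, 3) (4, 7)
```

The new internal node then also receives the new leaf, labeled with the current position of T’ and ∞ (T’:6,∞ for the final T).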

  16. Step 2: Traverse the prefix of T’. Step 3: Start inserting the rest of T’. Crucial bit coming! [Figure: as slide 15, with a second leaf label T’:6,∞; the end point and the active point are marked.]
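T’ carries no terminator in this example, so some of its suffixes end at points that already exist in the tree rather than at a new T’ leaf. Our assumption is that these are the cases step 4.2 (“traverse to existing leaves to leave a label”) must handle. A brute-force list of them (the function name is ours):

```python
def unlabelled_suffixes(t: str, t2: str) -> list[int]:
    """1-based starts of suffixes of t2 that occur in t or are a proper prefix
    of a longer suffix of t2; these end inside the tree, not at a new leaf."""
    sufs = [t2[i:] for i in range(len(t2))]
    return [i + 1 for i, x in enumerate(sufs)
            if x in t or any(len(y) > len(x) and y.startswith(x) for y in sufs)]

print(unlabelled_suffixes("AATGTT$", "AACATT"))  # [5, 6]: "TT" and "T"
```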
