CSC 448: Bioinformatics Algorithms. Ukkonen’s Algorithm for Generalized Suffix Trees. Alex Dekhtyar.
Example for two DNA sequences: T and T’ = reverse(complement(T)).
T = AATGTT
T’ = AACATT
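The relation between the two example strings can be checked with a short sketch. The names `COMPLEMENT` and `reverse_complement` are illustrative and not taken from the slides; only the four DNA bases are assumed to occur.

```python
# Watson-Crick base pairing for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq: str) -> str:
    """Complement every base of seq and read the result backwards."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

T = "AATGTT"
T_prime = reverse_complement(T)
print(T_prime)  # AACATT
```

Applying the function twice returns the original string, so T is also the reverse complement of T’.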
Steps
1. Create SuffixTree(T$) using Ukkonen’s algorithm. Keep suffix links.
2. Add “T:” to all leaf labels (designate current labels).
3. Traverse SuffixTree(T$) using the prefix of T’. The stoppage point is the new active point.
4. Use Ukkonen’s algorithm to insert the remainder of T’.
4.1. Label leaves “T’: [x, ∞]”.
4.2. Modification: traverse to existing leaves to leave a label.
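The end result of these steps, a single structure whose suffixes are tagged with the string they came from, can be illustrated with a deliberately naive sketch. It is not Ukkonen’s algorithm: it inserts every suffix character by character into an uncompressed trie (quadratic time and space), has no suffix links or active point, and stores suffix start positions rather than the edge-label intervals used on the slides. The names `Node`, `add_suffixes` and `owners_of` are hypothetical.

```python
class Node:
    def __init__(self):
        self.children = {}  # character -> child Node
        self.labels = []    # (owner, 1-based suffix start) for suffixes ending here

def add_suffixes(root, text, owner):
    """Naively insert every suffix of text, tagging its end node with owner."""
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.children.setdefault(ch, Node())
        node.labels.append((owner, start + 1))

def owners_of(root, pattern):
    """Return the set of owners whose string contains pattern."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return set()
    # Every label at or below this node belongs to a suffix starting with pattern.
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        found.update(owner for owner, _ in current.labels)
        stack.extend(current.children.values())
    return found

root = Node()
add_suffixes(root, "AATGTT$", "T")   # first string, with terminator
add_suffixes(root, "AACATT", "T'")   # second string, as written on the slides
print(sorted(owners_of(root, "TT")))   # contained in both strings
print(sorted(owners_of(root, "CAT")))  # contained in T' only
```

The owner tags play the role of the “T:” and “T’:” prefixes on leaf labels: a query can tell which input string a match came from.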
T = AATGTT, T’ = AACATT
[Figure: two panels, Tree and Trie, before any insertion. Each panel shows only ┴ and the root ε.]
T = AATGTT, T’ = AACATT
Step 1: insert the first string.
[Figure: Tree and Trie panels. Trie nodes by depth: ε; A, T, G; AA, AT, TT, TG, GT; AAT, ATG, TGT, GTT; AATG, ATGT, TGTT; AATGT, ATGTT; AATGTT.]
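The trie nodes on this slide are exactly the distinct substrings of T, grouped by length. A minimal sketch that enumerates them (the function name `substrings_by_length` is illustrative):

```python
def substrings_by_length(text):
    """Return the distinct substrings of text, one list per length 1..len(text)."""
    levels = []
    for length in range(1, len(text) + 1):
        seen = []
        for start in range(len(text) - length + 1):
            piece = text[start:start + length]
            if piece not in seen:  # keep order of first occurrence
                seen.append(piece)
        levels.append(seen)
    return levels

for level in substrings_by_length("AATGTT"):
    print(" ".join(level))
```

For AATGTT this gives 3, 5, 4, 3, 2 and 1 substrings at depths 1 through 6, so the trie has 18 nodes besides the root ε. The suffix tree stores the same information with far fewer nodes by merging unary paths into single edges.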
T = AATGTT, T’ = AACATT
Step 1: insert the first string.
[Figure: Tree and Trie panels. The trie for T is unchanged. Legend: last boundary path; last active point.]
T = AATGTT, T’ = AACATT
Step 1: insert the first string.
[Figure: Tree and Trie panels. The tree now shows its edge labels and five leaf labels: 4,∞; 4,∞; 2,∞; 3,∞; 6,∞. The last active point is marked in the tree. The trie for T is unchanged. Legend: last boundary path; last active point.]
T = AATGTT$, T’ = AACATT
Step 1: insert the first string.
Step 1.5: finish the tree.
[Figure: Tree and Trie panels. Appending $ adds two $ edges to the tree, each ending in a leaf labeled 7,∞. Leaf labels: 7,∞; 7,∞; 4,∞; 4,∞; 2,∞; 3,∞; 6,∞. Legend: last boundary path; last active point.]
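Why the terminator adds leaves can be checked directly: a suffix ends at a leaf only if it is not a proper prefix of another suffix. The sketch below counts such suffixes with and without $; it works on plain strings and does not build a tree, and `leaf_suffixes` is a hypothetical helper name.

```python
def leaf_suffixes(text):
    """Suffixes of text that are not a proper prefix of any other suffix."""
    suffixes = [text[i:] for i in range(len(text))]
    return [
        s for s in suffixes
        if not any(other != s and other.startswith(s) for other in suffixes)
    ]

print(leaf_suffixes("AATGTT"))   # the suffix "T" is missing: it is a prefix of "TT"
print(leaf_suffixes("AATGTT$"))  # all seven suffixes, including "T$" and "$"
```

Without the terminator the tree has five leaves; with it there are seven, matching the two additional 7,∞ leaf labels on this slide.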
T = AATGTT$, T’ = AACATT
Step 2: Traverse the prefix of T’.
[Figure: Tree and Trie panels. Leaf labels: 7,∞; 7,∞; 4,∞; 4,∞; 2,∞; 3,∞; 6,∞. Legend: last boundary path; new active point; last active point.]
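Where the traversal stops can be computed without a tree, because a path from the root spells a string exactly when that string is a substring of T$. The sketch below uses Python's substring test as a stand-in for walking tree edges; `longest_matching_prefix` is an illustrative name, and a real implementation would follow edges from the root instead.

```python
def longest_matching_prefix(text, query):
    """Longest prefix of query that occurs somewhere in text."""
    matched = 0
    while matched < len(query) and query[:matched + 1] in text:
        matched += 1
    return query[:matched]

prefix = longest_matching_prefix("AATGTT$", "AACATT")
print(prefix)            # AA
print(len(prefix) + 1)   # 3, the 1-based position of the first unmatched character
```

The traversal stops after AA, so insertion of T’ resumes at its third character, C.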
T = AATGTT$, T’ = AACATT
Step 2: Traverse the prefix of T’.
Step 3: Start inserting the rest of T’.
[Figure: Tree and Trie panels. A C edge appears in the tree; the trie gains the nodes AC and AAC. Leaf labels: 7,∞; 7,∞; 4,∞; 4,∞; 2,∞; 3,∞; 6,∞. Legend: active point.]
T = AATGTT$, T’ = AACATT
Step 2: Traverse the prefix of T’.
Step 3: Start inserting the rest of T’.
Make leaf nodes “generalized”.
[Figure: Tree and Trie panels. Every existing leaf label is prefixed with “T:”: T:7,∞; T:7,∞; T:4,∞; T:4,∞; T:2,∞; T:3,∞; T:6,∞. Legend: active point.]
T = AATGTT$, T’ = AACATT
Step 2: Traverse the prefix of T’.
Step 3: Start inserting the rest of T’.
[Figure: Tree and Trie panels. Inserting C adds three leaves labeled T’:3,∞. The trie shows the nodes C, AC and AAC. Leaf labels for T: T:7,∞; T:7,∞; T:4,∞; T:4,∞; T:2,∞; T:3,∞; T:6,∞. Legend: active point.]
T = AATGTT$, T’ = AACATT
Step 2: Traverse the prefix of T’.
Step 3: Start inserting the rest of T’.
Nothing to do!
[Figure: Tree and Trie panels. The trie gains the nodes CA, ACA and AACA. Leaf labels: T’:3,∞ (three leaves); T:7,∞; T:7,∞; T:4,∞; T:4,∞; T:2,∞; T:3,∞; T:6,∞. Legend: end point; active point.]
T = AATGTT$, T’ = AACATT
Step 2: Traverse the prefix of T’.
Step 3: Start inserting the rest of T’.
Nothing to do!
[Figure: Tree and Trie panels. The trie gains the nodes CAT, ACAT and AACAT. Leaf labels: T’:3,∞ (three leaves); T:7,∞; T:7,∞; T:4,∞; T:4,∞; T:2,∞; T:3,∞; T:6,∞. Legend: end point; active point.]
T = AATGTT$, T’ = AACATT
Step 2: Traverse the prefix of T’.
Step 3: Start inserting the rest of T’.
[Figure: Tree and Trie panels. The trie gains the node ATT, and a new leaf label T’:6,∞ appears. Other leaf labels: T’:3,∞ (three leaves); T:7,∞; T:7,∞; T:4,∞; T:4,∞; T:2,∞; T:3,∞; T:6,∞. Legend: end point; active point.]
T = AATGTT$, T’ = AACATT
Step 2: Traverse the prefix of T’.
Step 3: Start inserting the rest of T’.
Crucial bit coming!
[Figure: Tree and Trie panels. A second T’:6,∞ label appears. Other leaf labels: T’:3,∞ (three leaves); T:7,∞; T:7,∞; T:4,∞; T:4,∞; T:2,∞; T:3,∞; T:6,∞. Legend: end point; active point.]