Trie • A trie (from "retrieval") is a multi-way tree structure useful for storing strings over an alphabet. It has been used to store large dictionaries of words (English, say) in spelling-checking programs and in natural-language "understanding" programs. Given the data: • an, ant, all, allot, alloy, aloe, are, ate, be
Trie (Cont.) • The idea is that all strings sharing a common stem or prefix hang off a common node. When the strings are words over {a..z}, a node has at most 27 children: one for each letter plus a terminator. • The elements of a string can be recovered in a scan from the root to the leaf that ends the string. All strings in the trie can be recovered by a depth-first scan of the tree.
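The two points above can be sketched in a few lines of Python. This is a minimal illustration, not a tuned implementation: nested dictionaries stand in for nodes, and the terminator symbol `$` and the names `make_trie` and `all_strings` are assumptions introduced here, not part of the slides.

```python
END = "$"  # assumed terminator symbol marking the end of a stored string


def make_trie(words):
    """Build a trie as nested dicts: each key is a letter or the terminator."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            # Strings sharing a prefix reuse the same chain of nodes.
            node = node.setdefault(ch, {})
        node[END] = {}  # terminator child: a string ends at this node
    return root


def all_strings(node, prefix=""):
    """Depth-first scan that yields every string stored below `node`."""
    for ch in sorted(node):
        if ch == END:
            yield prefix
        else:
            yield from all_strings(node[ch], prefix + ch)


WORDS = ["an", "ant", "all", "allot", "alloy", "aloe", "are", "ate", "be"]
trie = make_trie(WORDS)
print(list(all_strings(trie)))
```

Because the children are visited in sorted order and `$` sorts before the lowercase letters, the depth-first scan returns the stored words in alphabetical order.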
Suffix Trie • The idea behind a suffix trie is to assign to each symbol in a text an index corresponding to its position in the text (i.e., the first symbol has index 1 and the last symbol has index n, where n is the number of symbols in the text).
Suffix Trie (Cont.) • A suffix trie is an ordinary trie in which the input strings are all possible suffixes. • A suffix of a text [t1 ... tn] is a substring [ti ... tn] where i is an integer between 1 and n.
Suffix Trie (Cont.) • To demonstrate the structure of the resulting tree we will build the suffix trie corresponding to the following text:
TEXT:     G O O G O L $
POSITION: 1 2 3 4 5 6 7
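As a rough sketch of how this suffix trie could be built, the Python below inserts every suffix of the text into an ordinary trie and records the 1-based starting position at the leaf that ends each suffix. The class `Node` and the functions `build_suffix_trie` and `occurrences` are hypothetical names chosen for this illustration. The substring lookup is included only as one common use of the stored positions.

```python
class Node:
    """One trie node: child links by symbol, plus a position if a suffix ends here."""

    def __init__(self):
        self.children = {}
        self.position = None  # 1-based start of the suffix ending at this node


def build_suffix_trie(text):
    """Insert every suffix text[i:] into a trie (text should end with a unique '$')."""
    root = Node()
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.children.setdefault(ch, Node())
        node.position = i + 1  # positions are numbered from 1, as in the slides
    return root


def occurrences(root, pattern):
    """Return the sorted 1-based positions at which `pattern` starts in the text."""
    node = root
    for ch in pattern:
        if ch not in node.children:
            return []
        node = node.children[ch]
    # Every suffix below this node begins with `pattern`; collect their positions.
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        if current.position is not None:
            found.append(current.position)
        stack.extend(current.children.values())
    return sorted(found)


root = build_suffix_trie("GOOGOL$")
print(occurrences(root, "GO"))  # [1, 4]
```

This naive construction spends time and space proportional to the total length of all suffixes, which grows quadratically with n.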
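Suffix Tree • The suffix tree is created by compacting every unary node (a node with exactly one child) in the suffix trie, so that each chain of such nodes becomes a single edge labeled with a substring.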
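The compaction step can be sketched as follows, assuming the suffix trie is held as nested dictionaries keyed by single symbols. The function names `build_suffix_trie` and `compact` are illustrative only. Practical suffix-tree algorithms build the compact form directly rather than compacting a full trie; this sketch only mirrors the definition.

```python
def build_suffix_trie(text):
    """Suffix trie as nested dicts; each key is one symbol of the text."""
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
    return root


def compact(node):
    """Merge each chain of unary nodes into one edge labeled with a substring."""
    result = {}
    for ch, child in node.items():
        label = ch
        # Follow the chain while the node reached has exactly one child.
        while len(child) == 1:
            (next_ch, grandchild), = child.items()
            label += next_ch
            child = grandchild
        result[label] = compact(child)
    return result


suffix_tree = compact(build_suffix_trie("GOOGOL$"))
for label, subtree in suffix_tree.items():
    print(label, sorted(subtree))
```

For the text GOOGOL$, the root keeps four outgoing edges, labeled GO, O, L$ and $. The suffixes starting with G share the edge GO and then split into OGOL$ and L$.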