CS 225 Data Structures and Software Principles Section 12 Tries
Agenda
• Tries
• Basics
• Jason’s Code
• Patricia Trees
• De La Briandais Trees
• Hybrid Trees
Tries
• Data structure optimized for lookups on a key that can be decomposed into characters
• Takes overlapping strings into account: keys that share a prefix share a path
• Regular expression search
• Pattern searches in images
• Non-compact tries
• De La Briandais Trees, Patricia Trees
Trie Representation
• Tries are represented using a tree of arrays
• For a character set of size k, the corresponding Trie structure is a (k+1)-ary tree
Tries
• The i-th character (starting at 0) in the data corresponds to the node at depth i
• Need a mapping from each character to an index in the array
• The one extra cell in the array holds a “null character” that marks the end of a key
• That cell points to a leaf
• Ideally, there is no need to store the key in a leaf, since it is completely determined by the path followed
• Info is stored at the leaf
• Spend only constant time at each level
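The points above can be sketched in code. This is a minimal illustration, not the course's implementation: it assumes lowercase keys over a to z (so k = 26), stores an int as the leaf info, and uses hypothetical names (ArrayTrieNode, cellIndex, trieInsert, trieFind).

```cpp
#include <array>
#include <memory>
#include <string>

// Hypothetical sketch: array-based trie over 'a'..'z' (k = 26).
constexpr int kAlphabet = 26;          // k
constexpr int kCells = kAlphabet + 1;  // k + 1 cells per node
constexpr int kEndCell = kAlphabet;    // the extra "null character" cell

struct ArrayTrieNode {
    // One pointer per cell; all start out null.
    std::array<std::unique_ptr<ArrayTrieNode>, kCells> cells;
    bool isLeaf = false;
    int info = 0;  // info stored at the leaf (assumed to be an int here)
};

// Mapping of character to array index; returns -1 outside 'a'..'z'.
inline int cellIndex(char c) {
    return (c >= 'a' && c <= 'z') ? c - 'a' : -1;
}

// Insert key with its info. Returns false if the key has an unsupported character.
bool trieInsert(ArrayTrieNode& root, const std::string& key, int info) {
    for (char c : key) {
        if (cellIndex(c) < 0) return false;
    }
    ArrayTrieNode* node = &root;
    for (char c : key) {  // the i-th character selects a cell at depth i
        auto& cell = node->cells[cellIndex(c)];
        if (!cell) cell = std::make_unique<ArrayTrieNode>();
        node = cell.get();
    }
    auto& end = node->cells[kEndCell];  // null-character cell points to the leaf
    if (!end) end = std::make_unique<ArrayTrieNode>();
    end->isLeaf = true;
    end->info = info;
    return true;
}

// Returns a pointer to the stored info, or nullptr if the key is absent.
// Constant work per level, so O(L) for a key of length L.
const int* trieFind(const ArrayTrieNode& root, const std::string& key) {
    const ArrayTrieNode* node = &root;
    for (char c : key) {
        int idx = cellIndex(c);
        if (idx < 0) return nullptr;
        node = node->cells[idx].get();
        if (!node) return nullptr;
    }
    const ArrayTrieNode* leaf = node->cells[kEndCell].get();
    return leaf ? &leaf->info : nullptr;
}
```

Note that the leaf stores no key: the path from the root already spells it out.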
[Figure: Trie Example. Words in trie: raft, star, start, stir. Array cells are labeled a b c … r s … z, and nodes are labeled with their depth (0 to 5).]
Tries
• Running time of the Find operation: O(L), where L is the length of the string we are looking for
• Unique trie for any set of search keys
• Advantage: running time does NOT depend on the number of strings stored in the Trie structure
• Disadvantage: memory waste
• For strings, every node needs a 27-cell array (26 letters plus the null character), even when most cells are empty
• Space: (k+1) * #nodes * sizeof(pointer)
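To make the space formula concrete, here is a small sketch that counts the array nodes for a set of keys and applies (k+1) * #nodes * sizeof(pointer). It assumes that each distinct prefix of a key (including the empty prefix, the root) gets one array, and that leaves are not counted as arrays. The function names are hypothetical, and the pointer size is passed in rather than taken from the platform.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: one array node per distinct prefix of the keys,
// counting the empty prefix as the root. Leaves are not counted.
std::size_t countArrayNodes(const std::vector<std::string>& keys) {
    std::set<std::string> prefixes;
    prefixes.insert(std::string());  // the root
    for (const auto& key : keys) {
        for (std::size_t len = 1; len <= key.size(); ++len) {
            prefixes.insert(key.substr(0, len));
        }
    }
    return prefixes.size();
}

// Space used by the arrays: (k+1) * #nodes * sizeof(pointer).
std::size_t arrayTrieBytes(std::size_t k, std::size_t nodes, std::size_t pointerBytes) {
    return (k + 1) * nodes * pointerBytes;
}
```

For the example words raft, star, start and stir this gives 12 array nodes, so with k = 26 and an assumed 8-byte pointer the arrays take 27 * 12 * 8 = 2592 bytes, although only 15 of the 324 cells are non-null (11 links between arrays and 4 links to leaves).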
Jason’s Code: TrieNode Data
TrieNode {
    int nodeLevel;               // level of the node
    bool isLeaf;
    Array<TrieNode*> subtries;   // array of ptrs to nodes
    String key;                  // string key in leaf nodes
    Etype storedInfo;
}
Patricia Trees
• Acronym: Practical Algorithm To Retrieve Information Coded In Alphanumeric
• Trick: only allocate arrays that make a “decision”
• Do not store nodes with only one non-NULL cell
• Store in each node the index of the character position on which it discriminates
• Tradeoff: less space required, but more work for Insert and Remove
• Key no longer uniquely determined by path
• Now we must store keys in the leaf
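The lookup side of this idea can be sketched as follows. This is only an illustration under stated assumptions: lowercase keys over a to z, an int as the stored info, and hypothetical names (PatriciaNode, makeLeaf, makeInternal, patriciaFind). Insert and Remove, which the slide notes are more work, are left out; the tree is assumed to be built already.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical sketch of a Patricia-style node over 'a'..'z' plus an end cell.
struct PatriciaNode {
    bool isLeaf = false;
    std::size_t charIndex = 0;  // internal nodes: character position to discriminate on
    std::array<std::unique_ptr<PatriciaNode>, 27> cells;  // cell 26 = null character
    std::string key;            // leaves: the full key must be stored
    int info = 0;               // leaves: stored info (assumed to be an int)
};

std::unique_ptr<PatriciaNode> makeLeaf(std::string key, int info) {
    auto node = std::make_unique<PatriciaNode>();
    node->isLeaf = true;
    node->key = std::move(key);
    node->info = info;
    return node;
}

std::unique_ptr<PatriciaNode> makeInternal(std::size_t charIndex) {
    auto node = std::make_unique<PatriciaNode>();
    node->charIndex = charIndex;
    return node;
}

// Cell for position pos of key: 26 past the end, -1 for unsupported characters.
int patriciaCell(const std::string& key, std::size_t pos) {
    if (pos >= key.size()) return 26;
    char c = key[pos];
    return (c >= 'a' && c <= 'z') ? c - 'a' : -1;
}

// Follow only the discriminating positions, then verify against the stored key,
// because skipped positions were never compared on the way down.
const PatriciaNode* patriciaFind(const PatriciaNode* node, const std::string& key) {
    while (node && !node->isLeaf) {
        int cell = patriciaCell(key, node->charIndex);
        if (cell < 0) return nullptr;
        node = node->cells[cell].get();
    }
    return (node && node->key == key) ? node : nullptr;
}
```

The final comparison is what "we must store keys in the leaf" buys: in a tree holding raft, star, start and stir, a search for slab inspects positions 0, 2 and 4, arrives at the leaf for star, and is rejected only because the stored key differs.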
One (Patricia) Tree Application
• Communication Networks (CS 438)
• Task of efficiently finding the longest match between an IP address and variable-length prefixes in a forwarding table (due to CIDR)
• Given a packet for 128.174.5.130, where would it go? (In binary, 128.174 is 1000 0000 1010 1110.)
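Longest-prefix matching can be sketched with a trie over address bits. To keep the example short, this uses a plain, uncompressed binary trie rather than a Patricia tree, so it shows the matching rule but not the space savings. The names (PrefixNode, addRoute, longestMatch, ipv4) are hypothetical, and so are any forwarding-table entries used with it.

```cpp
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical sketch: binary trie keyed on IPv4 address bits, most significant first.
struct PrefixNode {
    std::unique_ptr<PrefixNode> child[2];
    bool hasRoute = false;
    std::string nextHop;
};

// Pack four octets into a 32-bit address.
constexpr std::uint32_t ipv4(std::uint32_t a, std::uint32_t b,
                             std::uint32_t c, std::uint32_t d) {
    return (a << 24) | (b << 16) | (c << 8) | d;
}

// Add a route for the first `length` bits of `prefix` (0 <= length <= 32).
void addRoute(PrefixNode& root, std::uint32_t prefix, int length,
              const std::string& nextHop) {
    PrefixNode* node = &root;
    for (int i = 0; i < length; ++i) {
        int bit = static_cast<int>((prefix >> (31 - i)) & 1u);
        if (!node->child[bit]) node->child[bit] = std::make_unique<PrefixNode>();
        node = node->child[bit].get();
    }
    node->hasRoute = true;
    node->nextHop = nextHop;
}

// Walk the address bits and remember the deepest route seen on the way down.
// Returns an empty string if no prefix matches.
std::string longestMatch(const PrefixNode& root, std::uint32_t addr) {
    const PrefixNode* node = &root;
    std::string best = root.hasRoute ? root.nextHop : std::string();
    for (int i = 0; i < 32 && node; ++i) {
        int bit = static_cast<int>((addr >> (31 - i)) & 1u);
        node = node->child[bit].get();
        if (node && node->hasRoute) best = node->nextHop;
    }
    return best;
}
```

With made-up routes for 128.174.0.0/16 and 128.174.5.0/24, a packet for 128.174.5.130 matches both prefixes and follows the /24 entry because it is the longer match.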
De La Briandais Trees
• Trick: convert the arrays in the Trie to sparse arrays
• Allocate space only for used cells in the arrays
• At each level we now have a linked list
• Array cells are now nodes that point not only down, but also to the next used character on that level
• Advantage: can save much space; good when the linked lists are not long
• Disadvantage: search is now dependent on k (alphabet size)
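A rough sketch of the linked-list layout follows. It makes one simplifying assumption that departs from the null-character convention used for array tries: the end of a key is marked with a flag on the node for its last character instead of a separate leaf. The names (DlbNode, DlbTrie, dlbInsert, dlbFind) are hypothetical, and the stored info is assumed to be an int.

```cpp
#include <memory>
#include <string>

// Hypothetical sketch of a de la Briandais node: one node per used cell.
struct DlbNode {
    char ch;
    std::unique_ptr<DlbNode> next;  // next used character on this level
    std::unique_ptr<DlbNode> down;  // first node of the list one level down
    bool endsKey = false;           // assumption: flag instead of a null-character leaf
    int info = 0;
    explicit DlbNode(char c) : ch(c) {}
};

struct DlbTrie {
    std::unique_ptr<DlbNode> head;  // first node of the top-level list
};

// Find the node for c in the list starting at `list`, appending one if absent.
DlbNode* findOrAdd(std::unique_ptr<DlbNode>& list, char c) {
    std::unique_ptr<DlbNode>* slot = &list;
    while (*slot && (*slot)->ch != c) slot = &(*slot)->next;
    if (!*slot) *slot = std::make_unique<DlbNode>(c);
    return slot->get();
}

// Insert a non-empty key; empty keys are ignored in this sketch.
void dlbInsert(DlbTrie& trie, const std::string& key, int info) {
    if (key.empty()) return;
    std::unique_ptr<DlbNode>* level = &trie.head;
    DlbNode* node = nullptr;
    for (char c : key) {
        node = findOrAdd(*level, c);
        level = &node->down;
    }
    node->endsKey = true;
    node->info = info;
}

// Returns a pointer to the stored info, or nullptr if the key is absent.
const int* dlbFind(const DlbTrie& trie, const std::string& key) {
    const DlbNode* list = trie.head.get();
    const DlbNode* node = nullptr;
    for (char c : key) {
        node = list;
        // Linear scan of this level's list: up to k steps, hence the dependence on k.
        while (node && node->ch != c) node = node->next.get();
        if (!node) return nullptr;
        list = node->down.get();
    }
    return (node && node->endsKey) ? &node->info : nullptr;
}
```

Each level costs a list scan instead of a single array index, which is why Find becomes O(L * k) in the worst case rather than O(L).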
[Figure: de la Briandais Tree Example. Words in trie: raft, star, start, stir. A head pointer leads to the first level, which holds r and s.]
Hybrid Structures
• Patricia/de la Briandais
• Uses both optimizations
• We eliminate all one-node linked lists in the de la Briandais tree
• Trie/Patricia/de la Briandais
• Highly optimized data structure
• Some levels have arrays and others have linked lists