Fast Trie Data Structures Seminar On Advanced Topics In Data Structures Jacob Katz December 1, 2001 Dan E. Willard, 1981, “New Trie Data Structures Which Support Very Fast Search Operations”
Agenda • Problem statement • Existing solutions and motivation for a new one • P-Fast tries & their complexity • Q-Fast tries & their complexity • X-Fast tries & their complexity • Y-Fast tries & their complexity
Problem statement • Let S be a set of N records with distinct integer keys in range [0, M], with the following operations: • MEMBER(K) – does the key K belong to the set • SUCCESSOR(K) – find the least element which is greater than K • PREDECESSOR(K) – find the greatest element which is less than K • SUBSET(K1, K2) – produce a list of elements whose keys lie between K1 and K2 • The problem: design an efficient data structure supporting these operations
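A minimal sketch of this interface in Python; the class and method names are illustrative, not from the paper:

```python
class OrderedKeySet:
    """Set of N records with distinct integer keys in [0, M] (interface sketch)."""

    def member(self, k: int) -> bool:
        """MEMBER(K): does key k belong to the set?"""
        raise NotImplementedError

    def successor(self, k: int):
        """SUCCESSOR(K): least stored key strictly greater than k (None if none)."""
        raise NotImplementedError

    def predecessor(self, k: int):
        """PREDECESSOR(K): greatest stored key strictly less than k (None if none)."""
        raise NotImplementedError

    def subset(self, k1: int, k2: int) -> list:
        """SUBSET(K1, K2): all stored keys between k1 and k2, in increasing order."""
        raise NotImplementedError
```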
Existing solutions • AVL trees, 2-3 trees use O(N) space and O(log N) time in the worst case • With no restriction on the keys, better performance is impossible • Expected O(log log N) time is possible when keys are uniformly distributed • Stratified trees use O(M * log log M) space and O(log log M) time in the worst case for integer keys in range [0, M] • Disadvantage: O(M * log log M) space is much larger than O(N) if M >> N
Motivation for another solution • A more space-efficient data structure is wanted for restricted keys, one that still maintains the time efficiency…
The way to the solution • We first define P-Fast Trie: • O(√(log M)) time; O(N * √(log M) * 2^√(log M)) space • Then show Q-Fast Trie • improvement of the space requirement to O(N) • Then show X-Fast Trie • O(log log M) time; O(N*log M) space; no dynamic operations • Then show Y-Fast Trie • O(log log M) time; O(N) space; no dynamic operations
What's a Trie • Trie of size (h, b) is a tree of height h and branching factor b • All keys can be regarded as integers in range [0, b^h) • Each key K can be represented as an h-digit number in base b: K1K2K3…Kh • Keys are stored at the leaf level; the path from the root resembles the decomposition of the key into digits • [Figure: example trie with h = 2, b = 10 storing the keys 20, 22, 24, 31, 32, 42, 43]
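A tiny helper (hypothetical, not from the paper) showing the digit decomposition the trie relies on:

```python
def to_digits(key: int, h: int, b: int) -> list:
    """Write key as h digits K1..Kh in base b, most significant digit first."""
    digits = []
    for _ in range(h):
        digits.append(key % b)
        key //= b
    return digits[::-1]

# For the figure's trie (h = 2, b = 10): key 42 decomposes into digits [4, 2].
assert to_digits(42, 2, 10) == [4, 2]
```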
Trivial Trie • In each node store a vector of branches • MEMBER(K) – O(h) • visits O(h) nodes, spends O(1) time in each • SUCCESSOR(K)/PREDECESSOR(K) – O(h*b) • visits O(h) nodes, spends O(b) time in each node • this is too much time • Observation: increasing b (the base of the key representation, the branching factor) decreases h (the number of digits required to represent a key, the height of the tree) and vice versa
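A minimal sketch of the trivial trie and its MEMBER walk, assuming each node stores a branch vector of length b; SUCCESSOR/PREDECESSOR would additionally scan up to b branches per node, which is exactly the cost the p-fast trie removes. Names are illustrative:

```python
class TrieNode:
    def __init__(self, b: int):
        self.branch = [None] * b   # branch vector: one slot per possible digit
        self.is_leaf = False       # set to True on leaf-level nodes

def member(root: TrieNode, digits: list) -> bool:
    """MEMBER(K): follow one branch per digit, O(h) time."""
    v = root
    for d in digits:
        v = v.branch[d]
        if v is None:
            return False
    return v.is_leaf
```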
Example for worst-case complexity • [Figure: a path from the root following digit b-1 at every level down to the leaf with key b^h - 1]
P-Fast Trie Idea • Improve SUCCESSOR(k)/PREDECESSOR(k) time by overcoming the linear search in every intermediate node
P-Fast Trie • Each internal node v has additional fields: • LOWKEY(v) – leaf node containing the smallest key descending from v • HIGHKEY(v) – leaf node containing the largest key descending from v • INNERTREE(v) – binary tree of worst-case height O(log b) representing the set of digits directly descending from v • Each leaf node points to its immediate neighbors on the left and on the right • CLOSEMATCH(K) – query returning the node with key K if it exists in the trie; otherwise returning PREDECESSOR(K) or SUCCESSOR(K)
CLOSEMATCH(k) Algorithm Intuitively • Starting from the root, look for k = k1k2…kh • If found, return it • If not, let v be the node at depth j from which there is no way down any more: kj ∉ INNERTREE(v) • Looking for kj in INNERTREE(v), find D – an existing digit in INNERTREE(v) that is either: • the least digit greater than kj • the greatest digit less than kj • If D > kj, then return LOWKEY(D's child of v); else if D < kj, then return HIGHKEY(D's child of v)
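A sketch of CLOSEMATCH under the assumptions above; a sorted Python list plus `bisect` stands in for the O(log b) INNERTREE, and the node layout (branch dict, lowkey/highkey leaf pointers) is illustrative rather than the paper's exact representation:

```python
import bisect

class PNode:
    """Illustrative p-fast-trie node."""
    def __init__(self):
        self.branch = {}       # digit -> child node
        self.innertree = []    # sorted list of digits, standing in for INNERTREE(v)
        self.lowkey = None     # LOWKEY(v): smallest leaf descending from v
        self.highkey = None    # HIGHKEY(v): largest leaf descending from v

def closematch(root: PNode, digits: list) -> PNode:
    """Leaf holding K if stored, else a leaf holding PREDECESSOR(K) or SUCCESSOR(K).
    Assumes a non-empty trie."""
    v = root
    for kj in digits:
        if kj not in v.branch:                       # kj not in INNERTREE(v): no way down
            i = bisect.bisect_left(v.innertree, kj)  # O(log b) neighbour search
            if i < len(v.innertree):                 # D = least existing digit greater than kj
                return v.branch[v.innertree[i]].lowkey
            return v.branch[v.innertree[i - 1]].highkey  # D = greatest digit less than kj
        v = v.branch[kj]
    return v                                         # exact match at the leaf level
```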
P-Fast Trie Complexities • CLOSEMATCH(K) time complexity is O(h + log b) • Other queries require O(1) addition to the CLOSEMATCH(K) complexity • Space complexity of such a trie is O(h*b*N) • Representing the input keys in base 2^√(log M) requires √(log M) digits, therefore with h = √(log M) and b = 2^√(log M) the desired complexities are achieved
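The balancing act can be written out explicitly (a restatement of the slide's parameter choice, not an additional result):

```latex
h = \sqrt{\log M},\quad b = 2^{\sqrt{\log M}}
\;\Longrightarrow\;
O(h + \log b) = O\bigl(\sqrt{\log M}\bigr),
\qquad
O(h \cdot b \cdot N) = O\bigl(N \sqrt{\log M}\, 2^{\sqrt{\log M}}\bigr).
```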
Q-Fast Trie Idea • Improve space by splitting the set of keys into subsets • How to split is the problem: • To preserve the time complexity • To decrease the space complexity
Q-Fast Trie • Let S' denote an ordered list of keys from S: 0 = K1 < K2 < K3 < … < KL < M • Define: Si = {K ∈ S | Ki ≤ K < Ki+1} for i < L; SL = {K ∈ S | K ≥ KL} • S' is a c-partition of S iff each Si has cardinality in range [c, 2c-1] • Q-Fast Trie of size (h, b, c) is a two-level structure: • Upper part: p-fast trie T of size (h, b) representing the set S', which is a c-partition of S • Lower part: forest of 2-3 trees, where the ith tree represents Si • The leaves of the 2-3 trees are connected to form an ordered list
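A sketch of producing a c-partition, assuming the keys of S are given in sorted order; the greedy blocking policy below is illustrative, any split with block sizes in [c, 2c-1] works:

```python
def c_partition(sorted_keys: list, c: int) -> list:
    """Split sorted keys into consecutive blocks whose sizes lie in [c, 2c-1]
    (assuming len(sorted_keys) >= c).  The smallest key of each block plays
    the role of K_i in the boundary list S'."""
    blocks = [sorted_keys[i:i + c] for i in range(0, len(sorted_keys), c)]
    if len(blocks) > 1 and len(blocks[-1]) < c:
        blocks[-2].extend(blocks.pop())    # fold a short tail into its neighbour (size <= 2c-1)
    return blocks

keys = [10, 17, 33, 35, 70, 77, 81, 95, 99]
print(c_partition(keys, 3))   # [[10, 17, 33], [35, 70, 77], [81, 95, 99]]
```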
Example of Q-Fast Trie • [Figure: upper p-fast trie storing the boundary keys 0, 35, 71; lower forest of 2-3 trees storing the keys 10, 17, 33 / 35, 70 / 77, 81, 95, 99 in their respective blocks, with the leaves linked in sorted order]
CLOSEMATCH(k) Algorithm Intuitively • Look for D = PREDECESSOR(k) in the upper part • O(h + log b) • Then search D's 2-3 tree for k • O(log c)
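A sketch of this two-level lookup, with sorted Python lists standing in for the upper p-fast trie and for the 2-3 trees; names and layout are illustrative:

```python
import bisect

def q_closematch(boundaries: list, blocks: list, k: int) -> int:
    """boundaries: K_1 < ... < K_L (stand-in for the p-fast trie over S');
    blocks[i]: sorted keys of S_i (stand-in for the i-th 2-3 tree).
    Returns k if it is stored, otherwise a neighbouring stored key."""
    i = max(bisect.bisect_right(boundaries, k) - 1, 0)  # PREDECESSOR(k) among the boundaries
    block = blocks[i]
    j = bisect.bisect_left(block, k)                    # O(log c) search inside the block
    return block[min(j, len(block) - 1)]                # k itself, its successor, or its predecessor
```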
Q-Fast Trie Complexities • CLOSEMATCH(K) time complexity is O(h + log b + log c) • Other queries require O(1) addition to the CLOSEMATCH(K) complexity • Space complexity is O(N + N*h*b/c) • By choosing h = √(log M), b = 2^√(log M), c = h*b, the desired complexities are achieved
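Plugging in the slide's parameter choice:

```latex
h = \sqrt{\log M},\quad b = 2^{\sqrt{\log M}},\quad c = h \cdot b
\;\Longrightarrow\;
O(h + \log b + \log c) = O\bigl(\sqrt{\log M}\bigr),
\qquad
O\!\Bigl(N + \tfrac{N\,h\,b}{c}\Bigr) = O(N).
```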
P/Q-Fast Trie Insertion/Deletion • P-fast trie • Use AVL trees for the INNERTREEs • O(h + log b) for insertion/deletion • Q-fast trie • O(h + log b + log c) for insertion/deletion • Maintenance of the c-partition property through splitting/merging of the 2-3 trees in O(log c) time
X-Fast Trie Idea • The P/Q-fast trie uses a top-down search to get to the wanted level, doing a binary search in each node on the way • Thus, the P/Q-fast trie relies on the balance between the height of the tree and the branching factor • X-fast trie idea: use binary search over the levels to find the wanted level • This requires being able to find the wanted node from its level alone, without a top-down pass • For worst-case complexity the branching factor no longer matters, since it only affects the base of the log
X-Fast Trie • Part 1: Trie of height h and branching factor 2 (representing all keys in binary) • Each node has an additional field DESCENDANT(v): • If v has only a right branch, it points to the smallest leaf descending from v (through the right branch) • If v has only a left branch, it points to the largest leaf descending from v (through the left branch) • All leaves form a doubly-linked list • A node v at height j may have descending leaves only in the range [(i-1)*2^j + 1, i*2^j] for some integer i; this i is called ID(v) • A node v at height j is called an ancestor of key K if ⌈K/2^j⌉ = ID(v) • BOTTOM(K) is the lowest ancestor of K
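A small numeric illustration of ID and the ancestor test (hypothetical helper names; keys are assumed to lie in [1, 2^h] to match the slide's range convention):

```python
def node_id(key: int, j: int) -> int:
    """ID of key's ancestor at height j: the i with (i-1)*2^j < key <= i*2^j."""
    return -(-key // (1 << j))             # exact integer ceil(key / 2^j)

def is_ancestor(key: int, j: int, i: int) -> bool:
    """Is the node at height j with ID i an ancestor of key?"""
    return node_id(key, j) == i

# Key 5 in a trie of height h = 3: its ancestors at heights 0..3 have IDs 5, 3, 2, 1.
print([node_id(5, j) for j in range(4)])   # [5, 3, 2, 1]
```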
X-Fast Trie • Part 2: h+1 Level Search Structures (LSS), one per level of the trie; the j-th LSS stores the IDs of the nodes present at level j, using perfect hashing as we have seen in the first lecture: • Linear space & constant lookup time
BOTTOM(k) Algorithm Intuitively • Binary search among the h+1 LSSs for the lowest level that still contains an ancestor of k • Searching each LSS takes O(1) • h = log M, therefore a binary search over h+1 LSSs is O(log log M)
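A sketch of BOTTOM(k) under the assumptions above, with one Python set of node IDs per level standing in for each perfect-hash LSS; it relies on the fact that if a key has an ancestor present at some height, it also has one at every greater height:

```python
def bottom(lss: list, key: int, h: int) -> tuple:
    """lss[j]: set of IDs of the trie nodes present at height j (lss[0] holds the leaf IDs).
    Returns (height, ID) of the lowest ancestor of key present in the trie."""
    def ancestor_id(j: int) -> int:
        return -(-key // (1 << j))        # ceil(key / 2^j), as in the previous sketch

    lo, hi = 0, h                         # heights 0..h; the root (height h) is always present
    while lo < hi:
        mid = (lo + hi) // 2
        if ancestor_id(mid) in lss[mid]:  # an ancestor exists this low: keep searching lower
            hi = mid
        else:                             # no ancestor at this height: search higher
            lo = mid + 1
    return lo, ancestor_id(lo)
```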
X-Fast Trie Complexities • BOTTOM(k) is O(log log M) • All queries require O(1) addition to BOTTOM(k), with the assistance of the DESCENDANT field and the doubly-linked list: • BOTTOM(K) is either the leaf K itself, or its DESCENDANT field gives PREDECESSOR(K)/SUCCESSOR(K) • Space is O(N * log M) • No more than h * N nodes in the trie (h = log M) • log M + 1 LSSs, each using O(N) space
Y-Fast Trie Idea • Apply the same partitioning technique that turned the P-fast trie into the Q-fast trie: a c-partition of all the keys into L subsets, each containing [c, 2c-1] keys • Upper part: X-fast trie representing S' • Lower part: forest of balanced binary trees of height O(log c)
Y-Fast Trie Complexities • Upper part can be searched within O(log log M) time and occupies no more than O((N/c) * log M) space • Each binary tree can be searched within O(log c) and they all together occupy O(N) space • Choosing c=log M: O(N) space; O(log log M) time
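Spelled out with the slide's choice c = log M:

```latex
c = \log M
\;\Longrightarrow\;
\text{space: } O\!\Bigl(\tfrac{N}{c}\log M + N\Bigr) = O(N),
\qquad
\text{time: } O(\log\log M + \log c) = O(\log\log M).
```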
X/Y-Fast Trie Insertion/Deletion • The perfect-hashing LSSs offer no controlled worst-case time bounds for dynamic operations • At least at the time the article was presented • Therefore, X/Y-fast tries inherit this limitation