Adaptive Radix Tree - Sujay Gandham
Problem • Today’s main memory capacities are large enough to fit the whole database into RAM. • Index structure performance therefore becomes a critical bottleneck. • Traditional index structures were not designed for modern hardware and do not utilize CPU caches effectively.
Shortcomings of current index structures • T-trees [2] were proposed in 1986, more than 25 years before ART. • They do not account for modern changes in processor architecture. • The diverging speeds of CPU and main memory, together with growing CPU cache sizes, invalidate their assumption of uniform memory access time.
Shortcomings of current index structures • B+-trees [3], though more cache-friendly, have expensive update operations. • FAST [4] and k-ary [5] search trees do not support incremental updates. • Hash tables offer fast lookups but support only point queries, and they need expensive reorganization (rehashing) as they grow.
Radix Trees • Height depends on the key length, not on the number of elements in the tree. • No rebalancing is required. • Keys are stored in lexicographic order. • The path from the root to a leaf implicitly represents the key of that leaf.
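The properties above can be seen in a minimal, uncompressed radix tree with a span of 8 bits, where every inner node is a 256-slot array indexed by one key byte. This is a hypothetical sketch for illustration (the class names are invented, and a stored value of None is treated as "no key here"): the height equals the key length, insertion never rebalances, and a depth-first walk over slots 0..255 returns the keys in lexicographic order.

```python
# Minimal uncompressed radix tree with a span of 8 bits (one key byte per level).
# Hypothetical sketch; class and method names are illustrative only.

class RadixNode:
    def __init__(self):
        self.children = [None] * 256  # one slot per possible key byte
        self.value = None             # payload if a key ends here (None = no key)


class RadixTree:
    def __init__(self):
        self.root = RadixNode()

    def insert(self, key: bytes, value) -> None:
        node = self.root
        for byte in key:  # one level per key byte, so height == len(key)
            if node.children[byte] is None:
                node.children[byte] = RadixNode()
            node = node.children[byte]
        node.value = value  # the path itself encodes the key; no rebalancing

    def lookup(self, key: bytes):
        node = self.root
        for byte in key:
            node = node.children[byte]
            if node is None:
                return None
        return node.value

    def items(self, node=None, prefix=b""):
        """Yield (key, value) pairs in lexicographic key order."""
        node = self.root if node is None else node
        if node.value is not None:
            yield prefix, node.value  # a key sorts before its extensions
        for byte, child in enumerate(node.children):  # slots visited 0..255
            if child is not None:
                yield from self.items(child, prefix + bytes([byte]))


tree = RadixTree()
for word in [b"CAT", b"CAR", b"AND", b"ANY"]:
    tree.insert(word, word.lower())
print(tree.lookup(b"CAR"))               # b'car'
print([key for key, _ in tree.items()])  # [b'AND', b'ANY', b'CAR', b'CAT']
```

The keys come out sorted even though they were inserted in a different order, because the order is encoded in the slot positions rather than maintained by rotations.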
Radix Trees • Span: the number of bits (or characters) of the key examined at each node to choose the next child. • For a 32-bit key: • Span = 1 => the tree has 32 levels • Span = 4 => the tree has 8 levels • Span = 8 => the tree has 4 levels.
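The level counts above follow from a simple rule: with a k-bit key and a span of s bits, an uncompressed radix tree has ⌈k/s⌉ levels, and each inner node needs 2^s child slots. A small sketch (the function name is hypothetical) that reproduces the three cases:

```python
import math


def radix_tree_shape(key_bits: int, span: int) -> tuple[int, int]:
    """Return (levels, child slots per node) for an uncompressed radix tree."""
    levels = math.ceil(key_bits / span)  # each level consumes `span` key bits
    fanout = 2 ** span                   # one child slot per possible span value
    return levels, fanout


for span in (1, 4, 8):
    levels, fanout = radix_tree_shape(32, span)
    print(f"span={span}: {levels} levels, {fanout} child slots per node")
```

For a 32-bit key this prints 32 levels with 2 slots, 8 levels with 16 slots, and 4 levels with 256 slots per node, which already hints at the space trade-off of a larger span.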
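Radix Tree vs BST • Lookup in a radix tree takes O(k) time, where k is the key length. • Lookup in a perfect BST takes O(k log n) time: it visits log n nodes, and each key comparison can cost O(k). • With a sufficiently large span, radix trees are shorter than comparison-based trees and perform better than these traditional data structures.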
Radix Trees • Greater span => smaller height • Greater span also => more child slots per node (2^span), many of them null => greater space consumption.
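The space side of this trade-off can be measured by counting how many child slots an uncompressed radix tree allocates and how many of them are actually used. The sketch below is an illustration under stated assumptions (fixed-length 32-bit integer keys, a span that divides the key length, leaves not counted); the function name is hypothetical. It uses two keys that differ in every bit, a worst case for sparseness.

```python
def slot_usage(keys, key_bits=32, span=8):
    """Return (allocated, non_null) child slots of an uncompressed radix tree.

    Sketch assumption: every key is key_bits long and span divides key_bits.
    """
    assert key_bits % span == 0, "sketch assumes the span divides the key length"
    allocated = non_null = 0
    for level in range(key_bits // span):
        # Inner nodes on this level correspond to distinct prefixes of level*span bits.
        nodes = {k >> (key_bits - level * span) for k in keys}
        # Non-null slots correspond to distinct prefixes one span longer.
        children = {k >> (key_bits - (level + 1) * span) for k in keys}
        allocated += len(nodes) * 2 ** span
        non_null += len(children)
    return allocated, non_null


keys = [0x00000000, 0xFFFFFFFF]  # two keys that differ in every bit
for span in (1, 4, 8):
    allocated, non_null = slot_usage(keys, 32, span)
    print(f"span={span}: {allocated} slots allocated, {non_null} non-null")
```

With span 1 the tree allocates 126 slots (64 used) but has 32 levels; with span 8 it has only 4 levels but allocates 1,792 slots, of which 1,784 are null.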
Adaptive Radix Trees • Reduce both the space consumption and the height of radix trees. • Result in fast lookups and efficient insertions, deletions and updates. • Support range scans and prefix lookups because the keys are kept in sorted order.
Adaptive Nodes • Nodes of different sizes are used, depending on the number of non-null children. • This allows a large span without excessive space consumption. • A small, fixed number of node types (four) avoids expensive resizing of nodes after every update.
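One way to see why four node types keep resizing rare: if a node always uses the smallest type that fits its children, adding children one at a time to a node forces only three type changes on the way from 1 to 256 children. This is a simplified, hypothetical policy for illustration; actual ART implementations may use different thresholds when shrinking nodes.

```python
# Hypothetical sketch: ART's four node types and their maximum number of children.
NODE_CAPACITIES = {"Node4": 4, "Node16": 16, "Node48": 48, "Node256": 256}


def smallest_node_type(num_children: int) -> str:
    """Pick the smallest node type that can hold `num_children` children."""
    for name, capacity in NODE_CAPACITIES.items():  # dicts keep insertion order
        if num_children <= capacity:
            return name
    raise ValueError("a node has at most 256 children (one per key byte)")


# Add children one at a time and count how often the node type must change.
resizes = 0
current = smallest_node_type(1)
for count in range(2, 257):
    needed = smallest_node_type(count)
    if needed != current:
        resizes += 1  # the node is copied into the next larger type
        current = needed
print(resizes)  # 3: at 5, 17 and 49 children
```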
Radix Tree and ART • [Figure: a conventional radix tree compared with an ART]
Node 4 • Stores up to 4 children • Array of length 4 for the 1-byte keys • Array of length 4 for the child pointers, stored at the same positions as their keys.
Node 16 • Same layout as Node 4, but stores up to 16 keys and 16 child pointers.
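Node 4 and Node 16 share one layout: parallel arrays of sorted 1-byte keys and child pointers. The sketch below is a hypothetical Python rendering (the class name is invented) that searches a Node 4 with a simple loop and a Node 16 with binary search, as the Search slide describes. The ART paper also notes that the 16 keys can be compared in parallel with SIMD instructions, which this sketch does not attempt.

```python
import bisect


class SortedKeyNode:
    """Node4/Node16 layout: parallel arrays of 1-byte keys and child pointers.

    Hypothetical sketch: keys stay sorted so that keys[i] belongs to children[i].
    """

    def __init__(self, capacity: int):
        self.capacity = capacity  # 4 for Node4, 16 for Node16
        self.keys = bytearray()   # up to `capacity` one-byte keys, sorted
        self.children = []        # children[i] is the child for keys[i]

    def add_child(self, byte: int, child) -> None:
        if len(self.keys) == self.capacity:
            raise OverflowError("node is full; ART would grow it to the next type")
        pos = bisect.bisect_left(self.keys, byte)  # keep keys sorted
        self.keys.insert(pos, byte)
        self.children.insert(pos, child)

    def find_child(self, byte: int):
        if self.capacity <= 4:
            # Node4: a simple loop over at most 4 keys.
            for i, k in enumerate(self.keys):
                if k == byte:
                    return self.children[i]
            return None
        # Node16: binary search over the sorted keys.
        pos = bisect.bisect_left(self.keys, byte)
        if pos < len(self.keys) and self.keys[pos] == byte:
            return self.children[pos]
        return None


node4 = SortedKeyNode(4)
for b, name in [(0x43, "C"), (0x41, "A"), (0x42, "B")]:
    node4.add_child(b, name)
print(list(node4.keys), node4.children)  # [65, 66, 67] ['A', 'B', 'C']
print(node4.find_child(0x42))            # B
```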
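Node 48 • With more entries, searching a key array becomes expensive. • Instead, a 256-element child index array is indexed directly by the key byte; each entry gives the position of the child in an array of 48 child pointers. • This indirection saves space, because an index needs only 1 byte instead of a full pointer (8 bytes on a 64-bit system).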
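The Node 48 indirection can be sketched as a 256-byte index array plus 48 pointer slots. With 8-byte pointers this needs 256 + 48 × 8 = 640 bytes of payload, compared with 256 × 8 = 2,048 bytes for a plain array of 256 pointers. The class below is a hypothetical Python rendering; the EMPTY sentinel and the "no deletions" slot allocation are simplifying assumptions.

```python
class Node48:
    """Sketch of a Node48 layout (hypothetical Python rendering).

    child_index has 256 one-byte entries indexed directly by the key byte;
    EMPTY marks an absent child, any other value is a slot in `children`.
    """

    EMPTY = 0xFF  # assumption: a sentinel outside the slot range 0..47

    def __init__(self):
        self.child_index = bytearray([self.EMPTY] * 256)  # 256 x 1 byte
        self.children = [None] * 48                       # 48 child pointers
        self.count = 0

    def add_child(self, byte: int, child) -> None:
        # Sketch assumption: `byte` is not already present and nothing is deleted,
        # so slots are filled from the left.
        if self.count == 48:
            raise OverflowError("node is full; ART would grow it to a Node256")
        slot = self.count
        self.children[slot] = child
        self.child_index[byte] = slot
        self.count += 1

    def find_child(self, byte: int):
        slot = self.child_index[byte]  # first access: key byte -> slot
        if slot == self.EMPTY:
            return None
        return self.children[slot]     # second access: slot -> child pointer


node = Node48()
node.add_child(ord("z"), "leaf-z")
node.add_child(ord("a"), "leaf-a")
print(node.find_child(ord("a")))  # leaf-a
print(node.find_child(ord("q")))  # None
```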
Node 256 • An array of 256 child pointers, indexed directly by the key byte • In front of every node (of any type) is a header that stores the node type, the number of children and the compressed path.
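A Node 256 needs no search at all, and the header described above travels with every node. In this hypothetical sketch (field and class names are invented), the header is a small dataclass holding the node type, the child count and the compressed path; with 8-byte pointers the child array alone takes 256 × 8 = 2,048 bytes.

```python
from dataclasses import dataclass, field


@dataclass
class Header:
    """Per-node header with the fields listed on the slide (names are hypothetical)."""
    node_type: str         # "Node4", "Node16", "Node48" or "Node256"
    num_children: int = 0
    prefix: bytes = b""    # compressed path


@dataclass
class Node256:
    header: Header = field(default_factory=lambda: Header("Node256"))
    children: list = field(default_factory=lambda: [None] * 256)

    def add_child(self, byte: int, child) -> None:
        if self.children[byte] is None:
            self.header.num_children += 1  # count only newly occupied slots
        self.children[byte] = child

    def find_child(self, byte: int):
        return self.children[byte]  # a single array access, no search


node = Node256()
node.header.prefix = b"ab"  # compressed path kept in the header
node.add_child(0x10, "x")
node.add_child(0x10, "y")   # replacing a child does not change the count
print(node.header.num_children, node.find_child(0x10))  # 1 y
```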
Path Compression • Removes chains of inner nodes that have only a single child (one-way nodes). • Two types: pessimistic and optimistic • Pessimistic: every inner node stores a variable-length partial key vector. • It holds the key bytes of the preceding one-way nodes that were removed. • During lookup, it is compared with the search key before proceeding to the next child.
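The pessimistic check can be sketched as a byte-by-byte comparison of the stored partial key against the search key at the current depth. The function name and the example keys are hypothetical: for the keys b"ROMANE" and b"ROMANUS", the one-way nodes below b"R" are removed and the inner node stores the partial key b"OMAN".

```python
def check_prefix_pessimistic(partial_key: bytes, key: bytes, depth: int) -> int:
    """Return how many bytes of the node's stored partial key match key[depth:].

    A result smaller than len(partial_key) means the search key is not in the tree.
    """
    matched = 0
    for stored_byte in partial_key:
        if depth + matched >= len(key) or key[depth + matched] != stored_byte:
            break
        matched += 1
    return matched


# Below b"R", the one-way nodes for b"O", b"M", b"A", b"N" were removed,
# so the inner node stores the partial key b"OMAN".
partial_key = b"OMAN"
print(check_prefix_pessimistic(partial_key, b"ROMANUS", 1))  # 4: match, continue at depth 5
print(check_prefix_pessimistic(partial_key, b"ROMULUS", 1))  # 2: mismatch, stop here
```

Because every removed byte is checked here, a mismatch is detected at the inner node and the lookup can stop early.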
Path Compression • Optimistic: only the count of preceding one-way nodes is stored. • Lookup skips this number of bytes in the key without comparing them. • Because the skipped bytes are never checked, the search key must eventually be compared with the key stored in the leaf to detect a wrong turn.
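In the optimistic variant an inner node stores only how many bytes to skip, so a lookup can follow a wrong path and only notice it at the leaf. The sketch below assumes leaves store their full key; the `Leaf`/`Inner` classes and the dict of children are hypothetical stand-ins for real ART nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    key: bytes  # full key, needed for the final comparison
    value: object


@dataclass
class Inner:
    skip: int  # optimistic: only the number of removed bytes is stored
    children: dict = field(default_factory=dict)  # key byte -> Inner or Leaf


def lookup(node, key: bytes, depth: int = 0):
    while isinstance(node, Inner):
        depth += node.skip  # skip the compressed bytes without comparing them
        if depth >= len(key):
            return None
        node = node.children.get(key[depth])
        depth += 1
    # The skipped bytes were never checked, so verify the full key at the leaf.
    if node is not None and node.key == key:
        return node.value
    return None


root = Inner(skip=0)
romans = Inner(skip=4)  # stands for the removed bytes b"OMAN" but stores only their count
romans.children[ord("E")] = Leaf(b"ROMANE", 1)
romans.children[ord("U")] = Leaf(b"ROMANUS", 2)
root.children[ord("R")] = romans

print(lookup(root, b"ROMANUS"))  # 2
print(lookup(root, b"ROMULUS"))  # None: the leaf comparison catches the wrong turn
```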
Hybrid of Pessimistic and Optimistic • Pessimistic compression uses more space for partial key storage, while optimistic compression requires an extra check at the leaf. • ART uses a hybrid: every node reserves 8 bytes for the partial key and uses the pessimistic approach; if the compressed path is longer, it dynamically switches to the optimistic approach.
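The hybrid check can be sketched by storing the full path length but at most 8 path bytes: the stored bytes are compared pessimistically, and any remaining bytes are skipped optimistically, which then requires the leaf comparison. Function names are hypothetical, and the 8-byte limit is taken from the slide.

```python
MAX_STORED_PREFIX = 8  # bytes reserved per node for the partial key (per the slide)


def make_prefix(compressed_path: bytes) -> tuple[int, bytes]:
    """Keep the full path length but store at most 8 of its bytes."""
    return len(compressed_path), compressed_path[:MAX_STORED_PREFIX]


def check_prefix_hybrid(prefix_len: int, stored: bytes, key: bytes, depth: int):
    """Return (matches, next_depth, leaf_check_needed)."""
    for i, stored_byte in enumerate(stored):  # pessimistic part: compare stored bytes
        if depth + i >= len(key) or key[depth + i] != stored_byte:
            return False, depth, False
    # Bytes beyond the stored ones are skipped optimistically, so the final
    # leaf must be compared with the full search key.
    return True, depth + prefix_len, prefix_len > len(stored)


print(check_prefix_hybrid(*make_prefix(b"OMAN"), b"ROMANUS", 1))  # (True, 5, False)
long_path = make_prefix(b"ABCDEFGHIJK")                            # 11 bytes, 8 stored
print(check_prefix_hybrid(*long_path, b"xABCDEFGHZZZz", 1))        # (True, 12, True)
```

In the second call the bytes b"ZZZ" differ from the unstored part b"IJK" of the path, yet the check passes; only the leaf comparison can reject this key.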
Search • Node 4: loop over its keys (it holds 2-4) to find a match • Node 16: since the keys are sorted, binary search can be used to find the key • Node 48: the key byte first selects an entry in the child index array, which gives the position of the child pointer • Node 256: lookup is a single array access
Insertion and Deletion • The tree is traversed as in a lookup until the position for the new leaf is found. • With lazy expansion (inner nodes are created only when they are needed to distinguish at least two leaves), the traversal may end at an existing leaf. In that case a new inner node is created, and both the existing and the new leaf are stored under it. • With path compression, if the new key differs from the node's compressed path, a new inner node is added above the current node. • Deletion removes the leaf and shrinks the tree accordingly.
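Both insertion cases above can be sketched in a few dozen lines. This is a simplified illustration, not ART's implementation: one generic `Inner` node with a dict of children stands in for the four node types (so node growth is omitted), compressed paths are stored in full (pessimistic), deletion is not shown, and all keys are assumed to have the same length so that no key is a prefix of another. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    key: bytes
    value: object


@dataclass
class Inner:
    prefix: bytes                                 # compressed path, stored in full
    children: dict = field(default_factory=dict)  # stand-in for Node4..Node256


def common_prefix_len(a: bytes, b: bytes) -> int:
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def insert(node, key: bytes, value, depth: int = 0):
    """Insert key below node and return the (possibly new) subtree root.

    Assumes all keys have the same length, so no key is a prefix of another.
    """
    if node is None:
        return Leaf(key, value)
    if isinstance(node, Leaf):
        if node.key == key:
            node.value = value  # key already present: update in place
            return node
        # Lazy expansion: the traversal ended at a leaf, so create an inner node
        # holding the old and the new leaf; shared bytes become its compressed path.
        split = depth + common_prefix_len(node.key[depth:], key[depth:])
        inner = Inner(prefix=key[depth:split])
        inner.children[node.key[split]] = node
        inner.children[key[split]] = Leaf(key, value)
        return inner
    matched = common_prefix_len(node.prefix, key[depth:])
    if matched < len(node.prefix):
        # The key leaves the compressed path: add a new inner node above this one.
        parent = Inner(prefix=node.prefix[:matched])
        parent.children[node.prefix[matched]] = node
        node.prefix = node.prefix[matched + 1:]  # byte `matched` is now the child key
        parent.children[key[depth + matched]] = Leaf(key, value)
        return parent
    depth += len(node.prefix)
    node.children[key[depth]] = insert(node.children.get(key[depth]), key, value, depth + 1)
    return node


def search(node, key: bytes, depth: int = 0):
    while isinstance(node, Inner):
        if key[depth:depth + len(node.prefix)] != node.prefix:
            return None  # pessimistic check of the compressed path
        depth += len(node.prefix)
        if depth >= len(key):
            return None
        node = node.children.get(key[depth])
        depth += 1
    return node.value if node is not None and node.key == key else None


root = None
for word in [b"ROMA", b"ROME", b"RUBY", b"RULE"]:
    root = insert(root, word, word.lower())
print(search(root, b"ROME"), search(root, b"RUNE"))  # b'rome' None
```

Inserting b"ROME" expands the single leaf b"ROMA" into an inner node with path b"ROM"; inserting b"RUBY" then splits that path, leaving a root with path b"R" above a node whose remaining path is b"M".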
Conclusion • ART is a fast and space-efficient index structure for main-memory database systems. • Path compression and lazy expansion reduce the tree height, which leads to excellent performance. • The high space consumption of radix trees is overcome by dynamically choosing inner nodes of different sizes. • Although hash tables offer faster point lookups, they place keys at random positions, so they cannot support range scans, which makes them unfit as a general-purpose index.
References • [1] V. Leis, A. Kemper, and T. Neumann, “The adaptive radix tree: ARTful indexing for main-memory databases,” in ICDE, 2013. • [2] T. J. Lehman and M. J. Carey, “A study of index structures for main memory database management systems,” in VLDB, 1986. • [3] R. Bayer and E. McCreight, “Organization and maintenance of large ordered indices,” in SIGFIDET, 1970. • [4] C. Kim, J. Chhugani, N. Satish, E. Sedlar, A. D. Nguyen, T. Kaldewey, V. W. Lee, S. A. Brandt, and P. Dubey, “FAST: fast architecture sensitive tree search on modern CPUs and GPUs,” in SIGMOD, 2010. • [5] B. Schlegel, R. Gemulla, and W. Lehner, “k-ary search on modern processors,” in DaMoN workshop, 2009.