Topics 10: Cache Conscious Indexes
• As main memory gets cheaper, it becomes affordable to build computers with large memories.
• In future databases, all data except a few large tables will be memory-resident.
• Therefore, it is important to build efficient main-memory indexes.
• These indexes should take the memory hierarchy and the memory-access bottleneck into account.
Advanced Database Technologies
Characteristics of cache conscious indexes
• They should cluster data according to the access pattern: data that are likely to be accessed together (or in sequence) should be close in memory.
• They should compress information, so that only useful data are fetched into the cache. This means that the index should hold only comparison keys and reference pointers to the searched data.
• They should not be much larger than the indexed information.
Why is binary search poor?
• If the searched array is large, the number of cache misses is determined by the number of search comparisons: O(log_2 n).
• This is because only one search key is used from each cache line fetched into the cache.
[Figure: main-memory array of sorted keys (... 539 545 568 579 582 589 595 602 609 612 617 623 625 ...), the 128-byte cache line fetched into the cache, and the single key used in the current comparison.]
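The miss count can be checked with a small simulation. The sketch below is illustrative only: it assumes 4-byte keys (so a 128-byte cache line holds 32 of them), an array aligned to a line boundary, and a cold cache. The function name and the key values are made up for the example. It records which cache lines a binary search probes.

```python
KEY_BYTES = 4                      # assumed key size
LINE_BYTES = 128                   # cache-line size taken from the slide
KEYS_PER_LINE = LINE_BYTES // KEY_BYTES


def binary_search_lines(keys, target):
    """Return (index of target or -1, number of distinct cache lines probed)."""
    lo, hi = 0, len(keys) - 1
    lines = set()
    while lo <= hi:
        mid = (lo + hi) // 2
        lines.add(mid // KEYS_PER_LINE)    # cache line that holds keys[mid]
        if keys[mid] == target:
            return mid, len(lines)
        if keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, len(lines)


# One million sorted even keys; a range object supports len() and indexing.
keys = range(0, 2_000_000, 2)
idx, touched = binary_search_lines(keys, 1_234_566)
print("index:", idx, "distinct cache lines probed:", touched)
```

Under these assumptions the early probes are far apart, so each lands on a different cache line. Only the last few probes, once the search range has shrunk to about one line, share a line.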
Enhanced Main Memory B+-trees
• Although the B+-tree is a secondary-memory index, it can be used for search in main memory.
• The node size of the tree is set to a multiple of the cache-line size (e.g., 1 node = 2 cache lines).
• The number of cache misses is then proportional to the number of tree nodes accessed during search: O(log_F n), where F is the fanout of the tree.
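As a rough worked example of the O(log_F n) estimate, the sketch below counts tree levels with integer arithmetic. The fanout of 16 and the one-million-key index are assumed values chosen for illustration, not figures from the slides, and the miss count is an upper bound under a cold cache.

```python
def tree_levels(n_keys, fanout):
    """Smallest h such that fanout**h >= n_keys (integer form of ceil(log_F n))."""
    levels, reach = 0, 1
    while reach < n_keys:
        reach *= fanout
        levels += 1
    return levels


def worst_case_misses(n_keys, fanout, lines_per_node=1):
    """Upper bound on misses: every cache line of every visited node is missed."""
    return tree_levels(n_keys, fanout) * lines_per_node


n = 1_000_000
print("binary search comparisons:", tree_levels(n, 2))
print("tree node visits, assumed fanout 16:", worst_case_misses(n, 16))
```

With these assumed numbers a search visits 5 nodes, against 20 comparisons for plain binary search over the same keys.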
Problems of Main Memory B+-trees
• Nodes contain as many pointers as key values.
• Many key values can be compared in a node during search, but only one pointer is followed, so most of the pointer space fetched into the cache goes unused.
• Binary search within a node can be expensive, since it requires many comparisons.
The Cache Sensitive Search (CSS) tree
• Same as the B+-tree, but does not store pointers.
• The children of each node are stored sequentially, so child addresses are computed from positional memory offsets instead of being stored as pointers.
[Figure: the sorted keys 4 8 9 12 13 17 19 21 23 24 27 29 31 34 38 indexed two ways. CSS-tree: a single directory node holding 12 19 24 31. B+-tree: root 24 with child nodes 12 19 and 31.]
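The pointer-free layout can be sketched as follows. This is a simplified illustration, not the published CSS-tree construction: it assumes distinct keys and pads the leaf level with a sentinel so that the directory is a full tree, and the class name, `m`, and `leaf_size` are hypothetical. Nodes are numbered breadth-first, so the children of node `v` are nodes `v*(m+1)+1` through `v*(m+1)+(m+1)`, and no pointer is stored.

```python
from bisect import bisect_left, bisect_right

INF = float("inf")   # sentinel used for padding, larger than any real key


class CSSTree:
    def __init__(self, sorted_keys, m=4, leaf_size=3):
        self.m, self.leaf_size, self.n = m, leaf_size, len(sorted_keys)
        fanout = m + 1
        n_blocks = -(-self.n // leaf_size)            # ceiling division
        # Simplifying assumption: pad the leaf level to a power of the fanout.
        width = fanout
        while width < n_blocks:
            width *= fanout
        self.n_internal = (width - 1) // m            # 1 + fanout + fanout**2 + ...
        self.leaves = list(sorted_keys) + [INF] * (width * leaf_size - self.n)
        total = self.n_internal + width
        mins = [INF] * total                          # smallest key under each node
        self.directory = [INF] * (self.n_internal * m)
        for v in range(total - 1, -1, -1):            # children before parents
            if v >= self.n_internal:                  # leaf block
                mins[v] = self.leaves[(v - self.n_internal) * leaf_size]
            else:
                first = v * fanout + 1                # first child, by arithmetic
                mins[v] = mins[first]
                for j in range(m):                    # key j separates child j and j+1
                    self.directory[v * m + j] = mins[first + 1 + j]

    def search(self, target):
        """Return the position of target in the sorted keys, or -1."""
        v, m = 0, self.m
        while v < self.n_internal:
            lo = v * m
            j = bisect_right(self.directory, target, lo, lo + m) - lo
            v = v * (m + 1) + 1 + j                   # child number is computed
        start = (v - self.n_internal) * self.leaf_size
        end = start + self.leaf_size
        i = bisect_left(self.leaves, target, start, end)
        return i if i < end and self.leaves[i] == target else -1


tree = CSSTree([4, 8, 9, 12, 13, 17, 19, 21, 23, 24, 27, 29, 31, 34, 38])
print(tree.directory)      # directory keys for the slide's example data
print(tree.search(19))     # position of key 19 in the sorted array
```

The padding wastes space when the number of leaf blocks is not a power of the fanout; it is used here only to keep the offset arithmetic short.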
The Cache Sensitive Search (CSS) tree (cont’d)
• The CSS-tree is suitable only for static data.
• The capacity of each node is double that of a B+-tree node, because the space freed by dropping pointers holds additional keys.
• Thus the height (and search cost) of the tree is reduced.
• Another trick used by the CSS-tree is hard-coding binary search with if-else statements.
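The doubling claim follows from simple arithmetic when keys and pointers have the same size. The sketch below assumes a 128-byte node with 4-byte keys and 4-byte pointers (assumed sizes, not stated on the slide) and ten million indexed keys. The helper names are hypothetical, and the fanout is approximated by the number of keys per node.

```python
def keys_per_node(node_bytes, key_bytes, pointer_bytes=0):
    """Keys that fit in a node when each key is stored next to one pointer."""
    return node_bytes // (key_bytes + pointer_bytes)


def height(n_keys, fanout):
    """Levels needed so that fanout**levels >= n_keys."""
    levels, reach = 0, 1
    while reach < n_keys:
        reach *= fanout
        levels += 1
    return levels


NODE, KEY, PTR = 128, 4, 4                       # assumed sizes in bytes
bplus_keys = keys_per_node(NODE, KEY, PTR)       # keys share the node with pointers
css_keys = keys_per_node(NODE, KEY)              # pointer space reused for keys

n = 10_000_000
print("keys per node:", bplus_keys, "vs", css_keys)
print("tree height:", height(n, bplus_keys), "vs", height(n, css_keys))
```

With other key or pointer sizes the ratio is no longer exactly two, so the figures here hold only for the assumed sizes.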
Hard-coding binary search
• Normal binary search, with mid = (start + end) / 2 and end excluded from the range:
    Binsearch(key, C, start, end) =
        Binsearch(key, C, mid + 1, end)   if key > C[mid]
        Binsearch(key, C, start, mid)     if key < C[mid]
        follow C[mid]                     if key = C[mid]
• Augmented binary search, with the comparisons hard-coded as nested if-else statements:
    if (key < C[mid]) then
        if (key < C[mid/2]) then ...
        else if (key > C[mid/2]) then ...
        else follow C[mid/2]
    else if (key > C[mid]) then
        if (key < C[3*mid/2]) then ...
        else if (key > C[3*mid/2]) then ...
        else follow C[3*mid/2]
    else follow C[mid]
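A minimal sketch of the idea for a node of exactly seven keys follows. It is a two-way variant: it returns the child slot rather than testing equality at each step, unlike the three-way pseudocode on the slide, and the function name is hypothetical. In Python the unrolling brings no speed benefit; the point is only to show the shape of code that a compiled language could execute without a loop or midpoint arithmetic.

```python
from bisect import bisect_right


def find_slot_7(c, key):
    """Return how many of the 7 sorted keys in c are <= key, using no loop."""
    if key < c[3]:
        if key < c[1]:
            return 0 if key < c[0] else 1
        else:
            return 2 if key < c[2] else 3
    else:
        if key < c[5]:
            return 4 if key < c[4] else 5
        else:
            return 6 if key < c[6] else 7


node = [10, 20, 30, 40, 50, 60, 70]
# Cross-check against the library binary search for every key in a small range.
for k in range(0, 81):
    assert find_slot_7(node, k) == bisect_right(node, k)
print(find_slot_7(node, 35))   # slot of the child to follow for key 35
```

Every call performs exactly three comparisons, and the indices compared are fixed constants in the code rather than values computed at run time.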
Presentation material
• A dynamic version of the CSS-tree: the cache conscious B+-tree
• An improved version of the cache conscious B+-tree (optional reading)
• Cache conscious R-trees