Advanced Data Structures, NTUA Spring 2007: B+-trees and External Memory Hashing
Model of Computation
• Data stored on disk(s)
• Minimum transfer unit: a page (or block) = b bytes or B records
• N records -> N/B = n pages
• I/O complexity: measured in number of pages
[figure: CPU - Memory - Disk hierarchy]
I/O complexity
• An ideal index has space O(N/B), update overhead O(1) or O(log_B(N/B)), and search complexity O(a/B) or O(log_B(N/B) + a/B), where a is the number of records in the answer.
• But sometimes CPU performance is also important: minimize cache misses -> don't waste CPU cycles.
B+-tree (http://en.wikipedia.org/wiki/B-tree)
• Records must be ordered over an attribute: SSN, Name, etc.
• Queries: exact match and range queries over the indexed attribute: "find the name of the student with ID=087-34-7892" or "find all students with gpa between 3.00 and 3.5"
B+-tree: properties [BM72]
• Insert/delete at log_F(N/B) cost (F = fanout); the tree stays height-balanced.
• Minimum 50% occupancy (except for root). Each node contains d <= m <= 2d entries.
• Two types of nodes: index (non-leaf) nodes and (leaf) data nodes; each node is stored in 1 page (disk-based method).
Example: [B+-tree figure] root with key 100; index nodes with keys 30 and 120, 150, 180; leaves 3, 5, 11 | 30, 35 | 100, 101, 110 | 120, 130 | 150, 156, 179 | 180, 200.
Index node: [figure] keys 57, 81, 95 separate four pointers to subtrees holding keys k < 57; 57 <= k < 81; 81 <= k < 95; 95 <= k.
Data node: [figure] leaf holding keys 57, 81, 95, with a pointer from its parent, pointers to the records with keys 57, 81 and 95, and a sequence pointer to the next leaf.
B+tree rules (tree of order n)
(1) All leaves at the same lowest level (balanced tree)
(2) Pointers in leaves point to records, except for the "sequence pointer"
(3) Number of pointers/keys for a B+tree:

                      Max ptrs  Max keys  Min ptrs   Min keys
  Non-leaf (non-root)  n         n-1       n/2        n/2 - 1
  Leaf (non-root)      n         n-1       (n-1)/2    (n-1)/2
  Root                 n         n-1       2          1
Insert into B+tree
(a) simple case: space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root
(a) Insert key = 32 (n=4): [figure] leaf 30, 31 has free space, so 32 is simply added, giving leaf 30, 31, 32.
(b) Insert key = 7 (n=4): [figure] leaf 3, 5, 11 overflows; it splits into 3, 5 and 7, 11, and the middle key 7 is copied up into the parent.
(c) Insert key = 160 (n=4): [figure] leaf 150, 156, 179 splits into 150, 156 and 160, 179; copying 160 up overflows the index node 120, 150, 180, which in turn splits.
(d) New root, insert 45 (n=4): [figure] leaf 40 becomes 40, 45; the index node 10, 20, 30 overflows when 40 is copied up and splits, pushing 30 into a new root: the tree grows one level.
Insertion
• Find correct leaf L.
• Put data entry onto L.
  • If L has enough space, done!
  • Else, must split L (into L and a new node L2)
    • Redistribute entries evenly, copy up middle key.
    • Insert index entry pointing to L2 into parent of L.
• This can happen recursively
  • To split an index node, redistribute entries evenly, but push up middle key. (Contrast with leaf splits.)
• Splits "grow" the tree; a root split increases height.
  • Tree growth: gets wider or one level taller at top.
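The copy-up vs. push-up contrast can be sketched in a few lines of Python; a minimal illustration assuming keys live in sorted lists (the names split_leaf/split_index are hypothetical, not from the slides):

```python
def split_leaf(keys):
    """Leaf split: the middle key is COPIED up. It becomes the
    separator in the parent but also stays in the new right leaf,
    so every key remains at leaf level."""
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]          # separator = first key of right

def split_index(keys):
    """Index split: the middle key is PUSHED up. It moves into the
    parent and appears in neither half, since index nodes only
    route searches and need not keep every key."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid + 1:], keys[mid]

# Inserting 7 into the full leaf [3, 5, 11] (n = 4, as in case (b)):
left, right, sep = split_leaf(sorted([3, 5, 11] + [7]))
# left = [3, 5], right = [7, 11], sep = 7 is copied into the parent
```

The same middle position behaves differently in the two node types: in a leaf it is duplicated, in an index node it is moved.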
Deletion from B+tree
(a) Simple case (no example)
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf
(a) Simple case: delete 30 (n=5): [figure] 30 is removed from leaf 10, 20, 30, 40, 50, which remains at least half full; no restructuring is needed.
(b) Coalesce with sibling: delete 50 (n=5): [figure] the leaf holding 50 underflows and is merged with the sibling holding 40; the separator in the parent is removed.
(c) Redistribute keys: delete 50 (n=5): [figure] the underflowing leaf borrows key 35 from its sibling 10, 20, 30, 35, and the parent separator is updated to 35.
(d) Non-leaf coalesce: delete 37 (n=5): [figure] the deletion propagates upward: leaves 25, 26 and 30, 37 are affected, an index node underflows and coalesces with its neighbor, and the old root is replaced by a new root holding 25, 30; the tree loses one level.
Deletion
• Start at root, find leaf L where entry belongs.
• Remove the entry.
  • If L is at least half-full, done!
  • If L has only d-1 entries,
    • Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).
    • If re-distribution fails, merge L and sibling.
• If a merge occurred, must delete the entry (pointing to L or sibling) from the parent of L.
• Merge could propagate to root, decreasing height.
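The redistribute-or-merge decision above can be sketched as follows; a simplified Python illustration on sorted key lists that ignores parent-separator maintenance and only borrows from a right sibling (both simplifications are assumptions, not from the slides):

```python
def handle_underflow(node, right_sibling, d):
    """node has underflowed to d-1 keys.
    Borrow one key if the sibling can spare it (stays >= d keys);
    otherwise merge: (d-1) + d = 2d-1 keys fit in a single node."""
    if len(right_sibling) > d:
        node.append(right_sibling.pop(0))   # borrow the smallest key
        return "redistribute"
    node.extend(right_sibling)              # coalesce with sibling
    right_sibling.clear()
    return "merge"

leaf, sib = [10], [30, 35, 40]
outcome = handle_underflow(leaf, sib, 2)    # sibling can spare a key
```

Redistribution is preferred because it touches only two leaves and one parent separator, while a merge deletes an entry from the parent and may propagate.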
Complexity
• Optimal method for 1-d range queries (with d = B/2):
  Tree height: O(log_d(N/d))
  Space: O(N/d)
  Updates: O(log_d(N/d))
  Query: O(log_d(N/d) + a/d)
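To see what these bounds mean concretely, a quick back-of-the-envelope check in Python (the values of N and B are illustrative, not from the slides):

```python
import math

B = 200                     # records per page (illustrative)
d = B // 2                  # minimum fanout, d = B/2 as above
N = 1_000_000_000           # one billion records (illustrative)

# height is O(log_d(N/d)): each level multiplies reach by >= d
height = math.ceil(math.log(N / d, d))
print(height)               # prints 4
```

Even at a billion records, roughly four page reads reach any record, which is why the base of the logarithm (the fanout) matters so much more than the constant factors.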
Example: range query [32, 160]: [figure: same tree as before] descend to the leaf where 32 would be (leaf 30, 35), then follow the leaf-level sequence pointers right through leaves 100, 101, 110 | 120, 130 | 150, 156, 179, stopping once a key exceeds 160.
Other issues
• Internal node architecture [Lomet01]:
  • Reduce the overhead of tree traversal.
  • Prefix compression: in index nodes, store only the prefixes that differentiate consecutive sub-trees. Fanout is increased.
  • Cache-sensitive B+-tree:
    • Place keys in a way that reduces cache faults during the binary search in each node.
    • Eliminate pointers so a cache line contains more keys for comparison.
References
[BM72] Rudolf Bayer, Edward M. McCreight: Organization and Maintenance of Large Ordered Indexes. Acta Informatica 1: 173-189 (1972)
[Lomet01] David B. Lomet: The Evolution of Effective B-tree: Page Organization and Techniques: A Personal Account. SIGMOD Record 30(3): 64-69 (2001) http://www.acm.org/sigmod/record/issues/0109/a1-lomet.pdf
[B-Y95] Ricardo A. Baeza-Yates: Fringe Analysis Revisited. ACM Comput. Surv. 27(1): 111-119 (1995)
Selection Queries
• The B+-tree is perfect, but... to answer a selection query (e.g., ssn=10) it needs to traverse a full root-to-leaf path.
• In practice, 3-4 block accesses (depending on the height of the tree and buffering).
• Any better approach? Yes! Hashing:
  • static hashing
  • dynamic hashing
Hashing
• Hash-based indexes are best for equality selections; they cannot support range searches.
• Static and dynamic hashing techniques exist; trade-offs similar to ISAM vs. B+-trees.
Static Hashing
• # primary pages fixed, allocated sequentially, never de-allocated; overflow pages if needed.
• h(k) MOD N = bucket to which the data entry with key k belongs (N = # of buckets).
[figure: key -> h -> h(key) mod N selects one of primary bucket pages 0 .. N-1, each possibly followed by a chain of overflow pages]
Static Hashing (Contd.)
• Buckets contain data entries.
• The hash fn works on the search key field of record r. Use its value MOD N to distribute values over the range 0 ... N-1.
  • h(key) = (a * key + b) usually works well.
  • a and b are constants... more later.
• Long overflow chains can develop and degrade performance.
• Extensible and Linear Hashing: dynamic techniques to fix this problem.
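A tiny sketch of such a hash function in Python; the constants a and b here are hypothetical (the slide leaves them unspecified), and overflow chains are modelled as plain lists:

```python
N = 8                       # number of primary buckets
a, b = 31, 17               # hypothetical constants

def h(key):
    """h(key) = (a*key + b) mod N maps any integer key to a bucket."""
    return (a * key + b) % N

buckets = [[] for _ in range(N)]     # chaining stands in for overflow pages
for key in [15, 23, 8, 42, 4]:
    buckets[h(key)].append(key)      # a long chain = degraded lookups
```

With a fixed N, a growing file inevitably piles entries into overflow chains, which is exactly the problem the dynamic schemes below address.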
Extensible Hashing
• Situation: a bucket (primary page) becomes full. Why not re-organize the file by doubling the # of buckets?
  • Reading and writing all pages is expensive!
• Idea: use a directory of pointers to buckets; double the # of buckets by doubling the directory, splitting just the bucket that overflowed!
  • The directory is much smaller than the file, so doubling it is much cheaper. Only one page of data entries is split. No overflow page!
  • The trick lies in how the hash function is adjusted!
Example
• Directory is an array of size 4.
• The bucket for record r has the entry whose index equals the `global depth' least significant bits of h(r):
  • If h(r) = 5 = binary 101, it is in the bucket pointed to by 01.
  • If h(r) = 7 = binary 111, it is in the bucket pointed to by 11.
• (We denote r by h(r).)
[figure: global depth 2; directory entries 00 -> Bucket A (local depth 2: 4*, 12*, 32*, 16*), 01 -> Bucket B (local depth 1: 1*, 5*, 7*, 13*), 10 -> Bucket C (local depth 2: 10*), 11 -> Bucket B]
Handling Inserts
• Find the bucket where the record belongs.
• If there's room, put it there.
• Else, if the bucket is full, split it:
  • increment local depth of original page
  • allocate new page with new local depth
  • re-distribute records from original page
  • add entry for the new page to the directory
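These steps can be sketched as a compact Python implementation; a simplified illustration in which integer keys serve directly as hash values, matching the slides' convention of denoting r by h(r):

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth              # local depth
        self.keys = []

class ExtendibleHash:
    def __init__(self, capacity=4):
        self.capacity = capacity        # records per bucket (one page)
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _index(self, key):
        # use the `global_depth' least significant bits of the key
        return key & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        b = self.directory[self._index(key)]
        if len(b.keys) < self.capacity:
            b.keys.append(key)
            return
        # full bucket: split it, doubling the directory only if needed
        if b.depth == self.global_depth:
            self.directory += self.directory     # double by copying over
            self.global_depth += 1
        b.depth += 1
        image = Bucket(b.depth)                  # the `split image'
        bit = 1 << (b.depth - 1)                 # newly significant bit
        for i, ptr in enumerate(self.directory):
            if ptr is b and (i & bit):
                self.directory[i] = image        # fix pointers to the image
        old, b.keys = b.keys, []
        for k in old + [key]:                    # re-distribute records
            self.insert(k)
```

Replaying the slides' example (inserting 32, 16, 4, 12, 1, 5, 7, 13, 10, then 21, 19, 15, then 20) ends with global depth 3 and split image holding 4*, 12*, 20*, as pictured two slides below.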
Example: insert 21 (10101), then 19 (10011) and 15 (01111): [figure] 21 lands in Bucket B (local depth 1), which overflows; B splits into B (local depth 2: 1*, 5*, 21*, 13*) and a new Bucket D (local depth 2: 7*), and directory entry 11 is redirected to D. No doubling is needed, since B's local depth was below the global depth. 19 and 15 then go to Bucket D (15*, 7*, 19*). Bucket A (16*, 4*, 12*, 32*) and Bucket C (10*) are unchanged.
Insert h(r) = 20 (10100); causes doubling: [figure] Bucket A (local depth 2: 32*, 16*, 4*, 12*) is full and its local depth equals the global depth, so the directory doubles (global depth 2 -> 3, entries 000-111). A splits into A (local depth 3: 32*, 16*) and its `split image' A2 (local depth 3: 4*, 12*, 20*); Buckets B (1*, 5*, 21*, 13*), C (10*) and D (15*, 7*, 19*) keep local depth 2.
Points to Note
• 20 = binary 10100. The last 2 bits (00) tell us r belongs in either A or A2; the last 3 bits are needed to tell which.
• Global depth of directory: max # of bits needed to tell which bucket an entry belongs to.
• Local depth of a bucket: # of bits used to determine if an entry belongs to this bucket.
• When does a bucket split cause directory doubling?
  • Before the insert, local depth of the bucket = global depth. The insert causes the local depth to become > global depth; the directory is doubled by copying it over and `fixing' the pointer to the split image page.
Linear Hashing • This is another dynamic hashing scheme, alternative to Extensible Hashing. • Motivation: Ext. Hashing uses a directory that grows by doubling… Can we do better? (smoother growth) • LH: split buckets from left to right, regardless of which one overflowed (simple, but it works!!)
Linear Hashing (Contd.)
• Directory avoided in LH by using overflow pages (chaining approach).
• Splitting proceeds in `rounds'. A round ends when all N_R initial (for round R) buckets are split.
• The current round number is Level.
• Search: to find the bucket for data entry r, compute h_Level(r):
  • If h_Level(r) is in the range `Next to N_R', r belongs here.
  • Else, r could belong to bucket h_Level(r) or bucket h_Level(r) + N_R; must apply h_Level+1(r) to find out.
• Family of hash functions h0, h1, h2, h3, ..., with h_i(k) = k mod (2^i * N_0), so that h_i+1(k) = h_i(k) or h_i+1(k) = h_i(k) + 2^i * N_0.
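The defining property of the family, that each h_i+1 refines h_i, is easy to check numerically; a Python sketch with N_0 = 4 as in the example that follows:

```python
N0 = 4                         # initial number of buckets

def h(i, k):
    """h_i(k) = k mod (2**i * N0)."""
    return k % (N0 << i)

# every key lands either in its old bucket or exactly 2**i * N0 higher
for k in range(1000):
    for i in range(4):
        assert h(i + 1, k) in (h(i, k), h(i, k) + (N0 << i))
```

This is exactly why a split needs no directory: applying the next function in the family sends each record either back to its old bucket or to one known new bucket.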
Linear Hashing: Example. Initially h(x) = x mod N (N = 4 here); assume 3 records/bucket. Insert 17: 17 mod 4 = 1. [figure: buckets 0: 4, 8 | 1: 5, 9, 13 | 2: 6 | 3: 7, 11]
Linear Hashing: Example (cont.). Inserting 17 overflows bucket 1 (5, 9, 13), so 17 goes to an overflow page of bucket 1. Split bucket 0, anyway!! [figure: buckets 0: 4, 8 | 1: 5, 9, 13 + overflow 17 | 2: 6 | 3: 7, 11]
Linear Hashing: Example (cont.). To split bucket 0, use another function h1(x): h0(x) = x mod N, h1(x) = x mod (2*N). [figure: split pointer at bucket 0; buckets 0: 4, 8 | 1: 5, 9, 13 + overflow 17 | 2: 6 | 3: 7, 11]
Linear Hashing: Example (cont.). Splitting bucket 0 with h1 sends 4 (4 mod 8 = 4) to the new bucket 4, while 8 (8 mod 8 = 0) stays. [figure: buckets 0: 8 | 1: 5, 9, 13 + overflow 17 | 2: 6 | 3: 7, 11 | 4: 4; split pointer advances to bucket 1]
Linear Hashing: Example (cont.). Resulting state: [figure: buckets 0: 8 | 1: 5, 9, 13 + overflow 17 | 2: 6 | 3: 7, 11 | 4: 4; split pointer at bucket 1]
Linear Hashing: Example (cont.). h0(x) = x mod N, h1(x) = x mod (2*N). Insert 15 and 3: both hash to bucket 3 (15 mod 4 = 3, 3 mod 4 = 3). [figure: buckets 0: 8 | 1: 5, 9, 13 + overflow 17 | 2: 6 | 3: 7, 11 | 4: 4]
Linear Hashing: Example (cont.). Bucket 3 overflows, so bucket 1 (at the split pointer) is split using h1: 9 and 17 stay, 5 and 13 move to the new bucket 5. [figure: buckets 0: 8 | 1: 9, 17 | 2: 6 | 3: 7, 11, 15 + overflow 3 | 4: 4 | 5: 5, 13; split pointer advances to bucket 2]
Linear Hashing: Search. h0(x) = x mod N (for the un-split buckets), h1(x) = x mod (2*N) (for the split ones). [figure: buckets 0: 8 | 1: 9, 17 | 2: 6 | 3: 7, 11, 15 + overflow 3 | 4: 4 | 5: 5, 13]
Linear Hashing: Search
Algorithm Search(k):
1. b = h0(k)
2. if b < split-pointer then
3.     b = h1(k)
4. read bucket b and search there
References
[Litwin80] Witold Litwin: Linear Hashing: A New Tool for File and Table Addressing. VLDB 1980: 212-223. http://www.cs.bu.edu/faculty/gkollios/ada01/Papers/linear-hashing.PDF
[B-YS-P98] Ricardo A. Baeza-Yates, Hector Soza-Pollman: Analysis of Linear Hashing Revisited. Nord. J. Comput. 5(1) (1998)