Advanced Data Structures, NTUA Spring 2007: B+-trees and External Memory Hashing
Model of Computation
• Data stored on disk(s)
• Minimum transfer unit: a page (or block) = b bytes or B records
• N records -> N/B = n pages
• I/O complexity: measured in number of pages
[figure: CPU - Memory - Disk hierarchy]
I/O complexity
• An ideal index has space O(N/B), update overhead O(1) or O(log_B(N/B)), and search complexity O(a/B) or O(log_B(N/B) + a/B), where a is the number of records in the answer.
• But sometimes CPU performance is also important: minimize cache misses -> don't waste CPU cycles.
B+-tree (http://en.wikipedia.org/wiki/B-tree)
• Records must be ordered over an attribute: SSN, Name, etc.
• Queries: exact match and range queries over the indexed attribute: "find the name of the student with ID=087-34-7892" or "find all students with gpa between 3.00 and 3.5"
B+-tree: properties [BM72]
• Insert/delete at log_F(N/B) cost (F = fanout); the tree stays height-balanced.
• Minimum 50% occupancy (except for root). Each node contains d <= m <= 2d entries.
• Two types of nodes: index (non-leaf) nodes and (leaf) data nodes; each node is stored in 1 page (disk-based method).
Example: [B+-tree figure] root with key 100; index nodes with keys 30 and 120, 150, 180; leaves 3, 5, 11 | 30, 35 | 100, 101, 110 | 120, 130 | 150, 156, 179 | 180, 200.
Index node: [figure] keys 57, 81, 95 separate four pointers to subtrees holding keys k < 57; 57 <= k < 81; 81 <= k < 95; 95 <= k.
Data node: [figure] leaf holding keys 57, 81, 95, with a pointer from its parent, pointers to the records with keys 57, 81 and 95, and a sequence pointer to the next leaf.
B+tree rules (tree of order n)
(1) All leaves at the same lowest level (balanced tree)
(2) Pointers in leaves point to records, except for the "sequence pointer"
(3) Number of pointers/keys for a B+tree:

                      Max ptrs  Max keys  Min ptrs   Min keys
  Non-leaf (non-root)  n         n-1       n/2        n/2 - 1
  Leaf (non-root)      n         n-1       (n-1)/2    (n-1)/2
  Root                 n         n-1       2          1
Insert into B+tree
(a) simple case: space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root
(a) Insert key = 32 (n=4): [figure] leaf 30, 31 has free space, so 32 is simply added, giving leaf 30, 31, 32.
(b) Insert key = 7 (n=4): [figure] leaf 3, 5, 11 overflows; it splits into 3, 5 and 7, 11, and the middle key 7 is copied up into the parent.
(c) Insert key = 160 (n=4): [figure] leaf 150, 156, 179 splits into 150, 156 and 160, 179; copying 160 up overflows the index node 120, 150, 180, which in turn splits.
(d) New root, insert 45 (n=4): [figure] leaf 40 becomes 40, 45; the index node 10, 20, 30 overflows when 40 is copied up and splits, pushing 30 into a new root: the tree grows one level.
Insertion
• Find correct leaf L.
• Put data entry onto L.
  • If L has enough space, done!
  • Else, must split L (into L and a new node L2)
    • Redistribute entries evenly, copy up middle key.
    • Insert index entry pointing to L2 into parent of L.
• This can happen recursively
  • To split an index node, redistribute entries evenly, but push up middle key. (Contrast with leaf splits.)
• Splits "grow" the tree; a root split increases height.
  • Tree growth: gets wider or one level taller at top.
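The copy-up vs. push-up contrast can be sketched in a few lines of Python; a minimal illustration assuming keys live in sorted lists (the names split_leaf/split_index are hypothetical, not from the slides):

```python
def split_leaf(keys):
    """Leaf split: the middle key is COPIED up. It becomes the
    separator in the parent but also stays in the new right leaf,
    so every key remains at leaf level."""
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]          # separator = first key of right

def split_index(keys):
    """Index split: the middle key is PUSHED up. It moves into the
    parent and appears in neither half, since index nodes only
    route searches and need not keep every key."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid + 1:], keys[mid]

# Inserting 7 into the full leaf [3, 5, 11] (n = 4, as in case (b)):
left, right, sep = split_leaf(sorted([3, 5, 11] + [7]))
# left = [3, 5], right = [7, 11], sep = 7 is copied into the parent
```

The same middle position behaves differently in the two node types: in a leaf it is duplicated, in an index node it is moved.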
Deletion from B+tree
(a) Simple case (no example)
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf
(a) Simple case: delete 30 (n=5): [figure] 30 is removed from leaf 10, 20, 30, 40, 50, which remains at least half full; no restructuring is needed.
(b) Coalesce with sibling: delete 50 (n=5): [figure] the leaf holding 50 underflows and is merged with the sibling holding 40; the separator in the parent is removed.
(c) Redistribute keys: delete 50 (n=5): [figure] the underflowing leaf borrows key 35 from its sibling 10, 20, 30, 35, and the parent separator is updated to 35.
(d) Non-leaf coalesce: delete 37 (n=5): [figure] the deletion propagates upward: leaves 25, 26 and 30, 37 are affected, an index node underflows and coalesces with its neighbor, and the old root is replaced by a new root holding 25, 30; the tree loses one level.
Deletion
• Start at root, find leaf L where entry belongs.
• Remove the entry.
  • If L is at least half-full, done!
  • If L has only d-1 entries,
    • Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).
    • If re-distribution fails, merge L and sibling.
• If a merge occurred, must delete the entry (pointing to L or sibling) from the parent of L.
• Merge could propagate to root, decreasing height.
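The redistribute-or-merge decision above can be sketched as follows; a simplified Python illustration on sorted key lists that ignores parent-separator maintenance and only borrows from a right sibling (both simplifications are assumptions, not from the slides):

```python
def handle_underflow(node, right_sibling, d):
    """node has underflowed to d-1 keys.
    Borrow one key if the sibling can spare it (stays >= d keys);
    otherwise merge: (d-1) + d = 2d-1 keys fit in a single node."""
    if len(right_sibling) > d:
        node.append(right_sibling.pop(0))   # borrow the smallest key
        return "redistribute"
    node.extend(right_sibling)              # coalesce with sibling
    right_sibling.clear()
    return "merge"

leaf, sib = [10], [30, 35, 40]
outcome = handle_underflow(leaf, sib, 2)    # sibling can spare a key
```

Redistribution is preferred because it touches only two leaves and one parent separator, while a merge deletes an entry from the parent and may propagate.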
Complexity
• Optimal method for 1-d range queries (with d = B/2):
  Tree height: O(log_d(N/d))
  Space: O(N/d)
  Updates: O(log_d(N/d))
  Query: O(log_d(N/d) + a/d)
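To see what these bounds mean concretely, a quick back-of-the-envelope check in Python (the values of N and B are illustrative, not from the slides):

```python
import math

B = 200                     # records per page (illustrative)
d = B // 2                  # minimum fanout, d = B/2 as above
N = 1_000_000_000           # one billion records (illustrative)

# height is O(log_d(N/d)): each level multiplies reach by >= d
height = math.ceil(math.log(N / d, d))
print(height)               # prints 4
```

Even at a billion records, roughly four page reads reach any record, which is why the base of the logarithm (the fanout) matters so much more than the constant factors.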
Example: range query [32, 160]: [figure: same tree as before] descend to the leaf where 32 would be (leaf 30, 35), then follow the leaf-level sequence pointers right through leaves 100, 101, 110 | 120, 130 | 150, 156, 179, stopping once a key exceeds 160.
Other issues
• Internal node architecture [Lomet01]:
  • Reduce the overhead of tree traversal.
  • Prefix compression: in index nodes, store only the prefixes that differentiate consecutive sub-trees. Fanout is increased.
  • Cache-sensitive B+-tree:
    • Place keys in a way that reduces cache faults during the binary search in each node.
    • Eliminate pointers so a cache line contains more keys for comparison.
References
[BM72] Rudolf Bayer, Edward M. McCreight: Organization and Maintenance of Large Ordered Indexes. Acta Informatica 1: 173-189 (1972)
[Lomet01] David B. Lomet: The Evolution of Effective B-tree: Page Organization and Techniques: A Personal Account. SIGMOD Record 30(3): 64-69 (2001) http://www.acm.org/sigmod/record/issues/0109/a1-lomet.pdf
[B-Y95] Ricardo A. Baeza-Yates: Fringe Analysis Revisited. ACM Comput. Surv. 27(1): 111-119 (1995)
Selection Queries
• The B+-tree is perfect, but... to answer a selection query (e.g., ssn=10) it needs to traverse a full root-to-leaf path.
• In practice, 3-4 block accesses (depending on the height of the tree and buffering).
• Any better approach? Yes! Hashing:
  • static hashing
  • dynamic hashing
Hashing
• Hash-based indexes are best for equality selections; they cannot support range searches.
• Static and dynamic hashing techniques exist; trade-offs similar to ISAM vs. B+-trees.
Static Hashing
• # primary pages fixed, allocated sequentially, never de-allocated; overflow pages if needed.
• h(k) MOD N = bucket to which the data entry with key k belongs (N = # of buckets).
[figure: key -> h -> h(key) mod N selects one of primary bucket pages 0 .. N-1, each possibly followed by a chain of overflow pages]
Static Hashing (Contd.)
• Buckets contain data entries.
• The hash fn works on the search key field of record r. Use its value MOD N to distribute values over the range 0 ... N-1.
  • h(key) = (a * key + b) usually works well.
  • a and b are constants... more later.
• Long overflow chains can develop and degrade performance.
• Extensible and Linear Hashing: dynamic techniques to fix this problem.
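A tiny sketch of such a hash function in Python; the constants a and b here are hypothetical (the slide leaves them unspecified), and overflow chains are modelled as plain lists:

```python
N = 8                       # number of primary buckets
a, b = 31, 17               # hypothetical constants

def h(key):
    """h(key) = (a*key + b) mod N maps any integer key to a bucket."""
    return (a * key + b) % N

buckets = [[] for _ in range(N)]     # chaining stands in for overflow pages
for key in [15, 23, 8, 42, 4]:
    buckets[h(key)].append(key)      # a long chain = degraded lookups
```

With a fixed N, a growing file inevitably piles entries into overflow chains, which is exactly the problem the dynamic schemes below address.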
Extensible Hashing
• Situation: a bucket (primary page) becomes full. Why not re-organize the file by doubling the # of buckets?
  • Reading and writing all pages is expensive!
• Idea: use a directory of pointers to buckets; double the # of buckets by doubling the directory, splitting just the bucket that overflowed!
  • The directory is much smaller than the file, so doubling it is much cheaper. Only one page of data entries is split. No overflow page!
  • The trick lies in how the hash function is adjusted!
Example
• Directory is an array of size 4.
• The bucket for record r has the entry whose index equals the `global depth' least significant bits of h(r):
  • If h(r) = 5 = binary 101, it is in the bucket pointed to by 01.
  • If h(r) = 7 = binary 111, it is in the bucket pointed to by 11.
• (We denote r by h(r).)
[figure: global depth 2; directory entries 00 -> Bucket A (local depth 2: 4*, 12*, 32*, 16*), 01 -> Bucket B (local depth 1: 1*, 5*, 7*, 13*), 10 -> Bucket C (local depth 2: 10*), 11 -> Bucket B]
Handling Inserts
• Find the bucket where the record belongs.
• If there's room, put it there.
• Else, if the bucket is full, split it:
  • increment local depth of original page
  • allocate new page with new local depth
  • re-distribute records from original page
  • add entry for the new page to the directory
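These steps can be sketched as a compact Python implementation; a simplified illustration in which integer keys serve directly as hash values, matching the slides' convention of denoting r by h(r):

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth              # local depth
        self.keys = []

class ExtendibleHash:
    def __init__(self, capacity=4):
        self.capacity = capacity        # records per bucket (one page)
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _index(self, key):
        # use the `global_depth' least significant bits of the key
        return key & ((1 << self.global_depth) - 1)

    def search(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        b = self.directory[self._index(key)]
        if len(b.keys) < self.capacity:
            b.keys.append(key)
            return
        # full bucket: split it, doubling the directory only if needed
        if b.depth == self.global_depth:
            self.directory += self.directory     # double by copying over
            self.global_depth += 1
        b.depth += 1
        image = Bucket(b.depth)                  # the `split image'
        bit = 1 << (b.depth - 1)                 # newly significant bit
        for i, ptr in enumerate(self.directory):
            if ptr is b and (i & bit):
                self.directory[i] = image        # fix pointers to the image
        old, b.keys = b.keys, []
        for k in old + [key]:                    # re-distribute records
            self.insert(k)
```

Replaying the slides' example (inserting 32, 16, 4, 12, 1, 5, 7, 13, 10, then 21, 19, 15, then 20) ends with global depth 3 and split image holding 4*, 12*, 20*, as pictured two slides below.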
Example: insert 21 (10101), then 19 (10011) and 15 (01111): [figure] 21 lands in Bucket B (local depth 1), which overflows; B splits into B (local depth 2: 1*, 5*, 21*, 13*) and a new Bucket D (local depth 2: 7*), and directory entry 11 is redirected to D. No doubling is needed, since B's local depth was below the global depth. 19 and 15 then go to Bucket D (15*, 7*, 19*). Bucket A (16*, 4*, 12*, 32*) and Bucket C (10*) are unchanged.
Insert h(r) = 20 (10100); causes doubling: [figure] Bucket A (local depth 2: 32*, 16*, 4*, 12*) is full and its local depth equals the global depth, so the directory doubles (global depth 2 -> 3, entries 000-111). A splits into A (local depth 3: 32*, 16*) and its `split image' A2 (local depth 3: 4*, 12*, 20*); Buckets B (1*, 5*, 21*, 13*), C (10*) and D (15*, 7*, 19*) keep local depth 2.
Points to Note
• 20 = binary 10100. The last 2 bits (00) tell us r belongs in either A or A2; the last 3 bits are needed to tell which.
• Global depth of directory: max # of bits needed to tell which bucket an entry belongs to.
• Local depth of a bucket: # of bits used to determine if an entry belongs to this bucket.
• When does a bucket split cause directory doubling?
  • Before the insert, local depth of the bucket = global depth. The insert causes the local depth to become > global depth; the directory is doubled by copying it over and `fixing' the pointer to the split image page.
Linear Hashing • This is another dynamic hashing scheme, alternative to Extensible Hashing. • Motivation: Ext. Hashing uses a directory that grows by doubling… Can we do better? (smoother growth) • LH: split buckets from left to right, regardless of which one overflowed (simple, but it works!!)
Linear Hashing (Contd.)
• Directory avoided in LH by using overflow pages (chaining approach).
• Splitting proceeds in `rounds'. A round ends when all N_R initial (for round R) buckets are split.
• The current round number is Level.
• Search: to find the bucket for data entry r, compute h_Level(r):
  • If h_Level(r) is in the range `Next to N_R', r belongs here.
  • Else, r could belong to bucket h_Level(r) or bucket h_Level(r) + N_R; must apply h_Level+1(r) to find out.
• Family of hash functions h0, h1, h2, h3, ..., with h_i(k) = k mod (2^i * N_0), so that h_i+1(k) = h_i(k) or h_i+1(k) = h_i(k) + 2^i * N_0.
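The defining property of the family, that each h_i+1 refines h_i, is easy to check numerically; a Python sketch with N_0 = 4 as in the example that follows:

```python
N0 = 4                         # initial number of buckets

def h(i, k):
    """h_i(k) = k mod (2**i * N0)."""
    return k % (N0 << i)

# every key lands either in its old bucket or exactly 2**i * N0 higher
for k in range(1000):
    for i in range(4):
        assert h(i + 1, k) in (h(i, k), h(i, k) + (N0 << i))
```

This is exactly why a split needs no directory: applying the next function in the family sends each record either back to its old bucket or to one known new bucket.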
Linear Hashing: Example. Initially h(x) = x mod N (N = 4 here); assume 3 records/bucket. Insert 17: 17 mod 4 = 1. [figure: buckets 0: 4, 8 | 1: 5, 9, 13 | 2: 6 | 3: 7, 11]
Linear Hashing: Example (cont.). Inserting 17 overflows bucket 1 (5, 9, 13), so 17 goes to an overflow page of bucket 1. Split bucket 0, anyway!! [figure: buckets 0: 4, 8 | 1: 5, 9, 13 + overflow 17 | 2: 6 | 3: 7, 11]
Linear Hashing: Example (cont.). To split bucket 0, use another function h1(x): h0(x) = x mod N, h1(x) = x mod (2*N). [figure: split pointer at bucket 0; buckets 0: 4, 8 | 1: 5, 9, 13 + overflow 17 | 2: 6 | 3: 7, 11]
Linear Hashing: Example (cont.). Splitting bucket 0 with h1 sends 4 (4 mod 8 = 4) to the new bucket 4, while 8 (8 mod 8 = 0) stays. [figure: buckets 0: 8 | 1: 5, 9, 13 + overflow 17 | 2: 6 | 3: 7, 11 | 4: 4; split pointer advances to bucket 1]
Linear Hashing: Example (cont.). Resulting state: [figure: buckets 0: 8 | 1: 5, 9, 13 + overflow 17 | 2: 6 | 3: 7, 11 | 4: 4; split pointer at bucket 1]
Linear Hashing: Example (cont.). h0(x) = x mod N, h1(x) = x mod (2*N). Insert 15 and 3: both hash to bucket 3 (15 mod 4 = 3, 3 mod 4 = 3). [figure: buckets 0: 8 | 1: 5, 9, 13 + overflow 17 | 2: 6 | 3: 7, 11 | 4: 4]
Linear Hashing: Example (cont.). Bucket 3 overflows, so bucket 1 (at the split pointer) is split using h1: 9 and 17 stay, 5 and 13 move to the new bucket 5. [figure: buckets 0: 8 | 1: 9, 17 | 2: 6 | 3: 7, 11, 15 + overflow 3 | 4: 4 | 5: 5, 13; split pointer advances to bucket 2]
Linear Hashing: Search. h0(x) = x mod N (for the un-split buckets), h1(x) = x mod (2*N) (for the split ones). [figure: buckets 0: 8 | 1: 9, 17 | 2: 6 | 3: 7, 11, 15 + overflow 3 | 4: 4 | 5: 5, 13]
Linear Hashing: Search
Algorithm Search(k):
1. b = h0(k)
2. if b < split-pointer then
3.     b = h1(k)
4. read bucket b and search there
References
[Litwin80] Witold Litwin: Linear Hashing: A New Tool for File and Table Addressing. VLDB 1980: 212-223. http://www.cs.bu.edu/faculty/gkollios/ada01/Papers/linear-hashing.PDF
[B-YS-P98] Ricardo A. Baeza-Yates, Hector Soza-Pollman: Analysis of Linear Hashing Revisited. Nord. J. Comput. 5(1) (1998)