540 likes | 622 Views
BTrees & Sorting. 11/3. Indexing. “If you don’t find it in the index, look very carefully through the entire catalog” -- Sears, Roebuck, and Co., Consumers Guide, 1897. Index Motivation. A file contains some records, say products We want faster access to those records
E N D
BTrees & Sorting 11/3
Indexing “If you don’t find it in the index, look very carefully through the entire catalog” -- Sears, Roebuck, and Co., Consumers Guide, 1897
Index Motivation • A file contains some records, say products • We want faster access to those records • E.g., Give me all products made by Sony • Intuition: Build a second file that organizes the records “by product” to make this faster • NB: we don’t always have to build a second file
Indexes • An indexon a file speeds up selections on the search key fields for the index. • Search key properties • Any subset of fields • isnotthe same as key of a relation On which attributes would you build indexes? Product(name, maker, price)
More precisely • An index contains a collection of data entries, and supports efficient retrieval of all data entries k*with a given key value k. Product(name, maker, price) Sample queries? Indexing is one the most important facilities provided by a database for performance
Operations on an Index • Search: Given a key find all records • More sophisticated variants as well. Why? • Insert /Remove entries • Bulk Load. Why? Real difference between structures: costs of ops determines which index you pick and why
Data File with Several Index Files Equality Query: Age = 12 and Sal = 90? Range Query: Age = 5 and Sal > 5? <Age, Sal> Composite keys in Dictionary Order <Sal, Age> <Age> <Sal>
Outline • Btrees • Very good for range queries, sorted data • Some old databases only implemented Btrees • Hash Tables • There are variants of this basic structure to deal with IO The data structures we present here are “IO aware”
B+ Trees • Search trees • B does not mean binary! • Idea in B Trees: • make 1 node = 1 physical page • Balanced, height adjusted tree (not the B either) • Idea in B+ Trees: • Make leaves into a linked list (range queries)
B+ Trees Basics Parameter d = the degree Each node has >= d and <= 2d keys (except root) Internal Nodes Keys k < 30 Keys 240<=k Keys 30<=k<120 Keys 120<=k<240 Leaf Nodes Each leaf has >=d and <= 2d keys: Next leaf 40 50 60
d = 2 B+ Tree Example 1. No data in internal nodes. 2. Links between leaf pages. 10 15 18 20 30 40 50 60 65 80 85 90
Searching a B+ Tree • Exact key values: • Start at the root • Proceed down, to the leaf • Range queries: • As above • Then sequential traversal Select name From people Where age = 25 Select name From people Where 20 <= age and age <= 30
B+ Tree Example 30 < 80. K = 30? 30 in [20,60) To the data! 10 15 18 20 30 40 50 60 65 80 85 90
B+ Tree Example 30 < 80. K in [30,85] 30 in [20,60) Use those leaf pointers! To the data! 10 15 18 20 30 40 50 60 65 80 85 90
B+ Tree Design • How large is d? • Example: • Key size = 4 bytes • Pointer size = 8 bytes • Block size = 4096 byes • 2d x 4 + (2d+1) x 8 <= 4096 • d = 170 Observable Universe contains ≈ 1080 atoms. What is height of a B+tree that indexes it? NB: Oracle allows 64k pages TiB is 240 bytes. What is the height to index with 64k Pages?
B+ Trees in Practice • Typical order: 100. Typical fill-factor: 67%. • average fanout = 133 • Typical capacities: • Height 4: 1334 = 312,900,700 records • Height 3: 1333 = 2,352,637 records • Top levels of tree sit in the buffer pool: • Level 1 = 1 page = 8 Kbytes • Level 2 = 133 pages = 1 Mbyte • Level 3 = 17,689 pages = 133 MBytes Typically, pay for one IO!
Insertion in a B+ Tree Insert (K, P) • Find leaf where K belongs, insert • If no overflow (2d keys or less), halt • If overflow (2d+1 keys), split node, insert in parent: • If leaf, keep K3 too in right node • When root splits, new root has 1 key only (K3, ) to parent
Insertion in a B+ Tree Insert K=19 10 15 18 20 30 40 50 60 65 80 85 90
Insertion in a B+ Tree After insertion 10 15 18 19 20 30 40 50 60 65 80 85 90
Insertion in a B+ Tree Now insert 25 10 15 18 19 20 30 40 50 60 65 80 85 90
Insertion in a B+ Tree After insertion 10 15 18 19 20 25 30 40 50 60 65 80 85 90
Insertion in a B+ Tree But now have to split ! 10 15 18 19 20 25 30 40 50 60 65 80 85 90
Insertion in a B+ Tree After the split 10 15 18 19 20 25 30 40 50 60 65 80 85 90
B+ Tree Indexes Non-leaf Pages Leaf Pages (Sorted by search key) • Leaf pages containdata entries, and are chained (prev & next) • Non-leaf pages have index entries; only used to direct searches: index entry P K P K P P K m 0 1 2 1 m 2
Example B+ Tree Note how data entries in leaf level are sorted Root • Find 28*? 29*? All > 15* and < 30* • Insert/delete: Find data entry in leaf, then change it. Need to adjust parent sometimes. • And change sometimes bubbles up the tree 17 Entries <= 17 Entries > 17 27 5 13 30 39* 2* 3* 5* 7* 8* 22* 24* 27* 29* 38* 33* 34* 14* 16*
Index Classification An index is clustered if the data is ordered the same as the data is laid out.
Clustered vs. Unclustered Index UNCLUSTERED CLUSTERED Index entries direct search (Index File) Data entries Data Records Data Records (Data file) • Clustered (or not) dramatically impacts cost
Operations on an Index Search: Given a key find all records • More sophisticated variants as well. Real difference between structures: costs of ops which index you pick and why
Cost Model for Our Analysis We ignore CPU costs, for simplicity: • B: The number of data pages • R: Number of records per page • D: (Average) time to read or write disk page • Measuring number of page I/O’s • “Average-case” analysis; based on several simplistic assumptions. • Goal: Good enough to show the overall trends!
Clustered v. Unclustered Fanout of Tree is F. Range query for M entries (100 per page) IOs to search for a single item? Clustered Traversal of the tree: logF(1.5N) Range search Query : logF(1.5N) + ceil(M/100) Unclustered Traversal of the tree: logF(1.5N) Range search Query : logF(1.5N) + M Which of these IOs are random/sequential?
Plugging in Some numbers Clustered: logF(1.5N) + ceil(M/100) ~ 1 Random IO (10ms) Unclustered: logF(1.5N) + M Random IO (M*10ms) If M is 1 then there is no difference! If M is 100,000 records, ~10 minutes vs. 10ms
Takeaway • B+Tree are a workhorse index. • You can write down a cost model. • Databases actually do this! • Clustered v. unclustered is a big deal.
Why Sort? • Data requested in sorted order • e.g., find students in increasing GPAorder • Sorting is first step in bulk loading B+ tree index. A classic problem in computer science!
More reasons to sort… • Sorting useful for eliminating duplicate copies in a collection of records (Why?) • Sort-merge join algorithm involves sorting. • Problem: sort 1Tb of data with 1Gb of RAM. • why not use virtual memory?
Do people care? http://sortbenchmark.org Sort benchmark bares his name
Two Ideas behind external sort • I/O optimized: long sequential disk access • Observation: Merging sorted files is easy Sort small sets in memory, then merge.
Phase I: Buffer with 3 Pages Sort 10,33,44,55 Sort it! (Quicksort) Main Memory Phase 1, Per Page 2 IOs (1 Read,1 Write) 44,10,33,55 End: All pages sorted. 18,8,5,30 5,8,18,30
Phase II: Merge 1. Read Pages 10,33,44,55 Main Memory 2. Merge 5,8,18,30 5 5,8,10,18 5,8 10,33,44,55 5,8,18,30
Phase II: Merge 1. Read Pages 10,33,44,55 Main Memory 2. Merge 5,8,18,30 5 5,8,10,18 5,8 10,33,44,55 5,8,18,30
Phase II: Merge 1. Read Pages 10,33,44,55 Main Memory 2. Merge 5,8,18,30 5,8 5,8,10,18 5,8,10 10,33,44,55 5,8,18,30
Phase II: Merge 1. Read Pages 10,33,44,55 Main Memory 2. Merge 5,8,18,30 5,8,10 5,8,10,18 5,8,10,18 3rd Page is filled 10,33,44,55 5,8,18,30
Phase II: Merge 1. Read Pages 10,33,44,55 Main Memory 2. Merge 5,8,18,30 5,8,10,18 5,8,10,18 3. Write Back Keep on Merging! 10,33,44,55 5,8,18,30
3 Buffer Pages Sort Main Memory Now, runs of length 2. If our file has 16 pages, what is next? 5,8,10,18 30,33,44,55
Phase II: Merge 1. Read Pages 10,33,44,55 Main Memory 2. Merge 5,8,18,30 5,8,10,18 5,8,10,18 3. Write Back We now have 10,33,44,55 5,8,18,30