1.36k likes | 1.37k Views
Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 – DB Applications. C. Faloutsos & A. Pavlo Lecture#9 (R&G ch. 10) Indexing. Outline. Motivation ISAM B-trees (not in book) B+ trees duplicates B+ trees in practice. Introduction. How to support range searches?
E N D
Carnegie Mellon Univ.Dept. of Computer Science15-415/615 – DB Applications C. Faloutsos & A. Pavlo Lecture#9 (R&G ch. 10) Indexing
Outline • Motivation • ISAM • B-trees (not in book) • B+ trees • duplicates • B+ trees in practice CMU SCS 15-415/615
Introduction • How to support range searches? • equality searches? CMU SCS 15-415/615
Range Searches • ``Find all students with gpa > 3.0’’ • may be slow, even on sorted file • What to do? Data File Page N Page 3 Page 1 Page 2 CMU SCS 15-415/615
Range Searches • ``Find all students with gpa > 3.0’’ • may be slow, even on sorted file • Solution: Create an `index’ file. Index File kN k2 k1 Data File Page N Page 3 Page 1 Page 2 CMU SCS 15-415/615
Range Searches • More details: • if index file is small, do binary search there • Otherwise?? Index File kN k2 k1 Data File Page N Page 3 Page 1 Page 2 CMU SCS 15-415/615
ISAM • Repeat recursively! Non-leaf Pages Leaf Pages CMU SCS 15-415/615
ISAM • OK - what if there are insertions and overflows? Non-leaf Pages Leaf Pages CMU SCS 15-415/615
Overflow page ISAM • Overflow pages, linked to the primary page Non-leaf Pages Leaf Pages Primary pages CMU SCS 15-415/615
Root 40 51 63 20 33 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* Example ISAM Tree • 2 entries per page CMU SCS 15-415/615
Root 40 51 63 20 33 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* ISAM Details • format of an index page? • how full should a newly created ISAM be? CMU SCS 15-415/615
Root 40 51 63 20 33 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* ISAM Details • format of an index page? • how full should a newly created ISAM be? • ~80-90% (not 100%) CMU SCS 15-415/615
Root 40 51 63 20 33 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* ISAM is a STATIC Structure • that is, index pages don’t change • File creation: Leaf (data) pages allocated sequentially, sorted by search key; then index pages allocated, then overflow pgs. CMU SCS 15-415/615
Root 40 51 63 20 33 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* ISAM is a STATIC Structure • Search: Start at root; use key comparisons to go to leaf. • Cost = ?? CMU SCS 15-415/615
Root 40 51 63 20 33 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* ISAM is a STATIC Structure • Search: Start at root; use key comparisons to go to leaf. • Cost = ?? F N CMU SCS 15-415/615
Root 40 51 63 20 33 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* ISAM is a STATIC Structure • Search: Start at root; use key comparisons to go to leaf. • Cost = log F N ; • F = # entries/pg (i.e., fanout), • N = # leaf pgs CMU SCS 15-415/615
Root 40 51 63 20 33 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* ISAM is a STATIC Structure Insert: Find leaf that data entry belongs to, and put it there. Overflow page if necessary. Delete: Find and remove from leaf; if empty page, de-allocate. CMU SCS 15-415/615
23* 42* Example: Insert 23*, 48*, 41*, 42* Root 40 Index Pages 51 63 20 33 Primary Leaf 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* Pages 41* 48* Overflow Pages CMU SCS 15-415/615
21* means • <21> + rest of record • (it’s a bit more complicated – but we stay with that, for the moment). • ‘21’ plain means just 4 bytes, to store integer 21 21* 21 21 (name, age, etc) ~record 21 divider CMU SCS 15-415/615
Root 40 Index Pages 51 63 20 33 Primary Leaf 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* Pages 41* 48* 23* Overflow Pages 42* ... then delete 42*, 51*, 97* • Note that 51 appears in index levels, but not in leaf! CMU SCS 15-415/615
ISAM ---- Issues? • Pros • ???? • Cons • ???? CMU SCS 15-415/615
Outline • Motivation • ISAM • B-trees (not in book) • B+ trees • duplicates • B+ trees in practice CMU SCS 15-415/615
B-trees • the most successful family of index schemes (B-trees, B+-trees, B*-trees) • Can be used for primary/secondary, clustering/non-clustering index. • balanced “n-way” search trees CMU SCS 15-415/615
B-trees [Rudolf Bayer and McCreight, E. M. Organization and Maintenance of Large Ordered Indexes. Acta Informatica 1, 173-189, 1972.] CMU SCS 15-415/615
6 9 <6 >9 <9 >6 3 1 7 13 B-trees Eg., B-tree of order d=1: CMU SCS 15-415/615
pn+1 p1 … vn v1 v2 B - tree properties: • each node, in a B-tree of order d: • Key order • at most n=2d keys • at least d keys (except root, which may have just 1 key) • all leaves at the same level • if number of pointers is k, then node has exactly k-1 keys • (leaves are empty) CMU SCS 15-415/615
Properties • “block aware” nodes: each node -> disk page • O(log (N)) for everything! (ins/del/search) • typically, if d = 50 - 100, then 2 - 3 levels • utilization >= 50%, guaranteed; on average 69% CMU SCS 15-415/615
6 9 <6 >9 <9 >6 3 1 7 13 Queries • Algo for exact match query? (eg., ssn=8?) CMU SCS 15-415/615
JAVA animation! http://slady.net/java/bt/ strongly recommended! (with all usual pre-cautions – VM etc) CMU SCS 15-415/615
Queries • Algo for exact match query? (eg., ssn=8?) 6 9 <6 >9 <9 >6 3 1 7 13 CMU SCS 15-415/615
Queries • Algo for exact match query? (eg., ssn=8?) 6 9 <6 >9 <9 >6 3 1 7 13 CMU SCS 15-415/615
Queries • Algo for exact match query? (eg., ssn=8?) 6 9 <6 >9 <9 >6 3 1 7 13 CMU SCS 15-415/615
Queries • Algo for exact match query? (eg., ssn=8?) 6 9 <6 H steps (= disk accesses) >9 <9 >6 3 1 7 13 CMU SCS 15-415/615
Queries • what about range queries? (eg., 5<salary<8) • Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) CMU SCS 15-415/615
6 9 <6 >9 <9 >6 3 1 7 13 Queries • what about range queries? (eg., 5<salary<8) • Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) CMU SCS 15-415/615
6 9 <6 >9 <9 >6 3 1 7 13 Queries • what about range queries? (eg., 5<salary<8) • Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) CMU SCS 15-415/615
6 9 <6 >9 <9 >6 3 1 7 13 Queries • what about range queries? (eg., 5<salary<8) • Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) CMU SCS 15-415/615
6 9 <6 >9 <9 >6 3 1 7 13 Queries • what about range queries? (eg., 5<salary<8) • Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) CMU SCS 15-415/615
B-trees: Insertion • Insert in leaf; on overflow, push middle up (recursively) • split: preserves B - tree properties CMU SCS 15-415/615
6 9 <6 >9 <9 >6 3 1 7 13 B-trees Easy case: Tree T0; insert ‘8’ CMU SCS 15-415/615
6 9 <6 >9 <9 >6 3 1 7 13 B-trees Tree T0; insert ‘8’ 8 CMU SCS 15-415/615
6 9 <6 >9 <9 >6 3 1 7 13 B-trees Hardest case: Tree T0; insert ‘2’ 2 CMU SCS 15-415/615
13 B-trees Hardest case: Tree T0; insert ‘2’ 6 9 2 1 3 7 push middle up CMU SCS 15-415/615
2 13 1 3 B-trees Hardest case: Tree T0; insert ‘2’ Ovf; push middle 2 6 9 7 CMU SCS 15-415/615
13 1 3 B-trees Hardest case: Tree T0; insert ‘2’ 6 Final state 9 2 7 CMU SCS 15-415/615
B-trees: Insertion • Insert in leaf; on overflow, push middle up (recursively – ‘propagate split’) • split: preserves all B - tree properties (!!) • notice how it grows: height increases when root overflows & splits • Automatic, incremental re-organization (contrast with ISAM!) CMU SCS 15-415/615
Pseudo-code INSERTION OF KEY ’K’ find the correct leaf node ’L’; if ( ’L’ overflows ){ split ’L’, and push middle key to parent node ’P’; if (’P’ overflows){ repeat the split recursively; } else{ add the key ’K’ in node ’L’; /* maintaining the key order in ’L’ */ } CMU SCS 15-415/615
Overview • ... • B – trees • Dfn, Search, insertion, deletion • ... CMU SCS 15-415/615
Deletion Rough outline of algo: • Delete key; • on underflow, may need to merge In practice, some implementors just allow underflows to happen… CMU SCS 15-415/615
6 9 <6 >9 <9 >6 3 1 7 13 B-trees – Deletion Easiest case: Tree T0; delete ‘3’ CMU SCS 15-415/615