1 / 136

Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 – DB Applications

Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 – DB Applications. C. Faloutsos & A. Pavlo Lecture#9 (R&G ch. 10) Indexing. Outline. Motivation ISAM B-trees (not in book) B+ trees duplicates B+ trees in practice. Introduction. How to support range searches?

rfogle
Download Presentation

Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 – DB Applications

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Carnegie Mellon Univ.Dept. of Computer Science15-415/615 – DB Applications C. Faloutsos & A. Pavlo Lecture#9 (R&G ch. 10) Indexing

  2. Outline • Motivation • ISAM • B-trees (not in book) • B+ trees • duplicates • B+ trees in practice CMU SCS 15-415/615

  3. Introduction • How to support range searches? • equality searches? CMU SCS 15-415/615

  4. Range Searches • ``Find all students with gpa > 3.0’’ • may be slow, even on sorted file • What to do? Data File Page N Page 3 Page 1 Page 2 CMU SCS 15-415/615

  5. Range Searches • ``Find all students with gpa > 3.0’’ • may be slow, even on sorted file • Solution: Create an `index’ file. Index File kN k2 k1 Data File Page N Page 3 Page 1 Page 2 CMU SCS 15-415/615

  6. Range Searches • More details: • if index file is small, do binary search there • Otherwise?? Index File kN k2 k1 Data File Page N Page 3 Page 1 Page 2 CMU SCS 15-415/615

  7. ISAM • Repeat recursively! Non-leaf Pages Leaf Pages CMU SCS 15-415/615

  8. ISAM • OK - what if there are insertions and overflows? Non-leaf Pages Leaf Pages CMU SCS 15-415/615

  9. Overflow page ISAM • Overflow pages, linked to the primary page Non-leaf Pages Leaf Pages Primary pages CMU SCS 15-415/615

  10. Root 40 51 63 20 33 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* Example ISAM Tree • 2 entries per page CMU SCS 15-415/615

  11. Root 40 51 63 20 33 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* ISAM Details • format of an index page? • how full should a newly created ISAM be? CMU SCS 15-415/615

  12. Root 40 51 63 20 33 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* ISAM Details • format of an index page? • how full should a newly created ISAM be? • ~80-90% (not 100%) CMU SCS 15-415/615

  13. Root 40 51 63 20 33 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* ISAM is a STATIC Structure • that is, index pages don’t change • File creation: Leaf (data) pages allocated sequentially, sorted by search key; then index pages allocated, then overflow pgs. CMU SCS 15-415/615

  14. Root 40 51 63 20 33 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* ISAM is a STATIC Structure • Search: Start at root; use key comparisons to go to leaf. • Cost = ?? CMU SCS 15-415/615

  15. Root 40 51 63 20 33 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* ISAM is a STATIC Structure • Search: Start at root; use key comparisons to go to leaf. • Cost = ?? F N CMU SCS 15-415/615

  16. Root 40 51 63 20 33 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* ISAM is a STATIC Structure • Search: Start at root; use key comparisons to go to leaf. • Cost = log F N ; • F = # entries/pg (i.e., fanout), • N = # leaf pgs CMU SCS 15-415/615

  17. Root 40 51 63 20 33 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* ISAM is a STATIC Structure Insert: Find leaf that data entry belongs to, and put it there. Overflow page if necessary. Delete: Find and remove from leaf; if empty page, de-allocate. CMU SCS 15-415/615

  18. 23* 42* Example: Insert 23*, 48*, 41*, 42* Root 40 Index Pages 51 63 20 33 Primary Leaf 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* Pages 41* 48* Overflow Pages CMU SCS 15-415/615

  19. 21* means • <21> + rest of record • (it’s a bit more complicated – but we stay with that, for the moment). • ‘21’ plain means just 4 bytes, to store integer 21 21* 21 21 (name, age, etc) ~record 21 divider CMU SCS 15-415/615

  20. Root 40 Index Pages 51 63 20 33 Primary Leaf 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* Pages 41* 48* 23* Overflow Pages 42* ... then delete 42*, 51*, 97* • Note that 51 appears in index levels, but not in leaf! CMU SCS 15-415/615

  21. ISAM ---- Issues? • Pros • ???? • Cons • ???? CMU SCS 15-415/615

  22. Outline • Motivation • ISAM • B-trees (not in book) • B+ trees • duplicates • B+ trees in practice CMU SCS 15-415/615

  23. B-trees • the most successful family of index schemes (B-trees, B+-trees, B*-trees) • Can be used for primary/secondary, clustering/non-clustering index. • balanced “n-way” search trees CMU SCS 15-415/615

  24. B-trees [Rudolf Bayer and McCreight, E. M. Organization and Maintenance of Large Ordered Indexes. Acta Informatica 1, 173-189, 1972.] CMU SCS 15-415/615

  25. 6 9 <6 >9 <9 >6 3 1 7 13 B-trees Eg., B-tree of order d=1: CMU SCS 15-415/615

  26. pn+1 p1 … vn v1 v2 B - tree properties: • each node, in a B-tree of order d: • Key order • at most n=2d keys • at least d keys (except root, which may have just 1 key) • all leaves at the same level • if number of pointers is k, then node has exactly k-1 keys • (leaves are empty) CMU SCS 15-415/615

  27. Properties • “block aware” nodes: each node -> disk page • O(log (N)) for everything! (ins/del/search) • typically, if d = 50 - 100, then 2 - 3 levels • utilization >= 50%, guaranteed; on average 69% CMU SCS 15-415/615

  28. 6 9 <6 >9 <9 >6 3 1 7 13 Queries • Algo for exact match query? (eg., ssn=8?) CMU SCS 15-415/615

  29. JAVA animation! http://slady.net/java/bt/ strongly recommended! (with all usual pre-cautions – VM etc) CMU SCS 15-415/615

  30. Queries • Algo for exact match query? (eg., ssn=8?) 6 9 <6 >9 <9 >6 3 1 7 13 CMU SCS 15-415/615

  31. Queries • Algo for exact match query? (eg., ssn=8?) 6 9 <6 >9 <9 >6 3 1 7 13 CMU SCS 15-415/615

  32. Queries • Algo for exact match query? (eg., ssn=8?) 6 9 <6 >9 <9 >6 3 1 7 13 CMU SCS 15-415/615

  33. Queries • Algo for exact match query? (eg., ssn=8?) 6 9 <6 H steps (= disk accesses) >9 <9 >6 3 1 7 13 CMU SCS 15-415/615

  34. Queries • what about range queries? (eg., 5<salary<8) • Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) CMU SCS 15-415/615

  35. 6 9 <6 >9 <9 >6 3 1 7 13 Queries • what about range queries? (eg., 5<salary<8) • Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) CMU SCS 15-415/615

  36. 6 9 <6 >9 <9 >6 3 1 7 13 Queries • what about range queries? (eg., 5<salary<8) • Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) CMU SCS 15-415/615

  37. 6 9 <6 >9 <9 >6 3 1 7 13 Queries • what about range queries? (eg., 5<salary<8) • Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) CMU SCS 15-415/615

  38. 6 9 <6 >9 <9 >6 3 1 7 13 Queries • what about range queries? (eg., 5<salary<8) • Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) CMU SCS 15-415/615

  39. B-trees: Insertion • Insert in leaf; on overflow, push middle up (recursively) • split: preserves B - tree properties CMU SCS 15-415/615

  40. 6 9 <6 >9 <9 >6 3 1 7 13 B-trees Easy case: Tree T0; insert ‘8’ CMU SCS 15-415/615

  41. 6 9 <6 >9 <9 >6 3 1 7 13 B-trees Tree T0; insert ‘8’ 8 CMU SCS 15-415/615

  42. 6 9 <6 >9 <9 >6 3 1 7 13 B-trees Hardest case: Tree T0; insert ‘2’ 2 CMU SCS 15-415/615

  43. 13 B-trees Hardest case: Tree T0; insert ‘2’ 6 9 2 1 3 7 push middle up CMU SCS 15-415/615

  44. 2 13 1 3 B-trees Hardest case: Tree T0; insert ‘2’ Ovf; push middle 2 6 9 7 CMU SCS 15-415/615

  45. 13 1 3 B-trees Hardest case: Tree T0; insert ‘2’ 6 Final state 9 2 7 CMU SCS 15-415/615

  46. B-trees: Insertion • Insert in leaf; on overflow, push middle up (recursively – ‘propagate split’) • split: preserves all B - tree properties (!!) • notice how it grows: height increases when root overflows & splits • Automatic, incremental re-organization (contrast with ISAM!) CMU SCS 15-415/615

  47. Pseudo-code INSERTION OF KEY ’K’ find the correct leaf node ’L’; if ( ’L’ overflows ){ split ’L’, and push middle key to parent node ’P’; if (’P’ overflows){ repeat the split recursively; } else{ add the key ’K’ in node ’L’; /* maintaining the key order in ’L’ */ } CMU SCS 15-415/615

  48. Overview • ... • B – trees • Dfn, Search, insertion, deletion • ... CMU SCS 15-415/615

  49. Deletion Rough outline of algo: • Delete key; • on underflow, may need to merge In practice, some implementors just allow underflows to happen… CMU SCS 15-415/615

  50. 6 9 <6 >9 <9 >6 3 1 7 13 B-trees – Deletion Easiest case: Tree T0; delete ‘3’ CMU SCS 15-415/615

More Related