1 / 128

Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications

Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications. Lecture#9: Indexing (R&G ch. 10). Outline. Motivation ISAM B-trees (not in book) B+ trees duplicates B+ trees in practice. Introduction. How to support range searches? equality searches?. Range Searches.

brandi
Download Presentation

Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Carnegie Mellon Univ.Dept. of Computer Science15-415 - Database Applications Lecture#9: Indexing (R&G ch. 10)

  2. Outline • Motivation • ISAM • B-trees (not in book) • B+ trees • duplicates • B+ trees in practice CMU SCS 15-415

  3. Introduction • How to support range searches? • equality searches? CMU SCS 15-415

  4. Range Searches • ``Find all students with gpa > 3.0’’ • may be slow, even on sorted file • What to do? Data File Page N Page 3 Page 1 Page 2 CMU SCS 15-415

  5. Range Searches • ``Find all students with gpa > 3.0’’ • may be slow, even on sorted file • Solution: Create an `index’ file. Index File kN k2 k1 Data File Page N Page 3 Page 1 Page 2 CMU SCS 15-415

  6. Range Searches • More details: • if index file is small, do binary search there • Otherwise?? Index File kN k2 k1 Data File Page N Page 3 Page 1 Page 2 CMU SCS 15-415

  7. ISAM • Repeat recursively! Non-leaf Pages Leaf Pages CMU SCS 15-415

  8. ISAM • OK - what if there are insertions and overflows? Non-leaf Pages Leaf Pages CMU SCS 15-415

  9. Overflow page ISAM • Overflow pages, linked to the primary page Non-leaf Pages Leaf Pages Primary pages CMU SCS 15-415

  10. Root 40 51 63 20 33 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* Example ISAM Tree • 2 entries per page CMU SCS 15-415

  11. ISAM Details • format of an index page? • how full would a newly created ISAM be? CMU SCS 15-415

  12. P K P P K P K m 2 0 1 m 1 2 ISAM Details • format of an index page? • how full would a newly created ISAM be? • ~80-90% (not 100%) CMU SCS 15-415

  13. ISAM is a STATIC Structure • that is, index pages don’t change • File creation: Leaf (data) pages allocated sequentially, sorted by search key; then index pages allocated, then overflow pgs. CMU SCS 15-415

  14. ISAM is a STATIC Structure • Search: Start at root; use key comparisons to go to leaf. • Cost = log F N ; • F = # entries/pg (i.e., fanout), • N = # leaf pgs CMU SCS 15-415

  15. ISAM is a STATIC Structure Insert: Find leaf that data entry belongs to, and put it there. Overflow page if necessary. Delete: Find and remove from leaf; if empty page, de-allocate. CMU SCS 15-415

  16. 48* 23* 42* Example: Insert 23*, 48*, 41*, 42* Root 40 Index Pages 51 63 20 33 Primary Leaf 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* Pages 41* Overflow Pages CMU SCS 15-415

  17. Root 40 Index Pages 51 63 20 33 Primary Leaf 46* 55* 40* 51* 97* 10* 15* 20* 27* 33* 37* 63* Pages 41* 48* 23* Overflow Pages 42* ... then delete 42*, 51*, 97* • Note that 51* appears in index levels, but not in leaf! CMU SCS 15-415

  18. ISAM ---- Issues? • Pros • ???? • Cons • ???? CMU SCS 15-415

  19. Outline • Motivation • ISAM • B-trees (not in book) • B+ trees • duplicates • B+ trees in practice CMU SCS 15-415

  20. B-trees • the most successful family of index schemes (B-trees, B+-trees, B*-trees) • Can be used for primary/secondary, clustering/non-clustering index. • balanced “n-way” search trees CMU SCS 15-415

  21. B-trees [Rudolf Bayer and McCreight, E. M. Organization and Maintenance of Large Ordered Indexes. Acta Informatica 1, 173-189, 1972.] CMU SCS 15-415

  22. 6 9 <6 >9 <9 >6 3 1 7 13 B-trees Eg., B-tree of order d=1: CMU SCS 15-415

  23. pn p1 … vn-1 v1 v2 B - tree properties: • each node, in a B-tree of order d: • Key order • at most n=2d keys • at least d keys (except root, which may have just 1 key) • all leaves at the same level • if number of pointers is k, then node has exactly k-1 keys • (leaves are empty) CMU SCS 15-415

  24. Properties • “block aware” nodes: each node -> disk page • O(log (N)) for everything! (ins/del/search) • typically, if d = 50 - 100, then 2 - 3 levels • utilization >= 50%, guaranteed; on average 69% CMU SCS 15-415

  25. 6 9 <6 >9 <9 >6 3 1 7 13 Queries • Algo for exact match query? (eg., ssn=8?) CMU SCS 15-415

  26. JAVA animation! http://slady.net/java/bt/ strongly recommended! CMU SCS 15-415

  27. Queries • Algo for exact match query? (eg., ssn=8?) 6 9 <6 >9 <9 >6 3 1 7 13 CMU SCS 15-415

  28. Queries • Algo for exact match query? (eg., ssn=8?) 6 9 <6 >9 <9 >6 3 1 7 13 CMU SCS 15-415

  29. Queries • Algo for exact match query? (eg., ssn=8?) 6 9 <6 >9 <9 >6 3 1 7 13 CMU SCS 15-415

  30. Queries • Algo for exact match query? (eg., ssn=8?) 6 9 <6 H steps (= disk accesses) >9 <9 >6 3 1 7 13 CMU SCS 15-415

  31. Queries • what about range queries? (eg., 5<salary<8) • Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) CMU SCS 15-415

  32. 6 9 <6 >9 <9 >6 3 1 7 13 Queries • what about range queries? (eg., 5<salary<8) • Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) CMU SCS 15-415

  33. 6 9 <6 >9 <9 >6 3 1 7 13 Queries • what about range queries? (eg., 5<salary<8) • Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) CMU SCS 15-415

  34. 6 9 <6 >9 <9 >6 3 1 7 13 Queries • what about range queries? (eg., 5<salary<8) • Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) CMU SCS 15-415

  35. 6 9 <6 >9 <9 >6 3 1 7 13 Queries • what about range queries? (eg., 5<salary<8) • Proximity/ nearest neighbor searches? (eg., salary ~ 8 ) CMU SCS 15-415

  36. B-trees: Insertion • Insert in leaf; on overflow, push middle up (recursively) • split: preserves B - tree properties CMU SCS 15-415

  37. 6 9 <6 >9 <9 >6 3 1 7 13 B-trees Easy case: Tree T0; insert ‘8’ CMU SCS 15-415

  38. 6 9 <6 >9 <9 >6 3 1 7 13 B-trees Tree T0; insert ‘8’ 8 CMU SCS 15-415

  39. 6 9 <6 >9 <9 >6 3 1 7 13 B-trees Hardest case: Tree T0; insert ‘2’ 2 CMU SCS 15-415

  40. 13 B-trees Hardest case: Tree T0; insert ‘2’ 6 9 2 1 3 7 push middle up CMU SCS 15-415

  41. 2 13 1 3 B-trees Hardest case: Tree T0; insert ‘2’ Ovf; push middle 2 6 9 7 CMU SCS 15-415

  42. 13 1 3 B-trees Hardest case: Tree T0; insert ‘2’ 6 Final state 9 2 7 CMU SCS 15-415

  43. B-trees: Insertion • Insert in leaf; on overflow, push middle up (recursively – ‘propagate split’) • split: preserves all B - tree properties (!!) • notice how it grows: height increases when root overflows & splits • Automatic, incremental re-organization (contrast with ISAM!) CMU SCS 15-415

  44. Pseudo-code INSERTION OF KEY ’K’ find the correct leaf node ’L’; if ( ’L’ overflows ){ split ’L’, and push middle key to parent node ’P’; if (’P’ overflows){ repeat the split recursively; } else{ add the key ’K’ in node ’L’; /* maintaining the key order in ’L’ */ } CMU SCS 15-415

  45. Overview • ... • B – trees • Dfn, Search, insertion, deletion • ... CMU SCS 15-415

  46. Deletion Rough outline of algo: • Delete key; • on underflow, may need to merge In practice, some implementors just allow underflows to happen… CMU SCS 15-415

  47. 6 9 <6 >9 <9 >6 3 1 7 13 B-trees – Deletion Easiest case: Tree T0; delete ‘3’ CMU SCS 15-415

  48. 6 9 <6 >9 <9 >6 1 7 13 B-trees – Deletion Easiest case: Tree T0; delete ‘3’ CMU SCS 15-415

  49. B-trees – Deletion • Case1: delete a key at a leaf – no underflow • Case2: delete non-leaf key – no underflow • Case3: delete leaf-key; underflow, and ‘rich sibling’ • Case4: delete leaf-key; underflow, and ‘poor sibling’ CMU SCS 15-415

  50. 6 9 <6 >9 <9 >6 3 1 7 13 B-trees – Deletion • Case1: delete a key at a leaf – no underflow (delete 3 from T0) CMU SCS 15-415

More Related