1 / 85

Data Organization - B-trees

Data Organization - B-trees. A simple index. Index file. Brighton A-217 700 Downtown A-101 500 Downtown A-110 600 Mianus A-215 700 Perry A-102 400. A-101 A-102 A-110 A-215 A-217. Index of depositors on acct_no.

feo
Download Presentation

Data Organization - B-trees

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Organization - B-trees

  2. A simple index Index file Brighton A-217 700 Downtown A-101 500 Downtown A-110 600 Mianus A-215 700 Perry A-102 400 ...... A-101 A-102 A-110 A-215 A-217 ...... Index of depositors on acct_no Index records: <search key value, pointer (block, offset or slot#)> To answer a query for “acct_no=A-110” we: 1. Do a binary search on index file, searching for A-110 2. “Chase” pointer of index record

  3. Index Choices 1. Primary: index search key = physical order search key vs Secondary: all other indexes Q: how many primary indices per relation? 2. Dense: index entry for every search key value vs Sparse: some search key values not in the index 3. Single level vs Multilevel (index on the indices)

  4. Measuring ‘goodness’ On what basis do we compare different indices? 1. Access type: what type of queries can be answered: • selection queries (ssn = 123)? • range queries ( 100 <= ssn <= 200)? 2. Access time: what is the cost of evaluating queries • Measured in # of block accesses 3. Maintenance overhead: cost of insertion / deletion? (also BA’s) 4. Space overhead : in # of blocks needed to store the index

  5. Indexing Primary (or clustering) index on SSN

  6. Indexing Secondary (or non-clustering) index: duplicates may exist • Can have many secondary indices • but only one primary index Address-index

  7. Indexing secondary index: typically, with ‘postings lists’ Postings lists

  8. Indexing Primary/sparse index on ssn (primary key) >=123 >=456

  9. Indexing Secondary / dense index Secondary on a candidate key: No duplicates, no need for posting lists

  10. Summary • All combinations are possible • at most one sparse/clustering index • as many as desired dense indices • usually: one primary index (probably sparse) and a few secondary indices (non-clustering)

  11. >=123 >=456 block ISAM What if index is too large to search in memory? 2nd level sparse index on the values of the 1st level

  12. >=123 >=456 124; peterson; fifth ave. ISAM - observations What about insertions/deletions?

  13. ISAM - observations What about insertions/deletions? overflows 124; peterson; fifth ave. Problems?

  14. ISAM - observations • What about insertions/deletions? overflows 124; peterson; fifth ave. • overflow chains may become very long - what to do?

  15. ISAM - observations • What about insertions/deletions? overflows 124; peterson; fifth ave. • overflow chains may become very long - thus: • shut-down & reorganize • start with ~80% utilization

  16. So far • … indices (like ISAM) suffer in the presence of frequent updates • alternative indexing structure: B - trees

  17. Overview • primary / secondary indices • multilevel (ISAM) • B - trees, B+ - trees • hashing • static hashing • dynamic hashing

  18. B-trees • the most successful family of index schemes (B-trees, B+-trees, B*-trees) • Can be used for primary/secondary, clustering/non-clustering index. • balanced “n-way” search trees

  19. 6 9 <6 >9 <9 >6 3 1 7 13 B-trees Eg., B-tree of order 3:

  20. B-tree Nodes pn p1 … vn-1 v1 v2 Vn-1 <= v v1 <= v < v2 v<v1 Key values are ordered MAXIMUM: n pointer values MINIMUM: n/2 pointer values (Exception: root’s minimum = 2)

  21. Properties • “block aware” nodes: each node -> disk page • O(logB (N)) for everything! (ins/del/search) • typically, if m = 50 - 100, then 2 - 3 levels • utilization >= 50%, guaranteed; on average 69%

  22. 6 9 <6 >9 <9 >6 3 1 7 13 Queries • Algorithm for exact match query? (eg., ssn=8?)

  23. Queries • Algorithm for exact match query? (eg., ssn=8?) 6 9 <6 >9 <9 >6 3 1 7 13

  24. Queries • Algo for exact match query? (eg., ssn=8?) 6 9 <6 >9 <9 >6 3 1 7 13

  25. Queries • Algo for exact match query? (eg., ssn=8?) 6 9 <6 >9 <9 >6 3 1 7 13

  26. Queries • Algo for exact match query? (eg., ssn=8?) 6 9 <6 H steps (= disk accesses) >9 <9 >6 3 1 7 13

  27. Queries • what about range queries? (eg., 5<salary<8) • Proximity/ nearest neighbor searches? (eg., salary ~ 8 )

  28. 6 9 <6 >9 <9 >6 3 1 7 13 Queries • what about range queries? (eg., 5<salary<8) • Proximity/ nearest neighbor searches? (eg., salary ~ 8 )

  29. 6 9 <6 >9 <9 >6 3 1 7 13 Queries • what about range queries? (eg., 5<salary<8) • Proximity/ nearest neighbor searches? (eg., salary ~ 8 )

  30. B-trees: Insertion • Insert in leaf; on overflow, push middle up (recursively) • split: preserves B - tree properties

  31. 6 9 <6 >9 <9 >6 3 1 7 13 B-trees Easy case: Tree T0; insert ‘8’

  32. 6 9 <6 >9 <9 >6 3 1 7 13 B-trees Tree T0; insert ‘8’ 8

  33. 6 9 <6 >9 <9 >6 3 1 7 13 B-trees Hardest case: Tree T0; insert ‘2’ 2

  34. 13 B-trees Hardest case: Tree T0; insert ‘2’ 6 9 2 1 3 7 push middle up

  35. 2 13 1 3 B-trees Hardest case: Tree T0; insert ‘2’ Ovf; push middle 2 6 9 7

  36. 13 1 3 B-trees Hardest case: Tree T0; insert ‘2’ 6 Final state 9 2 7

  37. B-trees - insertion • Q: What if there are two middles? (eg, order 4) • A: either one is fine

  38. B-trees: Insertion • Insert in leaf; on overflow, push middle up (recursively – ‘propagate split’) • split: preserves all B - tree properties (!!) • notice how it grows: height increases when root overflows & splits • Automatic, incremental re-organization (contrast with ISAM!)

  39. Pseudo-code INSERTION OF KEY ’K’ find the correct leaf node ’L’; if ( ’L’ overflows ){ split ’L’, by pushing the middle key upstairs to parent node ’P’; if (’P’ overflows){ repeat the split recursively; } else{ add the key ’K’ in node ’L’; /* maintaining the key order in ’L’ */ }

  40. Overview • primary / secondary indices • multilevel (ISAM) • B – trees • Dfn, Search, insertion, deletion • B+ - trees • hashing

  41. Deletion Rough outline of algo: • Delete key; • on underflow, may need to merge In practice, some implementers just allow underflows to happen…

  42. 6 9 <6 >9 <9 >6 3 1 7 13 B-trees – Deletion Easiest case: Tree T0; delete ‘3’

  43. 6 9 <6 >9 <9 >6 1 7 13 B-trees – Deletion Easiest case: Tree T0; delete ‘3’

  44. B-trees – Deletion • Case1: delete a key at a leaf – no underflow • Case2: delete non-leaf key – no underflow • Case3: delete leaf-key; underflow, and ‘rich sibling’ • Case4: delete leaf-key; underflow, and ‘poor sibling’

  45. 6 9 <6 >9 <9 >6 3 1 7 13 B-trees – Deletion • Case1: delete a key at a leaf – no underflow (delete 3 from T0)

  46. 6 9 <6 >9 <9 >6 3 1 7 13 B-trees – Deletion • Case2: delete a key at a non-leaf – no underflow (eg., delete 6 from T0) Delete & promote, ie:

  47. 9 <6 >9 <9 >6 3 1 7 13 B-trees – Deletion • Case2: delete a key at a non-leaf – no underflow (eg., delete 6 from T0) Delete & promote, ie:

  48. 9 <6 >9 <9 >6 1 7 13 B-trees – Deletion • Case2: delete a key at a non-leaf – no underflow (eg., delete 6 from T0) Delete & promote, ie: 3

  49. B-trees – Deletion • Case2: delete a key at a non-leaf – no underflow (eg., delete 6 from T0) FINAL TREE 9 3 <3 >9 <9 >3 1 7 13

  50. B-trees – Deletion • Case2: delete a key at a non-leaf – no underflow (eg., delete 6 from T0) • Q: How to promote? • A: pick the largest key from the left sub-tree (or the smallest from the right sub-tree) • Observation: every deletion eventually becomes a deletion of a leaf key

More Related