1 / 23

Principles of Database Management Systems 4.1: B-Trees

Principles of Database Management Systems 4.1: B-Trees. Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina, Jeff Ullman and Jennifer Widom). B-Trees a commonly used index structure nonsequential, “balanced” (acces paths to different records of equal length)

alayna
Download Presentation

Principles of Database Management Systems 4.1: B-Trees

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Principles of Database Management Systems4.1: B-Trees Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina, Jeff Ullman and Jennifer Widom) Notes 4.1: B-Trees

  2. B-Trees • a commonly used index structure • nonsequential, “balanced” (acces paths to different records of equal length) • adapts well to insertions & deletions • consists of blocks holding at most n keys and n+1 pointers, and at least half of this • (We consider a variation actually called a B+ tree) Notes 4.1: B-Trees

  3. B+Tree Example n=3 100 Root 120 150 180 30 3 5 11 120 130 180 200 100 101 110 150 156 179 30 35 Notes 4.1: B-Trees

  4. Sample non-leaf 120 150 180 to keys to keys to keys to keys < 120 120 k<150 150k<180 180 Notes 4.1: B-Trees

  5. Sample leaf node: From non-leaf node to next leaf in sequence 120 130 unused To record with key 120 To record with key 130 Notes 4.1: B-Trees

  6. Don’t want nodes to be too empty • Number of pointers in use: • at internal nodes at least (n+1)/2 (to child nodes) • at leaves at least (n+1)/2 (to data records/blocks) Notes 4.1: B-Trees

  7. n=3 Full node min. node Non-leaf Leaf 120 150 180 30 3 5 11 30 35 Notes 4.1: B-Trees

  8. B+tree rules (1) All leaves at the same lowest level (balanced tree) (2) Pointers in leaves point to records except for “sequence pointer” Notes 4.1: B-Trees

  9. (3) Number of pointers/keys for B+tree Max Max Min Min ptrs keys ptrsdata keys Non-leaf (non-root) n+1 n (n+1)/2 (n+1)/2- 1 Leaf (non-root) n+1 n (n+1)/2 (n+1)/2 Root n+1 n 2(*) 1 (*) 1, if only one record in the file Notes 4.1: B-Trees

  10. Insert into B+tree First lookup the proper leaf; (a) simple case • leaf not full: just insert (key, pointer-to-record) (b) leaf overflow (c) non-leaf overflow (d) new root Notes 4.1: B-Trees

  11. 32 n=3 100 (a) Insert key = 32 30 3 5 11 30 31 Notes 4.1: B-Trees

  12. 7 3 5 7 n=3 100 (b) Insert key = 7 30 3 5 11 30 31 Notes 4.1: B-Trees

  13. 160 180 160 179 n=3 100 (c) Insert key = 160 120 150 180 180 200 150 156 179 Notes 4.1: B-Trees

  14. 30 new root 40 40 45 n=3 Height grows at root => balance maintained (d) New root, insert 45 10 20 30 1 2 3 10 12 20 25 30 32 40 Notes 4.1: B-Trees

  15. Deletion from B+tree Again, first lookup the proper leaf; (a): Simple case: no underflow; Otherwise ... (b): Borrow keys from an adjacent sibling (if it doesn't become too empty); Else ... (c): Coalesce with a sibling node -> (d): Cases (a), (b) or (c) at non-leaf Notes 4.1: B-Trees

  16. 35 35 => min # of keys in a leaf = 5/2 = 2 n=4 (b) Borrow keys • Delete 50 10 40 100 10 20 30 35 40 50 Notes 4.1: B-Trees

  17. 40 n=4 (c) Coalesce with a sibling • Delete 50 20 40 100 20 30 40 50 Notes 4.1: B-Trees

  18. new root 40 25 30 => min # of keys in a non-leaf = (n+1)/2 - 1=3-1= 2 (d) Non-leaf coalesce • Delete 37 n=4 25 10 20 30 40 25 26 30 37 1 3 10 14 20 22 40 45 Notes 4.1: B-Trees

  19. B+tree deletions in practice • Often, coalescing is not implemented • Too hard and not worth it! • later insertions may return the node back to its required minimum size • Compromise: Try redistributing keys with a sibling;If not possible, leave it there • if all accesses to the records go through the B-tree, can place a "tombstone" for the deleted record at the leaf Notes 4.1: B-Trees

  20. Why B-trees Are Good? • B-tree adapts well to insertions and deletions, maintaining balance • DBA does not need to care about reorganizing • split/merge operations rather rare (How often would nodes with 200 keys be split?) -> Access times dominated by key-lookup (i.e., traversal from root to a leaf) Notes 4.1: B-Trees

  21. Efficiency of B-trees • For example, assume 4 KB blocks, 4 byte keys and 8 byte pointers • How many keys and pointers fit in a node (= index block)? • Max n s.t. (4*n + 8*(n+1)) B  4096 B ?-> n=340; 340 keys and 341 pointers fit in a node-> 171 … 341 pointers in a non-leaf node Notes 4.1: B-Trees

  22. Efficiency of B-trees (cont.) • Assume an average node has 255 pointers-> a three-level B-tree has 2552 = 65025 leaves with total of 2553 or about 16.6 million pointers to records-> if root block kept in main memory,each record can be accessed with 2+1 disk I/Os; If all 256 internal nodes are in main memory, record access requires 1+1 disk I/Os (256 x 4 KB = 1 MB; quite feasible!) Notes 4.1: B-Trees

  23. Outline/summary • B trees • popular index structures with graceful growth properties • support range queries like “… WHERE 100 < Key < 200” (-> Exercises) • Next: Hashing schemes Notes 4.1: B-Trees

More Related