
B trees

Explore the differences between B-Trees and B+ Trees, analyzing the impact of varying node degrees and the structure on search, insertions, and deletions. Understand performance implications and key design considerations.

hayesf


Presentation Transcript


  1. B trees • Nodes have more than 2 children • Each internal node has between k and 2k children and between k-1 and 2k-1 keys • A leaf has between k-1 and 2k-1 keys • The root has at least 2 children • All leaves are at the same distance from the root

  2. 2-4 tree and general k • k = 2 • Each node has 2, 3, or 4 children • Which is better: k = 2 or k >> 2? • Depth? • Large k is better • But what about degree? • Small k is better • Overall:

  3. A 4-node • Keys: 10, 30, 35 • Subtrees, left to right: key < 10 • 10 ≤ key < 30 • 30 ≤ key < 35 • 35 ≤ key
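The routing rule of the 4-node can be sketched in a few lines of Python. This is only an illustration: the function name `child_index` is made up here, and it assumes the keys of a node are kept in a sorted list.

```python
from bisect import bisect_right

def child_index(keys, key):
    """Return the index of the subtree that may contain `key`.

    `keys` is the sorted key list of one node. Subtree i covers the range
    keys[i-1] <= x < keys[i], open-ended on the far left and far right.
    """
    return bisect_right(keys, key)

node_keys = [10, 30, 35]  # the 4-node from the slide
for probe in (5, 10, 29, 30, 35, 99):
    print(probe, "-> subtree", child_index(node_keys, probe))
```

`bisect_right` sends a key equal to a separator to the right-hand subtree, which matches the "10 ≤ key < 30" style of ranges on the slide.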

  4. B vs. B+ • In a B tree items are in every node • In B+ tree items are at the leaves; internal nodes have keys to direct the search • The leaves are (possibly) also maintained in a linked list to allow fast sequential access
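To illustrate why the linked list of leaves helps sequential access, here is a minimal sketch. The `Leaf` class, `link`, and `range_scan` are hypothetical names, and the leaf contents are made-up sample data rather than a tree from the slides.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys   # sorted items stored in this leaf
        self.next = None   # right neighbour in the leaf list

def link(leaves):
    """Chain the leaves left to right and return the first one."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]

def range_scan(start_leaf, lo, hi):
    """Yield every key in [lo, hi] by walking the leaf list only."""
    leaf = start_leaf
    while leaf is not None:
        for key in leaf.keys:
            if key > hi:
                return          # keys are sorted, nothing further can match
            if key >= lo:
                yield key
        leaf = leaf.next

first = link([Leaf([1, 3]), Leaf([5, 7]), Leaf([9]), Leaf([16, 17])])
print(list(range_scan(first, 5, 16)))
```

In a full B+ tree one would first descend from the root to the leaf where `lo` belongs and start the scan there; the sketch starts at the first leaf to stay short.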

  5. A 2-4+ tree 10 4 7 9 15 30 1 3 30 40 50 10 5 7 9 16 17

  6. The height • The root has at least 2 children • At level 2 we have at least $2k$ nodes • At level 3 we have at least $2k^2$ nodes • At level h we have at least $2k^{h-1}$ nodes
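The per-level counts turn into a height bound with one more step. As an assumption for this derivation, let $n$ denote the number of nodes on the deepest level $h$ (the leaves), with the root at level 0.

```latex
n \;\ge\; 2k^{h-1}
\quad\Longrightarrow\quad
k^{h-1} \;\le\; \frac{n}{2}
\quad\Longrightarrow\quad
h \;\le\; 1 + \log_k \frac{n}{2}
```

So the height grows like $\log_k n$, which is why a larger $k$ gives a shallower tree.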

  7. Red-Black Trees • n = $2^{30} \approx 10^9$ • 30 ≤ height ≤ 60 • When the red-black tree resides on a disk, up to 60 disk accesses are made for a search • A disk access takes about 5 milliseconds ($5 \times 10^{-3}$ sec) • A memory access takes about 100 nanoseconds ($10^{-7}$ sec)

  8. B-trees • B-trees are used when the tree resides in secondary storage. • k is picked according to the size of a disk block • Since the height is smaller we do less I/O, we get more in each single access
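A rough back-of-the-envelope comparison, using the figures quoted for the red-black tree (up to 60 accesses, about 5 ms each). This is a sketch under stated assumptions: `max_levels` solves the bound "at least 2k^(h-1) nodes at level h" for h, treating n as the number of nodes on the deepest level, and the values k = 128 and k = 1024 are hypothetical block-size choices, not taken from the slides.

```python
DISK_ACCESS_MS = 5  # per-access cost quoted for the disk

def max_levels(n, k):
    """Largest h such that 2 * k**(h-1) <= n (requires n >= 2)."""
    h = 1
    while 2 * k ** h <= n:   # would level h+1 still reach its minimum size?
        h += 1
    return h

def search_cost_ms(nodes_on_path):
    """Disk time if every node on the search path costs one access."""
    return nodes_on_path * DISK_ACCESS_MS

n = 2 ** 30
print("red-black, worst case:", search_cost_ms(60), "ms")
for k in (2, 128, 1024):
    h = max_levels(n, k)
    # levels 0..h give h + 1 nodes on a root-to-leaf path
    print(f"k={k}: at most {h + 1} nodes on a path,", search_cost_ms(h + 1), "ms")
```

The numbers are only indicative, but they show the trend: the larger the node, the fewer disk accesses per search.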

  9. B-Trees • Large degree B-trees are used to represent very large dictionaries that reside on disk. • Smaller degree B-trees used for internal-memory dictionaries to overcome cache-miss penalties.

  10. Node’s structure • Layout: j, $a_0\ p_1\ a_1\ p_2\ a_2 \dots p_j\ a_j$, where j is the number of keys and k-1 ≤ j ≤ 2k-1 • $a_i$ is a pointer to a subtree • $p_i$ is a key • Can search each node linearly: total time ≈ $k \cdot h \approx k \log_k n$ • Can maintain a little red-black tree or an array in each node, so search takes ≈ $\log_2 k \cdot h \approx \log_2 n$
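The array variant can be sketched as follows: each node keeps its keys in a sorted Python list and uses binary search to pick the child. The `Node` class and `search` function are illustrative names, the tree is built by hand as sample data, and the sketch assumes the B+ convention in which items live only in the leaves.

```python
from bisect import bisect_left, bisect_right

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                # p_1..p_j, sorted
        self.children = children or []  # a_0..a_j; empty for a leaf

def search(root, key):
    """Return (found, nodes_visited) for a B+ style lookup."""
    node, visited = root, 1
    while node.children:                                # internal node
        node = node.children[bisect_right(node.keys, key)]
        visited += 1
    i = bisect_left(node.keys, key)                     # binary search in the leaf
    return i < len(node.keys) and node.keys[i] == key, visited

# Hand-built sample tree (k = 2): a root, two internal nodes, six leaves.
left = Node([5, 9], [Node([1, 3]), Node([5]), Node([9])])
right = Node([15, 30], [Node([14]), Node([16, 17]), Node([30, 40, 50])])
root = Node([14], [left, right])

print(search(root, 16))
print(search(root, 4))
```

Each node costs about log2(k) comparisons instead of k, and the number of nodes visited is the height of the tree plus one.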

  11. Insert 14 5 9 15 30 1 3 30 40 50 14 5 9 16 17 Insert(2,T).

  12. Insert 14 5 9 15 30 1 2 3 30 40 50 14 5 9 16 17 Insert(2,T).

  13. Insert 14 5 9 15 30 1 2 3 30 40 50 14 5 9 16 17 Insert(4,T).

  14. Insert 14 5 9 15 30 1 2 3 4 30 40 50 14 5 9 16 17 Insert(4,T).

  15. Split 14 5 9 15 30 1 2 3 4 30 40 50 14 5 9 16 17 Insert(4,T).

  16. Split 14 5 9 15 30 30 40 50 14 1 2 3 4 5 9 16 17 Insert(4,T).

  17. Split 14 3 5 9 15 30 30 40 50 14 1 2 3 4 5 9 16 17 Insert(4,T).

  18. 14 3 5 9 15 30 30 40 50 14 1 2 3 4 5 9 16 17 Insert(6,T).

  19. 14 3 5 9 15 30 30 40 50 14 1 2 3 4 5 6 9 16 17 Insert(6,T).

  20. 14 3 5 9 15 30 30 40 50 14 1 2 3 4 5 6 9 16 17 Insert(7,T).

  21. 14 3 5 9 15 30 30 40 50 14 1 2 3 4 9 16 17 5 6 7 Insert(7,T).

  22. 14 3 5 9 15 30 30 40 50 14 1 2 3 4 9 16 17 5 6 7 Insert(8,T).

  23. 14 3 5 9 15 30 30 40 50 14 1 2 3 4 9 16 17 5 6 7 8 Insert(8,T).

  24. Split 14 3 5 9 15 30 30 40 50 14 1 2 3 4 7 8 9 16 17 5 6 Insert(8,T).

  25. Split 14 3 5 7 9 15 30 30 40 50 14 1 2 3 4 7 8 9 16 17 5 6 Insert(8,T).

  26. Split 14 3 5 7 9 15 30 30 40 50 14 1 2 3 4 7 8 9 16 17 5 6 Insert(8,T).

  27. Split 5 14 3 7 9 15 30 30 40 50 14 1 2 3 4 7 8 9 16 17 5 6 Insert(8,T).

  28. Insert -- definition • Add the new key in its position, say in a node v. • (*) If v has 4 keys, split v into a node u with 2 keys, a node w with 1 key, and a separating key that moves up to the parent (or two nodes with 2 keys each and a separating key if v is a leaf). • If v was the root, create a new root r as the parent of u and w, and stop. • Otherwise, replace v by u and w as children of p(v), and repeat (*) for v := p(v).
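The insert procedure can be sketched recursively. This is one possible reading, not the presenter's code: `Node`, `insert`, and `leaf_keys` are made-up names, a node overflows when it reaches 2k keys, a leaf splits into two halves with the first key of the right half copied up, and an internal node splits into k-1 keys, one key moved up, and k keys. Which half ends up smaller is a choice made here. Duplicate keys are not handled specially.

```python
from bisect import bisect_right, insort

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # an empty list marks a leaf

def _insert(node, key, k):
    """Insert below `node`; return (separator, new_right_node) if it split."""
    if not node.children:                        # leaf: add key in its position
        insort(node.keys, key)
    else:
        i = bisect_right(node.keys, key)
        promoted = _insert(node.children[i], key, k)
        if promoted is None:
            return None
        sep, right = promoted                    # child split: adopt the new node
        node.keys.insert(i, sep)
        node.children.insert(i + 1, right)
    if len(node.keys) < 2 * k:                   # at most 2k-1 keys: no overflow
        return None
    if not node.children:                        # leaf: k + k keys, separator copied up
        right = Node(node.keys[k:])
        node.keys = node.keys[:k]
        return right.keys[0], right
    sep = node.keys[k - 1]                       # internal: k-1 keys | sep | k keys
    right = Node(node.keys[k:], node.children[k:])
    node.keys, node.children = node.keys[:k - 1], node.children[:k]
    return sep, right

def insert(root, key, k=2):
    """Insert `key` and return the (possibly new) root."""
    promoted = _insert(root, key, k)
    if promoted is None:
        return root
    sep, right = promoted
    return Node([sep], [root, right])            # old root split: grow by one level

def leaf_keys(node):
    """All items, left to right."""
    if not node.children:
        return list(node.keys)
    return [x for child in node.children for x in leaf_keys(child)]

root = Node([])
for x in range(1, 11):
    root = insert(root, x)
print(root.keys, [c.keys for c in root.children], leaf_keys(root))
```

The recursion plays the role of "repeat (*) for v := p(v)": each call reports a split to its caller, which is the parent.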

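  29. Split • Before, a node with 2k keys: (2k) $a_0\ p_1\ a_1\ p_2\ a_2 \dots p_{2k}\ a_{2k}$ • After, left node: (k-1) $a_0\ p_1\ a_1\ p_2\ a_2 \dots p_{k-1}\ a_{k-1}$ • After, right node: (k) $a_k\ p_{k+1}\ a_{k+1} \dots p_{2k}\ a_{2k}$ • $p_k$ is inserted in the parent.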

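  30. Split • Before, a node with 2k keys: (2k) $a_0\ p_1\ a_1\ p_2\ a_2 \dots p_{2k}\ a_{2k}$ • After, left node: (k-1) $a_0\ p_1\ a_1\ p_2\ a_2 \dots p_{k-1}\ a_{k-1}$ • After, right node: (k) $a_k\ p_{k+1}\ a_{k+1} \dots p_{2k}\ a_{2k}$ • $p_k$ is inserted in the parent. • Takes O(k) time.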
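The split of an overfull internal node maps directly onto list slicing. A minimal sketch, assuming keys and child pointers are held in Python lists (the function name `split` and the string placeholders for subtrees are illustrative):

```python
def split(keys, children, k):
    """Split a node holding 2k keys p_1..p_2k and 2k+1 children a_0..a_2k.

    Returns (left, up, right): left keeps k-1 keys and k children,
    `up` is p_k for the parent, right keeps k keys and k+1 children.
    """
    assert len(keys) == 2 * k and len(children) == 2 * k + 1
    left = (keys[:k - 1], children[:k])   # p_1..p_{k-1}, a_0..a_{k-1}
    up = keys[k - 1]                      # p_k (lists are 0-indexed)
    right = (keys[k:], children[k:])      # p_{k+1}..p_2k, a_k..a_2k
    return left, up, right

left, up, right = split([3, 5, 7, 9], ["a0", "a1", "a2", "a3", "a4"], k=2)
print(left, up, right)
```

Every key and pointer is copied once, which is where the O(k) cost per split comes from.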

  31. Insert (summary) • O(log n) time and at most O($\log_k n$) splits; each split takes O(k) time • Can show that the amortized number of splits is O(1) per insert

  32. Delete 5 14 3 7 9 15 30 30 40 50 14 1 2 3 4 7 8 9 16 17 5 6 delete(14,T).

  33. Delete 5 14 3 7 9 30 30 40 50 1 2 3 4 7 8 9 16 17 5 6 delete(14,T).

  34. Delete 5 14 3 7 9 30 30 40 50 1 2 3 4 7 8 9 16 17 5 6 delete(17,T).

  35. Delete 5 14 3 7 9 30 30 40 50 1 2 3 4 7 8 9 16 5 6 delete(17,T).

  36. Delete 5 14 3 7 9 30 30 40 50 1 2 3 4 7 8 9 16 5 6 delete(16,T).

  37. Delete 5 14 3 7 9 30 30 40 50 1 2 3 4 7 8 9 5 6 delete(16,T).

  38. Borrow 5 14 3 7 9 30 30 40 50 1 2 3 4 7 8 9 5 6 delete(16,T).

  39. Borrow 5 9 7 3 30 30 40 50 1 2 3 4 7 8 9 5 6 delete(16,T).

  40. 5 9 7 3 30 30 40 50 1 2 3 4 7 8 9 5 6 delete(9,T).

  41. 5 9 7 3 30 30 40 50 1 2 3 4 7 8 5 6 delete(9,T).

  42. Fusion 5 9 3 7 30 30 40 50 1 2 3 4 7 8 5 6 delete(9,T).

  43. Fusion 5 3 7 30 30 40 50 1 2 3 4 7 8 5 6 delete(9,T).

  44. Delete -- definition • Remove the key. If it was the only key in its node, remove the node and let v be the parent that loses a child; otherwise return. • (*) If v has one child and v is the root, discard v. • Otherwise (v is not the root), if v has a sibling w of degree 3 or 4, borrow a child from w to v and terminate. • Otherwise, fuse v with its sibling into a degree-3 node and repeat (*) with the parent of v.
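The delete rule for the 2-4 case can be sketched as below. Several things here are assumptions rather than the presenter's method: the names `Node`, `delete`, `refresh`, and `min_key` are made up, only adjacent siblings are considered for borrowing, and separator keys are simply recomputed as the smallest key of each subtree after every change (so they can differ from the keys drawn on the slides, which may keep an old separator). The starting tree is hand-built from one reading of the flattened figure on the first Delete slide.

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # an empty list marks a leaf

def min_key(node):
    while node.children:
        node = node.children[0]
    return node.keys[0]

def refresh(node):
    """Recompute separators of an internal node (simplifying assumption)."""
    if node.children:
        node.keys = [min_key(c) for c in node.children[1:]]

def _delete(node, key):
    if not node.children:                         # leaf: remove the key
        if key in node.keys:
            node.keys.remove(key)
        return
    i = bisect_right(node.keys, key)
    v = node.children[i]
    _delete(v, key)
    if not v.children and not v.keys:             # leaf lost its only key
        node.children.pop(i)
    elif len(v.children) == 1:                    # step (*): v has one child
        sibs = [j for j in (i - 1, i + 1) if 0 <= j < len(node.children)]
        j = next((s for s in sibs if len(node.children[s].children) >= 3), sibs[0])
        w = node.children[j]
        if len(w.children) >= 3:                  # borrow a child from w
            if j < i:
                v.children.insert(0, w.children.pop())
            else:
                v.children.append(w.children.pop(0))
            refresh(w)
        else:                                     # fuse v and w into a degree-3 node
            left, right = (w, v) if j < i else (v, w)
            left.children += right.children
            node.children.remove(right)
            v = left
        refresh(v)
    refresh(node)

def delete(root, key):
    """Delete `key` and return the (possibly new) root."""
    _delete(root, key)
    if len(root.children) == 1:                   # root with one child: discard it
        return root.children[0]
    return root

a = Node([3], [Node([1, 2]), Node([3, 4])])
b = Node([7, 9], [Node([5, 6]), Node([7, 8]), Node([9])])
c = Node([15, 30], [Node([14]), Node([16, 17]), Node([30, 40, 50])])
root = Node([5, 14], [a, b, c])
for x in (14, 17, 16, 9):                         # the delete sequence on the slides
    root = delete(root, x)
print(root.keys, [n.keys for n in root.children])
```

In this replay, deleting 16 triggers a borrow and deleting 9 triggers a fusion, mirroring the Borrow and Fusion slides. The recursion handles "repeat (*) with the parent of v", since each call checks its child after the deletion below it returns.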
