
B-trees




  1. B-trees
     Eduardo Laber, David Sotelo

  2. What are B-trees?
     • Balanced search trees designed for secondary storage devices
     • Similar to AVL trees, but better at minimizing disk I/O operations
     • Main data structure used by DBMSs to store and retrieve information

  3. What are B-trees?
     • Nodes may have many children (from a few to thousands)
     • The branching factor can be quite large
     • Every B-tree with n keys has height O(log n)
     • In practice, its height is smaller than the height of an AVL tree

  4. B-trees and Branching Factor

  5. Definition of B-trees
     A B-tree is a rooted tree with the following five properties:
     • Every node x has the following attributes:
       • x.n, the number of keys stored in node x
       • the x.n keys, in nondecreasing order: x.key(1) ≤ x.key(2) ≤ ... ≤ x.key(x.n)
       • the boolean x.leaf, indicating whether x is a leaf or an internal node

  6. Definition of B-trees
     • If x is an internal node, it contains x.n + 1 pointers x.p(1), x.p(2), ..., x.p(x.n + 1) to its children
     • The keys x.key(i) separate the ranges of keys stored in the subtrees: every key in subtree x.p(i) is ≤ x.key(i), which is ≤ every key in subtree x.p(i + 1)
     • All leaves have the same depth, which equals the tree's height

  7. Definition of B-trees
     • Bounds on the number of keys in a node:
       • Let B be a positive integer representing the order of the B-tree.
       • Every node (except the root) must have at least B keys.
       • Every node (except the root) must have at most 2B keys.
       • The root is free to contain between 1 and 2B keys (why?)
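The properties on slides 5-7 can be checked mechanically. Below is a minimal Python validator, sketched under a few assumptions not made on the slides: keys are distinct, the tree is non-empty, B ≥ 1, and nodes are in-memory objects of an illustrative Node class with keys and children lists. It tests the key-count bounds, the ordering of keys across subtrees, the x.n + 1 children rule, and equal leaf depths.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Node:
    keys: List[Any]                                        # x.key(1) .. x.key(x.n), sorted
    children: List["Node"] = field(default_factory=list)   # x.p(1) .. x.p(x.n + 1); empty for a leaf

    @property
    def n(self) -> int:
        return len(self.keys)

    @property
    def leaf(self) -> bool:
        return not self.children


def is_valid_btree(root: Node, B: int) -> bool:
    """Check the B-tree properties of order B (assumes distinct keys, non-empty tree)."""
    leaf_depths = set()

    def check(x: Node, depth: int, low: Optional[Any], high: Optional[Any]) -> bool:
        # Key-count bounds: the root may hold 1..2B keys, every other node B..2B.
        min_keys = 1 if depth == 0 else B
        if not min_keys <= x.n <= 2 * B:
            return False
        # Keys must be strictly increasing and lie strictly between the separators above.
        if any(a >= b for a, b in zip(x.keys, x.keys[1:])):
            return False
        if low is not None and x.keys[0] <= low:
            return False
        if high is not None and x.keys[-1] >= high:
            return False
        if x.leaf:
            leaf_depths.add(depth)
            return True
        # An internal node with x.n keys has exactly x.n + 1 children.
        if len(x.children) != x.n + 1:
            return False
        bounds = [low] + x.keys + [high]
        return all(check(c, depth + 1, bounds[i], bounds[i + 1])
                   for i, c in enumerate(x.children))

    # All leaves must lie at the same depth.
    return check(root, 0, None, None) and len(leaf_depths) == 1


good = Node([3, 6], [Node([1, 2]), Node([4, 5]), Node([7, 8])])
print(is_valid_btree(good, 2))
```

A node with five keys under B = 2, or leaves at two different depths, makes the function return False.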

  8. Example of a B-tree

  9. Exercise 1
     • Enumerate all valid B-trees of order 2 that represent the set {1, 2, ..., 8}

  10. Exercise 1
     Solution (root → leaves):
     • [4] → [1 2 3] [5 6 7 8]
     • [5] → [1 2 3 4] [6 7 8]
     • [3 6] → [1 2] [4 5] [7 8]
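Small instances of Exercise 1 can be checked by brute force. This hypothetical enumerator (its tuple representation and function names are assumptions of the sketch) splits a sorted key sequence into children and separator keys in every way the order-B bounds allow, for every possible height. For {1, ..., 8} and B = 2 it finds exactly three trees.

```python
from typing import Iterator, List, Sequence

# Representation (an assumption of this sketch): a leaf is a tuple of keys, and an
# internal node is a pair (separator_keys, child_subtrees).


def trees_of_height(keys: Sequence[int], h: int, B: int, is_root: bool) -> Iterator[tuple]:
    """Yield every B-tree of order B and height h that stores exactly `keys` (sorted)."""
    lo, hi = (1 if is_root else B), 2 * B      # allowed number of keys in this node
    if h == 0:
        if lo <= len(keys) <= hi:
            yield tuple(keys)
        return

    def split(rest: Sequence[int], seps: int) -> Iterator[tuple]:
        # The next child stores the prefix rest[:j]; it is either the last child
        # (j == len(rest)) or it is followed by the separator key rest[j].
        for j in range(len(rest) + 1):
            for child in trees_of_height(rest[:j], h - 1, B, False):
                if j == len(rest):
                    if lo <= seps <= hi:
                        yield (child,), ()
                elif seps < hi:
                    for kids, more in split(rest[j + 1:], seps + 1):
                        yield (child,) + kids, (rest[j],) + more

    for kids, separators in split(keys, 0):
        yield separators, kids


def all_btrees(keys: Sequence[int], B: int) -> List[tuple]:
    # A B-tree on n keys has height below n, so trying h = 0 .. n - 1 is enough.
    return [t for h in range(len(keys)) for t in trees_of_height(keys, h, B, True)]


for tree in all_btrees(list(range(1, 9)), 2):
    print(tree)
```

The search is exponential in general, so it is only meant for tiny exercises like this one.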

  11. The height of a B-tree
     Theorem: Let h be the height of a B-tree with n keys and order B > 1. Then $h \le \log_B \frac{n+1}{2}$.
     Proof:
     • The root contains at least one key.
     • All other nodes contain at least B keys.
     • At least 1 key at depth 0
     • At least $2B$ keys at depth 1
     • At least $2B^2 + B$ keys at depth 2
     • At least $2B^i + B^{i-1} + B^{i-2} + \dots + B$ keys at depth i

  12. Proof (continued)
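The continuation of this proof is not preserved in the transcript. One way to complete it, using only the per-depth counts listed on slide 11 (each of which is at least $2B^i$ for $i \ge 1$), is sketched below; it actually gives the slightly stronger bound $B^h \le (n-1)/2$ when $h \ge 1$.

```latex
\[
\begin{aligned}
&\text{If } h = 0:\quad n \ge 1 \;\Rightarrow\; \tfrac{n+1}{2} \ge 1
  \;\Rightarrow\; h = 0 \le \log_B \tfrac{n+1}{2}.\\[4pt]
&\text{If } h \ge 1:\quad n \;\ge\; \underbrace{1}_{\text{root}}
  \;+\; \underbrace{2B^{h}}_{\text{at most the keys at depth } h}
  \;=\; 1 + 2B^{h},\\
&\qquad\text{so } B^{h} \le \tfrac{n-1}{2} < \tfrac{n+1}{2}
  \quad\text{and, since } B > 1,\quad h \le \log_B \tfrac{n+1}{2}.
\end{aligned}
\]
```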

  13. Searching a B-tree
     • Similar to searching a binary search tree.
     • Multiway branching decision according to the number of the node's children.
     • Recursive procedure with time complexity $O(B \log_B n)$ for a tree of n keys and order B.

  14. Searching a B-tree
     B-TREE-SEARCH (x, k)
       i = 1
       while i ≤ x.n and k > x.key(i) do i = i + 1
       if i ≤ x.n and k == x.key(i) then return (x, i)
       if x.leaf then return NIL
       else
         DISK-READ(x.p(i))
         return B-TREE-SEARCH (x.p(i), k)
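A minimal Python translation of B-TREE-SEARCH, assuming an in-memory Node class with keys and children lists (an illustrative choice; there is no DISK-READ) and 0-based indices instead of the slides' 1-based ones. The tail recursion is written as a loop, and None plays the role of NIL.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class Node:
    keys: List[Any]                                        # x.key(1) .. x.key(x.n), sorted
    children: List["Node"] = field(default_factory=list)   # x.p(1) .. x.p(x.n + 1); empty for a leaf


def btree_search(x: Node, k: Any) -> Optional[Tuple[Node, int]]:
    """Return (node, index) of key k in the subtree rooted at x, or None (NIL)."""
    while True:
        i = 0
        # Linear scan: skip the keys smaller than k (0-based indices).
        while i < len(x.keys) and k > x.keys[i]:
            i += 1
        if i < len(x.keys) and k == x.keys[i]:
            return x, i
        if not x.children:          # x.leaf
            return None
        # A disk-based tree would call DISK-READ(x.p(i)) here.
        x = x.children[i]


# The tree of slide 44 (B = 2).
root = Node(["N"], [
    Node(["E", "J"], [Node(["A", "C", "D"]), Node(["F", "G"]), Node(["K", "M"])]),
    Node(["T", "X"], [Node(["O", "Q"]), Node(["U", "W"]), Node(["Y", "Z"])]),
])
node, i = btree_search(root, "F")
print(node.keys, i)                 # ['F', 'G'] 0
print(btree_search(root, "B"))      # None
```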

  15-17. Searching a B-tree
     • Search for the key F
     [Figure: example B-tree storing the keys A through U]

  18. Searching a B-tree
     • Search for the key N
     [Figure: the same B-tree storing the keys A through U]

  19. Searching a B-tree
     Lemma: The time complexity of procedure B-TREE-SEARCH is $O(B \log_B n)$.
     Proof:
     • The number of recursive calls is at most the tree's height plus one.
     • The height of a B-tree is $O(\log_B n)$.
     • Each call performs at most 2B iterations of the while loop, since a node holds at most 2B keys.
     • Total of $O(B \log_B n)$ steps. ■

  20. Exercise 2
     • Suppose that B-TREE-SEARCH is implemented to use binary search rather than linear search within each node.
     • Show that this change makes the time complexity O(lg n), independently of how B might be chosen as a function of n.

  21. Exercise 2
     Solution:
     • By using binary search, the number of steps of the algorithm becomes $O(\lg B \cdot \log_B n)$.
     • Observe that $\log_B n = \lg n / \lg B$.
     • Therefore $O(\lg B \cdot \log_B n) = O(\lg n)$.
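The change in Exercise 2 amounts to replacing the linear scan by a binary search inside each node. In Python, bisect.bisect_left returns the same index i at which the linear scan stops (the first position whose key is ≥ k) using O(lg B) comparisons. The Node class is an illustrative in-memory assumption.

```python
import bisect
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class Node:
    keys: List[Any]                                        # sorted keys
    children: List["Node"] = field(default_factory=list)   # empty for a leaf


def btree_search_binary(x: Node, k: Any) -> Optional[Tuple[Node, int]]:
    """B-TREE-SEARCH with a binary search inside each node: O(lg B) per node."""
    while True:
        # First index whose key is >= k: the same i the linear scan stops at.
        i = bisect.bisect_left(x.keys, k)
        if i < len(x.keys) and x.keys[i] == k:
            return x, i
        if not x.children:
            return None
        x = x.children[i]


root = Node([3, 6], [Node([1, 2]), Node([4, 5]), Node([7, 8])])
print(btree_search_binary(root, 5)[1], btree_search_binary(root, 9))   # 1 None
```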

  22. Linear or binary B-tree search?
     Lemma: If 1 < B < n, then $\lg n \le B \log_B n$.
     Proof:
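The proof itself is not preserved in the transcript. A short argument: B > 1 gives lg B > 0, B < n gives lg n > 0, and B > lg B holds for every integer B ≥ 1 (equivalently, 2^B > B), so B / lg B > 1.

```latex
\[
B \log_B n \;=\; B \cdot \frac{\lg n}{\lg B} \;=\; \frac{B}{\lg B}\,\lg n \;\ge\; \lg n .
\]
```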

  23. Inserting a key into a B-tree
     • The new key is always inserted into an existing leaf node (why?)
     • First, we search for the leaf position at which to insert the new key.
     • If that node is already full, we split it.
     • A split operation splits an overfull node (2B + 1 keys, counting the new one) around its median key into two nodes with B keys each.
     • The median key moves up into the split node's parent (recursive insertion call).

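  24. Split operation
     • Inserting key F into a full node (B = 2)
     [Figure: parent [J]; children [A C E G] and [K M O Q]]

  25. Split operation
     • Node found but already full
     [Figure: parent [J]; children [A C E F G] and [K M O Q]]

  26. Split operation
     • Median key identified
     [Figure: parent [J]; children [A C E F G] (median E) and [K M O Q]]

  27. Split operation
     • Splitting the node
     [Figure: parent [E J]; children [A C], [F G] and [K M O Q]]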
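The split in slides 24-27 can be sketched in Python with sorted lists of keys. This assumes the overfull node holds exactly 2B + 1 keys, takes keys[B] as the median, and uses bisect.insort to place keys; split_keys is a hypothetical helper, not a procedure from the slides.

```python
import bisect
from typing import Any, List, Tuple


def split_keys(keys: List[Any], B: int) -> Tuple[List[Any], Any, List[Any]]:
    """Split an overfull node (2B + 1 sorted keys) around its median key."""
    if len(keys) != 2 * B + 1:
        raise ValueError("only an overfull node with 2B + 1 keys is split")
    return keys[:B], keys[B], keys[B + 1:]


# Slides 24-27 (B = 2): insert F into the full leaf [A C E G] below parent [J].
leaf = ["A", "C", "E", "G"]
bisect.insort(leaf, "F")                 # ['A', 'C', 'E', 'F', 'G']: overfull
left, median, right = split_keys(leaf, 2)
parent = ["J"]
bisect.insort(parent, median)            # the median moves up into the parent
print(parent, left, right)               # ['E', 'J'] ['A', 'C'] ['F', 'G']
```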

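  28. Inserting a key into a B-tree
     • Insertion can be propagated upward (B = 2)
     [Figure: root [E J T X]; leaves [A C] [F G] [K M O Q] [U W] [Y Z]]

  29. Inserting a key into a B-tree
     • Insertion can be propagated upward (B = 2): inserting key N
     [Figure: root [E J T X]; leaves [A C] [F G] [K M N O Q] [U W] [Y Z]]

  30. Inserting a key into a B-tree
     • Insertion can be propagated upward (B = 2)
     [Figure: the leaf is split around N; the root becomes [E J N T X] and must be split (SPLIT); leaves [A C] [F G] [K M] [O Q] [U W] [Y Z]]

  31. Inserting a key into a B-tree
     • Insertion can be propagated upward (B = 2)
     [Figure: the root is split (SPLIT); new root [N]; children [E J] and [T X]; leaves [A C] [F G] [K M] [O Q] [U W] [Y Z]]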

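  32. Inserting a key into a B-tree
     B-TREE-INSERT (x, k, y)
       i = 1
       while i ≤ x.n and k > x.key(i) do i = i + 1
       for j = x.n + 1 downto i + 1 do     // shift the greater keys one slot to the right
         x.key(j) = x.key(j-1)
       end-for
       for j = x.n + 2 downto i + 2 do     // shift the pointers to their right as well
         x.p(j) = x.p(j-1)
       end-for
       x.key(i) = k
       x.p(i+1) = y                        // y holds the keys greater than k
       x.n = x.n + 1
       DISK-WRITE(x)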

  33. Inserting a key into a B-tree
     B-TREE-INSERT (x, k, y), continued
       if x.n > 2*B then
         [m, z] = SPLIT (x)
         if x.parent != NIL then
           DISK-READ (x.parent)
         else
           x.parent = ALLOCATE-NODE()      // new root: no keys, x.parent.leaf = FALSE
           x.parent.p(1) = x               // x becomes the new root's first child
           DISK-WRITE (x)
           root = x.parent
         end-if
         z.parent = x.parent
         B-TREE-INSERT (x.parent, m, z)
       end-if

  34. Inserting a key into a B-tree
     SPLIT (x)
       z = ALLOCATE-NODE()
       m = FIND-MEDIAN (x)
       COPY-GREATER-ELEMENTS (x, m, z)
       DISK-WRITE (z)
       COPY-SMALLER-ELEMENTS (x, m, x)
       DISK-WRITE (x)
       return [m, z]

  35. Inserting a key into a B-tree
     • Function B-TREE-INSERT has three arguments:
       • the node x into which an element with key k should be inserted;
       • the key k to be inserted;
       • a pointer y to the right child of k (the node holding the keys greater than k, or NIL when x is a leaf), used as one of the pointers of x during the insertion process.
     • There is a global variable named root, which points to the root of the B-tree.
     • Observe that the field x.parent was not defined as an original B-tree attribute; it is used only to simplify the process.
     • The field x.leaf should also be updated accordingly.
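The procedures on slides 32-34 can be combined into a runnable in-memory sketch. It keeps the slides' bottom-up strategy (insert into the leaf, split on overflow, push the median into x.parent), but it makes assumptions of its own: Python lists replace the explicit shifting loops, the descent uses binary search (bisect) instead of a linear scan, there are no DISK-READ/WRITE calls, an empty tree is a single empty leaf, and keys are distinct. The class and method names are illustrative.

```python
import bisect
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass(eq=False)
class Node:
    keys: List[Any] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)            # empty list means leaf
    parent: Optional["Node"] = field(default=None, repr=False)      # helper field, as on the slides


class BTree:
    def __init__(self, B: int) -> None:
        self.B = B
        self.root = Node()   # assumption: an empty tree is a single empty leaf

    def insert(self, k: Any) -> None:
        # Search phase: descend to the leaf that should receive k.
        x = self.root
        while x.children:
            x = x.children[bisect.bisect_right(x.keys, k)]
        self._insert(x, k, None)

    def _insert(self, x: Node, k: Any, y: Optional[Node]) -> None:
        """B-TREE-INSERT(x, k, y): place k in x, with y as the right child of k."""
        i = bisect.bisect_left(x.keys, k)
        x.keys.insert(i, k)                  # list.insert shifts the greater keys right
        if y is not None:
            x.children.insert(i + 1, y)      # y becomes the pointer just after k
            y.parent = x
        if len(x.keys) > 2 * self.B:         # overflow: x now holds 2B + 1 keys
            m, z = self._split(x)
            if x.parent is None:             # splitting the root adds a new level
                x.parent = Node(children=[x])
                self.root = x.parent
            self._insert(x.parent, m, z)     # the median moves up (recursive call)

    def _split(self, x: Node) -> Tuple[Any, Node]:
        """SPLIT(x): x keeps the B smaller keys; a new node z gets the B greater ones."""
        B = self.B
        m = x.keys[B]                        # median of the 2B + 1 keys
        z = Node(keys=x.keys[B + 1:], children=x.children[B + 1:])
        for c in z.children:
            c.parent = z
        x.keys, x.children = x.keys[:B], x.children[:B + 1]
        return m, z

    def levels(self) -> List[List[List[Any]]]:
        """Keys of every node, level by level, for inspection."""
        out, level = [], [self.root]
        while level:
            out.append([node.keys for node in level])
            level = [c for node in level for c in node.children]
        return out


# Keys of Exercise 3, inserted into an empty tree of order 1.
t = BTree(1)
for key in "EHBAFGCJDI":
    t.insert(key)
print(t.levels())
```

Running it with the keys of Exercise 3 and B = 1, as above, is one way to check an answer to that exercise.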

  36. Inserting a key into a B-tree
     Lemma: The time complexity of B-TREE-INSERT is $O(B \log_B n)$.
     Proof:
     • Recall that B-TREE-SEARCH is called first and costs O(lg n) when binary search is used. Then B-TREE-INSERT starts by visiting a node and proceeds upward.
     • At most one node is visited per level (depth), and only visited nodes can be split. Each split creates one new node (splitting the root also creates a new root). The cost of a split is proportional to 2B.
     • The number of visited nodes is at most the tree's height plus one, and the height of a B-tree is $O(\log_B n)$. Each visited node costs O(B) steps. Total of $O(B \log_B n)$ steps. ■

  37. Some questions on insertion
     • Which split operation increases the tree's height? The split of the root of the tree.
     • How many DISK-READ operations are executed by the insertion algorithm? Nodes on the key's path may be read twice: once during the downward search and again when a split propagates up to them.
     • Does binary search make sense here? Not exactly. We already pay O(B) to split a node (for finding the median).

  38. Drawbacks of our insertion method
     • Once the key's insertion node is found, it may be necessary to read its parent node again (due to splitting).
     • DISK-READ/WRITE operations are expensive and would be executed at least twice for each node on the key's path.
     • It would be necessary to store each node's parent or to use the recursion stack to keep its reference.
     • Mond and Raz (1985) provide a solution that spends one DISK-READ/WRITE per visited node (see CLRS).

  39. Exercise 3
     • Show the results of inserting the keys E, H, B, A, F, G, C, J, D, I, in order, into an empty B-tree of order 1.

  40. Exercise 3
     Solution (final configuration):
     • root: [E]
     • depth 1: [B] [G I]
     • leaves: [A] [C D] [F] [H] [J]

  41. Exercise 4
     • Is a B-tree of order 1 a good choice for a balanced search tree?
     • What about the expression $h \le \log_B \frac{n+1}{2}$ when B = 1?

  42. Deleting a key from a B-tree
     • Analogous to insertion, but a little more complicated.
     • A key can be deleted from any node (not just a leaf), and deletion can affect the node's parent and its children (insertion only affects parents).
     • One must ensure that a node does not get too small during deletion (fewer than B keys).
     • As a result, deletion is the most complex operation on B-trees. It will be considered in 4 particular cases.

  43. Deleting a key from a B-tree
     • Case 1: The key is in a leaf node with more than B elements.
     Procedure:
     • Just remove the key from the node.

  44. Deleting a key from a B-tree
     • Case 1: The key is in a leaf node with more than B elements (B = 2)
     [Figure: root [N]; children [E J] and [T X]; leaves [A C D] [F G] [K M] [O Q] [U W] [Y Z]]

  45. Deleting a key from a B-tree
     • Case 1: The key is in a leaf node with more than B elements (B = 2): key C deleted
     [Figure: root [N]; children [E J] and [T X]; leaves [A D] [F G] [K M] [O Q] [U W] [Y Z]]
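Case 1 touches a single leaf. A minimal sketch on a sorted Python list of keys (delete_case1 is a hypothetical name); it does not cover a root that is also a leaf, which may legally hold fewer than B keys.

```python
from typing import Any, List


def delete_case1(leaf_keys: List[Any], k: Any, B: int) -> None:
    """Case 1 only: k sits in a leaf with more than B keys, so it is simply removed."""
    if len(leaf_keys) <= B:
        raise ValueError("the leaf has only B keys, so Case 1 does not apply")
    leaf_keys.remove(k)   # keys stay sorted; no other node is touched


# Slides 44-45 (B = 2): remove C from the leaf [A C D].
leaf = ["A", "C", "D"]
delete_case1(leaf, "C", 2)
print(leaf)               # ['A', 'D']
```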

  46. Deleting a key from a B-tree
     Case 2: The join procedure
     • The key k1 to be deleted is in a leaf x with exactly B elements.
     • Let y be a node that is an “adjacent brother” of x.
     • Suppose that y has exactly B elements.
     Procedure:
     • Remove the key k1.
     • Let k2 be the key that separates nodes x and y in their parent.
     • Join the nodes x and y, and move the key k2 from the parent into the newly joined node.
     • If the parent of x is left with B - 1 elements and also has an “adjacent brother” with B elements, apply the join procedure recursively to the parent of x (seen as x) and its adjacent brother (seen as y).

  47. Deleting a key from a B-tree
     • Case 2: Delete key Q (B = 2)
     [Figure: parent [F K T X]; its children, left to right: ... [H I] [O Q] [U W] [Y Z]]

  48. Deleting a key from a B-tree
     • Case 2: Delete key Q (B = 2)
     [Figure: the same tree, with the parent, node x = [O Q] and its adjacent brother, node y, labeled]

  49-50. Deleting a key from a B-tree
     • Case 2: Delete key Q (B = 2)
     [Figure: Q removed, so node x = [O] now holds B - 1 keys; parent [F K T X]; node y labeled]
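One level of the Case 2 join procedure can be sketched as follows. Assumptions: x is a leaf, y is taken to be x's right brother (the slides allow either adjacent brother; the left-to-right "Node x, Node y" labels suggest the right one, but that is a guess), and the recursive step for an underfull parent is not implemented. The names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Node:
    keys: List[Any]
    children: List["Node"] = field(default_factory=list)   # empty list means leaf


def delete_with_join(parent: Node, xi: int, k1: Any, B: int) -> Node:
    """Case 2, one level only: x = parent.children[xi] is a leaf with exactly B keys
    containing k1, and its right brother y = parent.children[xi + 1] also has B keys."""
    x, y = parent.children[xi], parent.children[xi + 1]
    if len(x.keys) != B or len(y.keys) != B:
        raise ValueError("Case 2 requires x and y to hold exactly B keys each")
    x.keys.remove(k1)                      # x is left with B - 1 keys
    k2 = parent.keys.pop(xi)               # the separator of x and y leaves the parent
    x.keys = x.keys + [k2] + y.keys        # joined node: (B - 1) + 1 + B = 2B keys
    x.children = x.children + y.children   # empty for leaves
    del parent.children[xi + 1]            # y is absorbed into x
    return x


# Delete Q example (B = 2). The first child is a hypothetical stand-in for the
# subtree shown only as "..." on the slides.
parent = Node(["F", "K", "T", "X"],
              [Node(["B", "C"]), Node(["H", "I"]), Node(["O", "Q"]),
               Node(["U", "W"]), Node(["Y", "Z"])])
joined = delete_with_join(parent, 2, "Q", 2)
print(joined.keys, parent.keys)            # ['O', 'T', 'U', 'W'] ['F', 'K', 'X']
```

The parent keeps x.n + 1 children after the join (four children for the keys F, K, X), so only its key count can drop below B, which is what triggers the recursive case on the slide.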
