B-trees Eduardo Laber David Sotelo
What are B-trees? • Balanced search trees designed for secondary storage devices • Similar to AVL trees but better at minimizing disk I/O operations • Main data structure used by DBMSs to store and retrieve information
What are B-trees? • Nodes may have many children (from a few to thousands) • The branching factor can be quite large • Every B-tree of n keys has height O(log n) • In practice, its height is smaller than the height of an AVL tree
Definition of B-trees A B-tree is a rooted tree with the following five properties: • Every node x has the following attributes: • x.n, the number of keys stored in node x • The x.n keys: $x.key_1 \le x.key_2 \le \dots \le x.key_{x.n}$ • The boolean x.leaf, indicating whether x is a leaf or an internal node
Definition of B-trees • If x is an internal node, it contains x.n + 1 pointers $x.p_1, x.p_2, \dots, x.p_{x.n+1}$ to its children • The keys $x.key_i$ separate the ranges of keys stored in the subtrees: any key $k$ in subtree $x.p_i$ and any key $k'$ in subtree $x.p_{i+1}$ satisfy $k \le x.key_i \le k'$ • All leaves have the same depth, equal to the tree's height.
Definition of B-trees • Bounds on the number of keys of a node: • Let B be a positive integer representing the order of the B-tree. • Every node (except the root) must have at least B keys. • Every node (except the root) must have at most 2B keys. • The root is free to contain between 1 and 2B keys (why?)
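The five properties can be checked mechanically. The following is a minimal Python sketch, assuming a hypothetical `Node` class that stores a `keys` list and a `children` list (empty for a leaf, so `leaf` is derived rather than stored) and using 0-based indices instead of the slides' 1-based ones; `is_valid_btree` is an illustrative name, not part of the original material.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list                                    # x.key_1 .. x.key_{x.n} (0-based here)
    children: list = field(default_factory=list)  # x.p_1 .. x.p_{x.n+1}; empty for a leaf

    @property
    def leaf(self) -> bool:
        return not self.children


def is_valid_btree(root: Node, B: int) -> bool:
    """Check the B-tree properties for a tree of order B."""
    leaf_depths = set()

    def check(x: Node, depth: int, lo, hi, is_root: bool) -> bool:
        n = len(x.keys)
        if not ((1 if is_root else B) <= n <= 2 * B):
            return False                          # key-count bounds
        if any(a > b for a, b in zip(x.keys, x.keys[1:])):
            return False                          # keys must be non-decreasing
        if any((lo is not None and k < lo) or (hi is not None and k > hi) for k in x.keys):
            return False                          # keys must lie in the range set by ancestors
        if x.leaf:
            leaf_depths.add(depth)
            return True
        if len(x.children) != n + 1:
            return False                          # an internal node has x.n + 1 children
        bounds = [lo] + x.keys + [hi]             # child i lies between bounds[i] and bounds[i + 1]
        return all(check(c, depth + 1, bounds[i], bounds[i + 1], False)
                   for i, c in enumerate(x.children))

    return check(root, 0, None, None, True) and len(leaf_depths) == 1
```

For example, `is_valid_btree(Node([4], [Node([1, 2, 3]), Node([5, 6, 7, 8])]), 2)` accepts one of the trees of Exercise 1, while replacing the right leaf by `Node([5])` is rejected because that leaf holds fewer than B keys.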
Exercise 1 Enumerate all valid B-trees of order 2 that represent the set {1, 2, ..., 8}
Exercise 1 Solution: • Root [4] with leaves [1 2 3] and [5 6 7 8] • Root [5] with leaves [1 2 3 4] and [6 7 8] • Root [3 6] with leaves [1 2], [4 5] and [7 8]
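The solution can be double-checked by brute force. This is a sketch under our own assumptions: a tree is represented as a hypothetical pair `(node_keys, children)` with `children == ()` for a leaf, and the helper names `btrees` and `all_btrees` are illustrative. The search is exponential, so it is only meant for tiny inputs such as n = 8.

```python
from itertools import combinations, product


def btrees(keys, h, B, root=True):
    """All B-trees of order B and height exactly h that store `keys` (sorted).

    A tree is a pair (node_keys, children); children == () for a leaf."""
    lo, hi = (1 if root else B), 2 * B           # allowed number of keys in this node
    if h == 0:
        return [(tuple(keys), ())] if lo <= len(keys) <= hi else []
    result = []
    for m in range(lo, hi + 1):
        # choose which m keys stay in this node; the gaps between them form the subtrees
        for seps in combinations(range(len(keys)), m):
            bounds = (-1,) + seps + (len(keys),)
            segments = [keys[a + 1:b] for a, b in zip(bounds, bounds[1:])]
            options = [btrees(seg, h - 1, B, root=False) for seg in segments]
            for kids in product(*options):
                result.append((tuple(keys[s] for s in seps), kids))
    return result


def all_btrees(n, B):
    """Every valid B-tree of order B over the keys 1..n (brute force, tiny n only)."""
    keys = list(range(1, n + 1))
    return [t for h in range(n) for t in btrees(keys, h, B)]


if __name__ == "__main__":
    for tree in all_btrees(8, 2):
        print(tree)
```

The enumeration finds exactly the three trees listed in the solution: a single leaf cannot hold 8 keys when B = 2, and height 2 would already need more than 8 keys.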
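The height of a B-tree Theorem: Let h be the height of a B-tree of n keys and order B > 1. Then: $h \le \log_B \frac{n+1}{2}$ Proof: • The root contains at least one key. • All other nodes contain at least B keys. • At least one key at depth 0 • At least $2B$ keys at depth 1 • At least $2B^2 + B$ keys at depth 2 • At least $2B^i + B^{i-1} + B^{i-2} + \dots + B$ keys at depth i • In particular, for $h \ge 1$ the root and the deepest level alone give $n \ge 1 + 2B^h$, so $B^h \le \frac{n-1}{2} \le \frac{n+1}{2}$ and $h \le \log_B \frac{n+1}{2}$; for $h = 0$ the bound holds because $n \ge 1$. ■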
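A quick numerical sanity check of the theorem, written as a sketch with a hypothetical helper `min_keys`: it counts the fewest keys a tree of height h can hold (one key in the root, exactly B keys in every other node) and checks the bound on that worst case. Since the bound grows with n, checking the sparsest tree of each height is enough. For comparison, the loop reproduces the closed form $2(B+1)^h - 1$.

```python
import math


def min_keys(h: int, B: int) -> int:
    """Fewest keys in a B-tree of order B and height h."""
    total = 1               # the root holds at least one key
    nodes = 2               # a root with one key has two children
    for _ in range(h):      # depths 1..h
        total += nodes * B  # every non-root node holds at least B keys
        nodes *= B + 1      # a node with B keys has B + 1 children
    return total


if __name__ == "__main__":
    for B in (2, 3, 10):
        for h in range(5):
            n = min_keys(h, B)
            print(B, h, n, h <= math.log((n + 1) / 2, B) + 1e-9)
```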
Searching a B-tree • Similar to searching a binary search tree. • Multiway branching decision according to the number of the node's children. • Recursive procedure with a time complexity of $O(B \log_B n)$ for a tree of n keys and order B.
Searching a B-tree
B-TREE-SEARCH (x, k)
• i = 1
• while i ≤ x.n and k > x.key_i do i = i + 1
• if i ≤ x.n and k == x.key_i then return (x, i)
• if x.leaf then return NIL
• else DISK-READ(x.p_i); return B-TREE-SEARCH (x.p_i, k)
Searching a B-tree • Search for the key F [figure: step-by-step search in an example B-tree storing the keys A to U]
Searching a B-tree • Search for the key N [figure: the same example B-tree]
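B-TREE-SEARCH translates almost line by line into Python. This is a sketch that assumes an in-memory tree built from a hypothetical `Node` class with `keys` and `children` lists, 0-based indices, and no real disk access (the DISK-READ is only marked by a comment). The example tree in the usage reuses letters from the slides, but its shape is our own assumption because the original figure did not survive.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty for a leaf

    @property
    def leaf(self) -> bool:
        return not self.children


def btree_search(x: Node, k) -> Optional[Tuple[Node, int]]:
    """Return (node, index) holding k, or None (NIL) if k is absent."""
    i = 0
    while i < len(x.keys) and k > x.keys[i]:   # linear scan for the first key >= k
        i += 1
    if i < len(x.keys) and k == x.keys[i]:
        return x, i
    if x.leaf:
        return None
    # DISK-READ(x.p_i) would happen here in a disk-based implementation
    return btree_search(x.children[i], k)


if __name__ == "__main__":
    root = Node(["J"], [
        Node(["C", "G"], [Node(["A", "B"]), Node(["D", "E", "F"]), Node(["H", "I"])]),
        Node(["M", "P"], [Node(["K", "L"]), Node(["N", "O"]), Node(["Q", "R"])]),
    ])
    node, i = btree_search(root, "F")
    print(node.keys, i)   # ['D', 'E', 'F'] 2
```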
Searching a B-tree Lemma: The time complexity of procedure B-TREE-SEARCH is $O(B \log_B n)$. Proof: • The number of recursive calls is at most the tree's height plus one (one call per node on a root-to-leaf path). • The height of a B-tree is $O(\log_B n)$. • Each call performs at most 2B iterations of the while loop, i.e., O(B) work per call. • Total of $O(B \log_B n)$ steps. ■
Exercise 2 • Suppose that B-TREE-SEARCH is implemented to use binary search rather than linear search within each node. • Show that this change makes the time complexity O(lg n), independently of how B might be chosen as a function of n.
Exercise 2 Solution: • By using binary search, the number of steps of the algorithm becomes $O(\lg B \cdot \log_B n)$. • Observe that $\log_B n = \lg n / \lg B$. • Therefore $O(\lg B \cdot \log_B n) = O(\lg n)$.
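The binary-search variant only changes how the index i is found. A sketch, assuming the same hypothetical list-based `Node` as before: Python's standard `bisect.bisect_left` returns the first position whose key is ≥ k, which is exactly the i produced by the linear loop, but with O(lg B) comparisons. The recursion is written as a loop here, which does not change the number of nodes visited.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty for a leaf

    @property
    def leaf(self) -> bool:
        return not self.children


def btree_search_binary(x: Node, k) -> Optional[Tuple[Node, int]]:
    """B-TREE-SEARCH with binary search inside each node."""
    while True:
        i = bisect_left(x.keys, k)      # first index with x.keys[i] >= k, O(lg B) comparisons
        if i < len(x.keys) and x.keys[i] == k:
            return x, i
        if x.leaf:
            return None                 # NIL
        x = x.children[i]               # DISK-READ(x.p_i) in a disk-based tree
```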
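Linear or binary B-tree search? Lemma: If 1 < B < n then $\lg n \le B \log_B n$. Proof: • Since $\log_B n = \lg n / \lg B$ with $\lg B > 0$ and $\lg n > 0$, the inequality is equivalent to $\lg B \le B$, i.e., $B \le 2^B$, which holds for every $B \ge 1$. ■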
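To get a feel for the gap, the two step counts can be compared numerically. This sketch ignores constant factors and uses hypothetical helper names; it only evaluates $B \log_B n$ (linear search in each node) against $\lg n$ (binary search in each node).

```python
import math


def linear_steps(n: float, B: int) -> float:
    """Comparisons with linear in-node search, up to constants: B * log_B n."""
    return B * math.log(n, B)


def binary_steps(n: float) -> float:
    """Comparisons with binary in-node search, up to constants: lg n."""
    return math.log2(n)


if __name__ == "__main__":
    n = 10**6
    for B in (2, 16, 128, 1024):
        print(f"B={B:5d}  B*log_B n = {linear_steps(n, B):8.1f}  lg n = {binary_steps(n):.1f}")
```

Since the ratio of the two quantities is $B / \lg B$, the gap widens as B grows, which matters for the large branching factors used on disk.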
Inserting a key into a B-tree • The new key is always inserted into an existing leaf node (why?) • First, we search for the leaf in which the new key must be inserted. • If that node is already full (2B keys), the insertion leaves it with 2B + 1 keys and we split it. • A split operation divides an overfull node around its median key into two nodes having B keys each. • The median key moves up into the split node's parent (a recursive call to the insertion procedure).
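Split operation • Inserting key F into a full node (B = 2): parent [J]; children [A C E G] and [K M O Q]
Split operation • Node found but already full: adding F gives [A C E F G], which has 2B + 1 keys
Split operation • Median key identified: E is the median of [A C E F G]
Split operation • Splitting the node: E moves up into the parent, giving parent [E J] with children [A C], [F G] and [K M O Q]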
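The split step on its own can be sketched over plain Python lists. The function name `split` and its `(keys, children)` representation are our own assumptions: it takes a node with exactly 2B + 1 keys, keeps the B smaller keys on the left, moves the median up, and gives the B greater keys (with the matching children) to the new right node. The usage replays the slide's example with B = 2.

```python
from bisect import bisect_left, insort


def split(keys: list, children: list, B: int):
    """Split an overfull node (2B + 1 keys) around its median key.

    Returns (left, median, right); each half is a (keys, children) pair
    holding exactly B keys. `children` is empty for a leaf."""
    if len(keys) != 2 * B + 1:
        raise ValueError("only an overfull node (2B + 1 keys) is split")
    median = keys[B]
    left = (keys[:B], children[:B + 1])
    right = (keys[B + 1:], children[B + 1:])
    return left, median, right


# Slide example (B = 2): insert F into the full leaf [A C E G] whose parent is [J].
leaf = ["A", "C", "E", "G"]
insort(leaf, "F")                                    # temporarily overfull: [A C E F G]
left, median, right = split(leaf, [], 2)
parent = ["J"]
parent.insert(bisect_left(parent, median), median)   # the median moves up
print(left[0], median, right[0], parent)             # ['A', 'C'] E ['F', 'G'] ['E', 'J']
```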
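Inserting a key into a B-tree • Insertion can be propagated upward (B = 2): root [E J T X] with leaves [A C], [F G], [K M O Q], [U W], [Y Z]
Inserting a key into a B-tree • Insertion can be propagated upward (B = 2): inserting N makes the leaf [K M N O Q] overfull
Inserting a key into a B-tree • Insertion can be propagated upward (B = 2): SPLIT of the leaf around N; N moves up, so the root [E J N T X] becomes overfull; leaves [A C], [F G], [K M], [O Q], [U W], [Y Z]
Inserting a key into a B-tree • Insertion can be propagated upward (B = 2): SPLIT of the root around N; new root [N] with children [E J] and [T X]; leaves [A C], [F G], [K M] under [E J] and [O Q], [U W], [Y Z] under [T X]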
Inserting a key into a B-tree
B-TREE-INSERT (x, k, y)
• i = 1
• while i ≤ x.n and k > x.key_i do i = i + 1
• for j = x.n downto i do
•   x.key_{j+1} = x.key_j
•   x.p_{j+2} = x.p_{j+1}
• end-for
• x.key_i = k
• x.p_{i+1} = y
• x.n = x.n + 1
• DISK-WRITE(x)
Inserting a key into a B-tree
B-TREE-INSERT (x, k, y) (continued)
• if x.n > 2*B then
•   [m, z] = SPLIT (x)
•   if x.parent != NIL then
•     DISK-READ (x.parent)
•   else
•     x.parent = ALLOCATE-NODE()
•     x.parent.n = 0; x.parent.leaf = FALSE; x.parent.p_1 = x
•     DISK-WRITE (x)
•     root = x.parent
•   end-else
•   z.parent = x.parent
•   B-TREE-INSERT (x.parent, m, z)
• end-if
Inserting a key into a B-tree
SPLIT (x)
• z = ALLOCATE-NODE()
• m = FIND-MEDIAN (x)
• COPY-GREATER-ELEMENTS (x, m, z)
• DISK-WRITE (z)
• COPY-SMALLER-ELEMENTS (x, m, x)
• DISK-WRITE (x)
• return [m, z]
Inserting a key into a B-tree • Function B-TREE-INSERT has three arguments: • The node x at which an element of key k should be inserted • The key k to be inserted • A pointer y to the right child of k (NIL when x is a leaf), which becomes one of the pointers of x during the insertion process. • There is a global variable named root, which is a pointer to the root of the B-tree. • Observe that the field x.parent was not defined as an original B-tree attribute; it is used here only to simplify the process. • The field x.leaf should also be updated accordingly (for example, a newly allocated root is an internal node).
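A runnable version of the bottom-up insertion, as a sketch under stated assumptions: keys are distinct, everything lives in memory (no DISK-READ/WRITE), and instead of the x.parent field the descent records the ancestors in a Python list, which plays the role of the recursion stack mentioned among the drawbacks. The class `BTree` and its methods `insert`, `_split` and `levels` are hypothetical names.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)  # empty for a leaf

    @property
    def leaf(self) -> bool:
        return not self.children


class BTree:
    """Bottom-up insertion: put the key in a leaf, then split overfull nodes upward."""

    def __init__(self, B: int):
        self.B = B
        self.root = Node()            # the global `root` of the slides

    def insert(self, k) -> None:
        path = []                     # ancestors of x, root first (replaces x.parent)
        x = self.root
        while not x.leaf:             # search phase: descend to the leaf for k
            path.append(x)
            x = x.children[bisect_left(x.keys, k)]
        x.keys.insert(bisect_left(x.keys, k), k)
        while len(x.keys) > 2 * self.B:           # x.n > 2B: split and push the median up
            m, z = self._split(x)
            if path:
                parent = path.pop()
            else:                                 # root split: the tree grows one level
                parent = Node(children=[x])
                self.root = parent
            i = bisect_left(parent.keys, m)
            parent.keys.insert(i, m)
            parent.children.insert(i + 1, z)      # z is the right child of m
            x = parent

    def _split(self, x: Node):
        """SPLIT: x keeps the B smaller keys, the new node z gets the B greater ones."""
        B = self.B
        m = x.keys[B]                             # median of the 2B + 1 keys
        z = Node(x.keys[B + 1:], x.children[B + 1:])
        x.keys, x.children = x.keys[:B], x.children[:B + 1]
        return m, z

    def levels(self):
        """Keys of every node, level by level from the root."""
        out, frontier = [], [self.root]
        while frontier:
            out.append([n.keys for n in frontier])
            frontier = [c for n in frontier for c in n.children]
        return out


if __name__ == "__main__":
    t = BTree(1)
    for key in "EHBAFGCJDI":          # the key sequence of Exercise 3
        t.insert(key)
    print(t.levels())
```

Running the key sequence of Exercise 3 reproduces the final configuration given in its solution, and inserting N into the tree of the "propagated upward" example reproduces the two splits shown on those slides.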
Inserting a key into a B-tree Lemma: The time complexity of B-TREE-INSERT is $O(B \log_B n)$. Proof: • Recall that the B-TREE-SEARCH function is called first and costs O(lg n) when binary search is used. Then B-TREE-INSERT starts by visiting a leaf and proceeds upward. • At most one node is visited per level (depth), and only visited nodes can be split. Each split creates one new node (plus a new root when the root itself is split), and the cost of a split is proportional to 2B. • The number of visited nodes is at most the tree's height plus one, and the height of a B-tree is $O(\log_B n)$. Each visited node costs O(B) operations (at most 2B + 1 keys are shifted or split). Total of $O(B \log_B n)$ steps. ■
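Some questions on insertion • Which split operation increases the tree's height? The split of the root of the tree. • How many DISK-READ operations are executed by the insertion algorithm? A node on the search path is read once during the search and, if its child is split, once more as x.parent; in the worst case every node on the path is read twice. • Does binary search make sense here? Not exactly: we already pay O(B) to split a node (for finding the median) and to shift the keys inside it.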
Drawbacks of our insertion method • Once the key's insertion node is found, it may be necessary to read its parent node again (due to splitting). • DISK-READ/WRITE operations are expensive and, in the worst case, would be executed at least twice for each node on the key's path. • It would be necessary to store each node's parent or to use the recursion stack to keep its reference. • (Mond and Raz, 1985) provide a solution that spends one DISK-READ/WRITE per visited node (see CLRS).
Exercise 3 • Show the results of inserting the keys E, H, B, A, F, G, C, J, D, I in order into an empty B-tree of order 1.
Exercise 3 Solution (final configuration): root [E]; children [B] and [G I]; leaves [A], [C D], [F], [H], [J]
Exercise 4 • Is a B-tree of order 1 a good choice for a balanced search tree? • What about the expression $h \le \log_B \frac{n+1}{2}$ when B = 1?
Deleting a key from a B-tree • Analogous to insertion but a little more complicated. • A key can be deleted from any node (not just a leaf), and its deletion can affect the node's parent and its children (insertion only affects parents). • One must ensure that a node does not get too small during deletion (fewer than B keys). • As a result, deletion is the most complex operation on B-trees. It will be considered in 4 particular cases.
Deleting a key from a B-tree • Case 1: The key is in a leaf node with more than B elements. Procedure: • Just remove the key from the node.
Deleting a key from a B-tree • Case 1: The key is in a leaf node with more than B elements (B = 2): root [N]; children [E J] and [T X]; leaves [A C D], [F G], [K M], [O Q], [U W], [Y Z]
Deleting a key from a B-tree • Case 1: The key is in a leaf node with more than B elements (B = 2): after deleting C, the leaf [A C D] becomes [A D] and no other node changes
Deleting a key from a B-tree Case 2: The join procedure • The key $k_1$ to be deleted is in a leaf x with exactly B elements. • Let y be a node that is an “adjacent brother” of x (a sibling immediately to its left or right). • Suppose that y has exactly B elements. Procedure: • Remove the key $k_1$. • Let $k_2$ be the key that separates nodes x and y in their parent. • Join the nodes x and y and move the key $k_2$ from the parent to the newly joined node, which then holds (B - 1) + 1 + B = 2B keys. • If the parent of x is left with B - 1 elements and also has an “adjacent brother” with B elements, apply the join procedure recursively for the parent of x (seen as x) and its adjacent brother (seen as y).
Deleting a key from a B-tree • Case 2: Delete key Q (B = 2) [figure: a parent containing the keys K, T and X, with leaves [H I], [O Q], [U W] and [Y Z]; node x is the leaf [O Q] and node y is an adjacent brother of x]
Deleting a key from a B-tree • Case 2: Delete key Q (B = 2) [figure: after removing Q, node x becomes [O] with only B - 1 keys, so it must be joined with node y]
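Cases 1 and 2 for a key stored in a leaf can be sketched as follows. This is only an illustration under explicit assumptions: it reuses a hypothetical list-based `Node`, the caller passes the parent and the leaf's index in it, the chosen brother is assumed to have exactly B keys (as in Case 2), and the recursive step at the parent's level as well as the remaining deletion cases are not covered. The names `join` and `delete_from_leaf` are our own, and the parent [K T X] in the usage is an assumption, since the slide's figure survived only partially.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty for a leaf


def join(parent: Node, i: int) -> Node:
    """Join x = parent.children[i] with its right brother y = parent.children[i + 1].

    The separating key k2 = parent.keys[i] moves down into the joined node."""
    x, y = parent.children[i], parent.children[i + 1]
    k2 = parent.keys.pop(i)
    parent.children.pop(i + 1)                   # y disappears from the parent
    x.keys = x.keys + [k2] + y.keys              # (B - 1) + 1 + B = 2B keys
    x.children = x.children + y.children
    return x


def delete_from_leaf(parent: Node, i: int, k1, B: int) -> None:
    """Delete k1 from the leaf parent.children[i] (Cases 1 and 2 only)."""
    x = parent.children[i]
    x.keys.remove(k1)                            # raises ValueError if k1 is absent
    if len(x.keys) >= B:
        return                                   # Case 1: x still has at least B keys
    # Case 2: assumes the chosen brother has exactly B keys
    if i + 1 < len(parent.children):
        join(parent, i)                          # join x with its right brother
    else:
        join(parent, i - 1)                      # x is the last child: use its left brother


if __name__ == "__main__":
    p = Node(list("KTX"), [Node(list(s)) for s in ("HI", "OQ", "UW", "YZ")])
    delete_from_leaf(p, 1, "Q", 2)
    print(p.keys, [c.keys for c in p.children])
```

With y taken as the right brother [U W], the separating key T moves down and x becomes [O T U W], a node with exactly 2B keys.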