360 likes | 368 Views
Learn about B+-trees, a modified version of B-trees, with a different structure and operations for efficient data organization and insertion.
E N D
B + -trees CM3117 Data Structures
B-Trees • The B-tree is similar to binary search trees in another way • While it is good for locating a particular key it is not good for locating a series of keys • It is possible to find the series of keys within one node when in a leaf node • It is not possible to find the next key otherwise • The B-tree can be modified to make this possible CM3117 -- Data Structures and File Organizations
B+-tree • The modified B-tree is called a B+-tree • The difference is primarily in the structure of the tree • All keys and the associated data are found in the leaf nodes • The root and interior nodes contain only keys and pointers CM3117 -- Data Structures and File Organizations
B+-tree • The keys in the root and interior nodes are duplicated in the leaf nodes • These keys end up as the first key in each node except the leftmost node • This means that the leaf nodes have a different structure from the non-leaf nodes CM3117 -- Data Structures and File Organizations
B+-trees • The last pointer of the leaf nodes are used to connect each leaf node to the next node in sequence • The leaf nodes form a linked list • They are called the sequence set • The non-leaf nodes are called the index set since they are used exclusively to locate data CM3117 -- Data Structures and File Organizations
B+-trees • Here is an example of a B+-tree Index Set 50 20 85 10 20 45 50 68 85 101 Sequence Set CM3117 -- Data Structures and File Organizations
B+-trees • The structural differences lead to differences in the operations as well • For example, the search operation in B+-trees must end in the leaf node since that is where the data is actually located • The keys in the index set serve only to direct the search to the correct leaf node CM3117 -- Data Structures and File Organizations
B+-trees • There are also differences in the insertion, deletion, and redistribution operations • Because of the structural differences between the index set and the sequence set the differences only apply to the sequence set • Structurally the index set is the same as the nodes in a B-tree, so these operations are the same for the index set as in a B-tree CM3117 -- Data Structures and File Organizations
Inserting into a B+-tree • There is no difference when there is room in the node • The difference is in the node split CM3117 -- Data Structures and File Organizations
Inserting into a B+-tree • Insert 45 into the B+-tree below • Since the node is full it needs to be split 30 10 20 30 50 CM3117 -- Data Structures and File Organizations
Inserting into a B+-tree • Create the new node and determine that the median key is 45 30 10 20 30 50 CM3117 -- Data Structures and File Organizations
Inserting into a B+-tree • Move from the median key on to the new node • The difference here is that the median key is copied 30 10 20 30 45 50 CM3117 -- Data Structures and File Organizations
Inserting into a B+-tree • Since this is a leaf node no need to copy any pointer except the last pointer in the node 30 10 20 30 45 50 CM3117 -- Data Structures and File Organizations
Inserting into a B+-tree • The last pointer in the original node is set to point to the new node 30 10 20 30 45 50 CM3117 -- Data Structures and File Organizations
Inserting into a B+-tree • The median key and the pointer to the new node is still raised to the parent 30 45 10 20 30 45 50 CM3117 -- Data Structures and File Organizations
Inserting into a B+-tree • Need to emphasize that this difference only applies to the sequence set • Suppose we insert 25 into this B+-tree (without doing redistribution) 30 45 10 20 30 45 50 CM3117 -- Data Structures and File Organizations
Inserting into a B+-tree • Create the new node, identify 20 as the median key, move the keys to the new node, and set the pointers in the original and new nodes 30 45 10 20 25 30 45 50 CM3117 -- Data Structures and File Organizations
Inserting into a B+-tree • When the median key is raised, the parent node needs to be split • This node split will follow the same rules as for a B-tree 20 30 45 10 20 25 30 45 50 CM3117 -- Data Structures and File Organizations
Inserting into a B+-tree • Create the new node, move all keys and pointers following the median key, and raise the median key 30 20 45 10 20 25 30 45 50 CM3117 -- Data Structures and File Organizations
Inserting into a B+-tree • Since this is a root split, create the new root node, insert the median key and change the pointer to the root 30 20 45 10 20 25 30 45 50 CM3117 -- Data Structures and File Organizations
Redistribution in a B+-tree • The redistributions are also slightly different in the sequence set • Suppose we insert 55 into this B+-tree 30 20 45 10 20 25 30 45 50 CM3117 -- Data Structures and File Organizations
Redistribution in a B+-tree • Can do a left redistribution 30 20 45 10 20 25 30 45 50 CM3117 -- Data Structures and File Organizations
Redistribution in a B+-tree • Copy the first key in the node to the last position in the node to the left and move all nodes down in the right node 30 20 45 10 20 25 30 45 50 CM3117 -- Data Structures and File Organizations
Redistribution in a B+-tree • Insert the new key and its data 30 20 45 10 20 25 30 45 50 55 CM3117 -- Data Structures and File Organizations
Redistribution in a B+-tree • Replace the key in the parent with the first key in the right node • Note there are no pointers to move 30 20 50 10 20 25 30 45 50 55 CM3117 -- Data Structures and File Organizations
Redistribution in a B+-tree • Right redistributions are also slightly different • Insert 12 into this B+-tree 30 20 50 10 15 20 30 45 50 55 CM3117 -- Data Structures and File Organizations
Redistribution in a B+-tree • Move the rightmost key in the left node and insert it in the right node 30 20 50 10 15 20 30 45 50 55 CM3117 -- Data Structures and File Organizations
Redistribution in a B+-tree • Copy the rightmost node (now the first key in the right node) to the parent • Insert the new key in the left node 30 15 50 10 12 15 20 30 45 50 55 CM3117 -- Data Structures and File Organizations
Redistribution in a B+-tree • As before there are no pointers to change 30 15 50 10 12 15 20 30 45 50 55 CM3117 -- Data Structures and File Organizations
Deleting from a B+-tree • As in a B-tree, when deleting redistributions must be done if possible • The node combination is different, but again only in the sequence set CM3117 -- Data Structures and File Organizations
Deleting from a B+-tree • Delete 10 from this B+-tree 32 68 89 10 25 32 51 68 80 8 CM3117 -- Data Structures and File Organizations
Deleting from a B+-tree • A node combination is necessary • But in this case it is not necessary to copy the key in the parent 32 68 89 25 32 51 68 80 8 CM3117 -- Data Structures and File Organizations
Deleting from a B+-tree • Move all keys in the right node to the left node • Note that you actually have one slot empty 32 68 89 25 32 51 68 80 8 CM3117 -- Data Structures and File Organizations
Deleting from a B+-tree • Copy the last pointer in the now empty node to the last pointer of the left node 32 68 89 25 32 51 68 80 8 CM3117 -- Data Structures and File Organizations
Deleting from a B+-tree • Remove the key in the parent node and release the empty node 68 89 25 32 51 68 80 85 92 95 9 CM3117 -- Data Structures and File Organizations
Deleting from a B+-tree • When the key that is deleted is the first in the node it is also found in the index set • Technically this should be changed in the index set • This can get to be a lot of processing • Generally just leave the index set unchanged CM3117 -- Data Structures and File Organizations