B-Tree
B-Trees • A B-tree is a specialized multi-way tree designed especially for use on disk. • In a B-tree, each node may contain a large number of keys, so the number of subtrees of each node may also be large. • A B-tree is designed to branch out in this large number of directions and to hold many keys in each node, so that the height of the tree stays relatively small.
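To make the height argument concrete, here is a minimal sketch that computes the smallest number of levels needed to hold n keys when every node is completely full. A full tree with fan-out m and h levels holds m^h - 1 keys. The class name `BTreeHeight` and the fan-out of 128 used below are assumptions chosen for illustration, not values from the slides.

```java
final class BTreeHeight {
    /** Smallest number of levels h such that a full tree with fan-out m holds n keys: m^h - 1 >= n. */
    static int minLevels(long n, int m) {
        int h = 0;
        long capacity = 0;                       // keys held by h full levels, i.e. m^h - 1
        while (capacity < n) {
            capacity = capacity * m + (m - 1);   // m^(h+1) - 1 = m * (m^h - 1) + (m - 1)
            h++;
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(minLevels(1_000_000L, 2));    // binary tree
        System.out.println(minLevels(1_000_000L, 128));  // assumed B-tree fan-out
    }
}
```

With these assumed numbers, one million keys need at least 20 levels at fan-out 2 but only 3 levels at fan-out 128, which is why a wide node pays off when each level costs a disk access.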
Definitions • A B-tree of order m (the maximum number of children for each node) is a tree that satisfies the following properties:
1. Every node has at most m children.
2. Every node (except the root and the leaves) has at least ceil(m/2) children.
3. The root has at least two children if it is not a leaf node.
4. All leaves appear on the same level and carry information.
5. A non-leaf node with k children contains k-1 keys.
6. Each leaf node (other than the root, if the root is a leaf) must contain at least ceil(m/2) - 1 keys.
7. Keys and subtrees are arranged in the fashion of a search tree.
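The bounds in these properties follow directly from the order m. The sketch below spells out the arithmetic; the class and method names are hypothetical and only serve the illustration.

```java
/** Key and child bounds for a B-tree of order m (m = maximum children per node). */
final class BTreeBounds {
    static int maxChildren(int m) { return m; }
    static int maxKeys(int m)     { return m - 1; }           // k children means k-1 keys
    static int minChildren(int m) { return (m + 1) / 2; }      // ceil(m/2), non-root internal nodes
    static int minKeys(int m)     { return (m + 1) / 2 - 1; }  // ceil(m/2) - 1, non-root nodes

    public static void main(String[] args) {
        int m = 5;
        System.out.println("order " + m + ": keys per node "
                + minKeys(m) + ".." + maxKeys(m)
                + ", children per internal node "
                + minChildren(m) + ".." + maxChildren(m));
    }
}
```

For order 5 this gives 2 to 4 keys per non-root node and 3 to 5 children per non-root internal node.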
B-Tree -- Search • Search is performed in the typical manner, analogous to that in a binary search tree. Starting at the root, the tree is traversed top to bottom, choosing the child pointer whose separation values are on either side of the value that is being searched. • Binary search is typically (but not necessarily) used within nodes to find the separation values and child tree of interest.
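A minimal sketch of this search, assuming integer keys and an in-memory node with a sorted key list and a child list (the `Node` class and the `contains` method are hypothetical names, not part of the slides). It uses `Collections.binarySearch` inside each node, whose negative return value encodes the insertion point and therefore the child to descend into.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class BTreeSearch {
    /** Hypothetical in-memory node: sorted keys plus children (empty list for a leaf). */
    static final class Node {
        final List<Integer> keys = new ArrayList<>();
        final List<Node> children = new ArrayList<>();
        Node(Integer... ks) { Collections.addAll(keys, ks); }
        boolean isLeaf() { return children.isEmpty(); }
    }

    /** Walks from the root to a leaf, doing a binary search inside each node. */
    static boolean contains(Node node, int target) {
        while (node != null) {
            int pos = Collections.binarySearch(node.keys, target);
            if (pos >= 0) return true;              // key found in this node
            if (node.isLeaf()) return false;        // nowhere left to descend
            int childIndex = -(pos + 1);            // insertion point = child between the separators
            node = node.children.get(childIndex);
        }
        return false;
    }
}
```

Each loop iteration corresponds to one node access, so the number of iterations is bounded by the height of the tree.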
B-Tree Insertion • When inserting an item, first do a search for it in the B-tree. If the item is not already in the B-tree, this unsuccessful search will end at a leaf. • If there is room in this leaf, just insert the new item here. Note that this may require that some existing keys be moved one to the right to make room for the new item. • If instead this leaf node is full so that there is no room to add the new item, then the node must be "split" with about half of the keys going into a new node to the right of this one. The median (middle) key is moved up into the parent node. (Of course, if that node has no room, then it may have to be split as well.) Note that when adding to an internal node, not only might we have to move some keys one position to the right, but the associated pointers have to be moved right as well. • If the root node is ever split, the median key moves up into a new root node, thus causing the tree to increase in height by one.
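The split step on its own can be sketched as follows. This assumes the new key has already been placed into the sorted key list, so the list is one key over capacity; the names `NodeSplit`, `Result`, and `split` are hypothetical. Child pointers are left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;

final class NodeSplit {
    /** Outcome of a split: keys that stay, the promoted median, keys for the new right node. */
    static final class Result {
        final List<Character> left;
        final char median;
        final List<Character> right;
        Result(List<Character> left, char median, List<Character> right) {
            this.left = left;
            this.median = median;
            this.right = right;
        }
    }

    /** Splits a sorted key list that has grown one key past the node's capacity. */
    static Result split(List<Character> overfull) {
        int mid = overfull.size() / 2;                   // index of the median key
        return new Result(
                new ArrayList<>(overfull.subList(0, mid)),
                overfull.get(mid),                       // moves up into the parent
                new ArrayList<>(overfull.subList(mid + 1, overfull.size())));
    }
}
```

For the keys A C G H N, this keeps A C in the left node, promotes G, and moves H N into the new right node.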
Insertion Example • Insert the following letters into what is originally an empty B-tree of order 5: C N G A H E K Q M F W L T Z D P R X Y S • Order 5 means that a node can have a maximum of 5 children and 4 keys. All nodes other than the root must have a minimum of 2 keys.
Insertion Example -- continued • The first 4 letters get inserted into the same node, which then holds A C G N. • Insert H (no room in that node: split it into 2 nodes and move the median G up into a new root node)
Insertion Example -- continued • Insert E, K, and Q • Insert M (split the node, M is median, move up)
Insertion Example -- continued • Insert F, W, L and T • Insert Z (Split, move median T up)
Insertion Example -- continued • Insert D (Split, move median D up), then insert P, R, X, Y • Insert S (Split, move median Q up, Split, move median M up)
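The whole example can be replayed with a small in-memory sketch. It assumes character keys, ignores duplicates, and lets a node overflow by one key before its parent splits it, which matches the "insert, then split if full" description in the slides. The class `BTreeInsertDemo` and its methods are hypothetical names, not a reference implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class BTreeInsertDemo {
    static final int ORDER = 5;                          // max children; max keys = ORDER - 1

    static final class Node {
        final List<Character> keys = new ArrayList<>();
        final List<Node> children = new ArrayList<>();   // empty for a leaf
    }

    Node root = new Node();

    void insert(char key) {
        insertInto(root, key);
        if (root.keys.size() == ORDER) {                 // root overflowed: tree grows by one level
            Node newRoot = new Node();
            newRoot.children.add(root);
            splitChild(newRoot, 0);
            root = newRoot;
        }
    }

    private void insertInto(Node node, char key) {
        int i = 0;
        while (i < node.keys.size() && key > node.keys.get(i)) i++;
        if (i < node.keys.size() && node.keys.get(i) == key) return;   // already present
        if (node.children.isEmpty()) {
            node.keys.add(i, key);                       // shifts larger keys one to the right
        } else {
            insertInto(node.children.get(i), key);
            if (node.children.get(i).keys.size() == ORDER) splitChild(node, i);
        }
    }

    /** Splits the overfull child i of parent and moves its median key up into parent. */
    private void splitChild(Node parent, int i) {
        Node left = parent.children.get(i);
        Node right = new Node();
        int mid = left.keys.size() / 2;
        char median = left.keys.get(mid);
        right.keys.addAll(left.keys.subList(mid + 1, left.keys.size()));
        left.keys.subList(mid, left.keys.size()).clear();
        if (!left.children.isEmpty()) {                  // internal node: move pointers as well
            right.children.addAll(left.children.subList(mid + 1, left.children.size()));
            left.children.subList(mid + 1, left.children.size()).clear();
        }
        parent.keys.add(i, median);
        parent.children.add(i + 1, right);
    }

    /** One line per level, each node written as [keys]. */
    String render() {
        StringBuilder sb = new StringBuilder();
        List<Node> level = new ArrayList<>();
        level.add(root);
        while (!level.isEmpty()) {
            List<Node> next = new ArrayList<>();
            for (Node n : level) {
                sb.append('[');
                for (char k : n.keys) sb.append(k);
                sb.append(']');
                next.addAll(n.children);
            }
            sb.append('\n');
            level = next;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BTreeInsertDemo tree = new BTreeInsertDemo();
        for (char c : "CNGAHEKQMFWLTZDPRXYS".toCharArray()) tree.insert(c);
        System.out.print(tree.render());
    }
}
```

After all 20 letters, the sketch renders three levels: `[M]`, then `[DG][QT]`, then `[AC][EF][HKL][NP][RS][WXYZ]`, which agrees with the sequence of splits listed in the example.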
B-Tree Deletion • Locate and delete the item, then restructure the tree to regain its invariants. • There are two special cases to consider when deleting an element: 1. The element in an internal node may be a separator for its child nodes. 2. Deleting an element may put its node under the minimum number of elements and children.
B-Tree Deletion • Search for the value to delete • If the value is in an internal node, choose a new separator (either the largest element in the left subtree or the smallest element in the right subtree), remove it from the leaf node it is in, and replace the element to be deleted with the new separator (for the leaf node with an element deleted, same as case below) • If the value is in a leaf node, it can simply be deleted from the node, perhaps leaving the node with too few elements; so some additional changes to the tree will be required
B-Tree Deletion Additional changes -- Rebalancing after deletion
• If the right sibling has more than the minimum number of elements: borrow one and adjust the separator.
• Otherwise, if the left sibling has more than the minimum number of elements: borrow one and adjust the separator.
• If both immediate siblings have only the minimum number of elements:
  * Create a new node with all the elements from the deficient node, all the elements from one of its siblings, and the separator in the parent between the two combined sibling nodes.
  * Remove the separator from the parent, and replace the two children it separated with the combined node.
  * If that brings the number of elements in the parent under the minimum, repeat these steps with that deficient node, unless it is the root, since the root is permitted to be deficient. If the root is left with no elements at all, the combined node becomes the new root.
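The three rebalancing cases can be sketched for the simplest situation: a parent whose children are all leaves, in a tree of order 5 (minimum 2 keys per node). The representation (a list of separator keys plus a list of leaf key lists) and the names `LeafRebalance` and `rebalance` are assumptions made for this illustration. The sketch assumes the parent has at least two children and does not handle a parent that becomes deficient itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LeafRebalance {
    static final int MIN_KEYS = 2;   // order 5: ceil(5/2) - 1

    /**
     * Repairs leaf i after a deletion left it with fewer than MIN_KEYS keys.
     * parentKeys.get(j) is the separator between leaves.get(j) and leaves.get(j + 1).
     */
    static void rebalance(List<Character> parentKeys, List<List<Character>> leaves, int i) {
        List<Character> node = leaves.get(i);
        if (node.size() >= MIN_KEYS) return;             // nothing to repair
        if (i + 1 < leaves.size() && leaves.get(i + 1).size() > MIN_KEYS) {
            // Borrow from the right: separator moves down, sibling's smallest key moves up.
            node.add(parentKeys.get(i));
            parentKeys.set(i, leaves.get(i + 1).remove(0));
        } else if (i > 0 && leaves.get(i - 1).size() > MIN_KEYS) {
            // Borrow from the left: separator moves down, sibling's largest key moves up.
            List<Character> left = leaves.get(i - 1);
            node.add(0, parentKeys.get(i - 1));
            parentKeys.set(i - 1, left.remove(left.size() - 1));
        } else {
            // Merge with a sibling (the left one when it exists) and pull the separator down.
            int l = (i > 0) ? i - 1 : i;
            List<Character> merged = leaves.get(l);
            merged.add(parentKeys.remove(l));
            merged.addAll(leaves.remove(l + 1));
        }
    }

    public static void main(String[] args) {
        List<Character> parent = new ArrayList<>(Arrays.asList('Q', 'W'));
        List<List<Character>> leaves = new ArrayList<>();
        leaves.add(new ArrayList<>(Arrays.asList('N', 'P')));
        leaves.add(new ArrayList<>(Arrays.asList('S')));          // deficient leaf
        leaves.add(new ArrayList<>(Arrays.asList('X', 'Y', 'Z')));
        rebalance(parent, leaves, 1);
        System.out.println(parent + " " + leaves);
    }
}
```

In the `main` method the deficient leaf borrows from its right sibling: W moves down next to S and X moves up into the parent. Preferring the left sibling when merging is a choice made here; the rule only requires one of the two siblings.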
B-Tree Deletion Example Delete H
Deletion Example -- Continued • Delete T (internal node, select the smallest element from the right subtree to replace T)
Deletion Example -- Continued • Delete R (leaf node, needs rebalancing after the deletion) • Borrow a key from the right sibling and adjust the separator: move W down to join S, and move X up to the parent
Deletion Example -- Continued • Delete E (leaf node, needs rebalancing after the deletion) • The left and right siblings both have only the minimum number of keys • Create a new node: combine the left sibling, the separator from the parent, and the deficient node
Deletion Example -- Continued • Continue rebalancing: the parent is now deficient • Its sibling has only the minimum number of keys • Create a new node: combine the deficient node with the separator from the parent and the right sibling
2-3 B-Trees (or simply 2-3 trees) Properties • ternary tree: 3 or fewer children per node • each node is either a 2-node or a 3-node (named for its subtree count) • 2-nodes contain 1 value and 3-nodes contain 2 values in sorted order • the BST property holds for the node content and the left, mid, and right subtrees • all leaves are on the same level
B-Tree • A B-tree is kept balanced by requiring that all external nodes are at the same depth. This depth will increase slowly as elements are added to the tree, but an increase in the overall depth is infrequent, and results in all leaf nodes being one more node further away from the root. • B-trees have substantial advantages over alternative implementations when node access times far exceed access times within nodes. This usually occurs when most nodes are in secondary storage such as hard drives. By maximizing the number of child nodes within each internal node, the height of the tree decreases, balancing occurs less often, and efficiency increases. Usually this value is set such that each node takes up a full disk block or an analogous size in secondary storage. • 2-3 B-trees: useful in main memory
2-3 Tree Implementation
public class TwoThreeTree<Content> {
    private boolean is2node;
    private Content smallContent;
    private Content bigContent;
    private TwoThreeTree<Content> left;
    private TwoThreeTree<Content> mid;
    private TwoThreeTree<Content> right;
    private TwoThreeTree<Content> parent;
    ...
}
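A search over nodes of this shape might look like the sketch below. It uses a simplified stand-in class `Node` with the same field names, and it assumes that a 2-node uses only `left` and `right` (with `mid` and `bigContent` unused) and that the content type is `Comparable`. The slides do not state either point, so both are assumptions.

```java
final class TwoThreeSearch {
    /** Simplified stand-in for the slide's class; field names follow the slide. */
    static final class Node<C extends Comparable<C>> {
        boolean is2node;
        C smallContent;
        C bigContent;            // assumed unused in a 2-node
        Node<C> left;
        Node<C> mid;             // assumed unused in a 2-node
        Node<C> right;
    }

    /** Descends from the given node, picking left, mid, or right by comparison. */
    static <C extends Comparable<C>> boolean contains(Node<C> node, C target) {
        while (node != null) {
            int cmpSmall = target.compareTo(node.smallContent);
            if (cmpSmall == 0) return true;
            if (cmpSmall < 0) { node = node.left; continue; }
            if (node.is2node) { node = node.right; continue; }
            int cmpBig = target.compareTo(node.bigContent);
            if (cmpBig == 0) return true;
            node = (cmpBig < 0) ? node.mid : node.right;
        }
        return false;            // fell off a leaf without a match
    }
}
```

Because all leaves are on the same level, the loop runs at most once per level of the tree.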
Ways to improve a B-tree • keep all values in the leaves • form a linked list of leaf nodes • Result: the B+-Tree • How do these modifications change the performance of a search? Of an insertion or removal?
B+ Tree • The B+ tree is a variant of the B-tree in which all records are stored at the leaf level of the tree; only keys are stored in interior nodes. • A B-tree can store both keys and records in its interior nodes; in this sense, the B+ tree is a specialization of the B-tree.
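One consequence of keeping all values in linked leaves is that a range query only needs one descent to the starting leaf, followed by a walk along the leaf list. The sketch below shows that walk in isolation. The `Leaf` class, its `next` pointer, and the `range` method are hypothetical names, and the descent through the interior nodes is assumed to have already found the starting leaf.

```java
import java.util.ArrayList;
import java.util.List;

final class BPlusLeafScan {
    /** Hypothetical leaf: sorted keys plus a pointer to the next leaf to the right. */
    static final class Leaf {
        final int[] keys;
        Leaf next;
        Leaf(int... keys) { this.keys = keys; }
    }

    /** Collects all keys in [lo, hi], starting at the leaf where lo would be located. */
    static List<Integer> range(Leaf start, int lo, int hi) {
        List<Integer> out = new ArrayList<>();
        for (Leaf leaf = start; leaf != null; leaf = leaf.next) {
            for (int k : leaf.keys) {
                if (k > hi) return out;      // keys are sorted across leaves, so stop here
                if (k >= lo) out.add(k);
            }
        }
        return out;
    }
}
```

In a plain B-tree the same query would need an in-order traversal that moves up and down between levels, since keys inside the range can sit in interior nodes.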