Storage CMSC 461 Michael Wilson
Database storage • At some point, database information must be stored on disk in some format • It’s impractical to keep hundreds of thousands or millions of rows in memory at once • There are numerous ways we could accomplish this • We have to take a few things into consideration
Storage concerns • Insertion efficiency • With large amounts of data, inserting new records becomes increasingly costly, and how costly depends on how we insert • Retrieval efficiency • Similarly, the more data there is to search, the slower lookups become • Space • Make sure our data structure doesn’t take up an excessive amount of disk space
Storage structures • Arrays? • Hash map?
B-tree • Generalization of a binary search tree (BST) • A node can have more than two children • Non-leaf nodes have several keys • The keys define the bounds of the node’s children • For non-leaf nodes, num keys = num children - 1 • Each key in a node is paired with a value • All leaves must be at the same depth
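The structural rules on this slide can be checked mechanically. The sketch below is a minimal Python illustration, assuming a hypothetical `Node` class that stores parallel `keys` and `values` lists plus a `children` list (empty for leaves); the names `leaf_depths` and `has_btree_shape` are also illustrative. It checks only shape (ordering, key/child counts, leaf depth), not the size limits introduced on the next slide.

```python
from dataclasses import dataclass, field
from typing import Any, List, Set


@dataclass
class Node:
    """Hypothetical B-tree node: sorted keys, one value per key,
    and (for non-leaf nodes) one more child than keys."""
    keys: List[int] = field(default_factory=list)
    values: List[Any] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def leaf_depths(node: Node, depth: int = 0) -> Set[int]:
    """Collect the depth of every leaf below `node`."""
    if node.is_leaf:
        return {depth}
    depths: Set[int] = set()
    for child in node.children:
        depths |= leaf_depths(child, depth + 1)
    return depths


def has_btree_shape(root: Node) -> bool:
    """Check the structural rules from the slide (not the size limits)."""
    def ok(n: Node) -> bool:
        if len(n.keys) != len(n.values):
            return False  # every key is paired with a value
        if n.keys != sorted(n.keys):
            return False  # keys inside a node are ordered
        if n.is_leaf:
            return True
        if len(n.children) != len(n.keys) + 1:
            return False  # num keys = num children - 1
        return all(ok(c) for c in n.children)

    return ok(root) and len(leaf_depths(root)) == 1  # all leaves at one depth
```

For example, a root with keys 10 and 20 over three single-key leaves passes, while a tree whose leaves sit at depths 1 and 2 fails the last check.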
B-tree • The maximum number of children a node can have is the order of the tree (Knuth’s definition) • Every node except the root must hold at least some minimum number of keys • The maximum number of keys is typically chosen to be twice the minimum • This helps with balancing: a full node plus one new key splits into two nodes that each hold exactly the minimum • A node holding fewer keys than the minimum is said to underflow
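To make the size rules concrete, here is a minimal Python sketch assuming a hypothetical `min_keys` parameter, with the maximum set to twice the minimum as on the slide. The helper names (`limits`, `underflows`, `sizes_ok`, `halves_after_split`) are illustrative, and the root exemption follows the usual B-tree convention. `halves_after_split` shows the arithmetic behind "this helps with balancing": a full node of 2·min keys plus one new key splits around its median into two halves of exactly min keys each.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical B-tree node: sorted keys plus children (empty for leaves)."""
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def limits(min_keys: int) -> Tuple[int, int]:
    """Return (max_keys, order) when the maximum is twice the minimum."""
    max_keys = 2 * min_keys
    order = max_keys + 1  # Knuth order: maximum number of children
    return max_keys, order


def underflows(node: Node, min_keys: int, is_root: bool = False) -> bool:
    """A non-root node underflows when it holds fewer than min_keys keys."""
    return not is_root and len(node.keys) < min_keys


def sizes_ok(node: Node, min_keys: int, is_root: bool = True) -> bool:
    """Check every node's key count against the limits, recursively."""
    max_keys, _ = limits(min_keys)
    if len(node.keys) > max_keys or underflows(node, min_keys, is_root):
        return False
    return all(sizes_ok(c, min_keys, is_root=False) for c in node.children)


def halves_after_split(min_keys: int) -> Tuple[int, int]:
    """Key counts of the two halves when a full node receives one more key."""
    overflow = 2 * min_keys + 1   # a full node plus the new key
    left = overflow // 2          # keys below the median
    right = overflow - left - 1   # keys above the median (median moves up)
    return left, right
```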
B-tree • Example: a non-leaf node with 3 children • The node has keys k1 and k2 such that k1 < k2 • All keys less than k1 are in the child to the left of k1 • All keys between k1 and k2 are in the child between k1 and k2 • All keys greater than k2 are in the child to the right of k2
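The routing rule above is what makes search work: at each node, compare the target against the sorted keys to pick which child to descend into. A minimal Python sketch, assuming a hypothetical `Node` with `keys` and `children` lists and distinct keys; `bisect.bisect_left` from the standard library returns the number of keys smaller than the target, which is exactly the index of the child to follow.

```python
import bisect
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical B-tree node: sorted keys plus children (empty for leaves)."""
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def child_index(keys: List[int], key: int) -> int:
    """Index of the child whose range contains `key`:
    0 if key < k1, 1 if k1 < key < k2, 2 if key > k2, and so on."""
    return bisect.bisect_left(keys, key)


def search(node: Node, key: int) -> bool:
    """Walk down from `node`, returning True if `key` is stored in the tree."""
    while True:
        i = child_index(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True           # found in this node
        if not node.children:
            return False          # reached a leaf without finding it
        node = node.children[i]   # descend into the child bounded by the keys
```

With keys k1 = 10 and k2 = 20, a search for 15 goes to the middle child, and a search for 25 goes to the rightmost child.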
Insertion • Search from the root down to the leaf where the new key belongs • If the leaf isn’t full, no problem: insert the key in its proper place (keys stay ordered) • If the leaf is full, we need to split it
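The easy case of insertion can be sketched as follows, assuming a hypothetical `Node` class, distinct keys, and an illustrative capacity parameter `max_keys`. `find_leaf` follows the same routing rule as search, and `insert_into_leaf` uses the standard library's `bisect.insort` to keep keys ordered. It reports `False` instead of splitting, leaving the full-node case to the next step.

```python
import bisect
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical B-tree node: sorted keys plus children (empty for leaves)."""
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def find_leaf(node: Node, key: int) -> Node:
    """Descend from `node` to the leaf whose key range contains `key`."""
    while node.children:
        node = node.children[bisect.bisect_left(node.keys, key)]
    return node


def insert_into_leaf(leaf: Node, key: int, max_keys: int) -> bool:
    """Insert `key` in sorted position if the leaf has room.

    Returns False, leaving the leaf unchanged, when the leaf is full
    and must be split instead.
    """
    if len(leaf.keys) >= max_keys:
        return False
    bisect.insort(leaf.keys, key)  # keeps the keys ordered
    return True
```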
Splitting • A node splits when we try to insert a value into it and it is already full • Take the node’s keys together with the new value and pick the median of that list • Remove the median and store it in a value x • Make two new nodes from the remaining keys • Left node: all values less than x • Right node: all values greater than x • Insert x into the parent node and attach the two new nodes on either side of it • If the parent is now over capacity, split it the same way
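A split can be written as two small steps. The sketch below assumes a hypothetical `Node` class and a child that has already received the new key, so it holds one key too many. `split` cuts the key list around the median x, and for an internal node it also divides the children. `split_child` then inserts x into the parent at the position of the old child and puts the two halves on either side of it. Both function names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical B-tree node: sorted keys plus children (empty for leaves)."""
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def split(node: Node) -> Tuple[Node, int, Node]:
    """Split an overfull node around its median key x.

    Returns (left, x, right): left holds keys < x, right holds keys > x.
    For leaves the children slices are simply empty.
    """
    mid = len(node.keys) // 2
    x = node.keys[mid]
    left = Node(node.keys[:mid], node.children[:mid + 1])
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    return left, x, right


def split_child(parent: Node, i: int) -> None:
    """Split parent.children[i] and push its median up into the parent."""
    left, x, right = split(parent.children[i])
    parent.keys.insert(i, x)               # x becomes a new separator
    parent.children[i:i + 1] = [left, right]
```

With a maximum of four keys, a leaf that has grown to [10, 20, 30, 40, 45] splits into [10, 20] and [40, 45], and 30 moves up into the parent.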
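Splitting note • When the median is inserted into the parent node, the two new child nodes stay at the same level as the node they replaced • If the root splits, its median becomes a new root, so a B-tree only grows in height at the root (which keeps all leaves at the same depth)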
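Putting insertion and splitting together shows why height only grows at the root. This is a minimal recursive sketch, assuming a hypothetical `Node` class, distinct keys, and an illustrative capacity of two keys per node (`MAX_KEYS`, so splits happen quickly). A split propagates upward by returning `(left, x, right)` to the caller, and only `insert`, when the root itself splits, creates a new level.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

MAX_KEYS = 2  # hypothetical small capacity (min 1, max 2) for illustration


@dataclass
class Node:
    """Hypothetical B-tree node: sorted keys plus children (empty for leaves)."""
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def _insert(node: Node, key: int) -> Optional[Tuple[Node, int, Node]]:
    """Insert below `node`; return (left, median, right) if `node` split."""
    i = bisect.bisect_left(node.keys, key)
    if node.children:
        result = _insert(node.children[i], key)
        if result is None:
            return None
        left, x, right = result            # child split: absorb its median
        node.keys.insert(i, x)
        node.children[i:i + 1] = [left, right]
    else:
        node.keys.insert(i, key)           # leaf: insert in order
    if len(node.keys) <= MAX_KEYS:
        return None
    mid = len(node.keys) // 2              # overfull: split around the median
    return (Node(node.keys[:mid], node.children[:mid + 1]),
            node.keys[mid],
            Node(node.keys[mid + 1:], node.children[mid + 1:]))


def insert(root: Node, key: int) -> Node:
    """Insert `key` and return the (possibly new) root."""
    result = _insert(root, key)
    if result is None:
        return root
    left, x, right = result
    return Node([x], [left, right])        # the only place height grows


def height(node: Node) -> int:
    """Number of edges from `node` down to its leftmost leaf."""
    h = 0
    while node.children:
        node, h = node.children[0], h + 1
    return h


def inorder(node: Node) -> List[int]:
    """All keys in sorted order."""
    if not node.children:
        return list(node.keys)
    out: List[int] = []
    for i, child in enumerate(node.children):
        out += inorder(child)
        if i < len(node.keys):
            out.append(node.keys[i])
    return out
```

Inserting 1 through 10 in order triggers several leaf splits. Height increases only on the inserts of 3 and 7, when the split reaches the root.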
Deletion • Deletion is more complicated than insertion • Two cases • Deleting from a leaf node • Deleting a value from a leaf • Deleting from an internal node • Deleting a separator value
Deleting from a leaf node • If removing the value leaves the node with at least the minimum number of keys, simply delete it • Otherwise, the node becomes deficient (it underflows) • We must do work to rebalance the tree
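The leaf case can be sketched in a few lines. This assumes the leaf's sorted key list is passed directly and uses illustrative names (`delete_from_leaf`, `min_keys`). The function always removes the key and reports whether the leaf is now deficient, which tells the caller whether rebalancing is needed. The root is exempt from the minimum, as in the usual convention.

```python
import bisect
from typing import List


def delete_from_leaf(keys: List[int], key: int, min_keys: int,
                     is_root: bool = False) -> bool:
    """Remove `key` from a leaf's sorted key list.

    Returns True if the leaf is now deficient (fewer than min_keys keys)
    and the tree must be rebalanced; the root is exempt from the minimum.
    """
    i = bisect.bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        raise KeyError(key)
    del keys[i]
    return not is_root and len(keys) < min_keys
```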
Rotation (stealing from your siblings!) • You may remember rotations from red-black trees • The idea is similar here, but not quite the same • If a deficient node has a right sibling with keys to spare (more than the minimum), rotate left • If a deficient node has a left sibling with keys to spare, rotate right
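The decision on this slide, together with the fallback on the "Third case" slide, fits in one small function. The sketch assumes a hypothetical `Node` class, a deficient node found at `parent.children[i]`, and the slide's order of preference (right sibling first). The returned strings are illustrative labels, not part of any API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical B-tree node: sorted keys plus children (empty for leaves)."""
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def choose_fix(parent: Node, i: int, min_keys: int) -> str:
    """Pick how to repair the deficient node parent.children[i]."""
    right = parent.children[i + 1] if i + 1 < len(parent.children) else None
    left = parent.children[i - 1] if i > 0 else None
    if right is not None and len(right.keys) > min_keys:
        return "rotate_left"   # right sibling has a key to spare
    if left is not None and len(left.keys) > min_keys:
        return "rotate_right"  # left sibling has a key to spare
    return "merge"             # neither sibling can spare a key
```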
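Rotating left • Rotate left • Copy the separator between the deficient node and its right sibling to the end of the deficient node • Replace the separator with the lowest value from the right sibling, removing that value from the sibling • In internal nodes, the right sibling’s leftmost child also moves over to become the deficient node’s last child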
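A left rotation as a minimal Python sketch, assuming a hypothetical `Node` class, a deficient node at `parent.children[i]`, and a right sibling with a key to spare. The separator `parent.keys[i]` moves down to the end of the deficient node, the sibling's lowest key moves up to replace it, and for internal nodes the sibling's leftmost child moves over as well, so every subtree stays between its bounding keys.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical B-tree node: sorted keys plus children (empty for leaves)."""
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def rotate_left(parent: Node, i: int) -> None:
    """Refill deficient parent.children[i] from its right sibling."""
    node, right = parent.children[i], parent.children[i + 1]
    node.keys.append(parent.keys[i])        # separator moves down to the end
    parent.keys[i] = right.keys.pop(0)      # sibling's lowest key moves up
    if right.children:                      # internal nodes: leftmost child follows
        node.children.append(right.children.pop(0))
```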
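Rotating right • Rotate right • Copy the separator between the deficient node and its left sibling to the start of the deficient node • Replace the separator with the highest value from the left sibling, removing that value from the sibling • In internal nodes, the left sibling’s rightmost child also moves over to become the deficient node’s first child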
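The mirror image of the left rotation, under the same assumptions (hypothetical `Node` class, deficient node at `parent.children[i]`, and a left sibling with a key to spare). The separator `parent.keys[i - 1]` is smaller than everything in the deficient node, so it goes to the front. The left sibling's highest key, which is the largest key still below the separator, replaces it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical B-tree node: sorted keys plus children (empty for leaves)."""
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def rotate_right(parent: Node, i: int) -> None:
    """Refill deficient parent.children[i] from its left sibling."""
    left, node = parent.children[i - 1], parent.children[i]
    node.keys.insert(0, parent.keys[i - 1])  # separator moves down to the front
    parent.keys[i - 1] = left.keys.pop()     # sibling's highest key moves up
    if left.children:                        # internal nodes: rightmost child follows
        node.children.insert(0, left.children.pop())
```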
Third case • What if neither sibling has keys to spare? • Third case: • We merge the deficient node with one of its siblings • Pick a sibling (either one!) • Doesn’t matter which • Refer to the two nodes as the left node and the right node
Merging siblings (stealing from your parents!) • Copy the separator between the two nodes from the parent to the end of the left node • Move all elements (and, in internal nodes, all children) from the right node to the end of the left node • Remove the separator from the parent and remove the right node • If the parent was the root and it now has no elements, the merged node becomes the new root • If the parent is now underflowing, rebalance it the same way (rotation or merge) one level up
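A merge sketched in Python, assuming a hypothetical `Node` class where `parent.children[i]` and `parent.children[i + 1]` are the left and right nodes being merged. The separator comes down from the parent between the two key lists, and the right node's keys and children are appended to the left node. `collapse_root` handles the case where the root is left empty. The caller is expected to check the parent for underflow afterwards. Both function names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical B-tree node: sorted keys plus children (empty for leaves)."""
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def merge(parent: Node, i: int) -> Node:
    """Merge parent.children[i+1] into parent.children[i]; return the merged node."""
    left, right = parent.children[i], parent.children[i + 1]
    left.keys.append(parent.keys.pop(i))   # separator comes down from the parent
    left.keys.extend(right.keys)           # then every key of the right node
    left.children.extend(right.children)   # and, for internal nodes, its children
    del parent.children[i + 1]             # the right node disappears
    return left


def collapse_root(root: Node) -> Node:
    """If a merge emptied the root, its only child becomes the new root."""
    if not root.keys and root.children:
        return root.children[0]
    return root
```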
Deleting from an internal node (stealing from children!) • This reduces to the leaf case • The value to be deleted is a separator • Replace the separator with the highest value in its left subtree or the lowest value in its right subtree; that value always sits in a leaf, so delete it from the leaf it was taken from • If that leaf now underflows, rebalance as above
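A sketch of the predecessor variant, assuming a hypothetical `Node` class and a separator at `node.keys[idx]`. The highest value in the left subtree is found by following rightmost children down to a leaf. It overwrites the separator and is removed from that leaf. The function returns the leaf so the caller can check it for underflow. Using the lowest value in the right subtree instead is symmetric.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical B-tree node: sorted keys plus children (empty for leaves)."""
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def take_predecessor(node: Node, idx: int) -> Node:
    """Replace separator node.keys[idx] with its in-order predecessor.

    The predecessor is the highest key in the left subtree, found by
    following rightmost children to a leaf. It is removed from that leaf,
    which is returned so the caller can rebalance it if it underflows.
    """
    leaf = node.children[idx]
    while leaf.children:
        leaf = leaf.children[-1]
    node.keys[idx] = leaf.keys.pop()
    return leaf
```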