
Storage

Storage. CMSC 461, Michael Wilson. Database storage: at some point, database information must be stored in some format. It'd be impossible to store hundreds of thousands or millions of rows in memory. There are numerous ways we could accomplish this, and we have to take a few things into consideration.


Presentation Transcript


  1. Storage CMSC 461 Michael Wilson

  2. Database storage • At some point, database information must be stored in some format • It’d be impossible to store hundreds of thousands/millions of rows in memory • Numerous ways we could accomplish this • We have to take a few things into consideration

  3. Storage concerns • Insertion efficiency • With large amounts of data, the cost of each insert becomes more and more of a problem, depending on how you insert • Retrieval efficiency • Similarly, the more data there is to search, the slower lookups become unless the structure supports efficient search • Space • Make sure our data structure doesn't take up a large amount of disk space

  4. Storage structures • Arrays? • Hash map?

  5. B-tree • Generalization of a binary search tree (BST) • A node can have more than two children • Non-leaf nodes have several keys • Each key separates the key ranges of the children on either side of it • num keys = num children - 1 • Nodes contain keys, and each key is paired with a value • All leaves must be at the same depth

  6. B-tree • The maximum number of children a node can have is the order of the tree (Knuth's definition) • Every node except the root must hold at least a minimum number of keys • Typically choose the maximum number of keys to be twice the minimum number • This helps with balancing • A node holding fewer keys than the minimum is said to underflow
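The structural rules on the last two slides can be written down as a small checker. This is only a sketch: the `Node` class, the `check` function, and the choice `MIN_KEYS = 2` are hypothetical names and values picked for illustration, and values paired with keys are left out to keep it short.

```python
from dataclasses import dataclass, field

MIN_KEYS = 2              # assumed minimum, chosen only for this example
MAX_KEYS = 2 * MIN_KEYS   # maximum is twice the minimum


@dataclass
class Node:
    keys: list = field(default_factory=list)      # kept in sorted order
    children: list = field(default_factory=list)  # empty list means leaf


def require(cond, msg):
    if not cond:
        raise ValueError(msg)


def check(node, is_root=True):
    """Return the depth of the leaves under node, or raise ValueError."""
    require(node.keys == sorted(node.keys), "keys out of order")
    require(len(node.keys) <= MAX_KEYS, "too many keys")
    if not is_root:
        require(len(node.keys) >= MIN_KEYS, "underflow")
    if not node.children:
        return 0
    # num keys = num children - 1
    require(len(node.children) == len(node.keys) + 1, "wrong child count")
    depths = {check(child, is_root=False) for child in node.children}
    require(len(depths) == 1, "leaves at different depths")
    return depths.pop() + 1
```

The checker does not verify that each child's keys fall between its neighbouring separators; that would be a natural extension.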

  7. B-tree • Non-leaf node with 3 children • Non-leaf node has keys k1 and k2 such that k1 < k2 • All keys less than k1 will be in the child to the left of k1 • All keys in between k1 and k2 are in the child between k1 and k2 • All keys greater than k2 are in the child to the right of k2
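The way keys bound the children is what makes lookup work. A minimal sketch of a search, assuming a hypothetical `Node` with a sorted `keys` list and a `children` list (empty for leaves):

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list = field(default_factory=list)      # kept in sorted order
    children: list = field(default_factory=list)  # empty list means leaf


def search(node, key):
    """Return True if key is stored somewhere under node."""
    while True:
        # Index of the first key >= the search key.
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:
            return False
        # children[i] holds everything between keys[i - 1] and keys[i].
        node = node.children[i]
```

With keys k1 and k2 in a node, `i` is 0 for keys below k1, 1 for keys between k1 and k2, and 2 for keys above k2, which matches the three children described above.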

  8. B-tree example

  9. Insertion • Insert into the most appropriate leaf • If the node isn’t full, no problem – insert in the proper order (ordered keys) • If the node is full, we need to split

  10. Splitting • A node splits when we try to insert a value into it and it is full • Take the list of values from the full node, including the new value, and pick the median of that list • Remove it and store it in a value x • Make two new leaf nodes from the remaining list • Left node: all values less than x • Right node: all values greater than x • Insert x into the parent node of the two new nodes and attach them appropriately
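The split steps can be sketched as follows. The `Node` class and `split_child` are hypothetical names, and the sketch assumes the new value has already been placed in the child, so the child temporarily holds one key too many.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list = field(default_factory=list)      # kept in sorted order
    children: list = field(default_factory=list)  # empty list means leaf


def split_child(parent, i):
    """Split the overfull node parent.children[i] around its median."""
    child = parent.children[i]
    mid = len(child.keys) // 2
    x = child.keys[mid]                       # the median moves up
    # For a leaf both child slices are empty, so the same code covers
    # leaf and non-leaf nodes.
    left = Node(child.keys[:mid], child.children[:mid + 1])
    right = Node(child.keys[mid + 1:], child.children[mid + 1:])
    parent.keys.insert(i, x)                  # x becomes a new separator
    parent.children[i:i + 1] = [left, right]  # attach both halves
```

For example, splitting a child holding 1, 2, 3, 4, 5 under a parent with the single key 10 moves 3 into the parent and leaves children holding 1, 2 and 4, 5.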

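  11. Splitting note • When inserting into the parent node, the two new child nodes stay at the same level • If the parent is full as well, it splits in the same way • A B-tree only grows in height at the root: when the root itself splits, a new root is created above it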
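Putting insertion and splitting together shows why the height only changes at the root. The following is a rough sketch, not a reference implementation: `Node`, `insert`, `_insert`, and `MAX_KEYS = 4` are assumptions made for the example, and a node is split after it has received one key too many.

```python
from bisect import bisect_left
from dataclasses import dataclass, field

MAX_KEYS = 4  # assumed capacity, chosen only for this example


@dataclass
class Node:
    keys: list = field(default_factory=list)      # kept in sorted order
    children: list = field(default_factory=list)  # empty list means leaf


def _insert(node, key):
    """Insert key under node; return (median, right_half) if node split."""
    i = bisect_left(node.keys, key)
    if not node.children:
        node.keys.insert(i, key)
    else:
        promoted = _insert(node.children[i], key)
        if promoted is None:
            return None
        x, right = promoted
        node.keys.insert(i, x)              # median from the child moves up
        node.children.insert(i + 1, right)  # new sibling, same level
    if len(node.keys) <= MAX_KEYS:
        return None
    mid = len(node.keys) // 2
    x = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return x, right


def insert(root, key):
    """Insert key and return the (possibly new) root."""
    promoted = _insert(root, key)
    if promoted is None:
        return root
    x, right = promoted
    # The only place where the tree gets taller.
    return Node([x], [root, right])
```

Calling `root = insert(root, k)` for k from 1 to 10, starting from an empty `Node()`, ends with a root holding 3 and 6 over three leaves.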

  12. Deletion • Deletion is more complicated • Two cases • Deleting from a leaf node • Deleting values from a leaf • Deleting from an internal node • Deleting a separator value

  13. Deleting from a leaf node • If the value can be deleted and the node will not underflow, then delete it • Otherwise, the node is deficient • We must do work to rebalance the tree

  14. Rotation (stealing from your siblings!) • You may remember this from red-black trees • Similar, but not quite the same here • If a deficient node has a right sibling and it has keys to spare, rotate left • If a deficient node has a left sibling and it has keys to spare, rotate right
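The choice between the two rotations can be sketched as a small decision function. `Node`, `choose_fix`, and the default `min_keys=2` are hypothetical; "keys to spare" is taken to mean holding more than the minimum, and merging is the fallback when neither sibling can give up a key.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list = field(default_factory=list)      # kept in sorted order
    children: list = field(default_factory=list)  # empty list means leaf


def choose_fix(parent, i, min_keys=2):
    """Name the repair for the deficient node parent.children[i]."""
    siblings = parent.children
    # Right sibling exists and can spare a key.
    if i + 1 < len(siblings) and len(siblings[i + 1].keys) > min_keys:
        return "rotate_left"
    # Left sibling exists and can spare a key.
    if i > 0 and len(siblings[i - 1].keys) > min_keys:
        return "rotate_right"
    return "merge"
```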

  15. Rotating left • Rotate left • Copy the separator between the deficient node and its right sibling to the end of the deficient node • Replace the separator with the lowest value from the right sibling, removing that value from the sibling
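A left rotation might look like the sketch below. `Node` and `rotate_left` are hypothetical names; the last two lines, which move a child pointer along with the key, only matter for non-leaf nodes and are an addition the slide does not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list = field(default_factory=list)      # kept in sorted order
    children: list = field(default_factory=list)  # empty list means leaf


def rotate_left(parent, i):
    """Refill deficient parent.children[i] from its right sibling."""
    node, right = parent.children[i], parent.children[i + 1]
    # Separator moves down to the end of the deficient node.
    node.keys.append(parent.keys[i])
    # Lowest key of the right sibling moves up as the new separator.
    parent.keys[i] = right.keys.pop(0)
    if right.children:
        # Non-leaf case: the sibling's first subtree follows the key.
        node.children.append(right.children.pop(0))
```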

  16. Rotating right • Rotate right • Copy the separator between the deficient node and its left sibling to the start of the deficient node • Replace the separator with the highest value from the left sibling, removing that value from the sibling
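A right rotation is the mirror image of a left rotation. In this sketch (hypothetical `Node` and `rotate_right`), the separator is placed at the front of the deficient node and the left sibling gives up its highest key, since that is the arrangement that keeps every key in sorted position.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list = field(default_factory=list)      # kept in sorted order
    children: list = field(default_factory=list)  # empty list means leaf


def rotate_right(parent, i):
    """Refill deficient parent.children[i] from its left sibling."""
    node, left = parent.children[i], parent.children[i - 1]
    # Separator moves down to the front of the deficient node.
    node.keys.insert(0, parent.keys[i - 1])
    # Highest key of the left sibling moves up as the new separator.
    parent.keys[i - 1] = left.keys.pop()
    if left.children:
        # Non-leaf case: the sibling's last subtree follows the key.
        node.children.insert(0, left.children.pop())
```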

  17. Third case • What if neither sibling has keys to spare? • Third case: • We merge two siblings together • Pick a sibling (any sibling!) • Doesn’t matter which • Refer to them as the left node and right node

  18. Merging siblings (stealing from your parents!) • Copy the separator between the two nodes from the parent to the end of the left node • Move all elements from the right node to the end of the left node • Remove the separator from the parent and remove the right node • If the parent was the root and it now has no elements, replace the root with the merged node • If the parent is now underflowing, rebalance it in the same way, by rotating or merging with its own siblings
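The merge steps can be sketched like this. `Node` and `merge` are hypothetical names; replacing an emptied root and rebalancing an underflowing parent are left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list = field(default_factory=list)      # kept in sorted order
    children: list = field(default_factory=list)  # empty list means leaf


def merge(parent, i):
    """Merge parent.children[i + 1] into parent.children[i]."""
    left, right = parent.children[i], parent.children[i + 1]
    # Separator leaves the parent and lands at the end of the left node.
    left.keys.append(parent.keys.pop(i))
    # Everything from the right node follows it.
    left.keys.extend(right.keys)
    left.children.extend(right.children)
    # The right node is no longer needed.
    del parent.children[i + 1]
    return left
```

A caller holding the root could then write `if not root.keys: root = merged` to shrink the tree by one level.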

  19. Deleting from an internal node (stealing from children!) • This is pretty simple • The value to be deleted is a separator • Pull the highest value from the left child's subtree or the lowest value from the right child's subtree and replace the separator, deleting it from the leaf it was taken from • If that leaf now underflows, rebalance it as in the leaf case
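One way to sketch this is to take the replacement from the left side. `Node` and `delete_separator` are hypothetical names; the walk down the rightmost children is there because, in a tree more than two levels tall, the highest value on the left lives in a leaf rather than in the immediate child.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list = field(default_factory=list)      # kept in sorted order
    children: list = field(default_factory=list)  # empty list means leaf


def delete_separator(node, i):
    """Overwrite separator node.keys[i] with the highest value to its left.

    Returns the leaf that gave up a key so the caller can check it
    for underflow.
    """
    leaf = node.children[i]
    while leaf.children:
        leaf = leaf.children[-1]   # keep going right to reach the maximum
    node.keys[i] = leaf.keys.pop()
    return leaf
```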
