120 likes | 325 Views
Database indices. Database Systems manage very large amounts of data. Examples: student database for NWU Social Security database To facilitate queries, we create indices.
E N D
Database indices • Database Systems manage very large amounts of data. • Examples: • student database for NWU • Social Security database • To facilitate queries, we create indices. • An index is any data structure that takes as input a property (e.g. a value for a specific field), called the search key, and quickly finds all records with that property. • A database may have several indices (based on different keys) • Examples: • An index to search for students by id and another index to search by name.
Database indices • The actual records (and the index) typically do not fit in memory. • Secondary (or tertiary) storage must be used. • Disk operations are very time consuming, so we would like to limit them. • Since CPU operations are much faster (one disk access equals several million CPU instructions), we would be willing to do any sort of preprocessing that may reduce disk I/O.
Database indices • Idea: • Expand the BST idea to create a multi-way search tree: • Instead of a long, thin tree with at most 2 children per node, create a short, wide tree with many children per node. • Each node will then need to have children - 1keys. • Try to maintain the tree balanced and as full as possible. • Finding the correct branch to follow requires several comparisons (CPU operations) and leads to few disk accesses.
B-trees • The most common data structure used for database indices is the B-tree. • A B-tree of order m is an m-way tree where • All leaves are on the same level • All internal nodes except the root have k-1 keys and k children wherem/2 k m • The root is either a leaf or has between 2 and m children.
B-trees B-tree of order 3 keys greater than 19 and less than 51 keys smaller than 19 keys greater than 51 19 51 9 32 46 60 2 6 11 14 21 30 33 35 48 57 80
B-trees: Insert Generalization of the search method for a BST Find the appropriate leaf If has room insert key else // overflow! split See next slide for details on the split operation.
B-trees: Insert How to handle an overflow at a leaf. Pick the middle key of the leaf. Split the leaf in two, each part containing half of the elements. If the leaf has a parent p insert to the parentmake p the parent of the two pieces check whether the parent overflows if yes, repeat the splitting process for the parent. else // the leaf was the root create a new root and insert to itmake it the parent of the two pieces
B-trees: Delete Find the element to be deleted. If it is not a leaf, replace it with its immediate successor and delete the successor instead. else delete the element. Check for underflow (too few children) guaranteed to be in a leaf See next slide for more details.
B-trees: Delete How to handle an underflow at a node n. Check whether any of the siblings can afford to lose children. If yes,transfer a child move a child from the sibling to n move a key from n's parent to n move a key from the sibling to the parent. else merge n with a sibling move a key from the parent to the merged node check the parent for underflow. The sibling loses a child so it must lose a key and n gains a child so it must gain a key. The parent loses a child so it must lose a key.
B-trees • B-trees grow when a new root is created as a result of an insert operation. • B-trees shrink when the root has only two children and they merge as a result of a delete operation. This will cause the root's only key to move down and the root to become empty.
B*-trees • Variant of the B-tree. • Each node must be at least 2/3 full • Overflow is handled mainly by redistributing keys among siblings. If all siblings are full, then 2 nodes are split into 3.