370 likes | 507 Views
CSCE 310 / 603 Database Systems. Chapter 14: Index Structures. Deletion from B-tree. (a) Simple case - no example (b) Coalesce with neighbor (sibling) (c) Re-distribute keys (d) Cases (b) or (c) at non-leaf. n=4. (b) Coalesce with sibling Delete 50. 10 40 100. 10 20 30. 40 50.
E N D
CSCE 310 / 603 Database Systems Chapter 14: Index Structures
Deletion from B-tree (a) Simple case - no example (b) Coalesce with neighbor (sibling) (c) Re-distribute keys (d) Cases (b) or (c) at non-leaf
n=4 (b) Coalesce with sibling • Delete 50 10 40 100 10 20 30 40 50
40 n=4 (b) Coalesce with sibling • Delete 50 10 40 100 10 20 30 40 50
n=4 (c) Redistribute keys • Delete 50 10 40 100 10 20 30 35 40 50
35 35 n=4 (c) Redistribute keys • Delete 50 10 40 100 10 20 30 35 40 50
(d) Non-leaf coalese • Delete 37 n=4 25 10 20 30 40 25 26 30 37 1 3 10 14 20 22 40 45
30 (d) Non-leaf coalese • Delete 37 n=4 25 10 20 30 40 25 26 30 37 1 3 10 14 20 22 40 45
40 30 (d) Non-leaf coalese • Delete 37 n=4 25 10 20 30 40 25 26 30 37 1 3 10 14 20 22 40 45
new root 40 25 30 (d) Non-leaf coalese • Delete 37 n=4 25 10 20 30 40 25 26 30 37 1 3 10 14 20 22 40 45
B-tree deletions in practice • Often, coalescing is not implemented • Too hard and not worth it!
Applications of B-Trees • B-tree is used to implement indexes • The data record pointers in the leaves correspond to the data record pointers in sequential indexes • Some example uses: • B-tree search key is primary key for data file, leaf pointers form a dense index on the file • B-tree search key is primary key for data file, leaf pointers form a sparse index on the file • B-tree search key is not primary key, leaf pointers form a dense index on the file
B-Trees with Duplicate Keys Change definition of B-tree: • If key K appears in an internal node, then K is the smallest "new" key in the subtree S rooted at the pointer that follows K in the node • "New" means K does not appear in the part of the B-tree to the left of S but it does appear in S • Allow null key in certain situations
Example B-Tree with Duplicates 17 -- 37 43 7 2 3 5 23 23 43 47 13 17 23 23 37 41 7 13
Lookup in B-Trees • Assume no duplicate keys. • Assume B-tree is a dense index. • To find the record with key K, search starting at the root and ending at a leaf: • if current node is not a leaf and has keys K1, K2, …, Kn, find the greatest key, Ki, in the sequence that is ≤ K. • follow the (i+1)-st pointer to a node at the next level and repeat • when a leaf node is reached, find the key with value K and follow the associated pointer to the data record
Range Queries with B-Trees • Range query: a query in which a range of values is sought. Examples: • SELECT * FROM R WHERE R.k > 40; • SELECT * FROM R WHERE R.k >= 10 AND R.k <= 25; • To find all keys in the range [a,b]: • Do a lookup on a: leads to leaf where a could be • Search the leaf for all keys ≥ a • If we find a key > b, we are done • Else follow next-leaf pointer and continue searching in the next leaf • Continue until finding a key > b or no more leaves
Efficiency of B-Trees • B-trees allow lookup, insertion and deletion of records with very few disk I/Os • Number of disk I/Os is number of levels in the B-tree plus cost of any reorganization • If n is at least 10, then splitting/merging blocks will be rare and usually limited to the leaves • For typical sizes of keys, pointers, blocks and files, 3 levels suffice (see next slide) • Also can keep root block of B-tree in memory
Size of B-Tree • Assume • 4096 bytes per block • 4 bytes per key (e.g., integer) • 8 bytes per pointer • no header info in the block • Then n = 340 (can keep n keys and n+1 pointers in a block) • Assume on average a block has 255 pointers • Count: • one node at level 1 (the root) • 255 nodes at level 2 • 255*255 = 65,025 nodes at level 3 (leaves) • each leaf has 255 pointers, so total number of records is more than 16 million
Yet More on Indexes Hash Tables
Main Memory Hash Tables • A hash functionh maps search keys to integers in some range 0 to B-1 • B is the number of buckets • There is a B-element array, each entry holds a pointer to a linked list • Record with key k is put in the linked list that starts at entry h(k) of B.
Changes for Secondary Storage • Bucket array contains blocks, not pointers to linked lists • Records that hash to a certain bucket are put in the corresponding block • If a bucket overflows then start a chain of overflow blocks
Insertion into Static Hash Table • To insert a record with key K: • compute h(K) • insert record into one of the blocks in the chain of blocks for bucket number h(K), adding a new block to the chain if necessary
EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 0 1 2 3
d a c b EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 0 1 2 3
d a c b EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 0 1 2 3 h(e) = 1
d a e c b EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 0 1 2 3 h(e) = 1
Deletion from a Static Hash Table • To delete records with key K: • Go to the bucket numbered h(K) • Search for records with key K, deleting any that are found • Possibly condense the chain of overflow blocks for that bucket
EXAMPLE: deletion Delete:ef a 0 1 2 3 b d c e f g
maybe move “g” up EXAMPLE: deletion Delete:ef a 0 1 2 3 b d c e f g
maybe move “g” up EXAMPLE: deletion Delete:ef a 0 1 2 3 b d c c e f g
d maybe move “g” up EXAMPLE: deletion Delete:ef a 0 1 2 3 b d c c e f g
If < 50%, wasting space • If > 80%, overflows significant depends on how good hash function is & on # keys/bucket Rule of thumb: • Try to keep space utilization between 50% and 80% Utilization = # keys used total # keys that fit
Efficiency of Static Hash Tables • If the hash table size is large enough and the distribution of keys by the hash function is sufficiently "even", then most buckets have no overflow blocks • In this case lookup typically takes one disk I/O and insertion/deletion take two • Significantly better than sequential indexes and B-trees • (But: hash tables do not support efficient range queries as B-trees do) • What if there are long overflow blocks?
Comparing Index Approaches • Hashing good for probes given key e.g., SELECT … FROM R WHERE R.A = 5
Indexing vs Hashing • Sequential Indexes and B-trees good for Range Searches: e.g., SELECT FROM R WHERE R.A > 5
Index definition in SQL • Createindex name on rel (attr) • Createuniqueindex name on rel (attr) defines candidate key • Drop INDEX name
CANNOT SPECIFY TYPE OF INDEX (e.g. B-tree, Hashing, …) OR PARAMETERS (e.g. Load Factor, Size of Hash,...) ... at least in SQL... Note