1 / 37

CSCE 310 / 603 Database Systems

CSCE 310 / 603 Database Systems. Chapter 14: Index Structures. Deletion from B-tree. (a) Simple case - no example (b) Coalesce with neighbor (sibling) (c) Re-distribute keys (d) Cases (b) or (c) at non-leaf. n=4. (b) Coalesce with sibling Delete 50. 10 40 100. 10 20 30. 40 50.

argyle
Download Presentation

CSCE 310 / 603 Database Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CSCE 310 / 603 Database Systems Chapter 14: Index Structures

  2. Deletion from B-tree (a) Simple case - no example (b) Coalesce with neighbor (sibling) (c) Re-distribute keys (d) Cases (b) or (c) at non-leaf

  3. n=4 (b) Coalesce with sibling • Delete 50 10 40 100 10 20 30 40 50

  4. 40 n=4 (b) Coalesce with sibling • Delete 50 10 40 100 10 20 30 40 50

  5. n=4 (c) Redistribute keys • Delete 50 10 40 100 10 20 30 35 40 50

  6. 35 35 n=4 (c) Redistribute keys • Delete 50 10 40 100 10 20 30 35 40 50

  7. (d) Non-leaf coalese • Delete 37 n=4 25 10 20 30 40 25 26 30 37 1 3 10 14 20 22 40 45

  8. 30 (d) Non-leaf coalese • Delete 37 n=4 25 10 20 30 40 25 26 30 37 1 3 10 14 20 22 40 45

  9. 40 30 (d) Non-leaf coalese • Delete 37 n=4 25 10 20 30 40 25 26 30 37 1 3 10 14 20 22 40 45

  10. new root 40 25 30 (d) Non-leaf coalese • Delete 37 n=4 25 10 20 30 40 25 26 30 37 1 3 10 14 20 22 40 45

  11. B-tree deletions in practice • Often, coalescing is not implemented • Too hard and not worth it!

  12. Applications of B-Trees • B-tree is used to implement indexes • The data record pointers in the leaves correspond to the data record pointers in sequential indexes • Some example uses: • B-tree search key is primary key for data file, leaf pointers form a dense index on the file • B-tree search key is primary key for data file, leaf pointers form a sparse index on the file • B-tree search key is not primary key, leaf pointers form a dense index on the file

  13. B-Trees with Duplicate Keys Change definition of B-tree: • If key K appears in an internal node, then K is the smallest "new" key in the subtree S rooted at the pointer that follows K in the node • "New" means K does not appear in the part of the B-tree to the left of S but it does appear in S • Allow null key in certain situations

  14. Example B-Tree with Duplicates 17 -- 37 43 7 2 3 5 23 23 43 47 13 17 23 23 37 41 7 13

  15. Lookup in B-Trees • Assume no duplicate keys. • Assume B-tree is a dense index. • To find the record with key K, search starting at the root and ending at a leaf: • if current node is not a leaf and has keys K1, K2, …, Kn, find the greatest key, Ki, in the sequence that is ≤ K. • follow the (i+1)-st pointer to a node at the next level and repeat • when a leaf node is reached, find the key with value K and follow the associated pointer to the data record

  16. Range Queries with B-Trees • Range query: a query in which a range of values is sought. Examples: • SELECT * FROM R WHERE R.k > 40; • SELECT * FROM R WHERE R.k >= 10 AND R.k <= 25; • To find all keys in the range [a,b]: • Do a lookup on a: leads to leaf where a could be • Search the leaf for all keys ≥ a • If we find a key > b, we are done • Else follow next-leaf pointer and continue searching in the next leaf • Continue until finding a key > b or no more leaves

  17. Efficiency of B-Trees • B-trees allow lookup, insertion and deletion of records with very few disk I/Os • Number of disk I/Os is number of levels in the B-tree plus cost of any reorganization • If n is at least 10, then splitting/merging blocks will be rare and usually limited to the leaves • For typical sizes of keys, pointers, blocks and files, 3 levels suffice (see next slide) • Also can keep root block of B-tree in memory

  18. Size of B-Tree • Assume • 4096 bytes per block • 4 bytes per key (e.g., integer) • 8 bytes per pointer • no header info in the block • Then n = 340 (can keep n keys and n+1 pointers in a block) • Assume on average a block has 255 pointers • Count: • one node at level 1 (the root) • 255 nodes at level 2 • 255*255 = 65,025 nodes at level 3 (leaves) • each leaf has 255 pointers, so total number of records is more than 16 million

  19. Yet More on Indexes Hash Tables

  20. Main Memory Hash Tables • A hash functionh maps search keys to integers in some range 0 to B-1 • B is the number of buckets • There is a B-element array, each entry holds a pointer to a linked list • Record with key k is put in the linked list that starts at entry h(k) of B.

  21. Changes for Secondary Storage • Bucket array contains blocks, not pointers to linked lists • Records that hash to a certain bucket are put in the corresponding block • If a bucket overflows then start a chain of overflow blocks

  22. Insertion into Static Hash Table • To insert a record with key K: • compute h(K) • insert record into one of the blocks in the chain of blocks for bucket number h(K), adding a new block to the chain if necessary

  23. EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 0 1 2 3

  24. d a c b EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 0 1 2 3

  25. d a c b EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 0 1 2 3 h(e) = 1

  26. d a e c b EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 0 1 2 3 h(e) = 1

  27. Deletion from a Static Hash Table • To delete records with key K: • Go to the bucket numbered h(K) • Search for records with key K, deleting any that are found • Possibly condense the chain of overflow blocks for that bucket

  28. EXAMPLE: deletion Delete:ef a 0 1 2 3 b d c e f g

  29. maybe move “g” up EXAMPLE: deletion Delete:ef a 0 1 2 3 b d c e f g

  30. maybe move “g” up EXAMPLE: deletion Delete:ef a 0 1 2 3 b d c c e f g

  31. d maybe move “g” up EXAMPLE: deletion Delete:ef a 0 1 2 3 b d c c e f g

  32. If < 50%, wasting space • If > 80%, overflows significant depends on how good hash function is & on # keys/bucket Rule of thumb: • Try to keep space utilization between 50% and 80% Utilization = # keys used total # keys that fit

  33. Efficiency of Static Hash Tables • If the hash table size is large enough and the distribution of keys by the hash function is sufficiently "even", then most buckets have no overflow blocks • In this case lookup typically takes one disk I/O and insertion/deletion take two • Significantly better than sequential indexes and B-trees • (But: hash tables do not support efficient range queries as B-trees do) • What if there are long overflow blocks?

  34. Comparing Index Approaches • Hashing good for probes given key e.g., SELECT … FROM R WHERE R.A = 5

  35. Indexing vs Hashing • Sequential Indexes and B-trees good for Range Searches: e.g., SELECT FROM R WHERE R.A > 5

  36. Index definition in SQL • Createindex name on rel (attr) • Createuniqueindex name on rel (attr) defines candidate key • Drop INDEX name

  37. CANNOT SPECIFY TYPE OF INDEX (e.g. B-tree, Hashing, …) OR PARAMETERS (e.g. Load Factor, Size of Hash,...) ... at least in SQL... Note

More Related