CSCE 310 / 603 Database Systems

CSCE 310 / 603 Database Systems Chapter 14: Index Structures

Deletion from B-tree (a) Simple case - no example (b) Coalesce with neighbor (sibling) (c) Re-distribute keys (d) Cases (b) or (c) at non-leaf

n=4 (b) Coalesce with sibling • Delete 50 10 40 100 10 20 30 40 50

40 n=4 (b) Coalesce with sibling • Delete 50 10 40 100 10 20 30 40 50

n=4 (c) Redistribute keys • Delete 50 10 40 100 10 20 30 35 40 50

35 35 n=4 (c) Redistribute keys • Delete 50 10 40 100 10 20 30 35 40 50

(d) Non-leaf coalese • Delete 37 n=4 25 10 20 30 40 25 26 30 37 1 3 10 14 20 22 40 45

30 (d) Non-leaf coalese • Delete 37 n=4 25 10 20 30 40 25 26 30 37 1 3 10 14 20 22 40 45

40 30 (d) Non-leaf coalese • Delete 37 n=4 25 10 20 30 40 25 26 30 37 1 3 10 14 20 22 40 45

new root 40 25 30 (d) Non-leaf coalese • Delete 37 n=4 25 10 20 30 40 25 26 30 37 1 3 10 14 20 22 40 45

B-tree deletions in practice • Often, coalescing is not implemented • Too hard and not worth it!

Applications of B-Trees • B-tree is used to implement indexes • The data record pointers in the leaves correspond to the data record pointers in sequential indexes • Some example uses: • B-tree search key is primary key for data file, leaf pointers form a dense index on the file • B-tree search key is primary key for data file, leaf pointers form a sparse index on the file • B-tree search key is not primary key, leaf pointers form a dense index on the file

B-Trees with Duplicate Keys Change definition of B-tree: • If key K appears in an internal node, then K is the smallest "new" key in the subtree S rooted at the pointer that follows K in the node • "New" means K does not appear in the part of the B-tree to the left of S but it does appear in S • Allow null key in certain situations

Example B-Tree with Duplicates 17 -- 37 43 7 2 3 5 23 23 43 47 13 17 23 23 37 41 7 13

Lookup in B-Trees • Assume no duplicate keys. • Assume B-tree is a dense index. • To find the record with key K, search starting at the root and ending at a leaf: • if current node is not a leaf and has keys K1, K2, …, Kn, find the greatest key, Ki, in the sequence that is ≤ K. • follow the (i+1)-st pointer to a node at the next level and repeat • when a leaf node is reached, find the key with value K and follow the associated pointer to the data record

Range Queries with B-Trees • Range query: a query in which a range of values is sought. Examples: • SELECT * FROM R WHERE R.k > 40; • SELECT * FROM R WHERE R.k >= 10 AND R.k <= 25; • To find all keys in the range [a,b]: • Do a lookup on a: leads to leaf where a could be • Search the leaf for all keys ≥ a • If we find a key > b, we are done • Else follow next-leaf pointer and continue searching in the next leaf • Continue until finding a key > b or no more leaves

Efficiency of B-Trees • B-trees allow lookup, insertion and deletion of records with very few disk I/Os • Number of disk I/Os is number of levels in the B-tree plus cost of any reorganization • If n is at least 10, then splitting/merging blocks will be rare and usually limited to the leaves • For typical sizes of keys, pointers, blocks and files, 3 levels suffice (see next slide) • Also can keep root block of B-tree in memory

Size of B-Tree • Assume • 4096 bytes per block • 4 bytes per key (e.g., integer) • 8 bytes per pointer • no header info in the block • Then n = 340 (can keep n keys and n+1 pointers in a block) • Assume on average a block has 255 pointers • Count: • one node at level 1 (the root) • 255 nodes at level 2 • 255*255 = 65,025 nodes at level 3 (leaves) • each leaf has 255 pointers, so total number of records is more than 16 million

Yet More on Indexes Hash Tables

Main Memory Hash Tables • A hash functionh maps search keys to integers in some range 0 to B-1 • B is the number of buckets • There is a B-element array, each entry holds a pointer to a linked list • Record with key k is put in the linked list that starts at entry h(k) of B.

Changes for Secondary Storage • Bucket array contains blocks, not pointers to linked lists • Records that hash to a certain bucket are put in the corresponding block • If a bucket overflows then start a chain of overflow blocks

Insertion into Static Hash Table • To insert a record with key K: • compute h(K) • insert record into one of the blocks in the chain of blocks for bucket number h(K), adding a new block to the chain if necessary

EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 0 1 2 3

d a c b EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 0 1 2 3

d a c b EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 0 1 2 3 h(e) = 1

d a e c b EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 0 1 2 3 h(e) = 1

Deletion from a Static Hash Table • To delete records with key K: • Go to the bucket numbered h(K) • Search for records with key K, deleting any that are found • Possibly condense the chain of overflow blocks for that bucket

EXAMPLE: deletion Delete:ef a 0 1 2 3 b d c e f g

maybe move “g” up EXAMPLE: deletion Delete:ef a 0 1 2 3 b d c e f g

maybe move “g” up EXAMPLE: deletion Delete:ef a 0 1 2 3 b d c c e f g

d maybe move “g” up EXAMPLE: deletion Delete:ef a 0 1 2 3 b d c c e f g

If < 50%, wasting space • If > 80%, overflows significant depends on how good hash function is & on # keys/bucket Rule of thumb: • Try to keep space utilization between 50% and 80% Utilization = # keys used total # keys that fit

Efficiency of Static Hash Tables • If the hash table size is large enough and the distribution of keys by the hash function is sufficiently "even", then most buckets have no overflow blocks • In this case lookup typically takes one disk I/O and insertion/deletion take two • Significantly better than sequential indexes and B-trees • (But: hash tables do not support efficient range queries as B-trees do) • What if there are long overflow blocks?

Comparing Index Approaches • Hashing good for probes given key e.g., SELECT … FROM R WHERE R.A = 5

Indexing vs Hashing • Sequential Indexes and B-trees good for Range Searches: e.g., SELECT FROM R WHERE R.A > 5

Index definition in SQL • Createindex name on rel (attr) • Createuniqueindex name on rel (attr) defines candidate key • Drop INDEX name

CANNOT SPECIFY TYPE OF INDEX (e.g. B-tree, Hashing, …) OR PARAMETERS (e.g. Load Factor, Size of Hash,...) ... at least in SQL... Note

CSCE 310 / 603 Database Systems

CSCE 310 / 603 Database Systems

Presentation Transcript

Mobile Database Systems

Chapter 4 The Database Management System Concept

Real-Time Database Systems and Data Services: Issues and Challenges

70420403- Database Systems 数据库管理系统

CSCE 580 Artificial Intelligence Ch.4: Features and Constraints

Principles of Database Management Systems 8: Concurrency Control

Using Management Information Systems

CS 328 Database Systems Chapter 8: SQL

CS411 Database Systems

CSC321: Database Management Systems

CS411 Database Systems

ICOM 5016 – Introduction to Database Systems

COP 4710: Database Systems Fall 2013 Chapter 19 – Normalization

Database Management Systems

Chapter 5 Relational Database Management Systems and SQL

Storage Systems

CS 253: Topics in Database Systems: C2

Database Systems The Relational Data Model

Temple University – CIS Dept. CIS616– Principles of Database Systems

Introduction to Database Systems Queries in SQL