CSCE 310 / 603 Database Systems

CSCE 310 / 603 Database Systems Chapter 14: Index Structures

Indexes on Sequential Files

Effects of Actions on Index

Explanations for Actions • create/destroy empty overflow block has no effect on • dense index since it refers to records • sparse index since it refers to main records • create/destroy empty main block: • no effect on dense index as above • insert/delete entry in sparse index • insert/delete/slide record: • insert/delete/update entry in dense index • only change sparse index if affects first record in block

70 10 30 50 20 40 60 80 Deletion from sparse index 10 30 50 70 90 110 130 150

70 10 30 50 20 40 60 80 Deletion from sparse index • delete record 40 10 30 50 70 90 110 130 150

70 10 30 50 20 40 60 80 Deletion from sparse index • delete record 30 10 30 50 70 90 110 130 150

10 30 50 70 20 40 60 80 40 40 Deletion from sparse index • delete record 30 10 30 50 70 90 110 130 150

70 10 30 50 20 40 60 80 Deletion from sparse index • delete records 30 & 40 10 30 50 70 90 110 130 150

10 30 50 70 20 40 60 80 50 70 Deletion from sparse index • delete records 30 & 40 10 30 50 70 90 110 130 150

70 10 30 50 20 40 60 80 Deletion from dense index 10 20 30 40 50 60 70 80

70 10 50 20 80 60 Deletion from dense index • delete record 30 10 20 30 30 40 40 50 60 70 80

10 70 50 20 80 60 40 Deletion from dense index • delete record 30 10 20 30 30 40 40 50 60 70 80

10 70 50 20 60 80 40 40 Deletion from dense index • delete record 30 10 20 30 30 40 40 50 60 70 80

10 30 40 60 20 50 Insertion, sparse index case 10 30 40 60

10 30 40 60 20 50 Insertion, sparse index case • insert record 34 10 30 40 60

10 30 40 60 20 50 34 • our lucky day! • we have free space • where we need it! Insertion, sparse index case • insert record 34 10 30 40 60

10 30 40 60 20 50 15 20 20 30 Insertion, sparse index case • insert record 15 10 30 40 60

10 30 40 60 20 50 15 20 20 30 Insertion, sparse index case • insert record 15 10 30 40 60 • Illustrated: Immediate reorganization • Variation: • insert new block (chained file) • update index

10 30 40 60 25 20 50 overflow blocks (reorganize later...) Insertion, sparse index case • insert record 25 10 30 40 60

Insertion, dense index case • Similar • Often more expensive . . .

More on Indexes Secondary Indexes B-Trees

Secondary Indexes • Sometimes we want multiple indexes on a relation. • Ex: search Candies(name,manf) both by name and by manufacturer • Typically the file would be sorted using the key (ex: name) and the primary index would be on that field. • The secondary index is on any other attribute (ex: manf). • Secondary index also facilitates finding records, but cannot rely on them being sorted

Sparse Secondary Index? • No! • Since records are not sorted on that key, cannot predict the location of a record from the location of any other record. • Thus secondary indexes are always dense.

100 30 20 80 90 60 40 10 50 70 90 30 20 ... 80 100 Sequence field • Sparse index

30 20 80 100 90 40 70 10 60 50 90 30 ... 20 80 100 does not make sense! Sequence field • Sparse index

Design of Secondary Indexes • Always dense, usually with duplicates • Consists of key-pointer pairs ("key" means search key, not relation key) • Entries in index file are sorted by key • Therefore second-level index is sparse

90 30 20 80 100 50 70 40 10 60 50 10 10 20 50 60 30 70 90 40 ... ... sparse second- level Secondary indexes Sequence field dense first- level

Secondary Index and Duplicate Keys • Scheme in previous diagram wastes space in the present of duplicate keys • If a search key value appears n times in the data file, then there are n entries for it in the index.

20 20 10 10 30 10 40 40 40 40 Duplicate values & secondary indexes

30 20 20 10 10 10 40 40 40 40 40 10 20 10 30 40 10 40 ... 20 40 Duplicate values & secondary indexes one option...

30 20 20 10 10 10 40 40 40 40 40 10 20 10 30 40 10 40 ... 20 40 Duplicate values & secondary indexes one option... • Problem: • excess overhead! • disk space • search time

Buckets • To avoid repeating values, use a level of indirection • Put buckets between the secondary index file and the data file • One entry in index for each search key K; its pointer goes to a location in a "bucket file", called the bucket for K • Bucket holds pointers to all records with search key K

10 30 20 20 10 10 40 40 40 40 50 10 20 60 ... 30 40 Duplicate values & secondary indexes buckets

Why “bucket” idea is useful Indexes Records name: primary Emp (name,dept,floor,...) dept: secondary floor: secondary

Query: SELECT name FROM Emp WHERE dept = 'Toy' AND floor = 2

dept index Emp floor index Toy 2 Query: SELECT name FROM Emp WHERE dept = 'Toy' AND floor = 2

dept index Emp floor index Toy 2 Query: SELECT name FROM Emp WHERE dept = 'Toy' AND floor = 2 • Intersect Toy dept bucket and floor 2 bucket to get set of matching Emp’sSaves disk I/O's

Summary of Indexes So Far • Advantages: • simple • index is sequential file, good for scans • Disadvantages • either inserts are expensive • or lose sequentiality (cf. next slide) • Instead use B-tree data structure to implement index

10 Example Index (sequential) continuous free space 20 30 40 50 60 70 80 90

32 39 38 31 33 34 35 36 overflow area (not sequential) 10 Example Index (sequential) continuous free space 20 30 40 50 60 70 80 90

B-Trees • Several related data structures • Key features are: • automatically adjust number of levels of indexes as size of data file changes • storage on blocks is managed to keep every block between half full and full => no overflow blocks needed • We'll actually study B+ trees

B-Tree Structure • an example of a balanced search tree: every root-to-leaf path has same length • each node (vertex) in the tree is a block, which contains search keys and pointers • parameter n, which is largest value so that n+1 pointers and n keys fit in one block • Ex: If block size is 4096 bytes, keys are 4 bytes, and pointers are 8 bytes, then n = 340.

Constraints on B-Tree Nodes • Keys in leaf nodes are copies of keys from data file, in sorted order • Root contains between 2 and n+1 index node pointers • Each internal node contains between (n+1)/2 and n+1 index node pointers • Each non-leaf node consists of ptr1,key1,ptr2,key2,…,keym-1,ptrm where ptri points to index node with keys between keyi-1 and keyi

Constraints (cont'd) • Each leaf contains between (n+1)/2 and ndata record pointers, plus a "next leaf" pointer • Associated with each data record pointer is a key, and the pointer points to the data record with that key

CSCE 310 / 603 Database Systems

CSCE 310 / 603 Database Systems

Presentation Transcript

Chapter 8 Physical Database Design

Principles of Database Management Systems 4.2: Hashing Techniques

Mobile Database Systems

CS216 Advanced Database Systems

Database Systems: Design, Implementation, and Management Tenth Edition

Advanced Database Systems

COP5725 Advanced Database Systems

CSC 545: Database Systems

Introduction to Database Management Systems (DBMS)

Database II

Database Systems

Database Management Systems (DBMS)

CS216 Advanced Database Systems

CSCE 590E Spring 2007

CS 253: Topics in Database Systems: C2

CS216 Advanced Database Systems

Mobile Database Systems

Database Management Systems

Database Management Systems Session 9