690 likes | 854 Views
CSCE 310 / 603 Database Systems. Chapter 14: Index Structures. Indexes on Sequential Files. Effects of Actions on Index. Explanations for Actions. create/destroy empty overflow block has no effect on dense index since it refers to records sparse index since it refers to main records
E N D
CSCE 310 / 603 Database Systems Chapter 14: Index Structures
Explanations for Actions • create/destroy empty overflow block has no effect on • dense index since it refers to records • sparse index since it refers to main records • create/destroy empty main block: • no effect on dense index as above • insert/delete entry in sparse index • insert/delete/slide record: • insert/delete/update entry in dense index • only change sparse index if affects first record in block
70 10 30 50 20 40 60 80 Deletion from sparse index 10 30 50 70 90 110 130 150
70 10 30 50 20 40 60 80 Deletion from sparse index • delete record 40 10 30 50 70 90 110 130 150
70 10 30 50 20 40 60 80 Deletion from sparse index • delete record 40 10 30 50 70 90 110 130 150
70 10 30 50 20 40 60 80 Deletion from sparse index • delete record 30 10 30 50 70 90 110 130 150
10 30 50 70 20 40 60 80 40 40 Deletion from sparse index • delete record 30 10 30 50 70 90 110 130 150
70 10 30 50 20 40 60 80 Deletion from sparse index • delete records 30 & 40 10 30 50 70 90 110 130 150
70 10 30 50 20 40 60 80 Deletion from sparse index • delete records 30 & 40 10 30 50 70 90 110 130 150
10 30 50 70 20 40 60 80 50 70 Deletion from sparse index • delete records 30 & 40 10 30 50 70 90 110 130 150
70 10 30 50 20 40 60 80 Deletion from dense index 10 20 30 40 50 60 70 80
70 10 50 20 80 60 Deletion from dense index • delete record 30 10 20 30 30 40 40 50 60 70 80
10 70 50 20 80 60 40 Deletion from dense index • delete record 30 10 20 30 30 40 40 50 60 70 80
10 70 50 20 60 80 40 40 Deletion from dense index • delete record 30 10 20 30 30 40 40 50 60 70 80
10 30 40 60 20 50 Insertion, sparse index case 10 30 40 60
10 30 40 60 20 50 Insertion, sparse index case • insert record 34 10 30 40 60
10 30 40 60 20 50 34 • our lucky day! • we have free space • where we need it! Insertion, sparse index case • insert record 34 10 30 40 60
10 30 40 60 20 50 Insertion, sparse index case • insert record 15 10 30 40 60
10 30 40 60 20 50 15 20 20 30 Insertion, sparse index case • insert record 15 10 30 40 60
10 30 40 60 20 50 15 20 20 30 Insertion, sparse index case • insert record 15 10 30 40 60 • Illustrated: Immediate reorganization • Variation: • insert new block (chained file) • update index
10 30 40 60 20 50 Insertion, sparse index case • insert record 25 10 30 40 60
10 30 40 60 25 20 50 overflow blocks (reorganize later...) Insertion, sparse index case • insert record 25 10 30 40 60
10 30 40 60 25 20 50 overflow blocks (reorganize later...) Insertion, sparse index case • insert record 25 10 30 40 60
Insertion, dense index case • Similar • Often more expensive . . .
More on Indexes Secondary Indexes B-Trees
Secondary Indexes • Sometimes we want multiple indexes on a relation. • Ex: search Candies(name,manf) both by name and by manufacturer • Typically the file would be sorted using the key (ex: name) and the primary index would be on that field. • The secondary index is on any other attribute (ex: manf). • Secondary index also facilitates finding records, but cannot rely on them being sorted
Sparse Secondary Index? • No! • Since records are not sorted on that key, cannot predict the location of a record from the location of any other record. • Thus secondary indexes are always dense.
100 30 20 80 90 60 40 10 50 70 90 30 20 ... 80 100 Sequence field • Sparse index
30 20 80 100 90 40 70 10 60 50 90 30 ... 20 80 100 does not make sense! Sequence field • Sparse index
Design of Secondary Indexes • Always dense, usually with duplicates • Consists of key-pointer pairs ("key" means search key, not relation key) • Entries in index file are sorted by key • Therefore second-level index is sparse
90 30 20 80 100 50 70 40 10 60 50 10 10 20 50 60 30 70 90 40 ... ... sparse second- level Secondary indexes Sequence field dense first- level
Secondary Index and Duplicate Keys • Scheme in previous diagram wastes space in the present of duplicate keys • If a search key value appears n times in the data file, then there are n entries for it in the index.
20 20 10 10 30 10 40 40 40 40 Duplicate values & secondary indexes
30 20 20 10 10 10 40 40 40 40 40 10 20 10 30 40 10 40 ... 20 40 Duplicate values & secondary indexes one option...
30 20 20 10 10 10 40 40 40 40 40 10 20 10 30 40 10 40 ... 20 40 Duplicate values & secondary indexes one option... • Problem: • excess overhead! • disk space • search time
Buckets • To avoid repeating values, use a level of indirection • Put buckets between the secondary index file and the data file • One entry in index for each search key K; its pointer goes to a location in a "bucket file", called the bucket for K • Bucket holds pointers to all records with search key K
10 30 20 20 10 10 40 40 40 40 50 10 20 60 ... 30 40 Duplicate values & secondary indexes buckets
Why “bucket” idea is useful Indexes Records name: primary Emp (name,dept,floor,...) dept: secondary floor: secondary
Query: SELECT name FROM Emp WHERE dept = 'Toy' AND floor = 2
dept index Emp floor index Toy 2 Query: SELECT name FROM Emp WHERE dept = 'Toy' AND floor = 2
dept index Emp floor index Toy 2 Query: SELECT name FROM Emp WHERE dept = 'Toy' AND floor = 2 • Intersect Toy dept bucket and floor 2 bucket to get set of matching Emp’sSaves disk I/O's
Summary of Indexes So Far • Advantages: • simple • index is sequential file, good for scans • Disadvantages • either inserts are expensive • or lose sequentiality (cf. next slide) • Instead use B-tree data structure to implement index
10 Example Index (sequential) continuous free space 20 30 40 50 60 70 80 90
32 39 38 31 33 34 35 36 overflow area (not sequential) 10 Example Index (sequential) continuous free space 20 30 40 50 60 70 80 90
B-Trees • Several related data structures • Key features are: • automatically adjust number of levels of indexes as size of data file changes • storage on blocks is managed to keep every block between half full and full => no overflow blocks needed • We'll actually study B+ trees
B-Tree Structure • an example of a balanced search tree: every root-to-leaf path has same length • each node (vertex) in the tree is a block, which contains search keys and pointers • parameter n, which is largest value so that n+1 pointers and n keys fit in one block • Ex: If block size is 4096 bytes, keys are 4 bytes, and pointers are 8 bytes, then n = 340.
Constraints on B-Tree Nodes • Keys in leaf nodes are copies of keys from data file, in sorted order • Root contains between 2 and n+1 index node pointers • Each internal node contains between (n+1)/2 and n+1 index node pointers • Each non-leaf node consists of ptr1,key1,ptr2,key2,…,keym-1,ptrm where ptri points to index node with keys between keyi-1 and keyi
Constraints (cont'd) • Each leaf contains between (n+1)/2 and ndata record pointers, plus a "next leaf" pointer • Associated with each data record pointer is a key, and the pointer points to the data record with that key