CS4432: Database Systems II

CS4432: Database Systems II Basic indexing

Chapter 13 Indexing : helps to retrieve data quicker for certain queries value= 1,000,000 Select * FROM Emp WHERE salary = 1,000,000; record ? value

Topics • Sequential Index Files (chap 13.1) • Secondary Indexes (chap 13.2)

10 30 50 70 90 20 40 60 80 100 Sequential File

10 70 50 30 110 90 80 60 40 20 120 100 90 10 70 50 30 80 60 40 20 100 Sequential File Dense Index Every record is in index.

10 130 90 50 210 170 150 110 70 30 230 190 90 10 70 50 30 80 60 40 20 100 Sequential File Sparse Index Only first record per block in index.

Sequential File Sparse 2nd level 130 330 170 210 10 170 10 50 490 90 70 570 110 30 230 150 250 90 190 410 90 70 50 30 10 40 100 80 60 20

Note : DATA FILE or INDEX are “ordered files”. Question: How would we lay them out on disk ? - contiguous layout on disk ? - block-chained layout on disk ?

10 490 10 210 170 130 90 330 170 50 250 30 70 570 410 150 190 110 230 90 30 50 10 90 70 100 80 60 40 20 Questions: • Do we want to build a dense 2nd-level index for a dense index? • Can we even do this ? 1st level? Sequential File 2nd level?

Notes on pointers: • Block pointer (used in sparse index) can be smaller than record pointer (used in dense index) BP RP

R2 R1 R3 R4 Note : If file is contiguous, then we can omit pointers K1 say: 1024 B per block K2 K3 K4 • if we want K3 block: • get it at offset (3-1)*1024 = 2048 bytes

Sparse vs. Dense Tradeoff • Sparse: Less index space per record can keep more of index in memory (Later: sparse better for insertions) • Dense: Can tell if any record exists without accessing file (Later: dense needed for secondary indexes)

Terms • Index sequential file • Search key ( primary key) • Primary index (on sequencing field) • Secondary index • Dense index (contains all search key values) • Sparse index • Multi-level index

Next: • Duplicate keys • Deletion/Insertion • Secondary indexes

10 10 20 30 40 10 20 30 30 45 Duplicate keys

10 40 10 30 20 30 20 40 10 10 20 30 30 45 20 45 30 30 10 10 10 10 10 10 10 10 20 20 20 20 30 30 30 30 30 30 Duplicate keys Dense index ! Point to each value !

10 10 20 30 40 10 20 30 30 45 10 20 30 40 Duplicate keys Dense index. Point to each distinct value!

10 40 10 30 20 30 30 20 10 45 careful if looking for 20 or 30! Duplicate keys Sparse index: point to start of block ! 10 10 20 30

10 40 10 30 20 30 30 20 10 45 should this be 40? Duplicate keys Sparse index, another way ? • place first new key from block 10 20 30 30

Duplicate values, primary index Summary • Index may point to first instance of each value only File Index a a a . . b

70 10 30 50 20 40 60 80 Deletion from sparse index 10 30 50 70 90 110 130 150

70 10 30 50 20 40 60 80 Deletion from sparse index • delete record 40 10 30 50 70 90 110 130 150

10 30 50 70 20 40 60 80 40 40 Deletion from sparse index • delete record 30 10 30 50 70 90 110 130 150

10 30 50 70 20 40 60 80 50 70 Deletion from sparse index • delete records 30 & 40 10 30 50 70 90 110 130 150

10 30 50 70 80 20 40 60 Deletion from dense index 10 20 30 40 50 60 70 80 lecture #8

10 70 50 20 60 80 40 40 Deletion from dense index • delete record 30 10 20 30 30 40 40 50 60 70 80

10 30 40 60 20 50 Insertion, sparse index case 10 30 40 60

10 30 40 60 20 50 34 • our lucky day! • we have free space • where we need it! Insertion, sparse index case • insert record 34 10 30 40 60

10 30 40 60 20 50 15 20 20 30 Insertion, sparse index case • insert record 15 10 30 40 60 • Immediate reorganization • Other variations?

Just Illustrated: • Immediate reorganization • Now Variation: • insert new block (chained file)

10 30 40 60 25 20 50 Insertion, sparse index case • insert record 25 10 30 overflow blocks (reorganize later...) 40 60

Insertion, dense index case • Similar • Often more expensive . . .

30 20 80 100 90 50 70 40 10 60 Secondary indexes Sequence field Can I make a secondary index sparse ?

100 30 20 80 90 10 40 60 50 70 90 30 ... 20 80 100 Secondary indexes Sequence field • Sparse index does not make sense!

90 30 20 80 100 50 70 40 10 60 50 10 10 60 50 20 30 90 70 40 ... ... sparse high level allowed? Secondary indexes Sequence field • Must be dense index !

Also: Pointers are record pointers (not block pointers; not computed) With secondary indexes: • Lowest level is dense • Other levels are sparse

20 20 10 10 30 10 40 40 40 40 Duplicate values & secondary indexes

30 20 20 10 10 10 40 40 40 40 40 10 20 10 30 40 10 40 ... 20 40 Duplicate values & secondary indexes one option... • Problem: • excess overhead! • disk space • search time

10 20 20 10 30 40 40 40 10 40 20 30 40 Duplicate values & secondary indexes another option... 10 Problem: variable size records in index!

10 20 20 10 30 10 40 40 40 40 50 10 20 60 ... 30 40 Duplicate values & secondary indexes   Another idea :Chain records with same key !   • Problems: • Need to add fields to data records for each index • Need to follow chain to know records

Summary : Conventional Indexes • Basic Ideas: sparse, dense, multi-level… • Duplicate Keys • Deletion/Insertion • Secondary indexes

90 30 20 80 100 50 70 40 10 60 10 50 10 50 20 60 90 70 30 40 ... ... high Level (always sparse) Multi-level Index Structures Sequence field 1 2 3 4 5 first level (dense, if non- sequential)

Sequential indexes : pros/cons ? Advantage: - Simple - Index is sequential file good for scans - Search efficient for static data Disadvantage: - Inserts expensive, and/or - Lose sequentiality & balance - Then search time unpredictable

32 39 38 31 33 34 35 36 overflow area (not sequential) 10 20 30 Example Sequential Index continuous free space 40 50 60 70 80 90

Another type of index • Give up “sequentiality” of index • Predictable performance under updates • Achieve always balance of “tree” • Automate restructuring under updates

B+Tree Example n=3 100 Root 120 150 180 30 3 5 11 120 130 180 200 100 101 110 150 156 179 30 35

Sample non-leaf 57 81 95 to keys to keys to keys to keys < 57 57 k<81 81k<95 95

Sample leaf node: From non-leaf node to next leaf in sequence 57 81 95 To record with key 57 To record with key 81 To record with key 85

CS4432: Database Systems II