500 likes | 633 Views
CSCE 520 Test 2 Info Indexing. Modified from slides of Hector Garcia-Molina and Jeff Ullman. Physical Storage Media. Speed of data access Cost per unit of data Reliability Data loss (power failure or system crash) Physical failure (storage device) Storage types Volatile storage
E N D
CSCE 520 Test 2 InfoIndexing Modified from slides of Hector Garcia-Molina and Jeff Ullman
Physical Storage Media • Speed of data access • Cost per unit of data • Reliability • Data loss (power failure or system crash) • Physical failure (storage device) • Storage types • Volatile storage • Non-volatile storage
Memory Hierarchy Programs, Main Memory DBMS Tertiary Storage DBMS Disk Virtual Memory File System Main Memory Cache
Disk Access Characteristics • Move data to main memory: • Position head on cylinder • Find and access sector • Steps of reading a block: • Processor and disk controller processes the request • Seek time: position the head • Rotation latency: rotate the sector under the head • Transfer time: sector/block read by the head
Disk Access Characteristics • Steps of writing a block: • Read the block into the main memory • Change main memory copy of block • Write new content back on disk • Verify correctness of write
How to find records efficiently? • Primary key – sequential organization • Search key? • High I/O cost INDEXING
Cost of Indexing • Where the time spent on answering a query • Fast: processing in memory • Slow: fetching from secondary storage • Cost of indexing: • Index on several attributes: fast retrieval but slow writes (maintain index structure)
Topics • Conventional indexes • B-trees • Hashing schemes (read only)
10 30 50 70 90 20 40 60 80 100 Sequential File
70 50 30 10 110 90 20 40 100 80 120 60 70 50 30 10 90 80 60 40 20 100 Sequential File Dense Index
130 90 50 10 210 170 30 70 190 150 230 110 70 50 30 10 90 80 60 40 20 100 Sequential File Sparse Index
210 50 130 10 170 490 90 170 330 10 110 30 570 70 410 150 190 90 230 250 30 90 70 50 10 100 80 60 40 20 Sequential File Sparse 2nd level
Sparse vs. Dense Tradeoff • Sparse: Less index space per record can keep more of index in memory • Dense: Can tell if any record exists without accessing file
Terms • Index sequential file • Search key ( primary key) • Primary index (on Sequencing field) • Secondary index • Dense index (all Search Key values in) • Sparse index • Multi-level index
Next: • Duplicate keys • Deletion/Insertion • Secondary indexes
10 10 20 30 40 10 20 30 30 45 Duplicate keys
10 40 10 20 30 30 40 20 10 10 45 20 30 10 30 30 30 45 10 20 Duplicate keys Dense index, one way to implement? 10 10 10 10 10 10 20 20 20 20 30 30 30 30 30 30
10 10 20 30 40 10 20 30 30 45 Duplicate keys Dense index, better way? 10 20 30 40
10 40 10 30 20 30 30 20 10 45 careful if looking for 20 or 30! Duplicate keys Sparse index, one way? 10 10 20 30
10 40 10 30 20 30 30 20 10 45 should this be 40? Duplicate keys Sparse index, another way? • place first new key from block 10 20 30 30
Duplicate values, primary index Summary • Index may point to first instance of each value only File Index a a a . . b
70 10 30 50 20 40 60 80 Deletion from sparse index 10 30 50 70 90 110 130 150
70 10 30 50 20 40 60 80 Deletion from sparse index • delete record 40 10 30 50 70 90 110 130 150
10 30 50 70 20 40 60 80 40 40 Deletion from sparse index • delete record 30 10 30 50 70 90 110 130 150
10 30 50 70 20 40 60 80 50 70 Deletion from sparse index • delete records 30 & 40 10 30 50 70 90 110 130 150
70 10 30 50 20 40 60 80 Deletion from dense index 10 20 30 40 50 60 70 80
10 70 50 20 60 80 40 40 Deletion from dense index • delete record 30 10 20 30 30 40 40 50 60 70 80
10 30 40 60 20 50 Insertion, sparse index case 10 30 40 60
10 30 40 60 20 50 34 • our lucky day! • we have free space • where we need it! Insertion, sparse index case • insert record 34 10 30 40 60
10 30 40 60 20 50 15 20 20 30 Insertion, sparse index case • insert record 15 10 30 40 60 • Illustrated: Immediate reorganization • Variation: • insert new block (chained file) • update index
10 30 40 60 25 20 50 overflow blocks (reorganize later...) Insertion, sparse index case • insert record 25 10 30 40 60
Insertion, dense index case • Similar • Often more expensive . . .
Summary so far • Conventional index • Basic Ideas: sparse, dense, multi-level… • Duplicate Keys • Deletion/Insertion • Secondary indexes
Conventional indexes Advantage: - Simple - Index is sequential file good for scans Disadvantage: - Inserts expensive, and/or - Lose sequentiality & balance
NEXT: Another type of index • Give up on sequentiality of index • Try to get “balance”
B+Tree Example n=3 100 Root 120 150 180 30 3 5 11 120 130 180 200 100 101 110 150 156 179 30 35
Sample non-leaf 57 81 95 to keys to keys to keys to keys < 57 57 k<81 81k<95 95
Sample leaf node: From non-leaf node to next leaf in sequence 57 81 95 To record with key 57 To record with key 81 To record with key 85
Size of nodes: n+1 pointers n keys (fixed)
Don’t want nodes to be too empty • Use at least Non-leaf: (n+1)/2 pointers Leaf: (n+1)/2 pointers to data
n=3 Full node min. node Non-leaf Leaf 120 150 180 30 3 5 11 30 35 counts even if null
B+tree rules tree of order n (1) All leaves at same lowest level (balanced tree) (2) Pointers in leaves point to records except for “sequence pointer”
(3) Number of pointers/keys for B+tree Max Max Min Min ptrs keys ptrsdata keys Non-leaf (non-root) n+1 n (n+1)/2 (n+1)/2- 1 Leaf (non-root) n+1 n (n+1)/2 (n+1)/2 Root n+1 n 1 1
Insert into B+tree (read only) (a) simple case • space available in leaf (b) leaf overflow (c) non-leaf overflow (d) new root
32 n=3 100 (a) Insert key = 32 30 3 5 11 30 31
7 3 5 7 n=3 100 (a) Insert key = 7 30 3 5 11 30 31
Deletion from B+tree (a) Simple case - no example (b) Coalesce with neighbor (sibling) (c) Re-distribute keys (d) Cases (b) or (c) at non-leaf
40 n=4 (b) Coalesce with sibling • Delete 50 10 40 100 10 20 30 40 50
35 35 n=4 (c) Redistribute keys • Delete 50 10 40 100 10 20 30 35 40 50
B+tree deletions in practice • Often, coalescing is not implemented • Too hard and not worth it!