Index Structures 13.2 – Secondary Index
Aditya Govindaraju - 218
Secondary indexes
[Figure: a data file ordered by its sequence field; the key values being indexed (30, 20, 80, 100, 90, 50, 70, 40, 10, 60) are not sorted.]
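Secondary indexes
• Sparse index
[Figure: a sparse index with one entry per data block over the same file.]
Does not make sense! The file is not sorted on this key, so the key of a block's first record says nothing about the other keys stored in that block.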
Secondary indexes
• Dense index
[Figure: a dense lowest level listing every key value in sorted order, each entry pointing to its record in the unsorted file, with a sparse higher level above it.]
Also: pointers are record pointers (not block pointers; not computed).
With secondary indexes:
• Lowest level is dense
• Other levels are sparse
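A minimal Python sketch of this two-level structure, assuming a hypothetical in-memory file `FILE` split into blocks of two records, record pointers represented as (block, slot) pairs, and unique keys. The lowest level is dense and sorted; the level above is sparse, with one entry per index block. The helper names (`build_dense_level`, `build_sparse_level`, `lookup`) are illustrative only.

```python
from bisect import bisect_right

# Hypothetical data file: blocks of (key, payload) records.
# The file is NOT sorted on `key`, so only a dense index can locate records.
FILE = [
    [(30, "a"), (20, "b")],
    [(80, "c"), (100, "d")],
    [(90, "e"), (50, "f")],
    [(70, "g"), (40, "h")],
    [(10, "i"), (60, "j")],
]

def build_dense_level(blocks):
    """Lowest level: one (key, record pointer) entry per record, sorted by key.
    A record pointer is a (block_no, slot_no) pair, not a block pointer."""
    entries = [(rec[0], (b, s))
               for b, block in enumerate(blocks)
               for s, rec in enumerate(block)]
    entries.sort()
    return entries

def build_sparse_level(dense, entries_per_block):
    """Upper level: one entry per block of the dense level,
    holding (first key in that index block, index-block number)."""
    return [(dense[i][0], i // entries_per_block)
            for i in range(0, len(dense), entries_per_block)]

def lookup(key, dense, sparse, entries_per_block, blocks):
    """Return the record with `key`, or None (assumes unique keys)."""
    keys = [k for k, _ in sparse]
    pos = bisect_right(keys, key) - 1      # last sparse entry with key <= search key
    if pos < 0:
        return None
    start = sparse[pos][1] * entries_per_block
    for k, (b, s) in dense[start:start + entries_per_block]:
        if k == key:
            return blocks[b][s]            # follow the record pointer
    return None
```

With 4 dense entries per index block, `lookup(50, dense, sparse, 4, FILE)` checks the sparse level, scans one dense block, and follows a record pointer to `(50, "f")`.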
Duplicate values & secondary indexes
[Figure: a data file in which secondary-key values repeat in no sorted order; e.g. 10 and 40 each appear in several records.]
Duplicate values & secondary indexes
One option: a dense index with one entry per record.
[Figure: a dense index whose entries repeat each key value once for every record that holds it.]
• Problem: excess overhead!
  • disk space
  • search time
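A sketch of this first option, using a hypothetical list `KEYS` of secondary-key values in which a record pointer is just the record's position. The index stores one entry per record, so every duplicate costs an index entry, and a lookup has to locate the whole run of equal keys.

```python
from bisect import bisect_left, bisect_right

# Hypothetical unsorted file: KEYS[ptr] is the secondary key of record `ptr`.
KEYS = [20, 10, 20, 40, 10, 40, 10, 40, 30, 40]

# One (key, pointer) entry per record: duplicate key values are stored repeatedly.
index = sorted((k, ptr) for ptr, k in enumerate(KEYS))

def find_all(key):
    """Return the record pointers for `key` by locating its run of entries."""
    lo = bisect_left(index, (key,))                  # (key,) sorts before (key, ptr)
    hi = bisect_right(index, (key, float("inf")))    # past the last (key, ptr)
    return [ptr for _, ptr in index[lo:hi]]
```

Here `index` holds 10 entries for only 4 distinct key values, which is the disk-space overhead noted above; `find_all(40)` returns `[3, 5, 7, 9]`.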
Duplicate values & secondary indexes
[Figure: the data file with an extra field in each record linking it to the next record with the same key.]
Another idea (suggested in class): chain records with the same key?
• Problems:
  • Need to add fields to records
  • Need to follow the chain to find all records with a given key
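A sketch of the chaining idea, assuming a hypothetical `Record` type with an added `next_same_key` field and list positions as record pointers. The index keeps one entry per distinct key, pointing at the head of its chain. Both problems appear directly: the extra field in every record, and one record access per link when collecting matches.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Record:
    key: int
    payload: str
    next_same_key: Optional[int] = None   # the extra field this scheme adds

def build_chained_index(records):
    """One index entry per distinct key, pointing at the head of a chain
    that links every record with that key (pointers are list positions)."""
    head = {}
    # Walk backwards so each chain ends up in file order.
    for ptr in range(len(records) - 1, -1, -1):
        rec = records[ptr]
        rec.next_same_key = head.get(rec.key)
        head[rec.key] = ptr
    return head

def find_all(key, head, records):
    """Follow the chain: every matching record must be read to find the next."""
    out = []
    ptr = head.get(key)
    while ptr is not None:
        out.append(ptr)
        ptr = records[ptr].next_same_key
    return out
```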
Why the “bucket” idea is useful
Records: EMP (name, dept, floor, ...)
Indexes:
• Name: primary
• Dept: secondary
• Floor: secondary
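[Figure: the Dept. index and the Floor index, each pointing to buckets of record pointers into EMP; the Toy bucket and the 2nd-floor bucket are highlighted.]
Query: Get employees in (Toy Dept) ∧ (2nd floor)
Intersect the Toy bucket and the 2nd-floor bucket to get the set of matching EMPs.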
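A sketch of bucket-based secondary indexes and the intersection query, assuming a hypothetical `EMP` dictionary from record pointer to (name, dept, floor) tuples. Python sets stand in for the buckets of record pointers; `build_bucket_index` is an illustrative name.

```python
# Hypothetical EMP file: record pointer -> (name, dept, floor).
EMP = {
    0: ("Ann", "Toy", 2),
    1: ("Bob", "Shoe", 2),
    2: ("Cal", "Toy", 1),
    3: ("Dee", "Toy", 2),
    4: ("Eve", "Shoe", 1),
}
DEPT, FLOOR = 1, 2   # field positions within an EMP tuple

def build_bucket_index(records, field):
    """One entry per distinct value; its bucket holds the pointers
    of every record with that value."""
    index = {}
    for ptr, rec in records.items():
        index.setdefault(rec[field], set()).add(ptr)
    return index

dept_index = build_bucket_index(EMP, DEPT)
floor_index = build_bucket_index(EMP, FLOOR)

# (Toy Dept) AND (2nd floor): intersect the two buckets of pointers,
# then fetch only the records that survive.
matches = dept_index["Toy"] & floor_index[2]
names = sorted(EMP[ptr][0] for ptr in matches)
```

Only pointers are intersected; the EMP records themselves are fetched just for the survivors, here Ann and Dee.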
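Inverted lists
This idea is used in text information retrieval.
[Figure: inverted lists for the terms cat and dog, each pointing to the documents that contain the term.]
Documents:
...the cat is fat ...
...was raining cats and dogs...
...Fido the dog ...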
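A minimal sketch of building inverted lists over the three document fragments above, with hypothetical ids d1 to d3. The `normalize` step is a crude plural folding (drop a trailing "s" from words longer than three letters), assumed here only so that "cats" and "dogs" land in the cat and dog lists; it is not a real stemmer.

```python
import re
from collections import defaultdict

# Fragments from the slide; the ids d1..d3 are hypothetical.
DOCS = {
    "d1": "the cat is fat",
    "d2": "was raining cats and dogs",
    "d3": "Fido the dog",
}

def normalize(word):
    # Crude plural folding (an assumption, not a real stemmer):
    # drop a trailing "s" from words longer than three letters.
    return word[:-1] if len(word) > 3 and word.endswith("s") else word

def build_inverted_lists(docs):
    """Map each term to the sorted list of ids of documents containing it."""
    lists = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            lists[normalize(word)].add(doc_id)
    return {term: sorted(ids) for term, ids in lists.items()}

inverted = build_inverted_lists(DOCS)
```

A query for documents mentioning both cats and dogs is then an intersection of two short lists: `set(inverted["cat"]) & set(inverted["dog"])` gives `{"d2"}`.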
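Common technique: more info in inverted list
Each entry records the type of field where the term occurs, its position, and its location (a pointer to one of the documents d1, d2, d3):
• cat: (Title, 5), (Author, 10), (Abstract, 57)
• dog: (Title, 100), (Title, 12)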
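A sketch of postings that carry this extra information, assuming hypothetical documents split into typed fields and word positions counted across the whole document. Each `Posting` records the location (document id), the type (field name) and the position, so a query can, for example, restrict matches to titles.

```python
from collections import defaultdict
from typing import NamedTuple

class Posting(NamedTuple):
    doc: str        # location: which document
    field: str      # type: Title, Author, Abstract, ...
    position: int   # word offset within the document

# Hypothetical documents split into typed fields.
DOCS = {
    "d1": {"Title": "cat care", "Abstract": "a fat cat and a dog"},
    "d2": {"Title": "dog days", "Author": "cat stevens"},
}

def build(docs):
    """Map each word to its list of postings, in document order."""
    inv = defaultdict(list)
    for doc_id, fields in docs.items():
        pos = 0                          # positions run across the whole document
        for field, text in fields.items():
            for word in text.split():
                inv[word].append(Posting(doc_id, field, pos))
                pos += 1
    return inv

inverted = build(DOCS)
# Documents where "cat" occurs in a title.
title_hits = sorted({p.doc for p in inverted["cat"] if p.field == "Title"})
```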
Posting: an entry in an inverted list. It represents one occurrence of a term in an article.
Size of a list (in postings): from 1 for rare words or misspellings up to 10^6 for common words.
Size of a posting: 10-15 bits (compressed)
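The slide does not say how postings are compressed. One common technique, shown here as an assumption rather than the method behind the 10-15 bit figure, stores the gaps between sorted document ids and encodes each gap with variable-byte coding, so small gaps take a single byte. All function names are illustrative.

```python
def vbyte_encode(numbers):
    """Variable-byte encode non-negative ints: 7 data bits per byte,
    with the high bit set on the final byte of each number."""
    out = bytearray()
    for n in numbers:
        chunk = []
        while True:
            chunk.append(n & 0x7F)
            n >>= 7
            if n == 0:
                break
        chunk.reverse()          # most significant 7-bit group first
        chunk[-1] |= 0x80        # mark the terminating byte
        out.extend(chunk)
    return bytes(out)

def vbyte_decode(data):
    nums, n = [], 0
    for b in data:
        n = (n << 7) | (b & 0x7F)
        if b & 0x80:             # last byte of this number
            nums.append(n)
            n = 0
    return nums

def compress_postings(doc_ids):
    """Store gaps between sorted doc ids instead of the ids themselves."""
    if not doc_ids:
        return b""
    gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    return vbyte_encode(gaps)

def decompress_postings(data):
    ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap             # running sum turns gaps back into ids
        ids.append(total)
    return ids
```

For ids `[3, 7, 150, 100000]` the gaps are `[3, 4, 143, 99850]`, which encode in 7 bytes instead of 16 for four 32-bit integers.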