1 / 21

Index Tuning

Index Tuning. Conventional index. Secondary index. To speed up queries on attributes not within primary key Primary index Determine the placement of records in the data file Each table has only one primary index Secondary Only give the location of the records

lil
Download Presentation

Index Tuning

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Index Tuning Conventional index

  2. Secondary index • To speed up queries on attributes not within primary key • Primary index • Determine the placement of records in the data file • Each table has only one primary index • Secondary • Only give the location of the records • One table may have multiple secondary index • Always dense

  3. 30 20 80 100 90 50 70 40 10 60 Secondary indexes Sequence field

  4. 100 30 20 80 90 10 40 60 50 70 90 30 ... 20 80 100 does not make sense! Secondary indexes Sequence field • Sparse index

  5. 90 30 20 80 100 50 70 40 10 60 50 10 10 60 50 20 30 90 70 40 ... ... sparse high level Secondary indexes Sequence field • Dense index

  6. Also: Pointers are record pointers (not block pointers; not computed) With secondary indexes: • Lowest level is dense • Other levels are sparse

  7. Application of secondary indexes in clustered file • Given relations • Movie(title, year, length, incolor, studioName, producerC#) • Studio(name, address, presC#) • Suppose the following query is typical • SELECT t i t l e , year FROM Movie, Studio WHERE presC# = zzz AND Movie.studioName = Studio.name; • Clustered file structure Secondary index on presC# can minimize disk I/Os!

  8. 20 20 10 10 30 10 40 40 40 40 Duplicate values & secondary indexes

  9. 30 20 20 10 10 10 40 40 40 40 40 10 20 10 30 40 10 40 ... 20 40 Duplicate values & secondary indexes one option... • Problem: • excess overhead! • disk space • search time

  10. 10 20 20 10 30 40 40 40 10 40 20 30 40 Duplicate values & secondary indexes another option... 10 Problem: variable size records in index!

  11. 10 20 20 10 30 10 40 40 40 40 50 10 20 60 ... 30 40 Duplicate values & secondary indexes   Another idea (suggested in class):Chain records with same key?   • Problems: • Need to add fields to records • Need to follow chain to know records

  12. 10 30 20 20 10 10 40 40 40 40 50 10 20 60 ... 30 40 Duplicate values & secondary indexes Using Indirection! buckets

  13. Why “bucket” idea is useful Indexes Records Name: primary EMP (name,dept,floor,...) Dept: secondary Floor: secondary We can use the pointers in the buckets to help answer queries without looking at most of records in the data file!

  14. Dept. index EMP Floor index Toy 2nd Query: Get employees in (Toy Dept) ^ (2nd floor)  Intersect toy bucket and 2nd Floor bucket to get set of matching EMP’s

  15. cat dog Inverted lists This idea used in text information retrieval Documents ...the cat is fat ... ...was raining cats and dogs... ...Fido the dog ...

  16. IR QUERIES • Find articles with “cat” and “dog” • Find articles with “cat” or “dog” • Find articles with “cat” and not “dog” • Find articles with “cat” in title • Find articles with “cat” and “dog” within 5 words

  17. Common technique: more info in inverted list cat position location type d1 Title 5 Author 10 Abstract 57 d2 d3 dog Title 100 Title 12

  18. Summary so far • Conventional index • Basic Ideas: sparse, dense, multi-level… • Duplicate Keys • Deletion/Insertion • Secondary indexes • Buckets of Postings List

  19. Conventional indexes Advantage: - Simple - Index is sequential file good for scans Disadvantage: - Inserts expensive, and/or - Lose sequentiality & balance

  20. 32 39 38 31 33 34 35 36 overflow area (not sequential) 10 Example Index (sequential) continuous free space 20 30 40 50 60 70 80 90

  21. summary

More Related