1 / 79

CS4432: Database Systems II

Learn how indexing helps retrieve data quicker for specific queries. Sequential, Secondary Indexes, Dense, and Sparse Indexes are covered. The layout options on disk and practical considerations are also discussed.

brownsally
Download Presentation

CS4432: Database Systems II

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS4432: Database Systems II Basic indexing

  2. Chapter 13 Indexing : helps to retrieve data quicker for certain queries value= 1,000,000 Select * FROM Emp WHERE salary = 1,000,000; record ? value

  3. Topics • Sequential Index Files (chap 13.1) • Secondary Indexes (chap 13.2)

  4. 10 30 50 70 90 20 40 60 80 100 Sequential File

  5. 10 70 50 30 110 90 80 60 40 20 120 100 90 10 70 50 30 80 60 40 20 100 Sequential File Dense Index Every record is in index.

  6. 10 130 90 50 210 170 150 110 70 30 230 190 90 10 70 50 30 80 60 40 20 100 Sequential File Sparse Index Only first record per block in index.

  7. Sequential File Sparse 2nd level 130 330 170 210 10 170 10 50 490 90 70 570 110 30 230 150 250 90 190 410 90 70 50 30 10 40 100 80 60 20

  8. Note : DATA FILE or INDEX are “ordered files”. Question: How would we lay them out on disk ? - contiguous layout on disk ? - block-chained layout on disk ?

  9. 10 490 10 210 170 130 90 330 170 50 250 30 70 570 410 150 190 110 230 90 30 50 10 90 70 100 80 60 40 20 Questions: • Do we want to build a dense 2nd-level index for a dense index? • Can we even do this ? 1st level? Sequential File 2nd level?

  10. Notes on pointers: • Block pointer (used in sparse index) can be smaller than record pointer (used in dense index) BP RP

  11. R2 R1 R3 R4 Note : If file is contiguous, then we can omit pointers K1 say: 1024 B per block K2 K3 K4 • if we want K3 block: • get it at offset (3-1)*1024 = 2048 bytes

  12. Sparse vs. Dense Tradeoff • Sparse: Less index space per record can keep more of index in memory (Later: sparse better for insertions) • Dense: Can tell if any record exists without accessing file (Later: dense needed for secondary indexes)

  13. Terms • Index sequential file • Search key ( primary key) • Primary index (on sequencing field) • Secondary index • Dense index (contains all search key values) • Sparse index • Multi-level index

  14. Next: • Duplicate keys • Deletion/Insertion • Secondary indexes

  15. 10 10 20 30 40 10 20 30 30 45 Duplicate keys

  16. 10 40 10 30 20 30 20 40 10 10 20 30 30 45 20 45 30 30 10 10 10 10 10 10 10 10 20 20 20 20 30 30 30 30 30 30 Duplicate keys Dense index ! Point to each value !

  17. 10 10 20 30 40 10 20 30 30 45 10 20 30 40 Duplicate keys Dense index. Point to each distinct value!

  18. 10 40 10 30 20 30 30 20 10 45 careful if looking for 20 or 30! Duplicate keys Sparse index: point to start of block ! 10 10 20 30

  19. 10 40 10 30 20 30 30 20 10 45 should this be 40? Duplicate keys Sparse index, another way ? • place first new key from block 10 20 30 30

  20. Duplicate values, primary index Summary • Index may point to first instance of each value only File Index a a a . . b

  21. Next: • Duplicate keys • Deletion/Insertion • Secondary indexes

  22. 70 10 30 50 20 40 60 80 Deletion from sparse index 10 30 50 70 90 110 130 150

  23. 70 10 30 50 20 40 60 80 Deletion from sparse index • delete record 40 10 30 50 70 90 110 130 150

  24. 10 30 50 70 20 40 60 80 40 40 Deletion from sparse index • delete record 30 10 30 50 70 90 110 130 150

  25. 10 30 50 70 20 40 60 80 50 70 Deletion from sparse index • delete records 30 & 40 10 30 50 70 90 110 130 150

  26. 10 30 50 70 80 20 40 60 Deletion from dense index 10 20 30 40 50 60 70 80 lecture #8

  27. 10 70 50 20 60 80 40 40 Deletion from dense index • delete record 30 10 20 30 30 40 40 50 60 70 80

  28. 10 30 40 60 20 50 Insertion, sparse index case 10 30 40 60

  29. 10 30 40 60 20 50 34 • our lucky day! • we have free space • where we need it! Insertion, sparse index case • insert record 34 10 30 40 60

  30. 10 30 40 60 20 50 15 20 20 30 Insertion, sparse index case • insert record 15 10 30 40 60 • Immediate reorganization • Other variations?

  31. Just Illustrated: • Immediate reorganization • Now Variation: • insert new block (chained file)

  32. 10 30 40 60 25 20 50 Insertion, sparse index case • insert record 25 10 30 overflow blocks (reorganize later...) 40 60

  33. Insertion, dense index case • Similar • Often more expensive . . .

  34. Next: • Duplicate keys • Deletion/Insertion • Secondary indexes

  35. 30 20 80 100 90 50 70 40 10 60 Secondary indexes Sequence field Can I make a secondary index sparse ?

  36. 100 30 20 80 90 10 40 60 50 70 90 30 ... 20 80 100 Secondary indexes Sequence field • Sparse index does not make sense!

  37. 90 30 20 80 100 50 70 40 10 60 50 10 10 60 50 20 30 90 70 40 ... ... sparse high level allowed? Secondary indexes Sequence field • Must be dense index !

  38. Also: Pointers are record pointers (not block pointers; not computed) With secondary indexes: • Lowest level is dense • Other levels are sparse

  39. 20 20 10 10 30 10 40 40 40 40 Duplicate values & secondary indexes

  40. 30 20 20 10 10 10 40 40 40 40 40 10 20 10 30 40 10 40 ... 20 40 Duplicate values & secondary indexes one option... • Problem: • excess overhead! • disk space • search time

  41. 10 20 20 10 30 40 40 40 10 40 20 30 40 Duplicate values & secondary indexes another option... 10 Problem: variable size records in index!

  42. 10 20 20 10 30 10 40 40 40 40 50 10 20 60 ... 30 40 Duplicate values & secondary indexes   Another idea :Chain records with same key !   • Problems: • Need to add fields to data records for each index • Need to follow chain to know records

  43. Summary : Conventional Indexes • Basic Ideas: sparse, dense, multi-level… • Duplicate Keys • Deletion/Insertion • Secondary indexes

  44. 90 30 20 80 100 50 70 40 10 60 10 50 10 50 20 60 90 70 30 40 ... ... high Level (always sparse) Multi-level Index Structures Sequence field 1 2 3 4 5 first level (dense, if non- sequential)

  45. Sequential indexes : pros/cons ? Advantage: - Simple - Index is sequential file good for scans - Search efficient for static data Disadvantage: - Inserts expensive, and/or - Lose sequentiality & balance - Then search time unpredictable

  46. 32 39 38 31 33 34 35 36 overflow area (not sequential) 10 20 30 Example Sequential Index continuous free space 40 50 60 70 80 90

  47. Another type of index • Give up “sequentiality” of index • Predictable performance under updates • Achieve always balance of “tree” • Automate restructuring under updates

  48. B+Tree Example n=3 100 Root 120 150 180 30 3 5 11 120 130 180 200 100 101 110 150 156 179 30 35

  49. Sample non-leaf 57 81 95 to keys to keys to keys to keys < 57 57 k<81 81k<95 95

  50. Sample leaf node: From non-leaf node to next leaf in sequence 57 81 95 To record with key 57 To record with key 81 To record with key 85

More Related