1 / 31

CS4432: Database Systems II

CS4432: Database Systems II. Index definition in SQL. Create index name on rel (attr) (Check online for index definitions in SQL). Drop INDEX name. Note. ATTRIBUTE LIST  MULTIKEY INDEX e.g., CREATE INDEX foo ON R(A,B,C). Multi-key Index. Motivation: Find records where

darva
Download Presentation

CS4432: Database Systems II

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS4432: Database Systems II

  2. Index definition in SQL • Createindex name on rel (attr) (Check online for index definitions in SQL) • Drop INDEX name

  3. Note ATTRIBUTE LIST MULTIKEY INDEX e.g., CREATEINDEX foo ON R(A,B,C)

  4. Multi-key Index Motivation: Find records where DEPT = “Toy” AND SAL > 50k

  5. Strategy I: • Use one index, say Dept. • Get all Dept = “Toy” records and check their salary I1

  6. Strategy II: • Use 2 Indexes; Manipulate Pointers Toy Sal > 50k

  7. Strategy III: • Multiple Key Index One idea: I2 I3 I1

  8. Art Sales Toy Example 10k 15k Example Record Dept Index Salary Index 17k 21k Name=Joe DEPT=Sales SAL=15k 12k 15k 15k 19k

  9. For which queries is this index good? Find RECs Dept = “Sales” SAL=20k Find RECs Dept = “Sales” SAL > 20k Find RECs Dept = “Sales” Find RECs SAL = 20k

  10. Many alternate methods for indexing

  11. Hashing key  h(key) <key> Buckets (typically 1 disk block) . . .

  12. One example hash function • Key = ‘x1 x2 … xn’ n-byte character string • Have b buckets • Hash function : • h: add (x1 + x2 + ….. Xn) modulo b

  13.  This may not be best function …  Read Knuth Vol. 3 if you really need to select a good function. Good hash  Expected number of function: keys/bucket is the same for all buckets

  14. Within a bucket: • Do we keep keys sorted? • Yes, if CPU time critical • & Inserts/Deletes not too frequent

  15. Next: example to illustrate inserts, overflows, deletes h(K)

  16. d a e c b EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 0 1 2 3 h(e) = 1

  17. d maybe move “g” up EXAMPLE: deletion Delete:ef 0 1 2 3 a b d c c e f g

  18. If < 50%, wasting space • If > 80%, overflows significant depends on how good hash function is & on # keys/bucket Rule of thumb: • Try to keep space utilization between 50% and 80% Utilization = # keys used total # keys that fit

  19. Extensible hashing • Others … How do we cope with growth? • Overflows and reorganizations • Dynamic hashing

  20. Extensible hashing : idea 1 (a) Use i of b bits output by hash function b h(K)  use i grows over time…. Note: enables future doubling of space ! 00110101

  21. Extensible hashing : idea 2 (b) Hash to directory of pointers to buckets (instead of buckets directly) h(K)[i ] to bucket Note : Double space by doubling the directory ! . . . . . .

  22. i = 2 00 01 10 11 1 1 2 1010 New directory 2 1100 Example: h(k) is 4 bits; 2 keys/bucket 1 0001 i = 1 0 1 1001 1100 Insert 1010

  23. 2 0000 0001 2 0111 2 2 Example continued i = 2 00 01 10 11 1 0001 0111 1001 1010 Insert: 0111 0000 1100

  24. i = 3 000 001 010 011 100 101 110 111 3 1001 1001 2 1001 1010 3 1010 1100 2 Example continued 0000 2 0001 i = 2 00 01 10 11 0111 2 Insert: 1001

  25. Extensible hashing: deletion • Merge blocks and cut directory if possible (Reverse insert procedure)

  26. Indirection (Not bad if directory in memory) Directory doubles in size (Now it fits, now it does not) - - Summary Extensible hashing + If directory fits into main memory, then access cost is 1 IO, otherwise 2 IOs Can handle growing files - with less wasted space - with no full reorganizations +

  27. Use what when : • Indexing : Tree-Structures vs Hashing

  28. Indexing vs Hashing • Hashing good for probes given key e.g., SELECT … FROM R WHERE R.A = 5

  29. Indexing vs Hashing • INDEXING (Including B Trees) good for Range Searches: e.g., SELECT FROM R WHERE R.A > 5

  30. Reading Chapter 14 • Read • 14.3.1 and 14.3.2

  31. The BIG picture…. • Chapters 11 & 12: Storage, records, blocks... • Chapter 13 & 14: Access Mechanisms - Indexes - B trees - Hashing - Multi key • Chapter 15 & 16: Query Processing NEXT

More Related