1 / 102

CS 245: Database System Principles

CS 245: Database System Principles. Notes 5: Hashing and More. Hector Garcia-Molina. Hashing. key  h(key). <key>. Buckets (typically 1 disk block). Two alternatives. records. (1) key  h(key). Two alternatives. record. (2) key  h(key). key 1. Index. Two alternatives.

dferguson
Download Presentation

CS 245: Database System Principles

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS 245: Database System Principles Notes 5: Hashing and More Hector Garcia-Molina Notes 5

  2. Hashing key  h(key) <key> Buckets (typically 1 disk block) . . . Notes 5

  3. Two alternatives . . . records (1) key  h(key) . . . Notes 5

  4. Two alternatives record (2) key  h(key) key 1 Index Notes 5

  5. Two alternatives record (2) key  h(key) key 1 Index • Alt (2) for “secondary” search key Notes 5

  6. Example hash function • Key = ‘x1 x2 … xn’ n byte character string • Have b buckets • h: add x1 + x2 + ….. xn • compute sum modulo b Notes 5

  7.  This may not be best function …  Read Knuth Vol. 3 if you really need to select a good function. Notes 5

  8.  This may not be best function …  Read Knuth Vol. 3 if you really need to select a good function. Good hash  Expected number of function: keys/bucket is the same for all buckets Notes 5

  9. Within a bucket: • Do we keep keys sorted? • Yes, if CPU time critical • & Inserts/Deletes not too frequent Notes 5

  10. Next: example to illustrate inserts, overflows, deletes h(K) Notes 5

  11. EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 0 1 2 3 Notes 5

  12. d a c b EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 0 1 2 3 h(e) = 1 Notes 5

  13. d a e c b EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 0 1 2 3 h(e) = 1 Notes 5

  14. EXAMPLE: deletion Delete:ef 0 1 2 3 a b d c e f g Notes 5

  15. maybe move “g” up EXAMPLE: deletion Delete:ef 0 1 2 3 a b d c c e f g Notes 5

  16. d maybe move “g” up EXAMPLE: deletion Delete:ef 0 1 2 3 a b d c c e f g Notes 5

  17. Rule of thumb: • Try to keep space utilization between 50% and 80% Utilization = # keys used total # keys that fit Notes 5

  18. If < 50%, wasting space • If > 80%, overflows significant depends on how good hash function is & on # keys/bucket Rule of thumb: • Try to keep space utilization between 50% and 80% Utilization = # keys used total # keys that fit Notes 5

  19. How do we cope with growth? • Overflows and reorganizations • Dynamic hashing Notes 5

  20. Extensible • Linear How do we cope with growth? • Overflows and reorganizations • Dynamic hashing Notes 5

  21. Extensible hashing: two ideas (a) Use i of b bits output by hash function b h(K)  use i grows over time…. 00110101 Notes 5

  22. (b) Use directory h(K)[i ] to bucket . . . . . . Notes 5

  23. 1 Example: h(k) is 4 bits; 2 keys/bucket 1 0001 i = 1 1001 1100 Insert 1010 Notes 5

  24. 1 1 1010 1100 Example: h(k) is 4 bits; 2 keys/bucket 1 0001 i = 1 1001 1100 Insert 1010 Notes 5

  25. i = 2 00 01 10 11 1 1 2 1010 New directory 2 1100 Example: h(k) is 4 bits; 2 keys/bucket 1 0001 i = 1 1001 1100 Insert 1010 Notes 5

  26. 2 2 Example continued i = 2 00 01 10 11 1 0001 1001 1010 Insert: 0111 0000 1100 Notes 5

  27. 0000 0001 0111 2 2 Example continued i = 2 00 01 10 11 1 0001 0111 1001 1010 Insert: 0111 0000 1100 Notes 5

  28. 2 0000 0001 2 0111 2 2 Example continued i = 2 00 01 10 11 1 0001 0111 1001 1010 Insert: 0111 0000 1100 Notes 5

  29. 1001 2 1010 1100 2 Example continued 0000 2 0001 i = 2 00 01 10 11 0111 2 Insert: 1001 Notes 5

  30. 1001 1001 2 1001 1010 1010 1100 2 Example continued 0000 2 0001 i = 2 00 01 10 11 0111 2 Insert: 1001 Notes 5

  31. i = 3 000 001 010 011 100 101 110 111 3 1001 1001 2 1001 1010 3 1010 1100 2 Example continued 0000 2 0001 i = 2 00 01 10 11 0111 2 Insert: 1001 Notes 5

  32. Extensible hashing: deletion • No merging of blocks • Merge blocks and cut directory if possible (Reverse insert procedure) Notes 5

  33. Deletion example: • Run thru insert example in reverse! Notes 5

  34. 1 2 2 Note: Still need overflow chains • Example: many records with duplicate keys if we split: insert 1100 1101 1100 1100 1100 Notes 5

  35. 1 1 Solution: overflow chains add overflow block: insert 1100 1100 1101 1101 1101 1100 Notes 5

  36. Summary Extensible hashing + Can handle growing files - with less wasted space - with no full reorganizations Notes 5

  37. Indirection (Not bad if directory in memory) Directory doubles in size (Now it fits, now it does not) - - Summary Extensible hashing + Can handle growing files - with less wasted space - with no full reorganizations Notes 5

  38. Two ideas: b (a) Use ilow order bits of hash 01110101 grows i Linear hashing • Another dynamic hashing scheme Notes 5

  39. Two ideas: b (a) Use ilow order bits of hash 01110101 grows i (b) File grows linearly Linear hashing • Another dynamic hashing scheme Notes 5

  40. Exampleb=4 bits, i =2, 2 keys/bucket 00 01 10 11 Future growth buckets 0000 0101 1010 1111 m = 01 (max used block) Notes 5

  41. If h(k)[i ]  m, then look at bucket h(k)[i ] else, look at bucket h(k)[i ]- 2i -1 Rule Exampleb=4 bits, i =2, 2 keys/bucket 00 01 10 11 Future growth buckets 0000 0101 1010 1111 m = 01 (max used block) Notes 5

  42. If h(k)[i ]  m, then look at bucket h(k)[i ] else, look at bucket h(k)[i ]- 2i -1 Rule Exampleb=4 bits, i =2, 2 keys/bucket • insert 0101 00 01 10 11 Future growth buckets 0000 0101 1010 1111 m = 01 (max used block) Notes 5

  43. 0101 • can have overflow chains! If h(k)[i ]  m, then look at bucket h(k)[i ] else, look at bucket h(k)[i ]- 2i -1 Rule Exampleb=4 bits, i =2, 2 keys/bucket • insert 0101 00 01 10 11 Future growth buckets 0000 0101 1010 1111 m = 01 (max used block) Notes 5

  44. Note • In textbook, n is used instead of m • n=m+1 n=10 Future growth buckets 0000 0101 1010 1111 00 01 10 11 m = 01 (max used block) Notes 5

  45. Exampleb=4 bits, i =2, 2 keys/bucket 00 01 10 11 Future growth buckets 0000 0101 1010 1111 m = 01 (max used block) Notes 5

  46. 1010 10 Exampleb=4 bits, i =2, 2 keys/bucket 00 01 10 11 Future growth buckets 0000 0101 1010 1111 m = 01 (max used block) Notes 5

  47. 0101 • insert 0101 1010 10 Exampleb=4 bits, i =2, 2 keys/bucket 00 01 10 11 Future growth buckets 0000 0101 1010 1111 m = 01 (max used block) Notes 5

  48. 0101 • insert 0101 1010 10 11 Exampleb=4 bits, i =2, 2 keys/bucket 00 01 10 11 Future growth buckets 0000 0101 1010 1111 m = 01 (max used block) Notes 5

  49. 0101 • insert 0101 1010 1111 0101 10 11 Exampleb=4 bits, i =2, 2 keys/bucket 00 01 10 11 Future growth buckets 0000 0101 1010 1111 m = 01 (max used block) Notes 5

  50. Example Continued:How to grow beyond this? i = 2 00 01 10 11 0000 0101 1010 1111 0101 . . . m = 11 (max used block) Notes 5

More Related