
Mastering Hashing Techniques: Resolving Collisions and Improving Performance

Explore hashing algorithms, functions, collision resolution, memory usage, and record distributions in this comprehensive guide on optimizing data storage efficiency.

hawkers

Presentation Transcript


  1. Chapter 11. Hashing

  2. Contents • Introduction • A Simple Hashing Algorithm • Hashing Functions and Record Distributions • How Much Extra Memory Should Be Used? • Collision Resolution by Progressive Overflow • Storing More Than One Record per Address: Buckets • Making Deletions • Other Collision Resolution Techniques • Patterns of Record Access

  3. 1. Introduction • O-notation • O(1) • O(N): sequential searching • O(log2 N) • O(logk N): B-tree (k: leaf node size) • What is hashing? • a = h(K), where h is the hash function, K is the key, and a is the home address • Example: K = BASS, h = (first char * second char) mod 1000 • a = h(K) = (66 * 65) mod 1000 = 4,290 mod 1000 = 290

  4. Introduction • Collision • Example keys: LOWELL => a = (76 * 79) mod 1000 = 6,004 mod 1000 = 4; OLIVIER => a = (79 * 76) mod 1000 = 6,004 mod 1000 = 4 • Several ways to reduce the number of collisions • 1. Spread out the records: use a good hashing algorithm • 2. Use extra memory • 3. Put more than one record at a single address: buckets
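
The example hash function and the LOWELL/OLIVIER collision can be checked with a minimal Python sketch. The name `simple_hash` and the use of `ord` to obtain ASCII codes are assumptions made for illustration; the slides only give the formula.

```python
def simple_hash(key: str, table_size: int = 1000) -> int:
    """h(K) = (code of first char * code of second char) mod table_size."""
    return (ord(key[0]) * ord(key[1])) % table_size


# BASS gets its own address; LOWELL and OLIVIER are synonyms because
# multiplication does not care about the order of the two characters.
for key in ("BASS", "LOWELL", "OLIVIER"):
    print(key, simple_hash(key))
```

Because only the first two characters are used, any two keys that start with the same two letters in either order collide, which is why this function spreads records poorly.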

  5. 2. A Simple Hashing Algorithm • 3 steps • 1. Represent the key in numerical form • 2. Fold and add • 3. Divide by a prime number and use the remainder as the address • Example • Step 1. Represent the key in numerical form: LOWELL padded with blanks = 76 79 87 69 76 76 32 32 32 32 32 32 (L O W E L L followed by six blanks)

  6. A Simple Hashing Algorithm • Example (continued) • Step 2. Fold and add: 76 79 | 87 69 | 76 76 | 32 32 | 32 32 | 32 32 • 7679 + 8769 + 7676 + 3232 + 3232 = 30588 • Adding the last pair gives 30588 + 3232 = 33820, which exceeds the 2-byte maximum of 32767, so each intermediate sum is reduced mod 19937: • 7679 + 8769 = 16448 => 16448 mod 19937 = 16448 • 16448 + 7676 = 24124 => 24124 mod 19937 = 4187 • 4187 + 3232 = 7419 => 7419 mod 19937 = 7419 • 7419 + 3232 = 10651 => 10651 mod 19937 = 10651 • 10651 + 3232 = 13883 => 13883 mod 19937 = 13883 • Step 3. Divide by the size of the address space: a = s mod n (n: number of addresses in the file) • a = 13883 mod 100 = 83 • a = 13883 mod 101 = 46
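
The three steps can be sketched in Python as follows. The function names, the default width of 12 characters, and the reliance on ASCII codes via `ord` are assumptions for illustration; the modulus 19937 and the address-space sizes 100 and 101 come from the example above.

```python
def fold_and_add(key: str, width: int = 12, modulus: int = 19937) -> int:
    """Steps 1 and 2: pad with blanks, pair up character codes, add them."""
    padded = key.ljust(width)[:width]      # assumed: blank-pad to an even width
    codes = [ord(ch) for ch in padded]     # step 1: numerical form
    total = 0
    for i in range(0, width, 2):
        pair = codes[i] * 100 + codes[i + 1]   # e.g. 76, 79 -> 7679
        total = (total + pair) % modulus       # keep the sum below the 2-byte limit
    return total


def home_address(key: str, n_addresses: int) -> int:
    """Step 3: a = s mod n."""
    return fold_and_add(key) % n_addresses


print(fold_and_add("LOWELL"), home_address("LOWELL", 100), home_address("LOWELL", 101))
```

Taking the remainder after every addition, rather than once at the end, is what keeps each intermediate result within the 2-byte range.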

  7. 3. Hashing Functions and Record Distributions • Distributing records among addresses • <Figure 11.3> Different distributions of records A to G among addresses 1 to 10: (a) uniform distribution (best) (b) worst case (c) random distribution (acceptable)

  8. Hashing Functions and Record Distributions • Some other hashing methods • Better than random • Examine keys for a pattern • Resident registration number • Divide the key by a prime number • Random • Square the key and take the middle digits: 453^2 = 205209 => 52 • Radix transformation
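
The "square the key and take the middle" method can be sketched as below. The function name `mid_square` and the choice of extracting two digits by default are assumptions for illustration; a real implementation would pick the digit count to match the address space.

```python
def mid_square(key: int, digits: int = 2) -> int:
    """Square the key and return the middle `digits` digits of the result."""
    squared = str(key * key)
    start = (len(squared) - digits) // 2   # centre the window in the squared value
    return int(squared[start:start + digits])


print(mid_square(453))
```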

  9. 4. How Much Extra Memory Should Be Used? • Packing density = number of records / number of available addresses = r / N • Example: r = 75 records, N = 100 addresses => packing density = 75 / 100 = 75%

  10. How Much Extra Memory Should Be Used? • Predicting collisions for different packing densities • <Table 11.2> Effect of packing density on the proportion of records not stored at their home addresses
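
One common way to predict such proportions is to assume that a random hash function sends each record to each address independently, so the number of records per address is approximately Poisson distributed with mean r/N. The slides do not spell out the model, so the sketch below is an assumption-laden illustration; the function names are hypothetical.

```python
import math


def poisson(x: int, density: float) -> float:
    """Probability that exactly x records hash to a given address."""
    return density ** x * math.exp(-density) / math.factorial(x)


def overflow_fraction(density: float, max_x: int = 50) -> float:
    """Expected fraction of records not stored at their home address,
    with one record per address. An address that receives x records
    keeps one and overflows the other x - 1."""
    overflow_per_address = sum((x - 1) * poisson(x, density) for x in range(2, max_x))
    return overflow_per_address / density


for density in (0.25, 0.50, 0.75, 1.00):
    print(f"{density:.2f} -> {overflow_fraction(density):.3f}")
```

The fraction grows steadily with packing density, which is the trade-off between memory use and collisions that this section describes.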

  11. 5. Collision Resolution by Progressive Overflow • Progressive overflow • Open addressing • Linear probing • [Figure: addresses 0 to 4; h(K) maps Novak to address 2 and York to address 3; address 2 holds Rosen (Novak's home address), address 3 holds Jasper (York's home address), and York is stored at address 4]
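
Progressive overflow (linear probing) can be sketched as below. This is a minimal in-memory illustration, not a file-based implementation; the function names and the home addresses passed in the usage lines are assumptions chosen to resemble the slide's figure.

```python
from typing import List, Optional


def insert(table: List[Optional[str]], key: str, home: int) -> int:
    """Store key in the first free slot at or after home, wrapping around."""
    n = len(table)
    for step in range(n):
        addr = (home + step) % n
        if table[addr] is None:
            table[addr] = key
            return addr
    raise RuntimeError("table is full")


def search(table: List[Optional[str]], key: str, home: int) -> Optional[int]:
    """Probe from home until the key or an empty slot is found."""
    n = len(table)
    for step in range(n):
        addr = (home + step) % n
        if table[addr] is None:
            return None          # an empty slot ends the search
        if table[addr] == key:
            return addr
    return None                  # wrapped all the way around


table: List[Optional[str]] = [None] * 5
insert(table, "Rosen", 2)             # assumed home addresses
insert(table, "Jasper", 3)
york_addr = insert(table, "York", 3)  # home is occupied, so York moves on
print(york_addr, search(table, "York", 3))
```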

  12. Collision Resolution by Progressive Overflow • Search length • [Figure: addresses 0 to 5; 0 Adams, 1 Bates, 2 Cole, 3 Dean, 4 Evans, 5 empty]

  13. Collision Resolution by Progressive Overflow • Search length (continued) • Example • <Figure 11.7> Average search length versus packing density in a hashed file

  14. 6. Storing More Than One Record per Address: Buckets • Buckets • [Figure: bucket addresses 0 to 4 holding the records Green, Jenks, King, Nutt, Hall, Land, and Marks]

  15. Storing More Than One Record per Address: Buckets • Effects of buckets on performance • r: number of records • N: number of addresses • b: number of records in a bucket

  16. Storing More Than One Record per Address: Buckets • <Table 11.4> Synonyms causing collisions as a percent of records for different packing densities and different bucket sizes
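
The effect of bucket size can be estimated with a short sketch. It assumes a Poisson model of a random hash function (mean r/N records per address) and defines packing density as r / (b * N); neither the model nor the function name is given on the slides, so treat this as an illustration only.

```python
import math


def overflow_fraction(r: int, n: int, b: int, max_x: int = 60) -> float:
    """Expected fraction of records that do not fit in their home bucket.

    r: number of records, n: number of addresses, b: records per bucket.
    An address that receives x > b records overflows x - b of them.
    """
    lam = r / n   # expected records per address
    per_address = sum(
        (x - b) * lam ** x * math.exp(-lam) / math.factorial(x)
        for x in range(b + 1, max_x)
    )
    return n * per_address / r


# Same packing density (75%) stored two ways: hypothetical example values.
print(overflow_fraction(750, 1000, 1))  # 1000 addresses, 1 record each
print(overflow_fraction(750, 500, 2))   # 500 addresses, 2 records each
```

Under this model, the same packing density yields fewer overflow records when the space is organized as fewer, larger buckets.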

  17. 7. Making Deletions • Initial state: 0 Adams, 1 Jones, 2 Morris, 3 Smith

  18. Making Deletions • (1) Tombstones for handling deletions • Before: 0 Adams, 1 Jones, 2 Morris, 3 Smith • After deleting Morris: 0 Adams, 1 Jones, 2 ###, 3 Smith • Without a marker, a search that reaches the emptied slot stops there: "Smith cannot be found" • ###: tombstone. This mark indicates that a record once lived there but no longer does

  19. Making Deletions • (2) Implications of Tombstones for Insertions • Inserting “Smith” • (3) Effects of Deletions and Additions on Performance • Solution to problem of deteriorating average search length • Reorganization
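
Tombstone handling can be sketched as below, assuming linear probing in an in-memory table. The function names, the extra empty slot, the key "Moore", and the home addresses used in the usage lines are hypothetical; the slides only show the table contents.

```python
from typing import List, Optional

TOMBSTONE = "###"   # assumed never to be a real key


def find(table: List[Optional[str]], key: str, home: int) -> Optional[int]:
    """Probe from home; tombstones are skipped, an empty slot ends the search."""
    n = len(table)
    for step in range(n):
        addr = (home + step) % n
        if table[addr] is None:
            return None
        if table[addr] == key:
            return addr
    return None


def delete(table: List[Optional[str]], key: str, home: int) -> bool:
    """Replace the record with a tombstone so later records stay reachable."""
    addr = find(table, key, home)
    if addr is None:
        return False
    table[addr] = TOMBSTONE
    return True


def insert(table: List[Optional[str]], key: str, home: int) -> int:
    """Search the whole probe sequence first, so a tombstone is never reused
    for a key that already sits further along; then take the first free slot."""
    if find(table, key, home) is not None:
        raise ValueError("duplicate key")
    n = len(table)
    for step in range(n):
        addr = (home + step) % n
        if table[addr] is None or table[addr] == TOMBSTONE:
            table[addr] = key
            return addr
    raise RuntimeError("table is full")


table: List[Optional[str]] = ["Adams", "Jones", "Morris", "Smith", None]
delete(table, "Morris", 1)               # assumed home address 1
smith_addr = find(table, "Smith", 1)     # still reachable past the tombstone
moore_addr = insert(table, "Moore", 1)   # reuses the tombstone slot
```

As tombstones accumulate, searches probe further before reaching an empty slot, which is why periodic reorganization is suggested as the remedy.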

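  20. 8. Other Collision Resolution Techniques • (1) Double hashing • Apply a second hashing function to the key to get an increment c • On a collision, add c to the address to find the next candidate address • Drawback: seek time overhead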
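
A probe sequence for double hashing can be sketched as follows. Both hash functions here (a sum of character codes, reduced two different ways) are hypothetical stand-ins, and the table size is assumed to be prime so that every increment c visits all addresses.

```python
from typing import List


def probe_sequence(key: str, table_size: int) -> List[int]:
    """Addresses tried for key, in order: home, home + c, home + 2c, ... (mod size)."""
    code = sum(ord(ch) for ch in key)      # assumed numeric form of the key
    home = code % table_size               # first hash: home address
    c = 1 + code % (table_size - 1)        # second hash: increment in 1..size-1
    return [(home + i * c) % table_size for i in range(table_size)]


print(probe_sequence("York", 11))
print(probe_sequence("Novak", 11))
```

Synonyms with different increments follow different probe paths, which reduces clustering, but consecutive probes land far apart in the file, which is the seek time overhead noted above.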

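  21. Other Collision Resolution Techniques • (2) Chained progressive overflow • Without links: 0 Adams, 1 Bates, 2 Cole, 3 Dean, 4 Evans, 5 Flint • With links (address: record, address of next synonym; -1 marks the end of a chain): 0: Adams, 2 • 1: Bates, 3 • 2: Cole, 5 • 3: Dean, -1 • 4: Evans, -1 • 5: Flint, -1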
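
Searching a chained table means following the stored links instead of probing every slot. The sketch below uses the records and links from the slide; the function name and the home addresses passed in the usage lines (0 for Flint, 1 for Dean) are assumptions inferred from the chains, since the slide does not list them.

```python
from typing import List, Optional, Tuple

# (record, address of next synonym); -1 marks the end of a chain
TABLE: List[Tuple[str, int]] = [
    ("Adams", 2), ("Bates", 3), ("Cole", 5),
    ("Dean", -1), ("Evans", -1), ("Flint", -1),
]


def chained_search(table: List[Tuple[str, int]], key: str,
                   home: int) -> Tuple[Optional[int], int]:
    """Return (address, number of accesses); address is None if the key is absent."""
    addr, accesses = home, 0
    while addr != -1:
        accesses += 1
        record, link = table[addr]
        if record == key:
            return addr, accesses
        addr = link          # jump straight to the next synonym
    return None, accesses


print(chained_search(TABLE, "Flint", 0))
print(chained_search(TABLE, "Dean", 1))
```

Only records that share a chain are examined, so records belonging to other home addresses no longer add to the search length.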

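  22. Other Collision Resolution Techniques • (3) Chaining with a separate overflow area • Primary data area (home address: record, link into overflow area): 0: Adams, 0 • 1: Bates, 1 • 2: empty • 3: empty • 4: Evans, -1 • Overflow area (address: record, link): 0: Cole, 2 • 1: Dean, -1 • 2: Flint, -1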

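  23. Other Collision Resolution Techniques • (4) Scatter tables: indexing revisited • [Figure: a hashed index with addresses 0 to 4 points into a data file whose records are linked to their synonyms] • Data file (record, link): Adams, 1 • Coles, 3 • Bates, 4 • Flint, -1 • Deans, -1 • Evans, -1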

  24. Patterns of Record Access • A small percentage of the records in a file account for a large percentage of the accesses • 80/20 rule: 80% of the accesses are performed on 20% of the records
