Contents
• Introduction
• A Simple Hashing Algorithm
• Hashing Functions and Record Distributions
• How Much Extra Memory Should Be Used?
• Collision Resolution by Progressive Overflow
• Storing More Than One Record per Address: Buckets
• Making Deletions
• Other Collision Resolution Techniques
• Patterns of Record Access
1. Introduction
• O-notation
  • O(1)
  • O(N): sequential searching
  • O(log_2 N): binary searching
  • O(log_k N): B-tree (k: leaf node size)
• What is hashing?
  • a = h(K)
  • h: hash function, K: key, a: home address
• Example
  • K = BASS, h(K) = (ASCII code of the first character * ASCII code of the second character) mod 1000
  • a = h(K) = (66 * 65) mod 1000 = 4,290 mod 1000 = 290
Introduction
• Collision: two different keys hash to the same address (such keys are called synonyms)
  • Example keys:
    LOWELL => a = (76 * 79) mod 1000 = 6,004 mod 1000 = 4
    OLIVIER => a = (79 * 76) mod 1000 = 6,004 mod 1000 = 4
• Several ways to reduce the number of collisions
  1. Spread out the records: use a good hashing algorithm
  2. Use extra memory
  3. Put more than one record at a single address: buckets
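The toy hash from the introduction can be checked directly. A minimal sketch, assuming keys are ASCII strings of at least two characters; the name `simple_hash` is hypothetical.

```python
def simple_hash(key: str, n_addresses: int = 1000) -> int:
    """Multiply the ASCII codes of the first two characters, keep the remainder."""
    return (ord(key[0]) * ord(key[1])) % n_addresses


print(simple_hash("BASS"))     # (66 * 65) mod 1000 -> 290
print(simple_hash("LOWELL"))   # (76 * 79) mod 1000 -> 4
print(simple_hash("OLIVIER"))  # (79 * 76) mod 1000 -> 4, a synonym of LOWELL
```

Because multiplication is commutative, any two keys whose first two letters are swapped always collide, which is one reason this function spreads records poorly.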
2. A Simple Hashing Algorithm
• 3 steps
  1. Represent the key in numerical form
  2. Fold and add
  3. Divide by a prime number and use the remainder as the address
• Example
  • Step 1. Represent the key in numerical form
    LOWELL = 76 79 87 69 76 76 32 32 32 32 32 32
    (the ASCII codes of L O W E L L, followed by six blanks)
A Simple Hashing Algorithm
• Example (continued)
  • Step 2. Fold and add
    76 79 | 87 69 | 76 76 | 32 32 | 32 32 | 32 32
    7679 + 8769 + 7676 + 3232 + 3232 = 30588
    Adding the last piece would give 30588 + 3232 = 33820, which exceeds 32767, the largest value a signed 2-byte integer can hold. So the running sum is divided by the prime 19937 after each addition and only the remainder is kept:
    7679 + 8769 = 16448 => 16448 mod 19937 = 16448
    16448 + 7676 = 24124 => 24124 mod 19937 = 4187
    4187 + 3232 = 7419 => 7419 mod 19937 = 7419
    7419 + 3232 = 10651 => 10651 mod 19937 = 10651
    10651 + 3232 = 13883 => 13883 mod 19937 = 13883
  • Step 3. Divide by the size of the address space
    a = s mod n (s: sum from step 2, n: number of addresses in the file)
    a = 13883 mod 100 = 83
    a = 13883 mod 101 = 46
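The three steps can be sketched in Python as follows. This assumes keys are padded with blanks (or truncated) to 12 characters and that each two-character piece is formed as `code1 * 100 + code2`; for uppercase letters and blanks this equals writing the two codes side by side (76, 79 -> 7679). The function name `fold_and_add` and its `width` parameter are illustrative. Python integers never overflow, so the `% PRIME` after each addition only mirrors the 2-byte limit described on the slide.

```python
PRIME = 19937  # divisor used in the example to keep running sums small


def fold_and_add(key: str, n_addresses: int, width: int = 12) -> int:
    """Hash `key` to an address in range(n_addresses). `width` must be even."""
    # Step 1: numerical form; pad with blanks (ASCII 32) to a fixed width (assumption)
    codes = [ord(c) for c in key.ljust(width)[:width]]
    total = 0
    # Step 2: fold into two-character pieces and add, reducing mod PRIME each time
    for i in range(0, width, 2):
        piece = codes[i] * 100 + codes[i + 1]
        total = (total + piece) % PRIME
    # Step 3: divide by the size of the address space and keep the remainder
    return total % n_addresses


print(fold_and_add("LOWELL", 100))  # 83
print(fold_and_add("LOWELL", 101))  # 46
```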
3. Hashing Functions and Record Distributions
• Distributing records among addresses
<Figure 11.3> Different distributions of seven records (A through G) over ten addresses (1 through 10): (a) uniform distribution (best); (b) worst case; (c) random distribution (acceptable)
Hashing Functions and Record Distributions
• Some other hashing methods
  • Better than random
    • Examine keys for a pattern (e.g., Korean resident registration numbers)
    • Divide the key by a prime number
  • Random
    • Square the key and take the middle: 453² = 205209 => middle two digits: 52
    • Radix transformation
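The arithmetic methods can be sketched for integer keys. `mid_square` reproduces the 453 example. For the radix transformation, base 11 and an address space of 99 are illustrative parameters (assumptions), and the base-11 digits are simply reread as decimal digits (a digit of 10 carries into the next place). All function names are hypothetical.

```python
def mid_square(key: int, n_digits: int = 2) -> int:
    """Square the key and keep the middle n_digits decimal digits."""
    square = str(key * key)
    start = (len(square) - n_digits) // 2  # left of center if the length is odd
    return int(square[start:start + n_digits])


def radix_transform(key: int, base: int = 11, n_addresses: int = 99) -> int:
    """Write the key in `base`, reread those digits as a decimal number,
    then reduce it to the address range (base and range are assumptions)."""
    value, place = 0, 1
    while key > 0:
        key, digit = divmod(key, base)
        value += digit * place  # place the base-`base` digit as a decimal digit
        place *= 10
    return value % n_addresses


def divide_by_prime(key: int, prime: int = 101) -> int:
    """Division method: the remainder modulo a prime is the address."""
    return key % prime


print(mid_square(453))       # 453 * 453 = 205209 -> 52
print(radix_transform(453))  # 453 = 382 in base 11; 382 mod 99 = 85
```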
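4. How Much Extra Memory Should Be Used?
• Packing density = r / N (number of records / number of available addresses)
• Example: r = 75 records, N = 100 addresses => packing density = 75 / 100 = 0.75 (75%)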
How Much Extra Memory Should Be Used?
• Predicting collisions for different packing densities
<Table 11.2> Effect of packing density on the proportion of records not stored at their home addresses
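Predictions like those in Table 11.2 can be obtained by modeling the number of records hashed to each address as Poisson-distributed with mean λ = r/N, assuming a random hash function and one record per address. The expected number of addresses that receive no record is N·e^(-λ), so N(1 - e^(-λ)) records land at their home addresses and the rest overflow. A minimal sketch (function name hypothetical):

```python
import math


def overflow_fraction(r: int, n: int) -> float:
    """Predicted fraction of r records NOT stored at their home address
    among n addresses (Poisson model, one record per address)."""
    lam = r / n  # packing density = mean records per address
    # Every address that receives at least one record holds exactly one at home.
    records_at_home = n * (1 - math.exp(-lam))
    return (r - records_at_home) / r


for r in (50, 75, 100):
    print(f"packing density {r / 100:.2f}: {overflow_fraction(r, 100):.1%} overflow")
```

For the 75-record, 100-address example this model predicts that roughly 29.6% of the records will not be stored at their home addresses.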
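5. Collision Resolution by Progressive Overflow
• Progressive overflow (open addressing, linear probing): if a record's home address is already occupied, examine the following addresses in order, wrapping around from the last address to the first, until a free one is found
(Figure: addresses 0-4, with h(Novak) = 2 and h(York) = 3. Rosen is stored at address 2, Novak's home address, and Jasper at address 3, York's home address; York therefore goes to the next free address, 4.)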
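A minimal in-memory sketch of progressive overflow with wrap-around. The class name is hypothetical, and the hash values in `homes` are assumptions: York's and Novak's follow the figure, while Rosen's and Jasper's are chosen so that they sit at their home addresses.

```python
from typing import Callable, List, Optional


class ProgressiveOverflowFile:
    """One record per address; collisions go to the next free address."""

    def __init__(self, n_addresses: int, hash_fn: Callable[[str], int]):
        self.slots: List[Optional[str]] = [None] * n_addresses
        self.hash_fn = hash_fn

    def insert(self, key: str) -> int:
        n = len(self.slots)
        home = self.hash_fn(key) % n
        for i in range(n):
            addr = (home + i) % n  # wrap around past the last address
            if self.slots[addr] is None:
                self.slots[addr] = key
                return addr
        raise RuntimeError("file is full")

    def search(self, key: str) -> Optional[int]:
        n = len(self.slots)
        home = self.hash_fn(key) % n
        for i in range(n):
            addr = (home + i) % n
            if self.slots[addr] is None:  # an empty slot ends the search
                return None
            if self.slots[addr] == key:
                return addr
        return None


homes = {"Rosen": 2, "Jasper": 3, "York": 3, "Novak": 2}  # hypothetical hash values
f = ProgressiveOverflowFile(5, homes.__getitem__)
for name in ("Rosen", "Jasper", "York", "Novak"):
    print(name, "->", f.insert(name))  # Novak wraps around to address 0
```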
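Collision Resolution by Progressive Overflow
• Search length: the number of accesses required to retrieve a record
• Average search length = total search length / total number of records
(Figure: Adams, Bates, Cole, Dean, and Evans stored at addresses 0-4 by progressive overflow; address 5 is empty.)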
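Average search length can be computed by loading records with progressive overflow and counting probes. Without deletions, a later search retraces exactly the probes used at insertion, so the insertion probe count equals the search length. The home addresses below are hypothetical, chosen so the records end up at addresses 0-4 as in the Adams-to-Evans layout.

```python
from typing import Dict, List, Optional


def average_search_length(homes: Dict[str, int], n_addresses: int) -> float:
    """Load records (in dict order) by progressive overflow and return the
    average number of accesses needed to find each one afterwards."""
    slots: List[Optional[str]] = [None] * n_addresses
    lengths = []
    for key, home in homes.items():
        for probes in range(1, n_addresses + 1):
            addr = (home + probes - 1) % n_addresses
            if slots[addr] is None:
                slots[addr] = key
                lengths.append(probes)
                break
        else:
            raise RuntimeError("file is full")
    return sum(lengths) / len(lengths)


homes = {"Adams": 0, "Bates": 1, "Cole": 1, "Dean": 2, "Evans": 0}  # hypothetical
print(average_search_length(homes, 6))  # (1 + 1 + 2 + 2 + 5) / 5
```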
Collision Resolution by Progressive Overflow
• Search length (continued)
• Example
<Figure 11.7> Average search length versus packing density in a hashed file
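The shape of a curve like Figure 11.7 can be approximated analytically. This sketch swaps in Knuth's classical estimate for a successful search under linear probing, (1/2)(1 + 1/(1 - α)) with packing density α. It assumes a random hash function and one record per address, and is not necessarily the formula the figure was plotted from.

```python
def expected_search_length(packing_density: float) -> float:
    """Knuth's approximation for the average successful search length with
    linear probing (progressive overflow); valid for 0 <= alpha < 1."""
    if not 0 <= packing_density < 1:
        raise ValueError("packing density must be in [0, 1)")
    return 0.5 * (1 + 1 / (1 - packing_density))


for alpha in (0.5, 0.75, 0.9):
    print(alpha, round(expected_search_length(alpha), 2))
```

The estimate stays small at moderate densities but grows without bound as the packing density approaches 1.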
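6. Storing More Than One Record per Address: Buckets
• Buckets: each address holds a block (bucket) that can store several records
(Figure: bucket addresses 0-4 holding Green and Hall at 0, Jenks at 2, King, Land, and Marks at 3, and Nutt at 4.)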
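A bucket file can be sketched by letting each address hold up to `bucket_size` records and applying progressive overflow between buckets when the home bucket is full. The class name, the bucket size of 3, and the home addresses are assumptions for illustration; here Nutt is given home address 3 so that it overflows into bucket 4.

```python
from typing import Callable, List


class BucketFile:
    """Each address is a bucket of up to `bucket_size` records; a full home
    bucket sends the record on to the following buckets (progressive overflow)."""

    def __init__(self, n_buckets: int, bucket_size: int, hash_fn: Callable[[str], int]):
        self.buckets: List[List[str]] = [[] for _ in range(n_buckets)]
        self.bucket_size = bucket_size
        self.hash_fn = hash_fn

    def insert(self, key: str) -> int:
        n = len(self.buckets)
        home = self.hash_fn(key) % n
        for i in range(n):
            addr = (home + i) % n
            if len(self.buckets[addr]) < self.bucket_size:
                self.buckets[addr].append(key)
                return addr
        raise RuntimeError("file is full")


# Hypothetical home addresses; Nutt's home bucket 3 is full when it arrives.
homes = {"Green": 0, "Hall": 0, "Jenks": 2, "King": 3, "Land": 3, "Marks": 3, "Nutt": 3}
bf = BucketFile(5, 3, homes.__getitem__)
placed = {name: bf.insert(name) for name in homes}
print(bf.buckets)
```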
Storing More Than One Record per Address: Buckets
• Effects of buckets on performance
  • r: number of records, N: number of addresses, b: number of records a bucket can hold
  • Packing density = r / (b * N)
Storing More Than One Record per Address: Buckets
<Table 11.4> Synonyms causing collisions as a percent of records for different packing densities and different bucket sizes
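Percentages like those in Table 11.4 can be estimated with the same Poisson model, extended to buckets (a sketch under the assumption of a random hash function). With packing density d = r/(bN), the mean number of records hashed to an address is λ = r/N = b·d, and an address that receives x > b records overflows x - b of them. The function name and the tail cut-off are assumptions.

```python
import math


def bucket_overflow_fraction(packing_density: float, bucket_size: int) -> float:
    """Predicted fraction of records that do not fit in their home bucket
    (Poisson model; packing density = r / (b * N))."""
    if packing_density <= 0:
        return 0.0
    lam = packing_density * bucket_size       # mean records per address, r/N
    p = math.exp(-lam)                        # Poisson probability p(0)
    expected_overflow = 0.0                   # expected overflow records per address
    upper = bucket_size + int(10 * lam) + 50  # truncation point for the tail (assumption)
    for x in range(1, upper + 1):
        p *= lam / x                          # p(x) computed from p(x - 1)
        if x > bucket_size:
            expected_overflow += (x - bucket_size) * p
    return expected_overflow / lam            # overflow records / records per address


for b in (1, 2, 5, 10):
    print(b, f"{bucket_overflow_fraction(0.75, b):.1%}")
```

With b = 1 this reduces to the one-record-per-address prediction; at a fixed packing density, larger buckets leave fewer records outside their home address.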
7. Making Deletions
• Initial state
  0 Adams
  1 Jones
  2 Morris
  3 Smith
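Making Deletions
• (1) Tombstones for handling deletions
  • Deletion of Morris: before, addresses 0-3 hold Adams, Jones, Morris, Smith; after, they hold Adams, Jones, ###, Smith
  • If Morris's slot were simply left empty, Smith could no longer be found: a search for Smith passes through address 2 on its way to address 3 and would stop at the empty slot
  • ###: tombstone. This mark indicates that a record once lived there but no longer does, so a search continues past it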
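The effect of a tombstone on searching can be sketched as follows. The home addresses are hypothetical, chosen so the records occupy addresses 0-3 as in the initial state. The `use_tombstone=False` option shows the failure case where the slot is simply emptied.

```python
from typing import Dict, List, Optional

EMPTY, TOMBSTONE = None, "###"


def search(slots: List[Optional[str]], homes: Dict[str, int], key: str) -> Optional[int]:
    """Progressive-overflow search: skip tombstones, stop at a truly empty slot."""
    n = len(slots)
    for i in range(n):
        addr = (homes[key] + i) % n
        if slots[addr] is EMPTY:
            return None
        if slots[addr] == key:
            return addr
    return None


def delete(slots: List[Optional[str]], homes: Dict[str, int], key: str,
           use_tombstone: bool = True) -> None:
    addr = search(slots, homes, key)
    if addr is not None:
        slots[addr] = TOMBSTONE if use_tombstone else EMPTY


homes = {"Adams": 0, "Jones": 1, "Morris": 1, "Smith": 0}  # hypothetical
marked = ["Adams", "Jones", "Morris", "Smith"]
delete(marked, homes, "Morris")
print(search(marked, homes, "Smith"))   # 3: the search skips the tombstone

emptied = ["Adams", "Jones", "Morris", "Smith"]
delete(emptied, homes, "Morris", use_tombstone=False)
print(search(emptied, homes, "Smith"))  # None: the search stops at address 2
```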
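Making Deletions
• (2) Implications of tombstones for insertions
  • Inserting "Smith": a new record may reuse a tombstone slot, but the rest of the probe sequence must be searched first; otherwise a second Smith could be written into Morris's old slot at address 2, duplicating the existing record
• (3) Effects of deletions and additions on performance
  • As tombstones accumulate, the average search length deteriorates
  • Solution: reorganization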
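An insertion that reuses tombstones safely, plus a simple reorganization, might look like the sketch below. The key "Nash" and all home addresses are hypothetical. Reorganization here means reinserting the live records into a fresh table, which is one straightforward way to remove tombstones.

```python
from typing import Dict, List, Optional

EMPTY, TOMBSTONE = None, "###"


def insert(slots: List[Optional[str]], homes: Dict[str, int], key: str) -> int:
    """Progressive-overflow insert that reuses the first tombstone on the path,
    but only after confirming the key is not stored further along."""
    n = len(slots)
    first_free = None
    for i in range(n):
        addr = (homes[key] + i) % n
        slot = slots[addr]
        if slot is EMPTY:
            if first_free is None:
                first_free = addr
            break  # end of the probe sequence: the key is absent
        if slot == key:
            raise KeyError(f"{key} is already in the file")
        if slot == TOMBSTONE and first_free is None:
            first_free = addr  # remember it, but keep checking for duplicates
    if first_free is None:
        raise RuntimeError("file is full")
    slots[first_free] = key
    return first_free


def reorganize(slots: List[Optional[str]], homes: Dict[str, int]) -> List[Optional[str]]:
    """Rebuild the file without tombstones by reinserting the live records."""
    fresh: List[Optional[str]] = [EMPTY] * len(slots)
    for key in slots:
        if key is not EMPTY and key != TOMBSTONE:
            insert(fresh, homes, key)
    return fresh


homes = {"Adams": 0, "Jones": 1, "Morris": 1, "Smith": 0, "Nash": 1}  # hypothetical
slots = ["Adams", "Jones", TOMBSTONE, "Smith", EMPTY]  # Morris already deleted
print(insert(slots, homes, "Nash"))  # reuses the tombstone at address 2
print(reorganize(["Adams", "Jones", TOMBSTONE, "Smith", EMPTY], homes))
```

Reinserting a second "Smith" raises `KeyError` instead of silently filling address 2.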
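8. Other Collision Resolution Techniques
• (1) Double hashing
  • When a collision occurs, a second hashing function computes an increment c
  • c is added to the address repeatedly (mod N) until a free address is found
  • Drawback: successive probes land far apart in the file, which adds seek-time overhead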
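The double-hashing probe sequence can be sketched as follows. The slides name no particular second function, so `h2` below is hypothetical: it maps a key to an increment between 1 and N - 1. With a prime number of addresses, every such increment visits each address exactly once before repeating.

```python
from typing import List


def double_hash_probes(home: int, increment: int, n_addresses: int) -> List[int]:
    """Addresses examined by double hashing: home, home + c, home + 2c, ... (mod N)."""
    if not 1 <= increment < n_addresses:
        raise ValueError("increment must be in 1..N-1")
    return [(home + i * increment) % n_addresses for i in range(n_addresses)]


def h2(key: str, n_addresses: int) -> int:
    """Hypothetical second hash: maps a key to an increment in 1..N-1."""
    return 1 + sum(map(ord, key)) % (n_addresses - 1)


N = 7  # a prime number of addresses
c = h2("LOWELL", N)
print(c, double_hash_probes(3, c, N))  # every address appears exactly once
```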
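Other Collision Resolution Techniques
• (2) Chained progressive overflow: each record carries the address of its next synonym (-1 ends the chain)
  • Progressive overflow only (address, record): 0 Adams, 1 Bates, 2 Cole, 3 Dean, 4 Evans, 5 Flint
  • With links (address, record, next synonym):
    0 Adams 2
    1 Bates 3
    2 Cole 5
    3 Dean -1
    4 Evans -1
    5 Flint -1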
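A search along the synonym chains of the chained progressive overflow layout can be sketched as below. The home addresses are inferred from the links (Adams, Cole, Flint share a chain; Bates, Dean share another; Evans heads its own). The sketch assumes every home address holds the head of its own chain.

```python
from typing import List, Optional, Tuple

# Layout from the figure: (record, address of next synonym or -1)
FILE: List[Tuple[str, int]] = [
    ("Adams", 2), ("Bates", 3), ("Cole", 5),
    ("Dean", -1), ("Evans", -1), ("Flint", -1),
]
# Home addresses inferred from the chains (assumption)
HOMES = {"Adams": 0, "Cole": 0, "Flint": 0, "Bates": 1, "Dean": 1, "Evans": 4}


def chained_search(key: str) -> Optional[Tuple[int, int]]:
    """Follow the synonym chain from the home address.
    Returns (address, number of accesses), or None if the key is absent."""
    addr, accesses = HOMES[key], 0
    while addr != -1:
        accesses += 1
        record, next_addr = FILE[addr]
        if record == key:
            return addr, accesses
        addr = next_addr
    return None


print(chained_search("Flint"))  # (5, 3): Adams -> Cole -> Flint
```

Without the links, finding Flint by progressive overflow would take six accesses (addresses 0 through 5).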
Other Collision Resolution Techniques
• (3) Chaining with a separate overflow area
  • Primary data area (home address, record, link into the overflow area):
    0 Adams 0
    1 Bates 1
    2 (empty)
    3 (empty)
    4 Evans -1
  • Overflow area (address, record, next link):
    0 Cole 2
    1 Dean -1
    2 Flint -1
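A separate overflow area can be sketched with new synonyms appended to the overflow area and linked at the end of their chain. Inserting the records in the order shown, with home addresses taken from the primary area (Cole and Flint inferred as synonyms of Adams, Dean of Bates), reproduces the layout above. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    record: str
    link: int = -1  # index into the overflow area, -1 ends the chain


class OverflowAreaFile:
    def __init__(self, n_addresses: int):
        self.primary: List[Optional[Slot]] = [None] * n_addresses
        self.overflow: List[Slot] = []

    def insert(self, key: str, home: int) -> None:
        if self.primary[home] is None:
            self.primary[home] = Slot(key)
            return
        self.overflow.append(Slot(key))  # synonyms go to the overflow area
        node = self.primary[home]
        while node.link != -1:           # walk to the end of the chain
            node = self.overflow[node.link]
        node.link = len(self.overflow) - 1

    def search(self, key: str, home: int) -> bool:
        node = self.primary[home]
        while node is not None:
            if node.record == key:
                return True
            node = self.overflow[node.link] if node.link != -1 else None
        return False


f = OverflowAreaFile(5)
for name, home in [("Adams", 0), ("Bates", 1), ("Cole", 0),
                   ("Dean", 1), ("Evans", 4), ("Flint", 0)]:
    f.insert(name, home)
print([(s.record, s.link) for s in f.overflow])
```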
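Other Collision Resolution Techniques
• (4) Scatter tables: indexing revisited
  (Figure: a hash table with addresses 0-4 points into a separate file of records, each linked to its next synonym: Adams 1, Coles 3, Bates 4, Flint -1, Deans -1, Evans -1.)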
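A scatter table keeps only pointers in the hashed table; the records live in a separate file and are chained by synonym. In this sketch the arrival order and home addresses are assumptions, chosen so the result reproduces the links in the figure. Names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    key: str
    next: int = -1  # position of the next synonym in the record file, -1 ends the chain


class ScatterTable:
    def __init__(self, n_addresses: int):
        self.index: List[int] = [-1] * n_addresses  # head of each synonym chain
        self.records: List[Record] = []             # record file, in arrival order

    def insert(self, key: str, home: int) -> int:
        self.records.append(Record(key))
        pos = len(self.records) - 1
        if self.index[home] == -1:
            self.index[home] = pos
        else:
            cur = self.records[self.index[home]]
            while cur.next != -1:  # append at the end of the chain
                cur = self.records[cur.next]
            cur.next = pos
        return pos


t = ScatterTable(5)
for name, home in [("Adams", 0), ("Coles", 0), ("Bates", 1),
                   ("Flint", 0), ("Deans", 1), ("Evans", 4)]:  # hypothetical homes
    t.insert(name, home)
print(t.index, [(r.key, r.next) for r in t.records])
```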
9. Patterns of Record Access
• A small percentage of the records in a file account for a large percentage of the accesses
• 80/20 rule: 80% of the accesses are performed on 20% of the records