Data Structures and Algorithms: Hashing. First Year. M. B. Fayek, CUFE 2010.
Hashing • What is Hashing? • Problems in hashing • Collision Resolution Strategies
1. What is Hashing? • Hashing is a quick and efficient searching technique: ideally, a key is located in a single step. • So far, the efficiency of a search depended on the number of comparisons. • In hashing, the keys themselves point directly to records through a hashing function. • All possible key values are mapped into addresses in the hash table. • The hashing function is used for searching as well as for storing.
1. What is Hashing? • The hash table is sequential and contiguous. • Each slot is called a bucket. • Buckets may hold more than one key.
1. What is Hashing? • Hashing methods: • Direct and Subtraction • Modulo-division (or division remainder) using list size (prime, why? A prime list size tends to give fewer collisions) • Digit extraction • Midsquare • Folding (fold shift, fold boundary) • Pseudo-random (seed)
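Three of the methods listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: keys are non-negative integers, the function names are made up for this example, and the choice of "three middle digits" and "three-digit parts" is arbitrary.

```python
def modulo_division(key: int, list_size: int) -> int:
    """Modulo-division: the remainder of key / list_size is the address."""
    return key % list_size


def midsquare(key: int, list_size: int) -> int:
    """Midsquare: square the key and take (up to) three middle digits.

    Taking three digits is an assumption made for this sketch.
    """
    squared = str(key * key)
    mid = len(squared) // 2
    middle_digits = squared[max(0, mid - 1): mid + 2]
    return int(middle_digits) % list_size


def fold_shift(key: int, list_size: int, part_len: int = 3) -> int:
    """Fold shift: cut the key into parts of part_len digits and add them."""
    digits = str(key)
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    return sum(parts) % list_size


print(modulo_division(121267, 307))   # 121267 = 307 * 395 + 2
print(midsquare(9452, 1000))          # 9452^2 = 89340304, middle digits 403
print(fold_shift(123456789, 1000))    # 123 + 456 + 789 = 1368, then % 1000
```

In every case the final `% list_size` keeps the address inside the table.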
2. Problems in Hashing • A collision occurs whenever a hash function maps two distinct keys to the same bucket. • The hashing function must generate bucket addresses quickly and efficiently, with minimum collisions. • As the domain of keys is usually larger than the number of buckets, collisions are very likely to happen no matter how good the hashing function is.
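A small sketch of how collisions arise, assuming integer keys and a modulo-division hash (the function name `find_collisions` is invented for this example):

```python
from collections import defaultdict


def find_collisions(keys, num_buckets):
    """Group keys by bucket address and return the buckets holding more than one key."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[key % num_buckets].append(key)
    return {addr: ks for addr, ks in buckets.items() if len(ks) > 1}


# 10 and 17 are distinct keys, but both leave remainder 3 when divided by 7.
print(find_collisions([10, 17, 22, 5], 7))

# Eight keys cannot fit into seven buckets without at least one collision.
print(find_collisions(range(8), 7))
```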
3. Collision Resolution Strategies • Definitions: • Load factor = num of elements in list / list size • Clustering (primary, secondary)
3. Collision Resolution Strategies • Open Addressing: (using prime area) • Probing (Linear, quadratic) • Double Hashing • Pseudo-random • Key offset • Linked Lists (Separate Chaining) • (Bucket Hashing) • Re-hashing
3. Collision Resolution Strategies • Open Addressing: • Probing: • Linear Probing: Search at constant intervals from the collision (typically 1) • Quadratic Probing: Search at quadratically increasing intervals, i.e. collision function f(i) = i^2; on collision, search the 1st, 4th, 9th, … location after the collision address
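The two probing schemes can be sketched as follows. This is a minimal illustration, assuming integer keys, a modulo-division home address, `None` marking an empty slot, and wrap-around at the end of the table; the names `probe_sequence` and `insert` are hypothetical.

```python
def probe_sequence(home, table_size, attempts, quadratic=False):
    """Addresses tried after a collision at `home`.

    Linear probing uses offsets 1, 2, 3, ...; quadratic probing uses 1, 4, 9, ...
    """
    sequence = []
    for i in range(1, attempts + 1):
        offset = i * i if quadratic else i
        sequence.append((home + offset) % table_size)
    return sequence


def insert(table, key, quadratic=False):
    """Store `key` in `table` and return the address where it ended up."""
    size = len(table)
    home = key % size
    if table[home] is None:
        table[home] = key
        return home
    for address in probe_sequence(home, size, size, quadratic):
        if table[address] is None:
            table[address] = key
            return address
    raise RuntimeError("no free slot found on the probe sequence")


table = [None] * 11
insert(table, 3)                    # home address 3 is free
insert(table, 14)                   # collides at 3, linear probe lands on 4
insert(table, 25, quadratic=True)   # collides at 3, tries 4 (full), then 7
print(table)
```

Note that quadratic probing may fail to find a free slot even when one exists, which is why the sketch raises an error rather than looping forever.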
3. Collision Resolution Strategies • Open Addressing • Double Hashing: Apply a second hashing function and probe at the resulting offsets from the collision address: hash2(x), 2 * hash2(x), 3 * hash2(x), . . .
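A minimal sketch of double hashing. The slide does not specify the second hashing function; the form `prime - (key % prime)` used here is one common choice and is an assumption of this example, picked because it never returns 0 (a zero step would probe the same slot forever).

```python
def hash1(key, table_size):
    """Primary hash: home address by modulo-division."""
    return key % table_size


def hash2(key, prime=7):
    """Secondary hash (assumed form): step size in the range 1..prime."""
    return prime - (key % prime)


def double_hash_insert(table, key):
    """Probe home, home + hash2, home + 2*hash2, ... until a free slot is found."""
    size = len(table)
    home = hash1(key, size)
    step = hash2(key)
    for i in range(size):
        address = (home + i * step) % size
        if table[address] is None:
            table[address] = key
            return address
    raise RuntimeError("no free slot found on the probe sequence")


table = [None] * 11
double_hash_insert(table, 14)   # home 3, free
double_hash_insert(table, 25)   # home 3 is taken, step 3, lands on 6
double_hash_insert(table, 36)   # home 3 is taken, step 6, lands on 9
print(table)
```

Keys 25 and 36 share a home address but follow different probe sequences, which is what reduces secondary clustering.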
3. Collision Resolution Strategies • Linked Lists (Separate Chaining): • Separate chaining (may be modified by keeping the chain sorted!) • Modified Hash Table (by eliminating the first probe, hence the hash table becomes an array of records instead of an array of pointers to records)
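Separate chaining can be sketched as below. For brevity, each chain is a Python list standing in for a linked list, keys are assumed to be integers, and the class name `ChainedHashTable` is invented for this example.

```python
class ChainedHashTable:
    """Hash table in which every bucket holds a chain of (key, value) pairs."""

    def __init__(self, num_buckets=7):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        return key % len(self.buckets)

    def insert(self, key, value):
        chain = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)   # key already present: overwrite
                return
        chain.append((key, value))        # colliding keys share the same chain

    def search(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return None                       # unsuccessful search


table = ChainedHashTable()
table.insert(10, "ten")
table.insert(17, "seventeen")   # 17 % 7 == 10 % 7 == 3, so both go to bucket 3
print(table.search(17), table.search(24))
```

A collision never forces a key into another bucket here; it only lengthens one chain.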
3. Collision Resolution Strategies • Rehashing: • When the table becomes too full, operations start taking too long • Solution: Build another hash table of about double the size, with an associated hashing function, then scan down the entire original hash table and insert each element into the new table
3. Collision Resolution Strategies • Rehashing: • When is the table too full? • Rehash when the table is half full • Rehash when an insertion fails • Rehash when the table reaches a certain load factor (best)
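The load-factor policy can be sketched as follows. Several details are assumptions of this example and do not come from the slides: the threshold of 0.7, the use of linear probing, the initial size of 5, and rounding the doubled size up to the next prime.

```python
def next_prime(n):
    """Smallest prime >= n (simple trial division, fine for small tables)."""
    def is_prime(m):
        if m < 2:
            return False
        i = 2
        while i * i <= m:
            if m % i == 0:
                return False
            i += 1
        return True
    while not is_prime(n):
        n += 1
    return n


class RehashingTable:
    """Linear-probing table that rehashes once the load factor exceeds max_load."""

    def __init__(self, size=5, max_load=0.7):
        self.slots = [None] * size
        self.count = 0
        self.max_load = max_load

    def load_factor(self):
        return self.count / len(self.slots)

    @staticmethod
    def _place(slots, key):
        """Put key into slots by linear probing; return False if already present."""
        address = key % len(slots)
        while slots[address] is not None:
            if slots[address] == key:
                return False
            address = (address + 1) % len(slots)
        slots[address] = key
        return True

    def insert(self, key):
        if self._place(self.slots, key):
            self.count += 1
        if self.load_factor() > self.max_load:
            self._rehash()

    def _rehash(self):
        # About double the size; the new size changes the hashing function too.
        new_slots = [None] * next_prime(2 * len(self.slots))
        for key in self.slots:            # scan down the entire original table
            if key is not None:
                self._place(new_slots, key)
        self.slots = new_slots

    def contains(self, key):
        address = key % len(self.slots)
        for _ in range(len(self.slots)):
            if self.slots[address] is None:
                return False
            if self.slots[address] == key:
                return True
            address = (address + 1) % len(self.slots)
        return False


table = RehashingTable()
for key in [12, 7, 22, 17]:
    table.insert(key)
print(len(table.slots), table.load_factor())   # table has grown from 5 to 11 slots
```

Every element has to be reinserted because its address depends on the table size, so an old address is generally wrong in the new table.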
Probing • Definition: Each calculation of an address and test for success is known as probing
Key offset collision resolution • Offset = key / list size (integer quotient) • Address = (Offset + old address) % list size
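The two formulas above translate directly into code. This sketch assumes integer keys and a modulo-division home address; the function name `key_offset_next` is invented for this example.

```python
def key_offset_next(key, old_address, list_size):
    """Next address to probe after a collision at old_address."""
    offset = key // list_size                  # integer quotient of key / list size
    # Caveat: if offset is a multiple of list_size, the address does not move.
    return (offset + old_address) % list_size


list_size = 307
key = 166702
home = key % list_size                         # 166702 = 307 * 543 + 1, so home is 1
print(home, key_offset_next(key, home, list_size))   # offset 543: (543 + 1) % 307
```

Because the offset depends on the key, two keys that collide at the same home address usually follow different probe paths.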