Hashing Jeff Chastine
Hashing
• Many applications require INSERT, SEARCH, and DELETE operations
• Hashing can perform all of these in O(1) time on average
• Operations are based on keys
• Falls under two general categories:
  • Direct-address tables
  • Hash tables

Direct-Addressing
• Works well when the universe U of keys is small
• U = {0, 1, …, m - 1}, where m is not large
• All elements have unique keys
• Table T[0..m - 1], where each slot corresponds to one key in U
• All operations take only O(1) time, even in the worst case

Direct Implementation
[Figure: direct-address table T with slots 0 to 9. Each key in K (actual keys), a subset of U (universe of keys), indexes the slot that holds that key and its satellite data; slots for unused keys are empty.]

Direct-Addressing Operations
DIRECT-ADDRESS-SEARCH(T, k)
    return T[k]
DIRECT-ADDRESS-INSERT(T, x)
    T[key[x]] ← x
DIRECT-ADDRESS-DELETE(T, x)
    T[key[x]] ← NIL

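The three operations translate almost line for line into Python. The following is a minimal sketch, not part of the original slides: the names `Element` and `DirectAddressTable` are made up for illustration, and `None` stands in for NIL.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Element:
    """Hypothetical record type: a key plus satellite data."""
    key: int          # assumed to lie in {0, 1, ..., m - 1}
    data: Any = None  # satellite data


class DirectAddressTable:
    """Illustrative direct-address table T[0..m-1]."""

    def __init__(self, m: int) -> None:
        # One slot per possible key; None plays the role of NIL.
        self.slots = [None] * m

    def search(self, k: int) -> Optional[Element]:
        # DIRECT-ADDRESS-SEARCH: return T[k]
        return self.slots[k]

    def insert(self, x: Element) -> None:
        # DIRECT-ADDRESS-INSERT: T[key[x]] <- x
        self.slots[x.key] = x

    def delete(self, x: Element) -> None:
        # DIRECT-ADDRESS-DELETE: T[key[x]] <- NIL
        self.slots[x.key] = None
```

Each method is a single list index operation, which is why every operation runs in O(1) time. The cost is memory: the list always has m slots, however few keys are stored.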
Hash Tables
• What are potential problems with direct addressing?
  • |U| may be so large that a table with |U| slots is impractical
  • The set K of actual keys may be small relative to U
  • Example: Social Security numbers (SSNs)
• Here, hash tables require much less storage
• Only catch: O(1) is the average-case time instead of the worst-case time

How it works
• With direct addressing, an element with key k goes into slot k
• With hashing, it goes into slot h(k), where h is a hash function
• Hash functions try to "randomize" where keys land
• A hash function maps U to the slots of T[0..m - 1]: h : U → {0, 1, …, m - 1}
• Instead of |U| slots, the table needs only m slots

Hash Implementation
[Figure: hash table T with slots 0 to m - 1. Keys k1 to k5 in K (actual keys), a subset of U (universe of keys), map to slots h(k1), h(k4), h(k2) = h(k5), and h(k3); k2 and k5 collide.]

Collisions
• A collision occurs when two keys hash to the same slot
• Because |U| > m, the pigeonhole principle applies
  • Therefore, some keys in U must collide
• We often talk of the load factor α = n/m (n stored elements, m slots)
• Pick a good hash function
  • Near random, yet deterministic
• Can chain colliding elements together
  • This is where the worst case comes from: all n keys may land in the same chain
• Can use open addressing

Chaining
[Figure: hash table T with chaining. Keys from K (actual keys), a subset of U (universe of keys), that hash to the same slot are stored together in a linked list hanging off that slot.]

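Chaining can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the slides' implementation: the class name `ChainedHashTable` is hypothetical, each chain is a Python list rather than a linked list, and the hash function is simply Python's built-in `hash` reduced mod m.

```python
class ChainedHashTable:
    """Illustrative hash table that resolves collisions by chaining."""

    def __init__(self, m: int = 8) -> None:
        self.m = m                                # number of slots
        self.n = 0                                # number of stored keys
        self.slots = [[] for _ in range(m)]       # one chain per slot

    def _h(self, key) -> int:
        # Assumed hash function: built-in hash reduced to a slot index.
        return hash(key) % self.m

    def insert(self, key, value) -> None:
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                          # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))
        self.n += 1

    def search(self, key):
        # Cost is proportional to the length of one chain.
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key) -> bool:
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                self.n -= 1
                return True
        return False

    def load_factor(self) -> float:
        # alpha = n / m, the average chain length
        return self.n / self.m
```

With m = 8, the integer keys 1 and 9 both land in slot 1, so they share a chain. If every key landed in one slot, a search would have to scan all n entries, which is the worst case.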
Hash Functions
• What makes a good hash function?
  • Each key is equally likely to hash to any of the m slots
• If keys are uniformly distributed random numbers in [0, 1), then take h(k) = floor(km)
• To hash strings, first convert them to integers using their ASCII codes
• Most hash functions involve mod

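Two of these ideas are easy to try out. The sketch below is illustrative only: the function names are made up, and reading a string as a radix-128 number is one common way to turn ASCII codes into an integer, assumed here rather than taken from the slides.

```python
import math


def hash_unit_interval(k: float, m: int) -> int:
    """h(k) = floor(k * m) for a key k assumed to lie in [0, 1)."""
    return math.floor(k * m)


def string_to_int(s: str) -> int:
    """Read the ASCII codes of s as the digits of a radix-128 integer."""
    n = 0
    for ch in s:
        n = n * 128 + ord(ch)
    return n


def hash_string(s: str, m: int) -> int:
    """Convert the string to an integer, then reduce it mod m."""
    return string_to_int(s) % m
```

For example, "pt" has ASCII codes 112 and 116, so `string_to_int("pt")` is 112 * 128 + 116 = 14452.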
Hash Functions
• Division method: h(k) = k mod m
• Multiplication method: let 0 < A < 1, then h(k) = floor(m (kA mod 1)), where kA mod 1 is the fractional part of kA

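Both methods fit in a few lines of Python. This is a sketch under one assumption that the slides do not state: the default A = (√5 - 1)/2 ≈ 0.618 is a commonly suggested constant, and any 0 < A < 1 satisfies the definition. The function names are illustrative.

```python
import math


def hash_division(k: int, m: int) -> int:
    """Division method: h(k) = k mod m."""
    return k % m


def hash_multiplication(k: int, m: int,
                        A: float = (math.sqrt(5) - 1) / 2) -> int:
    """Multiplication method: h(k) = floor(m * (k*A mod 1)).

    The default A is an assumed, commonly suggested constant.
    """
    frac = (k * A) % 1.0          # fractional part of k*A
    return math.floor(m * frac)
```

For instance, `hash_division(100, 12)` gives 4, and `hash_multiplication(123456, 16384)` gives 67 because the fractional part of 123456 · A is about 0.0041. Note that floating-point rounding can matter for very large keys, so production code typically uses integer arithmetic instead.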
Open Addressing
• Systematically examine, or probe, slots until the item is found (or an empty slot shows it is absent)
• No lists and no elements stored outside the table; thus α ≤ 1
• Instead of following pointers, we compute the sequence of slots to probe
• Instead of a fixed order, the probe sequence depends on the key

Kinds of Open Addressing
• Linear probing: h(k, i) = (h′(k) + i) mod m
• Quadratic probing: h(k, i) = (h′(k) + c₁i + c₂i²) mod m
• Double hashing: h(k, i) = (h₁(k) + i·h₂(k)) mod m
• Here i = 0, 1, …, m - 1 is the probe number, and h′, h₁, h₂ are auxiliary hash functions
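The three probe sequences, along with insertion and search over them, might be sketched as follows. Everything here is an illustrative assumption rather than the slides' own code: the auxiliary hash functions are taken to be h′(k) = h₁(k) = k mod m and h₂(k) = 1 + (k mod (m - 1)), the constants c₁ = c₂ = 1 are arbitrary, and the function names are made up. Deletion is left out because it needs extra bookkeeping (for example, a "deleted" marker) that the slides do not cover.

```python
from typing import Callable, List, Optional

# A probe function maps (key, probe number i) to a slot index.
Probe = Callable[[int, int], int]


def make_linear(m: int) -> Probe:
    # h(k, i) = (h'(k) + i) mod m, with assumed h'(k) = k mod m
    return lambda k, i: (k % m + i) % m


def make_quadratic(m: int, c1: int = 1, c2: int = 1) -> Probe:
    # h(k, i) = (h'(k) + c1*i + c2*i^2) mod m; c1, c2 are illustrative
    return lambda k, i: (k % m + c1 * i + c2 * i * i) % m


def make_double(m: int) -> Probe:
    # h(k, i) = (h1(k) + i*h2(k)) mod m, with assumed
    # h1(k) = k mod m and h2(k) = 1 + (k mod (m - 1)), which is never 0
    return lambda k, i: (k % m + i * (1 + k % (m - 1))) % m


def hash_insert(table: List[Optional[int]], k: int, probe: Probe) -> int:
    """Probe until an empty slot is found; return the slot used."""
    for i in range(len(table)):
        j = probe(k, i)
        if table[j] is None:
            table[j] = k
            return j
    raise OverflowError("hash table overflow")


def hash_search(table: List[Optional[int]], k: int, probe: Probe) -> Optional[int]:
    """Follow the same probe sequence; an empty slot means k is absent."""
    for i in range(len(table)):
        j = probe(k, i)
        if table[j] is None:
            return None
        if table[j] == k:
            return j
    return None
```

With a table of m = 7 slots and linear probing, inserting 10, 17, and 24 (all of which hash to slot 3) fills slots 3, 4, and 5 in turn. Depending on m and the constants, a quadratic sequence may not visit every slot, so an insert can fail even when the table still has room.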