Introduction to Hashing - Hash Functions Sections 5.1, 5.2, and 5.6
Hashing • Data items are stored in an array of some fixed size, called the hash table • Search is performed using some part of the data item, called the key • Used for performing insertions, deletions, and finds in constant average time • Operations requiring ordering information are not supported efficiently • Such as findMin, findMax
Hash Table Applications • Comparing search efficiency of different data structures: • Vector, list: O(N) • AVL search tree: O(log(N)) • Hash table: O(1) expected time • Compilers to keep track of declared variables • Symbol tables • Mapping from name to id • On-line spelling checkers
Hash Functions • Map keys to integers (which represent table indices) • Hash(Key) = Integer • Index values should be evenly distributed • Even if the input data is not evenly distributed • What happens if multiple keys are mapped to the same integer (same position)? • Collision management (discussed in detail later) • Collisions are likely to be reduced if keys are evenly distributed over the hash table
Simple Hash Functions • Assumptions: • K: an unsigned 32-bit integer • M: the number of buckets (the number of entries in a hash table) • Goal: • If a bit is changed in K, every bit of Hash(K) is equally likely to change • So that items are evenly distributed in the hash table
A Simple Function • What if • Hash(K) = K % M • Where M is an arbitrary integer • What is wrong? • Values of K may not be evenly distributed • But Hash(K) needs to be evenly distributed • Suppose • M = 10, • K = 10, 20, 30, 40 • Then K % M = 0, 0, 0, 0 (every key lands in the same bucket)
Another Simple Function • If • Hash(K) = K % P, P = prime number • Suppose • P = 11 • K = 10, 20, 30, 40 • K % P = 10, 9, 8, 7 • More uniform distribution… • So hash tables often have a prime number of entries
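The two modulus choices above can be compared by counting how many keys land in each bucket. The following is a minimal sketch; the helper name bucketCounts is hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count how many keys fall into each bucket when hashing with key % tableSize.
// bucketCounts is a hypothetical helper name, not part of the slides.
std::vector<int> bucketCounts(const std::vector<unsigned int>& keys,
                              unsigned int tableSize) {
    assert(tableSize > 0);  // a modulus of zero is undefined
    std::vector<int> counts(tableSize, 0);
    for (unsigned int k : keys) {
        ++counts[k % tableSize];  // Hash(K) = K % tableSize
    }
    return counts;
}
```

Called with the keys 10, 20, 30, 40, a table size of 10 puts all four keys in bucket 0, while a table size of 11 gives one key each to buckets 10, 9, 8, and 7.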
A Simple Hash for Strings
    // Adds up the character codes of the key
    unsigned int Hash(const string& Key) {
        unsigned int hash = 0;
        for (size_t j = 0; j != Key.size(); ++j) {
            hash += Key[j];
        }
        return hash;
    }
• Problem: Small sized keys may not use a large fraction of a large hash table
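To see why short keys reach only a small part of a large table, one can bound the largest sum the additive hash can produce. The sketch below assumes 7-bit ASCII keys (character codes up to 127); the helper name reachableSlots, the key length 8, and the table size 10007 used in the note are illustrative assumptions, not values from the slides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Additive string hash from the slide: the sum of the character codes.
unsigned int additiveHash(const std::string& key) {
    unsigned int hash = 0;
    for (std::size_t j = 0; j != key.size(); ++j) {
        hash += static_cast<unsigned char>(key[j]);
    }
    return hash;
}

// Upper bound on the number of distinct slots the additive hash can reach
// for ASCII keys of at most maxKeyLength characters (hypothetical helper).
std::size_t reachableSlots(std::size_t maxKeyLength, std::size_t tableSize) {
    assert(tableSize > 0);
    // Sums range over 0 .. 127 * maxKeyLength, so there are that many + 1 values.
    std::size_t distinctSums = maxKeyLength * 127 + 1;
    return std::min(distinctSums, tableSize);
}
```

With keys of at most 8 characters and an assumed table of 10007 entries, at most 1017 slots can ever be used. The sum also ignores character order, so "ab" and "ba" always collide.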
Another Simple Hash Function
    // Uses only the first three characters; assumes Key has at least three characters
    unsigned int Hash(const string& Key) {
        return Key[0] + 27*Key[1] + 729*Key[2];
    }
• Problem: English does not use random strings, so the hash values are not uniformly distributed • Using more characters of the key can improve the hash function
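A related weakness of the three-character hash is easy to demonstrate: any two keys that share their first three characters collide, however much they differ afterwards. This is a small sketch; the name prefixHash and the sample words in the note are chosen only for illustration.

```cpp
#include <cassert>
#include <string>

// Three-character hash from the slide; only key[0], key[1], key[2] contribute.
unsigned int prefixHash(const std::string& key) {
    assert(key.size() >= 3);  // shorter keys would be indexed out of range
    return static_cast<unsigned char>(key[0])
         + 27u * static_cast<unsigned char>(key[1])
         + 729u * static_cast<unsigned char>(key[2]);
}
```

For instance, "hash", "hashing", and "hashtable" all receive the same value because they begin with "has", whereas "heap" receives a different one.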
A Better Hash Function
    // Polynomial hash with multiplier 37; assumes lowercase keys and a TableSize defined elsewhere
    unsigned int Hash(const string &Key) {
        unsigned int hash = 0;
        for (size_t j = 0; j != Key.size(); ++j)
            hash = 37*hash + (Key[j]-'a'+1);
        return hash%TableSize;
    }
• The for loop computes $\sum_{i=0}^{n} a_i \cdot 37^{n-i}$ using Horner's rule, where $a_i$ has the value 1 for 'a', 2 for 'b', etc. • For example, $a_3 + 37a_2 + 37^2 a_1 + 37^3 a_0 = 37(37(37a_0 + a_1) + a_2) + a_3$ • The for loop implicitly performs arithmetic modulo $2^k$, where k is the number of bits in an unsigned int
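The Horner's rule identity can be checked by evaluating the same polynomial two ways. The sketch below assumes lowercase ASCII keys and leaves out the final reduction by TableSize so that the raw values can be compared; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Letter value used on the slide: 'a' -> 1, 'b' -> 2, and so on.
unsigned int letterValue(char c) {
    assert(c >= 'a' && c <= 'z');  // this sketch assumes lowercase ASCII keys
    return static_cast<unsigned int>(c - 'a' + 1);
}

// Horner's rule: one multiplication and one addition per character.
unsigned int hornerHash(const std::string& key) {
    unsigned int hash = 0;
    for (std::size_t j = 0; j != key.size(); ++j) {
        hash = 37 * hash + letterValue(key[j]);
    }
    return hash;
}

// The same polynomial evaluated term by term, from the last character
// (multiplied by 37^0) back to the first (multiplied by the highest power).
unsigned int polynomialHash(const std::string& key) {
    unsigned int hash = 0;
    unsigned int power = 1;
    for (std::size_t j = key.size(); j-- > 0; ) {
        hash += letterValue(key[j]) * power;
        power *= 37;
    }
    return hash;
}
```

For the key "abcd" both functions give 37^3·1 + 37^2·2 + 37·3 + 4 = 53506. For longer keys the powers of 37 overflow, but unsigned arithmetic wraps around modulo 2^k, so the two results still agree.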
STL Hash Tables • STL extensions • hash_set • hash_map • The key type, hash function, and equality operator may need to be provided • Available in the new standard as unordered_set and unordered_map • <tr1/unordered_map> or <unordered_map> • <tr1/unordered_set> or <unordered_set> • Example: Lec24/hashmapex.cpp • Reference • www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1456.html
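As an illustration of supplying a hash function to the standard container, here is a minimal sketch of a name-to-id symbol table built on std::unordered_map from <unordered_map>. It is not the Lec24/hashmapex.cpp example referenced above. StringHash37 and SymbolTable are hypothetical names, and the functor hashes raw character codes rather than the 'a' = 1 letter values, so any string can serve as a key. The equality operator is left at the container's default.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical hash functor: multiplier-37 polynomial over raw character codes.
struct StringHash37 {
    std::size_t operator()(const std::string& key) const {
        std::size_t hash = 0;
        for (unsigned char c : key) {
            hash = 37 * hash + c;
        }
        return hash;  // the container maps this value to a bucket itself
    }
};

// Hypothetical symbol table: maps each distinct name to a small integer id.
class SymbolTable {
public:
    // Return the id of name, assigning the next free id the first time it is seen.
    int idOf(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end()) {
            return it->second;
        }
        int id = static_cast<int>(names_.size());
        ids_.emplace(name, id);
        names_.push_back(name);
        return id;
    }

    // Reverse lookup from an id back to its name.
    const std::string& nameOf(int id) const {
        assert(id >= 0 && static_cast<std::size_t>(id) < names_.size());
        return names_[static_cast<std::size_t>(id)];
    }

    std::size_t size() const { return names_.size(); }

private:
    std::unordered_map<std::string, int, StringHash37> ids_;
    std::vector<std::string> names_;
};
```

Looking up "count", then "total", then "count" again yields the ids 0, 1, 0: the second lookup of "count" finds the existing entry in expected constant time instead of creating a new one.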