Hashing, Hashing Tables Chapter 8
Introduction • Definition: • Key: a key is a field or composite of fields that uniquely identifies an entry in a table.
Example • Table of students in a course, sorted by name
--------------------------------------------------------------
Name                Year    Mark
--------------------------------------------------------------
Adams, Keith        3       94
Davis, Susan        1       75
Jordan, Ann         1       86
Patterson, Lynn     4       73
Williams, George    1       65
--------------------------------------------------------------
Hashing • The implementation of hash tables is called hashing. • Hashing is a technique for performing insertions and finds in constant average time. • Efficient removal of items is not required.
The General Idea • Array of some fixed size, containing items.
Keys and Hash Functions • Each key is mapped into some number in the range 0 to TableSize-1 and placed in the appropriate cell. • The mapping is called a hash function
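The mapping described on this slide can be sketched as follows. This is a minimal illustration, not the course's implementation: the table size of 11, the names hashKey, Table, insert, and contains, and the choice of key % TableSize as the hash function are all assumptions made for the example. Collisions are detected but not resolved.

```cpp
#include <cstddef>

// Hypothetical table size chosen for illustration.
const std::size_t TableSize = 11;

// Hypothetical hash function: maps any key into the range 0..TableSize-1.
std::size_t hashKey(unsigned int key) { return key % TableSize; }

// A fixed-size array of cells; each key is placed in the cell its hash selects.
struct Table {
    unsigned int cells[TableSize] = {};
    bool used[TableSize] = {};

    // Returns false when the cell already holds a different key
    // (a collision, which this sketch does not resolve).
    bool insert(unsigned int key) {
        std::size_t i = hashKey(key);
        if (used[i] && cells[i] != key) return false;
        cells[i] = key;
        used[i] = true;
        return true;
    }

    // A find is a single array access at the hashed index.
    bool contains(unsigned int key) const {
        std::size_t i = hashKey(key);
        return used[i] && cells[i] == key;
    }
};
```

With this sketch, keys 25 and 14 both map to cell 3, so inserting 14 after 25 is rejected; that is the collision the next slides try to avoid.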
Keys and Hash Functions • Characteristics of a good hash function • Avoids collisions • Spreads keys evenly in the array • Is easy to compute
Avoid Collisions • Ideal situation • Given a set of n <= M distinct keys {k1,k2,…,kn}, the set of hash values {h(k1),h(k2),…,h(kn)} contains no duplicates • In practice we can only try to reduce the likelihood of a collision, using knowledge about the keys • E.g. if we know the telephone numbers are all from the same district, the district number will be of little use in our hash function
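The telephone-number remark can be made concrete with a small sketch. Everything here is an assumption for illustration: seven-digit numbers whose first three digits are the district, a table size of 101, and the helper names hashByDistrict, hashBySubscriber, and distinctValues.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical table size.
const unsigned int M = 101;

// Uses only the leading three digits (the district) of a seven-digit number.
unsigned int hashByDistrict(unsigned int phone) { return (phone / 10000) % M; }

// Uses only the trailing four digits, which vary between subscribers.
unsigned int hashBySubscriber(unsigned int phone) { return (phone % 10000) % M; }

// Counts how many different hash values a set of keys produces under h.
template <typename Hash>
std::size_t distinctValues(const std::vector<unsigned int>& keys, Hash h) {
    std::set<unsigned int> seen;
    for (unsigned int k : keys) seen.insert(h(k));
    return seen.size();
}
```

For the four numbers 5551234, 5555679, 5559013, and 5553460, hashing on the district yields a single hash value (every key collides), while hashing on the subscriber digits yields four distinct values.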
Spreading Keys Evenly • We need to know the distribution of the keys • An equal number of keys should map into each array position
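One way to check how evenly a hash function spreads a given set of keys is to count the keys landing in each array position. The sketch below assumes the hash function x mod M purely for illustration; bucketCounts is a hypothetical helper name.

```cpp
#include <cstddef>
#include <vector>

// Counts how many keys map to each of the M array positions
// under the illustrative hash function h(x) = x mod M.
std::vector<std::size_t> bucketCounts(const std::vector<unsigned int>& keys,
                                      unsigned int M) {
    std::vector<std::size_t> counts(M, 0);
    for (unsigned int k : keys) ++counts[k % M];
    return counts;
}
```

With M = 10, the keys 0..99 put exactly 10 keys in every position, whereas the keys 0, 10, 20, ..., 990 all land in position 0. The same function behaves very differently depending on the distribution of the keys.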
Ease of Computation • The running time of the hash function should be O(1) (Jumping immediately to the desired record is a direct access approach, much like direct access of data on a disk)
Hashing Methods • We deal with integer keys first: the key set K is the set of integers Z • The value of the hash function falls between 0 and M-1
Division Method • The simplest method of hashing an integer • The hash function is h(x) = x mod M
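Choice of M • In principle, any M produces hash values in the range 0 to M-1 • We often choose M to be a prime number, so that regular patterns in the keys (e.g. all keys even) do not map to only a fraction of the array positions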
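The effect of the choice of M can be sketched by counting how many array positions are actually used when the keys share a common factor. The helper name usedSlots and the key pattern (multiples of a fixed step) are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <set>

// Counts the distinct array positions hit by the n keys
// 0, step, 2*step, ... under the division method h(x) = x mod M.
std::size_t usedSlots(unsigned int step, unsigned int n, unsigned int M) {
    std::set<unsigned int> slots;
    for (unsigned int i = 0; i < n; ++i) slots.insert((i * step) % M);
    return slots.size();
}
```

For 64 keys that are multiples of 4, M = 16 uses only 4 of its 16 positions, while the prime M = 17 uses all 17.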
Implementation
unsigned int const M = 1031; // a prime
unsigned int h(unsigned int x)
{ return x % M; } // result is always in the range 0..M-1
Middle Square Method • Avoids division • Makes use of the fact that a computer does finite-precision integer arithmetic • All arithmetic is done modulo W, where W = 2^w and w is the word size of the computer • The table size is a power of two: M = 2^k • Meaning: • Multiply x by itself, then shift the result right by w - k bits, which keeps the top k bits of the w-bit product.
Implementation
#include <climits> // CHAR_BIT
unsigned int const k = 10; // M == 1024
unsigned int const w = sizeof(unsigned int) * CHAR_BIT; // word size in bits
unsigned int h(unsigned int x)
{ return (x * x) >> (w - k); } // x * x wraps modulo 2^w; keep its top k bits
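A worked sketch of the middle-square computation, assuming a 32-bit word (w = 32) and k = 10. The name middleSquare and the use of fixed-width types are choices made for this example so that the wrap-around modulo 2^32 is explicit.

```cpp
#include <cstdint>

const unsigned int W_BITS = 32;  // assumed word size w
const unsigned int K_BITS = 10;  // table size M = 2^10 = 1024

// Middle-square hash: square the key, reduce modulo 2^32,
// then keep the top 10 bits of the 32-bit result.
std::uint32_t middleSquare(std::uint32_t x) {
    std::uint64_t wide = static_cast<std::uint64_t>(x) * x;  // exact square
    std::uint32_t low = static_cast<std::uint32_t>(wide);    // square mod 2^32
    return low >> (W_BITS - K_BITS);
}
```

For x = 50000, the square 2500000000 fits in 32 bits and shifting right by 22 gives 596. For x = 65536, the square is exactly 2^32, which wraps to 0. Note that under these parameters every key below 2048 hashes to 0, because its square is smaller than 2^22.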
Multiplication Method • We multiply the key by a constant a, then shift the product right by w - k bits, just as in the middle square method
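Implementation
#include <climits> // CHAR_BIT
unsigned int const k = 10; // M == 1024
unsigned int const w = sizeof(unsigned int) * CHAR_BIT; // word size in bits
unsigned int const a = 2654435769U; // about 2^32 / golden ratio; assumes w == 32
unsigned int h(unsigned int x)
{ return (x * a) >> (w - k); } // x * a wraps modulo 2^w; keep its top k bits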
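A worked sketch of the multiplication method with the same constant, assuming a 32-bit word and k = 10. The function name multiplicativeHash and the fixed-width types are choices made for this example.

```cpp
#include <cstdint>

const unsigned int W_BITS = 32;          // assumed word size w
const unsigned int K_BITS = 10;          // table size M = 2^10 = 1024
const std::uint32_t A = 2654435769U;     // constant a from the slide

// Multiplication-method hash: multiply by A, reduce modulo 2^32,
// then keep the top 10 bits of the 32-bit result.
std::uint32_t multiplicativeHash(std::uint32_t x) {
    std::uint64_t wide = static_cast<std::uint64_t>(x) * A;  // exact product
    std::uint32_t low = static_cast<std::uint32_t>(wide);    // product mod 2^32
    return low >> (W_BITS - K_BITS);
}
```

Under these assumptions the consecutive keys 1, 2, and 3 hash to 632, 241, and 874, so neighbouring keys are scattered across the table rather than clustered, unlike the middle-square sketch where all small keys map to 0.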