Hashing • Hashing is another method for storing and searching data. • Hashing makes it easier to add and remove elements from a data structure. • The worst-case behavior for locating a key is linear: Θ(n). • Java’s standard hash table class is java.util.Hashtable.
Hashing • Hashing is usually implemented with a data structure called a hash table. • A hash table is an effective data structure for adding, removing, and locating elements by key. • A hash table is a generalization of an array. • A hash table requires a key to access data.
Hashing • A hash table uses an array whose length is proportional to the number of keys actually stored. • The array index is computed from the key, rather than using the key itself as the array index. • The key is a unique identifying value.
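As a rough illustration of key-based access, the sketch below stores and retrieves values in a java.util.Hashtable. The class name HashtableBasics and the ID/name data are made up for this example.

```java
import java.util.Hashtable;

class HashtableBasics {
    // Builds a small table that maps hypothetical student IDs (the keys) to names.
    static Hashtable<Integer, String> buildTable() {
        Hashtable<Integer, String> table = new Hashtable<>();
        table.put(1001, "Ada");
        table.put(1002, "Grace");
        table.put(1003, "Edsger");
        return table;
    }

    public static void main(String[] args) {
        Hashtable<Integer, String> table = buildTable();
        System.out.println(table.get(1002));         // look up a value by its key
        table.remove(1001);                          // remove an entry by its key
        System.out.println(table.containsKey(1001)); // the key is no longer present
    }
}
```

The caller never sees an array index; the table derives it from the key internally.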
Hashing Functions • Hashing requires the use of a hashing function. • The purpose of the hashing function is to compute the storage slot from the key. • It maps key values to array indices. • This calculation reduces the range of array indices that need to be handled.
Hashing Functions • If a hashing function groups key values together, this is called clustering of the keys. • A good hashing function distributes the key values uniformly through the array’s index range. • Any hashing function that results in clustering should be changed. • A good hashing function has an equal likelihood of hashing a key into any of the slots. • java.util.Hashtable obtains each key’s hash value by calling the key’s hashCode method.
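One informal way to check for clustering is to count how many keys land in each slot. The sketch below does this with each key's hashCode value; the class and method names are hypothetical, and Math.floorMod is used here only to keep the index non-negative.

```java
class HashCodeDemo {
    // Counts how many of the given keys fall into each of `slots` table slots.
    static int[] slotCounts(String[] keys, int slots) {
        int[] counts = new int[slots];
        for (String key : keys) {
            // floorMod keeps the index in 0..slots-1 even for negative hash codes
            counts[Math.floorMod(key.hashCode(), slots)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] keys = new String[100];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = "key" + i;
        }
        // A roughly even spread of counts suggests little clustering.
        for (int count : slotCounts(keys, 7)) {
            System.out.print(count + " ");
        }
        System.out.println();
    }
}
```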
Hashing Functions • The division hash function depends upon the remainder of division. • Math.abs(H(k)) % table.length • When using the division hash function, it is best to have a table size that is a prime number of the form 4n + 3. • Using the division hash function can result in many collisions.
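A minimal sketch of the division hash function follows. It takes H(k) to be the key's hashCode value, and the table size 811 is just an example of a prime of the form 4n + 3. Note one deviation from the slide's formula: Math.abs(Integer.MIN_VALUE) is still negative in Java, so this sketch takes the remainder first and the absolute value second, which gives the same result for every other hash value.

```java
class DivisionHash {
    // Division hash: the slot is the remainder of the hash value divided by the table length.
    static int hash(Object key, int tableLength) {
        // The remainder lies strictly between -tableLength and tableLength,
        // so Math.abs is safe here even when hashCode() is Integer.MIN_VALUE.
        return Math.abs(key.hashCode() % tableLength);
    }

    public static void main(String[] args) {
        int tableLength = 811; // example size: prime, and 811 = 4 * 202 + 3
        System.out.println(hash("abc", tableLength));
        System.out.println(hash(Integer.MIN_VALUE, tableLength));
    }
}
```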
Hashing Functions • The mid-square hash function converts the key to an integer, then squares it (multiplies it by itself). The function returns the middle digits of the result. • The multiplicative hash function converts the key to an integer and multiplies it by a constant less than one. The function returns the first few digits of the fractional part of the result.
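Both functions can be sketched in a few lines. The version below assumes integer keys, squares the key for mid-square, and uses the constant 0.6180339887 for the multiplicative function; that constant is a common textbook choice and is an assumption here, as are the class and method names. In practice the returned digits would still be reduced to a valid table index.

```java
class OtherHashFunctions {
    // Assumed multiplier between 0 and 1 (approximately (sqrt(5) - 1) / 2).
    static final double A = 0.6180339887;

    // Mid-square: square the key and return its middle `digits` decimal digits.
    // Assumes 1 <= digits <= 9 so that the result fits in an int.
    static int midSquare(int key, int digits) {
        long square = (long) key * key;
        String text = Long.toString(square);
        if (text.length() <= digits) {
            return (int) square; // too short to trim, use the whole square
        }
        int start = (text.length() - digits) / 2;
        return Integer.parseInt(text.substring(start, start + digits));
    }

    // Multiplicative: multiply by A and return the first `digits` digits of the fractional part.
    static int multiplicative(int key, int digits) {
        double product = Math.abs((double) key) * A;
        double fraction = product - Math.floor(product);
        return (int) (fraction * Math.pow(10, digits));
    }

    public static void main(String[] args) {
        System.out.println(midSquare(4567, 4));      // 4567 * 4567 = 20857489, middle digits 8574
        System.out.println(multiplicative(4567, 3));
    }
}
```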
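Example Table • [Figure: the hash function H maps the actual keys K (k1 through k5), a subset of the universe of keys U, to slots 0 through m - 1 of the table. The slots H(k1), H(k2), H(k3), and H(k4) are marked.]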
Collisions • A collision occurs when the hashing function calculates the same array index for two different objects and one is already stored into the array index location. • Two keys hash to the same slot.
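Collisions are easy to reproduce. In the sketch below (class name and table size chosen only for illustration), the strings "Aa" and "BB" have the same hashCode value, and the integers 3 and 14 have different hash values that still reduce to the same slot in a table of length 11.

```java
class CollisionDemo {
    // Division-style slot computation, used here only to show collisions.
    static int slot(Object key, int tableLength) {
        return Math.abs(key.hashCode() % tableLength);
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode());                  // 2112
        System.out.println("BB".hashCode());                  // 2112
        System.out.println(slot("Aa", 11) == slot("BB", 11)); // true: same hash value
        System.out.println(slot(3, 11) == slot(14, 11));      // true: 3 % 11 == 14 % 11
    }
}
```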
Collision Example Table • [Figure: the same table as in the previous example, with slots 0 through m - 1, except that keys k2 and k5 now hash to the same slot: H(k2) = H(k5).]
Open Addressing • Open addressing ensures that all elements are stored directly into the hash table. • Every table slot contains either data or null. • The problem is that the table can fill up. • The good thing is that there are no external storage locations for the table elements.
Open Addressing • Open addressing attempts to resolve collisions using various methods.
Linear Probing • Linear probing resolves collisions by examining the next slot in the table. • If this slot is open, the data is stored in the slot. • If this slot is not open, the algorithm looks at the next slot (index), wrapping around at the end of the table, until an open slot is found.
Linear Probing • It is difficult to delete items from a hash table that uses open addressing. • A vacated slot cannot simply be set to null, because a later search would stop at that slot and miss items stored beyond it. Instead, place a special Deleted marker into the vacated slot. • If H’(k) is the ordinary hash function, the linear probing hash function is: • H(k, i) = (H’(k) + i) % m where i = 0, 1, 2, … , m - 1 and m is the number of elements that can be stored in the table.
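The following is a minimal sketch of an open-addressing table with linear probing, assuming the probe rule H(k, i) = (H'(k) + i) % m and a sentinel object standing in for the Deleted marker. The class and method names are hypothetical, and the table stores keys only to keep the example short.

```java
class LinearProbingTable {
    private static final Object DELETED = new Object(); // marker left behind by remove()
    private final Object[] slots;                       // null means "never used"

    LinearProbingTable(int capacity) {
        slots = new Object[capacity];
    }

    // H(k, i) = (H'(k) + i) % m, with H'(k) taken from hashCode().
    private int probe(Object key, int i) {
        int home = Math.abs(key.hashCode() % slots.length);
        return (home + i) % slots.length;
    }

    // Returns the index holding the key, or -1 if the key is absent.
    private int find(Object key) {
        for (int i = 0; i < slots.length; i++) {
            int index = probe(key, i);
            if (slots[index] == null) {
                return -1; // a never-used slot ends the search
            }
            if (slots[index] != DELETED && slots[index].equals(key)) {
                return index;
            }
        }
        return -1;
    }

    boolean contains(Object key) {
        return find(key) >= 0;
    }

    // Returns false only when the table is full.
    boolean add(Object key) {
        if (contains(key)) {
            return true;
        }
        for (int i = 0; i < slots.length; i++) {
            int index = probe(key, i);
            if (slots[index] == null || slots[index] == DELETED) {
                slots[index] = key; // reuse the first open or deleted slot
                return true;
            }
        }
        return false;
    }

    boolean remove(Object key) {
        int index = find(key);
        if (index < 0) {
            return false;
        }
        slots[index] = DELETED; // not null, so later searches keep probing past it
        return true;
    }
}
```

With a table of length 7, the integer keys 3, 10, and 17 all start at slot 3. After removing 10, a search for 17 still succeeds because it probes past the marker instead of stopping at it.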
Linear Probing • A problem associated with linear probing is called primary clustering. • Primary clustering occurs when many items hash into the same slot and long runs of slots are filled up. • This results in increased search times.
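Linear Probing Table • [Figure: the table from the collision example, with slots 0 through m - 1 and H(k2) = H(k5). Key k5 is stored in a nearby slot, labeled H(k5), found by probing forward from the slot it shares with k2.]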
Double Hashing • Double hashing is one of the best methods for dealing with collisions. • The slot location is calculated based upon the hash function (H1(k)). If the slot is full, then a second hash function is calculated and combined with the first hash function (H(k, i)) to determine a new slot.
Double Hashing • Assume that: • H1(k) = Math.abs(H(k)) % table.length • H2(k) = 1 + Math.abs(H(k)) % (table.length - x) where x is a small value: 1, 2, or 3. • Then: • H(k, i) = (H1(k) + i * H2(k)) % m where m = table.length and i = 0, 1, 2, …
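The probe sequence defined by these formulas can be sketched as follows. The class and parameter names are hypothetical; h stands for the raw hash value H(k), m for table.length, and the absolute value is taken after the remainder so that Integer.MIN_VALUE is handled safely.

```java
class DoubleHashing {
    // H1(k): the starting slot.
    static int hash1(int h, int m) {
        return Math.abs(h % m);
    }

    // H2(k): the step size, always between 1 and m - x.
    static int hash2(int h, int m, int x) {
        return 1 + Math.abs(h % (m - x));
    }

    // H(k, i) = (H1(k) + i * H2(k)) % m
    static int probe(int h, int m, int x, int i) {
        long position = hash1(h, m) + (long) i * hash2(h, m, x);
        return (int) (position % m);
    }

    public static void main(String[] args) {
        // With h = 47, m = 11, x = 2: H1 = 3 and H2 = 3, so the probes are 3, 6, 9, 1, ...
        for (int i = 0; i < 11; i++) {
            System.out.print(probe(47, 11, 2, i) + " ");
        }
        System.out.println();
    }
}
```

Because the table length in this example is prime, the step size shares no factor with it, and the sequence visits every slot once before repeating.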
Double Hashing Table • [Figure: the table from the collision example, with slots 0 through m - 1 and H(k2) = H(k5). Key k5 is stored in a separate slot, labeled H(k5), away from the slot it shares with k2.]
External Chaining • In external chaining the hash table contains an array in which each component can hold more than one element of the hash table. • Essentially, a two-dimensional array or a linked list of elements can exist for each table slot. • The typical implementation is that each slot contains a linked list.
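A minimal sketch of the linked-list version is shown below. It stores String keys only, and the class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedTable {
    private final List<LinkedList<String>> chains = new ArrayList<>();
    private int size; // number of keys stored across all chains

    ChainedTable(int capacity) {
        for (int i = 0; i < capacity; i++) {
            chains.add(new LinkedList<>()); // one (initially empty) list per slot
        }
    }

    private LinkedList<String> chainFor(String key) {
        return chains.get(Math.abs(key.hashCode() % chains.size()));
    }

    // Adds the key to its slot's list; returns false if it was already present.
    boolean add(String key) {
        LinkedList<String> chain = chainFor(key);
        if (chain.contains(key)) {
            return false;
        }
        chain.add(key);
        size++;
        return true;
    }

    boolean contains(String key) {
        return chainFor(key).contains(key);
    }

    boolean remove(String key) {
        if (chainFor(key).remove(key)) {
            size--;
            return true;
        }
        return false;
    }

    int size() {
        return size;
    }
}
```

Colliding keys simply share a list, so the table never fills up and deletion needs no special marker.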
External Chaining Table • [Figure: the table from the collision example, with slots 0 through m - 1. Keys k2 and k5 are both kept at the slot they hash to, shown as H(k2) followed by H(k5).]
Load Factor • The load factor is a fraction that represents the number of elements stored in the table divided by the size of the table’s array. • α = (the number of elements stored in the table) / (the size of the table’s array)
Load Factor • If open addressing is used, then each table slot holds at most one element, so the load factor can never be greater than 1. • If external chaining is used, then each table slot can hold many elements, so the load factor may be greater than 1.
Hashing Analysis • The worst-case analysis for hashing is the case where every key is hashed into the same slot. • Θ(n): linear time. • The average time can be much faster.
Average Search Analysis • In the formulas below, α is the load factor. • Searching with linear probing. • For a table that is not near full: • ½ (1 + 1 / (1 - α)) • For a table that is full or near full: • sqrt(π n / 8) • Searching with double hashing. • (-ln(1 - α)) / α where ln is the natural logarithm • Searching with chained hashing. • 1 + (α / 2) • See Figure 11.6 in Main, page 561.
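To get a feel for these estimates, the sketch below evaluates the three load-factor formulas for a few sample values. The class and method names are hypothetical, and alpha in the code is the load factor.

```java
class SearchEstimates {
    // Linear probing, table not near full: 1/2 * (1 + 1 / (1 - alpha))
    static double linearProbing(double alpha) {
        return 0.5 * (1 + 1 / (1 - alpha));
    }

    // Double hashing: -ln(1 - alpha) / alpha
    static double doubleHashing(double alpha) {
        return -Math.log(1 - alpha) / alpha;
    }

    // Chained hashing: 1 + alpha / 2
    static double chaining(double alpha) {
        return 1 + alpha / 2;
    }

    public static void main(String[] args) {
        double[] loadFactors = {0.5, 0.7, 0.9};
        for (double alpha : loadFactors) {
            System.out.printf("alpha=%.1f linear=%.2f double=%.2f chained=%.2f%n",
                    alpha, linearProbing(alpha), doubleHashing(alpha), chaining(alpha));
        }
    }
}
```

At a load factor of 0.5 the three estimates are 1.5, about 1.39, and 1.25 elements examined. The gap widens as the table fills, and linear probing degrades fastest.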
Coding Example • Search Times program that demonstrates linear search, binary search, and hashing. • The hashing uses the Hashtable class.
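The Search Times program itself is not reproduced in these slides. As a stand-in, the sketch below shows one way such a comparison might be set up; every name in it is hypothetical, and the timings it prints vary from run to run.

```java
import java.util.Arrays;
import java.util.Hashtable;

class SearchTimesSketch {
    // Linear search: examine each element in turn.
    static boolean linearSearch(int[] data, int target) {
        for (int value : data) {
            if (value == target) {
                return true;
            }
        }
        return false;
    }

    // Binary search: requires the array to be sorted in ascending order.
    static boolean binarySearch(int[] sortedData, int target) {
        return Arrays.binarySearch(sortedData, target) >= 0;
    }

    // Hashing: look the key up in a Hashtable.
    static boolean hashSearch(Hashtable<Integer, Integer> table, int target) {
        return table.containsKey(target);
    }

    public static void main(String[] args) {
        int n = 100_000;
        int[] data = new int[n];
        Hashtable<Integer, Integer> table = new Hashtable<>();
        for (int i = 0; i < n; i++) {
            data[i] = 2 * i; // even numbers, already sorted
            table.put(2 * i, i);
        }
        int target = 2 * (n - 1); // last element: the worst case for linear search

        long start = System.nanoTime();
        linearSearch(data, target);
        long afterLinear = System.nanoTime();
        binarySearch(data, target);
        long afterBinary = System.nanoTime();
        hashSearch(table, target);
        long afterHash = System.nanoTime();

        System.out.println("linear ns: " + (afterLinear - start));
        System.out.println("binary ns: " + (afterBinary - afterLinear));
        System.out.println("hash ns:   " + (afterHash - afterBinary));
    }
}
```

A single timed call like this is only a rough indication; repeating each search many times would give steadier numbers.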
Hashing • Java provides the Hashtable class, but it also provides two other hash-based classes. • The HashMap class uses a hash table to implement the Map interface (key-to-value lookup). • The HashSet class uses a hash table to implement the Set interface (a collection with no duplicate elements).
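A short sketch of both classes in use follows; the class name, method names, and word list are invented for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class MapAndSetDemo {
    // HashMap: associates each word (key) with the number of times it occurs (value).
    static Map<String, Integer> wordCounts(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    // HashSet: keeps each distinct word once.
    static Set<String> distinctWords(String[] words) {
        return new HashSet<>(Arrays.asList(words));
    }

    public static void main(String[] args) {
        String[] words = {"hash", "table", "hash", "key", "hash"};
        System.out.println(wordCounts(words).get("hash")); // 3
        System.out.println(distinctWords(words).size());   // 3
    }
}
```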