
Hashing

Hashing is another method for storing and searching data. Hashing makes it easier to add and remove elements from a data structure. The worst-case behavior for locating a key is linear: Θ(n). Java’s standard hash table class is java.util.Hashtable.


Presentation Transcript


  1. Hashing • Hashing is another method for storing and searching data. • Hashing makes it easier to add and remove elements from a data structure. • The worst-case behavior for locating a key is linear: Θ(n). • Java’s standard hash table class is java.util.Hashtable.

  2. Hashing • Hashing is usually implemented with a data structure called a hash table. • A hash table is an effective data structure for looking up, adding, and removing data by key. • A hash table is a generalization of an array. • A hash table requires a key to access data.

  3. Hashing • A hash table uses an array whose length is proportional to the number of keys actually stored. • The array index is computed from the key, rather than using the key itself as the array index. • The key is a unique identifying value.

  4. Hashing Functions • Hashing requires the use of a hashing function. • The purpose of the hashing function is to compute the storage slot from the key. • It maps key values to array indices. • This calculation reduces the range of array indices that need to be handled.

  5. Hashing Functions • If a hashing function groups key values together, this is called clustering of the keys. • A good hashing function distributes the key values uniformly through the array’s index range. • Any hashing function that results in clustering should be changed. • A good hashing function has an equal likelihood of hashing a key into any of the slots. • In Java, java.util.Hashtable obtains each key’s hash value by calling the key’s hashCode method.
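
To make clustering concrete, the sketch below counts how many keys land in each slot. It assumes plain int keys that are mapped to a slot by remainder; the class name SlotHistogram and the sample keys are made up for illustration.

```java
import java.util.Arrays;

final class SlotHistogram {
    // Counts how many of the given keys map to each slot of a table with m slots.
    static int[] slotCounts(int[] keys, int m) {
        int[] counts = new int[m];
        for (int key : keys) {
            counts[Math.floorMod(key, m)]++; // assumed mapping: remainder of key / m
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] keys = new int[11];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = 10 * i; // 0, 10, 20, ..., 100
        }
        System.out.println(Arrays.toString(slotCounts(keys, 10)));
        System.out.println(Arrays.toString(slotCounts(keys, 11)));
    }
}
```

With 10 slots all eleven keys cluster in slot 0; with 11 slots each key lands in a slot of its own.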

  6. Hashing Functions • The division hash function uses the remainder of integer division. • index = Math.abs(H(k)) % table.length • When using the division hash function, it is best to have a table size that is a prime number of the form 4n + 3. • Using the division hash function can result in many collisions.
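
A minimal sketch of the division hash function follows. The names DivisionHash and indexFor are illustrative, and the sketch swaps in Math.floorMod for Math.abs(...) % ..., because Math.abs(Integer.MIN_VALUE) is still negative and would produce a negative index.

```java
final class DivisionHash {
    // Maps a key to a slot in [0, tableLength) using the remainder of division.
    static int indexFor(Object key, int tableLength) {
        // floorMod returns a non-negative result for any positive divisor.
        return Math.floorMod(key.hashCode(), tableLength);
    }

    public static void main(String[] args) {
        int tableLength = 811; // prime, and 811 = 4 * 202 + 3
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> slot " + indexFor(key, tableLength));
        }
    }
}
```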

  7. Hashing Functions • The mid-square hash function converts the key to an integer, then squares it (multiplies it by itself). The function returns the middle digits of the result. • The multiplicative hash function converts the key to an integer and multiplies it by a constant less than one. The function returns the first few digits of the fractional part of the result.
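
Both functions can be sketched in a few lines. The digit counts here are assumptions chosen for illustration: the mid-square version keeps four decimal digits after dropping the lowest two (the true middle for a four-digit key), and the multiplicative version keeps the first three digits of the fraction. The class and method names are hypothetical.

```java
final class OtherHashes {
    // Mid-square: square the key, drop the two lowest decimal digits,
    // then keep the next four digits.
    static int midSquare(int key) {
        long squared = (long) key * key; // widen first so the square cannot overflow
        return (int) ((squared / 100) % 10_000);
    }

    // Multiplicative: multiply by a constant between 0 and 1, then keep the
    // first three digits of the fractional part.
    static int multiplicative(int key, double constant) {
        double product = Math.abs((double) key) * constant;
        double fraction = product - Math.floor(product); // in [0, 1)
        return (int) (fraction * 1000);
    }

    public static void main(String[] args) {
        System.out.println(midSquare(1234));                    // 1234 * 1234 = 1522756
        System.out.println(multiplicative(1234, 0.6180339887)); // constant is an assumed choice
    }
}
```

For the key 1234 the square is 1522756, so the mid-square result is 5227.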

  8. Example [Figure: the actual keys K (k1 through k5), a subset of the universe of keys U, are mapped by the hash function H to slots of a table indexed 0 through m - 1.]

  9. Collisions • A collision occurs when the hashing function calculates the same array index for two different objects, one of which is already stored at that index. • In other words, two keys hash to the same slot.
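
A small illustration of a collision: the Java strings "Aa" and "BB" have the same hashCode (2112), so they hash to the same slot for any table length. The class name CollisionDemo and the table length 11 are assumptions for the example.

```java
final class CollisionDemo {
    // Slot computed with the division hash function.
    static int slot(String key, int tableLength) {
        return Math.abs(key.hashCode()) % tableLength;
    }

    public static void main(String[] args) {
        int m = 11;
        System.out.println("Aa -> slot " + slot("Aa", m));
        System.out.println("BB -> slot " + slot("BB", m));
    }
}
```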

  10. Collision Example [Figure: the same mapping of keys k1 through k5 to a table indexed 0 through m - 1, with H(k2) = H(k5), so k2 and k5 collide in one slot.]

  11. Open Addressing • Open addressing ensures that all elements are stored directly in the hash table. • Every table slot contains either data or null. • The drawback is that the table can fill up. • The advantage is that there are no external storage locations for the table elements.

  12. Open Addressing • Open addressing attempts to resolve collisions using various methods.

  13. Linear Probing • Linear probing resolves a collision by checking the slot after the one the key hashed to. • If that slot is open, the data is stored there. • If that slot is not open, the algorithm checks each following slot (index) in turn until an open slot is found.

  14. Linear Probing • It is difficult to delete items from a hash table that uses open addressing. • A deleted slot cannot simply be set to null, because a later search would stop at that slot and miss keys stored beyond it. Instead, mark the slot with a special Deleted value. • If H’(k) is the ordinary hash function, the linear probing hash function is: • H(k, i) = (H’(k) + i) % m, where i = 0, 1, 2, …, m - 1 and m is the number of slots in the table.
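
The probe sequence (H’(k) + i) % m and the Deleted marker can be sketched as a small set of strings. This is an illustration under assumptions, not the class from the slides: LinearProbingSet, its method names, and the use of Math.floorMod for H’(k) are all choices made for the example.

```java
final class LinearProbingSet {
    // Marker left behind by remove(); compared by identity, never by equals().
    private static final String DELETED = new String("<deleted>");
    private final String[] slots;

    LinearProbingSet(int m) {
        slots = new String[m];
    }

    // H(k, i) = (H'(k) + i) % m
    private int probe(String key, int i) {
        int home = Math.floorMod(key.hashCode(), slots.length);
        return (home + i) % slots.length;
    }

    // Returns the slot holding key, or -1 if the key is absent.
    private int find(String key) {
        for (int i = 0; i < slots.length; i++) {
            int j = probe(key, i);
            if (slots[j] == null) return -1; // never-used slot ends the search
            if (slots[j] != DELETED && slots[j].equals(key)) return j;
        }
        return -1;
    }

    boolean contains(String key) {
        return find(key) >= 0;
    }

    // Returns false only when the key is absent and no slot is free.
    boolean add(String key) {
        if (find(key) >= 0) return true; // already present
        for (int i = 0; i < slots.length; i++) {
            int j = probe(key, i);
            if (slots[j] == null || slots[j] == DELETED) {
                slots[j] = key;
                return true;
            }
        }
        return false;
    }

    void remove(String key) {
        int j = find(key);
        if (j >= 0) slots[j] = DELETED; // keeps keys stored further along reachable
    }
}
```

If remove wrote null instead of DELETED, a search for a key stored later in the same run of slots would stop early and report the key as missing.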

  15. Linear Probing • A problem associated with linear probing is called primary clustering. • Primary clustering occurs when many items hash into the same slot and long runs of slots fill up. • This results in increased search times.

  16. Linear Probing [Figure: the same mapping of keys k1 through k5 to a table indexed 0 through m - 1, with H(k2) = H(k5); k5 is shown stored in a separate slot from k2.]

  17. Double Hashing • Double hashing is one of the best methods for dealing with collisions. • The slot location is calculated based upon the hash function (H1(k)). If the slot is full, then a second hash function is calculated and combined with the first hash function (H(k, i)) to determine a new slot.

  18. Double Hashing • Assume that: • H1(k) = Math.abs(H(k)) % table.length • H2(k) = 1 + Math.abs(H(k)) % (table.length - x), where x is a small value: 1, 2, or 3. • Then: • H(k, i) = (H1(k) + i * H2(k)) % m, where m = table.length and i = 0, 1, 2, … counts the probes.
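
The formulas above translate almost directly into Java. In this sketch the class name DoubleHashing, the table length 11, and x = 2 are assumptions for illustration; the hash value passed in is assumed not to be Integer.MIN_VALUE, for which Math.abs stays negative.

```java
final class DoubleHashing {
    // H1(k) = Math.abs(H(k)) % m
    static int h1(int hash, int m) {
        return Math.abs(hash) % m;
    }

    // H2(k) = 1 + Math.abs(H(k)) % (m - x); never 0, so every probe moves on.
    static int h2(int hash, int m, int x) {
        return 1 + Math.abs(hash) % (m - x);
    }

    // H(k, i) = (H1(k) + i * H2(k)) % m
    static int probe(int hash, int i, int m, int x) {
        return (h1(hash, m) + i * h2(hash, m, x)) % m;
    }

    public static void main(String[] args) {
        int hash = "Aa".hashCode(); // 2112
        int m = 11;
        int x = 2;
        for (int i = 0; i < m; i++) {
            System.out.print(probe(hash, i, m, x) + " ");
        }
        System.out.println();
    }
}
```

For this key the step size H2 is 7, and the probe sequence is 0 7 3 10 6 2 9 5 1 8 4. Because the table length is prime, the sequence visits every slot once before repeating.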

  19. Double Hashing [Figure: the same mapping of keys k1 through k5 to a table indexed 0 through m - 1, with H(k2) = H(k5); k5 is shown stored in a different slot near the top of the table.]

  20. External Chaining • In external chaining, the hash table contains an array in which each component can hold more than one element of the hash table. • Essentially, a multidimensional array or a linked list of elements can exist for each table slot. • The typical implementation is that each slot contains a linked list.
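
A minimal sketch of the typical implementation, with one linked list per slot. ChainedTable and its method names are hypothetical, and the slot is chosen with Math.floorMod as an assumed hash-to-index step.

```java
import java.util.LinkedList;

final class ChainedTable {
    private final LinkedList<String>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedTable(int size) {
        buckets = new LinkedList[size];
        for (int i = 0; i < size; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // The chain for the slot this key hashes to.
    private LinkedList<String> bucketFor(String key) {
        return buckets[Math.floorMod(key.hashCode(), buckets.length)];
    }

    void add(String key) {
        LinkedList<String> chain = bucketFor(key);
        if (!chain.contains(key)) {
            chain.add(key); // colliding keys simply share the chain
        }
    }

    boolean contains(String key) {
        return bucketFor(key).contains(key);
    }

    boolean remove(String key) {
        return bucketFor(key).remove(key); // no Deleted marker is needed
    }
}
```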

  21. External Chaining [Figure: the same mapping of keys k1 through k5 to a table indexed 0 through m - 1, with H(k2) and H(k5) shown together.]

  22. Load Factor • The load factor is a fraction that represents the number of elements stored in the table divided by the size of the table’s array. • α = (the number of elements stored in the table) / (the size of the table’s array)

  23. Load Factor • If open addressing is used, then each table slot holds at most one element, therefore, the load factor can never be greater than 1. • If external chaining is used, then each table slot can hold many elements, therefore, the load factor may be greater than 1.
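
As a quick worked example of the definition, with made-up element counts and a hypothetical helper named LoadFactor.of:

```java
final class LoadFactor {
    // Load factor = elements stored / size of the table's array.
    static double of(int elementCount, int tableSize) {
        return (double) elementCount / tableSize;
    }

    public static void main(String[] args) {
        System.out.println(of(6, 8));  // open addressing: at most 8 elements fit in 8 slots
        System.out.println(of(20, 8)); // chaining: 20 elements spread over 8 chains
    }
}
```

Six elements in eight slots give a load factor of 0.75; twenty elements chained across eight slots give 2.5.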

  24. Hashing Analysis • The worst-case analysis for hashing is the case where every key is hashed into the same slot. • Θ(n): linear time. • The average time can be much faster.

  25. Average Search Analysis • Searching with linear probing, for a table that is not near full: ½ (1 + 1 / (1 - α)) • Searching with linear probing, for a table that is full or near full: √(πn / 8) • Searching with double hashing: -ln(1 - α) / α, where ln is the natural logarithm • Searching with chained hashing: 1 + α / 2 • See Figure 11.6 in Main, page 561.
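
To compare the formulas at a given load factor, they can be evaluated directly. The sketch below is only a calculator for the expressions on this slide; the class and method names are hypothetical.

```java
final class SearchCost {
    // Linear probing, table not near full: 1/2 * (1 + 1 / (1 - alpha))
    static double linearProbing(double alpha) {
        return 0.5 * (1 + 1 / (1 - alpha));
    }

    // Linear probing, table full or near full: sqrt(pi * n / 8)
    static double linearProbingFull(int n) {
        return Math.sqrt(Math.PI * n / 8);
    }

    // Double hashing: -ln(1 - alpha) / alpha
    static double doubleHashing(double alpha) {
        return -Math.log(1 - alpha) / alpha;
    }

    // Chained hashing: 1 + alpha / 2
    static double chaining(double alpha) {
        return 1 + alpha / 2;
    }

    public static void main(String[] args) {
        double alpha = 0.5;
        System.out.println("linear probing: " + linearProbing(alpha));
        System.out.println("double hashing: " + doubleHashing(alpha));
        System.out.println("chaining:       " + chaining(alpha));
    }
}
```

At α = 0.5 the values are 1.5 for linear probing, about 1.39 for double hashing, and 1.25 for chaining.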

  26. Coding Example • The Search Times program demonstrates linear search, binary search, and hashing. • The hashing uses the Hashtable class.

  27. Hashing • Java provides the Hashtable class, but it also provides two other hash-based classes. • The HashMap class implements the Map interface using a hash table. • The HashSet class implements the Set interface using a hash table.
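
A short usage sketch of the three classes follows. The helper names countWords and distinctWords and the sample words are made up for illustration; the java.util calls themselves are standard.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Map;
import java.util.Set;

final class JavaHashClasses {
    // Works with any Map implementation, including Hashtable and HashMap.
    static Map<String, Integer> countWords(Map<String, Integer> counts, String... words) {
        for (String word : words) {
            counts.merge(word, 1, Integer::sum); // store 1, or add 1 to the existing count
        }
        return counts;
    }

    // A HashSet keeps a single copy of each element.
    static Set<String> distinctWords(String... words) {
        Set<String> seen = new HashSet<>();
        for (String word : words) {
            seen.add(word);
        }
        return seen;
    }

    public static void main(String[] args) {
        String[] words = {"to", "be", "or", "not", "to", "be"};
        System.out.println(countWords(new Hashtable<>(), words));
        System.out.println(countWords(new HashMap<>(), words));
        System.out.println(distinctWords(words));
    }
}
```

One practical difference: Hashtable is synchronized and rejects null keys and values, while HashMap is unsynchronized and permits them.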
