
Hashtables


Presentation Transcript


  1. Hashtables

  2. Hashtables • An abstract data type that supports the following operations: • Insert • Find • Remove • Search trees can be used for the same operations, but they require an order relation to be defined on the keys and take logarithmic time. • Hashtables do not require an order relation on the elements, and all operations take O(1) time on average.

  3. Direct Access Tables • Assume that the keys are distinct numbers in the range U = {1, 2, 3, …, m}. Use an array of size m and place the element with key k at index k of the array. • O(1) time for all operations • Problem: wasteful for small sets and impractical if m is very large
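A minimal sketch of a direct access table, assuming integer keys in the range 1..m and string values. The class and method names are illustrative and do not come from the slides.

```java
// Hypothetical direct access table for integer keys in the range 1..m.
class DirectAccessTable {
    private final String[] slots; // slots[k] holds the value stored under key k, or null

    DirectAccessTable(int m) {
        slots = new String[m + 1]; // index 0 stays unused so that key k maps to index k
    }

    void insert(int key, String value) { slots[key] = value; } // O(1)

    String find(int key) { return slots[key]; }                // O(1), null if absent

    void remove(int key) { slots[key] = null; }                // O(1)
}
```

The array always has m + 1 entries regardless of how many keys are stored, which is the waste the slide points out.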

  4. Hashtables • Main idea: instead of using the keys themselves as indices into the table, use a hash function to map keys to indices. • Note: U is the set of all possible keys, so it is usually much larger than the table size m.

  5. Simple Uniform Hashing • We assume a hash function that, given a key, hashes it into any slot with equal probability. • We will try to provide some reasonable hash functions later

  6. Hash functions • The hash function is responsible for mapping keys to integers (slot numbers). A good hash function must have the following properties: • 1. Easy to evaluate: h(x) is computed in O(1) • 2. Uniform distribution over all the table slots • 3. Similar keys are mapped to different slots

  7. Hash functions • The first step is to represent the key as a natural number. • For example, if S is a String, then we can interpret it as an integer value using the formula
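One common way to turn a string into a natural number is to read its characters as digits in a fixed radix. The sketch below assumes radix 128 and uses `BigInteger` to avoid overflow. The radix and the names are assumptions for illustration and are not necessarily the formula used on the slide.

```java
import java.math.BigInteger;

class StringKeys {
    // Interprets s as a number written in base 128, one character per digit
    // (assumes every character code is below 128).
    static BigInteger toNatural(String s) {
        BigInteger radix = BigInteger.valueOf(128);
        BigInteger value = BigInteger.ZERO;
        for (int i = 0; i < s.length(); i++) {
            // Horner's rule: shift the digits seen so far and add the next one.
            value = value.multiply(radix).add(BigInteger.valueOf(s.charAt(i)));
        }
        return value;
    }

    public static void main(String[] args) {
        // 'p' = 112 and 't' = 116, so "pt" maps to 112 * 128 + 116.
        System.out.println(toNatural("pt")); // prints 14452
    }
}
```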

  8. Collisions • Mapping keys to indices can cause collisions if two keys are mapped by the hash function to the same index • Solutions • Chaining • Open addressing

  9. Collision resolution - Chaining • All keys that have the same hash value are placed in a linked list • Insertion can be done at the beginning of the list in O(1) time • Searching is proportional to the length of the list

  10. Collision resolution by chaining • Let T be a hash table of 9 slots and h(k) = k mod 9. Insert the elements 6, 43, 23, 62, 1, 13, 34, 55, 25 • h(6) = 6 mod 9 = 6 • h(43) = 43 mod 9 = 7 • h(23) = 23 mod 9 = 5 • h(62) = 62 mod 9 = 8 • h(1) = 1 mod 9 = 1 • h(13) = 13 mod 9 = 4 • h(34) = 34 mod 9 = 7 • h(55) = 55 mod 9 = 1 • h(25) = 25 mod 9 = 7
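The chaining example can be replayed with a small sketch. It assumes integer keys, h(k) = k mod m, and insertion at the head of each list. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedHashTable {
    private final List<LinkedList<Integer>> slots = new ArrayList<>();

    ChainedHashTable(int m) {
        for (int i = 0; i < m; i++) slots.add(new LinkedList<>());
    }

    private int h(int k) { return Math.floorMod(k, slots.size()); }

    void insert(int k) { slots.get(h(k)).addFirst(k); }        // O(1): insert at the head

    boolean find(int k) { return slots.get(h(k)).contains(k); } // proportional to chain length

    boolean remove(int k) { return slots.get(h(k)).remove(Integer.valueOf(k)); }

    List<Integer> chain(int slot) { return slots.get(slot); }

    public static void main(String[] args) {
        ChainedHashTable table = new ChainedHashTable(9);
        for (int k : new int[] {6, 43, 23, 62, 1, 13, 34, 55, 25}) table.insert(k);
        System.out.println(table.chain(7)); // prints [25, 34, 43]
        System.out.println(table.chain(1)); // prints [55, 1]
    }
}
```

Keys 43, 34 and 25 all hash to slot 7, so they share one list, with the most recently inserted key first.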

  11. Analysis • The load factor α of a hashtable is defined as the number of elements stored in the table divided by the number of slots: α = n/m • A search takes Θ(1 + α) time on average under the assumption of simple uniform hashing

  12. Division method • An appropriate hash function for a hashtable that uses chaining is the division method: h(k) = k mod m • Powers of 10 and 2 should be avoided as values of m • Good values of m are primes not close to powers of 2

  13. Open Addressing • Each element occupies a single slot in the hashtable. No chaining is done • To insert an element, we probe the table according to the hash function until an empty slot is found. • The hash function is now a function of both the key and the number of attempts in the insertion process

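  14. Hash Insert •
      HashInsert(T, k) {
          for (int i = 0; i < m; i++) {
              int j = h(k, i);              // i-th slot in the probe sequence of k
              if (T[j] == null) {           // found an empty slot
                  T[j] = k;
                  return j;
              }
          }
          error "hashtable overflow";       // all m probes hit occupied slots
      }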

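  15. Hash Search •
      HashSearch(T, k) {
          for (int i = 0; i < m; i++) {
              int j = h(k, i);              // follow the same probe sequence as insertion
              if (T[j] == null) return NOT_FOUND;   // empty slot: k was never inserted
              else if (T[j] == k) return j;
          }
          return NOT_FOUND;                 // probed all m slots without finding k
      }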

  16. Linear probing • Linear probing takes an ordinary hash function h', such as one using the division method, and turns it into: h(k, i) = (h'(k) + i) mod m • If a slot is occupied, we try the subsequent slot, and so on; thus the initial slot determines the whole probing sequence for insertion and search.

  17. Linear Probing • Easy to implement but suffers from primary clustering. • An empty slot that follows a run of occupied slots is more likely to be filled next than any other empty slot, so long runs tend to grow longer.

  18. Linear Probing • Given a hash function h', the linear probing scheme is simply h(k, i) = (h'(k) + i) mod m, for i = 0, 1, …, m − 1

  19. Exercise • You are given a hash table with 11 slots. Demonstrate inserting the following elements using linear probing with h'(k) = k mod m, where m = 11 • 10, 22, 31, 4, 15, 28, 17, 88, 59

  20. Solution • h(10,0) = (10 mod 11 + 0) mod 11 = 10 • h(22,0) = (22 mod 11 + 0) mod 11 = 0 • h(31,0) = (31 mod 11 + 0) mod 11 = 9 • h(4,0) = (4 mod 11 + 0) mod 11 = 4 • h(15,0) = (15 mod 11 + 0) mod 11 = 4 • h(15,1) = (15 mod 11 + 1) mod 11 = 5 • h(28,0) = (28 mod 11 + 0) mod 11 = 6 • h(17,0) = (17 mod 11 + 0) mod 11 = 6 • h(17,1) = (17 mod 11 + 1) mod 11 = 7 • h(88,0) = (88 mod 11 + 0) mod 11 = 0 • h(88,1) = (88 mod 11 + 1) mod 11 = 1 • h(59,0) = (59 mod 11 + 0) mod 11 = 4 • h(59,1) = (59 mod 11 + 1) mod 11 = 5 • h(59,2) = (59 mod 11 + 2) mod 11 = 6 • h(59,3) = (59 mod 11 + 3) mod 11 = 7 • h(59,4) = (59 mod 11 + 4) mod 11 = 8
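The insertions in this solution can be checked mechanically. The sketch below assumes non-negative integer keys and h(k, i) = (k mod m + i) mod m. The class and method names are illustrative.

```java
import java.util.Arrays;

class LinearProbingDemo {
    // h(k, i) = (h'(k) + i) mod m with h'(k) = k mod m
    static int h(int k, int i, int m) { return (k % m + i) % m; }

    // Inserts k and returns the slot it landed in, or -1 if the table is full.
    static int insert(Integer[] table, int k) {
        int m = table.length;
        for (int i = 0; i < m; i++) {
            int j = h(k, i, m);
            if (table[j] == null) {
                table[j] = k;
                return j;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[11];
        for (int k : new int[] {10, 22, 31, 4, 15, 28, 17, 88, 59}) insert(table, k);
        // prints [22, 88, null, null, 4, 15, 28, 17, 59, 31, 10]
        System.out.println(Arrays.toString(table));
    }
}
```

Slots 4 through 8 end up forming one contiguous run, which is the primary clustering effect: key 59 needs five probes to get past it.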

  21. Quadratic Probing • Quadratic probing again starts from an initial hash function h', and the hash function is now h(k, i) = (h'(k) + c1·i + c2·i²) mod m, where c1 and c2 are constants and c2 ≠ 0 • Choosing a subsequent slot once a slot is full depends on the probe number i. • Quadratic probing involves a secondary form of clustering, since the initial probe alone determines the entire probing sequence.

  22. Quadratic Probing • Given a hash function h', quadratic probing is done by: h(k, i) = (h'(k) + c1·i + c2·i²) mod m

  23. Example • You are given a hash table with 11 slots. Demonstrate inserting the following elements using quadratic probing and the hash function h(k, i) = (k mod 11 + i + 3i²) mod 11 • 10, 22, 31, 4, 15, 28, 17, 88, 59

  24. Solution • h(10,0) = (10 mod 11 + 0) mod 11 = 10 • h(22,0) = (22 mod 11 + 0) mod 11 = 0 • h(31,0) = (31 mod 11 + 0) mod 11 = 9 • h(4,0) = (4 mod 11 + 0) mod 11 = 4 • h(15,0) = (15 mod 11 + 0) mod 11 = 4 • h(15,1) = (15 mod 11 + 1 + 3) mod 11 = 8 • h(28,0) = (28 mod 11 + 0) mod 11 = 6 • h(17,0) = (17 mod 11 + 0) mod 11 = 6 • h(17,1) = (17 mod 11 + 1 + 3) mod 11 = 10 • h(17,2) = (17 mod 11 + 2 + 12) mod 11 = 9 • h(17,3) = (17 mod 11 + 3 + 27) mod 11 = 3 • h(88,0) = (88 mod 11 + 0) mod 11 = 0 • h(88,1) = (88 mod 11 + 1 + 3) mod 11 = 4 • h(88,2) = (88 mod 11 + 2 + 12) mod 11 = 3 • h(88,3) = (88 mod 11 + 3 + 27) mod 11 = 8 • h(88,4) = (88 mod 11 + 4 + 48) mod 11 = 8 • h(88,5) = (88 mod 11 + 5 + 75) mod 11 = 3 • h(88,6) = (88 mod 11 + 6 + 108) mod 11 = 4 • h(88,7) = (88 mod 11 + 7 + 147) mod 11 = 0 • h(88,8) = (88 mod 11 + 8 + 192) mod 11 = 2 • h(59,0) = (59 mod 11 + 0) mod 11 = 4 • h(59,1) = (59 mod 11 + 1 + 3) mod 11 = 8 • h(59,2) = (59 mod 11 + 2 + 12) mod 11 = 7
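The same insertions can be replayed in code. The sketch below assumes non-negative integer keys and the probe function h(k, i) = (k mod m + i + 3i²) mod m implied by the worked values (offsets 1 + 3, 2 + 12, 3 + 27). The names are illustrative.

```java
import java.util.Arrays;

class QuadraticProbingDemo {
    // h(k, i) = (h'(k) + c1*i + c2*i^2) mod m with h'(k) = k mod m, c1 = 1, c2 = 3
    static int h(int k, int i, int m) { return (k % m + i + 3 * i * i) % m; }

    // The first m slots probed for key k.
    static int[] probeSequence(int k, int m) {
        int[] seq = new int[m];
        for (int i = 0; i < m; i++) seq[i] = h(k, i, m);
        return seq;
    }

    // Inserts k and returns the slot it landed in, or -1 if m probes all failed.
    static int insert(Integer[] table, int k) {
        int m = table.length;
        for (int i = 0; i < m; i++) {
            int j = h(k, i, m);
            if (table[j] == null) {
                table[j] = k;
                return j;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[11];
        for (int k : new int[] {10, 22, 31, 4, 15, 28, 17, 88, 59}) insert(table, k);
        // prints [22, null, 88, 17, 4, null, 28, 59, 15, 31, 10]
        System.out.println(Arrays.toString(table));
        // prints [0, 4, 3, 8, 8, 3, 4, 0, 2, 10, 2]
        System.out.println(Arrays.toString(probeSequence(88, 11)));
    }
}
```

The probe sequence for 88 revisits slots 8, 3, 4 and 0 before reaching slot 2, so with these constants a quadratic sequence does not necessarily visit every slot of the table.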

  25. Double Hashing • Given two hash functions h1 and h2, probe with h(k, i) = (h1(k) + i·h2(k)) mod m • Problem: h2(k) and m should not have any common divisors greater than 1.

  26. Double Hashing • Example 1: select m to be a power of 2, and design h2 to produce only odd numbers. • Example 2: select m to be prime and m' to be m − 1, for example with h1(k) = k mod m and h2(k) = 1 + (k mod m').
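A sketch of the second example, assuming non-negative integer keys, h1(k) = k mod m and h2(k) = 1 + (k mod (m − 1)) with m prime. These particular functions are a common textbook choice and an assumption here; the names are illustrative.

```java
import java.util.Arrays;

class DoubleHashingDemo {
    // h(k, i) = (h1(k) + i * h2(k)) mod m
    static int h(int k, int i, int m) {
        int h1 = k % m;
        int h2 = 1 + k % (m - 1); // always in 1..m-1, so it shares no divisor with a prime m
        return (h1 + i * h2) % m;
    }

    // The first m slots probed for key k.
    static int[] probeSequence(int k, int m) {
        int[] seq = new int[m];
        for (int i = 0; i < m; i++) seq[i] = h(k, i, m);
        return seq;
    }

    public static void main(String[] args) {
        // For k = 14 and m = 13: h1 = 1 and h2 = 3.
        // prints [1, 4, 7, 10, 0, 3, 6, 9, 12, 2, 5, 8, 11]
        System.out.println(Arrays.toString(probeSequence(14, 13)));
    }
}
```

Because the step h2(k) and m have no common divisor, the sequence visits all 13 slots exactly once. Two keys with the same first slot usually get different steps, so they follow different sequences.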

  27. Analysis • In open addressing the load factor α cannot be more than 1. • Insertion and unsuccessful search require at most 1/(1 − α) probes on average, assuming uniform hashing and α < 1 • A successful search takes at most (1/α) ln(1/(1 − α)) probes on average

  28. Analysis • When the table is 50% full, a successful search requires fewer than 1.387 probes on average • When the table is 90% full, a successful search requires fewer than 2.559 probes on average
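A small sketch for evaluating the standard open addressing bounds under the uniform hashing assumption: 1/(1 − α) expected probes for an unsuccessful search or insertion, and (1/α) ln(1/(1 − α)) for a successful search. The class and method names are illustrative.

```java
class ProbeBounds {
    // Upper bound on expected probes for an unsuccessful search or an insertion.
    static double unsuccessful(double alpha) { return 1.0 / (1.0 - alpha); }

    // Upper bound on expected probes for a successful search.
    static double successful(double alpha) { return Math.log(1.0 / (1.0 - alpha)) / alpha; }

    public static void main(String[] args) {
        for (double alpha : new double[] {0.5, 0.9}) {
            System.out.printf("alpha=%.1f unsuccessful=%.3f successful=%.3f%n",
                    alpha, unsuccessful(alpha), successful(alpha));
        }
        // alpha = 0.5: about 2 and 1.386; alpha = 0.9: about 10 and 2.558
    }
}
```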

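  29. Problems with open addressing • If an element is deleted, we cannot simply empty its slot, since later searches for keys that probed past that slot would fail. Rehashing the remaining elements on every deletion would ruin the running time. • Solution: mark the slot with a special DELETED value. Searches continue past DELETED slots, and insertions may reuse them.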
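A minimal sketch of deletion with a DELETED marker, assuming linear probing, distinct non-negative integer keys, and a separate per-slot state array in place of a special key value. The names are illustrative.

```java
import java.util.Arrays;

class TombstoneTable {
    private enum State { EMPTY, OCCUPIED, DELETED }

    private final int[] keys;
    private final State[] states;

    TombstoneTable(int m) {
        keys = new int[m];
        states = new State[m];
        Arrays.fill(states, State.EMPTY);
    }

    private int h(int k, int i) { return (k % keys.length + i) % keys.length; }

    // Assumes k is not already in the table.
    boolean insert(int k) {
        for (int i = 0; i < keys.length; i++) {
            int j = h(k, i);
            if (states[j] != State.OCCUPIED) { // EMPTY and DELETED slots can both be reused
                keys[j] = k;
                states[j] = State.OCCUPIED;
                return true;
            }
        }
        return false; // table overflow
    }

    // Returns the slot holding k, or -1 if k is not in the table.
    int search(int k) {
        for (int i = 0; i < keys.length; i++) {
            int j = h(k, i);
            if (states[j] == State.EMPTY) return -1; // a never-used slot ends the sequence
            if (states[j] == State.OCCUPIED && keys[j] == k) return j;
            // DELETED slot: keep probing, the key may lie further along
        }
        return -1;
    }

    boolean remove(int k) {
        int j = search(k);
        if (j < 0) return false;
        states[j] = State.DELETED; // leave a marker instead of emptying the slot
        return true;
    }
}
```

With 11 slots, inserting 4, 15 and 59 fills slots 4, 5 and 6. After 15 is removed, a search for 59 still succeeds because probing continues past the marker in slot 5.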

  30. Rehashing • If we do not know the number of elements in advance, we use a technique similar to the one used in vectors: once the load factor reaches some predefined threshold, rehash the data into a larger hashtable.
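A sketch of rehashing on a chained table of integers. The initial size of 4, the threshold of 0.75 and the doubling policy are assumptions chosen for illustration, as are the names.

```java
import java.util.ArrayList;
import java.util.List;

class ResizingHashSet {
    private static final double MAX_LOAD = 0.75; // assumed threshold
    private List<List<Integer>> slots = newSlots(4);
    private int size = 0;

    private static List<List<Integer>> newSlots(int m) {
        List<List<Integer>> s = new ArrayList<>(m);
        for (int i = 0; i < m; i++) s.add(new ArrayList<>());
        return s;
    }

    private static int h(int k, int m) { return Math.floorMod(k, m); }

    boolean contains(int k) { return slots.get(h(k, slots.size())).contains(k); }

    void insert(int k) {
        if (contains(k)) return;
        // Grow before the insertion would push the load factor over the threshold.
        if ((double) (size + 1) / slots.size() > MAX_LOAD) rehash(2 * slots.size());
        slots.get(h(k, slots.size())).add(k);
        size++;
    }

    // Every key must be re-inserted, because h depends on the number of slots.
    private void rehash(int newM) {
        List<List<Integer>> old = slots;
        slots = newSlots(newM);
        for (List<Integer> chain : old) {
            for (int k : chain) slots.get(h(k, newM)).add(k);
        }
    }

    int capacity() { return slots.size(); }

    int size() { return size; }
}
```

A single rehash costs time proportional to the number of stored elements, but doubling makes rehashes rare enough that insertion stays O(1) on average over a sequence of operations, as with a growing vector.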

  31. Example • Given a set S of unique integers and a number z, find x, y ∈ S such that x + y = z • An efficient worst case algorithm • An efficient average case algorithm

  32. An efficient worst case algorithm • 1. Sort all elements in S: O(n log n). • 2. For every x in S, search for y = z − x in S using binary search: n searches of O(log n) each. • Total: O(n log n)

  33. An efficient average case algorithm • 1. Use a hash table where m is of order n; for all x ∈ S, execute insert(x) • 2. For all x ∈ S, execute search(z − x) • Total: O(n) in the average case • Total: O(n²) in the worst case
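The two steps can be sketched with `java.util.HashSet` standing in for the hash table. The sketch assumes that x and y must be two different elements of S, which the slide does not state; the names are illustrative.

```java
import java.util.HashSet;
import java.util.Set;

class PairSum {
    // Returns {x, y} with x + y == z and x != y, or null if no such pair exists.
    static int[] findPair(int[] s, int z) {
        Set<Integer> table = new HashSet<>();
        for (int x : s) table.add(x);      // step 1: insert every element
        for (int x : s) {                  // step 2: search for z - x
            int y = z - x;
            if (y != x && table.contains(y)) return new int[] {x, y};
        }
        return null;
    }

    public static void main(String[] args) {
        int[] pair = findPair(new int[] {8, 3, 5, 11}, 14);
        System.out.println(pair[0] + " + " + pair[1]); // prints 3 + 11
    }
}
```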

  34. Example • Given a set S of sortable items, we are asked whether all items in S are unique. • 1. Sort the elements of S. • 2. Iterate over the sorted elements, looking for adjacent equal values. • Execution time: O(n log n)

  35. Example • 1. Use a hash table where m is of order n. For all x ∈ S, execute insert(x). We modify the insert operation to signal if x already exists in the table (every insert includes a search operation). • Execution time: O(n) in the average case
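In Java, `HashSet.add` already behaves like the modified insert: it returns `false` when the element is already present. A minimal sketch of the uniqueness check, with illustrative names:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class Uniqueness {
    // Returns true if no two items are equal according to equals().
    static <T> boolean allUnique(Iterable<T> items) {
        Set<T> seen = new HashSet<>();
        for (T item : items) {
            if (!seen.add(item)) return false; // add() reports that item was already present
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allUnique(Arrays.asList(1, 2, 3))); // prints true
        System.out.println(allUnique(Arrays.asList(1, 2, 1))); // prints false
    }
}
```

Unlike the sorting approach, this needs only equality and hash codes on the items, not an order relation.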

  36. Java hashCode • Each Java object has a method public int hashCode(), which is defined in class Object and is supported for the benefit of hashtables and hashmaps. • The default implementation typically returns a number derived from the object's identity (often described as its memory location); distinct objects usually, but not necessarily, get distinct values. • If two objects are equal according to equals(), they must have the same hash code

  37. Java hashCode • It is not required that distinct objects have distinct hash codes, but it improves the performance of hashtables. • Can the hash code of an object change throughout its life cycle?
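A sketch of a class that keeps `equals` and `hashCode` consistent. `GridPoint` is a hypothetical example class; its fields are final, so its hash code cannot change after construction.

```java
import java.util.Objects;

final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GridPoint)) return false;
        GridPoint p = (GridPoint) o;
        return x == p.x && y == p.y;
    }

    // Built from exactly the fields that equals() compares,
    // so equal points always produce the same hash code.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}
```

With this definition, a `HashSet` that contains `new GridPoint(1, 2)` also reports that it contains a second, separately constructed `new GridPoint(1, 2)`. If the fields were mutable and changed while the object sat in a hash-based collection, lookups could probe the wrong slot, which is one reason the closing question on the slide matters.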
