1 / 25

Hashing

Hashing. Motivating Applications. Large collection of datasets Datasets are dynamic (insert, delete) Goal: efficient searching/insertion/deletion Hashing is ONLY applicable for exact-match searching. Direct Address Tables. If the keys domain is U  Create an array T of size U

zoltin
Download Presentation

Hashing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Hashing

  2. Motivating Applications • Large collection of datasets • Datasets are dynamic (insert, delete) • Goal: efficient searching/insertion/deletion • Hashing is ONLY applicable for exact-match searching

  3. Direct Address Tables • If the keys domain is U  Create an array T of size U • For each key K  add the object to T[K] • Supports insertion/deletion/searching in O(1)

  4. Direct Address Tables Alg.: DIRECT-ADDRESS-SEARCH(T, k) return T[k] Alg.: DIRECT-ADDRESS-INSERT(T, x) T[key[x]] ← x Alg.: DIRECT-ADDRESS-DELETE(T, x) T[key[x]] ← NIL • Running time for these operations: O(1) Solution is to use hashing tables Drawbacks >> If U is large, e.g., the domain of integers, then T is large (sometimes infeasible) >> Limited to integer values and does not support duplication

  5. Direct Access Tables: Example Example 1: Example 2: U is the domain K is the actual number of keys

  6. Hashing • A data structure that maps values from a certain domain or range to another domain or range Hash function 3 15 Domain: String values 20 55 Domain: Integer values

  7. Hashing • A data structure that maps values from a certain domain or range to another domain or range Hash function Range 0 ….. 10000 Student IDs 950000 ….. 960000 Domain: numbers [0 … 10,000] Domain: numbers [950,000 … 960,000]

  8. Hash Tables • When K is much smaller than U, a hash tablerequires much less space than a direct-address table • Can reduce storage requirements to |K| • Can still get O(1) search time, but on the average case, not the worst case

  9. Hash Tables: Main Idea • Use a hash function h to compute the slot for each key k • Store the element in slot h(k) • Maintain a hash table of size m  T [0…m-1] • A hash function h transforms a key into an index in a hash table T[0…m-1]: h : U → {0, 1, . . . , m - 1} • We say that k hashes to slot h(k)

  10. Hash Tables: Main Idea Hash Table (of size m) 0 U (universe of keys) h(k1) h(k4) k1 K (actual keys) h(k2) = h(k5) k4 k2 k3 k5 h(k3) m - 1 >> m is much smaller that U (m <<U) >> m can be even smaller than |K|

  11. Example • Back to the example of 100 students, each with 9-digit SSN • All what we need is a hash table of size 100

  12. What About Collisions 0 U (universe of keys) h(k1) h(k4) k1 K (actual keys) h(k2) = h(k5) k4 k2 Collisions! k3 k5 h(k3) m - 1 • Collision means two or more keys will go to the same slot

  13. Handling Collisions • Many ways to handle it • Chaining • Open addressing • Linear probing • Quadratic probing • Double hashing

  14. Chaining: Main Idea • Put all elements that hash to the same slot into a linked list (Chain) • Slot j contains a pointer to the head of the list of all elements that hash to j

  15. Chaining - Discussion • Choosing the size of the hash table • Small enough not to waste space • Large enough such that lists remain short • Typically 10% -20% of the total number of elements • How should we keep the lists: ordered or not? • Usually each list is unsorted linked list

  16. Insertion in Hash Tables Alg.:CHAINED-HASH-INSERT(T, x) insert x at the head of list T[h(key[x])] • Worst-case running time is O(1) • May or may not allow duplication based on the application

  17. Deletion in Hash Tables Alg.:CHAINED-HASH-DELETE(T, x) delete x from the list T[h(key[x])] • Need to find the element to be deleted. • Worst-case running time: • Deletion depends on searching the corresponding list

  18. Searching in Hash Tables Alg.:CHAINED-HASH-SEARCH(T, k) search for an element with key k in list T[h(k)] • Running time is proportional to the length of the list of elements in slot h(k) What is the worst case and average case??

  19. T 0 chain m - 1 Analysis of Hashing with Chaining:Worst Case • All keys will go to only one chain • Chain size is O(n) • Searching is O(n) + time to apply h(k)

  20. T 0 chain chain chain chain m - 1 Analysis of Hashing with Chaining:Average Case • With good hash function and uniform distribution of keys • Any given element is equally likely to hash into any of the m slots • All chain will have similar sizes • Assume n (total # of keys), m is the hash table size • Average chain size  O (n/m) Average Search Time O(n/m): The common case

  21. Analysis of Hashing with Chaining:Average Case • If m (# of slots) is proportional to n (# of keys): • m = O(n) • n/m = O(1)  Searching takes constant time on average

  22. Hash Functions

  23. Hash Functions • A hash function transforms a key (k) into a table address (0…m-1) • What makes a good hash function? (1) Easy to compute (2) Approximates a random function: for every input, every output is equally likely (simple uniform hashing) (3) Reduces the number of collisions

  24. Hash Functions • Goal: Map a key k into one of the m slots in the hash table • Make table size (m) a prime number • Avoids even and power-of-2 numbers • Common function h(k) = F(k) mod m Some function or operation on K (usually generates an integer) The output of the “mod” is number [0…m-1]

  25. Examples of Hash Functions Collection of images F(k): Sum of the pixels colors h(k) = F(k) mod m Collection of strings F(k): Sum of the ascii values h(k) = F(k) mod m Collection of numbers F(k): just return k h(k) = F(k) mod m

More Related