1 / 19

Introduction to Hashing in Computer Science

Learn about the properties of a good hash function, handling collisions, open addressing, chaining, time complexity, load factor, Cuckoo Hashing, and Bloom Filters.

Download Presentation

Introduction to Hashing in Computer Science

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Hashing II CS2110 Spring 2018

  2. Hash Functions 0 1 1 4 3 • Properties of a good hash: • fast • collision-resistant • evenly distributed • hard to invert • Requirements: • deterministic • return a number in [0..n]

  3. Hash Table add(“CA”) Hash hunction mod 6 CA 5 b 0 1 2 3 4 5 CA Two ways of handling collisions: Chaining 2. Open Addressing MA NY

  4. HashSet and HashMap Set<V>{ boolean add(V value); boolean contains(V value); boolean remove(V value); } Map<K,V>{ V put(K key, V value); V get(K key); V remove(K key); }

  5. Remove put('a') put('b') put('c') put('d') get('d') remove('c') get('d') put('e') Open Addressing Chaining e a c d b 0 0 1 1 2 2 3 3 a e c b d

  6. Time Complexity (no resizing)

  7. Load Factor Load factor

  8. Expected Chain Length • For each bucket, probability that a single object is hashed to that bucket is 1/length of array • There are n objects in the hash table • Expected length of chain is n/length of array = 0 1 2 3 4 5 a b c d e

  9. Expected Time Complexity (no resizing)

  10. Expected Number of Probes • We always have to probe H(v) • With probability , first location is full, have to probe again • With probability , second location is also full, have to probe yet again • … • Expected #probes = 0 1 2 3 4 5

  11. Expected Time Complexity (no resizing) Assuming constant load factor We need to dynamically resize!

  12. Amortized Analysis • In an amortized analysis, the time required to perform a sequence of operations is averaged over all the operations • Can be used to calculate average cost of operation vs.

  13. Amortized Analysis of put • Assume dynamic resizing with load factor : • Most put operations take (expected) time • If , put takes time • Total time to perform n put operations is • Average time to perform 1 put operation is

  14. Expected Time Complexity (with dynamic resizing)

  15. Cuckoo Hashing

  16. Cuckoo Hashing • Alternative solution to collisions • Assume you have two hash functions H1 and H2 0 1 2 3 4 5 d c e b d c a b What if there are loops?

  17. Complexity of Cuckoo Hashing • Worst Case: • Expected Case:

  18. Bloom Filters • Assume we only want to implement a set • What if you had stored the value at "all" hash locations (instead of one)? 0 1 2 3 4 5 🔵 🔵 🔵 🔵

  19. Features of Bloom Filters • Worst-case put, get, and remove • Works well with higher load factors • But: false positives

More Related