Hashing



  1. Hashing Alan, Tam Siu Lung 96397999 Tam@SiuLung.com 99967891

  2. Prerequisites • List ADT • Linked List • Table ADT • Array • Mathematics • Modular Arithmetic • Computer Organization • ASCII • Algorithm • Order Analysis

  3. Basic Data Types

  4. Abstract Data Types (ADT) • Stack<v> • Can add and remove in LIFO order • Queue<v> • Can add and remove in FIFO order • Priority Queue<v> • Can add. Can remove in largest-first order. v must be comparable.

  5. Data Structure • An ADT, implemented by a Data Type • E.g. • ArrayList, using an array to implement a List ADT • ArrayHeap, using an array to implement a Heap (which may in turn implement a PQ)

  6. Dictionary<k, v> ADT • Add(k, v) • Add a key-value pair • Remove(k) • Remove a key-value pair given the key • Search(k) : v • Search for the value given the key • A Table ADT differs only in that the key is an integer in a fixed range.

  7. Direct Addressing • Use the Table ADT • The key is the location • Efficient: O(1) for all operations • Infeasible if the key can range from 1 to 20000000000, or if the key is not numeric ...
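
A minimal sketch of direct addressing, assuming small non-negative integer keys. The class and method names are illustrative and not from the slides, and `None` is used here to mean "empty slot", so this sketch cannot store `None` as a value.

```python
class DirectAddressTable:
    """Table ADT sketch: the key itself is the array index."""

    def __init__(self, size):
        # One slot per possible key; memory use grows with the key range.
        self.slots = [None] * size

    def add(self, key, value):
        self.slots[key] = value      # O(1)

    def remove(self, key):
        self.slots[key] = None       # O(1)

    def search(self, key):
        return self.slots[key]       # O(1)


table = DirectAddressTable(10)
table.add(3, "three")
print(table.search(3))  # three
table.remove(3)
print(table.search(3))  # None
```

A key range of 1 to 20000000000 would need a list with that many slots, which is why the approach stops being feasible.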

  8. Time Complexity Note: For sorted array and BST, keys have to be ordered.

  9. Hash Function • Hash Function: h_m(k) • Map all keys into an integer domain, e.g. 0 to m - 1 • E.g. CRC32 hashes strings into 32-bit integer (i.e. m = 2^32) • Alan: 1598313570 • Max: 3452409927 • Man: 943766770 • On: 2246271074 Note: We won’t use such a big m in our programs!
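
As a rough illustration, Python's standard `zlib.crc32` computes a 32-bit CRC that can then be reduced to a small table size. Whether it reproduces the exact values listed above depends on the CRC-32 variant and string encoding the slide used, so no outputs are claimed here. The table size `m = 11` is an arbitrary assumption.

```python
import zlib


def crc32_hash(key: str) -> int:
    """Hash a string to an integer in the range 0 .. 2**32 - 1."""
    return zlib.crc32(key.encode("ascii")) & 0xFFFFFFFF


m = 11  # hypothetical small table size
for name in ["Alan", "Max", "Man", "On"]:
    full = crc32_hash(name)
    # Print the 32-bit hash and its reduction to a slot in 0 .. m - 1.
    print(name, full, full % m)
```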

  10. Hash Table • Use a Table<int, v> ADT of size m • Use h(k) as the key • All operations can be done like using Table • Solved except • Collision: What to do if two different k have same h(k) • How to find a suitable hash function

  11. Hash Functions • If k is an integer, use h(k) = k mod m • More advanced: h(k) = floor(m * frac(k * A)) for some 0 < A < 1 • If k is a string, convert it to an integer, e.g. • h(‘Alan’) = [ASC(‘A’)*256^3 + ASC(‘l’)*256^2 + ASC(‘a’)*256 + ASC(‘n’)] mod m • If k is other data type, try to combine all features of the type
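
The three hash functions above might be sketched as follows. The function names are illustrative, and the default constant A = (sqrt(5) - 1) / 2 is a commonly suggested choice, not one given in the slides. The string hash takes the remainder at every step so the intermediate value never grows large; this gives the same result as reducing the full base-256 number at the end.

```python
import math


def hash_division(k: int, m: int) -> int:
    """Division method: h(k) = k mod m."""
    return k % m


def hash_multiplication(k: int, m: int, a: float = (math.sqrt(5) - 1) / 2) -> int:
    """Multiplication method: h(k) = floor(m * frac(k * A)), with 0 < A < 1."""
    frac = (k * a) % 1.0          # fractional part of k * A
    return math.floor(m * frac)


def hash_string(s: str, m: int) -> int:
    """Treat the string as a base-256 number and reduce it mod m."""
    value = 0
    for ch in s:
        value = (value * 256 + ord(ch)) % m
    return value


m = 101  # hypothetical table size
print(hash_division(1234, m))
print(hash_multiplication(1234, m))
print(hash_string("Alan", m))
```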

  12. Chaining (a.k.a. Open Hashing) • Use Table<int, List<v> > instead • When there are multiple k’s with the same h(k), add them to the same list (usually a linked list) • When searching, look the key up in the list; when removing, remove it from the list • Order: O(length of the list at h(k))

  13. Chaining Samples • h(‘Alan’) = h(‘Man’) = h(‘On’) = 0, h(‘Max’) = 5 • Operations: • Add <Alan, D> • Add <Max, Z> • Add <Man, X> • Add <On, Y> • Search for Max • Remove Man
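
The sample operations can be replayed with a small chaining table. This is a sketch under stated assumptions: the hash values are hard-coded to the ones given on the slide, the table size of 8 is made up, and Python lists stand in for linked lists.

```python
class ChainedHashTable:
    """Hash table with chaining: each slot holds a list of (key, value) pairs."""

    def __init__(self, m, hash_fn):
        self.m = m
        self.hash_fn = hash_fn
        self.buckets = [[] for _ in range(m)]

    def add(self, key, value):
        bucket = self.buckets[self.hash_fn(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:              # key already present: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def search(self, key):
        for k, v in self.buckets[self.hash_fn(key)]:
            if k == key:
                return v
        return None                   # not found

    def remove(self, key):
        index = self.hash_fn(key)
        self.buckets[index] = [(k, v) for k, v in self.buckets[index] if k != key]


# Hash values taken from the slide; the table size 8 is an assumption.
SLIDE_HASHES = {"Alan": 0, "Man": 0, "On": 0, "Max": 5}
table = ChainedHashTable(8, SLIDE_HASHES.__getitem__)

table.add("Alan", "D")
table.add("Max", "Z")
table.add("Man", "X")
table.add("On", "Y")
print(table.search("Max"))   # Z
table.remove("Man")
print(table.buckets[0])      # [('Alan', 'D'), ('On', 'Y')]
```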


  21. Chaining (Optional) • Note that the Table can be Table<int, Container<v> > for any Container supporting Add, Remove and Search. • Why not consider other things, say another hash table? A BST?

  22. Open Addressing (a.k.a. Closed Hashing) • During a collision, find another slot for the entry • E.g. if slot h(k) is not empty, try h(k)+1, h(k)+2, etc. • Define the probe sequence <h(k, 0), h(k, 1), ..., h(k, m - 1)> to be the sequence of slots to try (it should be a permutation of <0, 1, ..., m - 1>) • Then both add and search try the same sequence, so a search must find the pair <k, v> before an empty slot is reached • How about delete? Search and mark it empty? • Order: O(length of probe sequence)
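
One common answer to the delete question is to mark the slot as "deleted" rather than "empty", so that later searches keep probing past it. The sketch below shows linear probing with such a marker. The class name, the two sentinel objects, and the hash values in the usage example are all assumptions made for illustration.

```python
EMPTY = object()     # slot has never been used
DELETED = object()   # slot held an entry that was removed


class LinearProbingTable:
    def __init__(self, m, hash_fn):
        self.m = m
        self.hash_fn = hash_fn
        self.slots = [EMPTY] * m

    def _probe(self, key):
        """Yield h(k), h(k)+1, h(k)+2, ... (mod m), visiting every slot once."""
        start = self.hash_fn(key)
        for i in range(self.m):
            yield (start + i) % self.m

    def add(self, key, value):
        first_free = None
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                if first_free is None:
                    first_free = idx
                break                      # key cannot appear after an empty slot
            if slot is DELETED:
                if first_free is None:
                    first_free = idx       # reusable, but keep looking for the key
            elif slot[0] == key:
                self.slots[idx] = (key, value)
                return
        if first_free is None:
            raise RuntimeError("hash table is full")
        self.slots[first_free] = (key, value)

    def search(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return None                # reached an empty slot: not present
            if slot is not DELETED and slot[0] == key:
                return slot[1]
        return None

    def remove(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return
            if slot is not DELETED and slot[0] == key:
                self.slots[idx] = DELETED  # do not mark EMPTY, or searches break
                return


# Hypothetical hashes: both names collide on slot 3.
table = LinearProbingTable(8, {"Man": 3, "Max": 3}.__getitem__)
table.add("Man", "X")
table.add("Max", "Z")
table.remove("Man")
print(table.search("Max"))  # Z
```

If `remove` reset the slot to `EMPTY` instead, the final search would stop at slot 3 and wrongly report that Max is missing.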

  23. Open Addressing Samples Add Man Add Max

  24. Open Addressing Samples Add Man Search for Max

  25. Open Addressing Samples Delete Man Search for Max

  26. Collision Resolution • The method outlined above is called linear probing • In general, h(k, i) = (h(k) + c i) mod m • Forms Primary Clustering • There is also quadratic probing • In general, h(k, i) = (h(k) + c1 i^2 + c2 i) mod m • Still forms Secondary Clustering
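
To make the two formulas concrete, the sketch below lists the probe sequences they generate. The function names and the constants (c = 1, c1 = 1, c2 = 0, m = 8) are arbitrary choices for illustration.

```python
def linear_probe(h, m, c=1):
    """Probe sequence h(k, i) = (h(k) + c*i) mod m for i = 0 .. m - 1."""
    return [(h + c * i) % m for i in range(m)]


def quadratic_probe(h, m, c1=1, c2=0):
    """Probe sequence h(k, i) = (h(k) + c1*i^2 + c2*i) mod m for i = 0 .. m - 1."""
    return [(h + c1 * i * i + c2 * i) % m for i in range(m)]


print(linear_probe(3, 8))     # [3, 4, 5, 6, 7, 0, 1, 2]
print(quadratic_probe(3, 8))  # [3, 4, 7, 4, 3, 4, 7, 4]
```

With these particular constants the quadratic sequence visits only three distinct slots, so it is not a permutation of 0 .. m - 1. The constants and the table size have to be chosen together for quadratic probing to cover the whole table.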

  27. Double Hashing (Optional) • h(k, i) = ( h(k) + i h’(k) ) mod m • Note: h’(k) cannot be 0 • Meaningful h’(k) should be in [1, m) • E.g. m – k mod (m – 1)
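
A small sketch of double hashing for integer keys. It assumes h(k) = k mod m and uses h’(k) = 1 + (k mod (m - 1)), a common textbook choice that always lies in [1, m); this is not the example formula from the slide. The table size m = 7 is prime, so every step size is coprime to m and each sequence visits all slots.

```python
def double_hash_probe(k, m):
    """Probe sequence h(k, i) = (h(k) + i * h'(k)) mod m for i = 0 .. m - 1."""
    h1 = k % m                 # primary hash (assumed)
    h2 = 1 + (k % (m - 1))     # secondary hash, never 0 (assumed)
    return [(h1 + i * h2) % m for i in range(m)]


print(double_hash_probe(10, 7))  # [3, 1, 6, 4, 2, 0, 5]
print(double_hash_probe(17, 7))  # [3, 2, 1, 0, 6, 5, 4]
```

Keys 10 and 17 start at the same slot but follow different sequences afterwards, which is what reduces clustering compared with linear probing.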

  28. How good is Hashing? • Nearly constant time if very short list or very low probing rate • So we need • A uniform hash function (your job) • A larger hash table (trade it off with memory limit)

  29. Size too small? (Optional) • Create a new hash table and re-hash all entries (not useful for OI use) • If using open addressing, re-hashing is needed anyway to get rid of the deleted items
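
Re-hashing into a larger table could look like the sketch below, assuming a chaining table stored as a list of buckets of (key, value) pairs. The function name and the identity hash in the usage example are illustrative only.

```python
def rehash(buckets, new_m, hash_fn):
    """Build a new chaining table of size new_m and re-insert every entry."""
    new_buckets = [[] for _ in range(new_m)]
    for bucket in buckets:
        for key, value in bucket:
            # Slots must be recomputed because they depend on the table size.
            new_buckets[hash_fn(key) % new_m].append((key, value))
    return new_buckets


# Old table of size 4 with h(k) = k mod 4; keys 0, 4 and 8 all share slot 0.
old = [[(0, "a"), (4, "b"), (8, "c")], [(1, "d")], [], []]
new = rehash(old, 8, lambda k: k)
print(new[0])  # [(0, 'a'), (8, 'c')]
print(new[4])  # [(4, 'b')]
```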

  30. Extensible Hashing (Optional) • Use Table<int, Ptr> (Ptr is like the list in chaining) • The size m = 2^k • Given any uniform hash function h(k), g(k) = last k bits of h(k) • Ptr points to an array of size r, each storing an entry • The problem: what to do when the array is full

  31. Extensible Hashing (Optional) h(‘Alan’) = 0, h(‘Man’) = 4, h(‘On’) = 12, h(‘Ben’) = 5, h(‘Max’)=5

  32. Extensible Hashing (Optional) Add Si where h(‘Si’) = 9, i.e. g(‘Si’) = 01

  33. Extensible Hashing (Optional) Add Unu where h(‘Unu’) = 4, i.e. g(‘Unu’) = 100 The first array will be split according to their h(k) Still need to chain?
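
The values of g in these examples can be checked by masking off the last k bits of each hash. The sketch below groups all the sample names by their last 2 bits and then by their last 3 bits. It only illustrates the bit arithmetic: a real extensible hash table would split just the full array rather than regrouping everything, and the helper names are assumptions.

```python
def last_bits(h, k):
    """Return the last k bits of h as an integer."""
    return h & ((1 << k) - 1)


def as_bits(value, k):
    """Format value as a k-digit binary string."""
    return format(value, f"0{k}b")


# Hash values taken from the slides.
hashes = {"Alan": 0, "Man": 4, "On": 12, "Ben": 5, "Max": 5, "Si": 9, "Unu": 4}

for k in (2, 3):
    groups = {}
    for name, h in hashes.items():
        groups.setdefault(as_bits(last_bits(h, k), k), []).append(name)
    print(k, groups)
# 2 {'00': ['Alan', 'Man', 'On', 'Unu'], '01': ['Ben', 'Max', 'Si']}
# 3 {'000': ['Alan'], '100': ['Man', 'On', 'Unu'], '101': ['Ben', 'Max'], '001': ['Si']}
```

Even with 3 bits, Man, On and Unu still share the pattern 100, which may be what the closing question about chaining refers to.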
