Hashing

Hashing Alan, Tam Siu Lung 96397999 Tam@SiuLung.com 99967891

Prerequisites • List ADT • Linked List • Table ADT • Array • Mathematics • Modular Arithmetic • Computer Organization • ASCII • Algorithm • Order Analysis

Basic Data Types

Abstract Data Types (ADT) • Stack<v> • Can add and remove in LIFO order • Queue<v> • Can add and remove in FIFO order • Priority Queue<v> • Can add. Can remove in larger first order. v is comparable.

Data Structure • An ADT, implemented by a Data Type • E.g. • ArrayList, using an array to implement a List ADT • ArrayHeap, using an array to implement a Heap (which may in turn implement a PQ)

Dictionary<k, v> ADT • Add(k, v) • Add a key-value pair • Remove(k) • Remove a key-value pair given the key • Search(k) : v • Search for the value given the key • Note: A Table ADT differs only in that the key is an integer in a fixed range.

Direct Addressing • Use the Table ADT • The key is the location • Efficient: O(1) for all operations • Infeasible: if the key can range from 1 to 20000000000, if the key is not numeric ...
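Direct addressing can be sketched in a few lines. This is a minimal illustration, not the author's code; the class name DirectAddressTable and its method names are assumptions chosen to mirror the Add, Remove and Search operations of the ADT.

```python
class DirectAddressTable:
    """Table ADT sketch: the integer key is used directly as the array index."""

    def __init__(self, size):
        # One slot per possible key, which is why a huge key range is infeasible.
        self.slots = [None] * size

    def add(self, key, value):
        self.slots[key] = value      # O(1)

    def remove(self, key):
        self.slots[key] = None       # O(1)

    def search(self, key):
        return self.slots[key]       # O(1), None if absent


table = DirectAddressTable(10)
table.add(3, "Alan")
print(table.search(3))  # Alan
table.remove(3)
print(table.search(3))  # None
```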

Time Complexity Note: For sorted array and BST, keys have to be ordered.

Hash Function • Hash Function: h_m(k) • Map all keys into an integer domain, e.g. 0 to m – 1 • E.g. CRC32 hashes strings into a 32-bit integer (i.e. m = 2^32) • Alan: 1598313570 • Max: 3452409927 • Man: 943766770 • On: 2246271074 Note: We won’t use such a big m in our programs!
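As a rough illustration of hashing a string with CRC32 and then shrinking the result to a small table, the sketch below uses zlib.crc32 from the Python standard library. The helper names crc32_hash and small_hash, the ASCII encoding, and the table size 11 are assumptions made for the example; the exact CRC32 variant used for the slide's numbers is not stated, so no output values are claimed here.

```python
import zlib


def crc32_hash(key: str) -> int:
    """Hash a string to a 32-bit unsigned integer, i.e. m = 2**32."""
    return zlib.crc32(key.encode("ascii"))


def small_hash(key: str, m: int) -> int:
    """Reduce the 32-bit hash to the range 0..m-1 for a small table."""
    return crc32_hash(key) % m


for name in ["Alan", "Max", "Man", "On"]:
    # The table size 11 is an arbitrary small m chosen for illustration.
    print(name, crc32_hash(name), small_hash(name, 11))
```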

Hash Table • Use a Table<int, v> ADT of size m • Use h(k) as the key • All operations can be done like using Table • Solved except • Collision: What to do if two different k have same h(k) • How to find a suitable hash function

Hash Functions • If k is an integer, use h(k) = k mod m • More advanced: floor(m*frac(k*A)) for some 0 < A < 1 • If k is a string, convert it to an integer, e.g. • h(‘Alan’) = [ASC(‘A’)*256^3 + ASC(‘l’)*256^2 + ASC(‘a’)*256 + ASC(‘n’)] mod m • If k is another data type, try to combine all features of the type
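The three formulas above might be written as follows. This is a sketch under assumptions: the function names are invented for the example, and the default A (the fractional part of the golden ratio) is a commonly cited choice rather than something the slides specify.

```python
import math


def hash_int_division(k: int, m: int) -> int:
    """Division method: h(k) = k mod m."""
    return k % m


def hash_int_multiplication(k: int, m: int, A: float = 0.6180339887) -> int:
    """Multiplication method: floor(m * frac(k * A)) for some 0 < A < 1."""
    frac = (k * A) % 1.0          # fractional part of k * A
    return math.floor(m * frac)


def hash_string(s: str, m: int) -> int:
    """Treat the string as a base-256 number and reduce it mod m.

    Taking the remainder at every step gives the same result as reducing
    once at the end, while keeping the intermediate values small.
    """
    value = 0
    for ch in s:
        value = (value * 256 + ord(ch)) % m
    return value


print(hash_int_division(17, 5))            # 2
print(hash_int_multiplication(17, 8))      # some slot in 0..7
print(hash_string("Alan", 101))            # same as the slide's formula mod 101
```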

Chaining (a.k.a. Open Hashing) • Use Table<int, List<v> > instead • When there are multiple k’s with the same h(k), add each one to the list (usually a linked list) stored at slot h(k) • When searching, scan that list for k • When removing, find the entry in that list and delete it • Order: O(length of the list at slot h(k))

Chaining Samples • h(‘Alan’) = h(‘Man’) = h(‘On’) = 0, h(‘Max’) = 5 • Operations: • Add <Alan, D> • Add <Max, Z> • Add <Man, X> • Add <On, Y> • Search for Max • Remove Man
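The sample operations can be replayed with a small chained table. This is a sketch, not the author's implementation: Python lists stand in for linked lists, the table size 8 is an assumption (any size above 5 works), and the hash function is simply a lookup of the hash values given on the slide.

```python
class ChainedHashTable:
    """Hash table with chaining: each slot holds a list of (key, value) pairs."""

    def __init__(self, m, hash_function):
        self.hash_function = hash_function
        self.buckets = [[] for _ in range(m)]

    def add(self, key, value):
        bucket = self.buckets[self.hash_function(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:       # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def search(self, key):
        for existing_key, value in self.buckets[self.hash_function(key)]:
            if existing_key == key:
                return value
        return None

    def remove(self, key):
        index = self.hash_function(key)
        self.buckets[index] = [(k, v) for k, v in self.buckets[index] if k != key]


# Hash values taken from the sample slide.
SAMPLE_HASHES = {"Alan": 0, "Man": 0, "On": 0, "Max": 5}

table = ChainedHashTable(8, SAMPLE_HASHES.__getitem__)
table.add("Alan", "D")
table.add("Max", "Z")
table.add("Man", "X")
table.add("On", "Y")
print(table.search("Max"))   # Z
table.remove("Man")
print(table.buckets[0])      # [('Alan', 'D'), ('On', 'Y')]
```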

Chaining (Optional) • Note that the Table can be Table<int, Container<v> > for any Container supporting Add, Remove and Search. • Why not consider other things, say another hash table? A BST?

Open Addressing (a.k.a. Closed Hashing) • During a collision, find another slot for the entry • E.g. if slot h(k) is not empty, try h(k)+1, h(k)+2, etc. • Define the probe sequence <h(k, 0), h(k, 1), ..., h(k, m – 1)> to be the sequence of slots to try (it should be a permutation of <0, 1, ..., m – 1>) • Then both add and search try the same sequence, so a search must find the pair <k, v>, if it is present, before an empty slot is reached • How about delete? Search and mark it empty? • Order: O(length of probe sequence)

Open Addressing Samples Add Man Add Max

Open Addressing Samples Add Man Search for Max

Open Addressing Samples Delete Man Search for Max
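The delete question can be explored with a short linear-probing sketch. Marking a deleted slot as truly empty would cut the probe sequence short, so this example uses a separate "deleted" marker (often called a tombstone), which is one common answer and not necessarily the one the author had in mind. The class name, the table size 7, and the assumption that Man and Max both hash to slot 2 are all hypothetical, since the slide's figures did not survive.

```python
EMPTY = object()     # slot never used
DELETED = object()   # tombstone: slot was used, entry has been removed


class LinearProbingTable:
    def __init__(self, m, hash_function):
        self.m = m
        self.hash_function = hash_function
        self.slots = [EMPTY] * m

    def _probe(self, key):
        start = self.hash_function(key)
        for i in range(self.m):
            yield (start + i) % self.m   # h(k, i) = (h(k) + i) mod m

    def add(self, key, value):
        first_free = None
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is EMPTY:
                if first_free is None:
                    first_free = index
                break                     # key cannot be further along
            if slot is DELETED:
                if first_free is None:
                    first_free = index    # reusable, but keep looking for key
            elif slot[0] == key:
                self.slots[index] = (key, value)
                return
        if first_free is None:
            raise RuntimeError("hash table is full")
        self.slots[first_free] = (key, value)

    def search(self, key):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is EMPTY:
                return None               # a never-used slot ends the search
            if slot is not DELETED and slot[0] == key:
                return slot[1]
        return None

    def remove(self, key):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is EMPTY:
                return
            if slot is not DELETED and slot[0] == key:
                self.slots[index] = DELETED   # do not mark it EMPTY
                return


hashes = {"Man": 2, "Max": 2}             # assumed collision
table = LinearProbingTable(7, hashes.__getitem__)
table.add("Man", "X")
table.add("Max", "Z")
table.remove("Man")
print(table.search("Max"))   # Z, because the tombstone keeps the probe going
```

If remove wrote EMPTY instead of DELETED, the final search would stop at slot 2 and wrongly report that Max is absent.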

Collision Resolution • The method outlined above is called linear probing • In general, h(k, i) = ( h(k) + c i ) mod m • Forms Primary Clustering • There is also quadratic probing • In general, h(k, i) = ( h(k) + c1 i^2 + c2 i ) mod m • Still forms Secondary Clustering
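To make the two probe formulas concrete, the sketch below prints the probe sequences for a key whose home slot is 3 in a table of size 8. The function names and the constants (c = 1, c1 = 1, c2 = 0) are assumptions picked for illustration.

```python
def linear_probe(h, i, m, c=1):
    """Slot tried on attempt i with linear probing."""
    return (h + c * i) % m


def quadratic_probe(h, i, m, c1=1, c2=0):
    """Slot tried on attempt i with quadratic probing."""
    return (h + c1 * i * i + c2 * i) % m


m = 8
print([linear_probe(3, i, m) for i in range(m)])
# [3, 4, 5, 6, 7, 0, 1, 2]
print([quadratic_probe(3, i, m) for i in range(m)])
# [3, 4, 7, 4, 3, 4, 7, 4]
```

With these particular constants the quadratic sequence visits only three distinct slots, so it is not a permutation of 0..m-1; c1, c2 and m have to be chosen together for quadratic probing to cover the whole table.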

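Double Hashing (Optional) • h(k, i) = ( h(k) + i h’(k) ) mod m • Note: h’(k) cannot be 0 • Meaningful h’(k) should be in [1, m) • E.g. (m – 1) – (k mod (m – 1)) • For the probe sequence to be a permutation, h’(k) must share no common factor with m (e.g. choose m prime)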
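A small sketch of the double-hashing probe sequence follows. It assumes integer keys, h(k) = k mod m, a prime table size m = 7, and the second hash (m - 1) - (k mod (m - 1)), which is one choice that always stays in [1, m); none of these specifics are fixed by the slides.

```python
def double_hash_probe(k, i, m):
    """Slot tried on attempt i: (h(k) + i * h2(k)) mod m."""
    h1 = k % m
    h2 = (m - 1) - (k % (m - 1))   # always in 1..m-1, so never 0
    return (h1 + i * h2) % m


m = 7                               # prime, so every h2 is coprime to m
sequence = [double_hash_probe(10, i, m) for i in range(m)]
print(sequence)                     # [3, 5, 0, 2, 4, 6, 1]
print(sorted(sequence) == list(range(m)))   # True: a permutation of 0..6
```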

How good is Hashing? • Nearly constant time if very short list or very low probing rate • So we need • A uniform hash function (your job) • A larger hash table (trade it off with memory limit)

Size too small? (Optional) • Create a new, larger hash table and re-hash all entries (not useful for OI use) • If using open addressing, re-hashing is needed anyway to remove the deleted items
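Re-hashing into a bigger table might look like the sketch below. It assumes an open-addressing table stored as a Python list whose slots are None (empty), a DELETED marker, or a (key, value) pair, and it re-inserts with linear probing; the function name rehash and this slot layout are assumptions for the example.

```python
DELETED = object()   # marker left behind by deletions in the old table


def rehash(old_slots, new_m, hash_function):
    """Copy live entries into a fresh table of size new_m, dropping markers.

    new_m must exceed the number of live entries, otherwise the probe
    loop below could not find a free slot.
    """
    new_slots = [None] * new_m
    for slot in old_slots:
        if slot is None or slot is DELETED:
            continue                         # nothing to carry over
        key, value = slot
        index = hash_function(key) % new_m
        while new_slots[index] is not None:  # linear probing in the new table
            index = (index + 1) % new_m
        new_slots[index] = (key, value)
    return new_slots


old = [(8, "a"), DELETED, (2, "b"), None]
new = rehash(old, 8, lambda k: k)
print(new[0], new[2])   # (8, 'a') (2, 'b')
```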

Extensible Hashing (Optional) • Use Table<int, Ptr> (Ptr is like the list in chaining) • The size m = 2^k (here the exponent k is a number of bits, not the key) • Given any uniform hash function h, g(key) = last k bits of h(key) • Ptr points to an array of size r, each element storing an entry • The problem: what to do when the array is full

Extensible Hashing (Optional) h(‘Alan’) = 0, h(‘Man’) = 4, h(‘On’) = 12, h(‘Ben’) = 5, h(‘Max’)=5

Extensible Hashing (Optional) Add Si where h(‘Si’) = 9, i.e. g(‘Si’) = 01

Extensible Hashing (Optional) Add Unu where h(‘Unu’) = 4, i.e. g(‘Unu’) = 100 • The first array will be split according to the entries’ h(k) • Still need to chain?
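The directory index g can be computed with a bit mask. The sketch below groups the sample keys by their last 2 bits and then by their last 3 bits, using the hash values given on the slides; the helper names last_bits and group_by_bits are assumptions, and the array capacity r is left out because the slides do not state it.

```python
def last_bits(hash_value, bits):
    """g: keep only the lowest `bits` bits of the hash value."""
    return hash_value & ((1 << bits) - 1)


def group_by_bits(hashes, bits):
    """Group key names by the bit string that selects their array."""
    groups = {}
    for name, h in hashes.items():
        label = format(last_bits(h, bits), "0{}b".format(bits))
        groups.setdefault(label, []).append(name)
    return groups


# Hash values from the sample slides.
sample = {"Alan": 0, "Man": 4, "On": 12, "Ben": 5, "Max": 5, "Si": 9, "Unu": 4}

print(group_by_bits(sample, 2))
# {'00': ['Alan', 'Man', 'On', 'Unu'], '01': ['Ben', 'Max', 'Si']}
print(group_by_bits(sample, 3))
# {'000': ['Alan'], '100': ['Man', 'On', 'Unu'], '101': ['Ben', 'Max'], '001': ['Si']}
```

Man and Unu share the same full hash value 4, so no number of extra bits will ever separate them, which is one reason the closing question about chaining arises.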
