CS 361 – Chapter 6. Desired operations in Dictionary ADT Hash tables Hash code computation Handling collisions Chaining Open addressing Linear probing Quadratic probing Double hashing. Dictionary ADT.
CS 361 – Chapter 6 • Desired operations in Dictionary ADT • Hash tables • Hash code computation • Handling collisions • Chaining • Open addressing • Linear probing • Quadratic probing • Double hashing
Dictionary ADT • Each item in some aggregation is assigned a key value. Look up the item by means of the key. • Sounds like an array, except the key value can be anything convenient for us, rather than restricting us to indices 0,1,2,… • Desired operations • search (key) • insert (item, key) • remove (key) • Finding and removing could fail if the key value is not found in the dictionary.
Implementation • Simple approach: ArrayList of (element, key) pairs, called a “log file” data structure. • How would we implement the operations? • Inserting: O(1) • Finding / removing: O(n) • We would hope there’s a lot of inserting, to make this data structure worthwhile! • More efficient approach: hash table • Array of “buckets” • Hash function to assign each element to a bucket
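The “log file” approach above can be sketched as follows. This is a minimal illustration, assuming a Python list stands in for the ArrayList; the class name `LogFileDictionary` and the choice to return `None` on a failed search or remove are hypothetical, not taken from the slides.

```python
class LogFileDictionary:
    """Unsorted list of (key, item) pairs: O(1) insert, O(n) search / remove."""

    def __init__(self):
        self._pairs = []

    def insert(self, item, key):
        # Append at the end without scanning: amortized O(1).
        self._pairs.append((key, item))

    def search(self, key):
        # Scan the pairs until the key is found: O(n).
        for k, item in self._pairs:
            if k == key:
                return item
        return None  # assumed convention: None signals "key not found"

    def remove(self, key):
        # Scan for the key, then delete that pair: O(n).
        for i, (k, item) in enumerate(self._pairs):
            if k == key:
                del self._pairs[i]
                return item
        return None  # assumed convention: None signals "key not found"
```

Inserting never looks at the existing pairs, which is why it is cheap; every search or remove may have to walk the whole list.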
Goal of hash table • A faster array • We hope that most of the time, insert / search / delete can be done in constant time • Two issues • Requires more space • In worst case, generally O(n) complexity. We don’t want this to happen often!
Implementation • Hash code: In a collection of objects, it’s desirable to assign each object a unique number. • Mathematically determined from its key. • There are good and bad ways to compute hash codes. We’d like these codes to be unique. • Compression: Since the hash code may be a big number, scale it down by performing a “mod” operation. • The result is the array index to insert / find / remove. • Collision: Sometimes a 3rd step is needed, in case 2 items map to the same bucket.
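The compression step can be sketched as below. The function name `compress` and the sample hash codes are made up for illustration; the only idea taken from the slide is scaling a large hash code down with “mod”.

```python
def compress(hash_code: int, num_buckets: int) -> int:
    """Scale a (possibly large) hash code down to a valid array index."""
    return hash_code % num_buckets


# Two different hash codes can land in the same bucket: a collision.
num_buckets = 11
first = compress(1234567, num_buckets)   # 1234567 = 11 * 112233 + 4
second = compress(1234578, num_buckets)  # 1234578 = 1234567 + 11
print(first, second)
```

Both sample codes compress to bucket 4, so even unique hash codes do not rule out collisions once they are reduced to the table size.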
Hashing example • Many objects have composite values, as in a string, list or several attributes per object. • Give them numerical values (e.g. ASCII code) and combine $(a_0, a_1, a_2, \dots, a_{n-1})$ into a hash code. • We could add them all up: hashCode = 0; for i = 0 to n-1: hashCode += a[i] • When would this be a good / bad hash function?
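A runnable version of the additive hash, assuming the key is a string and each component is its character code. The function name `additive_hash` is hypothetical. The example shows one case where this function does poorly: it ignores order, so anagrams always collide.

```python
def additive_hash(key: str) -> int:
    """Sum the character codes of the key."""
    hash_code = 0
    for ch in key:
        hash_code += ord(ch)
    return hash_code


# Reordering the characters does not change the sum.
print(additive_hash("stop") == additive_hash("pots"))
```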
Example 2 • To make hash codes more likely to be unique, we can use a polynomial approach: hash code = $a_0 c^0 + a_1 c^1 + a_2 c^2 + \dots + a_{n-1} c^{n-1}$, where c is some constant, e.g. 7. • To avoid computing powers of c, we can rewrite the formula: $a_0 + c(a_1 + c(a_2 + c(a_3 + \dots + c\,a_{n-1}) \dots ))$ • hashCode = a[n-1]; for i = n-2 down to 0: hashCode = c * hashCode + a[i]
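The nested form can be sketched in Python as follows. The function name `polynomial_hash` is hypothetical, and starting the accumulator at 0 (rather than at a[n-1]) is a small variation so that an empty sequence also works; the result is otherwise the same as the loop on the slide.

```python
def polynomial_hash(a, c=7):
    """Evaluate a[0] + a[1]*c + ... + a[n-1]*c**(n-1) without computing powers."""
    hash_code = 0
    # Work from a[n-1] down to a[0], multiplying by c at each step.
    for value in reversed(a):
        hash_code = c * hash_code + value
    return hash_code


# Unlike a plain sum, position matters, so these two anagrams get different codes.
stop = polynomial_hash([ord(ch) for ch in "stop"])
pots = polynomial_hash([ord(ch) for ch in "pots"])
print(stop, pots)
```

For the sequence (1, 2, 3) with c = 7, the loop computes 3, then 7·3 + 2 = 23, then 7·23 + 1 = 162, which matches 1 + 2·7 + 3·49.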
Collisions • Insert, but cell already taken! • Search, but a different element lives here! • Various ways to handle collision, such as: • Chaining: maintain a list at each bucket. HashSet does this. • Open addressing: look for another “open” cell. • Practice with Q 4-7 on page 215. • A hash table must be larger than the number of elements anticipated. • We can set up a specific “load factor”, e.g. 0.75. If the ratio of elements to table capacity exceeds this factor, allocate a bigger hash table. • Design issues can be resolved with experimentation on your collection of data.
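Chaining combined with a load factor might look like the sketch below. Several details are assumptions for illustration, not from the slides: the class name `ChainedHashTable`, Python lists standing in for the per-bucket lists, the built-in `hash()` as the hash code, doubling the capacity on resize, and replacing the item when a key is inserted twice.

```python
class ChainedHashTable:
    def __init__(self, capacity=8, max_load=0.75):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0
        self._max_load = max_load

    def _bucket(self, key):
        # Hash code, then compression with mod, selects the chain.
        return self._buckets[hash(key) % len(self._buckets)]

    def insert(self, item, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, item)  # assumed policy: replace on duplicate key
                return
        bucket.append((key, item))
        self._size += 1
        # Grow once the ratio of elements to capacity exceeds the load factor.
        if self._size / len(self._buckets) > self._max_load:
            self._resize(2 * len(self._buckets))

    def search(self, key):
        for k, item in self._bucket(key):
            if k == key:
                return item
        return None  # key not found

    def remove(self, key):
        bucket = self._bucket(key)
        for i, (k, item) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self._size -= 1
                return item
        return None  # key not found

    def _resize(self, new_capacity):
        # Every pair must be re-inserted because its index depends on the capacity.
        pairs = [pair for bucket in self._buckets for pair in bucket]
        self._buckets = [[] for _ in range(new_capacity)]
        for k, item in pairs:
            self._bucket(k).append((k, item))
```

Resizing is the expensive step, since all elements are rehashed, but it happens rarely enough that inserts stay cheap on average.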
Techniques • Key value k, hash value h(k); every probe index is taken mod the table size so that it wraps around the array. • Chaining: store a linked list at a[h(k)] • Linear probing: try h(k)+1, h(k)+2, h(k)+3, h(k)+4, … • Quadratic probing: try h(k)+1, h(k)+4, h(k)+9, h(k)+16, … • Double hashing: also uses a 2nd hash function h’(k); try h(k)+h’(k), h(k)+2h’(k), h(k)+3h’(k), …
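The three probe sequences can be compared side by side with a short sketch. The function names, the table size of 11, and the sample values h(k) = 3 and h’(k) = 5 are all made up for illustration; reducing each index mod the table size n is assumed so that probes wrap around the array.

```python
def linear_probes(h, n, count):
    """h+1, h+2, h+3, ... (mod n)"""
    return [(h + i) % n for i in range(1, count + 1)]


def quadratic_probes(h, n, count):
    """h+1, h+4, h+9, ... (mod n)"""
    return [(h + i * i) % n for i in range(1, count + 1)]


def double_hash_probes(h, h2, n, count):
    """h+h2, h+2*h2, h+3*h2, ... (mod n), where h2 is the second hash value"""
    return [(h + i * h2) % n for i in range(1, count + 1)]


def insert_open_addressing(table, key, h, probes):
    """Place key at index h if it is free, else at the first free probe index."""
    for index in [h] + probes:
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no open cell found in the probe sequence")


n = 11
print(linear_probes(3, n, 4))          # [4, 5, 6, 7]
print(quadratic_probes(3, n, 4))       # [4, 7, 1, 8]
print(double_hash_probes(3, 5, n, 4))  # [8, 2, 7, 1]
```

Linear probing keeps colliding keys in adjacent cells, while the other two spread them further apart; with double hashing, two keys that share h(k) still follow different sequences as long as their h’(k) values differ.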