Hash tables
But first... • Any homework or exam questions?
Hash tables • It's possible to make a Dictionary implementation that can provide insert, remove, keyExists, and getValue in O(1) average time • What data structures do we know where we can always get elements in O(1) time?
Hash tables • Arrays allow for O(1) access to the item at a particular index • Arrays allow for O(1) storage of items at a particular index • We need a way to compute an index from a data value • Let's look at a few simple examples:
Hash function • A hash function is a function that takes in a piece of data and returns an integer • A perfect hash function will give each item a different integer value • It is impossible to make a perfect hash function that works for every possible set of items, since there are far more possible items than array indices • For now, let's just assume we have a perfect hash function named h that takes the item as an argument and returns the corresponding index
Dictionary functions • getValue • add • remove • keyExists • Private members: V *values; bool *valueStored;
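The Dictionary above can be sketched in C++ using the two private arrays from the slide. This is a minimal sketch under a strong assumption: keys are ints in the range [0, size) and the hypothetical h is simply the identity, which happens to be a perfect hash for that range. The class name PerfectHashDictionary is illustrative, not from the course code.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Sketch of the slide's Dictionary, assuming a perfect hash function h.
// Assumption for illustration: keys are ints in [0, size) and h is the
// identity, which is perfect for that key range (no two keys share an index).
template <typename V>
class PerfectHashDictionary {
public:
    explicit PerfectHashDictionary(std::size_t size)
        : capacity(size),
          values(new V[size]),
          valueStored(new bool[size]()) {}  // () zero-initializes: nothing stored yet

    ~PerfectHashDictionary() {
        delete[] values;
        delete[] valueStored;
    }

    // Owning raw pointers: forbid copies so the arrays are not freed twice.
    PerfectHashDictionary(const PerfectHashDictionary&) = delete;
    PerfectHashDictionary& operator=(const PerfectHashDictionary&) = delete;

    // Each operation is one call to h plus one array access: O(1).
    void add(int key, const V& value) {
        std::size_t i = h(key);
        values[i] = value;
        valueStored[i] = true;
    }

    void remove(int key) { valueStored[h(key)] = false; }

    bool keyExists(int key) const { return valueStored[h(key)]; }

    V getValue(int key) const {
        std::size_t i = h(key);
        if (!valueStored[i]) throw std::out_of_range("key not stored");
        return values[i];
    }

private:
    // Hypothetical perfect hash: the identity on [0, capacity).
    std::size_t h(int key) const {
        assert(key >= 0 && static_cast<std::size_t>(key) < capacity);
        return static_cast<std::size_t>(key);
    }

    std::size_t capacity;
    V *values;
    bool *valueStored;
};
```

Every operation costs one call to h plus one array access, which is where the O(1) bound comes from; the valueStored array is what lets keyExists tell an empty slot apart from a slot holding a stored value.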
Hash functions • We will almost never have a perfect hash function • Instead, we will use a practical hash function where: • It is fast to compute • Each possible index is equally likely to be computed • We will later talk about how to handle two pieces of data that end up with the same index (a collision) • Let's talk about hash functions...
Hash functions • Suppose we have a hash table with a capacity that you get to choose • What methods could you use to compute indices in a reduced range from these numbers? • 279620 • 246300 • 323887 • 320379 • 967734 • 869647 • 264343 • 230845 • 627943 • 579202 • 776805 • 917097 • 422124 • 743121 • 629429 • 354149 • 682392 • 168516 • 933189 • 153793
Modular arithmetic • Hash functions are very often computed using modular arithmetic, the same wrap-around arithmetic as a 12-hour clock • 1:00 + 5 hours: (1 + 5) % 12 = 6 • 6:00 + 5 hours: (6 + 5) % 12 = 11 • 9:00 + 4 hours: (9 + 4) % 12 = 1 • 11:00 + 8 hours: (11 + 8) % 12 = 7 • 5:00 + 8 hours: (5 + 8) % 12 = 1
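Modulo hash function • Given a table with s slots, one possible hash function is: • h(i) = i % s • What could go wrong with this function? • If s shares a factor with many of the keys (for example, s is even and most keys are even), those keys crowd into a fraction of the slots • We want the table size s to be prime, or at least free of small factors • At minimum, s should NOT be an even number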
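To see why the table size matters, here is a small C++ sketch of h(i) = i % s plus a hypothetical helper, distinctSlots, that counts how many different slots a batch of keys lands in (fewer distinct slots means more collisions).

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Modulo hash function from the slide: h(i) = i % s, for table size s.
std::size_t moduloHash(unsigned long i, std::size_t s) {
    assert(s > 0);
    return i % s;
}

// Hypothetical helper: how many distinct slots a batch of keys lands in.
// Fewer distinct slots means more collisions.
std::size_t distinctSlots(const std::vector<unsigned long>& keys, std::size_t s) {
    std::set<std::size_t> used;
    for (unsigned long key : keys) {
        used.insert(moduloHash(key, s));
    }
    return used.size();
}
```

With the keys 0, 10, 20, ..., 90, every key lands in slot 0 when s = 10, but the keys spread over ten different slots when s = 11. Likewise, the even keys 0, 2, ..., 14 use only 4 of the 8 slots when s = 8, but all 7 slots when s = 7.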
Hashing strings • Oftentimes, hash tables are useful for looking up a value given a string • Strings are a series of chars (for 8-bit characters, values from 0 to 255) • What could we do to generate a hash of a string (or any other series of numbers)? • "Test": (84, 101, 115, 116) • "Temp": (84, 101, 109, 112) • "What": (87, 104, 97, 116)
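Hashing strings • Java approach (used by String.hashCode): • currentHash = 0 • for each character c in the string: • currentHash = currentHash * 31 + c • The arithmetic uses 32-bit ints, so long strings overflow and the result can be negative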
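The loop above translates directly into C++. This sketch (the function names javaStyleHash and stringIndex are hypothetical) uses std::uint32_t so that overflow wraps around modulo 2^32, which matches the bit pattern of Java's int arithmetic; it then reduces the hash to an index with the modulo hash function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// The slide's loop: currentHash = currentHash * 31 + c.
// std::uint32_t arithmetic wraps modulo 2^32 on overflow (well-defined for
// unsigned types), giving the same bit pattern as Java's 32-bit int math.
std::uint32_t javaStyleHash(const std::string& s) {
    std::uint32_t currentHash = 0;
    for (unsigned char c : s) {  // treat each char as a value in 0..255
        currentHash = currentHash * 31u + c;
    }
    return currentHash;
}

// Reduce the string's hash to an index with the modulo hash function.
std::size_t stringIndex(const std::string& s, std::size_t capacity) {
    assert(capacity > 0);
    return javaStyleHash(s) % capacity;
}
```

For the strings on the previous slide, "Test" hashes to 2603186, "Temp" to 2602996, and "What" to 2694884, so even strings sharing a prefix get different values. Returning an unsigned value keeps the % result a valid index; in Java, where the int hash can be negative, it has to be made non-negative before taking the remainder.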
Collisions • What can we do when two items hash to the same index (a collision)? • We need a way to keep looking through the array to find a place to put our data • We can establish a probe sequence, or an order to scan through the array positions to find an open spot • What sort of sequences could we use?
Linear Probing • In linear probing, we will keep stepping through the array (wrapping around) until we find an open spot • If our hash function gives us a hash of h, try the following positions: • h • (h+1) % arraySize • (h+2) % arraySize • (h+3) % arraySize • etc. • Which of our algorithms would we need to change? • How do we need to change them?
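One way to answer the slide's questions is a sketch like the following, assuming C++17 (for std::optional) and using std::hash in place of h; the class name LinearProbingTable is hypothetical. Every operation now walks the probe sequence instead of checking a single index, and remove marks its slot as Deleted (a "tombstone") rather than emptying it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Sketch: open addressing with linear probing. std::hash stands in for h.
// remove leaves a "Deleted" marker (tombstone) so later searches keep
// probing past it instead of stopping early.
template <typename K, typename V>
class LinearProbingTable {
public:
    explicit LinearProbingTable(std::size_t arraySize) : slots(arraySize) {
        assert(arraySize > 0);
    }

    // Returns false if no open slot exists (this sketch never resizes).
    bool insert(const K& key, const V& value) {
        std::optional<std::size_t> firstOpen;
        for (std::size_t step = 0; step < slots.size(); ++step) {
            std::size_t i = probe(key, step);
            Slot& slot = slots[i];
            if (slot.state == State::Occupied && slot.key == key) {
                slot.value = value;  // key already stored: overwrite
                return true;
            }
            if (slot.state != State::Occupied && !firstOpen) firstOpen = i;
            if (slot.state == State::Empty) break;  // key cannot appear later
        }
        if (!firstOpen) return false;
        slots[*firstOpen] = Slot{State::Occupied, key, value};
        return true;
    }

    bool keyExists(const K& key) const { return find(key).has_value(); }

    std::optional<V> getValue(const K& key) const {
        std::optional<std::size_t> i = find(key);
        if (!i) return std::nullopt;
        return slots[*i].value;
    }

    void remove(const K& key) {
        std::optional<std::size_t> i = find(key);
        if (i) slots[*i].state = State::Deleted;
    }

private:
    enum class State { Empty, Occupied, Deleted };
    struct Slot {
        State state = State::Empty;
        K key{};
        V value{};
    };

    // Probe sequence: h, (h+1) % arraySize, (h+2) % arraySize, ...
    std::size_t probe(const K& key, std::size_t step) const {
        std::size_t h = std::hash<K>{}(key) % slots.size();
        return (h + step) % slots.size();
    }

    std::optional<std::size_t> find(const K& key) const {
        for (std::size_t step = 0; step < slots.size(); ++step) {
            std::size_t i = probe(key, step);
            const Slot& slot = slots[i];
            if (slot.state == State::Empty) return std::nullopt;  // end of run
            if (slot.state == State::Occupied && slot.key == key) return i;
            // Deleted or a different key: keep probing.
        }
        return std::nullopt;
    }

    std::vector<Slot> slots;
};
```

The tombstone matters because searches stop at the first truly empty slot: if remove simply cleared a slot in the middle of a run, a later search could stop there and miss a key stored further along. insert may reuse a tombstone, but only after confirming the key is not already stored later in the run.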
Linear probing: primary clustering • With linear probing, we can end up with stretches of our array that are completely full, while other sections can be nearly empty • This can degrade operations to O(n) time, since a search may have to scan an entire cluster • How could we adjust our probing to avoid this problem?
Quadratic probing • In quadratic probing, we will keep stepping through the array (wrapping around) until we find an open spot, increasing our step size the more we fail • If our hash function gives us a hash of h, try the following positions: • h • (h+1*1) % arraySize • (h+2*2) % arraySize • (h+3*3) % arraySize • etc.
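The two probe sequences can be compared with a short C++ sketch (the function names are hypothetical). Starting from home slot h = 3 in an array of size 11, linear probing tries 3, 4, 5, 6, 7, 8, while quadratic probing tries 3, 4, 7, 1, 8, 6, jumping away from the home slot instead of marching through its neighbors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// First `count` positions tried by quadratic probing from home slot h:
// h, (h + 1*1) % arraySize, (h + 2*2) % arraySize, ...
std::vector<std::size_t> quadraticProbeSequence(std::size_t h, std::size_t arraySize,
                                                std::size_t count) {
    assert(arraySize > 0);
    std::vector<std::size_t> positions;
    for (std::size_t i = 0; i < count; ++i) {
        positions.push_back((h + i * i) % arraySize);
    }
    return positions;
}

// Linear probing from the same home slot, for comparison: h, h+1, h+2, ...
std::vector<std::size_t> linearProbeSequence(std::size_t h, std::size_t arraySize,
                                             std::size_t count) {
    assert(arraySize > 0);
    std::vector<std::size_t> positions;
    for (std::size_t i = 0; i < count; ++i) {
        positions.push_back((h + i) % arraySize);
    }
    return positions;
}
```

A known property ties back to the prime-size advice: when arraySize is prime, the first (arraySize + 1) / 2 quadratic probe positions are all distinct, so if the table is at most half full, quadratic probing is guaranteed to find an open slot.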
Quadratic probing: secondary clustering • With quadratic probing, items that hash to the same index still follow the same sequence of probe locations • In practice, this is usually a minor effect, much less harmful than primary clustering
Alternatives to probing • If we don't want to perform probing, how else could we handle a situation where multiple items hash to the same value? • How can we store multiple items in the same place?
Separate chaining • Instead of probing, we can create a growable list for each index in the hash table • What could be an advantage of this? • What could be a downside?
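A minimal C++17 sketch of separate chaining, using std::list as each index's growable list and std::hash in place of h; the class name ChainedTable is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <optional>
#include <utility>
#include <vector>

// Sketch: separate chaining. Each array index ("bucket") holds a growable
// list of every key/value pair that hashes there. std::hash stands in for h.
template <typename K, typename V>
class ChainedTable {
public:
    explicit ChainedTable(std::size_t arraySize) : buckets(arraySize) {
        assert(arraySize > 0);
    }

    void insert(const K& key, const V& value) {
        std::list<std::pair<K, V>>& bucket = buckets[index(key)];
        for (std::pair<K, V>& entry : bucket) {
            if (entry.first == key) {  // key already stored: overwrite
                entry.second = value;
                return;
            }
        }
        bucket.emplace_back(key, value);  // collisions just grow the list
    }

    std::optional<V> getValue(const K& key) const {
        for (const std::pair<K, V>& entry : buckets[index(key)]) {
            if (entry.first == key) return entry.second;
        }
        return std::nullopt;
    }

    bool keyExists(const K& key) const { return getValue(key).has_value(); }

    void remove(const K& key) {
        buckets[index(key)].remove_if(
            [&key](const std::pair<K, V>& entry) { return entry.first == key; });
    }

private:
    std::size_t index(const K& key) const {
        return std::hash<K>{}(key) % buckets.size();
    }

    std::vector<std::list<std::pair<K, V>>> buckets;
};
```

One advantage is that collisions never spill into neighboring slots, so there is no clustering and the table never becomes completely full. A downside is the extra memory and pointer chasing for each list node, and a poor hash function can still pile many items into one list, making operations on it O(n).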
Hash table size • How big should we make our array? • How do we know that our hash table is full enough that we need to resize it?
Load factor • The load factor of a hash table is a measure of how full it is: the number of stored items divided by the array size (n / arraySize)
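As a sketch of how a table might use the load factor to decide when to resize: the 0.5 threshold in the usage below is an assumption (a common choice for open addressing; chained tables often tolerate higher values, e.g. Java's HashMap defaults to 0.75). The function names and the growth policy (smallest prime at least twice the old size) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Load factor: number of stored items divided by the array size.
double loadFactor(std::size_t itemCount, std::size_t arraySize) {
    assert(arraySize > 0);
    return static_cast<double>(itemCount) / static_cast<double>(arraySize);
}

bool isPrime(std::size_t n) {
    if (n < 2) return false;
    for (std::size_t d = 2; d * d <= n; ++d) {
        if (n % d == 0) return false;
    }
    return true;
}

// Hypothetical resize policy: if one more insert would push the load factor
// above maxLoad, grow to the smallest prime at least twice the old size.
// (Every stored item must then be re-hashed, since h(i) = i % s depends on s.)
std::size_t arraySizeForNextInsert(std::size_t itemCount, std::size_t arraySize,
                                   double maxLoad) {
    if (loadFactor(itemCount + 1, arraySize) <= maxLoad) return arraySize;
    std::size_t next = 2 * arraySize;
    while (!isPrime(next)) ++next;
    return next;
}
```

For example, with arraySize 11 and maxLoad 0.5, a fifth item still fits (5 / 11 is about 0.45), but a sixth would exceed the threshold, so the table grows to 23. Growing to roughly double keeps the average cost of inserts O(1) even counting the occasional resize, and picking a prime size follows the earlier advice for h(i) = i % s.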