Hash tables

But first... • Any homework or exam questions?

Hash tables • It's possible to make a Dictionary implementation that can provide insert, remove, keyExists, and getValue in O(1) time • What data structures do we know where we can always get elements in O(1) time?

Hash tables • Arrays allow for O(1) access to the item at a particular index • Arrays allow for O(1) storage of an item at a particular index • We need a way to compute an index from a data value • Let's look at a few simple examples:

Hash function • A hash function is a function that takes in a piece of data and returns an integer • A perfect hash function will give each item a different integer value • It is impossible to make a general perfect hash function • Let's just assume we have a perfect hash function named h that takes the item as an argument and returns the corresponding index

Dictionary functions • getValue • add • remove • keyExists • Private members: V *values; bool *valueStored;
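
As a rough sketch of how these four functions could look when a perfect hash function h is available: the class below assumes keys are integers in the range [0, capacity), so h(key) = key is trivially perfect. The class name PerfectHashDictionary and the use of std::vector in place of the raw V* and bool* arrays are assumptions made for illustration, not the course's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: keys are integers in [0, capacity), so every key
// maps to its own slot and no two keys ever share an index.
template <typename V>
class PerfectHashDictionary {
public:
    explicit PerfectHashDictionary(std::size_t capacity)
        : values(capacity), valueStored(capacity, false) {}

    // Store (or overwrite) the value for this key in O(1).
    void add(std::size_t key, const V& value) {
        std::size_t index = h(key);
        values[index] = value;
        valueStored[index] = true;
    }

    // Mark the slot as empty in O(1).
    void remove(std::size_t key) { valueStored[h(key)] = false; }

    // Check the flag for the key's slot in O(1).
    bool keyExists(std::size_t key) const { return valueStored[h(key)]; }

    // Precondition: the key must have been added and not removed.
    const V& getValue(std::size_t key) const {
        assert(keyExists(key));
        return values[h(key)];
    }

private:
    // Assumed perfect hash function: the key is its own index.
    std::size_t h(std::size_t key) const {
        assert(key < values.size());
        return key;
    }

    std::vector<V> values;          // plays the role of V *values
    std::vector<bool> valueStored;  // plays the role of bool *valueStored
};
```

Every operation touches exactly one array slot, which is where the O(1) bound comes from.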

Hash functions • We will almost never have a perfect hash function • Instead, we will use a practical hash function where: • It is fast to compute • Each possible index is equally likely to be computed • We will later talk about how to handle two pieces of data that end up with the same index • Let's talk about hash functions...

Hash functions • Suppose we have a hash table with a capacity that you get to choose • What methods could you use to compute indices in a reduced range from these numbers? • 279620 • 246300 • 323887 • 320379 • 967734 • 869647 • 264343 • 230845 • 627943 • 579202 • 776805 • 917097 • 422124 • 743121 • 629429 • 354149 • 682392 • 168516 • 933189 • 153793

Modular arithmetic • Hash functions are very often computed using modular arithmetic • What are 1 + 5, 6 + 5, 9 + 4, 11 + 8, and 5 + 8? • On a clock: 1:00 + 5 hours, 6:00 + 5 hours, 9:00 + 4 hours, 11:00 + 8 hours, 5:00 + 8 hours • As modular arithmetic: (1 + 5) % 12, (6 + 5) % 12, (9 + 4) % 12, (11 + 8) % 12, (5 + 8) % 12
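
The clock examples can be checked with a small helper. This is only an illustrative sketch; the function name addHours is hypothetical, and a result of 0 stands for 12:00 on a 12-hour clock.

```cpp
#include <cassert>

// Hypothetical helper: the hour shown on a 12-hour clock after adding
// some number of hours. A result of 0 corresponds to 12:00.
int addHours(int hour, int hoursToAdd) {
    assert(hour >= 0 && hoursToAdd >= 0);
    return (hour + hoursToAdd) % 12;
}
```

For the five examples on the slide, this gives 6, 11, 1, 7, and 1. Note that 9:00 + 4 hours and 5:00 + 8 hours both land on 1:00, which is the same effect as two different keys landing on the same index.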

Modulo hash function • Given a table with s elements, one possible hash function is: • h(i) = i % s • What could go wrong with this function? • We want the hash table size s to be prime, or at least almost prime • At the very least, s should NOT be an even number
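
A minimal sketch of h(i) = i % s, together with a helper that counts how many distinct indices a set of keys actually uses. The helper name countUsedIndices and the sample keys in the note below are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// The modulo hash function from the slide: h(i) = i % s.
std::size_t h(unsigned long i, std::size_t s) {
    assert(s > 0);
    return i % s;
}

// Hypothetical helper: how many different indices do these keys map to
// in a table of size s? Fewer distinct indices means more collisions.
std::size_t countUsedIndices(const std::vector<unsigned long>& keys,
                             std::size_t s) {
    std::set<std::size_t> used;
    for (unsigned long key : keys) {
        used.insert(h(key, s));
    }
    return used.size();
}
```

As an example of what can go wrong, the even keys 0, 2, 4, ..., 14 use only 4 of the 8 indices when s = 8, because an even key modulo an even size is always even. With s = 7 the same keys reach all 7 indices.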

Hashing strings • Oftentimes, hash tables are useful for looking up a value given a string • Strings are a series of chars (values from 0..255) • What could we do to generate a hash of a string (or any other series of numbers)? • "Test": (84, 101, 115, 116) • "Temp": (84, 101, 109, 112) • "What": (87, 104, 97, 116)

Hashing strings • Java approach: • currentHash = 0 • for each character c in the string: • currentHash = currentHash * 31 + c
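
The loop above could be written in C++ roughly as follows. This sketch uses an unsigned 32-bit accumulator so that overflow wraps around in a well-defined way; the function names hashString and indexFor are hypothetical, and reducing the hash with % to get a table index is an assumption borrowed from the modulo hash function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Multiply-by-31 string hash, following the loop on the slide.
std::uint32_t hashString(const std::string& s) {
    std::uint32_t currentHash = 0;
    for (unsigned char c : s) {  // treat each char as a value in 0..255
        currentHash = currentHash * 31u + c;
    }
    return currentHash;
}

// Hypothetical helper: reduce the hash to an index in a table of the
// given capacity using modular arithmetic.
std::size_t indexFor(const std::string& s, std::size_t capacity) {
    assert(capacity > 0);
    return hashString(s) % capacity;
}
```

For "Test" (84, 101, 115, 116) the steps are 84, then 84 * 31 + 101 = 2705, then 2705 * 31 + 115 = 83970, then 83970 * 31 + 116 = 2603186. Because each step multiplies the running value, "Test" and "Temp" end up with different hashes even though they share their first two characters.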

Collisions • What can we do when we get two items that compute the same hash value? • We need a way to keep looking through the array to find a place to put our data • We can establish a probe sequence, or an order to scan through the array positions to find an open spot • What sort of sequences could we use?

Linear Probing • In linear probing, we will keep stepping through the array (wrapping around) until we find an open spot • If our hash function gives us a hash of h, try the following positions: • h • (h+1) % arraySize • (h+2) % arraySize • (h+3) % arraySize • etc. • Which of our algorithms would we need to change? • How do we need to change them?
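
One possible answer for add and keyExists is sketched below for a set of unsigned integer keys. The class name LinearProbingSet, the modulo hash, and the positionOf helper are assumptions for illustration. Removal is left out on purpose: simply clearing a slot would cut off the probe sequence for keys stored after it, so it needs extra care.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of a linear-probing set of unsigned keys.
class LinearProbingSet {
public:
    explicit LinearProbingSet(std::size_t arraySize)
        : keys(arraySize), slotUsed(arraySize, false) {
        assert(arraySize > 0);
    }

    // Try h, (h+1) % arraySize, (h+2) % arraySize, ... until an open
    // spot (or the key itself) is found. Returns false if the table is full.
    bool add(unsigned key) {
        for (std::size_t i = 0; i < keys.size(); ++i) {
            std::size_t pos = (hash(key) + i) % keys.size();
            if (!slotUsed[pos]) {
                keys[pos] = key;
                slotUsed[pos] = true;
                return true;
            }
            if (keys[pos] == key) {
                return true;  // already stored
            }
        }
        return false;
    }

    bool keyExists(unsigned key) const {
        return positionOf(key) < keys.size();
    }

    // Follow the same probe sequence as add. Returns the index holding
    // the key, or keys.size() if the key is not present.
    std::size_t positionOf(unsigned key) const {
        for (std::size_t i = 0; i < keys.size(); ++i) {
            std::size_t pos = (hash(key) + i) % keys.size();
            if (!slotUsed[pos]) {
                break;  // an open spot means the key was never stored
            }
            if (keys[pos] == key) {
                return pos;
            }
        }
        return keys.size();
    }

private:
    std::size_t hash(unsigned key) const { return key % keys.size(); }

    std::vector<unsigned> keys;
    std::vector<bool> slotUsed;
};
```

With an array of size 7, the keys 3, 10, and 17 all hash to index 3, so they end up in positions 3, 4, and 5. A later key 4, which hashes to index 4, is pushed all the way to position 6.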

Linear probing: primary clustering • With linear probing, we can end up with stretches of our array that are completely full, while other sections are nearly empty • Probing through a long full stretch can degrade our operations to O(n) runtime • How could we adjust our probing to avoid this problem?

Quadratic probing • In quadratic probing, we will keep stepping through the array (wrapping around) until we find an open spot, increasing our step size the more we fail • If our hash function gives us a hash of h, try the following positions: • h • (h+1*1) % arraySize • (h+2*2) % arraySize • (h+3*3) % arraySize • etc.
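
To make the sequence concrete, the sketch below lists the first few positions that quadratic probing would try. The function name quadraticProbeSequence and its attempts parameter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the first `attempts` positions tried by quadratic
// probing, starting from hash h: h, (h+1*1) % arraySize, (h+2*2) % arraySize, ...
std::vector<std::size_t> quadraticProbeSequence(std::size_t h,
                                                std::size_t arraySize,
                                                std::size_t attempts) {
    assert(arraySize > 0);
    std::vector<std::size_t> positions;
    for (std::size_t i = 0; i < attempts; ++i) {
        positions.push_back((h + i * i) % arraySize);
    }
    return positions;
}
```

For h = 3 and arraySize = 7, the first four positions are 3, 4, 0, and 5. Unlike linear probing, the sequence can revisit positions and skip others: with h = 0 and arraySize = 7, the first seven probes are 0, 1, 4, 2, 2, 4, 1, so an insert routine built on it needs a rule for when to give up or resize.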

Quadratic probing: secondary clustering • With quadratic probing, we can run into an issue where things that hash to the same index will lead to the same sequence of locations being tested • It seems that this is not an issue in practice

Alternatives to probing • If we don't want to perform probing, how else could we handle a situation where multiple items hash to the same value? • How can we store multiple items in the same place?

Separate chaining • Instead of probing, we can create a growable list for each index in the hash table • What could be an advantage of this? • What could be a downside?
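
A minimal sketch of separate chaining, assuming unsigned integer keys, a modulo hash, and std::vector as the growable list for each index. The class name ChainedDictionary and the chainLength helper are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical sketch: each index holds a growable list of (key, value) pairs.
template <typename V>
class ChainedDictionary {
public:
    explicit ChainedDictionary(std::size_t bucketCount) : buckets(bucketCount) {
        assert(bucketCount > 0);
    }

    // Overwrite the value if the key is already in its chain, else append.
    void add(unsigned key, const V& value) {
        auto& chain = buckets[key % buckets.size()];
        for (auto& entry : chain) {
            if (entry.first == key) {
                entry.second = value;
                return;
            }
        }
        chain.emplace_back(key, value);
    }

    bool keyExists(unsigned key) const {
        for (const auto& entry : buckets[key % buckets.size()]) {
            if (entry.first == key) return true;
        }
        return false;
    }

    // Throws std::out_of_range if the key is not present.
    const V& getValue(unsigned key) const {
        for (const auto& entry : buckets[key % buckets.size()]) {
            if (entry.first == key) return entry.second;
        }
        throw std::out_of_range("key not found");
    }

    // Returns true if the key was present and has been erased.
    bool remove(unsigned key) {
        auto& chain = buckets[key % buckets.size()];
        for (auto it = chain.begin(); it != chain.end(); ++it) {
            if (it->first == key) {
                chain.erase(it);
                return true;
            }
        }
        return false;
    }

    // Number of items sharing this key's index.
    std::size_t chainLength(unsigned key) const {
        return buckets[key % buckets.size()].size();
    }

private:
    std::vector<std::vector<std::pair<unsigned, V>>> buckets;
};
```

One advantage visible here is that remove is straightforward and the table never becomes "full". One downside is that every operation has to scan a chain, so long chains bring back linear-time behavior.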

Hash table size • How big should we make our array? • How do we know that our hash table is full enough that we need to resize it?

Load factor • The load factor of a hash table is a measure of how full it is: the number of stored items divided by the size of the array
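
Taking the load factor to be the number of stored items divided by the number of array slots (the usual definition), a resize check might be sketched as follows. The function names and the 0.75 threshold are assumptions for illustration; 0.75 happens to be the default used by Java's HashMap, but the right value depends on the collision strategy.

```cpp
#include <cassert>
#include <cstddef>

// Load factor: stored items divided by the number of array slots.
double loadFactor(std::size_t itemCount, std::size_t capacity) {
    assert(capacity > 0);
    return static_cast<double>(itemCount) / static_cast<double>(capacity);
}

// Hypothetical policy: resize once the load factor exceeds a threshold.
// The default of 0.75 is an assumed value, not one given in the slides.
bool needsResize(std::size_t itemCount, std::size_t capacity,
                 double maxLoadFactor = 0.75) {
    return loadFactor(itemCount, capacity) > maxLoadFactor;
}
```

A table with 10 slots holding 8 items has a load factor of 0.8 and would be resized under this policy, while one holding 7 items (0.7) would not.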
