
Hash tables

Explore the concept of hash tables for efficient data handling in this educational guide. Learn about hash functions, modular arithmetic, collision resolution techniques like linear probing and quadratic probing, and alternatives to probing such as separate chaining.

kitts


Presentation Transcript


  1. Hash tables

  2. But first... • Any homework or exam questions?

  3. Hash tables • It's possible to make a Dictionary implementation that can provide insert, remove, keyExists, and getValue in O(1) time • What data structures do we know where we can always get elements in O(1) time?

  4. Hash tables • Arrays allow for O(1) access to item at a particular index • Arrays allow for O(1) storage of items at a particular index • We need a way to find out an index based on a data value • Let's look at a few simple examples:

  5. Hash function • A hash function is a function that takes in a piece of data and returns an integer • A perfect hash function will give each item a different integer value • It is impossible to make a general perfect hash function • Let's just assume we have a perfect hash function named h that takes the item as an argument and returns the corresponding index

  6. Dictionary functions • getValue • add • remove • keyExists • Private members: V *values; bool *valueStored;
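
As a minimal sketch of how the four dictionary functions could sit on top of the two arrays from this slide, assume keys are small non-negative integers and the perfect hash function h simply returns the key itself. The class name, the `int` key type, the bounds checks, and the exceptions are assumptions made for illustration; only `values`, `valueStored`, `h`, and the function names come from the slides.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical perfect hash for this sketch: each key is its own index.
inline std::size_t h(int key) { return static_cast<std::size_t>(key); }

// Assumes V is default-constructible and keys fall in [0, capacity).
template <typename V>
class PerfectHashDictionary {
public:
    explicit PerfectHashDictionary(std::size_t n)
        : capacity(n),
          values(new V[n]),
          valueStored(new bool[n]()) {}  // () value-initializes every flag to false

    ~PerfectHashDictionary() {
        delete[] values;
        delete[] valueStored;
    }

    // Copying is disabled to keep the sketch short.
    PerfectHashDictionary(const PerfectHashDictionary&) = delete;
    PerfectHashDictionary& operator=(const PerfectHashDictionary&) = delete;

    // Each operation is one hash computation plus one array access: O(1).
    void add(int key, const V& value) {
        const std::size_t i = index(key);
        values[i] = value;
        valueStored[i] = true;
    }

    void remove(int key) { valueStored[index(key)] = false; }

    bool keyExists(int key) const { return valueStored[index(key)]; }

    const V& getValue(int key) const {
        const std::size_t i = index(key);
        if (!valueStored[i]) throw std::out_of_range("key not present");
        return values[i];
    }

private:
    // Maps a key to its slot and rejects keys that fall outside the table.
    std::size_t index(int key) const {
        const std::size_t i = h(key);
        if (key < 0 || i >= capacity) throw std::out_of_range("key outside table");
        return i;
    }

    std::size_t capacity;
    V *values;
    bool *valueStored;
};
```

For example, after `PerfectHashDictionary<int> d(10); d.add(3, 42);`, calling `d.getValue(3)` returns 42 and `d.keyExists(4)` returns false.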

  7. Hash functions • We will almost never have a perfect hash function • Instead, we will use a practical hash function where: • It is fast to compute • Each possible index is equally likely to be computed • We will later talk about how to handle two pieces of data that end up with the same index • Let's talk about hash functions...

  8. Hash functions • Suppose we have a hash table with a capacity that you get to choose • What methods could you use to compute indices in a reduced range from these numbers? • 279620 • 246300 • 323887 • 320379 • 967734 • 869647 • 264343 • 230845 • 627943 • 579202 • 776805 • 917097 • 422124 • 743121 • 629429 • 354149 • 682392 • 168516 • 933189 • 153793

  9. Modular arithmetic • Hash functions are very often computed using modular arithmetic 1 + 5 6 + 5 9 + 4 11 + 8 5 + 8

  10. Modular arithmetic • Hash functions are very often computed using modular arithmetic 1 + 5 6 + 5 9 + 4 11 + 8 5 + 8 1:00 + 5 hours 6:00 + 5 hours 9:00 + 4 hours 11:00 + 8 hours 5:00 + 8 hours

  11. Modular arithmetic • Hash functions are very often computed using modular arithmetic 1 + 5 6 + 5 9 + 4 11 + 8 5 + 8 1:00 + 5 hours 6:00 + 5 hours 9:00 + 4 hours 11:00 + 8 hours 5:00 + 8 hours (1 + 5) % 12 (6 + 5) % 12 (9 + 4) % 12 (11 + 8) % 12 (5 + 8) % 12
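
The clock sums above can be checked with a one-line function. This is only a sketch: the name `clockAdd` is hypothetical, and it treats the clock face as numbered 0 to 11, so 12 o'clock would come out as 0.

```cpp
// Adds hours on a 12-hour clock face numbered 0..11 (hypothetical helper).
int clockAdd(int hour, int hoursLater) {
    return (hour + hoursLater) % 12;
}
```

With this helper, the five examples evaluate to 6, 11, 1, 7, and 1: `clockAdd(1, 5)`, `clockAdd(6, 5)`, `clockAdd(9, 4)`, `clockAdd(11, 8)`, and `clockAdd(5, 8)` respectively.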

  12. Modulo hash function • Given a table with s elements, one possible hash function is: • h(i) = i % s • What could go wrong with this function?

  13. Modulo hash function • Given a table with s elements, one possible hash function is: • h(i) = i % s • What could go wrong with this function? • We want the size of the hash table to be prime, or at least almost prime • At the very least, the size should NOT be an even number
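
One way to see why the table size matters is to count how many keys land in the fullest slot for different values of s. The sketch below assumes s is greater than zero; `moduloHash` and `largestBucket` are hypothetical names, and the sample keys (multiples of 10) are chosen only to make the effect obvious.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// h(i) = i % s, as on the slide. Assumes s > 0.
std::size_t moduloHash(unsigned long key, std::size_t s) {
    return key % s;
}

// Returns how many keys share the most crowded index for table size s.
std::size_t largestBucket(const std::vector<unsigned long>& keys, std::size_t s) {
    std::vector<std::size_t> counts(s, 0);
    for (unsigned long key : keys) {
        ++counts[moduloHash(key, s)];
    }
    return *std::max_element(counts.begin(), counts.end());
}
```

With the keys 10, 20, ..., 100, a table of size 10 sends all ten keys to index 0, while a table of size 11 (a prime) gives every key its own index.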

  14. Hashing strings • Oftentimes, hash tables are useful for looking up a value given a string • Strings are a series of chars (values from 0..255) • What could we do to generate a hash of a string (or any other series of numbers)? • "Test": (84, 101, 115, 116) • "Temp": (84, 101, 109, 112) • "What": (87, 104, 97, 116)

  15. Hashing strings • Java approach: • currentHash = 0 • for each character c in the string: • currentHash = currentHash * 31 + c
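
The loop above translates almost directly into C++. In this sketch the function name `javaStyleHash` is hypothetical, and a 32-bit unsigned integer is an assumption chosen so that overflow wraps around in a well-defined way; each character is read as a value from 0..255, as described on the previous slide.

```cpp
#include <cstdint>
#include <string>

// Multiply-by-31 string hash following the pseudocode on the slide.
std::uint32_t javaStyleHash(const std::string& text) {
    std::uint32_t currentHash = 0;
    for (unsigned char c : text) {           // unsigned char: values 0..255
        currentHash = currentHash * 31u + c; // unsigned arithmetic wraps on overflow
    }
    return currentHash;
}
```

For the three example strings, this gives 2603186 for "Test", 2602996 for "Temp", and 2694884 for "What". To use the result as an array index, it would still need to be reduced with `% s` for a table of size s.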

  16. Collisions • What can we do when we get two items that compute the same hash value? • We need a way to keep looking through the array to find a place to put our data • We can establish a probe sequence, or an order to scan through the array positions to find an open spot • What sort of sequences could we use?

  17. Linear Probing • In linear probing, we will keep stepping through the array (wrapping around) until we find an open spot • If our hash function gives us a hash of h, try the following positions: • h • (h+1) % arraySize • (h+2) % arraySize • (h+3) % arraySize • etc. • Which of our algorithms would we need to change? • How do we need to change them?
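
One possible answer to the questions above: both insertion and lookup have to follow the same probe sequence. The sketch below stores unsigned integer keys only and uses `key % arraySize` as the hash. The class name, `positionOf`, and the return conventions are assumptions for illustration. Removal is left out on purpose, since simply clearing a slot would cut off the probe sequence for keys stored after it.

```cpp
#include <cstddef>
#include <vector>

// Assumes arraySize > 0.
class LinearProbingSet {
public:
    explicit LinearProbingSet(std::size_t arraySize)
        : keys(arraySize, 0), used(arraySize, false) {}

    // Tries h, (h+1) % arraySize, (h+2) % arraySize, ... until a spot is open.
    // Returns false only when every position is taken by another key.
    bool add(unsigned key) {
        const std::size_t h = key % keys.size();
        for (std::size_t i = 0; i < keys.size(); ++i) {
            const std::size_t pos = (h + i) % keys.size();
            if (!used[pos] || keys[pos] == key) {
                keys[pos] = key;
                used[pos] = true;
                return true;
            }
        }
        return false;
    }

    // Follows the same sequence; an open spot means the key was never stored.
    // Returns the array position of the key, or -1 if it is absent.
    long positionOf(unsigned key) const {
        const std::size_t h = key % keys.size();
        for (std::size_t i = 0; i < keys.size(); ++i) {
            const std::size_t pos = (h + i) % keys.size();
            if (!used[pos]) return -1;
            if (keys[pos] == key) return static_cast<long>(pos);
        }
        return -1;
    }

    bool keyExists(unsigned key) const { return positionOf(key) >= 0; }

private:
    std::vector<unsigned> keys;
    std::vector<bool> used;  // plays the role of valueStored
};
```

In a table of size 7, the keys 3, 10, and 17 all hash to 3 and end up in positions 3, 4, and 5. Adding 6 and then 13 places 6 at position 6 and wraps 13 around to position 0.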

  18. Linear probing: primary clustering • With linear probing, we can end up with stretches of our array that are completely full, while other sections can be nearly empty • This can cause our operations to degrade to O(n) runtime • How could we adjust our probing to avoid this problem?

  19. Quadratic probing • In quadratic probing, we will keep stepping through the array (wrapping around) until we find an open spot, increasing our step size the more we fail • If our hash function gives us a hash of h, try the following positions: • h • (h+1*1) % arraySize • (h+2*2) % arraySize • (h+3*3) % arraySize • etc.
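
To make the sequence concrete, the following sketch lists the first few positions that quadratic probing would try. The function name and the `count` parameter are hypothetical, and arraySize is assumed to be greater than zero.

```cpp
#include <cstddef>
#include <vector>

// Returns the first `count` positions tried: (h + i*i) % arraySize for i = 0, 1, 2, ...
std::vector<std::size_t> quadraticProbeSequence(std::size_t h,
                                                std::size_t arraySize,
                                                std::size_t count) {
    std::vector<std::size_t> positions;
    for (std::size_t i = 0; i < count; ++i) {
        positions.push_back((h + i * i) % arraySize);
    }
    return positions;
}
```

For h = 3 and arraySize = 11, the first five positions are 3, 4, 7, 1, and 8. Any other item whose hash is also 3 would try exactly the same positions in the same order.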

  20. Quadratic probing: secondary clustering • With quadratic probing, we can run into an issue where items that hash to the same index will lead to the same sequence of locations being tested • In practice, this does not seem to cause significant problems

  21. Alternatives to probing • If we don't want to perform probing, how else could we handle a situation where multiple items hash to the same value? • How can we store multiple items in the same place?

  22. Separate chaining • Instead of probing, we can create a growable list for each index in the hash table • What could be an advantage of this? • What could be a downside?
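
A minimal sketch of separate chaining, assuming unsigned integer keys, `key % bucketCount` as the hash, and a `std::vector` as the growable list at each index. The class and method names are hypothetical, and bucketCount is assumed to be greater than zero.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

class ChainedSet {
public:
    explicit ChainedSet(std::size_t bucketCount) : buckets(bucketCount) {}

    // Appends the key to the list at its index unless it is already there.
    void add(unsigned key) {
        std::vector<unsigned>& chain = buckets[key % buckets.size()];
        if (std::find(chain.begin(), chain.end(), key) == chain.end()) {
            chain.push_back(key);
        }
    }

    // Only the one list at the key's index has to be searched.
    bool keyExists(unsigned key) const {
        const std::vector<unsigned>& chain = buckets[key % buckets.size()];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    // Removal just erases from the list; no other item is affected.
    void remove(unsigned key) {
        std::vector<unsigned>& chain = buckets[key % buckets.size()];
        chain.erase(std::remove(chain.begin(), chain.end(), key), chain.end());
    }

    std::size_t chainLength(std::size_t index) const { return buckets[index].size(); }

private:
    std::vector<std::vector<unsigned>> buckets;  // one growable list per index
};
```

With 7 buckets, the keys 3, 10, and 17 all land in the list at index 3. Removal is simple here, but a long list still has to be scanned item by item.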

  23. Hash table size • How big should we make our array?

  24. Hash table size • How big should we make our array? • How do we know that our hash table is full enough that we need to resize it?

  25. Load factor • The load factor of a hash table is a measure of how full it is
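
The load factor is commonly computed as the number of stored items divided by the number of array positions. The sketch below assumes that definition; the function names and the idea of passing in a maximum load factor are hypothetical, and the slides do not name a specific threshold.

```cpp
#include <cstddef>

// Fraction of the table that is in use. Assumes capacity > 0.
double loadFactor(std::size_t itemCount, std::size_t capacity) {
    return static_cast<double>(itemCount) / static_cast<double>(capacity);
}

// True when the table is fuller than the chosen threshold allows.
bool needsResize(std::size_t itemCount, std::size_t capacity, double maxLoadFactor) {
    return loadFactor(itemCount, capacity) > maxLoadFactor;
}
```

For example, 3 items in a table of 12 positions give a load factor of 0.25. With an assumed threshold of 0.75, a table of 12 positions would be flagged for resizing once it holds 10 items.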
