230 likes | 395 Views
Grokking Hash Tables. A hash table is…. Just a data structure that’s used to establish a mapping between arbitrary objects Not to be confused with Map : An interface that specifies methods that a program could call to get/set/query key/value mappings
E N D
A hash table is… • Just a data structure that’s used to establish a mapping between arbitrary objects • Not to be confused with Map: • An interface that specifies methods that a program could call to get/set/query key/value mappings • Basically, defines what mappings are • A hash table is one way (but not the only) to make a Map • And your MondoHashTable is just implementation of that hash table data struct that happens to match the Map interface
Example: /** * MondoHashTable is a hash table based * implementation of the Map interface */ import java.util.Map; public class MondoHashTable implements Map { … }
So what’s a mapping? • A mapping is just a pair relationship between two things • In math, we write a mapping m: xy to denote that m establishes relationships between things of type x and things of type y • Big restriction: every uniquex must map to a singley • Essentially: every left hand side has one and only one right hand side
Examples of mappings • 376.0827 (abs(sqrt(37))) • 79609 “Prof Lane’s office phone” • 123456789 studentRecord(“J. Student”) • -6.0827 36.999 • studentRecord(“J. Student”) gradeSet() • largeFunkyDataObject() otherObject() • “nigeria” 114 • “cs35” 12 • Left side is the key; right side is the value
Small integers are easy… double[] mapArray=new double[200]; mapArray[37]=Math.sqrt(37); 37 6.0827
What about non-integers? • Desire: have a table that gives quick lookup for arbitrary objects • Doesn’t require vast space • Answer: introduce an intermediate step • Hash function turns key into hash code • Use hash code to look up key/val pair in table • table[h(key)]=<key,value>
The big picture… MondoHashTable “nigeria” h(“nigeria”) get(“nigeria”) nigeria 114 1945462417
Some practical questions • 1.9 billion? Isn’t that a little much?
Some practical questions • 1.9 billion? Isn’t that a little much? • A: reduce mod table size • int h=hashFunction(Object a) % table.length;
Some practical questions • 1.9 billion? Isn’t that a little much? • A: reduce mod table size • int h=hashFunction(Object a) % table.length; • Where do the hash functions come from?
Some practical questions • 1.9 billion? Isn’t that a little much? • A: reduce mod table size • int h=hashFunction(Object a) % table.length; • Where do the hash functions come from? • A: java.lang.Object.hashCode()
Some practical questions • 1.9 billion? Isn’t that a little much? • A: reduce mod table size • int h=hashFunction(Object a) % table.length; • Where do the hash functions come from? • A: java.lang.Object.hashCode() • How big should the table be, initially?
Some practical questions • 1.9 billion? Isn’t that a little much? • A: reduce mod table size • int h=hashFunction(Object a) % table.length; • Where do the hash functions come from? • A: java.lang.Object.hashCode() • How big should the table be, initially? • Good choice: pick a prime # • Ask the user (arg to constructor)
Some practical questions • 1.9 billion? Isn’t that a little much? • A: reduce mod table size • int h=hashFunction(Object a) % table.length; • Where do the hash functions come from? • A: java.lang.Object.hashCode() • How big should the table be, initially? • Good choice: pick a prime # • Ask the user (arg to constructor) • What happens if the table gets too full?
Some practical questions • 1.9 billion? Isn’t that a little much? • A: reduce mod table size • int h=hashFunction(Object a) % table.length; • Where do the hash functions come from? • A: java.lang.Object.hashCode() • How big should the table be, initially? • Good choice: pick a prime # • Ask the user (arg to constructor) • What happens if the table gets too full? • A: resize it!
#1 killer question: Collisions • What happens if • (a.hashCode()%tSz)==(b.hashCode()%tSz) • Depends… • If a.equals(b), then these are the same key • If not… • This is a hash collision • Basically, you have two different keys pointing at the same location in the hash table • Have to resolve this somehow -- find unique storage for every key and don’t lose anything
Collision strategy 1: Chaining • Make each cell in the hash table a “bucket” containing multiple key/value pairs h(“nigeria”) nigeria 114 viagra 29 h(“viagra”)
Collision strategy 2: Open addressing • Each cell of the table actually holds a key/value pair • When you have a collision, rehash to find a new location for the new pair • Linear probing: try next cell in line • Quadratic probing: try cell h+1, then h+4, h+9, h+16, h+15 … • Double hashing: h(k)=(h1(k)+i*h2(k)) mod table.size() • Repeat probes until you find an empty spot
Map.keySet() • A common operation on hash tables (Maps): get all of the keys • You’ll probably use this in the “dump” functionality of SpamBGon • Map requires: • Set keySet(): Returns a set view of the keys contained in this map. • What’s a view?
Views of data • Different interface to the same underlying data • Doesn’t copy the data -- just provides new methods for accessing it • A “set view of the keys”, then, is an object that behaves like (i.e., implements) a set, but gives you access to all of the keys of the hash table.
The set view picture MondoHashTable Set size() contains() iterator() keySet()