HashTable

HashTable The abstract dictionary

Dictionary/Map
• A COLLECTION of data that is accessed by “KEY” values
• The keys may be ordered or unordered
• Duplicate keys may or may not be allowed
• Supports the following fundamental methods
    • V put(K key, V value)
        • Inserts data into the dictionary using the specified key
        • Returns the data previously stored under that key, or null if there was none
    • V get(Object key)
        • Returns the data associated with the specified key
        • An error occurs if the specified key is not in the dictionary (Java's Map reports this by returning null)
    • V remove(Object key)
        • Removes the data associated with the specified key and returns the data
        • An error occurs if the specified key is not in the dictionary (Java's Map reports this by returning null)

Abstract Dictionary Example

Operation    Output   Dictionary
put(5, A)    null     ((5,A))
put(7, B)    null     ((5,A), (7,B))
put(2, C)    null     ((5,A), (7,B), (2,C))
get(A)       null     ((5,A), (7,B), (2,C))
get(7)       B        ((5,A), (7,B), (2,C))
put(2, Q)    C        ((5,A), (7,B), (2,Q))
get(2)       Q        ((5,A), (7,B), (2,Q))
remove(Q)    null     ((5,A), (7,B), (2,Q))
remove(2)    Q        ((5,A), (7,B))
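The operations in the example can be replayed against java.util.Map. This is only a sketch: the class name DictionaryDemo and the helper replay are made up for illustration, and LinkedHashMap is chosen purely so that printing the map shows entries in insertion order, as the slide does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class: replays the slide's operations on a java.util.Map.
class DictionaryDemo {
    // Runs the nine operations and records what each one returns.
    static List<String> replay(Map<Integer, String> dict) {
        List<String> out = new ArrayList<>();
        out.add(String.valueOf(dict.put(5, "A")));   // null: key 5 is new
        out.add(String.valueOf(dict.put(7, "B")));   // null: key 7 is new
        out.add(String.valueOf(dict.put(2, "C")));   // null: key 2 is new
        out.add(String.valueOf(dict.get("A")));      // null: "A" is a value, not a key
        out.add(String.valueOf(dict.get(7)));        // B
        out.add(String.valueOf(dict.put(2, "Q")));   // C: the old value is returned
        out.add(String.valueOf(dict.get(2)));        // Q
        out.add(String.valueOf(dict.remove("Q")));   // null: "Q" is a value, not a key
        out.add(String.valueOf(dict.remove(2)));     // Q
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, String> dict = new LinkedHashMap<>();
        System.out.println(replay(dict));
        System.out.println(dict);                    // {5=A, 7=B}
    }
}
```

Note that java.util.Map returns null for a missing key instead of raising an error.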

The Java Dictionary
Two interfaces. SortedMap extends Map.
• Map interface (unordered dictionary)
    • boolean containsKey(Object key)
    • boolean containsValue(Object value)
    • Object get(Object key)
    • Object put(Object key, Object value)
    • Object remove(Object key)
    • Collection values()
• SortedMap interface (ordered dictionary)
    • Comparator comparator()
    • Object firstKey()
    • SortedMap headMap(Object toKey)
    • Object lastKey()
    • SortedMap subMap(Object key1, Object key2)
    • SortedMap tailMap(Object fromKey)
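To make the SortedMap methods concrete, here is a small sketch that uses java.util.TreeMap, the standard library's SortedMap implementation. The class name SortedMapDemo and the sample keys and values are illustrative assumptions.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical demo class exercising the SortedMap methods listed above.
class SortedMapDemo {
    // Builds a small ordered dictionary; keys and values are illustrative.
    static SortedMap<Integer, String> sample() {
        SortedMap<Integer, String> map = new TreeMap<>();
        map.put(5, "A");
        map.put(7, "B");
        map.put(2, "C");
        return map;
    }

    public static void main(String[] args) {
        SortedMap<Integer, String> map = sample();
        System.out.println(map);               // {2=C, 5=A, 7=B}: sorted by key
        System.out.println(map.firstKey());    // 2: smallest key
        System.out.println(map.lastKey());     // 7: largest key
        System.out.println(map.headMap(5));    // {2=C}: keys strictly less than 5
        System.out.println(map.tailMap(5));    // {5=A, 7=B}: keys >= 5
        System.out.println(map.subMap(2, 7));  // {2=C, 5=A}: 2 inclusive, 7 exclusive
        System.out.println(map.comparator());  // null: natural ordering is in use
    }
}
```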

Dictionary/Map Implementation
• How to implement the Map class?
    • List-based
        • Array-based list
        • Linked list
• How to implement the SortedMap class?
    • List-based
        • Array-based list
        • Linked list
    • Tree-based

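Array Based Map

    // Simplified sketch: a class that truly implements java.util.Map
    // must also supply the remaining Map methods.
    public class ArrayMap {
        private Item[] items;
        private int size;
        private static final int DEFAULT_CAPACITY = 200;

        private static class Item {
            private Object key;
            private Object data;
            public Item(Object k, Object d) { key = k; data = d; }
        }

        public ArrayMap() {
            items = new Item[DEFAULT_CAPACITY];
            size = 0;
        }

        // Index of the item holding key, or -1 if absent (linear search).
        private int indexOf(Object key) {
            for (int i = 0; i < size; i++) {
                if (items[i].key.equals(key)) return i;
            }
            return -1;
        }

        public boolean containsKey(Object key) { return indexOf(key) >= 0; }

        public Object put(Object k, Object v) {
            int i = indexOf(k);
            if (i < 0) {
                if (size == items.length) resize();
                items[size++] = new Item(k, v);
                return null;
            }
            Object old = items[i].data;
            items[i].data = v;   // replace in place instead of appending a duplicate
            return old;
        }

        public Object get(Object key) {
            int i = indexOf(key);
            return i < 0 ? null : items[i].data;
        }

        // Double the backing array when it is full.
        private void resize() {
            items = java.util.Arrays.copyOf(items, items.length * 2);
        }
    }

How to change to implement a SortedMap?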

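Runtime Comparisons

          Array                Linked               AVL
          Map      Sorted      Map      Sorted      Sorted
put       O(1)     O(N)        O(1)     O(N)        O(Log N)
get       O(N)     O(Log N)    O(N)     O(N)        O(Log N)
remove    O(N)     O(N)        O(N)     O(N)        O(Log N)

The O(1) put entries assume the new item is appended without first searching for an existing key.
Callout: That’s Amazing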

What is a Hashtable?
• A hashtable is an unordered dictionary that uses an array to store data
• Each data element is associated with a key
• Each key is mapped into an array index using a hash function
• The key AND the data are then stored in the array
• Hashtables are commonly used in the construction of compiler symbol tables.

Dictionaries: AVL Trees vs. Hashtables

Method    AVL                     Hashtable
          Worst      Average      Worst    Average
put       O(Log N)   O(Log N)     O(N)     O(1)
get       O(Log N)   O(Log N)     O(N)     O(1)
remove    O(Log N)   O(Log N)     O(N)     O(1)

Callouts: AVL is “Not Bad”; the hashtable average is “Astounding!”

Simple Example
Insert data into the hashtable using characters as keys.
The hashtable is an array of “items”.
The hashtable’s capacity is 7 (array indices 0 1 2 3 4 5 6).
The hash function must take a character as input and convert it into a number between 0 and 6.
Use the following hash function: Let P be the position of the character in the English alphabet (starting with 1). The hash function is h(K) = P.
The function must be normalized in order to map into the appropriate range (0-6). The normalized hash function is h(K) % 7.
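The hash function described above can be sketched in a few lines. The class name CharHash and the method names hash and index are assumptions made for this illustration; the sketch also assumes the key is a letter A to Z (case is ignored).

```java
// Hypothetical helper: h(K) = alphabet position, normalized with % 7.
class CharHash {
    static final int CAPACITY = 7;

    // h(K): 1-based position of the letter in the English alphabet.
    static int hash(char key) {
        return Character.toUpperCase(key) - 'A' + 1;
    }

    // Normalized hash: an array index in the range 0..CAPACITY-1.
    static int index(char key) {
        return hash(key) % CAPACITY;
    }

    public static void main(String[] args) {
        for (char c : "BSJNXW".toCharArray()) {
            System.out.println(c + ": h = " + hash(c) + ", index = " + index(c));
        }
    }
}
```

With this function X (24) and J (10) both land on index 3, and W (23) and B (2) both land on index 2.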

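Example
Each key is written with its alphabet position, so B2 is the key B with h(B) = 2.
Operations: put(B2, Data1), put(S19, Data2), put(J10, Data3), put(N14, Data4), put(X24, Data5), put(W23, Data6), put(B2, Data7), get(X24), get(W23)

Index   Item
0       (N14, Data4)
1
2       (B2, Data1)
3       (J10, Data3)
4
5       (S19, Data2)
6

(X24, Data5) hashes to index 3, which already holds (J10, Data3): ???
This is called a collision.
Collisions are handled via a “collision resolution policy”.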

Details and Definitions
• Various means of “collision resolution” can be used. The collision resolution policy determines what is done when two keys map to the same array index.
    • Open Addressing: look for an open slot
    • Separate Chaining: keep a list of key/value pairs in a slot
• The load factor is the number of items stored in the table divided by the capacity of the table

Example
Open Addressing: When a collision occurs, probe for an empty slot. In this case, use linear probing (looking “down”) until an empty slot is found.
Operations: put(B2, Data1), put(S19, Data2), put(J10, Data3), put(N14, Data4), put(X24, Data5), put(W23, Data6), get(X24), get(W23)
Items shown in the table on the slide (array indices 0 1 2 3 4 5 6): (N14, Data4), (X24, Data5), (B2, Data1), (J10, Data3), (S19, Data2), (W23, Data6)
The colliding item (X24, Data5) ??? is placed by probing.
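A minimal sketch of put and get with linear probing follows. It is not the slide's implementation: the class name LinearProbingTable is hypothetical, keys are plain ints (the alphabet positions), and the probe is assumed to step toward higher indices with wraparound, which may differ from the direction drawn on the slide. Removal is left out on purpose because it needs extra care with open addressing.

```java
// Hypothetical open-addressing table with linear probing (no removal).
class LinearProbingTable {
    private final int[] keys;
    private final String[] values;
    private final boolean[] used;

    LinearProbingTable(int capacity) {
        keys = new int[capacity];
        values = new String[capacity];
        used = new boolean[capacity];
    }

    // Walks slots (h(K) + i) % capacity for i = 0, 1, 2, ... and returns the
    // slot holding key, or the first empty slot, or -1 if neither exists.
    private int probe(int key) {
        int home = Math.floorMod(key, keys.length);
        for (int i = 0; i < keys.length; i++) {
            int slot = (home + i) % keys.length;
            if (!used[slot] || keys[slot] == key) return slot;
        }
        return -1;
    }

    // Stores the pair and returns the previous value for key, or null.
    String put(int key, String value) {
        int slot = probe(key);
        if (slot < 0) throw new IllegalStateException("table is full");
        String old = used[slot] ? values[slot] : null;
        used[slot] = true;
        keys[slot] = key;
        values[slot] = value;
        return old;
    }

    String get(int key) {
        int slot = probe(key);
        return (slot >= 0 && used[slot]) ? values[slot] : null;
    }

    // Slot currently holding key, or -1 if the key is absent.
    int slotOf(int key) {
        int slot = probe(key);
        return (slot >= 0 && used[slot]) ? slot : -1;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(7);
        int[] ks = {2, 19, 10, 14, 24, 23};   // B, S, J, N, X, W
        for (int i = 0; i < ks.length; i++) t.put(ks[i], "Data" + (i + 1));
        System.out.println(t.slotOf(24));     // 4: home slot 3 was taken by J10
        System.out.println(t.slotOf(23));     // 6: slots 2, 3, 4, 5 were taken
    }
}
```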

Open Addressing
• Uses a “probe sequence” to look for an empty slot to use
• The first location examined is the “hash” address
• The sequence of locations examined when locating data is called the “probe sequence”
• The probe sequence {s(0), s(1), s(2), … } can be described as follows: s(i) = norm(h(K) + p(i))
    • where h(K) is the “hash function” mapping K to an integer
    • p(i) is a “probing function” returning an offset for the ith probe
    • norm is the “normalizing function” (usually the remainder modulo the capacity)

Open Addressing
• Linear probing
    • use p(i) = i
    • The probe sequence becomes {norm(h(k)), norm(h(k)+1), norm(h(k)+2), …}
• Quadratic probing
    • use p(i) = i^2
    • The probe sequence becomes {norm(h(k)), norm(h(k)+1), norm(h(k)+4), …}
    • Must be careful to allow full coverage of “empty” array slots
    • A theorem states that this method will find an empty slot if the capacity is prime and the table is not more than ½ full.

Example • Given a hash table with M = 13, and hash function h(k) = k mod 13, give the first four positions in the probing sequence for K=21 under the following collision resolution policies. 1) Linear Probing 2) Quadratic probing
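One way to check an answer to this exercise is to compute s(i) = (h(K) + p(i)) mod M directly. The sketch below assumes a hypothetical helper class ProbeSequence whose method first takes the probing function p as a parameter.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

// Hypothetical helper: lists the first probe positions for a key.
class ProbeSequence {
    // Returns s(0) .. s(count-1) where s(i) = (key mod capacity + p(i)) mod capacity.
    static int[] first(int key, int capacity, int count, IntUnaryOperator p) {
        int home = Math.floorMod(key, capacity);
        int[] seq = new int[count];
        for (int i = 0; i < count; i++) {
            seq[i] = (home + p.applyAsInt(i)) % capacity;
        }
        return seq;
    }

    public static void main(String[] args) {
        // Linear probing, p(i) = i
        System.out.println(Arrays.toString(first(21, 13, 4, i -> i)));      // [8, 9, 10, 11]
        // Quadratic probing, p(i) = i^2
        System.out.println(Arrays.toString(first(21, 13, 4, i -> i * i)));  // [8, 9, 12, 4]
    }
}
```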

Collisions
• Given N people in a room, what are the odds that at least two of them will have the same birthday?
    • Table capacity of 365
    • After N insertions what are the odds of at least one collision?
Who wants to be a Millionaire?
Assume N = 23 (load factor is therefore 23/365 = 6.3%). What are the approximate odds that two of these people have the same birthday?
10%   25%   50%   75%   90%   99%

Collisions
Let Q(n) be the probability that when n people are in a room, no two of them share a birthday.
Let P(n) be the probability that when n people are in a room, at least two of them have the same birthday.
P(n) = 1 - Q(n)
Consider that:
Q(1) = 1
Q(2) = the probability that the first person has no collision, times the probability that one more person does not “collide”
Q(2) = Q(1) * 364/365
Q(3) = Q(2) * 363/365
Q(4) = Q(3) * 362/365
…
$Q(n) = \frac{365}{365} \cdot \frac{364}{365} \cdot \frac{363}{365} \cdots \frac{365-n+1}{365}$
$Q(n) = \frac{365!}{365^{n}\,(365-n)!}$
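The recurrence for Q(n) is easy to evaluate numerically, which avoids the huge factorials in the closed form. The class and method names below (BirthdayCollision, noCollision, collision) are illustrative assumptions.

```java
// Hypothetical helper: evaluates Q(n) and P(n) = 1 - Q(n) for 365 slots.
class BirthdayCollision {
    // Q(n): multiply the factors (365 - i) / 365 for i = 0 .. n-1.
    static double noCollision(int n) {
        double q = 1.0;
        for (int i = 0; i < n; i++) {
            q *= (365.0 - i) / 365.0;
        }
        return q;
    }

    // P(n): probability of at least one shared birthday among n people.
    static double collision(int n) {
        return 1.0 - noCollision(n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {5, 10, 15, 23, 30, 40, 45, 100}) {
            System.out.printf("%3d  %.5f%%%n", n, 100 * collision(n));
        }
    }
}
```

For n = 23 this gives roughly 50.7%, the point at which a collision becomes more likely than not.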

Collisions

Number of people N    Odds of Collision
5                     2.7%
10                    11.7%
15                    25.3%
23                    50.7%
30                    70.6%
40                    89.1%
45                    94.1%
100                   99.9999%

Collisions are more frequent than you might expect, even for low load factors!

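Java Hashtable Class Hierarchy
Class diagram: Object, Dictionary, Map, Hashtable (Hashtable extends Dictionary and implements Map).
Note: Hashtable is a legacy class and its superclass Dictionary is documented as obsolete. New code should generally use a Map implementation such as HashMap instead.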

Java Dictionary Class
• Enumeration elements()
    • Returns an enumeration of the values in the dictionary
• Object get(Object key)
    • Returns the value to which the key is mapped
• Enumeration keys()
    • Returns an enumeration of the keys in the dictionary
• Object remove(Object key)
    • Removes the key (and its corresponding value) from the dictionary

What about Keys?
• Can any object be used as a key???

    class Person {
        private String name;
        private String phone;

        Person(String n, String p) { name = n; phone = p; }

        public boolean equals(Object other) {
            // stuff here
        }

        public int hashCode() {
            // stuff here
        }
    }

How to convert a Person object to an integer value?

What about Keys?
• What if the key changes (mutates)?

    Map database = new Hashtable();
    for (int i = 0; i < numberOfRecords; i++) {
        // read data from file
        // create a new Person object
        database.put(newPersonKey, newPersonData);
    }

    // maintain the database
    // See if a person with the number exists?
    Person p1 = new Person(searchForName, searchForPhone);
    Person p2 = (Person) database.get(p1);

    // maybe need to change the person's phone number
    p2.setPhone(newPhone);

    // Let's see if the object is still in the dictionary?
    p2 = (Person) database.get(p1);

    class Person {
        private String name;
        private String phone;

        Person(String n, String p) { name = n; phone = p; }

        // setter used by the maintenance code above
        public void setPhone(String p) { phone = p; }

        public boolean equals(Object other) {
            // stuff here
        }

        public int hashCode() {
            // stuff here
        }
    }
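The question can be answered by experiment. The sketch below fills in one possible equals and hashCode for Person (both based on name and phone, which is an assumption, not the slide's answer) and assumes the same Person object is stored as key and as data. The class name MutableKeyDemo and the sample name and phone numbers are made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical demo: what happens when a key is mutated while in a HashMap.
class MutableKeyDemo {
    static final class Person {
        private final String name;
        private String phone;

        Person(String n, String p) { name = n; phone = p; }

        void setPhone(String p) { phone = p; }

        // Assumed definition: two Persons are equal if name and phone match.
        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Person)) return false;
            Person that = (Person) other;
            return name.equals(that.name) && phone.equals(that.phone);
        }

        // Consistent with equals: hashes the same two fields.
        @Override
        public int hashCode() {
            return Objects.hash(name, phone);
        }
    }

    // Returns {found before the mutation, found after the mutation}.
    static boolean[] run() {
        Map<Person, Person> database = new HashMap<>();
        Person stored = new Person("Ada", "555-0100");
        database.put(stored, stored);

        Person probe = new Person("Ada", "555-0100");
        boolean foundBefore = database.get(probe) != null;

        stored.setPhone("555-0199");   // changes a field used by equals and hashCode
        boolean foundAfter = database.get(probe) != null;
        return new boolean[] {foundBefore, foundAfter};
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("before: " + r[0] + ", after: " + r[1]);  // before: true, after: false
    }
}
```

The entry is still physically in the map, but the old search key no longer equals the stored key, so the lookup fails. Keys are safest when the fields used by equals and hashCode never change.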

Java Hashcode Method
• The Object class defines a public int hashCode() method.
Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.
The general contract of hashCode is:
• Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
• If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
• It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java(TM) programming language.)

Java Hashcode Method
• The String class overrides the hashCode method.
public int hashCode()
Returns a hash code for this string. The hash code for a String object is computed as
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-2]*31^1 + s[n-1]*31^0
using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. The hash value of the empty string is zero.
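The polynomial can be evaluated with Horner's rule, one multiplication per character. The sketch below uses a hypothetical class StringHash with a method polyHash and compares its result with String.hashCode(); int overflow simply wraps around, which is what "using int arithmetic" means here.

```java
// Hypothetical helper: recomputes the String hash code by hand.
class StringHash {
    // Horner's rule: ((s[0]*31 + s[1])*31 + s[2])*31 + ... using int arithmetic.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(polyHash("Ab"));   // 2113 = 65*31 + 98
        System.out.println("Ab".hashCode());  // 2113
        System.out.println(polyHash("hashtable") == "hashtable".hashCode());  // true
    }
}
```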

Hashcodes and table size
• Hashcodes should be fast/easy to compute
• Keys should evenly distribute across the table
• Hashtable capacities are usually kept at prime values to avoid problems with probe sequences
• Consider inserting into the table below using quadratic probing and a key object that hashes to index 2
Table indices: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
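The problem with a capacity of 16 shows up when the slots reachable from index 2 are enumerated. This sketch assumes p(i) = i^2 and a hypothetical helper class QuadraticCoverage; capacity 17 is included only as a prime value to compare against.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: which slots can quadratic probing ever reach?
class QuadraticCoverage {
    // Collects the distinct slots (home + i*i) % capacity for i = 0 .. probes-1.
    static Set<Integer> visited(int home, int capacity, int probes) {
        Set<Integer> seen = new TreeSet<>();
        for (int i = 0; i < probes; i++) {
            seen.add((int) ((home + (long) i * i) % capacity));
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(visited(2, 16, 1000));         // [2, 3, 6, 11]
        System.out.println(visited(2, 17, 1000).size());  // 9
    }
}
```

With capacity 16 the probe sequence cycles through only four slots, so an insert can fail while twelve slots are still empty. With the prime capacity 17 it reaches nine of the seventeen slots, which is more than half the table.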

We need to have a little talk
• How to remove an item from a hashtable that uses open addressing?
• Consider a table of size 11 with the following sequence of operations using h(K) = K % 11 and p(i) = i (linear probe).
    • put(36, D1)
    • put(23, D2)
    • put(4, D3)
    • put(46, D4)
    • put(1, D5)
    • get(1)
    • remove(23)
    • remove(36)
    • get(1)

Removal
• If an item is removed from the table, it could mess up gets on other items in the table.
• Fix the problem by using a “tombstone” marker to indicate that the slot once held an item that has since been removed.
• A tombstone means OCCUPIED if encountered during a “get” or “remove”
• A tombstone means EMPTY if encountered during a “put”
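The tombstone rule can be sketched as a three-state slot marker. The class TombstoneTable below is a hypothetical illustration with int keys, h(K) = K % capacity and linear probing; it is one way to apply the rule, not the only one. Note that put first searches the whole probe path for the key before it reuses a tombstone, so a key is never stored twice.

```java
import java.util.Arrays;

// Hypothetical open-addressing table that supports removal via tombstones.
class TombstoneTable {
    private enum State { EMPTY, FULL, TOMBSTONE }

    private final State[] state;
    private final int[] keys;
    private final String[] values;

    TombstoneTable(int capacity) {
        state = new State[capacity];
        Arrays.fill(state, State.EMPTY);
        keys = new int[capacity];
        values = new String[capacity];
    }

    // Slot holding key, or -1. A tombstone counts as occupied: keep probing.
    private int find(int key) {
        int home = Math.floorMod(key, keys.length);
        for (int i = 0; i < keys.length; i++) {
            int slot = (home + i) % keys.length;
            if (state[slot] == State.EMPTY) return -1;
            if (state[slot] == State.FULL && keys[slot] == key) return slot;
        }
        return -1;
    }

    String get(int key) {
        int slot = find(key);
        return slot < 0 ? null : values[slot];
    }

    // Marks the slot as a tombstone instead of emptying it.
    String remove(int key) {
        int slot = find(key);
        if (slot < 0) return null;
        String old = values[slot];
        state[slot] = State.TOMBSTONE;
        values[slot] = null;
        return old;
    }

    // A tombstone counts as empty here: the first non-full slot is reused.
    String put(int key, String value) {
        int existing = find(key);
        if (existing >= 0) {
            String old = values[existing];
            values[existing] = value;
            return old;
        }
        int home = Math.floorMod(key, keys.length);
        for (int i = 0; i < keys.length; i++) {
            int slot = (home + i) % keys.length;
            if (state[slot] != State.FULL) {
                state[slot] = State.FULL;
                keys[slot] = key;
                values[slot] = value;
                return null;
            }
        }
        throw new IllegalStateException("table is full");
    }

    public static void main(String[] args) {
        TombstoneTable t = new TombstoneTable(11);
        t.put(36, "D1"); t.put(23, "D2"); t.put(4, "D3"); t.put(46, "D4"); t.put(1, "D5");
        t.remove(23);
        t.remove(36);
        System.out.println(t.get(1));   // D5: the probe walks past both tombstones
    }
}
```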

Double Hashing
• Another probing strategy is to use “double hashing”
• The probe sequence becomes s(k,i) = norm(h(k) + i*h2(k))
• The probe position is determined by “two” hash functions; this typically performs better than linear or quadratic probing.
• h2(k) must never evaluate to 0, otherwise the probe never leaves the home slot.

Example-2 • Given a hash table with M = 13, and hash function h(k) = k mod 13, give the first four positions in the probing sequence for K=21 under the following collision resolution policies. Double hashing with h2 (k) = (k mod 5) + 1
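The double hashing sequence for this exercise can be checked the same way, by computing s(i) = (h(k) + i*h2(k)) mod M. The class name DoubleHashing and its method probes are assumptions for this sketch; h and h2 are the functions given in the exercise.

```java
import java.util.Arrays;

// Hypothetical helper: probe positions under double hashing.
class DoubleHashing {
    // h(k) = k mod capacity, h2(k) = (k mod 5) + 1, s(i) = (h + i*h2) mod capacity.
    static int[] probes(int key, int capacity, int count) {
        int h1 = Math.floorMod(key, capacity);
        int h2 = Math.floorMod(key, 5) + 1;   // always in 1..5, never 0
        int[] seq = new int[count];
        for (int i = 0; i < count; i++) {
            seq[i] = (h1 + i * h2) % capacity;
        }
        return seq;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(probes(21, 13, 4)));   // [8, 10, 12, 1]
    }
}
```

For K = 21 the step size is h2(21) = 2, so the sequence advances two slots at a time and wraps around after index 12.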

Separate Chaining
• A way to “avoid” collisions
• Each array slot contains a list of data elements
• The fundamental methods then become:
    • PUT: hash into array and add to list
    • GET: hash into array and search the list
    • REMOVE: hash into array and remove from list
• The built-in HashMap and Hashtable classes use separate chaining

Chaining Example
Operations: put(B2, Data1), put(S19, Data2), put(J10, Data3), put(N14, Data4), put(X24, Data5), put(W23, Data6), put(B2, Data7), get(X24), get(W23)
Table before the colliding insert (array indices 0 to 6): 0: (N14, Data4); 2: (B2, Data1); 3: (J10, Data3); 5: (S19, Data2)
(X24, Data5) also hashes to index 3: ???

Chaining Example
Operations: put(B2, Data1), put(S19, Data2), put(J10, Data3), put(N14, Data4), put(X24, Data5), put(W23, Data6), put(B2, Data7), get(X24), get(W23)
Table after put(X24, Data5) (array indices 0 to 6): 0: (N14, Data4); 2: (B2, Data1); 3: (X24, Data5), (J10, Data3); 5: (S19, Data2)
Callout: I’m so relieved!
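The PUT, GET and REMOVE steps for separate chaining can be sketched with one linked list per array slot. ChainedTable is a hypothetical class written for this illustration, with int keys and h(K) = K % capacity; it is far simpler than the library's HashMap, which also resizes itself.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Hypothetical hashtable using separate chaining (fixed capacity).
class ChainedTable {
    private static final class Entry {
        final int key;
        String value;
        Entry(int key, String value) { this.key = key; this.value = value; }
    }

    private final List<LinkedList<Entry>> buckets = new ArrayList<>();

    ChainedTable(int capacity) {
        for (int i = 0; i < capacity; i++) buckets.add(new LinkedList<>());
    }

    // Hash into the array: pick the chain for this key.
    private LinkedList<Entry> chainFor(int key) {
        return buckets.get(Math.floorMod(key, buckets.size()));
    }

    // PUT: replace the value if the key is in the chain, else append.
    String put(int key, String value) {
        LinkedList<Entry> chain = chainFor(key);
        for (Entry e : chain) {
            if (e.key == key) {
                String old = e.value;
                e.value = value;
                return old;
            }
        }
        chain.add(new Entry(key, value));
        return null;
    }

    // GET: search the chain.
    String get(int key) {
        for (Entry e : chainFor(key)) {
            if (e.key == key) return e.value;
        }
        return null;
    }

    // REMOVE: unlink the entry from its chain; no tombstones are needed.
    String remove(int key) {
        Iterator<Entry> it = chainFor(key).iterator();
        while (it.hasNext()) {
            Entry e = it.next();
            if (e.key == key) {
                it.remove();
                return e.value;
            }
        }
        return null;
    }

    int chainLength(int index) { return buckets.get(index).size(); }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable(7);
        int[] ks = {2, 19, 10, 14, 24, 23};   // B, S, J, N, X, W
        for (int i = 0; i < ks.length; i++) t.put(ks[i], "Data" + (i + 1));
        System.out.println(t.put(2, "Data7"));   // Data1: B2 is replaced in place
        System.out.println(t.get(24));           // Data5
        System.out.println(t.chainLength(3));    // 2: J10 and X24 share slot 3
    }
}
```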

Analysis
• Given a load factor, the expected number of collisions in a search depends on the collision resolution policy (the formulas are not preserved in this transcript).

Analysis
Plot comparing Separate Chaining, Quadratic Probing, and Linear Probing (figure not preserved).

SortedMap Dictionary
• Use a Tree (binary search, splay, AVL)
• Using AVL means O(Log N) for all methods!

    class TreeMap implements SortedMap {
        private AVLTree tree = new AVLTree();

        private static class Item implements Comparable {
            private Comparable key;
            private Object data;

            public Item(Object k, Object d) { key = (Comparable) k; data = d; }

            public int compareTo(Object other) {
                // code???
            }
        }

        public Object put(Object k, Object v) {
            // code???
        }

        public Object get(Object k) {
            // code???
        }

        public Object remove(Object k) {
            // code???
        }
    }

Perl/Tcl
• Some languages support hashtables as a native data type. Perl and Tcl both have built-in “associative arrays”.

    # Perl example
    %occupations = ();
    $occupations{hunt} = 'UWL Professor';
    $occupations{cremer} = 'Painter';
    $occupations{olson} = 'Doctor';

Summary
• Dictionaries may be ordered or unordered
• Unordered can be implemented with
    • lists (array-based or linked)
    • hashtables (best solution)
• Ordered can be implemented with
    • lists (array-based or linked)
    • trees (AVL (best solution), splay, BST)

java.util.HashSet<E>
• HashSet()
• int size()
• boolean add(E e)
• boolean contains(Object o)
• Iterator<E> iterator()

Using a HashSet

    // requires: import java.util.HashSet;
    String s = "If I know the answer I would tell you that I knew";
    String[] words = s.split(" ");
    HashSet<String> hs = new HashSet<String>();
    for (int i = 0; i < words.length; i++) {
        hs.add(words[i]);   // duplicates are silently ignored
    }
    System.out.println("Total words: " + words.length);
    System.out.println("Unique words: " + hs.size());
    System.out.println(hs);

HashMap

    // requires: import java.util.HashMap;
    HashMap<String, Integer> hash = new HashMap<String, Integer>();
    String k1 = "key1";
    String k2 = "key2";
    Integer i1 = Integer.valueOf(1);   // new Integer(1) is deprecated
    Integer i2 = Integer.valueOf(2);
    hash.put(k1, i1);
    hash.put(k2, i2);
    System.out.println(hash.get(k1));
    System.out.println(hash.get(k2));

Suppose an array of ints called X has been declared and assigned values in all its slots (i.e., X[0] through X[X.length-1] have been assigned values). Write a program that will print to standard output the frequency with which each value that appears in X is found. For example, if before the code segment executes X contains the values 9, 10, 1, 7, -1, 2, 10, 1, 7, -1, 1, 1, 7, 10, then the program should print the following to standard out.

Value    Frequency
1        4
2        1
7        3
9        1
10       3
-1       2
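One possible approach, sketched here under the assumption that a HashMap from value to count is acceptable, makes a single pass over X. The class name FrequencyCount and the method frequencies are made up for this illustration, and the order in which a HashMap lists its rows is not guaranteed to match the sample output.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical solution sketch: count how often each value occurs.
class FrequencyCount {
    // Maps each distinct value in x to the number of times it appears.
    static Map<Integer, Integer> frequencies(int[] x) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int value : x) {
            counts.merge(value, 1, Integer::sum);   // insert 1, or add 1 to the old count
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] x = {9, 10, 1, 7, -1, 2, 10, 1, 7, -1, 1, 1, 7, 10};
        System.out.println("Value\tFrequency");
        for (Map.Entry<Integer, Integer> e : frequencies(x).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Each lookup and update is O(1) on average, so the whole count takes O(N) expected time for N values.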

HashTable CISC4080, Computer Algorithms CIS, Fordham Univ.
