CS 261 – Data Structures Hash Tables Part II: Using Buckets
Hash Tables, Review
• Hash tables are similar to Vectors except…
  • Elements can be indexed by values other than integers
  • A single position may hold more than one element
• Arbitrary values (hash keys) map to integers by means of a hash function
• Computing a hash function is usually a two-step process:
  • Transform the value (or key) to an integer
  • Map that integer to a valid hash table index
• Example: storing names
  • Compute an integer from a name
  • Map the integer to an index in a table (i.e., a vector, array, etc.)
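The two-step process can be sketched in C for string keys. The hash shown here (a multiply-and-add scheme in the style of djb2) is an illustrative choice, not the one the course specifies; the function names are also assumptions for this sketch.

```c
#include <assert.h>

/* Step 1: transform the string key to an integer.
   (djb2-style hash: an illustrative choice, not prescribed by the slides.) */
unsigned long stringHash(const char *key) {
    unsigned long h = 5381;
    for (; *key != '\0'; key++)
        h = h * 33 + (unsigned char)*key;
    return h;
}

/* Step 2: map that integer to a valid table index.
   Using unsigned arithmetic keeps the result non-negative. */
int hashIndex(const char *key, int tablesize) {
    return (int)(stringHash(key) % (unsigned long)tablesize);
}
```

For example, `hashIndex("Angie", 5)` always yields some index in 0..4; which one depends on the hash function chosen.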
Hash Tables
Say we’re storing names: Angie, Joe, Abigail, Linda, Mark, Max, Robert, John
The hash function maps each name to a table index:
  0: Angie, Robert
  1: Linda
  2: Joe, Max, John
  3: (empty)
  4: Abigail, Mark
Hash Tables: Resolving Collisions
There are two general approaches to resolving collisions:
• Open address hashing: if a spot is full, probe for the next empty spot
• Chaining (or buckets): keep a linked list at each table entry
Today we will look at option 2
Resolving Collisions: Chaining / Buckets
Maintain a collection at each hash table entry:
• Chaining/buckets: maintain a linked list (or other collection type data structure, such as an AVL tree) at each table entry
The name example, with each bucket as a chain:
  0: Robert → Angie
  1: Linda
  2: Max → John → Joe
  3: (empty)
  4: Mark → Abigail
Combining arrays and linked lists

```c
struct hashTable {
    struct link **table;   /* array initialized to null pointers */
    int tablesize;
    int dataCount;         /* count number of elements */
    /* ... */
};
```
Hash table init

```c
void hashTableInit(struct hashTable *ht, int size) {
    int i;
    ht->dataCount = 0;
    ht->table = (struct link **) malloc(size * sizeof(struct link *));
    assert(ht->table != 0);
    ht->tablesize = size;
    for (i = 0; i < size; i++)
        ht->table[i] = 0;   /* null pointer */
}
```
Adding a value to a hash table

```c
void hashTableAdd(struct hashTable *ht, EleType newValue) {
    /* find correct bucket, add to list */
    int indx = abs(hashfun(newValue)) % ht->tablesize;
    struct link *newLink = (struct link *) malloc(sizeof(struct link));
    assert(newLink != 0);
    newLink->value = newValue;
    newLink->next = ht->table[indx];
    ht->table[indx] = newLink;   /* add to front of bucket */
    ht->dataCount++;
    /* note: next step: reorganize if load factor > 3.0 */
}
```
Contains test, remove
• Contains: find the correct bucket, then see if the element is there
• Remove: slightly more tricky, because you only want to decrement the count if the element is actually in the list
• Alternative: instead of keeping a count in the hash table, call count on each list and sum the results. What are the pros and cons of this?
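A contains test in this style might look like the sketch below, specialized to string values so it is self-contained. The function name `hashTableContains`, the stand-in `hashfun`, and the `EQ` macro are assumptions here, not the worksheet's official names.

```c
#include <assert.h>
#include <string.h>

/* Minimal self-contained versions of the slides' types, with the
   element type fixed to strings for this sketch. */
struct link { const char *value; struct link *next; };
struct hashTable { struct link **table; int tablesize; int count; };

static unsigned long hashfun(const char *s) {   /* assumed hash function */
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}
#define EQ(a, b) (strcmp((a), (b)) == 0)

/* Find the correct bucket, then walk its chain looking for the value. */
int hashTableContains(struct hashTable *ht, const char *testValue) {
    int indx = (int)(hashfun(testValue) % (unsigned long)ht->tablesize);
    struct link *current;
    for (current = ht->table[indx]; current != 0; current = current->next)
        if (EQ(current->value, testValue))
            return 1;
    return 0;
}
```

The cost is proportional to the length of one chain, which is why the load factor (average chain length) drives performance.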
Remove - need to track the previous link
• Since we have only single links, remove is tricky
• Solutions: use double links (too much work), or use a previous pointer
• We have seen this before: keep a pointer to the previous node, trailing behind the current node

```c
prev = 0;
for (current = ht->table[indx]; current != 0; current = current->next) {
    if (EQ(current->value, testValue)) {
        /* ... remove it ... */
    }
    prev = current;
}
```
Two cases: prev is null or not

```c
prev = 0;
for (current = ht->table[indx]; current != 0; current = current->next) {
    if (EQ(current->value, testValue)) {
        /* remove it */
        if (prev == 0)
            ht->table[indx] = current->next;   /* first link in bucket */
        else
            prev->next = current->next;        /* middle or end of chain */
        free(current);
        ht->dataCount--;
        return;
    }
    prev = current;
}
```
Hash Table Size
• Load factor: λ = n / m, where n is the number of elements and m is the size of the table
• So the load factor represents the average number of elements at each table entry
• Want the load factor to remain small
• Can do the same trick as open table hashing: if the load factor becomes larger than some fixed limit (say, 3.0), then double the table size
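Doubling the table means every element must be rehashed, since its bucket index depends on the table size. The sketch below shows one way to do this; the name `resizeTable` and the string-valued element type are assumptions for this self-contained example, not the worksheet's required interface.

```c
#include <assert.h>
#include <stdlib.h>

/* Self-contained versions of the slides' types for this sketch. */
struct link { const char *value; struct link *next; };
struct hashTable { struct link **table; int tablesize; int dataCount; };

static unsigned long hashfun(const char *s) {   /* assumed hash function */
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Double the table size and rehash every element into its new bucket. */
void resizeTable(struct hashTable *ht) {
    struct link **oldTable = ht->table;
    int oldSize = ht->tablesize;
    int i;

    /* allocate the new, larger table of empty buckets */
    ht->tablesize = 2 * oldSize;
    ht->table = (struct link **) malloc(ht->tablesize * sizeof(struct link *));
    assert(ht->table != 0);
    for (i = 0; i < ht->tablesize; i++)
        ht->table[i] = 0;

    /* relink every existing node into its new bucket;
       no per-element allocation is needed */
    for (i = 0; i < oldSize; i++) {
        struct link *current = oldTable[i];
        while (current != 0) {
            struct link *next = current->next;
            int indx = (int)(hashfun(current->value)
                             % (unsigned long)ht->tablesize);
            current->next = ht->table[indx];
            ht->table[indx] = current;
            current = next;
        }
    }
    free(oldTable);
}
```

After resizing, the load factor is halved, so the average chain length (and thus the cost of contains, add, and remove) drops accordingly.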
Hash Tables: Algorithmic Complexity
• Assumptions:
  • Time to compute the hash function is constant
  • Chaining uses a linked list
• Worst case analysis: all values hash to the same position
• Best case analysis: hash function uniformly distributes the values (all buckets have the same number of objects in them)
• Find element operation:
  • Worst case for open addressing: O(n)
  • Worst case for chaining: O(n) — O(log n) if each bucket uses an AVL tree
  • Best case for open addressing: O(1)
  • Best case for chaining: O(1)
Hash Tables: Average Case
• Assuming that the hash function distributes elements uniformly (a BIG if)
• Then the average case for all operations is O(λ)
• So you want to try to keep the load factor relatively small
• You can do this by resizing the table (doubling the size) if the load factor is larger than some fixed limit, say 10
• But that only improves things IF the hash function distributes values uniformly
• What happens if the hash value is always zero?
So when should you use hash tables?
• Your data values must be objects with good hash functions defined (e.g., String, Double)
• Or you need to write your own definition of hashCode
• You need to know that the values are uniformly distributed
• If you can’t guarantee that, then a skip list or AVL tree is often faster
Your turn
• Now do the worksheet to implement a hash table with buckets
  • Run down the linked list for the contains test
  • Think about how to do remove
  • Keep track of the number of elements
  • Resize the table if the load factor is bigger than 3.0
Questions?