
CS 261 – Data Structures

Presentation Transcript


  1. CS 261 – Data Structures Hash Tables Part II: Using Buckets

  2. Hash Tables, Review
  • Hash tables are similar to Vectors except…
    • Elements can be indexed by values other than integers
    • A single position may hold more than one element
  • Arbitrary values (hash keys) map to integers by means of a hash function
  • Computing a hash function is usually a two-step process (sketched below):
    • Transform the value (or key) to an integer
    • Map that integer to a valid hash table index
  • Example: storing names
    • Compute an integer from a name
    • Map the integer to an index in a table (i.e., a vector, array, etc.)
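A minimal sketch of this two-step process for strings, assuming the table has tablesize buckets; the names stringHash and hashIndex and the character-summing scheme are illustrative choices, not the course's required hash function:

  #include <stdlib.h>   /* for abs */

  /* Step 1: transform the string into an integer (here: the sum of its characters). */
  int stringHash (const char * str) {
     int i, sum = 0;
     for (i = 0; str[i] != '\0'; i++)
        sum += str[i];
     return sum;
  }

  /* Step 2: map that integer to a valid table index. */
  int hashIndex (const char * str, int tablesize) {
     return abs(stringHash(str)) % tablesize;
  }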

  3. Hash Tables
  Say we’re storing names: Angie, Joe, Abigail, Linda, Mark, Max, Robert, John
  The hash function maps each name to one of five table slots:
    0: Angie, Robert
    1: Linda
    2: Joe, Max, John
    3: (empty)
    4: Abigail, Mark

  4. Hash Tables: Resolving Collisions
  There are two general approaches to resolving collisions:
  • Open address hashing: if a spot is full, probe for the next empty spot
  • Chaining (or buckets): keep a linked list at each table entry
  Today we will look at option 2.

  5. Resolving Collisions: Chaining / Buckets
  Maintain a collection at each hash table entry:
  • Chaining/buckets: maintain a linked list (or other collection type data structure, such as an AVL tree) at each table entry
  The example table, now with a chain of nodes in each bucket:
    0: Robert → Angie
    1: Linda
    2: Max → John → Joe
    3: (empty)
    4: Mark → Abigail

  6. Combining arrays and linked lists …
  struct hashTable {
     struct link ** table;   // array initialized to null pointers
     int tablesize;
     int count;              // number of elements stored
     …
  };
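The struct link that the table points to is not shown on the slide; here is a sketch of what it is assumed to look like, given how the add and remove code below uses it (one EleType value plus a pointer to the next node in the bucket):

  struct link {
     EleType value;        /* the element stored in this node */
     struct link * next;   /* next node in the same bucket, or null */
  };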

  7. Hash table init
  void hashTableInit (struct hashTable * ht, int size) {
     int i;
     ht->count = 0;
     ht->table = (struct link **) malloc(size * sizeof(struct link *));
     assert(ht->table != 0);
     ht->tablesize = size;
     for (i = 0; i < size; i++)
        ht->table[i] = 0;   /* null pointer */
  }

  8. Adding a value to a hash table
  void add (struct hashTable * ht, EleType newValue) {
     // find correct bucket, add to list
     int indx = abs(hashfun(newValue)) % ht->tablesize;
     struct link * newLink = (struct link *) malloc(sizeof(struct link));
     assert(newLink != 0);
     newLink->value = newValue;
     newLink->next = ht->table[indx];
     ht->table[indx] = newLink;   /* add to bucket */
     ht->count++;
     // note: next step: reorganize if load factor > 3.0
  }
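A hypothetical usage sketch, assuming EleType has been typedef'd to const char * and that hashfun is a string hash along the lines of the earlier sketch; the five-bucket size simply mirrors the example table above:

  #include <stdio.h>

  int main (void) {
     struct hashTable ht;
     hashTableInit(&ht, 5);      /* five buckets, as in the name example */
     add(&ht, "Angie");
     add(&ht, "Robert");
     add(&ht, "Linda");
     printf("elements stored: %d\n", ht.count);
     return 0;
  }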

  9. Contains test, remove
  • Contains: find the correct bucket, then see if the element is in its list (a sketch follows below)
  • Remove: slightly more tricky, because you only want to decrement the count if the element is actually in the list
  • Alternative: instead of keeping a count in the hash table, you can call count on each list. What are the pros and cons of this?
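A minimal sketch of the contains test under the same assumptions as the code above (hashfun from the add code, EQ comparing two EleType values as in the remove code on the next slides); the name hashTableContains is illustrative:

  /* Returns 1 if testValue is in its bucket's list, 0 otherwise. */
  int hashTableContains (struct hashTable * ht, EleType testValue) {
     struct link * current;
     int indx = abs(hashfun(testValue)) % ht->tablesize;
     for (current = ht->table[indx]; current != 0; current = current->next)
        if (EQ(current->value, testValue))
           return 1;
     return 0;
  }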

  10. Remove - need to change previous
  • Since we have only single links, remove is tricky
  • Solutions: use double links (too much work), or use a previous pointer
  • We have seen this before: keep a pointer to the previous node, trailing behind the current node
  prev = 0;
  for (current = ht->table[indx]; current != 0; current = current->next) {
     if (EQ(current->value, testValue)) {
        … remove it
     }
     prev = current;
  }

  11. Two cases, prev is null or not
  prev = 0;
  for (current = ht->table[indx]; current != 0; current = current->next) {
     if (EQ(current->value, testValue)) {
        /* remove it */
        if (prev == 0)
           ht->table[indx] = current->next;
        else
           prev->next = current->next;
        free(current);
        ht->count--;
        return;
     }
     prev = current;
  }
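The same loop wrapped into a complete function for reference; the name hashTableRemove and the choice to leave the table unchanged when the value is absent are assumptions, not spelled out on the slide:

  void hashTableRemove (struct hashTable * ht, EleType testValue) {
     struct link * current;
     struct link * prev = 0;
     int indx = abs(hashfun(testValue)) % ht->tablesize;
     for (current = ht->table[indx]; current != 0; current = current->next) {
        if (EQ(current->value, testValue)) {
           if (prev == 0)                       /* first node in the bucket */
              ht->table[indx] = current->next;
           else                                 /* later node in the bucket */
              prev->next = current->next;
           free(current);
           ht->count--;   /* only decrement when something was actually removed */
           return;
        }
        prev = current;
     }
     /* value not found: leave count and table unchanged */
  }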

  12. Hash Table Size
  • Load factor: λ = n / m, where n = number of elements and m = size of the table
  • So the load factor represents the average number of elements at each table entry
  • Want the load factor to remain small
  • Can do the same trick as open table hashing: if the load factor becomes larger than some fixed limit (say, 3.0), then double the table size (a resize sketch follows below)
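A hedged sketch of what that doubling step could look like; the helper name _resizeTable is hypothetical, and it simply re-hashes every existing link into a new array twice the size (reusing the nodes, so no new link allocations are needed):

  void _resizeTable (struct hashTable * ht) {
     struct link ** oldTable = ht->table;
     int oldSize = ht->tablesize;
     int i, indx;
     struct link * current;
     struct link * next;

     /* allocate a new, empty array of buckets twice as large */
     ht->tablesize = 2 * oldSize;
     ht->table = (struct link **) malloc(ht->tablesize * sizeof(struct link *));
     assert(ht->table != 0);
     for (i = 0; i < ht->tablesize; i++)
        ht->table[i] = 0;

     /* move every existing link into its new bucket */
     for (i = 0; i < oldSize; i++) {
        for (current = oldTable[i]; current != 0; current = next) {
           next = current->next;
           indx = abs(hashfun(current->value)) % ht->tablesize;
           current->next = ht->table[indx];
           ht->table[indx] = current;
        }
     }
     free(oldTable);
  }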

  13. Hash Tables: Algorithmic Complexity
  • Assumptions:
    • Time to compute the hash function is constant
    • Chaining uses a linked list
  • Worst case analysis: all values hash to the same position
  • Best case analysis: the hash function uniformly distributes the values (all buckets have the same number of objects in them)
  • Find element operation:
    • Worst case for open addressing: O(n)
    • Worst case for chaining: O(n), or O(log n) if an AVL tree is used for each bucket
    • Best case for open addressing: O(1)
    • Best case for chaining: O(1)

  14. Hash Tables: Average Case
  • Assuming that the hash function distributes elements uniformly (a BIG if)
  • Then the average case for all operations is O(λ), the load factor (e.g., n = 300 elements in a table of size m = 100 gives an average of 3 elements per bucket)
  • So you want to try to keep the load factor relatively small
  • You can do this by resizing the table (doubling its size) if the load factor is larger than some fixed limit, say 10
  • But that only improves things IF the hash function distributes values uniformly
  • What happens if the hash value is always zero?

  15. So when should you use hash tables?
  • Your data values must be objects with good hash functions defined (string, Double)
  • Or you need to write your own definition of hashCode
  • Need to know that the values are uniformly distributed
  • If you can’t guarantee that, then a skip list or AVL tree is often faster

  16. Your turn
  • Now do the worksheet to implement a hash table with buckets
  • Run down the linked list for the contains test
  • Think about how to do remove
  • Keep track of the number of elements
  • Resize the table if the load factor is bigger than 3.0
  Questions??
