CS 261 – Data Structures Hash Tables Part II: Using Buckets
Hash Tables, Review
• Hash tables are similar to Vectors except…
  • Elements can be indexed by values other than integers
  • A single position may hold more than one element
• Arbitrary values (hash keys) map to integers by means of a hash function
• Computing a hash function is usually a two-step process:
  • Transform the value (or key) to an integer
  • Map that integer to a valid hash table index
• Example: storing names
  • Compute an integer from a name
  • Map the integer to an index in a table (i.e., a vector, array, etc.)
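The two-step process can be sketched in C for string keys. The hash shown here (a multiply-and-add scheme in the style of djb2) is an illustrative choice, not the one the course specifies; the function names are also assumptions for this sketch.

```c
#include <assert.h>

/* Step 1: transform the string key to an integer.
   (djb2-style hash: an illustrative choice, not prescribed by the slides.) */
unsigned long stringHash(const char *key) {
    unsigned long h = 5381;
    for (; *key != '\0'; key++)
        h = h * 33 + (unsigned char)*key;
    return h;
}

/* Step 2: map that integer to a valid table index.
   Using unsigned arithmetic keeps the result non-negative. */
int hashIndex(const char *key, int tablesize) {
    return (int)(stringHash(key) % (unsigned long)tablesize);
}
```

For example, `hashIndex("Angie", 5)` always yields some index in 0..4; which one depends on the hash function chosen.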
Hash Tables
Say we’re storing names: Angie, Joe, Abigail, Linda, Mark, Max, Robert, John
The hash function maps each name to a table index:
  0: Angie, Robert
  1: Linda
  2: Joe, Max, John
  3: (empty)
  4: Abigail, Mark
Hash Tables: Resolving Collisions
There are two general approaches to resolving collisions:
• Open address hashing: if a spot is full, probe for the next empty spot
• Chaining (or buckets): keep a linked list at each table entry
Today we will look at option 2
Resolving Collisions: Chaining / Buckets
Maintain a collection at each hash table entry:
• Chaining/buckets: maintain a linked list (or other collection type data structure, such as an AVL tree) at each table entry
The name example, with each bucket as a chain:
  0: Robert → Angie
  1: Linda
  2: Max → John → Joe
  3: (empty)
  4: Mark → Abigail
Combining arrays and linked lists

```c
struct hashTable {
    struct link **table;   /* array initialized to null pointers */
    int tablesize;
    int dataCount;         /* count number of elements */
    /* ... */
};
```
Hash table init

```c
void hashTableInit(struct hashTable *ht, int size) {
    int i;
    ht->dataCount = 0;
    ht->table = (struct link **) malloc(size * sizeof(struct link *));
    assert(ht->table != 0);
    ht->tablesize = size;
    for (i = 0; i < size; i++)
        ht->table[i] = 0;   /* null pointer */
}
```
Adding a value to a hash table

```c
void hashTableAdd(struct hashTable *ht, EleType newValue) {
    /* find correct bucket, add to list */
    int indx = abs(hashfun(newValue)) % ht->tablesize;
    struct link *newLink = (struct link *) malloc(sizeof(struct link));
    assert(newLink != 0);
    newLink->value = newValue;
    newLink->next = ht->table[indx];
    ht->table[indx] = newLink;   /* add to front of bucket */
    ht->dataCount++;
    /* note: next step: reorganize if load factor > 3.0 */
}
```
Contains test, remove
• Contains: find the correct bucket, then see if the element is there
• Remove: slightly more tricky, because you only want to decrement the count if the element is actually in the list
• Alternative: instead of keeping a count in the hash table, call count on each list and sum the results. What are the pros and cons of this?
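A contains test in this style might look like the sketch below, specialized to string values so it is self-contained. The function name `hashTableContains`, the stand-in `hashfun`, and the `EQ` macro are assumptions here, not the worksheet's official names.

```c
#include <assert.h>
#include <string.h>

/* Minimal self-contained versions of the slides' types, with the
   element type fixed to strings for this sketch. */
struct link { const char *value; struct link *next; };
struct hashTable { struct link **table; int tablesize; int count; };

static unsigned long hashfun(const char *s) {   /* assumed hash function */
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}
#define EQ(a, b) (strcmp((a), (b)) == 0)

/* Find the correct bucket, then walk its chain looking for the value. */
int hashTableContains(struct hashTable *ht, const char *testValue) {
    int indx = (int)(hashfun(testValue) % (unsigned long)ht->tablesize);
    struct link *current;
    for (current = ht->table[indx]; current != 0; current = current->next)
        if (EQ(current->value, testValue))
            return 1;
    return 0;
}
```

The cost is proportional to the length of one chain, which is why the load factor (average chain length) drives performance.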
Remove - need to track the previous link
• Since we have only single links, remove is tricky
• Solutions: use double links (too much work), or use a previous pointer
• We have seen this before: keep a pointer to the previous node, trailing behind the current node

```c
prev = 0;
for (current = ht->table[indx]; current != 0; current = current->next) {
    if (EQ(current->value, testValue)) {
        /* ... remove it ... */
    }
    prev = current;
}
```
Two cases: prev is null or not

```c
prev = 0;
for (current = ht->table[indx]; current != 0; current = current->next) {
    if (EQ(current->value, testValue)) {
        /* remove it */
        if (prev == 0)
            ht->table[indx] = current->next;   /* first link in bucket */
        else
            prev->next = current->next;        /* middle or end of chain */
        free(current);
        ht->dataCount--;
        return;
    }
    prev = current;
}
```
Hash Table Size
• Load factor: λ = n / m, where n is the number of elements and m is the size of the table
• So the load factor represents the average number of elements at each table entry
• Want the load factor to remain small
• Can do the same trick as open table hashing: if the load factor becomes larger than some fixed limit (say, 3.0), then double the table size
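Doubling the table means every element must be rehashed, since its bucket index depends on the table size. The sketch below shows one way to do this; the name `resizeTable` and the string-valued element type are assumptions for this self-contained example, not the worksheet's required interface.

```c
#include <assert.h>
#include <stdlib.h>

/* Self-contained versions of the slides' types for this sketch. */
struct link { const char *value; struct link *next; };
struct hashTable { struct link **table; int tablesize; int dataCount; };

static unsigned long hashfun(const char *s) {   /* assumed hash function */
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Double the table size and rehash every element into its new bucket. */
void resizeTable(struct hashTable *ht) {
    struct link **oldTable = ht->table;
    int oldSize = ht->tablesize;
    int i;

    /* allocate the new, larger table of empty buckets */
    ht->tablesize = 2 * oldSize;
    ht->table = (struct link **) malloc(ht->tablesize * sizeof(struct link *));
    assert(ht->table != 0);
    for (i = 0; i < ht->tablesize; i++)
        ht->table[i] = 0;

    /* relink every existing node into its new bucket;
       no per-element allocation is needed */
    for (i = 0; i < oldSize; i++) {
        struct link *current = oldTable[i];
        while (current != 0) {
            struct link *next = current->next;
            int indx = (int)(hashfun(current->value)
                             % (unsigned long)ht->tablesize);
            current->next = ht->table[indx];
            ht->table[indx] = current;
            current = next;
        }
    }
    free(oldTable);
}
```

After resizing, the load factor is halved, so the average chain length (and thus the cost of contains, add, and remove) drops accordingly.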
Hash Tables: Algorithmic Complexity
• Assumptions:
  • Time to compute the hash function is constant
  • Chaining uses a linked list
• Worst case analysis: all values hash to the same position
• Best case analysis: hash function uniformly distributes the values (all buckets have the same number of objects in them)
• Find element operation:
  • Worst case for open addressing: O(n)
  • Worst case for chaining: O(n) — O(log n) if each bucket uses an AVL tree
  • Best case for open addressing: O(1)
  • Best case for chaining: O(1)
Hash Tables: Average Case
• Assuming that the hash function distributes elements uniformly (a BIG if)
• Then the average case for all operations is O(λ)
• So you want to try to keep the load factor relatively small
• You can do this by resizing the table (doubling the size) if the load factor is larger than some fixed limit, say 10
• But that only improves things IF the hash function distributes values uniformly
• What happens if the hash value is always zero?
So when should you use hash tables?
• Your data values must be objects with good hash functions defined (e.g., String, Double)
• Or you need to write your own definition of hashCode
• You need to know that the values are uniformly distributed
• If you can’t guarantee that, then a skip list or AVL tree is often faster
Your turn
• Now do the worksheet to implement a hash table with buckets
  • Run down the linked list for the contains test
  • Think about how to do remove
  • Keep track of the number of elements
  • Resize the table if the load factor is bigger than 3.0
Questions?