Introduction to Hashing
Computer Science 4
Reference:
Objective: Understand hashing and why it is useful.
Methods of Searching
• Linear search (e.g. sorted/unsorted lists)
   • Retrieval is O(N)
   • But insertion is fast (O(1) for an unsorted list: just add the item at the end)
• Binary search
   • Retrieval is O(Log2N)
   • But the array must be kept sorted, so insertion is O(N)
• Binary trees
   • Insertion and retrieval are both O(Log2N) in the average case
   • O(N) in the worst case, if the tree doesn’t branch (it degenerates into a list)
• Balanced binary trees
   • Insertion and retrieval are both O(Log2N), even in the worst case
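As a rough illustration of the trade-off between the first two methods, here is a minimal Python sketch. It assumes plain Python lists and uses the standard `bisect` module for the sorted case; `linear_search` and `binary_search` are hypothetical helper names, not part of the course material.

```python
import bisect

# Unsorted list: insertion is a single append (O(1) amortized),
# but retrieval must scan the elements one by one (O(N)).
unsorted = []
for part in [4501, 7803, 2298, 3699]:
    unsorted.append(part)

def linear_search(items, target):
    """Return the index of target by scanning left to right; -1 if absent."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1

# Sorted list: insort keeps the list in order, but it has to shift
# later elements to make room, so each insertion is O(N).
sorted_parts = []
for part in [4501, 7803, 2298, 3699]:
    bisect.insort(sorted_parts, part)

def binary_search(items, target):
    """Return the index of target in a sorted list in O(Log2N); -1 if absent."""
    i = bisect.bisect_left(items, target)
    if i < len(items) and items[i] == target:
        return i
    return -1

print(linear_search(unsorted, 2298))     # 2
print(sorted_parts)                       # [2298, 3699, 4501, 7803]
print(binary_search(sorted_parts, 4501))  # 2
```

Neither list gives fast insertion and fast retrieval at the same time, which is the gap the next slides try to close.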
Looking for a better way
• What if there were a data structure where retrieval was O(1) almost all the time, and where insertion is rarely worse than O(LogN)?
• Oh come on! There can’t be a data structure like that, can there?
Hashing
• Uses a function to turn the key value into a number.
• Uses this number as an index into a very large array, and stores the item at that position in the array.
• E.g.
   • “Bogart, Humphrey” might hash to 24,501
   • “Roberts, Julia” might hash to 88,860
   • Humphrey Bogart’s record will be stored at position 24,501
   • Julia Roberts’ record will be stored at position 88,860
• The function that turns keys into numbers is called a hash function. The numbers it returns are called hash values.
• The array is called a hash table.
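The three steps on this slide (hash the key, use the number as an index, store the item there) can be sketched in a few lines of Python. This is a minimal sketch under assumptions: `TABLE_SIZE`, `hash_function`, `store`, and `retrieve` are hypothetical names, and the multiply-by-31 string hash is just one simple choice, so the indices it computes will not be the 24,501 and 88,860 used as illustrations above.

```python
TABLE_SIZE = 100_003  # hypothetical size for the "very large array"

def hash_function(key: str) -> int:
    """Hypothetical string hash: fold each character code into a number,
    keeping the running value within the table's index range."""
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % TABLE_SIZE
    return h

# The hash table: one slot per possible hash value, initially empty.
hash_table = [None] * TABLE_SIZE

def store(key: str, record: str) -> None:
    # Keep the key alongside the record so retrieval can confirm a match.
    # Collisions are not handled yet: a second key with the same hash
    # value would overwrite the first.
    hash_table[hash_function(key)] = (key, record)

def retrieve(key: str):
    entry = hash_table[hash_function(key)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None

store("Bogart, Humphrey", "Humphrey Bogart's record")
store("Roberts, Julia", "Julia Roberts' record")
print(retrieve("Roberts, Julia"))  # Julia Roberts' record
```

Both `store` and `retrieve` do a fixed amount of work per key (one hash computation and one array access), which is where the hoped-for O(1) comes from.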
Example of a Hash Table
• HandyParts company makes no more than 100 different parts, but the parts all have four-digit numbers.
• This hash function can be used to store and retrieve parts in an array:
   Hash(key) = partNum % 100

values
[ 0 ]   Empty
[ 1 ]   4501
[ 2 ]   Empty
[ 3 ]   7803
[ 4 ]   Empty
  . . .
[ 97 ]  Empty
[ 98 ]  2298
[ 99 ]  3699

Credit: Hash table examples to Sylvia Sorkin.
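The HandyParts table can be reproduced directly from the hash function. In this sketch `hash_part` is a hypothetical name for Hash(key), and Python's `None` stands in for "Empty"; each stored part lands at the index given by its last two digits.

```python
TABLE_SIZE = 100  # HandyParts makes no more than 100 different parts

def hash_part(part_num: int) -> int:
    """Hash(key) = partNum % 100: the last two digits become the index."""
    return part_num % TABLE_SIZE

values = [None] * TABLE_SIZE  # None plays the role of "Empty"

for part_num in [4501, 7803, 2298, 3699]:
    values[hash_part(part_num)] = part_num

print(values[1], values[3], values[98], values[99])  # 4501 7803 2298 3699
print(values[0], values[2])                          # None None
```

Retrieval uses the same calculation: to look up part 7803, compute 7803 % 100 = 3 and check `values[3]`.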
Placing elements in the array
• Use the hash function Hash(key) = partNum % 100 to place the element with part number 5502 in the array.
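Working the exercise through: 5502 % 100 = 2, and slot 2 is empty in the HandyParts table, so part 5502 goes at index 2. A small Python sketch of that check, again with the hypothetical `hash_part` helper and `None` standing for "Empty":

```python
def hash_part(part_num: int) -> int:
    """Hash(key) = partNum % 100."""
    return part_num % 100

# Rebuild the HandyParts table from the previous slide.
values = [None] * 100
for part_num in [4501, 7803, 2298, 3699]:
    values[hash_part(part_num)] = part_num

index = hash_part(5502)   # 5502 % 100 == 2
if values[index] is None:  # slot is empty, so no collision
    values[index] = 5502

print(index, values[index])  # 2 5502
```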
Efficiency of Hashing
• Unless two values hash to the same position:
   • Insertion will be O(1)
   • Retrieval will be O(1)
   • Removal will be O(1)
• But the “unless” in the first bullet is a BIG “unless”.
• When two items hash to the same index, it is called a collision.
• A hash table will be more efficient if we minimize collisions.
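To make "collision" concrete, this sketch groups part numbers by their hash value and reports any index that more than one key maps to. The part number 6702 is a hypothetical addition chosen because it shares its last two digits with 5502; `find_collisions` is likewise a hypothetical helper.

```python
from collections import defaultdict

def hash_part(part_num: int) -> int:
    """Hash(key) = partNum % 100."""
    return part_num % 100

def find_collisions(keys):
    """Group keys by hash value; return only the groups holding 2+ keys."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[hash_part(key)].append(key)
    return {index: ks for index, ks in buckets.items() if len(ks) > 1}

parts = [4501, 7803, 2298, 3699, 5502, 6702]  # 6702 is hypothetical
print(find_collisions(parts))  # {2: [5502, 6702]}
```

Both 5502 and 6702 want slot 2, so a table that holds one item per slot cannot store both there without some extra rule for handling the collision.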
Resizing the hash table
• Hash table efficiency depends on minimizing collisions
   • So we want to make the table larger when it begins to fill up
   • Half full is often used as a rule of thumb
• Resizing a hash table is not as simple as resizing an array
   • We can’t just copy each item to the same index in a larger table.
   • The index is typically computed from the size of the table (hash value % table size), so in a larger table the same key maps to a different index, and the items could no longer be found.
   • We must rehash every item into the new table.
• If the table’s size is doubled each time it becomes half full, each rehash costs O(N) but happens rarely, so insertion averages out to O(1) (amortized); only the occasional insertion that triggers a resize costs O(N).
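The sketch below shows a table that grows when it becomes half full and rehashes every item into the larger table. It is an illustration under assumptions, not the course's implementation: collisions are handled by chaining (each slot holds a small list of key/value pairs), the index comes from Python's built-in `hash()`, and the size simply doubles, which keeps the code short but does not keep the size prime. `HashTable`, `put`, `get`, and `_resize` are hypothetical names.

```python
class HashTable:
    """Hash table that doubles in size when half full.
    Collisions are handled by chaining: each slot holds a list of entries."""

    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self.size = 0
        self.slots = [[] for _ in range(capacity)]

    def _index(self, key) -> int:
        # The index depends on the table size, which is why resizing must rehash.
        return hash(key) % self.capacity

    def put(self, key, value) -> None:
        bucket = self.slots[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # existing key: overwrite its value
                return
        bucket.append((key, value))
        self.size += 1
        if self.size >= self.capacity // 2:  # half full: grow the table
            self._resize(self.capacity * 2)

    def get(self, key, default=None):
        for k, v in self.slots[self._index(key)]:
            if k == key:
                return v
        return default

    def _resize(self, new_capacity: int) -> None:
        old_slots = self.slots
        self.capacity = new_capacity
        self.slots = [[] for _ in range(new_capacity)]
        # Rehash every item: copying slots by position would leave items
        # where the new index calculation can no longer find them.
        for bucket in old_slots:
            for k, v in bucket:
                self.slots[self._index(k)].append((k, v))

table = HashTable()
for part_num in range(1000, 1100):
    table.put(part_num, f"part {part_num}")
print(table.capacity, table.get(1042))  # 256 part 1042
```

Starting from 8 slots, the table resizes at 4, 8, 16, 32, and 64 items. Each resize copies more items than the last, but the total copying stays proportional to the number of insertions, which is why the average cost per insertion remains constant.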
Hash Functions
• Typically, hash functions produce a large number
• Turn it into an index using an integer remainder (the % operator, aka the mod operator)
   • We mod the number with the size of the hash table.
   • Normally, hash table sizes are chosen to be a large prime number.
• Techniques for producing hash values
   • Addition: add pieces of keys together
   • More advanced techniques involving exclusive OR
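Here is a minimal sketch of the two techniques named above, followed by the mod step that turns a hash value into an index. The table size 101 is a hypothetical prime, and the rotate-and-XOR scheme is just one simple way to use exclusive OR, not a specific named algorithm from the course. The anagram pair shows one weakness of plain addition.

```python
TABLE_SIZE = 101  # a prime table size (hypothetical value)

def additive_hash(key: str) -> int:
    """Addition: add the character codes of the key together."""
    return sum(ord(ch) for ch in key)

def xor_hash(key: str) -> int:
    """Exclusive OR: rotate the running value and XOR in each character."""
    h = 0
    for ch in key:
        h = ((h << 5) | (h >> 27)) & 0xFFFFFFFF  # rotate left 5 bits (32-bit)
        h ^= ord(ch)
    return h

def to_index(hash_value: int) -> int:
    """Turn a (possibly large) hash value into a table index with %."""
    return hash_value % TABLE_SIZE

# Addition ignores character order, so anagrams always collide:
print(additive_hash("stop") == additive_hash("pots"))  # True
# Rotating before each XOR makes position matter:
print(xor_hash("stop") == xor_hash("pots"))  # False
print(to_index(xor_hash("Bogart, Humphrey")))
```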
Summary
• Hashing involves:
   • Using a hash function to convert a key value to a number
   • Using that number to index into a hash table
   • Storing the item at that index
• Storage is O(1) and retrieval is O(1) as long as no two items hash to the same location.