Hashing It’s not just for breakfast anymore! hashing
Hashing: the facts
• An approach used both for storing values and for searching for them
• Search time is linear in the worst case, but in the average case hashing is a strong competitor with binary search (with a good hash function and a table that is not too full, the expected time is constant)
• Hashing makes it easy to add and delete elements. This is an advantage over binary search, which requires the array to be kept sorted
Dictionary ADT
• Previously, we saw a dictionary ADT implemented as a binary search tree
• A hash table can provide an array-based dictionary implementation
• Abstract properties of a dictionary:
  • every item has a key
  • to retrieve an item, specify its key; the retrieval process fetches the associated data
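Possible structure for a single dictionary item
template <class item>
struct RecordType
{
    int key;          // non-negative for a valid record; negative values are reserved as markers
    item datarecord;  // data associated with the key
};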
Setting up the array
• One approach to an array-based dictionary is to create consecutive keys and store each record so that its key corresponds to its index; this is the method used in MS Access, for example
• An alternative is to use an existing attribute of the data as the key value; this approach is more typical of hashing
Setting up the array
• Using an existing key field presents challenges:
  • The value may be too large to use directly as an index: e.g. a nine-digit social security number would call for an array with a billion entries
  • There is no guarantee that the values will be close enough together for effective indexing: e.g. the last 4 digits of the social security numbers of students in a class are scattered across 10,000 possible values
Solution: hashing
• Instead of using the data field directly, a function is applied to the original value to produce a valid index; this is called the hash function
• The hash function maps the key to an index that can be used to insert data into the array or to retrieve data for a given key
• An array that uses hashing for indexing is called a hash table
Operations on a hash table
• Inserting an item:
  • calculate the hash value (index) from the item's key
  • check that index to determine whether the slot is open
  • if open, insert the item
  • if not open, a collision occurs; search through the array for the next open slot
  • this requires some mechanism for recognizing an empty slot; we can't just start with an uninitialized array
Open-address hashing
• The insertion scheme just described uses open-address hashing
• In open addressing, collisions are resolved by placing the new item in another open slot of the same array; the scheme used here (linear probing) takes the next open slot, wrapping around from the end of the array to the start
• The scheme requires that the key field of each array element be initialized to some known value that cannot be a real key; -1, for example
Inserting a New Record
In order to insert a new record, the key must somehow be converted to an array index. The index is called the hash value of the key.
[Figure: an array with slots [0] through [700], some already holding records with keys 281942902, 233667136, 506643548, and 155778322; a new record with key 580625685 is waiting to be inserted.]
Inserting a New Record: a typical hash function
hash value = (Number mod 701)
• Number is the original key value
• 701 is the number of slots in the array ([0] through [700])
What is (580625685 mod 701)? It is 3, since 580625685 = 701 × 828282 + 3.
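As a quick check of the arithmetic, here is a minimal C++ sketch of the division-method hash from the slide. The names hash_value and TABLE_SIZE are illustrative and not from the original; keys are assumed to be non-negative ints, as in the dictionary class later in these slides.

```cpp
#include <cassert>
#include <cstddef>

// Number of slots in the slide's array: indices [0] through [700].
const int TABLE_SIZE = 701;

// Division-method hash: the remainder of the key divided by the table
// size is always a valid index in [0, TABLE_SIZE - 1].
std::size_t hash_value(int key)
{
    assert(key >= 0);  // negative keys are reserved as markers later on
    return static_cast<std::size_t>(key % TABLE_SIZE);
}
```

With this sketch, hash_value(580625685) gives 3, matching the slide, and hash_value(701466868) gives 2, the hash value used in the collision example.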
Inserting a New Record
The hash value, [3], is used as the location of the new record.
[Figure: the record with key 580625685 is placed at index [3] of the array [0] through [700].]
Collisions
Here is another new record to insert, with key 701466868 and a hash value of 2 (701466868 mod 701 = 2).
[Figure: the array [0] through [700], now also holding the record with key 580625685 at [3].]
Collisions
This is called a collision, because there is already another valid record at [2]. When a collision occurs, move forward until you find an empty spot.
Collisions
The new record goes in the empty spot.
[Figure: the record with key 701466868 is stored in the first empty slot after [2].]
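The insertion steps in this walkthrough can be sketched as a small linear-probing routine. This is an illustrative sketch, not the dictionary class developed later: the table is assumed to be a std::vector<int> holding bare keys, EMPTY (-1) is an assumed marker for an unused slot, and insert_key is a hypothetical name. It returns table.size() when no slot is free.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int EMPTY = -1;  // assumed marker for a slot that has never held a key

// Open-address insertion with linear probing. Starts at the key's hash value
// (key mod table size) and moves forward one slot at a time, wrapping from
// the last slot back to [0], until an empty slot is found.
// Returns the index where the key was stored, or table.size() if the table is full.
std::size_t insert_key(std::vector<int>& table, int key)
{
    assert(key >= 0 && !table.empty());
    const std::size_t n = table.size();
    std::size_t index = static_cast<std::size_t>(key) % n;  // hash value
    for (std::size_t probes = 0; probes < n; ++probes) {
        if (table[index] == EMPTY) {   // open slot: store the key here
            table[index] = key;
            return index;
        }
        index = (index + 1) % n;       // collision: try the next slot
    }
    return n;                          // every slot is occupied
}
```

If the four records in the figure sit at their own hash values (slots 1, 2, 4, and 700 under mod 701; the figure does not show their indices, so this layout is an assumption), inserting 580625685 returns 3, and inserting 701466868 starts at [2], finds [2], [3], and [4] occupied, and returns 5.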
Operations on a hash table
• Retrieving an item:
  • calculate the hash value from the desired key
  • search the array, beginning at the calculated index, for the desired data
  • the search is finished when:
    • the item is found (successful search), or
    • an empty index is encountered (unsuccessful search)
Searching for a Key
The data that's attached to a key can be found fairly quickly.
[Figure: searching the array [0] through [700] for the record with key 701466868.]
Searching for a Key
Calculate the hash value and check that location of the array for the key. The hash value of 701466868 is [2], but the record stored at [2] has a different key ("Not me.").
Searching for a Key
Keep moving forward until you find the key or reach an empty spot, skipping each record whose key does not match ("Not me.") until the record with key 701466868 is reached ("Yes!").
Searching for a Key
When the item is found, the information can be copied to the necessary location.
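The search just traced can be sketched as a short C++ function. This is a hypothetical helper (find_key) over a std::vector<int> of bare keys, with NEVER_USED (-1) assumed as the empty marker; a never-used slot is taken as proof that the key is absent.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int NEVER_USED = -1;  // assumed marker for a slot that never held a record

// Linear-probing search: begin at the key's hash value and move forward
// (wrapping around) until the key is found or a never-used slot shows the
// key cannot be further along. Returns the index, or table.size() if absent.
std::size_t find_key(const std::vector<int>& table, int key)
{
    assert(key >= 0 && !table.empty());
    const std::size_t n = table.size();
    std::size_t index = static_cast<std::size_t>(key) % n;  // hash value
    for (std::size_t probes = 0; probes < n; ++probes) {
        if (table[index] == key)
            return index;               // successful search
        if (table[index] == NEVER_USED)
            return n;                   // unsuccessful search
        index = (index + 1) % n;        // not this record: move forward
    }
    return n;                           // scanned the whole table
}
```

For example, if 701466868 sits at [5] behind occupied slots [2] through [4], find_key checks [2], [3], and [4] before returning 5. A key whose hash slot has never been used, such as 12345 (hash value 428 in a 701-slot table), returns table.size() on the first probe.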
Operations on a hash table
• Deleting an item:
  • find the index based on the hashed key, as with insertion and retrieval
  • mark the record at that index to indicate that the spot is open
  • we can't use the ordinary "empty" designation; this could interfere with record retrieval
  • use an alternative "open" designation: it indicates that the slot is open for insertion but won't stop a search
Deleting a Record
• Records may also be deleted from a hash table.
• But the location must not be left as an ordinary "empty spot", since that could interfere with searches.
• The location must be marked in some special way so that a search can tell that the spot used to have something in it.
[Figure: a record in the array [0] through [700] asking to be deleted ("Please delete me.").]
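Here is a minimal C++ sketch of deletion using a special marker, often called a tombstone. The functions find_key and remove_key are hypothetical, and the sentinel values NEVER_USED (-1) and PREVIOUSLY_USED (-2) are assumptions chosen to match the constants of the dictionary class in these slides; the table holds bare int keys.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int NEVER_USED = -1;       // slot has never held a record
const int PREVIOUSLY_USED = -2;  // tombstone: a record was deleted here

// Search with linear probing. A tombstone does not stop the search,
// because the wanted key may have been placed past it by an earlier collision.
std::size_t find_key(const std::vector<int>& table, int key)
{
    assert(key >= 0 && !table.empty());
    const std::size_t n = table.size();
    std::size_t index = static_cast<std::size_t>(key) % n;
    for (std::size_t probes = 0; probes < n && table[index] != NEVER_USED; ++probes) {
        if (table[index] == key)
            return index;
        index = (index + 1) % n;
    }
    return n;  // not found
}

// Delete by overwriting the key with the tombstone, not with NEVER_USED.
// Returns true if the key was present.
bool remove_key(std::vector<int>& table, int key)
{
    std::size_t index = find_key(table, key);
    if (index == table.size())
        return false;
    table[index] = PREVIOUSLY_USED;
    return true;
}
```

Suppose 233667136, 580625685, 506643548, and 701466868 occupy slots [2] through [5] of a 701-slot table, with 701466868 having probed past [2], [3], and [4]. After remove_key(table, 580625685), slot [3] holds PREVIOUSLY_USED and find_key(table, 701466868) still returns 5. Had [3] been reset to NEVER_USED, the search would stop at [3] and wrongly report the key as missing.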
A class specification for a hashing dictionary
• Public functions:
  • constructor: creates and initializes an empty dictionary
  • insert: inserts a new item
  • is_present: returns true if an item with the specified key is in the dictionary, false if not
  • find: if an item with the specified key is found, returns a copy of it
  • remove: removes the record with the specified key, if it exists
  • size: returns the total number of records in the dictionary
Invariant for the dictionary class
• Member variable used stores the number of records currently in the dictionary
• Member variable data is an array of CAPACITY entries; the actual records are stored here
• Each valid record has a non-negative key value; an unused slot has its key field set to the constant NEVER_USED (the slot has never held a record) or the constant PREVIOUSLY_USED (a record was stored there and later removed)
Code for dictionary class
#include <cassert>   // assert, used by the member functions
#include <cstddef>   // size_t
using std::size_t;

template <class RecType>   // RecType must have an int member named key
class Dictionary
{
public:
    enum { CAPACITY = 811 };
    Dictionary( );
    void insert(const RecType& entry);
    void remove(int key);
    bool is_present(int key) const;
    void find(int key, bool& found, RecType& result) const;
    size_t size( ) const { return used; }
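Code for dictionary class (continued)
…
private:
    static const int NEVER_USED = -1;       // key of a slot that has never held a record
    static const int PREVIOUSLY_USED = -2;  // key of a slot whose record was removed
    RecType data[CAPACITY];
    size_t used;
…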
Helper functions in the dictionary class
• hash: calculates the hash value for a given key
• next_index: steps through the array, wrapping around from the end of the array to index 0
• find_index: finds the array index of the record with a given key
• never_used: returns true if the slot at an index has never been used
• is_vacant: returns true if the slot at an index is not currently in use
Code for dictionary class (continued)
...
    // helper functions:
    size_t hash(int key) const { return key % CAPACITY; }   // key must be non-negative
    size_t next_index(size_t index) const { return (index + 1) % CAPACITY; }
    void find_index(int key, bool& found, size_t& index) const;
    bool never_used(size_t index) const { return data[index].key == NEVER_USED; }
    bool is_vacant(size_t index) const { return data[index].key < 0; }   // NEVER_USED or PREVIOUSLY_USED
};
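The wrap-around provided by next_index can be checked on its own. The sketch below reproduces the class's hash and next_index logic as free functions with CAPACITY = 811. The hash function is renamed hash_value here, an illustrative choice that avoids a clash with std::hash. The sketch also adds a hypothetical probe_sequence helper that lists the first few slots a search would visit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t CAPACITY = 811;  // table size used by the Dictionary class

// Division-method hash, as in the class's hash helper (key must be non-negative).
std::size_t hash_value(int key)
{
    assert(key >= 0);
    return static_cast<std::size_t>(key) % CAPACITY;
}

// Next slot in the probe sequence; index CAPACITY - 1 wraps around to 0.
std::size_t next_index(std::size_t index)
{
    return (index + 1) % CAPACITY;
}

// Hypothetical helper: the first `count` slots that linear probing visits for `key`.
std::vector<std::size_t> probe_sequence(int key, std::size_t count)
{
    std::vector<std::size_t> slots;
    std::size_t index = hash_value(key);
    for (std::size_t i = 0; i < count; ++i) {
        slots.push_back(index);
        index = next_index(index);
    }
    return slots;
}
```

A key that hashes to the last slot, such as 1621 (1621 mod 811 = 810), visits [810], then [0], then [1].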
Function implementations
// constructor
template <class RecType>
Dictionary<RecType>::Dictionary( )
{
    used = 0;
    for (int x = 0; x < CAPACITY; x++)
        data[x].key = NEVER_USED;   // mark every slot as never used
}
Function implementations
// helper function find_index: if key is present, sets found to true and index
// to its location; otherwise found is false and index is where the search stopped
template <class RecType>
void Dictionary<RecType>::find_index(int key, bool& found, size_t& index) const
{
    size_t count = 0;   // number of slots examined so far
    index = hash(key);
    while ((count < CAPACITY) && (!never_used(index)) && (data[index].key != key))
    {
        count++;
        index = next_index(index);
    }
    found = (data[index].key == key);
}
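Function implementations
template <class RecType>
void Dictionary<RecType>::insert(const RecType& entry)
{
    bool already_present;   // true if entry already in table
    size_t index;           // location of new entry

    assert(entry.key >= 0);   // negative keys are reserved as markers
    find_index(entry.key, already_present, index);
    if (!already_present)
    {
        assert(size( ) < CAPACITY);   // there must be a vacant slot
        index = hash(entry.key);      // probe from the hash value for the first vacant slot,
        while (!is_vacant(index))     // so PREVIOUSLY_USED slots are reused too
            index = next_index(index);
        used++;
        data[index] = entry;
    }
}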
Function implementations
template <class RecType>
void Dictionary<RecType>::remove(int key)
{
    bool found;     // true if key occurs somewhere in table
    size_t index;   // index of key value

    assert(key >= 0);   // must be valid key
    find_index(key, found, index);
    if (found)
    {
        data[index].key = PREVIOUSLY_USED;   // marker lets later searches continue past this slot
        used--;
    }
}
Function implementations
template <class RecType>
bool Dictionary<RecType>::is_present(int key) const
{
    bool found;
    size_t index;

    assert(key >= 0);
    find_index(key, found, index);
    return found;
}
Function implementations
template <class RecType>
void Dictionary<RecType>::find(int key, bool& found, RecType& result) const
{
    size_t index;

    assert(key >= 0);
    find_index(key, found, index);
    if (found)
        result = data[index];   // copy the record out to the caller
}