Backup Slides
An Example of Hash Function Implementation
    struct MyStruct {
        string str;
        string item;
    };

    // The hash function maps the key "obj.str" to the index of a bucket
    int hash( const MyStruct & obj )
    {
        int product = 1;
        int modulus = 0;
        for ( int i = 0; i < 3 && i < int( obj.str.length( ) ); i++ )
            product *= ( obj.str[ i ] - 64 );
        modulus = product % SIZE1;
        return modulus;
    }
Uniform Hashing • When the elements are spread evenly (or nearly evenly) among the indexes of a hash table, it is called uniform hashing • If the elements are spread evenly, such that the number of elements at any index is less than some small constant, uniform hashing allows a search to be done in Θ( 1 ) time • The hash function largely determines whether or not we will have uniform hashing
Bad Hash Functions • h( k ) = 5 is obviously a bad hash function • h( k ) = k % 100 could be a bad hash function if there is meaning attached to parts of a key • Consider that the key might be an employee id • The last two digits may give the state of birth
Ideal Hash Function for Uniform Hashing • The hash table size should be a prime number that is not too close to a power of 2 • 31 is a prime number but is too close to a power of 2 • 97 is a prime number not too close to a power of 2 • A good hash function might be: h( k ) = k % 97
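The difference between h( k ) = k % 100 and h( k ) = k % 97 can be seen by counting how many keys land at each index. The following is a minimal sketch, not part of the original slides: the helper names longest_chain and make_employee_ids are hypothetical, and the id layout (a serial number followed by one of five two-digit state codes) is an assumption made only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

int mod100( int k ) { return k % 100; }   // keeps only the last two digits
int mod97( int k )  { return k % 97; }    // prime table size, mixes all digits

// Returns the largest number of keys that share one index
// (hypothetical helper, not from the slides).
int longest_chain( const std::vector<int> & keys, int (*h)( int ), int table_size )
{
    assert( table_size > 0 );
    std::vector<int> counts( table_size, 0 );
    for ( int k : keys ) {
        int index = h( k );
        assert( index >= 0 && index < table_size );
        counts[ index ]++;
    }
    return *std::max_element( counts.begin( ), counts.end( ) );
}

// Assumed id layout: serial number * 100 + state code, with only
// five state codes (1 through 5) in use.
std::vector<int> make_employee_ids( )
{
    std::vector<int> ids;
    for ( int serial = 0; serial < 100; serial++ )
        ids.push_back( serial * 100 + ( 1 + serial % 5 ) );
    return ids;
}
```

With these 100 assumed ids, k % 100 uses only five indexes and puts 20 keys at each, while k % 97 leaves no index with more than 4 keys.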
Hash Functions Can be Made for Keys that are Strings
    int sum = 0;
    for ( int i = 0; i < int( str.length( ) ); i++ )
        sum += str[ i ];
    hash_index = sum % 97;
Speed vs. Memory Conservation • Speed comes from reducing the number of collisions • In a search, if there are no collisions, the first element in the linked list is the one we want to find (fast) • Therefore, the greatest speed comes from making the hash table much larger than the number of keys (but there will still be an occasional collision)
Speed vs. Memory Conservation (cont.) • Each empty LinkedList object in a hash table wastes 8 bytes of memory (4 bytes for the start pointer and 4 bytes for the current pointer) • The best memory conservation comes from trying to reduce the number of empty LinkedList objects • The hash table size would be made much smaller than the number of keys (there would still be an occasional empty linked list)
Hash Table Design • Decide whether speed or memory conservation is more important (and how much more important) for the application • Come up with a good table size which • Allows for the use of a good hash function • Strikes the appropriate balance between speed and memory conservation
Ideal Hash Tables • Can we have a hash function which guarantees that there will be no collisions? • Yes: h( k ) = k • Each key k is unique; therefore, each index produced from h( k ) is unique • Consider 300 employees that have a 4 digit id • A hash table size of 10000 with the hash function above guarantees the best possible speed
Ideal Hash Tables (cont.) • Should we use LinkedList objects if there are no collisions? • Suppose each Employee object takes up 100 bytes • An array size of 10000 Employee objects with only 300 used indexes will have 9700 unused indexes, each taking up 100 bytes • Best to use LinkedList objects (in this case) – the 9700 unused indexes will only use 8 bytes each • Additional space can be saved by not storing the employee id in the object (if no collisions)
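The memory comparison above can be worked out with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the assumption that each linked list node stores one object plus a 4-byte next pointer is ours (the slides give only the 8-byte size of an empty LinkedList).

```cpp
#include <cassert>

// Bytes used when every index of the array holds a full object.
long direct_array_bytes( long table_size, long object_bytes )
{
    assert( table_size >= 0 && object_bytes >= 0 );
    return table_size * object_bytes;
}

// Bytes used when every index holds a LinkedList object and only the
// stored elements occupy nodes (object plus an assumed next pointer).
long chained_table_bytes( long table_size, long used_count, long object_bytes,
                          long list_bytes, long next_pointer_bytes )
{
    assert( table_size >= 0 && used_count >= 0 );
    return table_size * list_bytes
         + used_count * ( object_bytes + next_pointer_bytes );
}
```

With the numbers from the slide, direct_array_bytes( 10000, 100 ) gives 1,000,000 bytes, while chained_table_bytes( 10000, 300, 100, 8, 4 ) gives 80,000 + 31,200 = 111,200 bytes under the assumed node layout.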
Ideal Hash Tables (cont.) • Can we have a hash table without any collisions and without any empty linked lists? • Sometimes. Consider 300 employees with id’s from 0 to 299. We can make a hash table size of 300, and use h( k ) = k • LinkedList objects wouldn’t be necessary and in fact, would waste space • It would also not be necessary to store the employee id in the object
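When the ids run from 0 to 299 and h( k ) = k, the table reduces to a plain array indexed by id. A minimal sketch of that idea follows, assuming a hypothetical DirectTable class that stores only a name per employee; std::vector stands in for the Array class, whose interface is not shown on these slides.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical direct-address table: the id is the index, so it is
// not stored in the record and no linked lists are needed.
class DirectTable
{
public:
    explicit DirectTable( int size ) : records( size > 0 ? size : 0 )
    {
        assert( size > 0 );
    }

    // h( k ) = k: the id is used directly as the index
    bool store( int id, const std::string & name )
    {
        if ( !valid( id ) )
            return false;
        records[ id ] = name;
        return true;
    }

    bool retrieve( int id, std::string & name ) const
    {
        if ( !valid( id ) )
            return false;
        name = records[ id ];
        return true;
    }

private:
    bool valid( int id ) const
    {
        return id >= 0 && id < int( records.size( ) );
    }

    std::vector<std::string> records;
};
```

A table built with DirectTable( 300 ) accepts ids 0 through 299 and rejects anything else, with no collisions and no unused indexes once all 300 employees are stored.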
Implementing a Hash Table • We’ll implement a HashTable with linked lists (chaining) • Without chaining, a hash table can become full • If the client has the ideal hash table mentioned on the previous slide, he/she would be better off just using an Array for the hash table
Implementing a Hash Function • We shouldn’t write the hash function • The client should write the hash function that he/she would like to use • Then, the client should pass the hash function that he/she wrote as a parameter into the constructor of the HashTable class • This can be implemented with function pointers
Function Pointers • A function pointer is a pointer that holds the address of a function • The function can be called using the function pointer instead of the function name
Function Pointers (cont.) • Example of a function pointer declaration: float (*funcptr) (string);
Function Pointers (cont.) • Example of a function pointer declaration: float (*funcptr) (string); funcptr is the name of the pointer; the name can be chosen like any other pointer name
Function Pointers (cont.) • Example of a function pointer declaration: float (*funcptr) (string); The parentheses are necessary.
Function Pointers (cont.) • Example of a function pointer declaration: float (*funcptr) (string); The return type of the function that funcptr can point to is given here (in this case, the return type is a float)
Function Pointers (cont.) • Example of a function pointer declaration: float (*funcptr) (string); The parameter list of a function that funcptr can point to is given here – in this case, there is only one parameter of string type.
Function Pointers (cont.) • Example of a function pointer declaration: float (*funcptr) (string); • What would a function pointer declaration look like if the function it can point to has a void return type and accepts two integer parameters?
Function Pointers (cont.) void (*fp) (int, int);
Function Pointers (cont.)
    void (*fp) (int, int);

    void foo( int a, int b )
    {
        cout << "a is: " << a << endl;
        cout << "b is: " << b << endl;
    }
foo is a function that fp can point to
Assigning the Address of a Function to a Function Pointer
    void (*fp) (int, int);

    void foo( int a, int b )
    {
        cout << "a is: " << a << endl;
        cout << "b is: " << b << endl;
    }

    fp = foo;
The address of foo is assigned to fp like this
Calling a Function by Using a Function Pointer
    void (*fp) (int, int);

    void foo( int a, int b )
    {
        cout << "a is: " << a << endl;
        cout << "b is: " << b << endl;
    }

    fp( 5, 10 );
Once the address of foo has been assigned to fp, the foo function can be called using fp like this
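Putting the declaration, assignment, and call together, a function pointer can also be received as a parameter, which is the mechanism a class can use to accept a client-written function. The sketch below is illustrative only: add, multiply, and apply are hypothetical names that do not appear in the slides.

```cpp
#include <cassert>
#include <iostream>

void foo( int a, int b )
{
    std::cout << "a is: " << a << std::endl;
    std::cout << "b is: " << b << std::endl;
}

int add( int a, int b )      { return a + b; }
int multiply( int a, int b ) { return a * b; }

// Hypothetical helper: op is a function pointer parameter, so the
// caller decides which function is actually run.
int apply( int (*op)( int, int ), int x, int y )
{
    assert( op != nullptr );
    return op( x, y );
}
```

For example, after void (*fp) (int, int) = foo; the call fp( 5, 10 ) runs foo, and apply( add, 5, 10 ) and apply( multiply, 5, 10 ) run two different functions through the same parameter.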
Design of the HashTable Constructor • Once the client designs the hash function, the client passes the name of the hash function as a parameter into the HashTable constructor • The HashTable constructor accepts the parameter using a function pointer in this parameter location • The address of the function is saved to a function pointer in the private section • Then, the hash table can call the hash function that the client made by using the function pointer
HashTable.h
    #include "LinkedList.h"
    #include "Array.h"

    template <class DataType>
    class HashTable
    {
    public:
        HashTable( int (*hf)(const DataType &), int s );
        bool insert( const DataType & newObject );
        bool retrieve( DataType & retrieved );
        bool remove( DataType & removed );
        bool update( DataType & updateObject );
        void makeEmpty( );
HashTable.h continued…
HashTable.h
    private:
        Array< LinkedList<DataType> > table;
        int (*hashfunc)(const DataType &);
    };

    #include "HashTable.cpp"
The space between the two > characters in the declaration of table is necessary (before C++11, >> is parsed as the shift operator)
Clientele • The LinkedList class is being used in the HashTable class, along with the Array class • Note that when one writes a class the clientele extends beyond the main programmers who might use the class • The clientele extends to people who write other classes
HashTable Constructor
    template <class DataType>
    HashTable<DataType>::HashTable(
        int (*hf)(const DataType &), int s )
        : table( s )
    {
        hashfunc = hf;
    }
The call table( s ) to the Array constructor creates an Array of LinkedLists of type DataType
HashTable Constructor (cont.)
    template <class DataType>
    HashTable<DataType>::HashTable(
        int (*hf)(const DataType &), int s )
        : table( s )
    {
        hashfunc = hf;
    }
The DataType for Array is LinkedList<DataType> (DataType in Array is different from DataType in HashTable)
HashTable Constructor (cont.)
    template <class DataType>
    HashTable<DataType>::HashTable(
        int (*hf)(const DataType &), int s )
        : table( s )
    {
        hashfunc = hf;
    }
In the Array constructor, an Array of size s is made, having LinkedList elements. When this array is created, the LinkedList constructor is called for each element.
insert
    template <class DataType>
    bool HashTable<DataType>::insert(
        const DataType & newObject )
    {
        int location = hashfunc( newObject );
        if ( location < 0 || location >= table.length( ) )
            return false;
        table[ location ].insert( newObject );
        return true;
    }
Keep in mind that table[ location ] is a LinkedList object.
retrieve
    template <class DataType>
    bool HashTable<DataType>::retrieve(
        DataType & retrieved )
    {
        int location = hashfunc( retrieved );
        if ( location < 0 || location >= table.length( ) )
            return false;
        if ( !table[ location ].retrieve( retrieved ) )
            return false;
        return true;
    }
remove
    template <class DataType>
    bool HashTable<DataType>::remove(
        DataType & removed )
    {
        int location = hashfunc( removed );
        if ( location < 0 || location >= table.length( ) )
            return false;
        if ( !table[ location ].remove( removed ) )
            return false;
        return true;
    }
update
    template <class DataType>
    bool HashTable<DataType>::update(
        DataType & updateObject )
    {
        int location = hashfunc( updateObject );
        if ( location < 0 || location >= table.length( ) )
            return false;
        if ( !table[ location ].find( updateObject ) )
            return false;
        table[ location ].replace( updateObject );
        return true;
    }
makeEmpty
    template <class DataType>
    void HashTable<DataType>::makeEmpty( )
    {
        for ( int i = 0; i < table.length( ); i++ )
            table[ i ].makeEmpty( );
    }
Using HashTable
    #include <iostream>
    #include <string>
    #include "HashTable.h"

    using namespace std;

    struct MyStruct {
        string str;
        int num;
        // const so that it can be called on const objects
        bool operator ==( const MyStruct & r ) const { return str == r.str; }
    };
str will be the key
Using HashTable (cont.)
    #include <iostream>
    #include <string>
    #include "HashTable.h"

    using namespace std;

    struct MyStruct {
        string str;
        int num;
        // const so that it can be called on const objects
        bool operator ==( const MyStruct & r ) const { return str == r.str; }
    };
It is necessary to overload the == operator for the LinkedList functions
Using HashTable (cont.)
    #include <iostream>
    #include <string>
    #include "HashTable.h"

    using namespace std;

    struct MyStruct {
        string str;
        int num;
        // const so that it can be called on const objects
        bool operator ==( const MyStruct & r ) const { return str == r.str; }
    };
In the actual code, a comment is placed above HashTable, telling the client that this is needed and what is required.
Using HashTable (cont.)
    const int SIZE1 = 97, SIZE2 = 199;

    int hash1( const MyStruct & obj );
    int hash2( const MyStruct & obj );

    int main( )
    {
        HashTable<MyStruct> ht1( hash1, SIZE1 ),
                            ht2( hash2, SIZE2 );
Using HashTable (cont.)
        MyStruct myobj;

        myobj.str = "elephant";
        myobj.num = 25;
        ht1.insert( myobj );

        myobj.str = "giraffe";
        myobj.num = 50;
        ht2.insert( myobj );
        … // other code using the hash tables …
Using HashTable (cont.)
        return 0;
    }

    int hash1( const MyStruct & obj )
    {
        int sum = 0;
        for ( int i = 0; i < 3 && i < int( obj.str.length( ) ); i++ )
            sum += obj.str[ i ];
        return sum % SIZE1;
    }
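Since LinkedList.h and Array.h are not shown on these slides, the following is a minimal self-contained sketch of the same design that swaps in std::vector and std::list for them. SketchHashTable is a hypothetical name, only insert and retrieve are shown, and this is not the author's implementation; it only illustrates how a client-supplied hash function pointer and chaining fit together.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <string>
#include <vector>

template <class DataType>
class SketchHashTable
{
public:
    // hf is the client's hash function; s is the number of chains
    SketchHashTable( int (*hf)( const DataType & ), int s )
        : table( s > 0 ? s : 1 ), hashfunc( hf )
    {
        assert( hf != nullptr );
    }

    bool insert( const DataType & newObject )
    {
        int location = hashfunc( newObject );
        if ( !inRange( location ) )
            return false;
        table[ location ].push_back( newObject );
        return true;
    }

    // Finds the element equal (by ==) to retrieved and copies it back
    bool retrieve( DataType & retrieved ) const
    {
        int location = hashfunc( retrieved );
        if ( !inRange( location ) )
            return false;
        const std::list<DataType> & chain = table[ location ];
        auto it = std::find( chain.begin( ), chain.end( ), retrieved );
        if ( it == chain.end( ) )
            return false;
        retrieved = *it;
        return true;
    }

private:
    bool inRange( int location ) const
    {
        return location >= 0 && location < int( table.size( ) );
    }

    std::vector< std::list<DataType> > table;   // stands in for Array< LinkedList<DataType> >
    int (*hashfunc)( const DataType & );
};

struct MyStruct {
    std::string str;   // the key
    int num;
    bool operator ==( const MyStruct & r ) const { return str == r.str; }
};

const int SIZE1 = 97;

int hash1( const MyStruct & obj )
{
    int sum = 0;
    for ( int i = 0; i < 3 && i < int( obj.str.length( ) ); i++ )
        sum += obj.str[ i ];
    return sum % SIZE1;
}
```

A client would construct SketchHashTable<MyStruct> ht( hash1, SIZE1 ), insert an object whose str is "elephant", and later call retrieve with an object holding only that key to get the stored num back.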