Searching / Hashing
Big-O of Search Algorithms
• Sequential Search - O(n)
  • unsorted list in an array (did not do this term)
  • linked list, even if sorted (gradelnklist files)
• Binary Search - O(log₂ n)
  • sorted list in an array (gradelistarray files)
  • BST if reasonably balanced (tree files)
• Hashing - O(1) - constant search time!
Hashing Fundamentals
• Records (structs) are stored in an array
• Records are not sorted on a particular key
• Hash function – calculates the position in the array in which a record is stored, based on the key
• Ideally, the hash function should be one-to-one, i.e., two different keys should not "hash" to the same position
Hashing Fundamentals
• To add an item to a hash table, use the hash function to calculate its position and store it directly there
• To locate (search for) an item in a hash table, use the hash function to calculate its position and look for it directly there
• Unused positions in the hash table need to have a default "empty" value stored
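The add and locate steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names Record, TABLESIZE, EMPTY, add and locate are hypothetical (not from the course files), the table is tiny, the hash function is a toy one, and collisions are simply rejected rather than handled.

```cpp
#include <string>

const int TABLESIZE = 10;   // assumed tiny table size, for illustration only
const long EMPTY = -1;      // assumed "empty" marker: no real key is negative

struct Record {
    long key = EMPTY;       // every position starts out holding the empty value
    std::string name;
};

Record table[TABLESIZE];

// Toy hash function (an assumption): the position is the key's last digit.
int h(long key) {
    return static_cast<int>(key % TABLESIZE);
}

// Store the record directly at its hashed position.
// Returns false if that position is already occupied (a collision).
bool add(long key, const std::string& name) {
    int indx = h(key);
    if (table[indx].key != EMPTY)
        return false;
    table[indx].key = key;
    table[indx].name = name;
    return true;
}

// Look directly at the hashed position; nullptr means "not in the table".
const Record* locate(long key) {
    int indx = h(key);
    return (table[indx].key == key) ? &table[indx] : nullptr;
}
```

Both add and locate touch exactly one array position, which is where the O(1) search time comes from. Calling add(42, "Ann") and then add(52, "Bob") would make the second call return false in this sketch, because both keys hash to position 2.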
Example 1
Student Records with SSN as Key
Hash function: h(ssn) = ssn

    // one array position for every possible 9-digit SSN
    const int MAXSTUDENTS = 1000000000;

    struct StudentType {
        long ssn;
        string lastname;
        string firstname;
        char midinit;
        float gpa;
    };

    StudentType students[MAXSTUDENTS];
Example 1
• Pros
  • very simple hash function
  • hash function is one-to-one
• Cons
  • a LOT of wasted space: this example wastes 99.9999% of array positions
Example 2
Student Records with SSN as Key
Hash function: h(ssn) = ssn % 10000

    // one array position for each possible value of the last 4 digits
    const int MAXSTUDENTS = 10000;

    struct StudentType {
        long ssn;
        string lastname;
        string firstname;
        char midinit;
        float gpa;
    };

    StudentType students[MAXSTUDENTS];
Example 2
• Pros
  • still a relatively simple hash function
• Cons
  • still some wasted space, but not as much (only wasting 90% of array positions)
  • hash function is no longer guaranteed to be one-to-one
  • no longer guaranteed O(1) searching
Collisions
• A collision occurs when two keys hash to the same value
• As seen in Example 1, a perfect hash function can waste a lot of space, but ...
• ... reducing the wasted space can introduce the possibility of collisions!
• Want to find the optimal array size and hash function to minimize wasted space and minimize collisions
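A collision under Example 2's hash function can be checked with a few lines of code. The sketch below assumes the same h(ssn) = ssn % 10000; the helper name collide and the SSN values mentioned afterwards are made up for illustration.

```cpp
const int MAXSTUDENTS = 10000;

// Example 2's hash function: keep only the last four digits of the SSN.
int h(long ssn) {
    return static_cast<int>(ssn % MAXSTUDENTS);
}

// Hypothetical helper: true when two different keys hash to the same position.
bool collide(long a, long b) {
    return a != b && h(a) == h(b);
}
```

For instance, the made-up SSNs 123456789 and 987656789 both end in 6789, so h returns 6789 for each and collide reports true. Any two students whose SSNs share the last four digits would compete for the same array position.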
Ways to Handle Collisions: Linear Probing
• To insert a record
  • Start by calculating the hash value
  • Starting at that position, do sequential search for an empty spot
  • Store record in empty spot

    // EMPTY is the default "empty" value; assumes at least one spot is still empty
    indx = h(insertssn);
    while (students[indx].ssn != EMPTY)
        indx = (indx + 1) % MAXSTUDENTS;   // wrap around at the end of the array
    students[indx] = newstudentrecord;
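The insertion steps above could be written as a complete function along these lines. This is a sketch under assumptions, not the course's implementation: the table is shrunk to 7 positions so that collisions are easy to produce, EMPTY = -1 is an assumed sentinel, the record is cut down to two fields, and the probe count is capped so that a full table returns -1 instead of looping forever.

```cpp
#include <string>

const int MAXSTUDENTS = 7;   // assumed tiny table so collisions are easy to see
const long EMPTY = -1;       // assumed "empty" marker: no real SSN is negative

struct StudentType {
    long ssn = EMPTY;        // every position starts out empty
    std::string lastname;
};

StudentType students[MAXSTUDENTS];

// Division-method hash function.
int h(long ssn) {
    return static_cast<int>(ssn % MAXSTUDENTS);
}

// Linear probing insert.
// Returns the index where the record was stored, or -1 if the table is full.
int insertStudent(long ssn, const std::string& lastname) {
    int indx = h(ssn);
    for (int probes = 0; probes < MAXSTUDENTS; ++probes) {
        if (students[indx].ssn == EMPTY) {
            students[indx].ssn = ssn;
            students[indx].lastname = lastname;
            return indx;
        }
        indx = (indx + 1) % MAXSTUDENTS;   // try the next spot, wrapping around
    }
    return -1;   // every position was occupied
}
```

With this table size, keys 10, 17 and 24 all hash to position 3, so inserting them in that order places them at positions 3, 4 and 5.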
Ways to Handle Collisions: Linear Probing
• To locate (search for) a record
  • Start by calculating the hash value
  • Starting at that position, do sequential search for the record
  • If an empty spot is encountered before finding the record, the record is not there

    // EMPTY is the default "empty" value; assumes at least one spot is still empty
    indx = h(searchssn);
    while (students[indx].ssn != searchssn && students[indx].ssn != EMPTY)
        indx = (indx + 1) % MAXSTUDENTS;
    if (students[indx].ssn == searchssn)
        { /* found student with searchssn */ }
    else
        { /* no student in table with searchssn */ }
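The search steps above might look like this as a self-contained function. As before, this is a sketch under assumptions: a 7-position table, an assumed EMPTY = -1 sentinel, a reduced record, and a cap on the number of probes so that searching a completely full table for a missing key still terminates.

```cpp
#include <string>

const int MAXSTUDENTS = 7;   // assumed tiny table, for illustration only
const long EMPTY = -1;       // assumed "empty" marker: no real SSN is negative

struct StudentType {
    long ssn = EMPTY;        // every position starts out empty
    std::string lastname;
};

StudentType students[MAXSTUDENTS];

// Division-method hash function.
int h(long ssn) {
    return static_cast<int>(ssn % MAXSTUDENTS);
}

// Linear probing search.
// Returns the index of the record with searchssn, or -1 if it is not there.
int findStudent(long searchssn) {
    int indx = h(searchssn);
    for (int probes = 0; probes < MAXSTUDENTS; ++probes) {
        if (students[indx].ssn == searchssn)
            return indx;                   // found the record
        if (students[indx].ssn == EMPTY)
            return -1;                     // hit an empty spot first: not in table
        indx = (indx + 1) % MAXSTUDENTS;   // keep probing, wrapping around
    }
    return -1;   // probed every position without finding it
}
```

If key 10 sits at position 3 and key 17 (which also hashes to 3) was pushed to position 4, then findStudent(17) probes position 3, moves on, and returns 4, while findStudent(24) stops at the empty position 5 and returns -1.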
Ways to Handle Collisions: Chaining
• Have each element in the array be the head pointer to a linked list of records whose keys hash to the same value
• Slightly better than linear probing - limits the length of the sequential search required once collisions start to occur
• Requires more storage than linear probing, even if the same table size is used, because of the space required for pointers
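A chained table can be sketched as below. Note the substitution: instead of hand-written head pointers, this sketch uses the standard library's singly linked list, std::forward_list, for each array element. The table size of 5, the reduced record, and the function names are assumptions for illustration.

```cpp
#include <forward_list>
#include <string>

const int TABLESIZE = 5;     // assumed tiny table, for illustration only

struct StudentType {
    long ssn;
    std::string lastname;
};

// Each array element is a linked list of the records that hash to that position.
std::forward_list<StudentType> chains[TABLESIZE];

// Division-method hash function.
int h(long ssn) {
    return static_cast<int>(ssn % TABLESIZE);
}

// Add the record to the front of the list at its hashed position.
void insertStudent(long ssn, const std::string& lastname) {
    StudentType rec;
    rec.ssn = ssn;
    rec.lastname = lastname;
    chains[h(ssn)].push_front(rec);
}

// Sequentially search only the one list the key hashes to.
// Returns nullptr if no record with that SSN is stored.
const StudentType* findStudent(long ssn) {
    for (const StudentType& rec : chains[h(ssn)])
        if (rec.ssn == ssn)
            return &rec;
    return nullptr;
}
```

An empty list plays the role of the "empty" value here, so no sentinel key is needed. Keys 12 and 17 both hash to position 2 and simply share that list, and a search for either one never looks at records stored in the other four positions.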
Possible Hash Functions
• Division Method
  h(key) = key % MAXSTUDENTS
• Folding
  break key into "pieces" and do calculations with the pieces
  ex: h(123 45 6321) = 12 + 34 + 56 + 32 + 1 = 135
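The folding example above can be reproduced with a short function. This sketch assumes the pieces are two-digit groups taken from the left of the key's decimal digits, as in the slide's example, and it additionally applies the division method to keep the result inside the table; that final step and the name foldSum are assumptions, not something the slide specifies.

```cpp
#include <string>

const int MAXSTUDENTS = 10000;

// Break the key's decimal digits into two-digit pieces, left to right,
// and add the pieces together (the last piece may be a single digit).
int foldSum(long key) {
    std::string digits = std::to_string(key);
    int sum = 0;
    for (std::string::size_type i = 0; i < digits.size(); i += 2)
        sum += std::stoi(digits.substr(i, 2));
    return sum;
}

// Assumed combination: fold first, then use division to stay within the table.
int h(long key) {
    return foldSum(key) % MAXSTUDENTS;
}
```

For the key 123456321 the pieces are 12, 34, 56, 32 and 1, so foldSum returns 135, matching the example. One caveat of this sketch: a key stored as a number loses any leading zeros, so an SSN that begins with 0 would be split into different pieces than its written form suggests.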
For more info
• Read pages 647-662 in text
• Look at problems 29, 32, 33 (only columns for 29 and 32)
• Food for thought: Do you think a hash table is a good storage option for a group of records that you want to display in various sorted orders?