Searching / Hashing

kaelem

Presentation Transcript


  1. Searching / Hashing

  2. Big-O of Search Algorithms
     • Sequential Search - O(n)
        • unsorted list in an array (did not do this term)
        • linked list, even if sorted (gradelnklist files)
     • Binary Search - O(log₂ n)
        • sorted list in an array (gradelistarray files)
        • BST if reasonably balanced (tree files)
     • Hashing - O(1) - constant search time (when keys do not collide)!
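
     The difference between O(n) and O(log₂ n) can be seen by counting how many items each search examines. The following is a minimal sketch on a plain sorted int array; the function names and the comparisons counter are illustrative and are not taken from the course files named above.

```cpp
#include <cassert>

// Sequential search: examines items one by one, up to n of them.
// Returns the index of target, or -1 if it is absent.
int sequentialSearch(const int list[], int n, int target, int& comparisons) {
    assert(n >= 0);
    comparisons = 0;
    for (int i = 0; i < n; i++) {
        comparisons++;
        if (list[i] == target)
            return i;
    }
    return -1;
}

// Binary search on a SORTED array: halves the remaining range each step,
// so it examines roughly log2(n) items.
int binarySearch(const int list[], int n, int target, int& comparisons) {
    assert(n >= 0);
    comparisons = 0;
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;
        comparisons++;
        if (list[mid] == target)
            return mid;
        if (list[mid] < target)
            low = mid + 1;
        else
            high = mid - 1;
    }
    return -1;
}
```

     Searching for the last item of an 8-item sorted array examines 8 items sequentially but only 4 with binary search.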

  3. Hashing Fundamentals
     • Records (structs) are stored in an array
     • Records are not sorted on a particular key
     • Hash function: calculates the position in the array in which a record is stored, based on the key
     • Ideally, the hash function should be one-to-one, i.e., two different keys should not "hash" to the same position

  4. Hashing Fundamentals
     • To add an item to a hash table, use the hash function to calculate its position and store it directly there
     • To locate (search for) an item in a hash table, use the hash function to calculate its position and look for it directly there
     • Unused positions in the hash table need to have a default "empty" value stored
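
     The add and locate steps above can be sketched as follows. This is only a sketch under assumptions: the 10-slot table, the Record struct, the hash function key % TABLE_SIZE, and the use of -1 as the "empty" value are all illustrative choices that do not come from the slides. Collisions are detected but not resolved here.

```cpp
#include <cassert>

// Illustrative values, not from the slides.
const int TABLE_SIZE = 10;
const long EMPTY = -1;      // default "empty" value for unused positions

struct Record {
    long key;
    float gpa;
};

Record table[TABLE_SIZE];

// Mark every position as unused before the table is used.
void initTable() {
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i].key = EMPTY;
}

// Hash function: maps a key to a position in the array.
int h(long key) {
    assert(key >= 0);
    return static_cast<int>(key % TABLE_SIZE);
}

// Store the record directly at its hashed position.
// Returns false if that position is already occupied.
bool add(const Record& r) {
    int indx = h(r.key);
    if (table[indx].key != EMPTY)
        return false;
    table[indx] = r;
    return true;
}

// Look directly at the hashed position.
// Returns the index of the record, or -1 if the key is not there.
int locate(long key) {
    int indx = h(key);
    return (table[indx].key == key) ? indx : -1;
}
```

     Both add and locate touch exactly one array position, which is where the constant search time comes from.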

  5. Example 1: Student Records with SSN as Key
     Hash function: h(ssn) = ssn

         const int MAXSTUDENTS = 1000000000;   // one position per possible 9-digit SSN

         struct StudentType {
             long ssn;
             string lastname;
             string firstname;
             char midinit;
             float gpa;
         };

         StudentType students[MAXSTUDENTS];

  6. Example 1
     Pros:
     • very simple hash function
     • hash function is one-to-one
     Cons:
     • a LOT of wasted space: this example wastes 99.9999% of array positions

  7. Example 2: Student Records with SSN as Key
     Hash function: h(ssn) = ssn % 10000

         const int MAXSTUDENTS = 10000;   // one position per possible value of the last four digits

         struct StudentType {
             long ssn;
             string lastname;
             string firstname;
             char midinit;
             float gpa;
         };

         StudentType students[MAXSTUDENTS];

  8. Example 2
     Pros:
     • still a relatively simple hash function
     Cons:
     • still some wasted space, but not as much (only wasting 90% of array positions)
     • hash function is no longer guaranteed to be one-to-one
     • no longer guaranteed O(1) searching
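
     The wasted-space figures in Examples 1 and 2 can be checked with a little arithmetic. The slides do not say how many students are stored; both percentages are consistent with about 1,000 students, so that number is used here as an assumption. The function name is illustrative.

```cpp
#include <cassert>

// Percentage of array positions left unused when numRecords records
// are stored in a table with tableSize positions.
double wastedPercent(long long tableSize, long long numRecords) {
    assert(tableSize > 0 && numRecords >= 0 && numRecords <= tableSize);
    return 100.0 * static_cast<double>(tableSize - numRecords) / tableSize;
}
```

     With the assumed 1,000 students, wastedPercent(1000000000, 1000) gives about 99.9999 for Example 1 and wastedPercent(10000, 1000) gives 90 for Example 2.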

  9. Collisions
     • A collision occurs when two different keys hash to the same value
     • As seen in Example 1, a perfect (one-to-one) hash function can waste a lot of space, but reducing the wasted space can introduce the possibility of collisions
     • Goal: find an array size and hash function that keep both wasted space and collisions to a minimum
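
     A collision under the Example 2 hash function can be shown directly. In this sketch the helper collide is a hypothetical name, and the second SSN used in the note below is made up purely for illustration; it shares its last four digits with the SSN 123 45 6321.

```cpp
#include <cassert>

const int MAXSTUDENTS = 10000;

// Hash function from Example 2: keeps the last four digits of the SSN.
int h(long ssn) {
    assert(ssn >= 0);
    return static_cast<int>(ssn % MAXSTUDENTS);
}

// Two different keys collide when they hash to the same position.
bool collide(long ssnA, long ssnB) {
    return ssnA != ssnB && h(ssnA) == h(ssnB);
}
```

     Here h(123456321) and h(987656321) are both 6321, so collide(123456321, 987656321) is true and the two records compete for the same array position.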

  10. Ways to Handle Collisions: Linear Probing
     • To insert a record
        • Start by calculating the hash value
        • Starting at that position, do a sequential search for an empty spot
        • Store the record in the empty spot

         // EMPTY is the default "empty" key value; assumes the table is not full
         indx = h(insertssn);
         while (students[indx].ssn != EMPTY)
             indx = (indx + 1) % MAXSTUDENTS;   // wrap around at the end of the array
         students[indx] = newstudentrecord;
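
     A compilable version of the insertion steps might look like the sketch below. It assumes a 7-slot table so the probing is easy to trace, a trimmed-down StudentType, and -1 as the "empty" key; none of these values come from the slides. It also adds a probe counter so that insertion gives up on a full table, a guard that the slide's loop does not have.

```cpp
#include <cassert>

const int MAXSTUDENTS = 7;   // small table, assumed for illustration
const long EMPTY = -1;       // assumed "empty" key value

struct StudentType {
    long ssn;
    float gpa;
};

StudentType students[MAXSTUDENTS];

// Mark every position as unused.
void initTable() {
    for (int i = 0; i < MAXSTUDENTS; i++)
        students[i].ssn = EMPTY;
}

int h(long ssn) {
    return static_cast<int>(ssn % MAXSTUDENTS);
}

// Insert with linear probing.
// Returns the position used, or -1 if every position is occupied.
int insertStudent(const StudentType& rec) {
    assert(rec.ssn >= 0);
    int indx = h(rec.ssn);
    for (int probes = 0; probes < MAXSTUDENTS; probes++) {
        if (students[indx].ssn == EMPTY) {
            students[indx] = rec;
            return indx;
        }
        indx = (indx + 1) % MAXSTUDENTS;   // try the next spot, wrapping around
    }
    return -1;
}
```

     With this table, keys 10 and 17 both hash to position 3, so the second one is stored at position 4. Keys 6 and 13 both hash to position 6, so the second one wraps around to position 0.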

  11. Ways to Handle Collisions: Linear Probing
     • To locate (search for) a record
        • Start by calculating the hash value
        • Starting at that position, do a sequential search for the record
        • If an empty spot is encountered before finding the record, the record is not there

         // EMPTY is the default "empty" key value; assumes at least one position is empty
         indx = h(searchssn);
         while (students[indx].ssn != searchssn && students[indx].ssn != EMPTY)
             indx = (indx + 1) % MAXSTUDENTS;
         if (students[indx].ssn == searchssn)
             { /* found student with searchssn */ }
         else
             { /* no student in table with searchssn */ }
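
     The search steps can be sketched in compilable form as follows. To keep the block short, the table holds keys only and is filled in by hand, as if the keys 10, 17, 24, and 6 had been inserted with linear probing into a 7-slot table. The table size, the -1 "empty" value, and the name findStudent are assumptions made for illustration. A probe counter stops the loop on a full table, which the slide's loop does not do.

```cpp
#include <cassert>

const int MAXSTUDENTS = 7;   // small table, assumed for illustration
const long EMPTY = -1;       // assumed "empty" key value

// Keys 10, 17, and 24 all hash to position 3, so they occupy 3, 4, and 5.
// Key 6 hashes to position 6.
long ssns[MAXSTUDENTS] = {EMPTY, EMPTY, EMPTY, 10, 17, 24, 6};

int h(long ssn) {
    return static_cast<int>(ssn % MAXSTUDENTS);
}

// Search with linear probing.
// Returns the position of searchssn, or -1 if it is not in the table.
int findStudent(long searchssn) {
    assert(searchssn >= 0);
    int indx = h(searchssn);
    for (int probes = 0; probes < MAXSTUDENTS; probes++) {
        if (ssns[indx] == searchssn)
            return indx;                   // found it
        if (ssns[indx] == EMPTY)
            return -1;                     // empty spot reached first: not there
        indx = (indx + 1) % MAXSTUDENTS;   // keep probing, wrapping around
    }
    return -1;                             // every position probed without success
}
```

     Searching for 24 starts at position 3 and probes positions 3, 4, and 5 before finding it. Searching for 31, which also hashes to 3, probes 3, 4, 5, 6, and then wraps to position 0, where the empty spot shows it is absent.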

  12. Ways to Handle Collisions: Chaining
     • Have each element in the array be the head pointer to a linked list of records whose keys hash to the same value
     • Slightly better than linear probing: limits the length of the sequential search required once collisions start to occur
     • Requires more storage than linear probing, even if the same table size is used, because of the space required for pointers
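
     Chaining can be sketched as below. Instead of hand-written head pointers, this sketch swaps in std::forward_list (the standard library's singly linked list) for each array element. The 7-slot table, the trimmed-down StudentType, and the function names are assumptions for illustration.

```cpp
#include <cassert>
#include <forward_list>

const int TABLESIZE = 7;   // small table, assumed for illustration

struct StudentType {
    long ssn;
    float gpa;
};

// Each array element is a linked list of the records that hash to that position.
std::forward_list<StudentType> table[TABLESIZE];

int h(long ssn) {
    assert(ssn >= 0);
    return static_cast<int>(ssn % TABLESIZE);
}

// Add the record to the front of the list at its hashed position.
void insertStudent(const StudentType& rec) {
    table[h(rec.ssn)].push_front(rec);
}

// Sequentially search only the one list the key hashes to.
// Returns a pointer to the record, or nullptr if it is not in the table.
const StudentType* findStudent(long ssn) {
    for (const StudentType& rec : table[h(ssn)]) {
        if (rec.ssn == ssn)
            return &rec;
    }
    return nullptr;
}
```

     A search never looks at records that hashed to a different position, which is why colliding keys elsewhere in the table do not lengthen it.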

  13. Possible Hash Functions
     • Division Method
        h(key) = key % MAXSTUDENTS
     • Folding
        break the key into "pieces" and do calculations with the pieces
        ex: h(123 45 6321) = 12 + 34 + 56 + 32 + 1 = 135
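
     The folding example can be reproduced with the sketch below, which splits the key's decimal digits into two-digit pieces from left to right and adds them. The name foldHash is illustrative. Storing the SSN as a long drops any leading zeros, so this sketch assumes keys without them.

```cpp
#include <cassert>
#include <string>

// Folding: split the key's decimal digits into two-digit pieces,
// left to right, and add the pieces together.
int foldHash(long ssn) {
    assert(ssn >= 0);
    std::string digits = std::to_string(ssn);
    int sum = 0;
    for (std::string::size_type i = 0; i < digits.size(); i += 2)
        sum += std::stoi(digits.substr(i, 2));   // the last piece may be one digit
    return sum;
}
```

     foldHash(123456321) adds 12, 34, 56, 32, and 1 to give 135, matching the example. If the sum can exceed the table size, it can be combined with the division method, for example foldHash(ssn) % MAXSTUDENTS.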

  14. For more info
     • Read pages 647-662 in text
     • Look at problems 29, 32, 33 (only columns for 29 and 32)
     • Food for thought: Do you think a hash table is a good storage option for a group of records that you want to display in various sorted orders?
