
Hashing



  1. Hashing CS 105

  2. Hashing - Introduction • In a dictionary, if it can be arranged such that the key is also the index to the array that stores the entries, searching and inserting items would be very fast • Example: Empdata[1000], index = employee ID number • search for employee with emp. number = 500 • return: Empdata[500] • Running Time: O(1)
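As a minimal sketch of the direct-indexing idea on this slide, the class below uses the employee ID itself as the array index. The class name `EmployeeDirectory`, the method names, and the use of `String` records are assumptions made for illustration; the slides only name the array `Empdata[1000]`.

```java
// Hypothetical sketch: the key (employee ID) is used directly as the array index.
class EmployeeDirectory {
    // Mirrors Empdata[1000] from the slide; valid IDs are 0..999.
    private final String[] empData = new String[1000];

    // Insert is a single array write: O(1).
    void insert(int id, String record) {
        empData[id] = record;
    }

    // Search is a single array read: O(1). Returns null if nothing is stored.
    String find(int id) {
        return empData[id];
    }
}
```

Searching for employee number 500 is then just `find(500)`, with no comparison against other entries.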

  3. Hash table • Hash table: a data structure, implemented as an array of objects, where the search keys correspond to the array indexes • Insert and find operations involve straightforward array accesses: O(1) time complexity

  4. About hash tables • The first example was relatively easy because the employee number is an integer • Problem 1: possible integer key values might be too large; creating an array that big might be impractical • Need to map large integer values to smaller array indexes • Problem 2: what if the key is a word written in the English alphabet (e.g. last names)? • Need to map names to integers (indexes)

  5. Large numbers -> small numbers • Hash function - converts a number from a large range into a number from a smaller range (the range of array indices) • Size of array • Rule of thumb: the array size should be about twice the size of the data set (2s) • for 50,000 words, use an array of 100,000 elements

  6. Hash function and modulo • Simplest hash function - achieved by using the modulo function (returns the remainder) • for example, 33 % 10 = 3 • General formula: LargeNumber % Smallrange
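The modulo formula can be sketched in Java as follows. The class and method names are hypothetical, and the use of `Math.floorMod` instead of the plain `%` operator is an assumption added here so that negative keys still yield a valid index; for non-negative keys the two give the same result.

```java
// Hypothetical helper: maps a large key into the index range 0..tableSize-1.
class ModuloHash {
    static int hash(long largeNumber, int tableSize) {
        // Math.floorMod never returns a negative result for a positive divisor,
        // whereas % would return a negative remainder for a negative key.
        return (int) Math.floorMod(largeNumber, (long) tableSize);
    }
}
```

For example, `ModuloHash.hash(33, 10)` evaluates to 3, matching the slide.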

  7. Hash functions for names • Sum of Digits Method • map the alphabet A-Z to the numbers 1 to 26 (a=1, b=2, c=3, etc.) • add up the values of the letters in the word • For example, “cats” • (c=3, a=1, t=20, s=19) • 3+1+20+19=43 • “cats” will be stored using index = 43 • Can use the modulo operation (%) if you need to map to a smaller array
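A minimal sketch of this letter-sum method is shown below. The class and method names are hypothetical, and the code assumes the word contains only the ASCII letters A-Z or a-z; anything else would produce a meaningless sum.

```java
import java.util.Locale;

// Hypothetical sketch of the letter-sum hash described on the slide.
class LetterSumHash {
    // Adds the letter values a=1, b=2, ..., z=26.
    // Assumes the word contains only ASCII letters.
    static int letterSum(String word) {
        int sum = 0;
        for (char c : word.toLowerCase(Locale.ROOT).toCharArray()) {
            sum += c - 'a' + 1;
        }
        return sum;
    }

    // Applies the modulo step when the array is smaller than the largest sum.
    static int hash(String word, int tableSize) {
        return letterSum(word) % tableSize;
    }
}
```

Here `LetterSumHash.letterSum("cats")` gives 43, and `LetterSumHash.hash("cats", 10)` gives 3.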

  8. Collisions • Problem • Too many words map to the same index • “was”, “tin”, “give”, “tend”, “moan”, “tick” and several other words also add up to 43 • These are called collisions (cases where two different search keys hash to the same index value) • Can occur even when dealing with integers • Suppose the size of the hash table is 100 • Keys 158 and 358 hash to the same value (58) when using the modulo hash function
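To make the collisions concrete, the sketch below groups words by their letter-sum index. Any group holding more than one word is a set of colliding keys. The class and method names are hypothetical, and the code again assumes words made only of ASCII letters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: finds which words collide under the letter-sum hash.
class CollisionDemo {
    // Letter-sum hash with a=1 ... z=26; assumes ASCII letters only.
    static int letterSum(String word) {
        int sum = 0;
        for (char c : word.toLowerCase(Locale.ROOT).toCharArray()) {
            sum += c - 'a' + 1;
        }
        return sum;
    }

    // Groups the words by hash index; a group with two or more words is a collision.
    static Map<Integer, List<String>> groupByIndex(String[] words) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (String w : words) {
            groups.computeIfAbsent(letterSum(w), k -> new ArrayList<>()).add(w);
        }
        return groups;
    }
}
```

Passing the slide's words ("cats", "was", "tin", "give", "tend", "moan", "tick") puts all seven in the group for index 43. The integer case is the same idea: `158 % 100` and `358 % 100` are both 58.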

  9. Collision resolution policy • Need to know what to do when a collision occurs; i.e., during an insert operation, what if the array slot is already occupied? • Most common policy: go to the next available slot • “Wrap around” the array if necessary • Consequence: when searching, use the hash function, but first check whether the element found there is the one you are looking for. If not, try the following slots. • How do you know if the element is not in the array?

  10. Probe sequence • The sequence of array indexes (slots) in which a given key value may be stored • The first index in the probe sequence is the home position, the value of the hash function. The next indexes are the alternative slots • Example: suppose the array size is 10, and the hash function is h(K) = K%10. The probe sequence for K=25 is: • 5, 6, 7, 8, 9, 0, 1, 2, 3, 4 • Here, we assume the most common collision resolution policy of going to the next slot: p(K,i) = i • Goal: the probe sequence should exhaust all array slots
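The probe sequence for the next-slot policy p(K,i) = i can be generated with a short loop. This is a sketch with hypothetical names, and it assumes a non-negative integer key.

```java
// Hypothetical sketch: lists the slots visited under linear probing, p(K, i) = i.
class ProbeSequence {
    static int[] linearProbeSequence(int key, int tableSize) {
        int home = key % tableSize; // home position h(K); assumes key >= 0
        int[] sequence = new int[tableSize];
        for (int i = 0; i < tableSize; i++) {
            // Wrap around the end of the array with the modulo.
            sequence[i] = (home + i) % tableSize;
        }
        return sequence;
    }
}
```

For key 25 and a table of size 10, this yields 5, 6, 7, 8, 9, 0, 1, 2, 3, 4. Every slot appears exactly once, which is the "exhaust array slots" goal.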

  11. Recap: hash table operations (M is the array size) • Insert object Obj with key value K
        home <- h(K)
        for i <- 0 to M-1 do
          pos = (home + p(K,i)) % M
          if HT[pos] is null then
            HT[pos] <- Obj
            return
          else if HT[pos].getKey() = K then
            throw exception “error: duplicate record” // alternative: overwrite
        throw exception “error: table is full”
      • Finding an object with key value K
        home <- h(K)
        for i <- 0 to M-1 do
          pos = (home + p(K,i)) % M
          if HT[pos] is null then
            throw exception “not found”
          else if HT[pos].getKey() = K then
            return HT[pos]
        throw exception “not found” // every slot was probed
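The insert and find operations can be sketched in Java as follows, using linear probing. This is an illustration under stated assumptions rather than a definitive implementation: the class name `LinearProbingTable`, the `int` keys, and the `String` values are all hypothetical. The sketch takes the modulus by the table size and tests for an empty slot before comparing keys, so that it never reads the key of a null entry. Unlike the recap, `find` returns null instead of throwing when the key is absent.

```java
// Hypothetical sketch of a fixed-size hash table with linear probing.
class LinearProbingTable {
    private final Integer[] keys;   // null marks an empty slot
    private final String[] values;
    private final int m;            // table size (M on the slide)

    LinearProbingTable(int size) {
        m = size;
        keys = new Integer[m];
        values = new String[m];
    }

    // Home position h(K); floorMod keeps the index non-negative.
    private int h(int key) {
        return Math.floorMod(key, m);
    }

    void insert(int key, String value) {
        int home = h(key);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;          // p(K, i) = i
            if (keys[pos] == null) {           // free slot: store here
                keys[pos] = key;
                values[pos] = value;
                return;
            }
            if (keys[pos] == key) {            // Integer is unboxed for the comparison
                throw new IllegalArgumentException("duplicate key: " + key);
            }
        }
        throw new IllegalStateException("table is full");
    }

    // Returns the stored value, or null if the key is not in the table.
    String find(int key) {
        int home = h(key);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (keys[pos] == null) {
                return null;                   // empty slot ends the search
            }
            if (keys[pos] == key) {
                return values[pos];
            }
        }
        return null;                           // probed every slot without a match
    }
}
```

With a table of size 10, inserting keys 25, 35, and 45 places them in slots 5, 6, and 7, and `find(45)` probes those three slots in order.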

  12. Hash table operations • Note: although insert and find run in O(1) time under typical conditions, the time complexity in the worst case is O(n) • Something to think about: characterize the worst-case scenarios for insert and find

  13. Removing elements • Removing an element from a hash table during a delete operation poses a problem • If we set the corresponding hash table entry to null, then subsequent find operations might not work properly • Recall that for the find algorithm, seeing a null means the target element is not found, but in fact the element might be in a later slot • Solution: tombstone • Arrange it so that deleted entries seem null when inserting, but don’t seem null when searching • Requires a simple flag on the objects stored
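One way to sketch the tombstone idea is to keep a per-slot state instead of relying on null alone. The class below is a hypothetical illustration (the names, `int` keys, and `String` values are assumptions): a DELETED slot is skipped over when searching but reused when inserting.

```java
import java.util.Arrays;

// Hypothetical sketch: linear probing with tombstones for deleted entries.
class TombstoneTable {
    private enum State { EMPTY, OCCUPIED, DELETED }

    private final State[] states;
    private final int[] keys;
    private final String[] values;
    private final int m; // table size

    TombstoneTable(int size) {
        m = size;
        states = new State[m];
        Arrays.fill(states, State.EMPTY);
        keys = new int[m];
        values = new String[m];
    }

    // Returns the slot holding the key, or -1 if it is absent.
    private int findSlot(int key) {
        int home = Math.floorMod(key, m);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (states[pos] == State.EMPTY) {
                return -1; // a never-used slot ends the search
            }
            if (states[pos] == State.OCCUPIED && keys[pos] == key) {
                return pos;
            }
            // A tombstone or a different key: keep probing.
        }
        return -1;
    }

    String find(int key) {
        int pos = findSlot(key);
        return pos < 0 ? null : values[pos];
    }

    // Marks the slot as a tombstone rather than resetting it to EMPTY.
    boolean remove(int key) {
        int pos = findSlot(key);
        if (pos < 0) {
            return false;
        }
        states[pos] = State.DELETED;
        values[pos] = null;
        return true;
    }

    void insert(int key, String value) {
        if (findSlot(key) >= 0) {
            throw new IllegalArgumentException("duplicate key: " + key);
        }
        int home = Math.floorMod(key, m);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (states[pos] != State.OCCUPIED) { // EMPTY or tombstone: reuse the slot
                states[pos] = State.OCCUPIED;
                keys[pos] = key;
                values[pos] = value;
                return;
            }
        }
        throw new IllegalStateException("table is full");
    }
}
```

With a table of size 10 holding keys 25, 35, and 45 (slots 5, 6, 7), removing 35 leaves a tombstone in slot 6, so `find(45)` still probes past it and reaches slot 7. A later insert of 55 reuses slot 6.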

  14. Hash tables in Java • java.util.Hashtable • Important methods for the Hashtable class • put(Object key, Object entry) • Object get(Object key) • remove(Object key) • boolean containsKey(Object key)
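A short usage sketch of the four methods listed above follows. The class name, the helper method, and the sample keys and values are made up for illustration. The code uses the generic form `Hashtable<Integer, String>`, available since Java 5, whereas the slide lists the older `Object`-based signatures.

```java
import java.util.Hashtable;

// Hypothetical demo of put, get, remove, and containsKey on java.util.Hashtable.
class HashtableDemo {
    static Hashtable<Integer, String> buildSample() {
        Hashtable<Integer, String> employees = new Hashtable<>();
        employees.put(500, "Ada");    // insert two entries
        employees.put(158, "Grace");
        employees.remove(158);        // delete one of them again
        return employees;
    }

    public static void main(String[] args) {
        Hashtable<Integer, String> employees = buildSample();
        System.out.println(employees.get(500));         // prints Ada
        System.out.println(employees.containsKey(158)); // prints false
    }
}
```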

  15. Summary • Hash tables implement the dictionary data structure and enable O(1) insert, find, and remove operations • Caveat: O(n) in the worst case because of the possibility of collisions • Requires a hash function (maps keys to array indices) and a collision resolution policy • A probe sequence is the sequence of array slots that an object could occupy, given its key • In Java: use the Hashtable class
