
Data Structures CS 214 2nd Term 2010-2011

Cairo University. Faculty of Computers and Information. Data Structures CS 214 2nd Term 2010-2011. 19 – Hashing and Hash Tables. Chapter 10 in Adam Drozdek. No Slides on Hash Tables. Read the covered topics from the book: Chapter 10 from 10.1 to 10.4. Exclude 10.2.3 Bucket Hashing.


Presentation Transcript


  1. Cairo University Faculty of Computers and Information Data Structures CS 214 2nd Term 2010-2011 19 – Hashing and Hash Tables Chapter 10 in Adam Drozdek

  2. No Slides on Hash Tables • Read the covered topics from the book • Chapter 10 from 10.1 to 10.4 • Exclude 10.2.3 Bucket Hashing • Exclude 10.4.2 FHCD Algorithm • In sorting algorithms we take: Quick, Merge, Heap, Insertion, Selection and Bubble sort. • Cancel Shell Sort

  3. Good Slides but do not replace reading the book • http://courses.cs.vt.edu/~cs3114/Fall10/Notes/T15.HashTables.pdf • http://courses.cs.vt.edu/~cs3114/Fall10/Notes/T16.HashFunctions.pdf

  4. Introduction to Perfect Hashing Schemes • Perfect Hash Functions • Perfect Hashing: An Example using Cichelli’s Method • Applications of Hashing

  5. Perfect Hash Functions • The hash tables we have seen so far allow the dynamic insertion and removal of items. • The possibility of collisions cannot be ruled out in such schemes: a new key may collide with an existing one. • Can we rule out the possibility of collisions if we know more about the items to be loaded? • A perfect hash function is a one-to-one mapping that guarantees the absence of collisions, given that all the keys are known beforehand. • A perfect hash function that wastes no table space is said to be minimal perfect.

  6. A Perfect Hash Function for Strings • R. J. Cichelli gave an algorithm for finding perfect hash functions for strings. • It is a search algorithm in the function space to find a good hashing function. • He proposes the hash function: h(s) = (s.length() + g(s[0]) + g(s[s.length()-1])) % n, where n is the number of strings and g maps a letter to an integer. • The function g is to be constructed so that h(s) is unique for each string s. • For this to be a perfect hash function, the proper mapping of letters to integers is needed.
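
As a minimal sketch, the hash function on this slide can be written in Python as follows. The function name `cichelli_hash` and the sample g-values are assumptions made for illustration only; the slide itself does not fix any g-values at this point.

```python
def cichelli_hash(s: str, g: dict, n: int) -> int:
    """h(s) = (length of s + g(first letter) + g(last letter)) mod n."""
    return (len(s) + g[s[0]] + g[s[-1]]) % n


# Hypothetical g-values, chosen only to exercise the function.
g = {"D": 0, "O": 0}
print(cichelli_hash("DO", g, 9))      # (2 + 0 + 0) % 9
print(cichelli_hash("DOWNTO", g, 9))  # (6 + 0 + 0) % 9
```

Only the length and the two end letters of a string enter the hash, so two strings with the same length and the same first and last letters can never be separated by any choice of g.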

  7. Perfect Hashing: Outline of Cichelli's Algorithm • Given a fixed collection of words, Cichelli's algorithm proceeds thus:
     1. Find how often each letter occurs as the first or the last letter of a word;
     2. For each word, find the sum of the frequencies of its first and last letters;
     3. Sort the words in descending order of this sum;
     4. Go to the next word (select the next word from Step 3);
     5. Choose g-values for any unassigned first/last letters of the current word. If a conflict occurs, backtrack and choose again;
     6. If there are more words to process, go to Step 4.
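
The outline above can be sketched as a small backtracking search. This is an illustrative Python sketch rather than Cichelli's original program: the function name `cichelli`, the parameter `max_g`, and the order in which candidate g-values are tried are all assumptions. Ties in Step 3 are broken by input order, so the search may visit words in a different order than a hand-worked example and may return a different, equally valid, set of g-values.

```python
from collections import Counter
from itertools import product


def cichelli(words, max_g=3):
    """Search for g-values that make
    h(s) = (len(s) + g(s[0]) + g(s[-1])) % len(words) collision-free.

    Returns a dict letter -> g-value, or None if no assignment is found.
    """
    n = len(words)
    # Steps 1-3: letter frequencies, per-word sums, descending sort.
    freq = Counter(w[0] for w in words) + Counter(w[-1] for w in words)
    ordered = sorted(words, key=lambda w: freq[w[0]] + freq[w[-1]], reverse=True)

    g = {}        # letter -> currently assigned g-value
    used = set()  # table slots already taken

    def place(i):
        if i == n:
            return True
        w = ordered[i]
        # End letters of this word that have no g-value yet (deduplicated).
        new = [c for c in dict.fromkeys((w[0], w[-1])) if c not in g]
        # Step 5: try every combination of g-values for the new letters.
        for values in product(range(max_g + 1), repeat=len(new)):
            g.update(zip(new, values))
            slot = (len(w) + g[w[0]] + g[w[-1]]) % n
            if slot not in used:
                used.add(slot)
                if place(i + 1):       # Steps 4 and 6: move to the next word
                    return True
                used.remove(slot)      # backtrack: free the slot
        for c in new:                  # undo the tentative assignments
            g.pop(c, None)
        return False

    return g if place(0) else None


words = ["DO", "DOWNTO", "ELSE", "END", "IF", "IN", "TYPE", "VAR", "WITH"]
g = cichelli(words)
print(g)
```

Because the table size equals the number of words, any collision-free result is automatically minimal perfect.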

  8. Example 1: Illustrating Perfect Hashing • Use Cichelli's algorithm to build a minimal perfect hash function for the following nine strings: DO DOWNTO ELSE END IF IN TYPE VAR WITH

  9. Example 1: Solution • For Step 1 in the algorithm, we count how often each letter occurs as the first or last letter of a word:
     Letter:    D  O  E  I  F  N  T  V  R  W  H
     Frequency: 3  2  4  2  1  1  1  1  1  1  1
     • Next we find, for each word, the sum of the frequencies of its first and last letters: DO = 5 (D + O = 3 + 2), DOWNTO = 5, ELSE = 8, END = 7, IF = 3, IN = 3, TYPE = 5, VAR = 2, WITH = 2
     • Sorting the keywords by decreasing sum yields: ELSE END DOWNTO DO TYPE IN IF VAR WITH
     • We are now at Step 5, the heart of the algorithm. We try the words in this order:
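
Steps 1 to 3 for this example can be checked with a few lines of Python. This is a sketch for verification only; the variable names are illustrative. Words with equal sums may be ordered arbitrarily, so the tie order produced here need not match the one chosen on the slide.

```python
from collections import Counter

words = ["DO", "DOWNTO", "ELSE", "END", "IF", "IN", "TYPE", "VAR", "WITH"]

# Step 1: count each letter once per appearance as a first or last letter.
freq = Counter()
for w in words:
    freq[w[0]] += 1
    freq[w[-1]] += 1

# Step 2: per-word sum of the two end-letter frequencies.
score = {w: freq[w[0]] + freq[w[-1]] for w in words}

# Step 3: descending order of the sum (ties keep their input order).
ordered = sorted(words, key=score.get, reverse=True)

print(dict(freq))
print(score)
print(ordered)
```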

  10. Example 1: Cichelli's Method (cont'd) (an asterisk marks a collision; the braces list the slots occupied so far)
      s = ELSE    g(E) = 0   h(s) = s.length()+g(E)+g(E) = 4    {4}
      s = END     g(D) = 0   h(s) = s.length()+g(E)+g(D) = 3    {3,4}
      s = DOWNTO  g(O) = 0   h(s) = s.length()+g(D)+g(O) = 6    {3,4,6}
      s = DO                 h(s) = s.length()+g(D)+g(O) = 2    {2,3,4,6}
      s = TYPE    g(T) = 0   h(s) = s.length()+g(T)+g(E) = 4*   {2,3,4,6}
      s = TYPE    g(T) = 1   h(s) = s.length()+g(T)+g(E) = 5    {2,3,4,5,6}

  11. Example 1: Cichelli's Method (cont'd)
      s = IN   g(I) = 0, g(N) = 0   h(s) = s.length()+g(I)+g(N) = 2*   {2,3,4,5,6}
      s = IN   g(I) = 1, g(N) = 0   h(s) = s.length()+g(I)+g(N) = 3*   {2,3,4,5,6}
      s = IN   g(I) = 2, g(N) = 0   h(s) = s.length()+g(I)+g(N) = 4*   {2,3,4,5,6}
      s = IN   g(I) = 3, g(N) = 0   h(s) = s.length()+g(I)+g(N) = 5*   {2,3,4,5,6}
      s = IN   g(I) = 3, g(N) = 1   h(s) = s.length()+g(I)+g(N) = 6*   {2,3,4,5,6}
      s = IN   g(I) = 3, g(N) = 2   h(s) = s.length()+g(I)+g(N) = 7    {2,3,4,5,6,7}

  12. Example 1: Cichelli's Method (cont'd)
      s = IF   g(F) = 0   h(s) = s.length()+g(I)+g(F) = 5*   {2,3,4,5,6,7}
      s = IF   g(F) = 1   h(s) = s.length()+g(I)+g(F) = 6*   {2,3,4,5,6,7}
      s = IF   g(F) = 2   h(s) = s.length()+g(I)+g(F) = 7*   {2,3,4,5,6,7}
      s = IF   g(F) = 3   h(s) = s.length()+g(I)+g(F) = 8    {2,3,4,5,6,7,8}
      • The steps for VAR and WITH are left as an exercise.
      • You should get g(V) = g(R) = g(W) = g(H) = 3, giving h(VAR) = 9 mod 9 = 0 and h(WITH) = 10 mod 9 = 1.

  13. Example 1: Cichelli's Algorithm (cont'd) • With the g-values E = D = O = 0, T = 1, N = 2, I = F = V = R = W = H = 3, h is minimal perfect. • Based on these g-values the strings will be stored as shown below:
      Slot:  0    1     2   3    4     5     6       7   8
      Word:  VAR  WITH  DO  END  ELSE  TYPE  DOWNTO  IN  IF
      • The hash table above is fully occupied, with no empty slots. • Note that if there are empty slots or there is a collision, then the g-value assignments are in error.
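
The g-values on this slide can be checked mechanically. The short Python sketch below (variable names are illustrative) places each word in its slot and stops with an error if two words ever claim the same slot.

```python
# g-values taken from the slide.
g = {"E": 0, "D": 0, "O": 0, "T": 1, "N": 2,
     "I": 3, "F": 3, "V": 3, "R": 3, "W": 3, "H": 3}
words = ["DO", "DOWNTO", "ELSE", "END", "IF", "IN", "TYPE", "VAR", "WITH"]

n = len(words)
table = [None] * n
for w in words:
    slot = (len(w) + g[w[0]] + g[w[-1]]) % n
    if table[slot] is not None:
        raise ValueError(f"collision at slot {slot}: {table[slot]} and {w}")
    table[slot] = w

print(table)
# ['VAR', 'WITH', 'DO', 'END', 'ELSE', 'TYPE', 'DOWNTO', 'IN', 'IF']
```

Every slot from 0 to 8 is filled exactly once, which is what "minimal perfect" means for nine keys in a table of size nine.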

  14. Cichelli's Algorithm: Comments • The search process in this algorithm is exponential in the worst case. • The algorithm is therefore applicable only to small sets of strings. • It does not guarantee that a perfect hash function can be found. • The program is usually run only once and its result is incorporated into another program. • There are extensions to this technique that avoid its limitations. • For our purpose in this course, Cichelli's algorithm is sufficient.

  15. Hashing: A Birthday Surprise! • Collisions occur more frequently than people normally think! • According to the famous Birthday Surprise 'paradox', if there are 24 or more people in a room, there is a greater than 50% chance that two or more of them share a birthday. • In other words, if the records of 24 people are loaded into a hash table of size 365, there is a greater than 50% chance of a collision. • Moreover, when 47 records are loaded, the chance of a collision is better than 19 in 20. • This justifies the effort spent searching for minimal perfect hash functions!
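
These figures can be reproduced with a short calculation. The Python sketch below assumes that every key is equally likely to land in any of the m slots (the usual idealisation behind the birthday argument); the function name `collision_probability` is illustrative.

```python
def collision_probability(k: int, m: int = 365) -> float:
    """Probability that at least two of k uniformly hashed keys share a slot."""
    p_all_distinct = 1.0
    for i in range(k):
        # The (i+1)-th key must avoid the i slots already occupied.
        p_all_distinct *= (m - i) / m
    return 1.0 - p_all_distinct


for k in (23, 24, 47):
    print(k, round(collision_probability(k), 4))
```

Under this assumption the 50% mark is in fact already passed at 23 keys, and 47 is the smallest number of keys for which the collision probability exceeds 95%.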

  16. Applications of Hashing There are many areas where hashing is applicable. Here are common ones: • Databases: Efficient retrieval of records. • Compilers: Symbol tables. • Games: Lookup board configuration to find the move that goes with it. • UNIX shell: Quick command lookup. • IP Routing: Fast IP address lookup.

  17. Exercises
      1. In our examples using Cichelli's method, we selected g-values from {0,1,2,3}. Explain how choosing g-values from a bigger set affects the efficiency of the algorithm as compared to its chances of finding a minimal perfect hash function.
      2. Use Cichelli's method to build a minimal perfect hash function for the following 11 Java keywords: class, extends, implements, synchronized, throws, import, protected, instanceof, return, abstract, this. Assume that g-values must be integers in the set {0,1,2,3} only.

  18. Another Example on Cichelli's Algorithm • http://courses.cs.vt.edu/~cs3114/Fall10/Notes/T17.PerfectHashFunctions.pdf
