Hashing – Part I

Presentation Transcript


  1. Hashing – Part I CS 367 – Introduction to Data Structures

  2. Searching • Up to now the only way to find a key is to search through all or part of the data • linked list: O(n) • AVL tree: O(log n) • binary search of array: O(log n) • If there is a lot of data, or the data is searched very often, these times can be long • given the key, we would like to get the data directly

  3. Hashing • The solution to this problem is to put the key through a function that says exactly where the data is (or where it should be placed) • this function is called a hash function • h(key) = integer • the integer obtained from a hash function can be used as an index into an array • if the hash function is perfect – always generates a unique integer for different keys – the time to place and access data is O(1)

  4. Hashing • [Figure: keys A, M, and X are passed through a hashing function, which places each one in a slot of an array indexed 0 to 11]

  5. Hashing Functions • So what is the hashing function? • the simplest hashing function is to use the division remainder • assume the array is 1000 elements in size • translate the data into a number, n • h(n) = n % 1000

  6. Hashing Functions • simple example • consider a small school • each student is tracked by a 4-digit ID number • each student’s ID# begins with a digit for the year they started • 2000 -> 0, 2001 -> 1, 2002 -> 2, etc. • all student records are stored in an array • maximum of 1000 students per year • let’s look at records for all sophomores • assume they were freshmen in 2001

  7. Hashing Functions • To find John’s record in the array: 1009 % 1000 = 9, so go to index number 9 • Mary’s ID#: 1000 -> Mary’s records at index 0 • Pete’s ID#: 1004 -> Pete’s records at index 4 • John’s ID#: 1009 -> John’s records at index 9 • Amy’s ID#: 1011 -> Amy’s records at index 11
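The lookup on slides 5 to 7 can be sketched in Java as below. This is a minimal illustration, not the course's code: the class name `StudentIdHash`, the method `h`, and the use of a `String` array for the records are assumptions made for the example.

```java
class StudentIdHash {
    // Array size assumed on slide 5.
    static final int TABLE_SIZE = 1000;

    // Division-remainder hash: h(n) = n % 1000.
    static int h(int id) {
        return id % TABLE_SIZE;
    }

    public static void main(String[] args) {
        String[] records = new String[TABLE_SIZE];

        // Place each sophomore's record at the index given by the hash.
        records[h(1000)] = "Mary's records";
        records[h(1004)] = "Pete's records";
        records[h(1009)] = "John's records";
        records[h(1011)] = "Amy's records";

        System.out.println(h(1009));           // prints 9
        System.out.println(records[h(1009)]);  // prints John's records
    }
}
```

Finding a record takes one remainder operation and one array access, regardless of how many students are stored.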

  8. Generating n • The previous example is rather simplistic in that it is hashing already unique integers • seems kind of pointless • maybe not if the integers are large • consider the UW’s 10 digit ID numbers • Often it is desirable to hash some other kind of data • a person’s name for example

  9. Generating n • How is a string converted into an integer? • the simplest method is to add all of the ASCII values for each character together • example • convert amy into an integer • a = 97; m = 109; y = 121 • a + m + y = 327 • there are lots of other ways to convert strings to integers • what are a few of them?
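The ASCII-sum conversion from slide 9 might look like this in Java. The class and method names (`AsciiSum`, `toNumber`) are hypothetical, and the sketch assumes the string contains only ASCII characters.

```java
class AsciiSum {
    // Add the character codes of every character in the string.
    static int toNumber(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(toNumber("amy")); // prints 327 (97 + 109 + 121)
        System.out.println(toNumber("may")); // prints 327 as well
    }
}
```

Because addition ignores order, any rearrangement of the same letters produces the same integer, which is one reason to look at other conversion methods.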

  10. Hashing Functions • There are millions of possible hashing functions • we will not be considering them all • basically, anything you can think of to generate an integer could be used as a hashing function • Mathematicians have spent lots of time and effort to come up with some basic methods that work pretty well

  11. Division • We have already seen the division method • it involves taking the remainder of division • h(key) = key % tableSize • A few notes about making this work better • table size should be a prime number • usually a good method if very little is known about the keys • the remaining methods will all use division as the final step in their calculation
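A small sketch of the division method, comparing a non-prime table size with a prime one. The names `DivisionHash` and `h`, the sample keys, and the sizes 100 and 101 are chosen for illustration; using `Math.floorMod` instead of `%` is an added assumption so that negative keys still give a valid index.

```java
class DivisionHash {
    // h(key) = key mod tableSize, kept non-negative with floorMod.
    static int h(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    public static void main(String[] args) {
        int[] keys = {100, 200, 300};
        for (int key : keys) {
            // Left: table size 100 (not prime). Right: table size 101 (prime).
            System.out.println(h(key, 100) + " " + h(key, 101));
        }
        // prints:
        // 0 100
        // 0 99
        // 0 98
    }
}
```

With a table size of 100, every key that is a multiple of 100 lands in slot 0; with the prime size 101 the same keys are spread over different slots.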

  12. Folding • Separate the key into various equally sized parts and then recombine them • usually with addition • Two kinds of folding • shift folding • just add the various parts together as they are • boundary folding • reverse the order of every other part and add them together

  13. Folding • Consider an SSN as a key • break it into 3 parts • first 3 digits, second 3, last 3 • Shift folding example • SSN = 123-45-6789 • first = 123; second = 456; third = 789 • h(key) = (first + second + third) % size • h(SSN) = 1368 % tableSize • Boundary folding example • h(key) = (first + R(second) + third) % size, where R reverses the digits • h(key) = (123 + 654 + 789) % size = 1566 % size
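Both kinds of folding on the SSN example can be sketched as follows. The class `FoldingDemo`, its method names, and the table size 1009 are assumptions for the example; the SSN is passed as a nine-digit integer with the dashes removed.

```java
class FoldingDemo {
    // Reverse the digits of a three-digit part, e.g. 456 -> 654.
    static int reverse3(int part) {
        return (part % 10) * 100 + ((part / 10) % 10) * 10 + part / 100;
    }

    // Shift folding: add the three parts as they are.
    static int shiftFolding(int ssn, int tableSize) {
        int first = ssn / 1_000_000;
        int second = (ssn / 1_000) % 1_000;
        int third = ssn % 1_000;
        return (first + second + third) % tableSize;
    }

    // Boundary folding: reverse every other part (here the middle one) before adding.
    static int boundaryFolding(int ssn, int tableSize) {
        int first = ssn / 1_000_000;
        int second = (ssn / 1_000) % 1_000;
        int third = ssn % 1_000;
        return (first + reverse3(second) + third) % tableSize;
    }

    public static void main(String[] args) {
        int ssn = 123_456_789;
        System.out.println(shiftFolding(ssn, 1009));    // 1368 % 1009 = 359
        System.out.println(boundaryFolding(ssn, 1009)); // 1566 % 1009 = 557
    }
}
```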

  14. Increasing Performance • Consider using shifting and exclusive OR’ing to hash the key • exclusive OR parts together to generate the index • Example • consider the string abcdefgh • if each part is a letter, just exclusive OR them • ‘a’ ^ ‘b’ ^ ‘c’ ^ ‘d’ ^ ‘e’ ^ ‘f’ ^ ‘g’ ^ ‘h’ • often, a character is represented by 8 bits • what’s the problem with this? • might be better to exclusive OR chunks of the string • “abcd” ^ “efgh” • why were four characters chosen in this case?
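One way to explore the question on slide 14 is to XOR the characters one at a time and look at the result. The class `XorChars` and method `xorChars` are hypothetical names, and the sketch assumes ASCII input.

```java
class XorChars {
    // XOR every character of the string into a single value.
    static int xorChars(String s) {
        int result = 0;
        for (int i = 0; i < s.length(); i++) {
            result ^= s.charAt(i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(xorChars("abcdefgh")); // prints 8
        System.out.println(xorChars("hgfedcba")); // prints 8
    }
}
```

XOR of 8-bit values never sets a bit above bit 7, so the result can take at most 256 distinct values however long the string is, and reordering the characters does not change it. Packing four 8-bit characters into a chunk fills a 32-bit `int` before the chunks are combined.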

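  15. Increasing Performance
      int shiftFold(String key, int tableSize) {
          byte[] st = key.getBytes();
          int result = 0;
          for (int i = 0; i < st.length; i += 4) {
              int chunk = 0;
              // pack up to four bytes into one int: shift first, then OR in the next byte
              for (int j = 0; (j < 4) && (j + i < st.length); j++) {
                  chunk = (chunk << 8) | (st[j + i] & 0xFF);  // & 0xFF avoids sign extension
              }
              result = result ^ chunk;
          }
          // floorMod keeps the index non-negative even when result is negative
          return Math.floorMod(result, tableSize);
      }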

  16. Increasing Performance • The performance could be increased even more if the table size were a power of 2 • can get rid of the modulo operation at the end • modulo is an expensive calculation • could just do a subtraction and an AND operation instead • index = result & (tableSize - 1)
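The subtraction-and-AND trick from slide 16 can be sketched like this. The names `PowerOfTwoIndex` and `index` are assumptions, and the method is only valid when `tableSize` is a power of two.

```java
class PowerOfTwoIndex {
    // For a power-of-two table size, tableSize - 1 is a run of 1 bits,
    // so the AND keeps exactly the low-order bits that a modulo would keep.
    static int index(int hash, int tableSize) {
        return hash & (tableSize - 1);
    }

    public static void main(String[] args) {
        System.out.println(index(3121, 1024)); // prints 49, same as 3121 % 1024
        System.out.println(index(-5, 1024));   // prints 1019, still a valid index
    }
}
```

A side benefit of the mask is that a negative hash value still maps to a non-negative index, which a plain `%` in Java does not guarantee.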

  17. Mid-Square Function • Square the number and take the middle part as the index • a string must first be converted to get the number to square • The entire key gets used to generate the address • less chance for conflicts • more on this later • This method works best if the table size is a power of two

  18. Mid-Square Function • Table size equals 1024 (2^10) • The key is 3121 • 3121^2 = 9740641 = (100101001010000101100001)2 • the middle 10 bits of this 24-bit value are bits 7 through 16, counting from the right starting at 0: 1001010 0101000010 1100001 • Index in array is • (0101000010)2 = 322 • This is all very quick and easy to calculate using mask and shift operations

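  19. Mid-Square Function
      // table size must be a power of two
      int tableSize = 1024;
      int mask = tableSize - 1;             // ten 1 bits for a table of 1024
      int maskBits = logBase2(tableSize);   // number of index bits (10)
      int shiftBits = 7;                    // low-order bits to discard from a 24-bit square

      int midSquare(String key, int tableSize) {
          long n = stringToNum(key);
          n = n * n;                        // long avoids overflow when squaring
          // shift the middle bits down to the low end, then mask them off
          return (int) ((n >> shiftBits) & mask);
      }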
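A self-contained sketch of the mid-square calculation, checked against the worked example with key 3121. The class `MidSquareDemo` and its method are hypothetical; computing the shift from the bit length of the square, rather than hard-coding 7, is an assumption added so the sketch works for other key sizes.

```java
class MidSquareDemo {
    // indexBits is log2 of the table size, e.g. 10 for a table of 1024.
    static int midSquare(long key, int indexBits) {
        long square = key * key;
        // Number of significant bits in the square (24 for 3121 squared).
        int squareBits = 64 - Long.numberOfLeadingZeros(square);
        // Discard the same number of bits on each side of the middle.
        int shiftBits = Math.max(0, (squareBits - indexBits) / 2);
        long mask = (1L << indexBits) - 1;
        return (int) ((square >> shiftBits) & mask);
    }

    public static void main(String[] args) {
        System.out.println(3121L * 3121L);       // prints 9740641
        System.out.println(midSquare(3121, 10)); // prints 322
    }
}
```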

  20. Extraction • Simply pull out a certain part of the key and use it as the index • example • SSN = 123-45-6789 • index = middle three digits of the key = 456 • alternative index = first, middle, and last digits = 159 • Should try to choose a part of the key that is most likely unique • consider foreign student SSNs • they start with 999 • probably not a great idea to extract the first three digits
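The two extraction choices on slide 20 can be sketched as below. The class `ExtractionDemo` and its method names are assumptions, and the input is assumed to be a nine-digit SSN written with dashes.

```java
class ExtractionDemo {
    // Use the middle three digits of the SSN as the index.
    static int middleThree(String ssn) {
        String digits = ssn.replace("-", "");
        return Integer.parseInt(digits.substring(3, 6));
    }

    // Use the first, middle, and last digits as the index.
    static int firstMiddleLast(String ssn) {
        String digits = ssn.replace("-", "");
        String picked = "" + digits.charAt(0)
                + digits.charAt(digits.length() / 2)
                + digits.charAt(digits.length() - 1);
        return Integer.parseInt(picked);
    }

    public static void main(String[] args) {
        System.out.println(middleThree("123-45-6789"));     // prints 456
        System.out.println(firstMiddleLast("123-45-6789")); // prints 159
    }
}
```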
