
Cs212: Data Structures

Cs212: Data Structures. Lecture 10: Hashing. The search time of each search algorithm discussed so far depends on the number n of elements in the collection S of data. Hashing, or hash addressing, is a searching technique whose search time is essentially independent of n.

saki

Presentation Transcript


  1. Cs212: Data Structures. Lecture 10: Hashing

  2. Hashing • The search time of each search algorithm discussed so far depends on the number n of elements in the collection S of data. • Hashing, or hash addressing, is a searching technique whose search time is essentially independent of n. • Hashing uses a data structure called a hash table. Although hash tables provide fast insertion, deletion, and retrieval by key, operations that depend on the order of the keys, such as finding the minimum or maximum value, are slow because they require scanning the whole table. • Hash functions are also used in cryptography, for example in digital signatures and message integrity checks.

  3. Hash Table • A hash table is a data structure in which keys are mapped to array positions by a hash function. The table can be searched for an item in constant time on average, because the hash function forms the address directly from the key. • A hash function is a function which, when applied to a key, produces an integer that can be used as an address in the hash table. • Perfect hash function: maps every key to a distinct position, so no collisions occur. • Good hash function: fast to compute and spreads the keys evenly over the table, so collisions are rare. • A collision occurs when two or more keys produce the same hash location, that is, when more than one element tries to occupy the same array position.

  4. Hash Table • Comparison of keys was the main operation used by the previously discussed searching methods. • A different way of searching is to calculate the position of the key in the table from the value of the key. • We need a function h that transforms a key K (a string, number, record, etc.) into an index in a table used for storing items of the same type as K. • This function is called a hash function.

  5. Example: Suppose we want to store a sequence of randomly generated numbers, the keys: 5, 17, 37, 20, 42, 3. The array A, the hash table in which we want to store the numbers, is initially empty:

     index:   0    1    2    3    4    5    6    7    8
     A:     |    |    |    |    |    |    |    |    |    |

     We need a way of mapping the numbers to the array indexes, a hash function, that will let us store the numbers and later recompute the index when we want to retrieve them. There is a natural choice for this.

  6. Example: • Our hash table has 9 fields, and the mod function, which sends every integer to its remainder modulo 9, maps an integer to a number between 0 and 8:

      5 mod 9 = 5
     17 mod 9 = 8
     37 mod 9 = 1
     20 mod 9 = 2
     42 mod 9 = 6
      3 mod 9 = 3

     We store the values:

     index:   0    1    2    3    4    5    6    7    8
     A:     |    | 37 | 20 |  3 |    |  5 | 42 |    | 17 |

     In this case, computing the hash value of a key K to be stored, K mod 9, costs a constant amount of time. So does the actual storage, because K is stored directly in an array field.
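The example above can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the names `TABLE_SIZE`, `h`, `build_table`, and `contains` are hypothetical, and the sketch assumes, as in the example, that no two keys land in the same field.

```python
TABLE_SIZE = 9  # the table in the example has 9 fields, indexed 0..8


def h(key: int) -> int:
    """Hash function from the example: the remainder modulo the table size."""
    return key % TABLE_SIZE


def build_table(keys):
    """Store each key at index h(key); this sketch refuses to handle collisions."""
    table = [None] * TABLE_SIZE
    for key in keys:
        index = h(key)
        if table[index] is not None:
            raise ValueError(f"collision at index {index}")
        table[index] = key
    return table


def contains(table, key: int) -> bool:
    """Recompute the index from the key and inspect exactly one field."""
    return table[h(key)] == key


table = build_table([5, 17, 37, 20, 42, 3])
print(table)  # [None, 37, 20, 3, None, 5, 42, None, 17]
```

Both storing and looking up a key touch a single array field, which is why the cost does not grow with the number of stored keys.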

  7. Hash Functions 1. Division • A hash function must guarantee that the number it returns is a valid index to one of the table entries. • The simplest way to ensure this is to use the remainder of division (modulo): h(K) = K mod TSize, where TSize is the number of entries in the table. • It is best if TSize is a prime number, because keys that share a common factor with TSize would otherwise cluster in a few positions. • Advantages: • simple • useful if we don't know much about the keys
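A small sketch of why a prime TSize tends to help. The function name `h_division` and the sample keys are illustrative assumptions, chosen so that every key shares the factor 10 with the non-prime table size.

```python
def h_division(key: int, tsize: int) -> int:
    """Division method: h(K) = K mod TSize, always a valid index 0..tsize-1."""
    return key % tsize


keys = [10, 20, 30, 40, 50, 60]  # hypothetical keys, all multiples of 10

# With TSize = 10 every key lands on the same position.
slots_tsize_10 = {h_division(k, 10) for k in keys}

# With the prime TSize = 11 the same keys spread over six positions.
slots_tsize_11 = {h_division(k, 11) for k in keys}

print(sorted(slots_tsize_10))  # [0]
print(sorted(slots_tsize_11))  # [5, 6, 7, 8, 9, 10]
```

In Python, `%` with a positive divisor never returns a negative result, so the index stays valid even for negative keys; other languages may need an extra adjustment.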

  8. Hash Functions 2. Extraction • Idea: use only part of the key to compute the hash value (address, index). • Example: the key is the SSN 123456789. This method might use, for example, the first four digits (1234), the last four digits (6789), or the first two digits combined with the last two (1289) as the index.
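The three extraction variants from the SSN example can be sketched as below. The helper `extract` and its `positions` parameter are hypothetical names used only for illustration; the key is handled as a string of digits.

```python
def extract(key: str, positions) -> int:
    """Extraction: build the index from the digits at the chosen positions."""
    return int("".join(key[i] for i in positions))


ssn = "123456789"

first_four = extract(ssn, range(0, 4))          # digits 1-4
last_four = extract(ssn, range(5, 9))           # digits 6-9
first_and_last_two = extract(ssn, [0, 1, 7, 8])  # first two plus last two

print(first_four, last_four, first_and_last_two)  # 1234 6789 1289
```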

  9. Hash Functions 3. Folding • Idea: divide the key into parts, then combine (“fold”) the parts to create the index. • The key is first divided into several parts, each of the same length as the desired index. • The parts are then combined, or folded together, and usually transformed in a certain way to create the index (address) into the table. • Note: if the index that results from combining the parts is longer than the desired index length, apply either division (the usual choice) or extraction to bring it into range.

  10. Folding: there are two types of folding. • Shift folding • The key is divided into several parts, and these parts are added together to create the index. • Example: the key is the SSN 123-45-6789. The digits 123456789 can be divided into three parts, 123, 456, and 789, which are then added. The resulting sum, 1,368, can then be reduced modulo TSize.
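A minimal sketch of shift folding for the SSN example. The names `split_parts` and `shift_fold` are hypothetical, and the table size of 1,000 is an assumption made only to show the final division step; the slide leaves TSize unspecified.

```python
def split_parts(key: str, part_len: int):
    """Cut the key into consecutive parts of part_len digits (last may be shorter)."""
    return [key[i:i + part_len] for i in range(0, len(key), part_len)]


def shift_fold(key: str, part_len: int, tsize: int) -> int:
    """Shift folding: add the parts, then reduce the sum modulo the table size."""
    total = sum(int(part) for part in split_parts(key, part_len))
    return total % tsize


parts = split_parts("123456789", 3)
print(parts)                              # ['123', '456', '789']
print(sum(int(p) for p in parts))         # 1368
print(shift_fold("123456789", 3, 1000))   # 368, assuming TSize = 1000
```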

  11. Folding • Boundary folding • Same as shift folding, except that every other part is written backwards. • Example: the key is the SSN 123456789, divided into three parts, 123, 456, and 789. The first part is taken in the same order, the second part in reverse order, and the third part in the same order. The result is 123 + 654 + 789 = 1,566, to which division is then applied. • Example: the key is 23459087632, divided into the parts 234, 590, 876, and 32. Boundary folding gives 234 + 095 + 876 + 23 = 1,228. • This process is simple and fast, especially when bit patterns are used instead of numerical values; in that case, the addition in the previous examples is replaced with XOR.
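Both boundary-folding examples can be checked with a short sketch. The function name `boundary_fold_sum` is hypothetical, and the sketch returns the sum before the final division step, since the slide does not fix a table size.

```python
def boundary_fold_sum(key: str, part_len: int) -> int:
    """Boundary folding: reverse every other part, then add all the parts."""
    parts = [key[i:i + part_len] for i in range(0, len(key), part_len)]
    total = 0
    for position, part in enumerate(parts):
        if position % 2 == 1:   # second, fourth, ... part is written backwards
            part = part[::-1]
        total += int(part)
    return total


print(boundary_fold_sum("123456789", 3))    # 123 + 654 + 789 = 1566
print(boundary_fold_sum("23459087632", 3))  # 234 + 095 + 876 + 23 = 1228
```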

  12. Hash Functions (cont’d) 4. Mid-Square function • Idea: square the key (multiply the key by itself), then use the middle part of the result as the address. • Note: extraction can be used to obtain the middle part. • Example: the key is 3121. Square the key: 3,121² = 9,740,641. Then use the middle part, 406, as the address. Here, for a 1,000-cell table, h(3,121) = 406.
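The mid-square example can be sketched as follows. The function name `mid_square` and the `digits` parameter are hypothetical; the rule for picking the middle when the square cannot be centred exactly (lean left) is an assumption of this sketch, not something the slide specifies.

```python
def mid_square(key: int, digits: int = 3) -> int:
    """Mid-square: square the key and extract `digits` digits from the middle."""
    squared = str(key * key)
    # Start so that the extracted window is centred; lean left if it cannot be.
    start = max(0, (len(squared) - digits) // 2)
    return int(squared[start:start + digits])


print(3121 * 3121)       # 9740641
print(mid_square(3121))  # 406, a valid address for a 1,000-cell table
```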

  13. Detecting and resolving collisions • Even with the methods introduced previously, collisions may still occur. • Two keys cannot be stored in the same table position, so we must find a way to resolve collisions. • The choice of hash function and of table size may reduce the number of collisions, but will not eliminate them. • Methods for resolving collisions: • open addressing: find another, empty position in the table • chaining: keep the keys that hash to the same position in a linked list • bucket addressing: store colliding elements at the same location, in a bucket that holds more than one element
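Two of the methods listed above can be sketched briefly. Several choices here are assumptions rather than content from the slides: the 9-field table and the mod hash are reused from the earlier example, Python lists stand in for linked lists in the chaining version, and the open-addressing version uses linear probing (try the next position, wrapping around), which is only one of several probing strategies. All function names are hypothetical, and deletion is not handled.

```python
TSIZE = 9  # assumed table size, reused from the earlier example


def insert_chaining(table, key):
    """Chaining: every position holds a list of all keys that hash there."""
    bucket = table[key % TSIZE]
    if key not in bucket:
        bucket.append(key)


def search_chaining(table, key):
    return key in table[key % TSIZE]


def insert_probing(table, key):
    """Open addressing with linear probing: step forward to the next free field."""
    for step in range(TSIZE):
        index = (key % TSIZE + step) % TSIZE
        if table[index] is None or table[index] == key:
            table[index] = key
            return index
    raise RuntimeError("table is full")


def search_probing(table, key):
    """Follow the same probe sequence; an empty field means the key is absent."""
    for step in range(TSIZE):
        index = (key % TSIZE + step) % TSIZE
        if table[index] is None:
            return False
        if table[index] == key:
            return True
    return False


chained = [[] for _ in range(TSIZE)]
probed = [None] * TSIZE
for key in (5, 14, 23):  # all three keys hash to position 5
    insert_chaining(chained, key)
    insert_probing(probed, key)

print(chained[5])   # [5, 14, 23]
print(probed[5:8])  # [5, 14, 23]
```

With chaining the colliding keys share position 5, while with linear probing they spill into positions 6 and 7, so a lookup may have to inspect several fields.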

  14. End of Chapter. References: textbook, Chapter 10: Hashing
