120 likes | 217 Views
Lecture 7 : Hashing. Bong-Soo Sohn Assistant Professor School of Computer Science and Engineering Chung-Ang University. Hashing. Enables access (search, insert, delete operations) to table items in time that is relatively constant (O(1)) and independent of the items. Hashing Terminology.
E N D
Lecture 7 : Hashing Bong-Soo Sohn Assistant Professor School of Computer Science and Engineering Chung-Ang University
Hashing • Enables access (search, insert, delete operations) to table items in time that is relatively constant (O(1)) and independent of the items
Hashing Terminology • hash table • a data structure in which the location of an object is determined by applying a function to its value. • An array that contains the table items, as assigned by a hash function • cell or bucket • a single location in a hash table. • Hashing • applying a function to the value of an object to determine a location.
Hashing Terminology • hash function : h(k) • maps each value to a unique position or location • perfect hash function • Maps each search key into a unique location of the hash table • hash code • integer value computed for an object by applying a hash function • Requirements for a hash function • Be easy and fast to compute • Place items evenly throughout the hash table • Must allow for any regularities in the data • Modulo 2k is very bad if most keys are even!
Hashing Terminology • load factor • the fraction of occupancy in a hash table before it is resized • one-way hash function • is a hash function in which given the result of hashing, it is very hard to guess anything about the original value that was hashed • Collision • when two objects are mapped to the same location in a hash table • Collision-resolution schemes • Assign locations in the hash table to items with different search keys when the items are involved in a collision
Hash Function • We can assume that input parameter is an integer • Simple hash functions for positive integers: • Division (using modulo operator) : h(k) = k % D • D should be a prime number p • Why prime number ? -> see textbook page 398 • Mid-Square • Square the number • Take the middle digits as the hash key • Folding • Partition a key into several parts. • Those partitions are then added together to obtain address • Ex) k=12320324111220 P1=123, P2=203, P3=241, P4=112, P5=20 h(k)=123+203+241+112+20=699
String to Integer • Converting a character string to an integer • If jimthe search key is a character string, it can be converted into an integer before the hash function is applied • If the conversion is to a fixed-length integer, must be careful that the conversion doesn’t introduce collisions unsigned int stringToInt(char *key) { int number=0; while (*key) { number += *key++; if (*key) number += ((int) *key++) << 8; } return number; }
Resolving Collisions • Two approaches to collision resolution • Approach 1: Open addressing • Approach 2: Chaining
Open Addressing • probe for an empty, or open, location in the hash table • Linear probing • Searches the hash table sequentially, starting from the original location specified by the hash function • Quadratic probing • Searches the hash table beginning with the original location that the hash function specifies and continues at increments of 12, 22, 32, and so on • Rehashing • Apply a second (or even third) hash function
Chaining • Each location is a reference to a linked list
Good Hash Function • A good hash function should • Be easy and fast to compute • Scatter the data evenly throughout the hash table • Issues to consider with regard to how evenly a hash function scatters the search keys • How well does the hash function scatter random data? • How well does the hash function scatter nonrandom data? • General requirements of a hash function • The calculation of the hash function should involve the entire search key • If a hash function uses module arithmetic, the base should be prime
For many applications, hashing provides the most efficient implementation • Hashing is not efficient for • Traversal in sorted order • Finding the item with the smallest or largest value in its search key • Range query