Lecture 7 : Hashing

Lecture 7 : Hashing. Bong-Soo Sohn Assistant Professor School of Computer Science and Engineering Chung-Ang University. Hashing. Enables access (search, insert, delete operations) to table items in time that is relatively constant (O(1)) and independent of the items. Hashing Terminology.

Presentation Transcript


  1. Lecture 7 : Hashing Bong-Soo Sohn Assistant Professor School of Computer Science and Engineering Chung-Ang University

  2. Hashing • Enables access (search, insert, and delete operations) to table items in time that is roughly constant, O(1) on average, and independent of the number of items

  3. Hashing Terminology • hash table • a data structure in which the location of an object is determined by applying a function to its value. • An array that contains the table items, as assigned by a hash function • cell or bucket • a single location in a hash table. • Hashing • applying a function to the value of an object to determine a location.

  4. Hashing Terminology • hash function : h(k) • maps each search key k to a position or location in the hash table (not necessarily a unique one) • perfect hash function • Maps each search key into a unique location of the hash table • hash code • integer value computed for an object by applying a hash function • Requirements for a hash function • Be easy and fast to compute • Place items evenly throughout the hash table • Must allow for any regularities in the data • Modulo 2^k is very bad if most keys are even, because every even key then lands in an even location!
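
The warning about even keys can be checked with a small count. The sketch below is only an illustration: the helper name `even_key_buckets_used` and the 64-location limit are assumptions made for this example, not part of the lecture.

```c
#include <assert.h>
#include <stddef.h>

/* Hash the even keys 0, 2, 4, ..., 2*(n-1) with h(k) = k % table_size
   and return how many distinct locations are hit.
   Hypothetical helper; assumes 1 <= table_size <= 64. */
size_t even_key_buckets_used(unsigned n, unsigned table_size)
{
    int seen[64] = {0};   /* seen[s] is 1 once location s has been hit */
    size_t used = 0;

    assert(table_size >= 1 && table_size <= 64);
    for (unsigned i = 0; i < n; i++) {
        unsigned slot = (2u * i) % table_size;
        if (!seen[slot]) {
            seen[slot] = 1;
            used++;
        }
    }
    return used;
}
```

With 16 even keys, a table of size 8 (a power of two) uses only locations 0, 2, 4, and 6, while a table of size 7 (a prime) uses all seven locations.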

  5. Hashing Terminology • load factor • the fraction of a hash table that is occupied (number of items divided by number of locations); the table is typically resized once this fraction exceeds a chosen threshold • one-way hash function • a hash function for which, given the result of hashing, it is very hard to guess anything about the original value that was hashed • Collision • when two objects are mapped to the same location in a hash table • Collision-resolution schemes • Assign locations in the hash table to items with different search keys when the items are involved in a collision
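
As a rough sketch of how a load factor is usually computed and checked, assuming the common definition (items stored divided by number of locations): the function names and the idea of passing the threshold as a parameter are illustrative choices, not taken from the slides.

```c
#include <stddef.h>

/* Load factor = items stored / number of locations (capacity must be > 0). */
double load_factor(size_t items, size_t capacity)
{
    return (double)items / (double)capacity;
}

/* Returns 1 when the table is fuller than max_load and should grow.
   max_load is a tuning choice made by the caller (hypothetical parameter). */
int needs_resize(size_t items, size_t capacity, double max_load)
{
    return load_factor(items, capacity) > max_load;
}
```

For example, 8 items in 10 locations gives a load factor of 0.8, which exceeds a threshold of 0.75, so `needs_resize(8, 10, 0.75)` reports that the table should grow.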

  6. Hash Function • We can assume that the input parameter is an integer • Simple hash functions for positive integers: • Division (using the modulo operator) : h(k) = k % D • D should be a prime number p • Why a prime number? -> see textbook page 398 • Mid-Square • Square the number • Take the middle digits as the hash value • Folding • Partition a key into several parts • Those parts are then added together to obtain the address • Ex) k = 12320324111220: P1 = 123, P2 = 203, P3 = 241, P4 = 112, P5 = 20, so h(k) = 123 + 203 + 241 + 112 + 20 = 699
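
The three simple hash functions can be sketched in C as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the mid-square and folding versions work on decimal digits, and folding splits the key into parts from the left as in the slide's example.

```c
#include <assert.h>
#include <stdio.h>

/* Division: h(k) = k % D. D should be prime; it must not be 0. */
unsigned hash_division(unsigned long long k, unsigned D)
{
    assert(D > 0);
    return (unsigned)(k % D);
}

/* Mid-square: square k and keep the middle ndigits decimal digits.
   Assumes ndigits is small enough that 10^ndigits does not overflow. */
unsigned hash_mid_square(unsigned k, unsigned ndigits)
{
    unsigned long long sq = (unsigned long long)k * k;
    unsigned total = 0;                      /* number of digits in sq */
    for (unsigned long long t = sq; t > 0; t /= 10)
        total++;

    unsigned long long mod = 1;              /* 10^ndigits */
    for (unsigned i = 0; i < ndigits; i++)
        mod *= 10;

    /* Drop the low-order digits that lie to the right of the middle. */
    unsigned drop = total > ndigits ? (total - ndigits) / 2 : 0;
    for (unsigned i = 0; i < drop; i++)
        sq /= 10;
    return (unsigned)(sq % mod);
}

/* Folding: split the decimal digits of k into parts of part_len digits,
   starting from the left, and add the parts together. */
unsigned long long hash_folding(unsigned long long k, unsigned part_len)
{
    char digits[32];
    int len = snprintf(digits, sizeof digits, "%llu", k);
    unsigned long long sum = 0;

    assert(part_len > 0);
    for (int i = 0; i < len; i += (int)part_len) {
        unsigned long long part = 0;
        for (int j = i; j < len && j < i + (int)part_len; j++)
            part = part * 10 + (unsigned long long)(digits[j] - '0');
        sum += part;
    }
    return sum;
}
```

With these definitions, `hash_folding(12320324111220ULL, 3)` reproduces the slide's result of 699, and `hash_mid_square(1234, 3)` yields 227, the middle three digits of 1234 squared (1522756).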

  7. String to Integer • Converting a character string to an integer • If the search key is a character string, it can be converted into an integer before the hash function is applied • If the conversion is to a fixed-length integer, we must be careful that the conversion doesn’t introduce collisions

       unsigned int stringToInt(const char *key)
       {
           unsigned int number = 0;
           while (*key) {
               /* add one character into the low byte ... */
               number += (unsigned char) *key++;
               /* ... and the next one, if any, into the second byte */
               if (*key)
                   number += ((unsigned int) (unsigned char) *key++) << 8;
           }
           return number;
       }
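
The caution about conversions that introduce collisions applies to the slide's own function: because it only adds character pairs together, swapping two pairs gives the same integer. The sketch below repeats `stringToInt` (with unsigned arithmetic) so that it compiles on its own, and adds `hornerStringToInt`, a position-sensitive alternative that is not from the lecture; the name and the multiplier 31 are assumptions chosen for illustration.

```c
/* The slide's conversion: characters are added alternately into the
   low byte and the second byte of the result. */
unsigned int stringToInt(const char *key)
{
    unsigned int number = 0;
    while (*key) {
        number += (unsigned char)*key++;
        if (*key)
            number += ((unsigned int)(unsigned char)*key++) << 8;
    }
    return number;
}

/* Hypothetical alternative (not from the slides): Horner's rule with
   multiplier 31, so each character's contribution depends on its position.
   Unsigned overflow wraps around, which is well defined in C. */
unsigned int hornerStringToInt(const char *key)
{
    unsigned int number = 0;
    while (*key)
        number = number * 31u + (unsigned char)*key++;
    return number;
}
```

Here `stringToInt("abcd")` and `stringToInt("cdab")` are equal, since both sum the pairs "ab" and "cd", whereas `hornerStringToInt` gives different values for the two strings. Any conversion to a fixed-length integer still collides for some inputs; the alternative only removes this particular pattern.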

  8. Resolving Collisions • Two approaches to collision resolution • Approach 1: Open addressing • Approach 2: Chaining

  9. Open Addressing • probe for an empty, or open, location in the hash table • Linear probing • Searches the hash table sequentially, starting from the original location specified by the hash function • Quadratic probing • Searches the hash table beginning with the original location that the hash function specifies and continues at increments of 1^2, 2^2, 3^2, and so on • Rehashing • Apply a second (or even third) hash function
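
Linear and quadratic probing can be sketched with one small table. This is an illustrative sketch, not the lecture's implementation: the type `ProbeTable`, the table size 11, the `EMPTY` marker, and the restriction to non-negative integer keys are all assumptions. Deletion is left out because it needs extra bookkeeping (for example, a "deleted" marker) so that later searches do not stop early.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 11   /* a prime, chosen for illustration */
#define EMPTY (-1)      /* marks an open location; keys must be >= 0 */

typedef struct { int slots[TABLE_SIZE]; } ProbeTable;

void table_init(ProbeTable *t)
{
    for (size_t i = 0; i < TABLE_SIZE; i++)
        t->slots[i] = EMPTY;
}

/* Location of the i-th probe: home + i (linear) or home + i*i (quadratic). */
static size_t probe(size_t home, size_t i, int quadratic)
{
    size_t step = quadratic ? i * i : i;
    return (home + step) % TABLE_SIZE;
}

/* Insert key and return its location, or -1 if no open location was found
   within TABLE_SIZE probes. Quadratic probing does not visit every
   location, so it can fail even when the table is not full. */
int table_insert(ProbeTable *t, int key, int quadratic)
{
    assert(key >= 0);
    size_t home = (size_t)key % TABLE_SIZE;
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        size_t pos = probe(home, i, quadratic);
        if (t->slots[pos] == EMPTY || t->slots[pos] == key) {
            t->slots[pos] = key;
            return (int)pos;
        }
    }
    return -1;
}

/* Follow the same probe sequence as insertion; an open location means
   the key is absent. Returns the key's location or -1. */
int table_find(const ProbeTable *t, int key, int quadratic)
{
    assert(key >= 0);
    size_t home = (size_t)key % TABLE_SIZE;
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        size_t pos = probe(home, i, quadratic);
        if (t->slots[pos] == key)
            return (int)pos;
        if (t->slots[pos] == EMPTY)
            return -1;
    }
    return -1;
}
```

The keys 5, 16, and 27 all hash to location 5 in this table. With linear probing they end up in locations 5, 6, and 7; with quadratic probing they end up in 5, 6 (5 + 1^2), and 9 (5 + 2^2). A table must be searched with the same probing mode that was used to fill it.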

  10. Chaining • Each location is a reference to a linked list of the items that hash to that location
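
A chained table can be sketched as an array of list heads. The names `ChainTable`, `Node`, and `NUM_BUCKETS`, the use of plain integer keys, and the choice to insert at the head of the list are assumptions made to keep the example short.

```c
#include <stdlib.h>

#define NUM_BUCKETS 7   /* a prime, chosen for illustration */

typedef struct Node {
    int key;
    struct Node *next;
} Node;

/* Each location holds the head of a linked list (NULL when empty). */
typedef struct { Node *buckets[NUM_BUCKETS]; } ChainTable;

static unsigned bucket_of(int key)
{
    return (unsigned)key % NUM_BUCKETS;
}

void chain_init(ChainTable *t)
{
    for (size_t i = 0; i < NUM_BUCKETS; i++)
        t->buckets[i] = NULL;
}

/* Returns 1 if key is stored in the table, 0 otherwise. */
int chain_contains(const ChainTable *t, int key)
{
    for (const Node *n = t->buckets[bucket_of(key)]; n != NULL; n = n->next)
        if (n->key == key)
            return 1;
    return 0;
}

/* Returns 1 if key was added, 0 if it was already present
   or memory could not be allocated. */
int chain_insert(ChainTable *t, int key)
{
    if (chain_contains(t, key))
        return 0;
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    unsigned b = bucket_of(key);
    n->key = key;
    n->next = t->buckets[b];   /* new node becomes the head of the list */
    t->buckets[b] = n;
    return 1;
}

/* Returns 1 if key was found and removed, 0 otherwise. */
int chain_remove(ChainTable *t, int key)
{
    Node **link = &t->buckets[bucket_of(key)];
    while (*link != NULL) {
        if ((*link)->key == key) {
            Node *dead = *link;
            *link = dead->next;   /* unlink the node, then free it */
            free(dead);
            return 1;
        }
        link = &(*link)->next;
    }
    return 0;
}
```

Keys 3 and 10 both map to location 3 here, so they share one list. Unlike open addressing, removing one of them does not disturb the search for the other.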

  11. Good Hash Function • A good hash function should • Be easy and fast to compute • Scatter the data evenly throughout the hash table • Issues to consider with regard to how evenly a hash function scatters the search keys • How well does the hash function scatter random data? • How well does the hash function scatter nonrandom data? • General requirements of a hash function • The calculation of the hash function should involve the entire search key • If a hash function uses modulo arithmetic, the base should be prime

  12. For many applications, hashing provides the most efficient implementation • Hashing is not efficient for • Traversal in sorted order • Finding the item with the smallest or largest value in its search key • Range queries
