
Introduction to Hashing

Introduction to Hashing. CS 311 Winter, 2013. Dictionary Structure. A dictionary structure has the form: (Key, Data) Dictionary structures are organized in a manner that optimizes search time for the key. Hashing stores dictionary objects in a table where each location has an address.

baeddan

Presentation Transcript


  1. Introduction to Hashing CS 311 Winter, 2013

  2. Dictionary Structure • A dictionary structure has the form: (Key, Data) • Dictionary structures are organized in a manner that optimizes search time for the key. • Hashing stores dictionary objects in a table where each location has an address.

  3. Key to Address • Hashing is called a Key to Address system because the address of a dictionary object is computed directly from the key using a function called the Hash Function. • A good hash function should be easy to calculate, distribute the objects throughout the table with equal probability, and minimize collisions.

  4. A Simple Hash Function • An example of a simple hash function for a table of size M (locations 0 to M-1), assuming non-negative keys, is: int hash( int key ) { return key % M; } • With a good hash function and a low load factor, the average search time is O( 1 ).
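
The one-line function on slide 4 relies on a global M and, in C, returns a negative remainder for a negative key, which would index outside the table. A minimal sketch, assuming a hypothetical table size M = 11 (a prime, which is a common choice), that keeps every result in 0 to M-1:

```c
#define M 11  /* hypothetical table size; a prime is a common choice */

/* Map an int key to a slot in 0..M-1.
   C's % can yield a negative remainder for a negative key,
   so shift such results back into range. */
int hash(int key)
{
    int r = key % M;
    return (r < 0) ? r + M : r;
}
```

With M = 11, the keys 4 and 15 both map to slot 4: exactly the kind of collision described on the next slide.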

  5. Collision Resolution • A collision occurs when two keys result in the same address. • When this happens, we must be able to store the second object in a location that can be quickly found starting from the original hash location. • The two basic approaches to collision resolution are called open hashing (or Separate Chaining) and closed hashing (or Open Addressing).

  6. Open Hashing • Open Hashing means that collisions are resolved by storing the colliding object in a separate area. • In essence, the objects that collide form linked lists, where the head of each list is at the original hash location; hence the name Separate Chaining. • One variation of open hashing is called Bucket Hashing.
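
A minimal separate-chaining sketch, assuming int keys and int data, a hypothetical table of M = 11 list heads, and hypothetical function names chain_insert and chain_search. New nodes are pushed onto the front of their chain, which is one common choice rather than the only one:

```c
#include <stdbool.h>
#include <stdlib.h>

#define M 11  /* hypothetical table size */

/* One node in a chain: a (Key, Data) pair plus a link to the next node. */
struct node {
    int key;
    int data;
    struct node *next;
};

/* Each slot holds the head of a linked list (NULL when the slot is empty). */
static struct node *table[M];

static int hash(int key)
{
    int r = key % M;
    return (r < 0) ? r + M : r;
}

/* Insert or update; colliding keys simply join the chain at their home slot. */
bool chain_insert(int key, int data)
{
    int h = hash(key);
    for (struct node *p = table[h]; p != NULL; p = p->next) {
        if (p->key == key) {          /* key already present: update its data */
            p->data = data;
            return true;
        }
    }
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return false;                 /* out of memory */
    n->key = key;
    n->data = data;
    n->next = table[h];               /* push onto the front of the chain */
    table[h] = n;
    return true;
}

/* Search only the chain that starts at the key's home slot. */
struct node *chain_search(int key)
{
    for (struct node *p = table[hash(key)]; p != NULL; p = p->next)
        if (p->key == key)
            return p;
    return NULL;
}
```

Because a search walks only the one chain at the key's home slot, its cost grows with chain length, which is why the load factor (slide 11) matters.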

  7. Closed Hashing • In closed hashing, objects that collide are stored within the hash table itself. • This can create an additional problem called a Secondary Collision. • Two general methods to resolve collisions in closed hashing are Probing and Double Hashing.

  8. Probing • In probing, the hash function becomes ( hash( key ) + p( i ) ) % M, where i is the iteration (probe) number and p( 0 ) = 0. • The simplest form of probing is linear probing, where p( i ) = i, for i = 0, 1, 2, … • A problem with linear probing, however, is that it can cause clustering: colliding keys fill runs of adjacent slots, and any key that hashes into a run makes it longer.
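
A minimal linear-probing sketch, assuming non-negative int keys, a hypothetical EMPTY marker of -1, and M = 11. Each probe tries ( hash( key ) + i ) % M until it finds the key or an empty slot:

```c
#define M 11          /* hypothetical table size */
#define EMPTY (-1)    /* hypothetical marker: assumes keys are non-negative */

static int table[M];

static int hash(int key) { return key % M; }   /* key assumed >= 0 */

void table_init(void)
{
    for (int i = 0; i < M; i++)
        table[i] = EMPTY;
}

/* Probe (hash(key) + i) % M for i = 0, 1, 2, ...; return the slot, or -1. */
int linear_insert(int key)
{
    for (int i = 0; i < M; i++) {
        int slot = (hash(key) + i) % M;     /* p(i) = i */
        if (table[slot] == EMPTY || table[slot] == key) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;                              /* every slot probed: table full */
}

int linear_search(int key)
{
    for (int i = 0; i < M; i++) {
        int slot = (hash(key) + i) % M;
        if (table[slot] == EMPTY)
            return -1;                      /* an empty slot ends the chain */
        if (table[slot] == key)
            return slot;
    }
    return -1;
}
```

Inserting 4, 15 and 26 (all of which hash to 4) and then 5 fills slots 4 through 7. Key 5 has its own home slot, yet the run of keys that hashed to 4 pushes it to slot 7; this growing run is the clustering the slide warns about.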

  9. Probing II • Another common approach to probing, which avoids the clustering of linear probing, is called quadratic probing. • In quadratic probing p( i ) = i², for i = 0, 1, 2, … • However, if the table is more than half full or if the table size is not a prime number, quadratic probing may fail to find an open slot even when one exists.
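
The same table with quadratic probing, again a sketch assuming non-negative keys, a hypothetical EMPTY marker of -1, and a prime M = 11. Only the probe offset changes, from i to i * i:

```c
#define M 11          /* hypothetical prime table size */
#define EMPTY (-1)    /* hypothetical marker: assumes keys are non-negative */

static int table[M];

static int hash(int key) { return key % M; }   /* key assumed >= 0 */

void table_init(void)
{
    for (int i = 0; i < M; i++)
        table[i] = EMPTY;
}

/* Probe (hash(key) + i*i) % M for i = 0, 1, 2, ...
   With M prime, the first (M + 1) / 2 probes hit distinct slots, so a table
   that is less than half full always yields an empty slot among them. */
int quadratic_insert(int key)
{
    for (int i = 0; i < M; i++) {
        int slot = (hash(key) + i * i) % M;  /* p(i) = i^2 */
        if (table[slot] == EMPTY || table[slot] == key) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;  /* no open slot reached along this probe sequence */
}
```

With 4 and 15 already in slots 4 and 5, key 26 (home slot 4) probes slots 4, 5 and then 8, jumping past the occupied neighbors instead of joining the run.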

  10. Double Hashing • A problem with probing is that all keys that hash to the same location follow the same probe sequence. • An alternative to probing is double hashing. In this case the hash function is ( hash1( key ) + i * hash2( key ) ) % M • If the table size M is a prime number and R is a prime number less than M, then a good choice for hash2 is: hash2( key ) = R - ( key % R ), which is never 0, so every probe moves to a new slot.
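
A double-hashing sketch using the slide's hash2, assuming a hypothetical prime table size M = 11, R = 7, non-negative keys, and a hypothetical EMPTY marker of -1:

```c
#define M 11          /* hypothetical prime table size */
#define R 7           /* hypothetical prime smaller than M */
#define EMPTY (-1)    /* hypothetical marker: assumes keys are non-negative */

static int table[M];

static int hash1(int key) { return key % M; }
static int hash2(int key) { return R - (key % R); }  /* in 1..R, never 0 */

void table_init(void)
{
    for (int i = 0; i < M; i++)
        table[i] = EMPTY;
}

/* Probe (hash1(key) + i * hash2(key)) % M for i = 0, 1, 2, ...
   M is prime and 1 <= hash2(key) < M, so the step shares no factor with M
   and the first M probes visit every slot exactly once. */
int double_insert(int key)
{
    int step = hash2(key);
    for (int i = 0; i < M; i++) {
        int slot = (hash1(key) + i * step) % M;
        if (table[slot] == EMPTY || table[slot] == key) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;  /* table full */
}
```

Keys 15 and 26 share home slot 4 but get steps 6 and 2, so they land in slots 10 and 6 rather than following one shared probe sequence.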

  11. Load Factor • The load factor λ is defined to be N/M, where N is the number of objects in the table and M is the size of the table. • For open hashing, we want the load factor to be close to 1. • For closed hashing, we want the load factor to be less than 0.5.

  12. Deletions • When deleting an object from a hash table, there are two important considerations. • Deleting an object must not hinder later searches; that is, it must not cut off a chain used for probing. • A slot freed by a deletion must remain usable. • One solution is to use a tombstone.

  13. Tombstones • A tombstone is a special marker stating that a slot is free but used to be part of a chain. • A search that encounters a tombstone keeps going. • When inserting and encountering a tombstone, we must continue to the end of the chain before reusing the tombstone, to avoid inserting a duplicate value.
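
A sketch of deletion with tombstones on a linear-probing table, assuming non-negative keys, M = 11, and hypothetical markers EMPTY (-1) and TOMBSTONE (-2). Search passes over tombstones; insert walks the full chain before reusing the first tombstone it saw, as slide 13 describes:

```c
#define M 11            /* hypothetical table size */
#define EMPTY (-1)      /* never used: a search may stop here */
#define TOMBSTONE (-2)  /* deleted: a search must keep going */
/* Both markers assume keys are non-negative. */

static int table[M];

static int hash(int key) { return key % M; }

void table_init(void)
{
    for (int i = 0; i < M; i++)
        table[i] = EMPTY;
}

/* Return the slot holding key, or -1. Tombstones do not end the search. */
int ts_search(int key)
{
    for (int i = 0; i < M; i++) {
        int slot = (hash(key) + i) % M;
        if (table[slot] == EMPTY)
            return -1;
        if (table[slot] == key)
            return slot;
    }
    return -1;
}

/* Replace the key with a tombstone so later searches still pass through. */
int ts_delete(int key)
{
    int slot = ts_search(key);
    if (slot >= 0)
        table[slot] = TOMBSTONE;
    return slot;
}

/* Walk to the end of the chain to rule out a duplicate, remembering the
   first tombstone seen; reuse it only once the key is known to be absent. */
int ts_insert(int key)
{
    int first_free = -1;
    for (int i = 0; i < M; i++) {
        int slot = (hash(key) + i) % M;
        if (table[slot] == key)
            return slot;                    /* already present: no duplicate */
        if (table[slot] == TOMBSTONE && first_free < 0)
            first_free = slot;
        if (table[slot] == EMPTY) {
            if (first_free < 0)
                first_free = slot;
            break;                          /* end of the chain */
        }
    }
    if (first_free >= 0)
        table[first_free] = key;
    return first_free;                      /* -1 if the table is full */
}
```

After inserting 4, 15 and 26 and deleting 15, a search for 26 still succeeds by passing the tombstone in slot 5, and inserting 37 (home slot 4) reuses that slot.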

  14. Tombstone II • Tombstones lengthen chains. • An alternative to a tombstone is the following: when a value is removed, continue down the chain, swapping the free slot with the next value in the chain. • This shortens the chain by one slot and always puts the freed slot at the end of the chain.

  15. Rehashing • When a table gets too full or when chains get too long, Rehashing creates another table at least twice as big as the original. • This also requires a new hash function. • Then, starting from slot 0, each value in the original table is hashed (using the new function) into the new table.
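
A rehashing sketch that combines slides 11 and 15. It assumes linear probing, non-negative keys, a hypothetical EMPTY marker of -1, a resize whenever the load factor N/M would exceed 0.5, and a new size equal to the next prime at or above 2M; the struct and function names are hypothetical. The "new hash function" here is simply key % M with the new M:

```c
#include <stdlib.h>

#define EMPTY (-1)   /* hypothetical marker: assumes keys are non-negative */

/* Hypothetical open-addressing table whose size can change. */
struct table {
    int *slots;
    int size;    /* M */
    int count;   /* N */
};

static int is_prime(int n)
{
    if (n < 2)
        return 0;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0)
            return 0;
    return 1;
}

static int next_prime(int n)
{
    while (!is_prime(n))
        n++;
    return n;
}

/* Linear-probing placement; the hash depends on the current size. */
static void place(struct table *t, int key)
{
    int slot = key % t->size;
    while (t->slots[slot] != EMPTY)
        slot = (slot + 1) % t->size;
    t->slots[slot] = key;
}

int table_init(struct table *t, int size)
{
    t->slots = malloc(size * sizeof *t->slots);
    if (t->slots == NULL)
        return 0;
    for (int i = 0; i < size; i++)
        t->slots[i] = EMPTY;
    t->size = size;
    t->count = 0;
    return 1;
}

/* Build a table at least twice as big (rounded up to a prime), then
   re-insert every key, scanning the old table from slot 0. */
int rehash(struct table *t)
{
    struct table bigger;
    if (!table_init(&bigger, next_prime(2 * t->size)))
        return 0;
    for (int i = 0; i < t->size; i++)
        if (t->slots[i] != EMPTY)
            place(&bigger, t->slots[i]);
    bigger.count = t->count;
    free(t->slots);
    *t = bigger;
    return 1;
}

/* Insert, rehashing first if the load factor N/M would exceed 0.5. */
int table_insert(struct table *t, int key)
{
    if (2 * (t->count + 1) > t->size && !rehash(t))
        return 0;
    place(t, key);
    t->count++;
    return 1;
}
```

For simplicity, place() does not check for duplicate keys; keeping the load factor at or below 0.5 guarantees it always finds an empty slot.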
