Cmps 2133. Computer Science, Midwestern State University. Coalesced Hashing.
Coalesced Hashing • Coalesced hashing is a collision resolution method that uses pointers to connect the elements of a synonym chain. • A hybrid of separate chaining and open addressing. • Linked lists within the hash table handle collisions. • This strategy is effective, efficient, and very easy to implement.
Coalesced Hashing • Coalesced hashing obtains its name from what occurs when we attempt to insert a record with a home address that is already occupied by a record from a chain with a different home address. This situation would occur, for example, if we attempted to insert a record with a home address of s into the hash table. What occurs is that the two chains with records having different home addresses coalesce or grow together.
Coalesced Hashing • In the figure to the right, the records with keys X, D, and Y were inserted in the given order into the hash table. A, B, C, and D form one set of synonyms, and X and Y form another set. • When X is inserted into the table with coalescing, it must be inserted at the end of the chain that it is coalescing with. Instead of needing only one probe to retrieve X, three are needed. The greater the coalescing, the longer the probe chain will be, and as a result, retrieval performance will be degraded. • When record D is now added, it must be inserted at the end of the coalesced chains; to locate D, we must then pass over record X from the other chain. • Figure: Synonym chain with coalescing. (The shaded portion indicates the portion of the chain in which coalescing has occurred. The thin line represents the insertions on the synonym chain with r as its home address. The thick line represents the insertions on the chain with s as its home address.)
Coalesced Hashing • Algorithm for Coalesced Hashing • Coalesced hashing originated with Williams [1] and is also referred to as direct chaining.
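A minimal Python sketch of the idea described so far: records that collide are linked onto the end of the chain passing through their home address, and unoccupied locations are taken from the bottom of the table. It is an illustration under stated assumptions, not the algorithm from the slide. The class name `CoalescedTable`, the integer keys, the hash function `key % size`, and the absence of deletion are all assumptions.

```python
class CoalescedTable:
    """Late-insertion coalesced hashing with no cellar (illustrative sketch)."""

    def __init__(self, size):
        self.size = size
        self.keys = [None] * size   # None marks an unoccupied location
        self.link = [-1] * size     # index of the next record in the chain; -1 ends it
        self.free = size - 1        # unoccupied locations are searched from the bottom up

    def home(self, key):
        # Assumed hash function: integer key modulo table size.
        return key % self.size

    def insert(self, key):
        i = self.home(key)
        if self.keys[i] is None:    # home address is unoccupied
            self.keys[i] = key
            return i
        # Follow the chain that passes through the home address to its last record.
        while self.keys[i] != key and self.link[i] != -1:
            i = self.link[i]
        if self.keys[i] == key:
            return i                # key is already stored
        # Take the bottom-most unoccupied location.
        while self.free >= 0 and self.keys[self.free] is not None:
            self.free -= 1
        if self.free < 0:
            raise OverflowError("table is full")
        self.keys[self.free] = key
        self.link[i] = self.free    # late insertion: link at the end of the chain
        return self.free

    def probes(self, key):
        """Number of locations examined to find key, or None if it is absent."""
        i = self.home(key)
        count = 1
        while self.keys[i] is not None and self.keys[i] != key:
            i = self.link[i]
            if i == -1:
                return None
            count += 1
        return count if self.keys[i] == key else None


table = CoalescedTable(7)
for k in (0, 7, 6, 14):
    table.insert(k)
print(table.keys)                                    # [0, None, None, None, 14, 6, 7]
print({k: table.probes(k) for k in (0, 7, 6, 14)})   # {0: 1, 7: 2, 6: 2, 14: 4}
```

In this run, key 7 overflows from home address 0 into location 6, which is the home address of key 6. The two chains then coalesce: key 6 needs two probes instead of one, and key 14 (home address 0) needs four probes instead of three.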
Variants • Many suggestions have been made for reducing the coalescing of probe chains and thereby lowering the number of retrieval probes, which in turn improves performance. The variants may be classified in three ways: • The table organization (whether or not a separate overflow area is used). • The manner of linking a colliding item into a chain. • The manner of choosing unoccupied locations.
Variants • Coalescing may be reduced by modifying the table organization. • Instead of allocating the entire table space for both overflow records and home address records, the table is divided into a primary area and an overflow area (the cellar). • The primary area is the address space that the hash function maps into. • The overflow or cellar area contains only overflow records. • The address factor is the ratio of the primary area to the total table size: Address Factor = primary area / total table size
Variants • For a fixed amount of storage, as the address factor decreases, the cellar size increases, which reduces coalescing. However, because the primary area becomes smaller, the number of collisions increases. • More collisions mean more items requiring multiple retrieval probes. • Vitter [2] determined that an address factor of 0.86 yields nearly optimal retrieval performance for most load factors.
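To make the address factor concrete, the short sketch below splits a fixed amount of storage into a primary area and a cellar. The function name `split_table` and the rounding rule are assumptions made for illustration; the default of 0.86 is the value quoted above.

```python
def split_table(total_size, address_factor=0.86):
    """Return (primary_size, cellar_size) for a table of total_size locations."""
    if not 0 < address_factor <= 1:
        raise ValueError("address factor must be in (0, 1]")
    # Assumption: round to the nearest whole location, keeping at least one
    # primary location so the hash function has somewhere to map.
    primary = max(1, round(total_size * address_factor))
    return primary, total_size - primary


for factor in (1.0, 0.86, 0.5):
    primary, cellar = split_table(1000, factor)
    print(f"address factor {factor}: primary={primary}, cellar={cellar}")
```

An address factor of 1.0 leaves no cellar at all, while lowering it trades primary locations (and so more collisions) for cellar space.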
Variants • LISCH • The algorithm given in slide 6 is called Late Insertion Standard Coalesced Hashing (LISCH), since new records are inserted at the end of a probe chain. The ‘Standard’ in the name refers to the lack of a cellar. • The variant of that algorithm that uses a cellar is called LICH, Late Insertion Coalesced Hashing.
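The effect of a cellar can be sketched in a few lines of Python. This is an illustration under assumptions, not a reference implementation: the class name `CellarTable`, the integer keys, and the hash function `key % primary` are hypothetical, and the cellar is assumed to sit at the bottom of the table so that overflow records fill it first. Setting `cellar=0` gives the cellar-free (LISCH-style) behaviour for comparison.

```python
class CellarTable:
    """Late-insertion coalesced hashing with an optional cellar (sketch)."""

    def __init__(self, primary, cellar):
        self.primary = primary                 # primary area: 0 .. primary-1
        total = primary + cellar               # cellar: primary .. total-1
        self.keys = [None] * total             # None marks an unoccupied location
        self.link = [-1] * total               # next record in the chain; -1 ends it
        self.free = total - 1                  # start at the bottom (inside the cellar)

    def insert(self, key):
        i = key % self.primary                 # assumed hash: primary area only
        if self.keys[i] is None:
            self.keys[i] = key
            return i
        while self.keys[i] != key and self.link[i] != -1:
            i = self.link[i]                   # walk to the end of the chain
        if self.keys[i] == key:
            return i                           # key is already stored
        while self.free >= 0 and self.keys[self.free] is not None:
            self.free -= 1                     # bottom-most unoccupied location
        if self.free < 0:
            raise OverflowError("table is full")
        self.keys[self.free] = key
        self.link[i] = self.free               # late insertion at the chain's end
        return self.free

    def probes(self, key):
        """Number of locations examined to find key, or None if it is absent."""
        i, count = key % self.primary, 1
        while self.keys[i] is not None and self.keys[i] != key:
            i = self.link[i]
            if i == -1:
                return None
            count += 1
        return count if self.keys[i] == key else None


for cellar in (0, 3):
    t = CellarTable(primary=7, cellar=cellar)
    for k in (0, 7, 6, 14):
        t.insert(k)
    print(cellar, {k: t.probes(k) for k in (0, 7, 6, 14)})
# 0 {0: 1, 7: 2, 6: 2, 14: 4}
# 3 {0: 1, 7: 2, 6: 1, 14: 3}
```

With no cellar, the overflow record 7 lands on the home address of key 6 and the chains coalesce. With a cellar of three locations, the overflow records 7 and 14 go into the cellar, key 6 stays at its home address, and both 6 and 14 need one probe fewer.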
Variants • Another way of varying the insertion algorithm is to change the way in which we choose an unoccupied location. In the basic algorithm, the unoccupied locations are always chosen from the bottom of the storage area, but this increases the number of collisions. • Hsiao [3] suggests REISCH (‘R’ stands for ‘Random’), in which a random unoccupied location is chosen for the new insertion. REISCH gives only a 1% improvement over EISCH (Early Insertion Standard Coalesced Hashing, in which a new record is linked into the chain immediately after the record at its home address rather than at the end). • BLISCH (‘B’ signifies ‘Bidirectional’) is another method of choosing the overflow location for a collision insertion: it alternates the selection between the top and bottom of the table. • In DCWC (Direct Chaining Without Coalescing), a record not stored at its home address is moved when a new record hashes to the location it occupies, so chains with different home addresses never coalesce.
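The three ways of choosing an unoccupied location mentioned above can be sketched as small Python helpers. Only the choice of location is shown, not the linking step or the rest of any named variant. All function and class names are hypothetical, "bottom" is assumed to mean the highest index and "top" the lowest, and starting the alternation at the bottom is also an assumption.

```python
import random


def pick_bottom(keys):
    """Bottom-most unoccupied location (highest index), or None if full."""
    for i in range(len(keys) - 1, -1, -1):
        if keys[i] is None:
            return i
    return None


def pick_top(keys):
    """Top-most unoccupied location (lowest index), or None if full."""
    for i, k in enumerate(keys):
        if k is None:
            return i
    return None


def pick_random(keys, rng=random):
    """A uniformly random unoccupied location, or None if full."""
    empty = [i for i, k in enumerate(keys) if k is None]
    return rng.choice(empty) if empty else None


class AlternatingPicker:
    """Alternate between the bottom and the top of the table on successive calls."""

    def __init__(self):
        self.use_bottom = True      # assumption: the first overflow goes to the bottom

    def pick(self, keys):
        slot = pick_bottom(keys) if self.use_bottom else pick_top(keys)
        self.use_bottom = not self.use_bottom
        return slot


keys = [None, "a", None, None, "b", None]
picker = AlternatingPicker()
for name in ("p", "q", "r"):
    keys[picker.pick(keys)] = name
print(keys)   # ['q', 'a', None, 'r', 'b', 'p']
```

Any of these pickers could replace the bottom-up search in an insertion routine; the rest of the insertion logic would be unchanged.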
Variants Table 1: Mean number of probes for successful lookup (n = 997) for variants of Coalesced Hashing