Chapter 5 Hashing

Chapter 5 Hashing • General ideas • Methods of implementing the hash table • Comparison among these methods • Applications of hashing • Compare hash tables with binary search trees

5.1 General Ideas • A hash table is a fixed-size array (of size TableSize) containing keys. • Each key is mapped to a number in the range 0 to TableSize - 1 and placed in the corresponding cell.

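5.1 General Ideas • The mapping is called a hash function. It should be simple to compute and, ideally, should map any two distinct keys to different cells. • Because the number of cells is finite and the set of possible keys is usually much larger, this ideal is generally unattainable, so in practice the function should distribute the keys evenly among the cells. • A collision occurs when two or more keys are mapped to the same cell.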

5.2 Hash Function Simple Hash Function • For numeric keys, one simple hash function is Key mod TableSize, where TableSize is a prime number. • Assume the key value is 9 digits, and there are 2500 keys. To reduce collisions, choose the table size so that the load factor (number of keys / TableSize) is about 50%, i.e., about 5000 cells.

5.2 Hash Function Select TableSize to be 4999, a prime number close to 5000.
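
The simple hash function above can be sketched in C as follows. This is only an illustration: the names HashMod and LoadFactor and the TABLE_SIZE macro are assumptions made for this sketch, and the key is assumed to fit in an unsigned long.

```c
#define TABLE_SIZE 4999u   /* prime close to 5000, as selected above */

/* Simple hash for numeric keys: Key mod TableSize.
   A 9-digit key fits in an unsigned long. */
unsigned int HashMod(unsigned long Key, unsigned int TableSize)
{
    return (unsigned int)(Key % TableSize);
}

/* Load factor = number of keys stored / number of cells. */
double LoadFactor(unsigned int NumKeys, unsigned int TableSize)
{
    return (double)NumKeys / TableSize;
}
```

For example, HashMod(123456789, TABLE_SIZE) gives 1485, and LoadFactor(2500, TABLE_SIZE) is just over 0.5.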

5.2 Hash Function Hash by Folding • Partition the key into several parts, usually 3 parts of about equal length. • Partitions are folded over each other and summed. • The remainder of the sum divided by TableSize is the hash value.

5.2 Hash Function • Example (use 2-4-3 folding instead of 3-3-3 to illustrate folding, and set TableSize to 10000 for ease of illustration)
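
A possible sketch of 2-4-3 folding for a 9-digit key is shown below. The function name HashFold243 and the sample key are assumptions made for illustration, and the sketch simply sums the three parts; folding variants that reverse the digits of alternate parts are not shown.

```c
/* Fold a 9-digit key as 2-4-3: split it into the leading 2 digits,
   the middle 4 digits and the trailing 3 digits, sum the parts,
   and take the remainder modulo TableSize. */
unsigned int HashFold243(unsigned long Key, unsigned int TableSize)
{
    unsigned long Part1 = Key / 10000000UL;          /* leading 2 digits  */
    unsigned long Part2 = (Key / 1000UL) % 10000UL;  /* middle 4 digits   */
    unsigned long Part3 = Key % 1000UL;              /* trailing 3 digits */

    return (unsigned int)((Part1 + Part2 + Part3) % TableSize);
}
```

With the hypothetical key 123456789 the parts are 12, 3456 and 789, which sum to 4257, so HashFold243(123456789, 10000) is 4257.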

5.2 Hash Function Mid-Square Method • The key is multiplied by itself (squared). • The middle few digits of the result are used as the hash value. • The exact number of digits to be used depends on the size of the table.

5.2 Hash Function • Suppose the key is 12345. • 12345² = 152 399 025 • The middle 3 digits, 399, are taken as the hash value. • If TableSize is 200, then 399 mod 200 = 199 is the hash value. • Avoid the situation where the middle digits are zeros.
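
The mid-square computation above could be written as the following sketch. The name HashMidSquare is an assumption, and so is the choice of which digits count as "middle": the sketch always takes the thousands through hundred-thousands digits of the square, which are the middle 3 digits when the square has 9 digits.

```c
/* Mid-square hashing: square the key, take 3 middle digits of the
   square, then reduce modulo TableSize.
   Assumption: the "middle" digits are the 4th to 6th digits counted
   from the right, which is exact for a 9-digit square. */
unsigned int HashMidSquare(unsigned long Key, unsigned int TableSize)
{
    unsigned long long Square = (unsigned long long)Key * Key;
    unsigned int Middle = (unsigned int)((Square / 1000ULL) % 1000ULL);

    return Middle % TableSize;
}
```

For the key 12345 the square is 152399025 and the middle digits are 399, so HashMidSquare(12345, 200) returns 199.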

5.2 Hash Function Character Keys • One simple method to convert keys to numbers is to add up the ASCII values of the characters in the string, e.g., the string HongKong becomes 795 (72+111+110+103+75+111+110+103)

5.2 Hash Function
typedef unsigned int Index;
/* Fig 5.3 */
Index Hash1 (const char *Key, int TableSize)
{
    unsigned int HashVal = 0;
    while (*Key != '\0')
        HashVal += *Key++;
    return HashVal % TableSize;
}
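
The sketch below repeats the additive string hash so that it can be compiled on its own, with one assumed change: each character is cast to unsigned char so that the sum does not depend on whether plain char is signed. It can be used to check the HongKong example and to see two weaknesses of adding character codes.

```c
typedef unsigned int Index;

/* Additive string hash: sum of the character codes, mod TableSize.
   The cast to unsigned char is an addition made in this sketch. */
Index Hash1(const char *Key, int TableSize)
{
    unsigned int HashVal = 0;

    while (*Key != '\0')
        HashVal += (unsigned char)*Key++;
    return HashVal % TableSize;
}
```

Hash1("HongKong", 4999) returns 795, and so does Hash1("KongHong", 4999), because any rearrangement of the same characters has the same sum. Also, an 8-character ASCII key can sum to at most 8 × 127 = 1016, so in a table of 4999 cells such keys would only ever reach the low-numbered cells.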

5.3 Separate Chaining • Keep a list of all elements that hash to the same value • Example: Hash (X) = X mod 10, with new elements inserted at the end of the list, and the data sequence 0, 4, 9, 16, 25, 36, 49, 64, 81
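
The example can be reproduced with the short sketch below. It is not the chapter's implementation: the names Node, ChainInsert and BuildExample are assumptions, the lists have no header nodes, keys are assumed non-negative, and each new element is appended at the end of its list as the example specifies.

```c
#include <stdlib.h>

#define TABLE_SIZE 10

struct Node
{
    int Element;
    struct Node *Next;
};

/* Append X to the end of the chain for cell X mod TABLE_SIZE.
   Returns 1 on success, 0 if memory could not be allocated. */
int ChainInsert(struct Node *Table[TABLE_SIZE], int X)
{
    struct Node **Link = &Table[X % TABLE_SIZE];
    struct Node *NewCell = malloc(sizeof *NewCell);

    if (NewCell == NULL)
        return 0;
    NewCell->Element = X;
    NewCell->Next = NULL;
    while (*Link != NULL)          /* walk to the end of the chain */
        Link = &(*Link)->Next;
    *Link = NewCell;
    return 1;
}

/* Build the table for the data sequence used in the example. */
void BuildExample(struct Node *Table[TABLE_SIZE])
{
    static const int Data[] = { 0, 4, 9, 16, 25, 36, 49, 64, 81 };
    int i;

    for (i = 0; i < TABLE_SIZE; i++)
        Table[i] = NULL;
    for (i = 0; i < 9; i++)
        ChainInsert(Table, Data[i]);
}
```

After BuildExample the chains are: cell 0: 0; cell 1: 81; cell 4: 4, 64; cell 5: 25; cell 6: 16, 36; cell 9: 9, 49. Cells 2, 3, 7 and 8 stay empty.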

5.3 Separate Chaining • Type declaration for separate chaining
/* Fig 5.7 */
#ifndef _HashSep_H
#define _HashSep_H
struct ListNode;
typedef struct ListNode *Position;
struct HashTbl;
typedef struct HashTbl *HashTable;
HashTable InitializeTable (int TableSize);
void DestroyTable (HashTable H);
Position Find (ElementType Key, HashTable H);
void Insert (ElementType Key, HashTable H);
ElementType Retrieve (Position P);
/* Routines such as Delete and MakeEmpty are omitted */
#endif /* _HashSep_H */
struct ListNode
{
    ElementType Element;
    Position Next;
};
typedef Position List;
/* TheLists is an array of list headers, allocated in InitializeTable */
struct HashTbl
{
    int TableSize;
    List *TheLists;
};

5.3 Separate Chaining • Initialization routine for separate chaining
/* Fig 5.8 */
HashTable InitializeTable (int TableSize)
{
    HashTable H;
    int i;
    if (TableSize < MinTableSize)
    {
        Error ("Table size too small");
        return NULL;
    }
    /* Allocate table */
    H = malloc (sizeof (struct HashTbl));
    if (H == NULL)
        FatalError ("Out of space!!!");
    H->TableSize = NextPrime (TableSize);
    /* Allocate array of lists */
    H->TheLists = malloc (sizeof (List) * H->TableSize);
    if (H->TheLists == NULL)
        FatalError ("Out of space!!!");
    /* Allocate list headers */
    for (i = 0; i < H->TableSize; i++)
    {
        H->TheLists [i] = malloc (sizeof (struct ListNode));
        if (H->TheLists [i] == NULL)
            FatalError ("Out of space!!!");
        else
            H->TheLists [i]->Next = NULL;
    }
    return H;
}
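
The initialization routine calls NextPrime, which is not shown in these slides. One possible sketch, assuming it should return the smallest prime greater than or equal to its argument, uses trial division; the helper IsPrime is a name introduced here.

```c
/* Return 1 if N is prime, 0 otherwise (trial division by odd numbers). */
int IsPrime(int N)
{
    int i;

    if (N < 2)
        return 0;
    if (N % 2 == 0)
        return N == 2;
    for (i = 3; i <= N / i; i += 2)   /* i <= N / i avoids overflow of i * i */
        if (N % i == 0)
            return 0;
    return 1;
}

/* Assumed behaviour: smallest prime greater than or equal to N. */
int NextPrime(int N)
{
    while (!IsPrime(N))
        N++;
    return N;
}
```

Under this assumption NextPrime(5000) is 5003, while NextPrime(4999) returns 4999 unchanged. Trial division costs about the square root of N steps per candidate, which is acceptable for a routine that runs once per table.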

5.3 Separate Chaining • Find routine for separate chaining
/* Fig 5.9 */
Position Find (ElementType Key, HashTable H)
{
    Position P;
    List L;
    L = H->TheLists [Hash (Key, H->TableSize)];
    P = L->Next;
    while (P != NULL && P->Element != Key) /* Probably need strcmp!! */
        P = P->Next;
    return P;
}

5.3 Separate Chaining • Insert routine for separate chaining
/* Fig 5.10 */
void Insert (ElementType Key, HashTable H)
{
    Position Pos, NewCell;
    List L;
    Pos = Find (Key, H);
    if (Pos == NULL) /* Key is not found */
    {
        NewCell = malloc (sizeof (struct ListNode));
        if (NewCell == NULL)
            FatalError ("Out of space!!!");
        else
        {
            /* Insert the new cell at the front of the list, after the header */
            L = H->TheLists [Hash (Key, H->TableSize)];
            NewCell->Next = L->Next;
            NewCell->Element = Key; /* Probably need strcpy! */
            L->Next = NewCell;
        }
    }
}

5.3 Separate Chaining • Effort required to perform a search is the constant time required to evaluate the hash function plus the time to traverse the list. • Average list length = λ (load factor) • Successful search requires about 1 + λ/2 links to be traversed. • Unsuccessful search requires about 1 + λ links to be traversed.

5.3 Separate Chaining • A general rule is to make the table size about as large as the expected number of elements, so that the load factor is about 1. • Chaining could be through a list or a tree. • A disadvantage of separate chaining is that it requires a second data structure for the chains, and allocating new cells on insertion takes time.

5.4 Open Addressing • If a collision occurs, alternative cells are tried until an empty cell is found. • The i-th cell tried is h_i(X) = (Hash (X) + F(i)) mod TableSize, with F(0) = 0. • Load factor should be below 0.5. • Linear probing tries consecutive locations (with wraparound), i.e., F(i) = i.

5.4.1 Linear Probing • Example: Key sequence 89, 18, 49, 58, 69
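
The key sequence above can be traced with the following sketch. The slide does not state the table size or hash function here, so the sketch assumes TableSize = 10 and Hash(X) = X mod 10; the names LinearInsert, LinearExample and EMPTY are also assumptions, and keys are assumed non-negative.

```c
#define TABLE_SIZE 10
#define EMPTY (-1)

/* Insert Key using linear probing, F(i) = i.
   Returns the cell used, or -1 if every cell is occupied. */
int LinearInsert(int Table[TABLE_SIZE], int Key)
{
    int i;

    for (i = 0; i < TABLE_SIZE; i++)
    {
        int Pos = (Key % TABLE_SIZE + i) % TABLE_SIZE;  /* wraps around */
        if (Table[Pos] == EMPTY)
        {
            Table[Pos] = Key;
            return Pos;
        }
    }
    return -1;
}

/* Insert the example sequence 89, 18, 49, 58, 69 into an empty table. */
void LinearExample(int Table[TABLE_SIZE])
{
    static const int Keys[] = { 89, 18, 49, 58, 69 };
    int i;

    for (i = 0; i < TABLE_SIZE; i++)
        Table[i] = EMPTY;
    for (i = 0; i < 5; i++)
        LinearInsert(Table, Keys[i]);
}
```

Under these assumptions 89 goes to cell 9 and 18 to cell 8 without collisions; 49 collides at cell 9 and wraps to cell 0; 58 collides at cells 8, 9 and 0 and lands in cell 1; 69 collides at cells 9, 0 and 1 and lands in cell 2. The growing run of occupied cells 8, 9, 0, 1, 2 is a cluster.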

5.4.1 Linear Probing • Primary clustering: any key that hashes into the cluster will require several attempts to resolve the collision, and then it will add to the cluster. • Expected number of probes for a successful search is S = ½(1 + 1/(1 − λ)). • Expected number of probes for insertion and unsuccessful search is U = ½(1 + 1/(1 − λ)²).
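
To get a feel for how quickly linear probing degrades, the standard estimates can be evaluated numerically: ½(1 + 1/(1 − λ)) probes for a successful search and ½(1 + 1/(1 − λ)²) probes for an insertion or unsuccessful search, where λ is the load factor. The function names below are assumptions made for this sketch.

```c
/* Expected probes for a successful search with linear probing. */
double LinearSuccessful(double Lambda)
{
    return 0.5 * (1.0 + 1.0 / (1.0 - Lambda));
}

/* Expected probes for an insertion or unsuccessful search. */
double LinearUnsuccessful(double Lambda)
{
    double Free = 1.0 - Lambda;   /* fraction of empty cells */

    return 0.5 * (1.0 + 1.0 / (Free * Free));
}
```

At λ = 0.5 the estimates are 1.5 and 2.5 probes; at λ = 0.75 they are 2.5 and 8.5; at λ = 0.9 they are 5.5 and 50.5. This is the reason for keeping the load factor of an open addressing table below about 0.5.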

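5.4.1 Linear Probing • For a random collision resolution strategy (each probe is independent of the previous probes), the expected number of probes is 1/(1 − λ) for insertion and unsuccessful search, and (1/λ) ln(1/(1 − λ)) for a successful search.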

5.4.2 Quadratic Probing • Eliminates the primary clustering problem. • The collision function is quadratic, e.g., F(i) = i². • There is no guarantee that all cells are tried. • There is no guarantee of finding an empty cell once the table gets more than half full, or even before the table gets half full if the table size is not prime.
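
The lack of a guarantee can be checked experimentally by counting how many different cells the probe sequence (Start + i²) mod TableSize actually visits. The function below is only a sketch; its name, the MAX_SIZE limit and the assumption that Start is non-negative are choices made here.

```c
#define MAX_SIZE 1024

/* Count the distinct cells visited by (Start + i*i) mod TableSize
   for i = 0 .. TableSize - 1. Returns -1 if TableSize is out of range. */
int DistinctQuadraticCells(int Start, int TableSize)
{
    char Seen[MAX_SIZE] = { 0 };
    int i, Count = 0;

    if (TableSize <= 0 || TableSize > MAX_SIZE)
        return -1;
    for (i = 0; i < TableSize; i++)
    {
        int Pos = (int)((Start % TableSize + (long)i * i) % TableSize);
        if (!Seen[Pos])
        {
            Seen[Pos] = 1;
            Count++;
        }
    }
    return Count;
}
```

With TableSize = 16 the sequence reaches only 4 distinct cells, so an insertion can fail while most of the table is still empty. With the prime TableSize = 17 it reaches 9 distinct cells, just over half of the table, which is why an empty cell is always found as long as a prime-sized table is at least half empty.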

5.4.2 Quadratic Probing Type declaration for open addressing
typedef int ElementType;
/* Fig. 5.14 */
#ifndef _HashQuad_H
#define _HashQuad_H
typedef unsigned int Index;
typedef Index Position;
struct HashTbl;
typedef struct HashTbl *HashTable;
HashTable InitializeTable (int TableSize);
void DestroyTable (HashTable H);
Position Find (ElementType Key, HashTable H);
void Insert (ElementType Key, HashTable H);
ElementType Retrieve (Position P, HashTable H);
HashTable Rehash (HashTable H);
/* Delete & MakeEmpty are omitted */
#endif /* _HashQuad_H */
/* Place in the implementation file */
enum KindOfEntry { Legitimate, Empty, Deleted };
struct HashEntry
{
    ElementType Element;
    enum KindOfEntry Info;
};
typedef struct HashEntry Cell;
/* Cell *TheCells will be allocated later */
struct HashTbl
{
    int TableSize;
    Cell *TheCells;
};

5.4.2 Quadratic Probing Routine to initialize open addressing hash table
/* Fig. 5.15 */
HashTable InitializeTable (int TableSize)
{
    HashTable H;
    int i;
    if (TableSize < MinTableSize)
    {
        Error ("Table size too small");
        return NULL;
    }
    /* Allocate table */
    H = malloc (sizeof (struct HashTbl));
    if (H == NULL)
        FatalError ("Out of space!!!");
    H->TableSize = NextPrime (TableSize);
    /* Allocate array of Cells */
    H->TheCells = malloc (sizeof (Cell) * H->TableSize);
    if (H->TheCells == NULL)
        FatalError ("Out of space!!!");
    for (i = 0; i < H->TableSize; i++)
        H->TheCells [i].Info = Empty;
    return H;
}

5.4.2 Quadratic Probing Routine for hashing with quadratic probing
/* Fig. 5.16 */
Position Find (ElementType Key, HashTable H)
{
    Position CurrentPos;
    int CollisionNum;
    CollisionNum = 0;
    CurrentPos = Hash (Key, H->TableSize);
    while (H->TheCells [CurrentPos].Info != Empty &&
           H->TheCells [CurrentPos].Element != Key) /* Probably need strcmp!! */
    {
        CurrentPos += 2 * ++CollisionNum - 1;
        if (CurrentPos >= H->TableSize)
            CurrentPos -= H->TableSize;
    }
    return CurrentPos;
}
• If the table size is prime, a new element can always be inserted if the table is at least half empty.
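
The update CurrentPos += 2 * ++CollisionNum - 1 in Find relies on the identity i² − (i − 1)² = 2i − 1, so each probe is reached from the previous one without a multiplication or a mod. The sketch below compares this incremental form with the direct formula; the function names are assumptions, and Home is assumed to be less than TableSize.

```c
/* Position of the i-th probe computed directly:
   (Home + i*i) mod TableSize. */
unsigned int ProbeDirect(unsigned int Home, unsigned int i,
                         unsigned int TableSize)
{
    return (unsigned int)((Home + (unsigned long long)i * i) % TableSize);
}

/* Position of the i-th probe computed as in Find: add 2i - 1 at each
   step and subtract TableSize once if the position runs past the end. */
unsigned int ProbeIncremental(unsigned int Home, unsigned int i,
                              unsigned int TableSize)
{
    unsigned int CurrentPos = Home;
    unsigned int CollisionNum = 0;

    while (CollisionNum < i)
    {
        CurrentPos += 2 * ++CollisionNum - 1;
        if (CurrentPos >= TableSize)
            CurrentPos -= TableSize;
    }
    return CurrentPos;
}
```

The two functions agree as long as i is at most TableSize / 2, because the increment 2i − 1 is then smaller than TableSize and a single subtraction is enough to bring the position back into range. For example, with Home = 7 and TableSize = 11 both give the probe positions 7, 8, 0, 5, 1, 10 for i = 0 to 5.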
