Overflow Handling

Overflow Handling • An overflow occurs when the home bucket for a new pair (key, element) is full. • We may handle overflows by: • Search the hash table in some systematic fashion for a bucket that is not full. • Linear probing (linear open addressing). • Quadratic probing. • Random probing. • Eliminate overflows by permitting each bucket to keep a list of all pairs for which it is the home bucket. • Array linear list. • Chain.

0 4 8 12 16 Linear Probing – Get And Insert • divisor = b (number of buckets) = 17. • Home bucket = key % 17. 34 0 45 6 23 7 28 12 29 11 30 33 • Insert pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45

34 0 45 6 23 7 28 12 29 11 30 33 0 4 8 12 16 34 45 6 23 7 28 12 29 11 30 33 0 4 8 12 16 0 4 8 12 16 34 45 6 23 7 28 12 29 11 30 33 Linear Probing – Delete • Delete(0) • Search cluster for pair (if any) to fill vacated bucket.

34 0 0 45 4 6 23 8 7 28 12 12 29 11 30 16 33 0 45 6 23 7 28 12 29 11 30 33 0 4 8 12 16 0 45 6 23 7 28 12 29 11 30 33 0 4 8 12 16 0 4 8 12 16 0 45 6 23 7 28 12 29 11 30 33 Linear Probing – Delete(34) • Search cluster for pair (if any) to fill vacated bucket.

34 0 0 45 4 6 23 8 7 28 12 12 29 11 30 16 33 34 0 45 6 23 7 28 12 11 30 33 0 4 8 12 16 34 0 45 6 23 7 28 12 11 30 33 0 4 8 12 16 0 4 8 12 16 34 0 45 6 23 7 28 12 11 30 33 0 4 8 12 16 34 0 6 23 7 28 12 11 30 45 33 Linear Probing – Delete(29) • Search cluster for pair (if any) to fill vacated bucket.

34 0 45 6 23 7 28 12 29 11 30 33 0 4 8 12 16 Performance Of Linear Probing • Worst-case find/insert/erase time is Q(n), where n is the number of pairs in the table. • This happens when all pairs are in the same cluster.

34 0 45 6 23 7 28 12 29 11 30 33 0 4 8 12 16 Expected Performance • a = loading density = (number of pairs)/b. • a = 12/17. • Sn = expected number of buckets examined in a successful search when n is large • Un = expected number of buckets examined in a unsuccessful search when n is large • Time to put and remove governed by Un.

Expected Performance • Sn ~ ½(1 + 1/(1 – a)) • Un ~ ½(1 + 1/(1 – a)2) • Note that 0 <= a <= 1. a <= 0.75 is recommended.

Hash Table Design • Performance requirements are given, determine maximum permissible loading density. • We want a successful search to make no more than 10 compares (expected). • Sn ~ ½(1 + 1/(1 – a)) • a <= 18/19 • We want an unsuccessful search to make no more than 13 compares (expected). • Un ~ ½(1 + 1/(1 – a)2) • a <= 4/5 • So a <= min{18/19, 4/5} = 4/5.

Hash Table Design • Dynamic resizing of table. • Whenever loading density exceeds threshold (4/5 in our example), rehash into a table of approximately twice the current size. • Fixed table size. • Know maximum number of pairs. • No more than 1000 pairs. • Loading density <= 4/5 => b >= 5/4*1000 = 1250. • Pick b (equal to divisor) to be a prime number or an odd number with no prime divisors smaller than 20.

Linear List Of Synonyms • Each bucket keeps a linear list of all pairs for which it is the home bucket. • The linear list may or may not be sorted by key. • The linear list may be an array linear list or a chain.

[0] 0 34 [4] 6 23 7 [8] 11 28 45 [12] 12 29 30 33 [16] Sorted Chains • Put in pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45 • Home bucket = key % 17.

Expected Performance • Note that a >= 0. • Expected chain length is a. • Sn ~ 1 + a/2. • Un ~ a

Overflow Handling