Efficient Amortized Analysis of Rehashing in Hash Tables

Amortized Analysis of Rehashing

What is rehashing • Hash table too full  spend a lot of time looking in buckets • Solution: rehash • make hash table twice the size • for each item in original hash table, hash to location in bigger table

Assumptions for Thursday, Feb. 10, 2000 • Rehash whenever table is 50% full or more • Just a sequence of inserts • (can be generalized for other operations) • We never get a collision • (once dealing with collisions, we’re in average-case analysis territory) • Hash table starts as size 2

Observations • How expensive is an insert? • How expensive is a rehash? • When will we need to rehash?

Observations • How expensive is an insert? • O(1) - assuming no collisions. Say it’s 1. • How expensive is a rehash? • O(N) - where N is current size of table. Say it’s N. • When will we need to rehash? • whenever size of table is power of 2 (1,2,4,8,16,…)

Amortized analysisStrategy 1: add up operations Note: I’ve made assumptions about the constants, but the analysis could be done for any constants. Note 2: This is sometimes called the “Aggregate Method”

Amortized analysisStrategy 2: Accounting Method • Charge the cost of some operations to other operations • Each operation gives us some “tokens”, which we can spend on future operations

Accounting Method analysis • Each insert gives us 3 tokens • 1 token for that insert • 1 token for rehashing this item the first time • 1 token for rehashing another item that already got rehashed once or more(since we rehash on double the size, # of items never hashed = # of items already hashed once or more)

Potential function • Potential function: more sophisticated tokens • Potential function is a function • When operation costs less, potential goes up • When operation costs more, take from potential • Always positive – or we’re taking too long

Tokens as a potential function • potential function at operation #i P(i)= 1 + 2F - S/2 • F = # of filled (non-empty) hash table slots • S = total # of hash table slots • Actual cost of operation of operation #iC(i) • Amortized cost of operation #(i+1):CA(i+1)=C(i+1) + ( P(i+1) – P(i) )

The Math • Beginning: • F=0, S=2. P(i)=0 • For insert with no rehashing • C(i+1)=1 • P(i+1)-P(i)=2 (added non-empty slot) • For insert with rehashing of N items • C(i+1)=1+N • P(i+1)-P(i)=2-N (before, F=N, S=2N. After, S=4N)

Potential method analysis • P(i) is always positive. • Yes. We rehash when F  1/2S, at which point, 4F=S. Etc… • Amortized cost is constant • CA(i+1)=C(i+1)+P(i+1)-P(i)= 3, in both cases (previous slide)

Efficient Amortized Analysis of Rehashing in Hash Tables

Efficient Amortized Analysis of Rehashing in Hash Tables

Presentation Transcript

Amortized Algorithm Analysis

Amortized Analysis

Amortized Analysis

Amortized Analysis

Amortized Analysis

AmorTIZED ANALYSIS

Amortized Analysis

Amortized Analysis

Amortized Analysis

Amortized Analysis

Amortized Analysis

Amortized Analysis

Amortized Analysis

Amortized Analysis of Algorithm s

Amortized Analysis

Amortized Analysis

Amortized Analysis

Amortized Analysis

Amortized Analysis

Amortized Analysis