140 likes | 157 Views
Amortized Analysis of Rehashing. What is rehashing. Hash table too full spend a lot of time looking in buckets Solution: rehash make hash table twice the size for each item in original hash table, hash to location in bigger table. Assumptions for Thursday, Feb. 10, 2000.
E N D
What is rehashing • Hash table too full spend a lot of time looking in buckets • Solution: rehash • make hash table twice the size • for each item in original hash table, hash to location in bigger table
Assumptions for Thursday, Feb. 10, 2000 • Rehash whenever table is 50% full or more • Just a sequence of inserts • (can be generalized for other operations) • We never get a collision • (once dealing with collisions, we’re in average-case analysis territory) • Hash table starts as size 2
Observations • How expensive is an insert? • How expensive is a rehash? • When will we need to rehash?
Observations • How expensive is an insert? • O(1) - assuming no collisions. Say it’s 1. • How expensive is a rehash? • O(N) - where N is current size of table. Say it’s N. • When will we need to rehash? • whenever size of table is power of 2 (1,2,4,8,16,…)
Amortized analysisStrategy 1: add up operations Note: I’ve made assumptions about the constants, but the analysis could be done for any constants. Note 2: This is sometimes called the “Aggregate Method”
Amortized analysisStrategy 2: Accounting Method • Charge the cost of some operations to other operations • Each operation gives us some “tokens”, which we can spend on future operations
Accounting Method analysis • Each insert gives us 3 tokens • 1 token for that insert • 1 token for rehashing this item the first time • 1 token for rehashing another item that already got rehashed once or more(since we rehash on double the size, # of items never hashed = # of items already hashed once or more)
Potential function • Potential function: more sophisticated tokens • Potential function is a function • When operation costs less, potential goes up • When operation costs more, take from potential • Always positive – or we’re taking too long
Tokens as a potential function • potential function at operation #i P(i)= 1 + 2F - S/2 • F = # of filled (non-empty) hash table slots • S = total # of hash table slots • Actual cost of operation of operation #iC(i) • Amortized cost of operation #(i+1):CA(i+1)=C(i+1) + ( P(i+1) – P(i) )
The Math • Beginning: • F=0, S=2. P(i)=0 • For insert with no rehashing • C(i+1)=1 • P(i+1)-P(i)=2 (added non-empty slot) • For insert with rehashing of N items • C(i+1)=1+N • P(i+1)-P(i)=2-N (before, F=N, S=2N. After, S=4N)
Potential method analysis • P(i) is always positive. • Yes. We rehash when F 1/2S, at which point, 4F=S. Etc… • Amortized cost is constant • CA(i+1)=C(i+1)+P(i+1)-P(i)= 3, in both cases (previous slide)