140 likes | 159 Views
Learn about the amortized analysis of rehashing in hash tables, including strategies like the Aggregate Method and Accounting Method. Understand the cost of inserts, rehashing, and when to rehash based on the table size. Explore potential functions and the math behind amortized costs.
E N D
What is rehashing • Hash table too full spend a lot of time looking in buckets • Solution: rehash • make hash table twice the size • for each item in original hash table, hash to location in bigger table
Assumptions for Thursday, Feb. 10, 2000 • Rehash whenever table is 50% full or more • Just a sequence of inserts • (can be generalized for other operations) • We never get a collision • (once dealing with collisions, we’re in average-case analysis territory) • Hash table starts as size 2
Observations • How expensive is an insert? • How expensive is a rehash? • When will we need to rehash?
Observations • How expensive is an insert? • O(1) - assuming no collisions. Say it’s 1. • How expensive is a rehash? • O(N) - where N is current size of table. Say it’s N. • When will we need to rehash? • whenever size of table is power of 2 (1,2,4,8,16,…)
Amortized analysisStrategy 1: add up operations Note: I’ve made assumptions about the constants, but the analysis could be done for any constants. Note 2: This is sometimes called the “Aggregate Method”
Amortized analysisStrategy 2: Accounting Method • Charge the cost of some operations to other operations • Each operation gives us some “tokens”, which we can spend on future operations
Accounting Method analysis • Each insert gives us 3 tokens • 1 token for that insert • 1 token for rehashing this item the first time • 1 token for rehashing another item that already got rehashed once or more(since we rehash on double the size, # of items never hashed = # of items already hashed once or more)
Potential function • Potential function: more sophisticated tokens • Potential function is a function • When operation costs less, potential goes up • When operation costs more, take from potential • Always positive – or we’re taking too long
Tokens as a potential function • potential function at operation #i P(i)= 1 + 2F - S/2 • F = # of filled (non-empty) hash table slots • S = total # of hash table slots • Actual cost of operation of operation #iC(i) • Amortized cost of operation #(i+1):CA(i+1)=C(i+1) + ( P(i+1) – P(i) )
The Math • Beginning: • F=0, S=2. P(i)=0 • For insert with no rehashing • C(i+1)=1 • P(i+1)-P(i)=2 (added non-empty slot) • For insert with rehashing of N items • C(i+1)=1+N • P(i+1)-P(i)=2-N (before, F=N, S=2N. After, S=4N)
Potential method analysis • P(i) is always positive. • Yes. We rehash when F 1/2S, at which point, 4F=S. Etc… • Amortized cost is constant • CA(i+1)=C(i+1)+P(i+1)-P(i)= 3, in both cases (previous slide)