Cuckoo Hashing and CAMs Michael Mitzenmacher
Background • For the past several years, I have had funding from Cisco to research hash tables and related data structures for approximate measuring/monitoring on routers. • Extreme conditions: • Limited space. • Limited # of memory accesses. • Amenable to hardware implementation. • Hardware setting allows CAMs. • Question: what are the extreme conditions for hashing applications at Google?
Theme of The Talk How can we use CAMs (content addressable memories) to improve cuckoo hashing, a potentially breakthrough hashing approach, and make it more practical?
CAMs • CAM = content addressable memory • Fully associative lookup. • Usually expensive, so must be kept small. • Not usually considered in theoretical work, but very useful in practice. • Can we bridge this gap? • What can CAMs do for us?
Cuckoo Hashing [Pagh,Rodler] • Basic scheme: each element gets two possible locations. • To insert x, check both locations for x. If one is empty, insert. • If both are full, x kicks out an old element y. Then y moves to its other location. • If that location is full, y kicks out z, and so on, until an empty slot is found.
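The basic scheme above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the class name `CuckooTable`, the `max_moves` cutoff, and the use of Python's salted built-in `hash` in place of real hash functions are all assumptions made for the example.

```python
import random

class CuckooTable:
    """Two-choice cuckoo hash table (illustrative sketch, names are hypothetical)."""

    def __init__(self, size, max_moves=32, seed=0):
        self.size = size
        self.slots = [None] * size
        self.max_moves = max_moves  # assumed cutoff before declaring a failure
        rng = random.Random(seed)
        self.salts = (rng.getrandbits(64), rng.getrandbits(64))

    def _locations(self, x):
        # Two candidate cells for x; a salted built-in hash stands in for real hash functions.
        return [hash((salt, x)) % self.size for salt in self.salts]

    def lookup(self, x):
        # Worst-case constant time: only two cells are ever examined.
        return any(self.slots[i] == x for i in self._locations(x))

    def insert(self, x):
        """Return None on success, or the element left without a home on failure."""
        if self.lookup(x):
            return None
        came_from = None
        for _ in range(self.max_moves):
            locs = self._locations(x)
            for i in locs:
                if self.slots[i] is None:
                    self.slots[i] = x
                    return None
            # Both cells are full: kick out the occupant of the cell x did not just leave.
            target = locs[0] if locs[0] != came_from else locs[1]
            self.slots[target], x = x, self.slots[target]
            came_from = target
        return x  # failure: in theory, re-hash everything at this point
```

Note that on failure the homeless element is generally not the one originally inserted, which is why `insert` returns it to the caller.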
Cuckoo Hashing Examples [Figure sequence, six frames: a table holding elements A, B, C, D, E as F and then G are inserted.]
Good Properties of Cuckoo Hashing • Worst case constant lookup time. • Simple to build, design.
Cuckoo Hashing Failures • Bad case 1: inserted element runs into cycles. • Bad case 2: inserted element has very long path before insertion completes. • Could be on a long cycle. • Bad cases occur with very small probability when load is sufficiently low. • Theoretical solution: re-hash everything if a failure occurs.
Basic Performance • For 2 choices, load less than 50%, n elements gives failure rate of Θ(1/n); maximum insert time O(log n). • Generalizations for more than 2 choices possible. • Place if possible; if not, place by kicking out a random choice, and so on. • Random walk multi-choice variant not fully analyzed; lots of open questions. • Good empirical performance. • An “impractical” BFS variant has failure rate Θ(1/n^(d-1)) for d choices.
Problems to be Considered • Reduce the failure probability. • Re-hashing generally not an option in router setting, and very expensive in other settings. • Reduce number of moves per insert. • Insert times may need to be bounded by constant in router setting. • CAMs provide help for both problems.
Failure Probability Reduction • Failure occurs when an element cannot be placed in one of its choices within a certain number (O(log n)) of moves. • Standard cuckoo hashing: failure rate is too high for many applications. • Even with multiple choices per element. • Re-hashing is an expensive option, although theoretically appealing.
A CAM-Stash • Use a CAM to stash away elements that would cause failure. • Intuition: if failures were independent, probability that s elements cause failures goes to Θ(1/n^s). • Failures not independent, but nearly so. • A stash holding a constant number of elements greatly reduces failure probability. • Implemented as a CAM in hardware, or a cache line in hardware/software. • Lookup requires also looking at stash.
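A stash can be bolted onto a two-choice cuckoo table with very little code. The sketch below is an illustration under stated assumptions: the class name `CuckooWithStash`, the parameters `stash_size` and `max_moves`, and the Python `set` standing in for the CAM are all hypothetical choices, not details from the talk.

```python
import random

class CuckooWithStash:
    """Two-choice cuckoo table plus a small stash (a set stands in for the CAM)."""

    def __init__(self, size, stash_size=4, max_moves=32, seed=0):
        self.size = size
        self.slots = [None] * size
        self.stash = set()
        self.stash_size = stash_size  # constant, independent of the table size
        self.max_moves = max_moves    # assumed cutoff on moves per insert
        rng = random.Random(seed)
        self.salts = (rng.getrandbits(64), rng.getrandbits(64))

    def _locations(self, x):
        return [hash((salt, x)) % self.size for salt in self.salts]

    def lookup(self, x):
        # A lookup must check the stash as well as the two table cells.
        return x in self.stash or any(self.slots[i] == x for i in self._locations(x))

    def insert(self, x):
        """Return True on success; False only when the stash is full as well."""
        if self.lookup(x):
            return True
        came_from = None
        for _ in range(self.max_moves):
            locs = self._locations(x)
            for i in locs:
                if self.slots[i] is None:
                    self.slots[i] = x
                    return True
            target = locs[0] if locs[0] != came_from else locs[1]
            self.slots[target], x = x, self.slots[target]
            came_from = target
        # The element left over would have caused a failure: stash it instead.
        if len(self.stash) < self.stash_size:
            self.stash.add(x)
            return True
        return False  # the leftover element is dropped; a real system would re-hash here
```

In hardware the stash lookup happens in parallel with the table reads, so the extra check does not add a memory access.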
Analysis Method • Treat cells as vertices, elements as edges in bipartite graph. • Count components that have excess edges to be placed in stash. • Random graph analysis to bound excess edges. 6 vertices, 7 edges: 1 edge must go into stash.
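The counting step can be made concrete with a union-find pass over the graph. A connected component with v vertices can host at most v elements, so it contributes max(0, e - v) edges to the stash. The function name `stash_needed` and the edge-list input format are assumptions for this sketch; it treats cells as generic vertices rather than enforcing a bipartite split.

```python
from collections import Counter

def stash_needed(num_cells, edges):
    """Sum over connected components of max(0, edges - vertices).

    edges is a list of (u, v) cell pairs, one pair per element.
    """
    parent = list(range(num_cells))

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    vertex_count = Counter(find(c) for c in range(num_cells))
    edge_count = Counter(find(u) for u, _ in edges)
    return sum(max(0, edge_count[r] - vertex_count[r]) for r in edge_count)
```

For a single component with 6 vertices and 7 edges, as in the figure caption above, the function returns 1.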
A Simple Experiment • 10,000 items, table of size 24,000, 2 choices per element, 10^7 trials.
Generalizations • Can similarly generalize known results for cuckoo hashing with more than 2 choices, more than 1 element per bucket. • Stash of size s reduces failure exponent linearly in s. • Intuition: random graph analysis exposes “bottleneck” in cuckoo hashing. Stashes relieve the bottleneck.
Summary • A CAM-stash greatly improves potential utility of cuckoo hashing. • Drives failures down to ignorable levels. • Constant-sized, so cheap. • More details in ESA 2008 paper (Kirsch/Mitzenmacher/Wieder). • Applies to other uses of cuckoo hashing. • History-independent cuckoo hashing, Naor/Segev/Wieder.
Insertion Time Problems • Lots of moves per insert in worst case. • Average is constant. • But maximum is Ω(log n) with non-trivial (inverse-poly) probability. • Router hardware setting: may need bounded number of memory accesses per insert.
A CAM-Queue • Insertion is a sequence of suboperations. • Of form “Move x to position Hj(x).” • Use the CAM as a queue for pending suboperations. • Perform suboperations from queue as available. • Move attempt = 1 lookup/write. • A suboperation may cause another suboperation to go on the queue. • Lookup: check the hash table and the CAM-queue. • De-amortization • Use queue to turn worst-case performance into average-case performance.
Queue Policy • Can reorder suboperations and maintain correctness. • Key point: better to give priority to “new” insertions over moves. • New insertions have d choices; moves effectively have d – 1. • Intuition suggests older elements may be less likely to be successfully placed. • True in practice. • Full priority queue may be too complex. • Simple strategy: new elements placed at front, failed moves placed at back.
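A rough sketch of the CAM-queue with the simple front/back policy follows. It simplifies the talk's model in ways that should be read as assumptions: each step here reads all d candidate cells of the pending element (rather than performing a single-location move), an evicted element is what goes to the back of the queue, and d must be at least 2. The class name `QueuedCuckoo` and the `ops` parameter name are chosen for the example; a `deque` stands in for the CAM.

```python
import random
from collections import deque

class QueuedCuckoo:
    """d-subtable cuckoo hashing with a queue of pending elements (needs d >= 2)."""

    def __init__(self, subtable_size, d=4, ops=4, seed=0):
        self.d, self.m, self.ops = d, subtable_size, ops
        self.tables = [[None] * subtable_size for _ in range(d)]
        self.queue = deque()  # entries: (element, subtable it was evicted from or None)
        self.rng = random.Random(seed)
        self.salts = [self.rng.getrandbits(64) for _ in range(d)]

    def _pos(self, j, x):
        return hash((self.salts[j], x)) % self.m

    def lookup(self, x):
        # Check the queue (a CAM would do this associatively) and the d cells.
        if any(x == y for y, _ in self.queue):
            return True
        return any(self.tables[j][self._pos(j, x)] == x for j in range(self.d))

    def _step(self):
        """One queue operation: place the head element, evicting if necessary."""
        x, avoid = self.queue.popleft()
        for j in range(self.d):
            p = self._pos(j, x)
            if self.tables[j][p] is None:
                self.tables[j][p] = x
                return
        # All cells full: evict from a random subtable other than the one x came from.
        j = self.rng.choice([k for k in range(self.d) if k != avoid])
        p = self._pos(j, x)
        self.tables[j][p], victim = x, self.tables[j][p]
        self.queue.append((victim, j))  # displaced element goes to the back

    def insert(self, x):
        if self.lookup(x):
            return
        self.queue.appendleft((x, None))  # new element goes to the front
        for _ in range(self.ops):         # bounded work per insertion
            if not self.queue:
                break
            self._step()
```

Each call to `insert` performs at most `ops` queue operations, so the per-insert cost is bounded while any leftover work stays in the queue for later calls.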
Experimental Evaluation • Table of size 32768, 4 subtables. • Target utilization u. • Insert 32768u elements, then alternate insertions/deletions to get to steady state. • Allow ops queue operations (parallel memory operations) per insertion.
Queue Sizes • Need CAM sized to overflow with negligible probability. • Maximum queue size much bigger than average. • Currently no analysis. • Experiments suggest queues of size in small 100s possible, with 4+ suboperations per insert, in practice.
Summary • A CAM-queue can allow effective deamortization of cuckoo hashing. • Insertion time constant at expense of a CAM to hold pending suboperations. • Could other data structures use this deamortization technique? • More details in Allerton 2008 paper (Kirsch/Mitzenmacher).
Insertion Time Problems • Lots of moves per insert in worst case. • Average is constant. • But maximum is Ω(log n) with non-trivial (inverse-poly) probability. • Router hardware settings: may need bounded number of memory accesses per insert.
Alternative Approach: Power of One Move • Limit to just one additional move per insert. • One move likely to be possible in practice. • Simple for hardware. • Some analysis possible via differential equations. • Insertion-only case can be analyzed; deletions approximated. • Easier to analyze than cuckoo hashing. • But with limited inserts, will need a CAM to hold a non-trivial number of elements that cannot be placed.
Multilevel Hash Table [BK90] • Use a multilevel hash table (MHT). • Can store n elements with d = log log n + O(1) levels in O(n) space with high probability. • Skew: more elements placed by early hash functions (double exponential decay). [Figure: example with d = 4 hash functions; element x probes levels 1 through 4.]
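A standard MHT insertion can be sketched as follows. The geometrically shrinking level sizes (the `shrink` parameter), one element per cell, the `overflow` list, and the class name `MultilevelHashTable` are assumptions for the illustration rather than details given on the slide.

```python
import random

class MultilevelHashTable:
    """MHT sketch: d levels, one hash function per level, first empty cell wins."""

    def __init__(self, first_level_size, d=4, shrink=0.5, seed=0):
        rng = random.Random(seed)
        # Assumed layout: each level is a constant factor smaller than the previous one.
        sizes = [max(1, int(first_level_size * shrink ** i)) for i in range(d)]
        self.levels = [[None] * s for s in sizes]
        self.salts = [rng.getrandbits(64) for _ in range(d)]
        self.overflow = []  # elements that no level could take

    def _pos(self, i, x):
        return hash((self.salts[i], x)) % len(self.levels[i])

    def insert(self, x):
        """Return the level index where x landed, or None if it overflowed."""
        for i, level in enumerate(self.levels):
            p = self._pos(i, x)
            if level[p] is None:
                level[p] = x
                return i
        self.overflow.append(x)
        return None

    def lookup(self, x):
        return x in self.overflow or any(
            level[self._pos(i, x)] == x for i, level in enumerate(self.levels)
        )
```

Counting the return values of `insert` over many elements gives a quick empirical view of how the load skews toward the early levels.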
A CAM-Stash Redux • In practice, want d to be a constant. • Constant number of levels implies constant probability of an overflow per element. • But probability is very small. • Need a stash to hold a constant fraction of the elements. • Aim for small constant fraction, e.g. expected 0.2% of the elements overflow.
Example Schemes • Standard: MHT with no moves. • Conservative: Place element if possible. If not, try to move earliest element that has not already replaced another element to make room. Otherwise spill over. • Second chance: Read all possible locations, and for each location with an element, check if it can be placed in the next subtable. Place new element as early as possible, moving up to 1 element left 1 level. • Second chance 2: Second chance with 2 elements/bucket.
Second Chance (SC) Scheme • Standard MHT fills from top down • elements cascade from table to table. • We try to slow cascade at every step. [Figure: standard MHT insertion of x.]
Implementing SC in Hardware • Read x’s d hash locations in parallel. • Hash discovered elements in parallel. • Insert x, performing a move if necessary.
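The second chance insertion can be sketched in software as below. The loop is sequential, but because every decision depends only on the table state at the start of the insert, it picks the same placement as the parallel read-then-hash steps described above. The class name `SecondChanceMHT`, the explicit `sizes` list, and the `stash` list standing in for the CAM are assumptions for this example.

```python
import random

class SecondChanceMHT:
    """MHT with the second chance rule: at most one element moves one level per insert."""

    def __init__(self, sizes, seed=0):
        rng = random.Random(seed)
        self.levels = [[None] * s for s in sizes]
        self.salts = [rng.getrandbits(64) for _ in sizes]
        self.stash = []  # stands in for the CAM holding unplaceable elements

    def _pos(self, i, x):
        return hash((self.salts[i], x)) % len(self.levels[i])

    def insert(self, x):
        d = len(self.levels)
        for i in range(d):
            p = self._pos(i, x)
            y = self.levels[i][p]
            if y is None:
                self.levels[i][p] = x  # earliest empty cell
                return
            if i + 1 < d:
                q = self._pos(i + 1, y)
                if self.levels[i + 1][q] is None:
                    # Second chance: occupant y moves to the next subtable, x takes its cell.
                    self.levels[i + 1][q] = y
                    self.levels[i][p] = x
                    return
        self.stash.append(x)  # no placement found even with one move

    def lookup(self, x):
        return x in self.stash or any(
            level[self._pos(i, x)] == x for i, level in enumerate(self.levels)
        )
```

Compared with the standard scheme, the new element lands in an earlier level whenever a single move makes room, which is what slows the cascade.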
Stash Size Distribution • Number of elements at each level is approximately a sum of independent Poisson trials. • When mean is large, approximately normal. • When mean is small, approximately Poisson. • Use Poisson distribution to approximate stash size distribution, to roughly estimate needed stash size for a failure probability.
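The Poisson approximation turns stash sizing into a tail computation. The sketch below assumes the stash occupancy is Poisson with a known mean `mu`; the function names `poisson_tail` and `stash_size_for` and the example numbers are hypothetical.

```python
import math

def poisson_tail(mu, s):
    """P(X > s) for X ~ Poisson(mu) with mu > 0, summing the upper tail directly."""
    k = s + 1
    # Start from the pmf at k = s + 1, computed in log space to avoid overflow.
    term = math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))
    total = 0.0
    # Keep adding terms while they still rise (k <= mu) or remain significant.
    while term > 0.0 and (k <= mu or term > total * 1e-17):
        total += term
        k += 1
        term *= mu / k
    return total

def stash_size_for(mu, target):
    """Smallest stash size s with P(X > s) <= target under the Poisson approximation."""
    s = 0
    while poisson_tail(mu, s) > target:
        s += 1
    return s
```

As a usage example, if 10,000 elements each overflow with probability 0.2% (an assumed workload), the mean is 20 and `stash_size_for(20, 1e-6)` gives a rough stash size for a one-in-a-million overflow probability.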
Summary • Even one move saves significant space. • But with deletions things are more complex, more space required. • Some schemes amenable to fluid limit, differential equation analysis. • CAM-stash has different asymptotics in this setting. • Linear size vs. constant-sized.
Conclusions • CAMs a very powerful tool for hash-based data structures. • Flexible uses: stash, queue. • Deal effectively with low probability events. • Generally not considered in theoretical analysis. • But should be! • Scaling: linear, logarithmic, constant size CAMs? • Can help give high-performance, space-efficient hash tables. • Cuckoo hashing: constant time lookups, good space utilization, low failure probability, simple and flexible.
Open Questions and Future Work • Analyze practical multiple choice cuckoo hashing variants for d > 2 choices. • Analysis of CAM-queue for cuckoo hashing. • Better methods of dealing with settings with frequent deletions. • Your question here…