A Look at Modern Dictionary Structures & Algorithms Warren Hunt
Dictionary Structures • Used for storing information as (key, value) pairs • The bread and butter of a data structures and algorithms course
Common Dictionary Structures • List (Array) • Sorted List • Linked List • Move to Front List • Inverted Index List • Skip List (check this one out!) • …
Common Dictionary Structures • (Balanced) Binary Search Trees • AVL Tree • Red-Black Tree • Splay Tree • B-Tree • Trie • Patricia Tree • …
Common Dictionary Structures • Hash Tables • Linear (or Quadratic) Probing • Separate Chaining (or Treeing) • Double Hashing • Perfect Hashing • Hash Trees • Cuckoo Hashing • d-ary • binned • …
+Every Hybrid You Can Think Of! • Unfortunately, they don’t teach the cool ones… • Skip lists are a faster, easier-to-code alternative to most binary search trees • Invented in 1990! • Cuckoo hashing has a huge number of nice properties (IMHO far superior to all other hashing designs) • Invented in 2001
So many to choose from! Which is best? • That depends on your needs… • Sorted lists are simple and easy to implement (simple means fast on small datasets!) • Binary search trees and sorted lists provide easy access to sorted data • B-trees have great page-performance for databases • Hash tables have the fastest asymptotic lookup time
Focus On Hashing for Now • Fastest lookup/insert/delete time: O(1) • Used in Bloom-filters • not the graphics kind! • Useful in garbage collection (or anywhere you want to mark things as visited) • Small hash-tables implement an associative cache • Easy to implement! (no pointer chasing)
Traditional Hashing • Just make up an address in an array for some piece of data and stick it there • Hash function generates the address • Problems arise when two things have the same address, so we’ll address that: • Linear (or Quadratic) Probing • Separate Chaining (Treeing…) • Double Hashing
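For concreteness, here is a minimal C sketch of linear probing (mine, not from the slides): 32-bit integer keys, a power-of-two table, a reserved EMPTY sentinel, and a toy multiplicative hash.

#include <stdint.h>

#define SIZE  1024          /* table size; must be a power of two            */
#define EMPTY 0xFFFFFFFFu   /* sentinel: this key value may not be stored    */

static uint32_t slots[SIZE];   /* initialize with memset(slots, 0xFF, sizeof slots) */

static uint32_t hash(uint32_t key) {
    return (key * 2654435761u) & (SIZE - 1);   /* Knuth multiplicative hash  */
}

/* Linear probing: walk forward from the home slot until we hit the key or
   an empty slot.  The caller keeps the load factor below 1 so scans end.  */
int lp_lookup(uint32_t key) {
    for (uint32_t i = hash(key); slots[i] != EMPTY; i = (i + 1) & (SIZE - 1))
        if (slots[i] == key) return 1;
    return 0;
}

void lp_insert(uint32_t key) {
    uint32_t i = hash(key);
    while (slots[i] != EMPTY) i = (i + 1) & (SIZE - 1);
    slots[i] = key;
}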
Problems With Traditional Hashing • Without separate chaining, tables can’t get too full or probe sequences grow long and performance collapses • With separate chaining, we have poor cache performance and still O(n) worst-case behavior • Separate treeing provides O(log n) worst case, but they don’t teach that in school… • Linear probing is still the most common (fastest cache behavior; bite the bullet on poorer memory utilization)
Good Hash Functions • All hash table implementations require good hash functions (with the exception of separate treeing) • Universal hash functions are required (number theory, I won’t discuss it here) • Cuckoo hashing is less strict (different assumptions are made in each paper to make proofs easier)
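The slide skips the number theory, but for reference, one standard and cheap construction is multiply-shift; a minimal sketch, assuming a fresh random odd multiplier is drawn per table (that draw is what makes the family approximately universal):

#include <stdint.h>

/* Multiply-shift (Dietzfelbinger et al.): hash a 32-bit key to d bits with
   a random odd 64-bit multiplier 'a'.  One multiply, one shift. */
static inline uint32_t mul_shift(uint32_t key, uint64_t a, int d) {
    return (uint32_t)((a * (uint64_t)key) >> (64 - d));
}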
Cuckoo Hashing • Guaranteed O(1) lookup/delete • Amortized O(1) insert • 50% space efficient • Requires *mostly* random hash functions • Newish and largely unknown (barely mentioned in Wikipedia’s Hash Table article)
Cuckoo Hashing • Use two hash tables and two hash functions • Each element will have exactly one “nest” (hash location) in each table • Guarantee that any element will only ever exist in one of its “nests” • Lookup/delete are O(1) because we can check 2 locations (“nests”) in O(1) time
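A C sketch of the two-table layout (names and the toy hash functions are mine, not from the paper). Lookup and delete each touch exactly one nest per table:

#include <stdint.h>

#define SIZE  1024
#define EMPTY 0xFFFFFFFFu

static uint32_t t1[SIZE], t2[SIZE];   /* the two tables; initialize to EMPTY */

/* Toy hash functions; real code wants (mostly) random ones. */
static uint32_t h1(uint32_t k) { return (k * 0x9E3779B1u) & (SIZE - 1); }
static uint32_t h2(uint32_t k) { return ((k ^ 0xDEADBEEFu) * 0x85EBCA6Bu) & (SIZE - 1); }

/* O(1): the key can only ever live in t1[h1(key)] or t2[h2(key)]. */
int ck_lookup(uint32_t key) {
    return t1[h1(key)] == key || t2[h2(key)] == key;
}

void ck_delete(uint32_t key) {
    if      (t1[h1(key)] == key) t1[h1(key)] = EMPTY;
    else if (t2[h2(key)] == key) t2[h2(key)] = EMPTY;
}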
Cuckoo Hashing - Insertion • 1. Insert an element by finding one of its “nests” and putting it there. This may evict another element! (go to 2) • 2. Insert the evicted element into its *other* “nest”. This may evict another element! (go to 2) • Under reasonable assumptions, this process will terminate in O(1) time…
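Continuing the sketch above, the kick-out loop with a depth cutoff (the cutoff is the detection mechanism from the “Overflowing the Table” slide; MAX_KICKS is my stand-in value):

#define MAX_KICKS 32   /* depth cutoff; O(log n)-ish values work in practice */

/* Returns 1 on success, 0 if we gave up and the caller must rehash.
   Assumes 'key' is not already present. */
int ck_insert(uint32_t key) {
    for (int n = 0; n < MAX_KICKS; n++) {
        uint32_t i = h1(key);
        uint32_t out = t1[i];
        t1[i] = key;                 /* claim the table-1 nest               */
        if (out == EMPTY) return 1;
        key = out;                   /* evictee must go to its table-2 nest  */

        i = h2(key);
        out = t2[i];
        t2[i] = key;
        if (out == EMPTY) return 1;
        key = out;                   /* back to table 1; loop again          */
    }
    return 0;
}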
Why does this work? • Matching property of random graphs • With high probability, any matching under a saturation threshold (50% in this case) can take another edge without breaking • More details in the paper
Overflowing the Table • Insertion can potentially fail, causing an infinite insertion loop • Detected using a depth cutoff • Due to unlucky hash functions • Due to a full hash table • Recovery: double the size of the table (if need be), choose new hash functions, and rehash all of the elements
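A sketch of the recovery path, reusing the declarations above; reseed_hashes() is a hypothetical helper that redraws h1/h2, and doubling SIZE when the table is genuinely full is the same loop over bigger arrays:

#include <string.h>

void reseed_hashes(void);   /* hypothetical: makes h1/h2 use fresh seeds */

void ck_rehash(void) {
    uint32_t keys[2 * SIZE];
    int n = 0, ok;
    for (int i = 0; i < SIZE; i++) {           /* stash every element     */
        if (t1[i] != EMPTY) keys[n++] = t1[i];
        if (t2[i] != EMPTY) keys[n++] = t2[i];
    }
    do {
        reseed_hashes();                       /* new hash functions      */
        memset(t1, 0xFF, sizeof t1);           /* all slots back to EMPTY */
        memset(t2, 0xFF, sizeof t2);
        ok = 1;
        for (int i = 0; i < n && ok; i++)
            ok = ck_insert(keys[i]);
    } while (!ok);                             /* unlucky? draw again     */
}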
Example • To the board!
Asymmetric Cuckoo Hashing • Choose one (the first) table to be larger than the other • Improves the probability that we get a hit on the first lookup • Only a minor slowdown on insert
Same Table Cuckoo Hashing • We don’t actually need two separate tables • Two tables made the analysis much easier • But… in practice, a single table with two hash functions works, as sketched below
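The one-table variant in the same sketch style: one array, two probe positions per key. On insertion, the kick-out loop is unchanged except the evictee moves to whichever of its two positions it was not just kicked out of.

static uint32_t t[SIZE];   /* one table, still two hash functions */

int ck1_lookup(uint32_t key) {
    return t[h1(key)] == key || t[h2(key)] == key;
}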
d-ary Cuckoo Hashing • Guaranteed O(1) lookup/delete • Amortized O(1) insert • 97%+ space efficient • Analysis requires random hash functions • (not quite as easy to implement) • (robust against crappier hash functions)
d-ary Cuckoo Hashing • Use d hash tables instead of two! • Lookup and delete look at d buckets • Insert is more complicated • Insertion sees a tree of possible eviction+insertion paths • BFS to find an empty nest • Random walk to find an empty nest (easier)
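A random-walk insertion sketch for the d-ary case, reusing MAX_KICKS and EMPTY from the sketches above; the D tables td[j] and hash functions hd(j, key) are stand-ins, not names from the paper:

#include <stdlib.h>

#define D 4   /* number of hash functions / tables */

static uint32_t td[D][SIZE];               /* initialize to EMPTY          */
static uint32_t hd(int j, uint32_t key);   /* assumed: D toy hash functions */

int dary_insert(uint32_t key) {
    for (int n = 0; n < MAX_KICKS; n++) {
        for (int j = 0; j < D; j++) {      /* take any empty nest first    */
            uint32_t i = hd(j, key);
            if (td[j][i] == EMPTY) { td[j][i] = key; return 1; }
        }
        int j = rand() % D;                /* all D nests full: random walk */
        uint32_t i = hd(j, key);
        uint32_t out = td[j][i];
        td[j][i] = key;
        key = out;                         /* evictee continues the walk   */
    }
    return 0;                              /* give up -> rehash            */
}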
Bucketed Cuckoo Hashing • Guaranteed O(1) lookup/delete • Amortized O(1) insert • 90%+ space efficient • Requires *mostly* random hash functions • (easier to implement) • (better, “good” cache performance)
Bucketed Cuckoo Hashing • Use two hash functions: but each hashes to an associative m-wide bucket • Lookup and delete must check at most two whole buckets • Insertion into a full bucket leaves a choice during eviction • Insertion sees a tree of possible eviction+insertion paths • BFS to find an empty bucket • Best-first search favors the emptiest target bucket • Random walk to find an empty bucket (easier) • Use LRI (least-recently-inserted) eviction for the easiest implementation
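A lookup sketch for the bucketed layout: each hash now names an M-wide bucket, so a probe scans M contiguous slots (with M=4 and 32-bit keys that is one 16-byte chunk, well within a cache line). Names are mine; h1/h2 are assumed to return a bucket index in [0, SIZE).

#define M 4   /* bucket width (associativity) */

static uint32_t b1[SIZE][M], b2[SIZE][M];   /* SIZE buckets of M slots each */

int bck_lookup(uint32_t key) {
    uint32_t *u = b1[h1(key)], *v = b2[h2(key)];
    for (int s = 0; s < M; s++)
        if (u[s] == key || v[s] == key) return 1;
    return 0;
}

/* For insertion, a per-bucket round-robin cursor that picks the next
   victim slot gives the LRI eviction policy mentioned above. */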
Generalization: Use both! • Use k hash functions • Use bins of size m • Get the best of both worlds!
IBM’s Implementation • IBM designed a hash table for the Cell processor • Parameters: K=2, M=4 (SIMD width) • If the hash table fits in scratch L2: • lookup in 21 cycles • Simple multiplicative hash functions worked well
Better Cache Performance Than You Would Think • If prefetching is used, the cost of a lookup is one memory latency (plus time to compute the hash functions, which can be done in SIMD) • Exactly two cache-line loads • Binary search trees, linear probing, separate chaining, etc. usually take more cache-line loads and have a very branchy search loop
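A sketch of the prefetching point using the GCC/Clang __builtin_prefetch intrinsic, against the two-table sketch above: hash first, issue both prefetches, then probe, so the two (and only two) cache-line loads overlap.

int ck_lookup_pf(uint32_t key) {
    uint32_t i1 = h1(key), i2 = h2(key);
    __builtin_prefetch(&t1[i1]);   /* both lines start flying in now  */
    __builtin_prefetch(&t2[i2]);
    /* ...independent work (e.g. hashing the next key) can go here... */
    return t1[i1] == key || t2[i2] == key;
}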
Conclusions • Cuckoo Hashing Provides: • Guaranteed O(1) lookup+delete • Amortized O(1) insert • Efficient memory utilization • Both in space and bandwidth! • Small constant factors • And SIMD friendly! • And is simple to implement • (easier than linear probing!)
Good Hash Function? • http://www.burtleburtle.net/bob/c/lookup3.c • (very fast, especially if you use the __rotl intrinsic)

#define rot(x,k) (((x)<<(k)) | ((x)>>(32-(k))))   /* rotate left; from lookup3.c */

#define mix(a,b,c) \
{ \
  a -= c;  a ^= rot(c, 4);  c += b; \
  b -= a;  b ^= rot(a, 6);  a += c; \
  c -= b;  c ^= rot(b, 8);  b += a; \
  a -= c;  a ^= rot(c,16);  c += b; \
  b -= a;  b ^= rot(a,19);  a += c; \
  c -= b;  c ^= rot(b, 4);  b += a; \
}
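One common way to get the two cuckoo hash functions out of a single mixer (my sketch, not from the slides) is to run mix with two different seed constants and mask the result down to a table index; SIZE is the power-of-two table size from the earlier sketches, and the seed constants here are arbitrary.

/* Hypothetical helper: derive both nest indices for a 32-bit key by
   seeding mix() differently for each table. */
static void two_hashes(uint32_t key, uint32_t *n1, uint32_t *n2) {
    uint32_t a = key, b = 0x9E3779B9u, c = 0xDEADBEEFu;   /* seed set 1 */
    mix(a, b, c);
    *n1 = c & (SIZE - 1);

    a = key; b = 0x85EBCA6Bu; c = 0xC2B2AE35u;            /* seed set 2 */
    mix(a, b, c);
    *n2 = c & (SIZE - 1);
}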