
MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing



Presentation Transcript


  1. MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing. Bin Fan, David G. Andersen, Michael Kaminsky. Presenter: Son Nguyen

  2. Memcached internals • LRU caching using a chaining hash table and a doubly linked list

  3. Goals • Reduce space overhead (bytes/key) • Improve throughput (queries/sec) • Target read-intensive workload with small objects • Result: 3X throughput, 30% more objects

  4. Doubly-linked-list’s problems • At least two pointers per item -> expensive • Both read and write change the list’s structure -> need locking between threads (no concurrency)

  5. Solution: CLOCK-based LRU • Approximate LRU • Multiple readers/single writer • Circular queue instead of linked list -> less space overhead
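The CLOCK idea on this slide can be sketched as follows. This is a minimal, single-threaded illustration and not MemC3's code: the class name `ClockCache`, the `index` dictionary standing in for the hash table, and the choice to start new entries with the recency bit cleared are all assumptions made for the example.

```python
class ClockCache:
    """Approximate LRU: a circular buffer of slots with one recency bit each."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = [None] * capacity     # circular buffer of keys
        self.values = [None] * capacity
        self.recent = [0] * capacity      # one recency bit per slot
        self.index = {}                   # key -> slot (hypothetical stand-in for the hash table)
        self.hand = 0                     # position of the clock hand

    def read(self, key):
        slot = self.index.get(key)
        if slot is None:
            return None                   # cache miss
        self.recent[slot] = 1             # a hit only sets a bit; no list is relinked
        return self.values[slot]

    def write(self, key, value):
        slot = self.index.get(key)
        if slot is None:
            slot = self._find_victim()
            self.keys[slot] = key
            self.index[key] = slot
            self.recent[slot] = 0         # assumption: new entries start as "not recent"
        else:
            self.recent[slot] = 1
        self.values[slot] = value

    def _find_victim(self):
        # Sweep the hand: clear set bits, stop at the first slot whose bit is 0.
        while True:
            slot = self.hand
            self.hand = (self.hand + 1) % self.capacity
            if self.recent[slot]:
                self.recent[slot] = 0
                continue
            old_key = self.keys[slot]
            if old_key is not None:
                del self.index[old_key]   # evict the previous occupant
            return slot
```

With a capacity of 3, writing `ka`, `kb`, `kc`, reading `ka`, and then writing `kd` evicts `kb`: the hand finds the bit for `ka` set, clears it, and moves on to the first slot whose bit is 0. The per-item cost is one bit instead of two list pointers.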

  6. CLOCK example (figure): the state of the circular queue originally, then after Read(kd), Write(kf, vf), and Write(kg, vg)

  7. Chaining Hashtable’s problems • Use linked list -> costly space overhead for pointers • Pointer dereference is slow (no advantage from CPU cache) • Read is not constant time (due to possibly long list)

  8. Solution: Cuckoo Hashing • Use 2 hash functions, so each key has 2 candidate buckets • Each bucket has exactly 4 slots (fits in CPU cache) • Each (key, value) object can therefore reside in one of 8 possible slots

  9. Cuckoo Hashing (figure): (ka, va) can be placed in either of its candidate buckets, HASH1(ka) or HASH2(ka)

  10. Cuckoo Hashing • Read: check at most 8 slots (constant, fast) • Write(ka, va): find an empty slot among the 8 possible slots of ka • If all are full, kick out a randomly chosen occupant (kb, vb) and store (ka, va) in its place • Now find an empty slot for (kb, vb) • Repeat up to 500 times or until an empty slot is found • If no empty slot is found, expand the table
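The read and write steps above can be sketched in a few lines. This is an illustrative, single-threaded sketch under stated assumptions, not MemC3's implementation: the class `CuckooTable`, the use of salted `hashlib.blake2b` for the two hashes, and the convention of returning the displaced item on failure are all choices made for the example. Table expansion is left to the caller.

```python
import hashlib
import random

SLOTS_PER_BUCKET = 4   # from the slides: 4 slots per bucket
MAX_KICKS = 500        # from the slides: give up after 500 displacements


class CuckooTable:
    def __init__(self, num_buckets, seed=0):
        self.buckets = [[None] * SLOTS_PER_BUCKET for _ in range(num_buckets)]
        self.rng = random.Random(seed)    # seeded so runs are repeatable

    def _candidates(self, key):
        # Two candidate buckets per key (hypothetical hash choice: salted BLAKE2b).
        n = len(self.buckets)
        return [
            int.from_bytes(
                hashlib.blake2b(key.encode(), digest_size=8, salt=salt).digest(),
                "little",
            ) % n
            for salt in (b"h1", b"h2")
        ]

    def _find(self, key):
        for b in self._candidates(key):
            for s, entry in enumerate(self.buckets[b]):
                if entry is not None and entry[0] == key:
                    return b, s
        return None

    def read(self, key):
        # At most 2 buckets x 4 slots = 8 slots are examined.
        pos = self._find(key)
        return None if pos is None else self.buckets[pos[0]][pos[1]][1]

    def write(self, key, value):
        """Return None on success, or the (key, value) left without a slot."""
        pos = self._find(key)
        if pos is not None:               # key already stored: update in place
            self.buckets[pos[0]][pos[1]] = (key, value)
            return None
        item = (key, value)
        for _ in range(MAX_KICKS):
            cands = self._candidates(item[0])
            for b in cands:
                for s in range(SLOTS_PER_BUCKET):
                    if self.buckets[b][s] is None:
                        self.buckets[b][s] = item
                        return None
            # All 8 slots are full: kick out a random occupant and retry with it.
            b = self.rng.choice(cands)
            s = self.rng.randrange(SLOTS_PER_BUCKET)
            item, self.buckets[b][s] = self.buckets[b][s], item
        return item                       # caller would expand the table and reinsert
```

Note that in this naive version a kicked-out item is briefly absent from the table, which is harmless with one thread but matters once readers run concurrently with the writer.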

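  11. Cuckoo Hashing (figure): Insert a. Candidate buckets HASH1(ka) and HASH2(ka) for (ka, va); diagram labels: a, b

  12. Cuckoo Hashing (figure): Insert b. Candidate buckets HASH1(kb) and HASH2(kb) for (kb, vb); diagram labels: c, b

  13. Cuckoo Hashing (figure): Insert c. Candidate buckets HASH1(kc) and HASH2(kc) for (kc, vc); c is placed. Done!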

  14. Cuckoo Hashing • Problem: after (kb, vb) is kicked out, a reader might attempt to read (kb, vb) and get a false cache miss • Solution: Compute the kick out path (Cuckoo path) first, then move items backward • Before: (b,c,Null)->(a,c,Null)->(a,b,Null)->(a,b,c) • Fixed: (b,c,Null)->(b,c,c)->(b,b,c)->(a,b,c)
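The three-slot example on this slide can be replayed with a small sketch. It is a toy model under stated assumptions, not MemC3's code: the table is a flat list with one item per slot, and the hypothetical `alternate` mapping gives the slot each stored key would move to if displaced. `find_cuckoo_path` only reads the table; `insert_backward` then moves items starting from the empty end of the path, so every stored key stays visible after each single write.

```python
def find_cuckoo_path(slots, alternate, start, max_len=500):
    """Follow displacements from `start` until an empty slot is reached.

    Only reads the table. Returns the list of slot indices on the path,
    or None if no empty slot is found within `max_len` hops.
    """
    path = [start]
    while slots[path[-1]] is not None:
        if len(path) > max_len:
            return None
        occupant = slots[path[-1]]
        path.append(alternate[occupant])  # where this occupant would move
    return path


def insert_backward(slots, path, new_key):
    """Shift items along the path from its empty end, then place new_key.

    Yields a snapshot of the table after every single-slot write.
    """
    for i in range(len(path) - 1, 0, -1):
        slots[path[i]] = slots[path[i - 1]]   # copy one hop toward the empty slot
        yield tuple(slots)
    slots[path[0]] = new_key
    yield tuple(slots)


# The slide's example: the table holds (b, c, Null) and a is inserted at slot 0.
slots = ["b", "c", None]
alternate = {"b": 1, "c": 2}                  # hypothetical alternate slots
path = find_cuckoo_path(slots, alternate, start=0)
states = list(insert_backward(slots, path, "a"))
```

Here `path` is `[0, 1, 2]` and `states` is `[("b", "c", "c"), ("b", "b", "c"), ("a", "b", "c")]`, matching the "Fixed" sequence: an item is momentarily duplicated, but b and c are never missing.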

  15. Cuckoo path (figure): Insert a. Candidate buckets HASH1(ka) and HASH2(ka) for (ka, va)

  16. Cuckoo path, backward insert (figure): Insert a. Candidate buckets HASH1(ka) and HASH2(ka) for (ka, va); diagram labels: b, a, c

  17. Cuckoo’s advantages • Concurrency: multiple readers/single writer • Read optimized (entries fit in CPU cache) • Still O(1) amortized time for write • 30% less space overhead • 95% table occupancy

  18. Evaluation • 68% throughput improvement in the all-hit case • 235% improvement in the all-miss case

  19. Evaluation • 3x throughput on a “real” workload

  20. Discussion • Write is slower than chaining Hashtable • Chaining Hashtable: 14.38 million keys/sec • Cuckoo: 7 million keys/sec • Idea: finding cuckoo path in parallel • Benchmark doesn’t show much improvement • Can we make it write-concurrent?
