
Cache Storage For the Next Billion



Presentation Transcript


  1. Cache Storage For the Next Billion • Students: Anirudh Badam, Sunghwan Ihm • Research Scientist: KyoungSoo Park • Presenter: Vivek Pai • Collaborator: Larry Peterson

  2. The Next Billion • Developing regions are not all alike • Many people have stable food, clean water, reasonable power • Connectivity, however, is bad • Growing middle class with desire for education & technology • These people are the next billion Cache Storage for the Next Billion

  3. Bad Networking & Options • Africa often backhauled through Europe • Satellite latency not fun • Ghana: 2 Mbps, $6000/month! • Emerging option: disk • 1 TB disk now $200 • Even disk latency is better than satellite
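To put the figures on this slide side by side, here is a rough back-of-the-envelope sketch. It assumes decimal units, a fully saturated link, and a 30-day billing month; none of those assumptions come from the slides.

```python
# Figures from the slide; decimal units and a saturated link are assumptions.
LINK_BITS_PER_SEC = 2_000_000   # 2 Mbps
LINK_COST_PER_MONTH = 6000      # USD, the Ghana example
DISK_BYTES = 10**12             # 1 TB
DISK_COST = 200                 # USD


def days_to_transfer(num_bytes, bits_per_sec):
    """Days needed to push num_bytes over a link running flat out."""
    return num_bytes * 8 / bits_per_sec / 86_400


days = days_to_transfer(DISK_BYTES, LINK_BITS_PER_SEC)
link_cost = LINK_COST_PER_MONTH * days / 30  # assumes a 30-day billing month
print(f"{days:.1f} days, about ${link_cost:,.0f} of bandwidth "
      f"vs ${DISK_COST} for the disk")
```

Under these assumptions, filling a 1 TB disk over the link would take more than six weeks, which is the case for shipping preloaded disks instead.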

  4. Enter the Tiny Laptops • Problem – memory in 256MB range

  5. Making Storage Work • Populate disk with content • Preloaded HTTP cache • Preloaded WAN accelerator cache • Preloaded Web sites – Wikipedia, etc • Ship disk to schools • Update as needed • Pull update caches on-demand during peak • Push updates off peak, overnight

  6. Deployment Scenarios • Special servers per school • 2 for redundancy • Average school size: 100 students • @ $100/laptop, $10K/school • Problems • 2 servers @ $5K doubles per-school cost • Servers don’t ride laptop commodity curves • Solution: no servers, just laptops

  7. Goal: 1 TB Cache Store on a 256MB Laptop • Why caching? • Improves Web access • Improves WAN access • Problem • Large disks are really slow • Disk storage requires index • In-memory indices optimize disk access

  8. Memory Index Sizing • Squid: popular HTTP cache • 72 bytes/object • Web objects average 8KB each • 1TB = 125M objects • 125M objects = 9GB RAM just for index • Commercial caches: better RAM usage • 32 bytes/object • 1TB disk = 4GB RAM
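The sizing arithmetic on this slide can be reproduced with a short sketch. It assumes decimal units (1 TB = 10^12 bytes, 8 KB = 8,000 bytes), which is what makes the slide's round numbers come out exactly; the function names are illustrative only.

```python
AVG_OBJECT_BYTES = 8_000   # "Web objects average 8KB each" (decimal assumed)
DISK_BYTES = 10**12        # 1 TB (decimal assumed)


def index_ram_bytes(disk_bytes, avg_object_bytes, index_bytes_per_object):
    """Return (object count, RAM needed for an in-memory index)."""
    objects = disk_bytes // avg_object_bytes
    return objects, objects * index_bytes_per_object


def max_index_bits_per_object(ram_bytes, objects):
    """Upper bound on index bits per object if ALL of RAM held the index."""
    return ram_bytes * 8 / objects


for name, per_object in [("Squid", 72), ("commercial cache", 32)]:
    objects, ram = index_ram_bytes(DISK_BYTES, AVG_OBJECT_BYTES, per_object)
    print(f"{name}: {objects:,} objects, {ram / 10**9:.0f} GB of index RAM")

# A 256 MB laptop leaves very few bits per object, even in the best case.
budget = max_index_bits_per_object(256 * 10**6, DISK_BYTES // AVG_OBJECT_BYTES)
print(f"256 MB laptop: at most {budget:.1f} index bits per object")
```

The last figure is an optimistic ceiling, since it ignores the operating system and the cache process itself.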

  9. Revisiting Cache Indexing • Seek reduction important • Most objects small • Access largely random • High insert rate • Assume hit rate is 50% • Assume cacheable rate is 50% • Insert rate = 25% of request rate • High delete rate • Caches largely full • If insert rate = 25%, delete rate = 25% • Deletion using LRU, etc
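One way to read the 25% figure is that only misses can trigger an insert, and only cacheable misses are stored. The sketch below encodes that interpretation; it is an assumption about how the slide's numbers combine, and the 100 requests/second workload is purely illustrative.

```python
def insert_fraction(hit_rate, cacheable_rate):
    """Fraction of requests that cause a cache insert.

    Assumes every cacheable miss is inserted and hits insert nothing.
    """
    return (1 - hit_rate) * cacheable_rate


def delete_fraction(hit_rate, cacheable_rate, cache_full=True):
    """In a full cache, each insert must evict something (by LRU or similar)."""
    return insert_fraction(hit_rate, cacheable_rate) if cache_full else 0.0


frac = insert_fraction(hit_rate=0.5, cacheable_rate=0.5)
requests_per_sec = 100  # hypothetical workload, not from the slides
print(f"insert rate: {frac:.0%} of requests, "
      f"{frac * requests_per_sec:.0f} inserts/s and as many deletes/s when full")
```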

  10. Restarting the Design • Eliminate in-memory index • Treat disk like memory • Optimize data structures for locality • Use location-sensitive algorithms • Measure performance • Now consider what to add • For each addition, measure performance
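As an illustration of "eliminate in-memory index, treat disk like memory", the sketch below uses a file as a fixed-size hash table: the key's hash selects the bin, so a lookup is one seek and one read with no index in RAM. This is a minimal sketch in the spirit of the slide, not the HashCache implementation; the class name, bin size, and on-disk layout are all hypothetical.

```python
import hashlib
import os
import struct
import tempfile

BIN_SIZE = 8192                  # hypothetical: one bin per ~8 KB object
HEADER = struct.Struct("<20sI")  # SHA-1 digest of the key, payload length


class DiskHashTable:
    """Fixed-size on-disk table; the key's hash picks the bin, so no RAM index."""

    def __init__(self, path, num_bins):
        self.num_bins = num_bins
        self.f = open(path, "w+b")
        self.f.truncate(num_bins * BIN_SIZE)  # zero-filled file

    def _locate(self, key):
        digest = hashlib.sha1(key.encode()).digest()
        bin_no = int.from_bytes(digest[:8], "big") % self.num_bins
        return digest, bin_no * BIN_SIZE

    def put(self, key, value):
        if len(value) > BIN_SIZE - HEADER.size:
            raise ValueError("object too large for one bin in this sketch")
        digest, offset = self._locate(key)
        self.f.seek(offset)
        # Overwrites whatever occupied the bin: a collision evicts the old object.
        self.f.write(HEADER.pack(digest, len(value)) + value)

    def get(self, key):
        digest, offset = self._locate(key)
        self.f.seek(offset)
        raw = self.f.read(BIN_SIZE)
        stored_digest, length = HEADER.unpack_from(raw)
        if stored_digest != digest:
            return None  # empty bin, or bin holds a different key
        return raw[HEADER.size:HEADER.size + length]


path = os.path.join(tempfile.mkdtemp(), "cache.bin")
cache = DiskHashTable(path, num_bins=1024)
cache.put("http://example.org/a", b"hello")
print(cache.get("http://example.org/a"))
print(cache.get("http://example.org/missing"))
```

The trade-off in this simple form is that a miss still costs a disk seek and colliding keys evict each other, which is the kind of cost the "measure, then consider what to add" bullets point at.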

  11. What This Yields • HashCache family • One basic storage engine • Pluggable algorithms & indexing • HashCache proxy • Web proxy using HashCache engine

  12. Performance Comparison

  13. Index Bits Per Object [chart; values shown: 240, 576]

  14. Index Bits Per Object [chart; values shown: 39, 240, 31, 11, 576, 0, 0]

  15. HashCache Memory

  16. Storage Limits w/2GB Index

  17. Beyond Diminishing Returns • HTTP cacheability has upper limit • Beyond that, revalidating items helps • Revalidation on demand, or background • Uncached content still cacheable • Wide-area accelerators • Must still contact servers, though

  18. Why WAN Acceleration? • Lots of slowly-changing data • Wikipedia • News sites • “Customized” sites • WAN acceleration middleboxes • Custom protocol between boxes • Standard protocols to rest of net • Less desirable than caches for Web

  19. WAN Acceleration Dilemma • WAN accelerators use chunks • Transit stream broken into chunks • Small chunks = high compression • Also lots of small objects • Large chunks = high performance • But worse for compression • Memory & disk important
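The chunk-size dilemma can be seen in a toy experiment. The sketch below assumes the receiver already holds an old copy of a stream and counts how many bytes of a slightly changed copy must be resent at two chunk sizes. It uses fixed-size chunks for brevity, which is an assumption; the slides do not say how chunk boundaries are chosen, and the data and sizes are made up.

```python
import hashlib


def chunk_stats(old, new, chunk_size):
    """Return (chunks in new, bytes to send) when the receiver already has old."""
    known = {
        hashlib.sha1(old[i:i + chunk_size]).digest()
        for i in range(0, len(old), chunk_size)
    }
    chunks = [new[i:i + chunk_size] for i in range(0, len(new), chunk_size)]
    to_send = sum(
        len(c) for c in chunks if hashlib.sha1(c).digest() not in known
    )
    return len(chunks), to_send


# Deterministic 64 KiB test stream with a single byte changed.
old = bytes(i % 251 for i in range(65536))
new = bytearray(old)
new[1000] = (new[1000] + 1) % 256
new = bytes(new)

for size in (64, 4096):
    count, to_send = chunk_stats(old, new, size)
    print(f"chunk size {size}: {count} chunks to index, {to_send} bytes resent")
```

Small chunks resend far fewer bytes but leave many more chunks to index and look up, which is why memory and disk behavior matter for this workload.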

  20. Merging WAN Acceleration & HashCache • Easily index huge # chunks • Small chunks OK • Large chunks better • Store chunks redundantly • Optimize for performance & compression • Communicate tradeoffs to cache layer

  21. Deployments • Two cache instances deployed • Both in Africa • Shared machines, multiple services • Working with OLPC on deployment • Working on licensing • Hopefully resolved this year • Goal: all-in-one server for schools

  22. Longer Term Goals • Effort started around server consolidation • Virtualization nice, except for memory • Many apps very page-fault sensitive • Extracting & sharing components desirable • More work in developing regions • Even within the US: poor, rural, etc • Customization for school-like workloads • More work on peak/off-peak behavior
