
Memento : Coordinated In-Memory Caching for Data-Intensive Clusters


Presentation Transcript


  1. Memento: Coordinated In-Memory Caching for Data-Intensive Clusters
  Ganesh Ananthanarayanan, Ali Ghodsi, Andrew Wang, Dhruba Borthakur, Srikanth Kandula, Scott Shenker, Ion Stoica

  2. Data Intensive Computation
  • Data analytic clusters are pervasive
  • Jobs run multiple tasks in parallel
  • Jobs operate on petabytes of input
  • Distributed file systems (DFS) store data distributed and replicated
  • Data reads are either disk-local or remote across the network

  3. Access to disk is slow; memory is orders of magnitude faster. How do we leverage memory storage for datacenter jobs?

  4. Can we store all data in memory?
  • Machines have tens of gigabytes of memory
  • But there is a huge discrepancy between storage and memory capacities
  • The Facebook cluster has ~200x more data on disk than memory
  • Conclusion: use memory as a cache

  5. Will the data fit in cache?
  • Job sizes are heavy-tailed: the smallest 96% of jobs can fit in the memory cache
  • The smallest jobs account for >80% of all jobs but only ~10% of the total input

  6. Elephants and mice
  • Mix of a few “large” jobs and very many “small” jobs
  • Large jobs: batch operations, production jobs
  • Small jobs: interactive queries (e.g., Hive, SCOPE), experimental analytics

  7. Challenge: Small Parallel Jobs
  • A job finishes only when its last task finishes
  • So caching must be all-or-nothing: a job speeds up only if the inputs of all its parallel tasks are cached

  8. In summary…
  • The only option for memory-locality is caching
  • 96% of jobs can have their data in memory, if we cache it right

  9. Outline
  • FATE: Cache Replacement
  • Memento: System Architecture
  • Evaluation

  10. We care about jobs finishing faster…
  • A job j that completes in tn time normally takes tm time with memory caching
  • %Reduction_j = (tn − tm) / tn × 100
  • Goal: maximize the average % reduction in completion time across all jobs
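
A minimal sketch of the metric on this slide. The variable names (t_normal, t_memory) are illustrative, not from the talk.

```python
def pct_reduction(t_normal: float, t_memory: float) -> float:
    """%Reduction for one job: how much faster it ran with caching."""
    return (t_normal - t_memory) / t_normal * 100.0

def avg_pct_reduction(times: list[tuple[float, float]]) -> float:
    """Average % reduction in completion time over all jobs."""
    return sum(pct_reduction(tn, tm) for tn, tm in times) / len(times)

# Example: one job halves its runtime, another is unchanged -> 25% average.
print(avg_pct_reduction([(100.0, 50.0), (100.0, 100.0)]))  # 25.0
```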

  11. Traditional Cache Replacement
  • Traditional cache replacement policies (e.g., LRU, LFU) optimize for hit-ratio
  • Belady’s MIN: evict the block whose next access is “farthest in the future”

  12. Belady’s MIN Example
  [Figure: an access sequence (E, F, B, D, C, A over time) replayed under MIN, achieving a 50% cache hit-ratio.]
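
A small simulator for Belady's MIN, which assumes full knowledge of the future access sequence. The function name, sample sequence, and cache size below are illustrative, not the slide's exact example.

```python
def min_hit_ratio(accesses: list[str], cache_size: int) -> float:
    """Replay an access trace under Belady's MIN and return the hit ratio."""
    cache: set[str] = set()
    hits = 0
    for i, block in enumerate(accesses):
        if block in cache:
            hits += 1
            continue
        if len(cache) >= cache_size:
            # Evict the cached block whose next use is farthest in the
            # future (or that is never used again).
            def next_use(b: str) -> int:
                future = accesses[i + 1:]
                return future.index(b) if b in future else len(accesses)
            cache.remove(max(cache, key=next_use))
        cache.add(block)
    return hits / len(accesses)

print(min_hit_ratio(["A", "B", "C", "A", "B", "D", "A", "D"], 2))
```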

  13. MIN: How much do jobs benefit?
  • Setup: job J1 reads blocks A and B; job J2 reads blocks C and D; 4 computation slots
  • Memory-local tasks are 10x (or 90%) faster
  • Under MIN, each job gets one cached block and one uncached block, so each job still waits on a disk-bound task
  • % reduction per job: 0% and 0%; average = (0 + 0) / 2 = 0%

  14. “Whole-job” inputs
  • Idea: cache inputs at job granularity, e.g., both of J1’s blocks (A and B) rather than spreading hits across jobs
  • On the same access sequence (E, F, B, D, C, A over time), this also achieves a 50% cache hit-ratio

  15. Whole-job caching: How much do jobs benefit?
  • Same setup as before (J1: A, B; J2: C, D; 4 computation slots); memory-local tasks are 10x (or 90%) faster
  • J1’s input is fully cached, so J1 is 90% faster; J2 gets no benefit
  • Average % reduction: (90 + 0) / 2 = 45%, versus 0% for MIN at the same hit-ratio
  • Cache hit-ratio is not the most suited metric for parallel jobs
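
A sketch of the slide's arithmetic. The all-or-nothing rule comes from slide 7 (a job finishes when its last task finishes); the 90% figure and the J1/J2 block layout are from the slides, while the function names are invented.

```python
SPEEDUP = 0.90  # % reduction in a task's runtime when its block is cached

def job_reduction(blocks: set[str], cached: set[str]) -> float:
    """All-or-nothing: the job speeds up only if every input block is cached."""
    return SPEEDUP * 100 if blocks <= cached else 0.0

jobs = {"J1": {"A", "B"}, "J2": {"C", "D"}}

def avg_reduction(cached: set[str]) -> float:
    return sum(job_reduction(b, cached) for b in jobs.values()) / len(jobs)

# Both cache states below hold 2 blocks (same hit-ratio), yet the job-level
# benefit differs drastically:
print(avg_reduction({"A", "C"}))  # MIN-style spread across jobs: 0.0
print(avg_reduction({"A", "B"}))  # whole-job (J1 fully cached): 45.0
```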

  16. FATE Cache Replacement
  • Maximize the number of “whole-job” inputs held in cache
  • Needs global coordination: a job’s parallel tasks are distributed over different machines
  • Property: small jobs get preference; large jobs benefit from the remaining cache space
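
The deck does not give FATE pseudocode, so this is only a hedged sketch of the stated properties: keep whole-job inputs intact and prefer small jobs by evicting from the job with the largest footprint first. All names and the data layout are assumptions.

```python
from typing import Optional

def pick_victim(cached_blocks: dict[str, str],
                job_sizes: dict[str, int]) -> Optional[str]:
    """cached_blocks maps block -> owning job; job_sizes maps job -> #blocks.

    Returns a block to evict, or None if the cache is empty.
    """
    if not cached_blocks:
        return None
    # Evict from the job with the largest input footprint: small jobs'
    # whole-job inputs survive (all-or-nothing), while large multi-wave
    # jobs only degrade linearly when they lose a block.
    victim_job = max(set(cached_blocks.values()), key=lambda j: job_sizes[j])
    return next(b for b, j in cached_blocks.items() if j == victim_job)
```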

  17. Waves in the job
  • Single wave (small jobs): all-or-nothing benefit
  • Multiple waves (large jobs): linear benefits
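
A sketch of the wave model on this slide, under the assumption that "linear benefits" means the speedup scales with the fraction of input cached; the function names and the 90% default are illustrative.

```python
def single_wave_benefit(frac_cached: float, speedup: float = 0.9) -> float:
    """Small job, one wave of tasks: all-or-nothing."""
    return speedup if frac_cached >= 1.0 else 0.0

def multi_wave_benefit(frac_cached: float, speedup: float = 0.9) -> float:
    """Large job, many waves: benefit grows with the cached fraction."""
    return speedup * frac_cached

print(single_wave_benefit(0.5), multi_wave_benefit(0.5))  # 0.0 0.45
```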

  18. Waves in the job
  [Figure: execution timelines contrasting a single-wave job with a multiple-wave job.]

  19. Outline
  • FATE: Cache Replacement
  • Memento: System Architecture
  • Evaluation

  20. Global coordination of local caches
  [Figure: per-machine local caches coordinated by a central service that maintains a global cache view.]

  21. Memento: Salient Features
  • Only metadata flows through the coordinator
  • Runs as an external service, outside the DFS
  • Data is served through local cache reads
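
A hedged sketch of the coordination pattern on these two slides: clients serve reads from their local caches and exchange only metadata with a coordinator that keeps the global view of which machine caches which block. The class and method names are invented for illustration.

```python
class Coordinator:
    """Global cache view: block -> set of machines caching it."""

    def __init__(self) -> None:
        self.block_locations: dict[str, set[str]] = {}

    def report_cached(self, machine: str, block: str) -> None:
        """Metadata-only update from a client that cached a block."""
        self.block_locations.setdefault(block, set()).add(machine)

    def report_evicted(self, machine: str, block: str) -> None:
        self.block_locations.get(block, set()).discard(machine)

    def lookup(self, block: str) -> set[str]:
        """Where could a task read this block memory-locally?"""
        return self.block_locations.get(block, set())

# Actual data never flows through the coordinator; a scheduler would use
# lookup() to place tasks on machines holding their input in memory.
```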

  22. Outline
  • FATE: Cache Replacement
  • Memento: System Architecture
  • Evaluation

  23. Evaluation
  • HDFS running in conjunction with Memento
  • Microsoft and Facebook traces replayed, preserving the original inter-arrival times between jobs (see the sketch after this list)
  • Deployment on an EC2 cluster of 100 machines
  • 20GB of memory for Memento
  • Jobs binned by their size
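
A minimal sketch of the replay methodology: jobs from a trace are submitted with their original inter-arrival gaps preserved. submit_job is a hypothetical stand-in for whatever launches a job on the cluster.

```python
import time

def replay(trace: list[tuple[float, str]], submit_job) -> None:
    """trace: (arrival_timestamp_seconds, job_id) pairs, sorted by time."""
    prev_ts = trace[0][0]
    for ts, job_id in trace:
        time.sleep(ts - prev_ts)  # preserve the original inter-arrival gap
        submit_job(job_id)
        prev_ts = ts
```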

  24. Job Distribution, by bins
  [Figure: distribution of jobs across input-size bins.]

  25. Jobs are 77% faster on average; small jobs see an 85% reduction in completion time

  26. Cache hit-ratio matters less: the average job is 77% faster with FATE vs. 49% with MIN

  27. Memento scales sufficiently
  • The coordinator handles 10,000 simultaneous client communications
  • Each client can handle eight simultaneous local map tasks
  • Sufficient for current datacenter loads

  28. Ongoing / Future work

  29. Simpler Implementation [1]
  • Ride the OS cache rather than managing a separate one
  • Estimate where each block is cached
  • Change the job manager to track block accesses
  • No FATE; rely on the OS’s default replacement (LRU?)
  • Initial results show a 2.3x improvement in cache hit-rate
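
A hedged sketch of this idea: the job manager tracks block reads per machine and, assuming the OS page cache behaves roughly like LRU, guesses that the most recently read blocks are still resident. All names and the fixed-capacity assumption are illustrative.

```python
from collections import OrderedDict

class OSCacheEstimator:
    def __init__(self, blocks_per_machine: int) -> None:
        self.capacity = blocks_per_machine  # assumed page-cache capacity
        self.recent: dict[str, OrderedDict[str, None]] = {}  # machine -> LRU

    def record_read(self, machine: str, block: str) -> None:
        """Job manager hook: called whenever a task reads a block."""
        lru = self.recent.setdefault(machine, OrderedDict())
        lru.pop(block, None)
        lru[block] = None            # most recently used goes to the end
        while len(lru) > self.capacity:
            lru.popitem(last=False)  # assume the OS evicts LRU-style

    def probably_cached(self, machine: str, block: str) -> bool:
        return block in self.recent.get(machine, {})
```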

  30. Alternate Metrics [2]
  • We optimize for the “average % reduction in completion time” of jobs
  • Average: should it be weighted to include job priorities?
  • Other metrics: reduction of load on the disk subsystem? Utilization?
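
A sketch of the priority-weighted variant floated on this slide: weight each job's % reduction by its priority before averaging. The weights below are hypothetical.

```python
def weighted_avg_reduction(reductions: list[float],
                           priorities: list[float]) -> float:
    """Average % reduction, weighted by per-job priority."""
    total = sum(priorities)
    return sum(r * p for r, p in zip(reductions, priorities)) / total

# A high-priority job's speedup counts 3x as much as a low-priority one's:
print(weighted_avg_reduction([90.0, 0.0], [3.0, 1.0]))  # 67.5
```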

  31. Solid State Devices [3]
  • SSDs: a new layer in the storage hierarchy
  • Hierarchical caching: include SSDs between disk and memory
  • What’s the best cache replacement policy across the tiers?
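
A sketch of the hierarchical lookup this slide proposes: check memory, then SSD, then fall back to disk. The read_from_disk parameter is a hypothetical placeholder, and the promotion/insertion choices in the comments are exactly the open policy questions the slide raises.

```python
def read_block(block: str, mem_cache: dict, ssd_cache: dict,
               read_from_disk) -> bytes:
    if block in mem_cache:            # fastest tier
        return mem_cache[block]
    if block in ssd_cache:            # middle tier: promote to memory
        data = ssd_cache[block]
        mem_cache[block] = data
        return data
    data = read_from_disk(block)      # slowest tier
    ssd_cache[block] = data           # whether (and where) to insert on a
    mem_cache[block] = data           # miss is an open replacement-policy
    return data                       # question for each tier
```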

  32. Summary
  • Memory caching can be surprisingly effective…
  • …despite the discrepancy between disk and memory capacities
  • Memento: coordinated cache management
  • FATE replacement policy (“whole-jobs”)
  • Encouraging results for datacenter workloads
