Memento: Coordinated In-Memory Caching for Data-Intensive Clusters Ganesh Ananthanarayanan, Ali Ghodsi, Andrew Wang, Dhruba Borthakur, Srikanth Kandula, Scott Shenker, Ion Stoica
Data Intensive Computation • Data analytic clusters are pervasive • Jobs run multiple tasks in parallel • Jobs operate on petabytes of input • Distributed file systems (DFS) store data distributed and replicated • Data reads are either disk-local or remote across the network
Access to disk is slow • Memory is orders of magnitude faster • How do we leverage memory storage for datacenter jobs?
Can we store all data in memory? • Machines have tens of gigabytes of memory • But there is a huge discrepancy between storage and memory capacities • Facebook's cluster has ~200x more data on disk than memory Use Memory as Cache
Will the data fit in cache? • Job input sizes are heavy-tailed • The smallest 96% of jobs can fit their entire input in the memory cache • Over 80% of jobs together account for only 10% of the total input
Elephants and mice • Mix of a few “large” jobs and very many “small” jobs • Large jobs: • Batch operations • Production jobs • Small jobs: • Interactive queries (e.g., Hive, SCOPE) • Experimental analytics
Challenge: Small Parallel Jobs • Job finishes when its last task finishes • Need to cache all-or-nothing
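The all-or-nothing property can be seen in a minimal sketch (my own illustration, not the authors' code): a parallel job finishes when its slowest task finishes, so caching only some of its inputs buys nothing.

```python
def job_completion_time(task_times):
    """A parallel job's completion time is that of its last (slowest) task."""
    return max(task_times)

# Four parallel tasks; assume 10s from disk, 1s from memory (illustrative numbers).
all_disk = [10, 10, 10, 10]
partial  = [1, 1, 1, 10]   # 3 of 4 inputs cached
full     = [1, 1, 1, 1]    # all 4 inputs cached

assert job_completion_time(all_disk) == 10
assert job_completion_time(partial) == 10   # no speedup despite a 75% hit-ratio
assert job_completion_time(full) == 1       # speedup only once everything is cached
```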
In summary… • Only option for memory-locality is caching • 96% of jobs can have their data in memory, if we cache it right
Outline • FATE: Cache Replacement • Memento: System Architecture • Evaluation
We care about jobs finishing faster… • Job j that completes in tn time normally takes tm time with memory caching • %Reductionj = (tn − tm) / tn × 100 • Metric: average % reduction in completion time across jobs
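The metric above is straightforward to compute; a short sketch (names are mine, not from the deck):

```python
def pct_reduction(t_n, t_m):
    """% reduction in completion time for one job:
    t_n = normal completion time, t_m = time with memory caching."""
    return 100.0 * (t_n - t_m) / t_n

def avg_pct_reduction(pairs):
    """Average % reduction over (t_n, t_m) pairs, one per job."""
    return sum(pct_reduction(tn, tm) for tn, tm in pairs) / len(pairs)

# A job that drops from 100s to 10s sees a 90% reduction;
# averaged with a job that saw no benefit, the workload-level number is 45%.
assert pct_reduction(100, 10) == 90.0
assert avg_pct_reduction([(100, 10), (100, 100)]) == 45.0
```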
Traditional Cache Replacement • Traditional cache replacement policies (e.g., LRU, LFU) optimize for hit-ratio • Belady’s MIN: Evict blocks that are to be accessed “farthest in future”
Belady’s MIN Example • Future access sequence (over time): E, F, B, D, C, A • On each miss, evict the cached block accessed farthest in the future • Result on this workload: 50% cache hit-ratio
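Belady's MIN eviction rule can be sketched in a few lines (a simplified illustration under the assumption that the full future access sequence is known; not an implementation from the talk):

```python
def min_evict(cache, future_accesses):
    """Belady's MIN: evict the cached block whose next access is farthest
    in the future (a block never accessed again is the ideal victim)."""
    def next_use(block):
        try:
            return future_accesses.index(block)
        except ValueError:
            return float('inf')  # never used again
    return max(cache, key=next_use)

# Cache holds A..D; upcoming accesses are B, D, C, A, so A is farthest out.
assert min_evict({'A', 'B', 'C', 'D'}, ['B', 'D', 'C', 'A']) == 'A'
```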
MIN: How much do jobs benefit? • Memory-local tasks are 10x (or 90%) faster • 4 computation slots; job J1 reads blocks A, B and job J2 reads blocks C, D • MIN caches one block of each job, so every job still waits on a disk-bound task • Reduction: Average(0 + 0)/2 = 0%
“Whole-job” inputs • Same access sequence and same 50% cache hit-ratio • But cache both blocks of one job (e.g., J1's A and B) instead of one block from each job
“Whole-job” caching: How much do jobs benefit? • Memory-local tasks are 10x (or 90%) faster • 4 computation slots; with both of J1's blocks cached, J1 sees a 90% reduction and J2 sees 0% • Reduction: Average(90 + 0)/2 = 45% (vs. MIN: Average(0 + 0)/2 = 0%) • Cache hit-ratio is not the right metric for parallel jobs
FATE Cache Replacement • Maximize “whole-job” inputs in cache • Needs global coordination: a job's parallel tasks are distributed over different machines • Property: • Small jobs get preference • Large jobs benefit from the remaining cache space
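A toy sketch of the "whole-job" eviction idea (my own interpretation, with hypothetical names; the deck does not give FATE's exact algorithm): prefer evicting from jobs whose cached input is already partial, since partial inputs are worthless to single-wave jobs, and among those pick the largest job so small jobs get preference.

```python
def fate_victim(cache, job_inputs):
    """Pick a cached block to evict.

    cache: set of cached block ids
    job_inputs: dict mapping job id -> set of its input block ids
    """
    # Jobs with some, but not all, input blocks in cache.
    incomplete = [j for j, blocks in job_inputs.items()
                  if (blocks & cache) and not blocks <= cache]
    # Evict from an already-incomplete job if possible (its cached blocks
    # give no all-or-nothing benefit); among candidates, pick the largest job.
    candidates = incomplete or list(job_inputs)
    victim_job = max(candidates, key=lambda j: len(job_inputs[j]))
    return next(iter(job_inputs[victim_job] & cache))

cache = {'A', 'B', 'C'}                     # J1 fully cached, J2 partially
jobs = {'J1': {'A', 'B'}, 'J2': {'C', 'D'}}
assert fate_victim(cache, jobs) == 'C'      # keep J1's whole-job input intact
```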
Waves in the job • Single wave (small jobs): all-or-nothing • Multiple waves (large jobs): linear benefits
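The wave distinction can be made concrete with a toy model (assumptions mine: tasks run in waves of S slots, cached tasks finish 10x faster, longest tasks scheduled first):

```python
import math

def waves(tasks, slots):
    """Number of waves a job of `tasks` tasks needs on `slots` slots."""
    return math.ceil(tasks / slots)

def completion_time(tasks, slots, cached, disk_t=10.0, mem_t=1.0):
    """Toy wave model: each wave runs up to `slots` tasks; a wave takes as
    long as its slowest task; cached tasks take mem_t, the rest disk_t."""
    times = [mem_t] * cached + [disk_t] * (tasks - cached)
    times.sort(reverse=True)  # schedule the longest tasks first
    wave_groups = [times[i:i + slots] for i in range(0, len(times), slots)]
    return sum(max(w) for w in wave_groups)

# Single-wave job (4 tasks, 4 slots): all-or-nothing.
assert completion_time(4, 4, cached=3) == 10.0   # partial caching: no benefit
assert completion_time(4, 4, cached=4) == 1.0    # full caching: 10x faster
# Multi-wave job (8 tasks, 4 slots): each cached wave-worth helps.
assert completion_time(8, 4, cached=0) == 20.0
assert completion_time(8, 4, cached=4) == 11.0   # incremental, roughly linear
```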
Outline • FATE: Cache Replacement • Memento: System Architecture • Evaluation
Global coordination of local caches • Coordinator maintains a global view of the cache
Memento: Salient Features • Metadata communication • External service • Local cache reads
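The coordinator/client split above might look like the following sketch (hypothetical class and method names, mine; the point is that only metadata about which blocks live where flows through the coordinator, keeping it off the data path):

```python
class Coordinator:
    """Tracks which machines cache which blocks; never touches block data."""

    def __init__(self):
        self.block_locations = {}            # block id -> set of machines

    def register(self, machine, block):
        """A client reports a newly cached block."""
        self.block_locations.setdefault(block, set()).add(machine)

    def evicted(self, machine, block):
        """A client reports that it evicted a block."""
        self.block_locations.get(block, set()).discard(machine)

    def locate(self, block):
        """The job manager asks where a block is cached, to place tasks."""
        return self.block_locations.get(block, set())

coord = Coordinator()
coord.register('m1', 'blockA')
coord.register('m2', 'blockA')
coord.evicted('m1', 'blockA')
assert coord.locate('blockA') == {'m2'}
```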
Outline • FATE: Cache Replacement • Memento: System Architecture • Evaluation
Evaluation • HDFS in conjunction with Memento • Microsoft and Facebook traces replayed • Replay jobs with same inter-arrival time • Deployment on EC2 cluster of 100 machines • 20GB memory for Memento • Jobs binned by their size
Jobs are 77% faster on average • Small jobs see an 85% reduction in completion time
Cache hit-ratio matters less • Average job is faster by 77% with FATE (vs.) 49% with MIN
Memento scales sufficiently • Coordinator handles 10,000 simultaneous client communications • Client can handle eight simultaneous local map tasks • Sufficient for current datacenter loads
Simpler Implementation [1] • Ride the OS cache • Estimate where block is cached • Change job manager to track block accesses • No FATE, use default (LRU?) • Initial results show 2.3x improvement in cache hit-rate
Alternate Metrics [2] • We optimize for “average % reduction in completion time” of jobs • Average • Weighted to include job priorities? • Other metrics • Reduction of load on disk subsystem? • Utilization?
Solid State Devices [3] • SSDs, a new layer in the storage hierarchy • Hierarchical Caching • Include SSDs between disk and memory • What’s the best cache replacement policy?
Summary • Memory caching can be surprisingly effective • …despite the discrepancy between disk and memory capacities • Memento: Coordinated cache management • FATE Replacement Policy (“whole-jobs”) • Encouraging results on datacenter workloads