Reining in the Outliers in MapReduce Jobs using Mantri

Ganesh Ananthanarayanan†, Srikanth Kandula*, Albert Greenberg*, Ion Stoica†, Yi Lu*, Bikas Saha*, Ed Harris* († UC Berkeley, * Microsoft)

Presentation Transcript


  1. Reining in the Outliers in MapReduce Jobs using Mantri Ganesh Ananthanarayanan†, Srikanth Kandula*, Albert Greenberg*, Ion Stoica†, Yi Lu*, Bikas Saha*, Ed Harris* † UC Berkeley * Microsoft

  2. MapReduce Jobs • Basis of analytics in modern Internet services • E.g., Dryad, Hadoop • Job → {Phase} → {Task} • Graph flow consists of pipelines as well as strict blocks

  3. Phase Example [Figure: Dryad job graph that reads from and writes to the distributed file system. Phases: EXTRACT, AGGREGATE_PARTITION, FULL_AGGREGATE, PROCESS, COMBINE. Task labels: Map.1, Map.2, Reduce.1, Reduce.2, Join. Edge types: pipeline, and blocked until input is done.]

  4. Log Analysis from Production • Logs from production cluster with thousands of machines, sampled over six months • 10,000+ jobs, 80PB of data, 4PB network transfers • Task-level details • Production and experimental jobs

  5. Outliers hurt! • Tasks that run longer than the rest in the phase • Median phase has 10% outliers, running for >10x longer • Slow down jobs by 35% at median • Operational inefficiency • Unpredictability in completion times affects SLAs • Hurts development productivity • Wastes compute cycles

  6. Why do outliers occur? [Figure: task stages Read Input and Execute, annotated with causes: Input Unavailable, Network Congestion, Local Contention, Workload Imbalance] • Mantri: a system that mitigates outliers based on root-cause analysis

  7. Mantri’s Outlier Mitigation • Avoid Recomputation • Network-aware Task Placement • Duplicate Outliers • Cognizant of Workload Imbalance

  8. Recomputes: Illustration [Figure: two panels, (a) Barrier phases and (b) Cascading recomputes. Each panel compares the actual and ideal timelines and marks the inflation. Legend: recompute task, normal task.]

  9. What causes recomputes? [1] • Faulty machines • Bad disks, non-persistent hardware quirks Set of faulty machines varies with time, not constant (4%)

  10. What causes recomputes? [2] • Transient machine load • Recomputes correlate with machine load • Requests for data access dropped

  11. Replicate costly outputs • MR: recompute probability of a machine • [Figure: chain Task 1, Task 2, Task 3, with MR2 and MR3 marked] • Recompute only Task 3, or both Task 3 and Task 2: $T_{Recomp} = MR_3\,(1 - MR_2)\,T_3 + MR_3\,MR_2\,(T_3 + T_2)$ • Replication cost: $T_{Rep}$ • Replicate if $T_{Rep} < T_{Recomp}$
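
The replicate-or-recompute comparison on this slide can be sketched in a few lines. This is an illustrative sketch only: the function and parameter names are hypothetical, and it assumes that the first term covers losing only Task 3's output and the second covers losing Task 2's output as well.

```python
def expected_recompute_time(mr3, mr2, t3, t2):
    """Expected recompute cost for Task 3's output (names are hypothetical).

    mr3, mr2: recompute probabilities of the machines holding the outputs
    of Task 3 and Task 2; t3, t2: run times of those tasks.
    """
    only_task3 = mr3 * (1 - mr2) * t3   # Task 3's output lost, Task 2's intact
    both_tasks = mr3 * mr2 * (t3 + t2)  # both outputs lost: rerun Task 2, then Task 3
    return only_task3 + both_tasks


def should_replicate(t_rep, mr3, mr2, t3, t2):
    """Replicate when copying the output costs less than the expected recompute."""
    return t_rep < expected_recompute_time(mr3, mr2, t3, t2)
```

For example, with `mr3=0.1`, `mr2=0.2`, `t3=10` and `t2=20`, the expected recompute time is about 1.4 time units, so a replication cost of 1.0 would favor replicating and a cost of 2.0 would not.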

  12. Transient Failure Causes • Recomputes manifest in clutches • A machine is prone to cause recomputes until the problem is fixed • Load abates, critical process restarts, etc. • Clue: at least r recomputes within a time window t on a machine
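
The "at least r recomputes within t" clue can be sketched as a per-machine sliding window. The class and method names below are hypothetical, and the sketch assumes timestamps arrive in non-decreasing order for each machine; it is not taken from Mantri's implementation.

```python
from collections import defaultdict, deque


class RecomputeMonitor:
    """Flags a machine once it causes at least r recomputes within a window t."""

    def __init__(self, r, t):
        self.r = r
        self.t = t
        self._events = defaultdict(deque)  # machine -> recent recompute timestamps

    def record(self, machine, timestamp):
        """Record a recompute; return True if the machine now looks suspect."""
        events = self._events[machine]
        events.append(timestamp)
        # Drop recomputes that fell outside the window (timestamp - t, timestamp].
        while events and events[0] <= timestamp - self.t:
            events.popleft()
        return len(events) >= self.r
```

A scheduler could, for instance, stop trusting outputs stored on a flagged machine until the window empties again; what to do with the flag is left open here.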

  13. Speculative Recomputes • Anticipatorily recompute tasks whose outputs are unread [Figure labels: Task, Input Data (Read Fail), Speculative Recompute, Unread Data]

  14. Mantri’s Outlier Mitigation • Avoid Recomputation • Preferential Replication + Speculative Recomp. • Network-aware Task Placement • Duplicate Outliers • Cognizant of Workload Imbalance

  15. Reduce Tasks • Tasks access output of tasks from previous phases • Reduce phase (74% of total traffic) [Figure labels: Distr. File System, Local, Map, Network, Reduce, Outlier!]

  16. Variable Congestion [Figure labels: Reduce task, Rack, Map output] • Smart placement smoothens hotspots

  17. Traffic-based Allotment Goal: Minimize phase completion time For every rack: • d : data • u : available uplink bandwidth • v : available downlink bandwidth • Solve for task allocation fractions, ai

  18. Local Control is a good approx. Goal: Minimize phase completion time • Let rack i have ai fraction of tasks • Time uploading, Tu = di (1 - ai) / ui • Time downloading, Td = (D – di) ai / vi • Timei = max {Tu, Td} For every rack: • d : data, D: data over all racks • u : available uplink bandwidth • v : available downlink bandwidth Link utilizations average out in long term, are steady on the short term
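
The per-rack formulas above translate directly into code. The sketch below only evaluates a given allocation; it does not solve for the optimal fractions, and the function names are hypothetical.

```python
def rack_time(d_i, total_d, a_i, u_i, v_i):
    """Transfer time for rack i when it runs fraction a_i of the phase's tasks."""
    t_up = d_i * (1 - a_i) / u_i          # data leaving the rack over its uplink
    t_down = (total_d - d_i) * a_i / v_i  # data entering the rack over its downlink
    return max(t_up, t_down)


def phase_time(data, uplink, downlink, fractions):
    """Phase completion time: the slowest rack under the given allocation."""
    total_d = sum(data)
    return max(
        rack_time(d, total_d, a, u, v)
        for d, a, u, v in zip(data, fractions, uplink, downlink)
    )
```

As a made-up example, take two racks holding 80 and 20 units of data with uplink and downlink bandwidth of 10 each. An even split (`[0.5, 0.5]`) gives a phase time of 4, while placing 80% of the tasks on the data-heavy rack (`[0.8, 0.2]`) brings it to about 1.6.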

  19. Mantri’s Outlier Mitigation • Avoid Recomputation • Preferential Replication + Speculative Recomp. • Network-aware Task Placement • Traffic on link proportional to bandwidth • Duplicate Outliers • Cognizant of Workload Imbalance

  20. Contentions cause outliers • Tasks contend for local resources • Processor, memory etc. • Duplicate tasks elsewhere in the cluster • Current schemes duplicate towards end of the phase (e.g., LATE [OSDI 2008]) • Duplicate outlier or schedule pending task?

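  21. Resource-Aware Restart • Save time and resources: $P(c\,t_{new} < (c + 1)\,t_{rem})$ [Figure: timeline starting at the current time, showing the running task's remaining time $t_{rem}$ and a potential restart of duration $t_{new}$] • Continuously observe and kill wasteful copies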
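
One way to read the restart condition is as a probability estimated from samples of the two durations. The sketch below is an assumption-laden illustration: it takes c to be the number of copies of the task currently running (the slide does not define it), estimates the probability over all pairs of non-empty sample lists, and uses a caller-supplied `threshold` that is purely hypothetical.

```python
from itertools import product


def restart_benefit_probability(t_rem_samples, t_new_samples, c):
    """Empirical P(c * t_new < (c + 1) * t_rem) over all sample pairs."""
    pairs = list(product(t_rem_samples, t_new_samples))
    hits = sum(1 for t_rem, t_new in pairs if c * t_new < (c + 1) * t_rem)
    return hits / len(pairs)


def should_duplicate(t_rem_samples, t_new_samples, c, threshold):
    """Duplicate only if the estimated probability exceeds the chosen threshold."""
    return restart_benefit_probability(t_rem_samples, t_new_samples, c) > threshold
```

With one running copy (`c=1`), a new copy is favored only when it is likely to finish in less than twice the remaining time of the current one; as c grows, the condition tightens toward `t_new < t_rem`.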

  22. Mantri’s Outlier Mitigation • Avoid Recomputation • Preferential Replication + Speculative Recomp. • Network-aware Task Placement • Traffic on link proportional to bandwidth • Duplicate Outliers • Resource-Aware Restart • Cognizant of Workload Imbalance

  23. Workload Imbalance • A quarter of the outlier tasks have more data to process • Unequal key partitions for reduce tasks • Ignoring these is better than duplicating them • Schedule tasks in descending order of data to process • Time ∝ (Data to Process) • [Graham ‘69] At worst, within 33% of optimal
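
Scheduling in descending order of data to process is the classic longest-first list-scheduling heuristic. A minimal sketch, assuming task time is proportional to input size and that every slot is identical; the function name and return shape are hypothetical.

```python
import heapq


def schedule_longest_first(task_sizes, num_slots):
    """Assign tasks to slots in descending size order.

    Returns (makespan, assignment), where assignment maps slot -> task sizes.
    """
    # Each heap entry is (load so far, slot index); the least-loaded slot pops first.
    slots = [(0, slot) for slot in range(num_slots)]
    heapq.heapify(slots)
    assignment = {slot: [] for slot in range(num_slots)}
    for size in sorted(task_sizes, reverse=True):
        load, slot = heapq.heappop(slots)
        assignment[slot].append(size)
        heapq.heappush(slots, (load + size, slot))
    makespan = max(load for load, _ in slots)
    return makespan, assignment
```

For task sizes `[7, 5, 4, 3, 1]` on two slots, the largest task is placed first and the remaining ones fill in around it, giving loads of 10 and 10, which happens to be optimal for this input.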

  24. Mantri’s Outlier Mitigation • Avoid Recomputation • Preferential Replication + Speculative Recomp. • Network-aware Task Placement • Traffic on link proportional to bandwidth • Duplicate Outliers • Resource-Aware Restart • Cognizant of Workload Imbalance • Schedule in descending order of size • Predict to act early • Be resource-aware • Act based on the cause [Figure labels: Reactive, Proactive]

  25. Results • Deployed in production Bing clusters • Trace-driven simulations • Mimic workflow, failures, data skew • Compare with existing and idealized schemes

  26. Jobs in the Wild • Act Early: Duplicates issued when task 42% done (77% for Dryad) • Light: Issues fewer copies (0.47x as many as Dryad) • Accurate: 2.8x higher success rate of copies • Jobs faster by 32% at median, consuming fewer resources

  27. Recomputation Avoidance • Eliminates most recomputes with minimal extra resources • Replication and speculation work well in tandem

  28. Network-Aware Placement [Figure: bandwidth approximations] • Mantri closely approximates the ideal

  29. Summary • From measurements in a production cluster, • Outliers are a significant problem • Are due to an interplay between storage, network and map-reduce • Mantri, a cause-, resource-aware mitigation • Deployment shows encouraging results • “Reining in the Outliers in MapReduce Clusters using Mantri”, USENIX OSDI 2010
