
MROrder: Flexible Job Ordering Optimization for Online MapReduce Workloads

This study focuses on job ordering optimization for online MapReduce workloads under the FIFO scheduler. It considers different performance metrics, such as makespan and total completion time. MROrder is a prototype system that significantly improves the performance of MapReduce workloads.


Presentation Transcript


  1. MROrder: Flexible Job Ordering Optimization for Online MapReduce Workloads • Shanjiang Tang, Bu-Sung Lee, Bingsheng He • School of Computer Engineering, Nanyang Technological University • 30th Aug 2013

  2. Outline • Background & Motivations • MROrder • Evaluation • Conclusion

  3. MapReduce Computation Model [Figure: in the map-phase computation, map tasks read the input data and produce intermediate results; in the reduce-phase computation, reduce tasks turn the intermediate results into output results, which together form the final result.]

  4. Hadoop Execution Model • Hadoop is an open-source implementation of the MapReduce model. • The cluster's computation resources are divided into map slots and reduce slots, which are configured by the Hadoop administrator in advance. • A MapReduce job generally consists of map tasks and reduce tasks. • Map tasks must be allocated map slots, and reduce tasks must be allocated reduce slots.

  5. Hadoop Execution Model • Map tasks can only run on map slots; reduce tasks can only run on reduce slots. • Map tasks start before reduce tasks. [Figure: map slots and reduce slots]

  6. Job Order VS Performance • Implication: different job orders have a significant impact on performance results! [Figure: two timelines, each showing the map phase and reduce phase of the jobs under a different job order]
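
The effect of job order can be illustrated with a minimal Python sketch. It is not taken from the slides: it assumes a simplified two-stage model in which each job's map phase occupies all map slots and its reduce phase occupies all reduce slots, and the job names and durations are hypothetical.

```python
from typing import List, Tuple

Job = Tuple[str, float, float]  # (name, map-phase time, reduce-phase time)


def completion_times(order: List[Job]) -> List[float]:
    """Completion time of each job when the jobs run in FIFO order.

    Simplifying assumption (not from the slides): the cluster behaves like
    a two-stage pipeline, map slots first and reduce slots second.
    """
    map_free = 0.0     # time at which the map slots become free
    reduce_free = 0.0  # time at which the reduce slots become free
    finished = []
    for _name, map_time, reduce_time in order:
        map_free += map_time
        # A reduce phase waits for its own map phase and for the
        # previous job's reduce phase.
        reduce_free = max(reduce_free, map_free) + reduce_time
        finished.append(reduce_free)
    return finished


# Hypothetical jobs: J1 is map-heavy, J2 is reduce-heavy.
jobs = [("J1", 4.0, 1.0), ("J2", 1.0, 4.0)]
for order in (jobs, jobs[::-1]):
    times = completion_times(order)
    print([name for name, _, _ in order],
          "makespan:", max(times), "total completion time:", sum(times))
```

Under these assumptions, running J2 first lowers the makespan from 9 to 6 and the total completion time from 14 to 11, because J2's long reduce phase overlaps with J1's long map phase.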

  7. Our Goals • Job ordering optimization is a non-trivial approach to improving the performance of MapReduce workloads (i.e., batches of MapReduce jobs). • Our work focuses on job ordering optimization for online MapReduce workloads under the FIFO scheduler, where jobs arrive over time. • Different performance metrics are considered, e.g., makespan and total completion time.

  8. Outline • Background & Motivations • MROrder • Evaluation • Conclusion

  9. Architecture Overview of MROrder

  10. Policy Module • Determines when and how to perform job ordering optimization for MapReduce jobs. • We provide two alternative solutions for determining when to perform job ordering optimization: • PNJ-Dominated Solution. Performs job ordering when the number of jobs in the queue reaches a threshold. • TP-Dominated Solution. Invokes job ordering periodically, after each time interval. Notes: PNJ: policy based on the number of jobs. TP: time-based policy.
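
The two triggers can be sketched in Python as follows. This is a minimal illustration, not MROrder's implementation: the class names, the `should_reorder` method, and the parameter values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PNJPolicy:
    """Count-based trigger: reorder once enough jobs are queued."""
    threshold: int  # hypothetical parameter name

    def should_reorder(self, queued_jobs: int, now: float) -> bool:
        return queued_jobs >= self.threshold


@dataclass
class TPPolicy:
    """Time-based trigger: reorder once per elapsed interval."""
    interval: float        # seconds between two ordering passes
    last_run: float = 0.0  # time of the most recent ordering pass

    def should_reorder(self, queued_jobs: int, now: float) -> bool:
        if now - self.last_run >= self.interval:
            self.last_run = now
            return True
        return False


pnj = PNJPolicy(threshold=3)
tp = TPPolicy(interval=10.0)
print(pnj.should_reorder(queued_jobs=2, now=0.0))  # False: queue too short
print(pnj.should_reorder(queued_jobs=3, now=0.0))  # True: threshold reached
print(tp.should_reorder(queued_jobs=1, now=5.0))   # False: interval not elapsed
print(tp.should_reorder(queued_jobs=1, now=10.0))  # True: interval elapsed
```

Giving both policies the same method signature lets a scheduler loop swap one for the other without further changes.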

  11. Policy Module • TP-Dominated Solution: • TP-Dominated Solution with Fixed Time Interval (TP-FTI). Performs job ordering periodically with a fixed time interval. • TP-Dominated Solution with Adaptive Time Interval (TP-ATI). Performs job ordering dynamically with an adaptive time interval, based on the estimated running time of the workloads.
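
The difference between the two variants can be sketched as two ways of choosing the next interval. The adaptive rule below is only an assumption made for illustration (the interval follows the summed estimated running time of the queued jobs, clamped to a range); this transcript does not preserve the rule MROrder actually uses, and all names and default values are hypothetical.

```python
from typing import Sequence


def fixed_interval(estimated_runtimes: Sequence[float],
                   interval: float = 60.0) -> float:
    """TP-FTI style: the interval ignores the workload estimate."""
    return interval


def adaptive_interval(estimated_runtimes: Sequence[float],
                      floor: float = 10.0,
                      ceiling: float = 300.0) -> float:
    """TP-ATI style (assumed rule): follow the estimated running time
    of the queued jobs, clamped to [floor, ceiling]."""
    total = sum(estimated_runtimes)
    return min(max(total, floor), ceiling)


print(fixed_interval([5.0, 20.0]))     # 60.0 regardless of the estimate
print(adaptive_interval([5.0, 20.0]))  # 25.0, the summed estimate
print(adaptive_interval([1.0]))        # 10.0, clamped to the floor
print(adaptive_interval([500.0]))      # 300.0, clamped to the ceiling
```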

  12. TP-FTI

  13. TP-ATI

  14. Ordering Engine • Responsible for performing job ordering optimization. • Two types of job ordering approaches: • Simulation-based Ordering Approach (SIM). We developed a Hadoop simulator, HSim, to search for the optimal job order. It is a brute-force method. • Algorithm-based Ordering Approach (ALG). We provide efficient heuristic job ordering algorithms for different performance metrics, e.g., makespan and total completion time.
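
A brute-force search of this kind can be sketched in Python. The sketch does not use HSim: it substitutes a simplified two-stage cost model (each job's map phase uses all map slots, its reduce phase all reduce slots), and the function names and job durations are hypothetical.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

Job = Tuple[float, float]  # (map-phase time, reduce-phase time)


def completion_times(order: Sequence[Job]) -> List[float]:
    """Per-job completion times under FIFO in a two-stage pipeline model."""
    map_free = reduce_free = 0.0
    finished = []
    for map_time, reduce_time in order:
        map_free += map_time
        reduce_free = max(reduce_free, map_free) + reduce_time
        finished.append(reduce_free)
    return finished


def brute_force_order(jobs: Sequence[Job],
                      metric: Callable[[List[float]], float]) -> Tuple[Job, ...]:
    """Evaluate every permutation and keep the one with the lowest metric."""
    return min(permutations(jobs),
               key=lambda order: metric(completion_times(order)))


jobs = [(4.0, 1.0), (1.0, 4.0), (2.0, 2.0)]
best_makespan = brute_force_order(jobs, max)  # metric: makespan
best_total = brute_force_order(jobs, sum)     # metric: total completion time
print(best_makespan, max(completion_times(best_makespan)))
print(best_total, sum(completion_times(best_total)))
```

The search evaluates n! orders for n jobs, which is why an exhaustive approach becomes expensive as the queue grows and heuristic algorithms are attractive.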

  15. ALG for Makespan
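
The algorithm shown on this slide is not preserved in the transcript. As an illustration of what a makespan heuristic can look like, the sketch below applies Johnson's rule, a classic result for two-stage flow shops, under the assumption that each job's map phase uses all map slots and its reduce phase all reduce slots. It should not be read as MROrder's algorithm, and the job durations are hypothetical.

```python
from typing import List, Sequence, Tuple

Job = Tuple[float, float]  # (map-phase time, reduce-phase time)


def makespan(order: Sequence[Job]) -> float:
    """Finish time of the last job in a two-stage pipeline model."""
    map_free = reduce_free = 0.0
    for map_time, reduce_time in order:
        map_free += map_time
        reduce_free = max(reduce_free, map_free) + reduce_time
    return reduce_free


def johnson_order(jobs: Sequence[Job]) -> List[Job]:
    """Johnson's rule: jobs whose map phase is no longer than their reduce
    phase go first, by increasing map time; the remaining jobs go last,
    by decreasing reduce time."""
    front = sorted((j for j in jobs if j[0] <= j[1]), key=lambda j: j[0])
    back = sorted((j for j in jobs if j[0] > j[1]),
                  key=lambda j: j[1], reverse=True)
    return front + back


jobs = [(4.0, 1.0), (1.0, 4.0), (2.0, 2.0)]
print("arrival order:", makespan(jobs))                 # 11.0
print("Johnson order:", makespan(johnson_order(jobs)))  # 8.0
```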

  16. ALG for Total Completion Time
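
The algorithm shown on this slide is not preserved in the transcript either. One common heuristic for total completion time is to run the shortest jobs first; the sketch below orders jobs by the sum of their map and reduce times under the same simplified two-stage model. It is an assumed stand-in rather than MROrder's algorithm, it is not guaranteed to be optimal, and the job durations are hypothetical.

```python
from typing import List, Sequence, Tuple

Job = Tuple[float, float]  # (map-phase time, reduce-phase time)


def total_completion_time(order: Sequence[Job]) -> float:
    """Sum of per-job completion times in a two-stage pipeline model."""
    map_free = reduce_free = total = 0.0
    for map_time, reduce_time in order:
        map_free += map_time
        reduce_free = max(reduce_free, map_free) + reduce_time
        total += reduce_free
    return total


def shortest_first(jobs: Sequence[Job]) -> List[Job]:
    """Order jobs by total work (map time + reduce time), smallest first."""
    return sorted(jobs, key=lambda j: j[0] + j[1])


jobs = [(5.0, 5.0), (1.0, 1.0), (3.0, 2.0)]
print("arrival order: ", total_completion_time(jobs))                  # 34.0
print("shortest first:", total_completion_time(shortest_first(jobs)))  # 22.0
```

Placing the large job last keeps the two small jobs from waiting behind it, which is where the reduction comes from.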

  17. Outline • Background & Motivations • MROrder • Evaluation • Conclusion

  18. Experiment Setup • Environments • A Hadoop cluster consisting of 10 nodes, each with two Intel X5675 CPUs, 24GB memory and 56GB hard disks. • Workloads • Synthetic Facebook Workload. We generated it based on previous related work. Most of its jobs are small, so we use it to evaluate the total completion time. • Tested Workload. Most of its jobs are large, so we use it to evaluate the makespan.

  19. TP-FTI VS TP-ATI • TP-ATI is smarter and works better than TP-FTI! • Δt: the suitable threshold of time period for time-based policy. • PITCT: performance improvement of total completion time.

  20. ALG VS SIM • SIM performs better than ALG, but consumes more time, especially when the number of jobs is large.

  21. Performance Improvement by MROrder (Simulation Result) • Total completion time is sensitive to workloads dominated by small jobs!

  22. Performance Improvement by MROrder (Real Experiment Result) • Makespan is sensitive to workloads dominated by large jobs!

  23. Outline • Background & Motivations • MROrder • Evaluation • Conclusion

  24. Conclusion • Job ordering optimization is a non-trivial method for improving slot resource utilization and the performance of MapReduce workloads. • MROrder is a prototype system for online MapReduce workloads that is flexible with respect to the performance metric. • Experimental results show that MROrder improves the performance of MapReduce workloads significantly. • The source code of MROrder is available at: http://sourceforge.net/projects/mrorder/

  25. Ongoing and Future Work • Integrating MROrder into the Hadoop system. • Considering performance improvement for other schedulers, e.g., the Hadoop Fair Scheduler and the Capacity Scheduler. • Exploring alternative approaches to improving the cluster utilization and performance of MapReduce workloads.

  26. Acknowledgement • This work is supported by the "User and Domain driven data analytics as a Service framework" project under the A*STAR Thematic Strategic Research Programme (SERC Grant No. 1021580034).

  27. Thank You! Questions?

  28. Accuracy Evaluation of HSim

  29. Impact of Inaccuracy in Estimated Map/Reduce Tasks Time
