A Performance Study of Grid Workflow Engines. Corina Stratan Parallel and Distributed Systems Group Politehnica University of Bucharest Romania. Alexandru Iosup and Dick Epema PDS Group Delft University of Technology The Netherlands. IEEE/ACM Grid 2008, Tsukuba, JP.
Why are Grid Workflows Interesting? • Grids promise reliable and easy-to-use computational infrastructure for e-Science • Full automation from experiment design to final result • Often, automation = workflows • Jobs comprising inter-related computing and data-transfer tasks
Why is the Performance of Real Grid Workflow Engines Interesting? • For our users • Is this system suitable for its users? • Are other systems better? • For focusing on the right research problems • What are the interesting problems? System configuration? Which workflow characteristics? Other problems… • For simulation studies • Unrealistic assumptions limit the applicability of results. How scalable are Grid Workflow Engines (GWFEs)? What overheads do they have?
Problem: How to Assess the Performance of Grid Workflow Engines? • What do we want to assess? • Is testing in real environments appropriate? • What performance metrics are important? • What workflows to use? Our goal is to develop and validate a methodology for assessing GWFEs.
Outline • Introduction • Methodology for Testing GWFEs • The Methodology in Practice • Conclusion and Future Work
2. Methodology for Testing GWFEs: What to Assess? • Traditional: raw performance metrics • Runtime, wait time, etc. • In addition, for Grids (failure-prone, complex environments): • Overhead: What is the cost of using a GWFE? • Stability: Does the system behave consistently? • Scalability: Does the system support grid-size workloads? • Reliability: What is the impact of dynamic resource availability?
2. Methodology for Testing GWFEs: Is Testing in Real Environments Appropriate? • Our approach (novel): Testing complete grid middleware stacks in real grid environments. • Alternatives • Simulation [Ahmad & Kwok, JPDC’99] • Mathematical analysis • Testing GWFEs in isolation (think unit vs. integration testing)
2. Methodology for Testing GWFEs: What Performance Metrics are Important? • Overhead components: Oi, Oa, Os, Ost, Of • Raw performance: Makespan (MS), Speed-Up vs. Single/Infinite Machine, … • Stability: internal (MS IQR/Median), overall (MS Range/Median) • Scalability, Reliability [see article]. [Figure: Workflow Tasks, Grid Workflow Engine, Grid Resource Manager]
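The speed-up metrics named on this slide can be illustrated with a small sketch. The slide does not give the formulas, so the definitions here are assumptions: the single-machine reference is taken as the sum of all task runtimes, the infinite-machine reference as the runtime-weighted critical path of the workflow graph, and each speed-up as the reference divided by the measured makespan. The function names and the input layout are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def critical_path(runtimes, deps):
    """Length of the longest runtime-weighted path through the workflow DAG.

    runtimes: {task: runtime in seconds}
    deps:     {task: set of predecessor tasks}
    """
    graph = {task: deps.get(task, ()) for task in runtimes}
    finish = {}
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in deps.get(task, ())), default=0.0)
        finish[task] = start + runtimes[task]
    return max(finish.values(), default=0.0)


def speedups(makespan, runtimes, deps):
    """Return (speed-up vs. single machine, speed-up vs. infinite machines).

    Assumed definitions (not taken from the slides):
      single machine    = sum of task runtimes / measured makespan
      infinite machines = critical path length / measured makespan
    """
    single = sum(runtimes.values())
    infinite = critical_path(runtimes, deps)
    return single / makespan, infinite / makespan


# Diamond-shaped workflow: a -> (b, c) -> d
runtimes = {"a": 10.0, "b": 20.0, "c": 30.0, "d": 10.0}
deps = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
su_single, su_infinite = speedups(100.0, runtimes, deps)
```

Under these assumptions, a value below 1 against the single-machine reference means the engine's overhead outweighs the parallelism it extracts, which is plausible for workflows made of very short tasks.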
2. Methodology for Testing GWFEs: What Workflows to Use? [Figure axes: Number of graph nodes, Graph traversal height] • No accepted workload; no real system traces. • Sources: related simulation work, Standard Task Graph Set, our investigation of test workflows from 2 long-term grid traces [CG Symp.’08], our model of grid bags-of-tasks validated with 7 long-term grid traces [HPDC’08].
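The two workflow characteristics shown on this slide, number of graph nodes and graph traversal height, can be computed from a dependency map as in the sketch below. The slide does not define "traversal height"; it is assumed here to be the number of levels on the longest entry-to-exit path. The function name and input layout are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def workflow_shape(deps):
    """Return (number of nodes, height) of a workflow DAG.

    deps: {task: set of predecessor tasks}. Tasks that appear only as
    predecessors are counted as nodes too.
    Height is assumed to mean the number of levels on the longest path.
    """
    nodes = set(deps)
    for preds in deps.values():
        nodes.update(preds)

    level = {}
    # Each task sits one level below its deepest predecessor.
    for task in TopologicalSorter(deps).static_order():
        level[task] = 1 + max((level[p] for p in deps.get(task, ())), default=0)
    return len(nodes), max(level.values(), default=0)


chain = {"b": {"a"}, "c": {"b"}}                    # a -> b -> c
bag = {"t1": set(), "t2": set(), "t3": set()}       # independent tasks
diamond = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}

chain_shape = workflow_shape(chain)
bag_shape = workflow_shape(bag)
diamond_shape = workflow_shape(diamond)
```

A bag-of-tasks has height 1 regardless of its size, while a chain has height equal to its node count, so the pair of values separates wide workflows from deep ones.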
3. The Methodology in Practice (Selected Results): Experimental Setup • Testing complete grid middleware stacks • Generic GWFE: a baseline GWFE implementation • 15 PCs, 2xP4@3.2GHz, 2GB RAM, 1Gbps Ethernet • Tools: MonALISA, ServMark = DiPerF + GrenchMark.
3. The Methodology in Practice (Selected Results): Overhead: Impact of Workload (WL) Size and Type • Setup: DAGMan, empty jobs, C-4 (left) / many (right). • Oi >> Ost = Of. The internal state update is very important. • S-1, S-3: many frequent updates lower system throughput.
3. The Methodology in Practice (Selected Results): Raw Perf.: Performance vs. Consumption • Karajan performs better than DAGMan, but quickly runs out of resources. [Figure panels: Karajan, DAGMan]
3. The Methodology in Practice (Selected Results): Stability: Internal and Overall Stability • Setup: DAGMan, 10 independent runs, C-4, 10 WFs. • System is: • Internally stable • Overall not stable • Need to react to system dynamics to favor under-served workflows.
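The two stability ratios named in the metrics slide (MS IQR/Median and MS Range/Median) can be computed from a sample of makespans as sketched below. Which makespans go into each ratio (per workflow, per run, or pooled) is not stated in the slides, so the function simply takes one list of values; the sample numbers are invented for illustration, and the quartile method is an assumption.

```python
import statistics


def stability(makespans):
    """Return both stability ratios for a list of makespans (seconds).

    Quartiles use the 'inclusive' method of statistics.quantiles; the
    slides do not say which quartile definition was used.
    """
    q1, median, q3 = statistics.quantiles(makespans, n=4, method="inclusive")
    return {
        "iqr_over_median": (q3 - q1) / median,
        "range_over_median": (max(makespans) - min(makespans)) / median,
    }


# Illustrative values only, not measurements from the study.
sample = [100.0, 110.0, 120.0, 130.0, 140.0]
ratios = stability(sample)
```

The IQR-based ratio ignores the extremes, while the range-based ratio is driven by them, so a system can look stable under the first and unstable under the second, as reported on this slide.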
Conclusion and Future Work • Methodology for testing Grid Workflow Engines • Goals • Metrics • Workflows • Testing grid middleware stacks, not GWFEs in isolation! • Analysis of two widely used GWFEs vs. a baseline GWFE • Future work • Apply the methodology to more middleware stacks, in more environments • Design domain-specific workloads and assess the performance impact of the inter-domain differences (do different domains raise different challenges?)
Thank you! Questions? Remarks? Observations? • Contact: A.Iosup@gmail.com [google “Iosup”] • Web site: http://www.pds.ewi.tudelft.nl (PDS group articles & software) • Have (workflow-based) grid traces? Help build our community’s Grid Workloads Archive: http://gwa.ewi.tudelft.nl • Additional References • [HPDC’08] A. Iosup, O. Sonmez, S. Anoep, and D.H.J. Epema, The Performance of Bags-of-Tasks in Large-Scale Distributed Computing Systems, in IEEE HPDC’08, 2008. • [CG Symp.’08] S. Ostermann, R. Prodan, T. Fahringer, and A. Iosup, On the Characteristics of Grid Workflows, in CoreGRID Symp., 2008.