This paper presents the DGSim framework, a simulation tool for comparing grid resource management architectures. It discusses the challenges in writing grid simulators and the features of DGSim, including workload generation, inter-operation architectures, resource dynamics, and validation examples.
DGSim: Comparing Grid Resource Management Architectures Through Trace-Based Simulation • Alexandru Iosup, Ozan Sonmez, and Dick Epema • PDS Group, Delft University of Technology, The Netherlands
A Grid Research Toolbox • Hypothesis: (a) is better than (b). For scenario 1, … [Figure: toolbox diagram; labels: 1, 2, 3, DGSim]
The Problem with Grid Simulations • Three decades of writing simulators in computer science → writing the simulator is not the problem • The problem: getting from solution design to experimental results with an automated simulation tool • Experimental setup • Tool to generate realistic experimental setups • Experiment support for grid resource management • Tool to manage large numbers of related simulations • Performance • Not the simulation time (decades of optimizations there) • Tool proven to work with large simulations (number of resources, workload size, etc.)
Outline • Problem Statement • The DGSim Framework • DGSim Validation • DGSim Examples • Future Work
2. The DGSim Framework: Name, Goal, and Challenges • DGSim = Delft Grid Simulator • Simulate various grid resource management architectures • Multi-cluster grids • Grids of grids (THE grid) • Challenges • Many types of architectures • Generating and replaying grid workloads • Management of the simulations • Many repetitions of a simulation for statistical relevance • Simulations with many parameters • Managing results (e.g., analysis tools) • Enabling collaborative experiments [Figure: two GRM architectures]
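The simulation-management challenge above (many repetitions, many parameters) can be sketched as a small parameter sweep. This is only an illustration, not DGSim code: `run_simulation`, the architecture names, the load values, and the returned metric are all hypothetical placeholders.

```python
import itertools
import random
import statistics


def run_simulation(architecture, load, seed):
    """Hypothetical stand-in for one simulation run.

    Returns a synthetic metric (load * 100 plus seeded noise) so that the
    sweep logic can be exercised; a real run would invoke the simulator.
    """
    rng = random.Random(f"{architecture}-{load}-{seed}")
    return load * 100.0 + rng.uniform(-5.0, 5.0)


def sweep(architectures, loads, repetitions):
    """Run every (architecture, load) design point `repetitions` times."""
    results = {}
    for arch, load in itertools.product(architectures, loads):
        samples = [run_simulation(arch, load, seed) for seed in range(repetitions)]
        # Keep mean and sample standard deviation per design point.
        results[(arch, load)] = (statistics.mean(samples), statistics.stdev(samples))
    return results


results = sweep(["centralized", "decentralized"], [0.5, 0.9], repetitions=10)
for point, (mean, stdev) in sorted(results.items()):
    print(point, round(mean, 2), round(stdev, 2))
```

Using the repetition index as the random seed keeps each run reproducible while still giving independent samples per design point.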
2. The DGSim Framework: Overview [Figure: framework overview; label: Discrete-Event Simulator]
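For readers unfamiliar with the term, a discrete-event simulator advances a virtual clock from one scheduled event to the next instead of stepping through time continuously. The following is a minimal generic sketch of such an event loop; the `EventQueue` class and the event labels are illustrative assumptions and do not reflect DGSim's actual internals.

```python
import heapq
import itertools


class EventQueue:
    """Minimal discrete-event core: a clock plus a time-ordered event heap."""

    def __init__(self):
        self.now = 0.0
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for events at equal times

    def schedule(self, delay, action):
        """Schedule `action(sim)` to run `delay` time units from now."""
        heapq.heappush(self._heap, (self.now + delay, next(self._counter), action))

    def run(self):
        """Process events in timestamp order until none remain."""
        while self._heap:
            self.now, _, action = heapq.heappop(self._heap)
            action(self)


log = []
sim = EventQueue()
sim.schedule(5.0, lambda s: log.append((s.now, "job finishes")))
sim.schedule(0.0, lambda s: log.append((s.now, "job arrives")))
# An event may schedule further events, here a start 2 time units after arrival.
sim.schedule(0.0, lambda s: s.schedule(2.0, lambda s2: log.append((s2.now, "job starts"))))
sim.run()
print(log)
```

The counter in each heap entry ensures that events with identical timestamps are processed in scheduling order and that the callables themselves are never compared.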
2. The DGSim Framework Model Details: Inter-Operation Architectures • Independent • Centralized • Hierarchical • Decentralized • Hybrid hierarchical/decentralized
2. The DGSim Framework Model Details: Resource Dynamics & Evolution • Resource dynamics • Short-term changes in resource availability status • Resource evolution • Long-term changes in number & … of resources A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, On the Dynamic Resource Availability in Grids, IEEE/ACM Grid, 2007.
2. The DGSim Framework Workloads: Generation and Model(s) • Workload Generation • Generate synthetic workload with realistic characteristics • Iterative workload generation: incur specified load on a grid • Parallel jobs • Adapting the Lublin-Feitelson model to grids • Bags-of-Tasks: groups of independent single-processor tasks • Validated with seven long-term grid traces A. Iosup, D.H.J. Epema, T. Tannenbaum, M. Farrellee, M. Livny, Inter-Operating Grids through Delegated MatchMaking, ACM/IEEE SuperComputing, 2007. A. Iosup, O.O. Sonmez, S. Anoep, D.H.J. Epema, The Performance of Bags-of-Tasks in Large-Scale Distributed Computing Systems, IEEE HPDC, 2008.
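One way to read "iterative workload generation" is: keep adding jobs until the generated work reaches a specified fraction of the grid's capacity. The sketch below does this for bags-of-tasks. It is an assumption-laden illustration, not the DGSim generator: the uniform arrivals, the 1 to 50 tasks per bag, and the exponential runtimes with a 600 s mean are placeholder distributions, not the models validated in the cited papers.

```python
import random


def generate_workload(target_load, processors, duration_s, seed=0):
    """Add bags-of-tasks until the offered load reaches `target_load`.

    Offered load = total task runtime / (processors * duration_s).
    All distributions below are illustrative placeholders.
    """
    rng = random.Random(seed)
    capacity = processors * duration_s  # available cpu-seconds
    bags, total_work = [], 0.0
    while total_work / capacity < target_load:
        arrival = rng.uniform(0.0, duration_s)
        n_tasks = rng.randint(1, 50)
        # Each task is independent and uses a single processor.
        runtimes = [rng.expovariate(1.0 / 600.0) for _ in range(n_tasks)]
        bags.append({"arrival": arrival, "runtimes": runtimes})
        total_work += sum(runtimes)
    bags.sort(key=lambda bag: bag["arrival"])
    return bags, total_work / capacity


bags, load = generate_workload(target_load=0.7, processors=100, duration_s=86_400)
print(len(bags), round(load, 4))
```

Because whole bags are added, the achieved load slightly overshoots the target; a stricter generator could trim the last bag.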
3. DGSim Validation: Functional Validation • Functional validation (simple scenario) • Workload: 100 jobs of constant size 10,000, all arriving at t=0 • System: grid scheduler over one 10-resource cluster; each resource processes 1 work unit/second; information delay = 0-3600 s
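A simple scenario like this is useful because the expected outcome can be computed by hand and compared with the simulator's output. The sketch below works out the expected values under assumptions the slide does not state: each job occupies exactly one resource, jobs are served first-come first-served, and the information delay is zero. The function name `fcfs_schedule` is hypothetical.

```python
import heapq


def fcfs_schedule(num_jobs, job_size, num_resources, speed):
    """Schedule identical jobs, all arriving at t=0, on identical resources.

    Returns (makespan, mean waiting time) in seconds.
    """
    free_at = [0.0] * num_resources  # time at which each resource becomes idle
    heapq.heapify(free_at)
    waits = []
    makespan = 0.0
    for _ in range(num_jobs):
        start = heapq.heappop(free_at)  # earliest available resource
        end = start + job_size / speed
        waits.append(start)  # arrival time is 0, so wait equals start time
        heapq.heappush(free_at, end)
        makespan = max(makespan, end)
    return makespan, sum(waits) / len(waits)


makespan, mean_wait = fcfs_schedule(num_jobs=100, job_size=10_000,
                                    num_resources=10, speed=1.0)
print(makespan, mean_wait)
```

Under these assumptions the jobs run in 10 waves of 10, so the makespan is 100 × 10,000 / 10 = 100,000 s and the mean wait is 45,000 s; a non-zero information delay would be expected to shift the simulated values away from this baseline.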
3. DGSim Validation: Real vs. Simulated DAS-3 Multi-Cluster Grid • Simulator setup • Application: synthetic parallel, communication-intensive (all-gather) • Measured: runtime for various configurations (co-allocation) • System: heterogeneous clusters, Koala co-allocating scheduler • Workload: 300 jobs, submitted over a period of 6 hours • All jobs submitted through central cluster gateways • Results • Scheduling algorithm leads to similar results in real and simulated environments → can use simulator for analyzing scheduling trends • Under-estimation of waiting time (failures lead to more contention)
4. DGSim Examples: Sample 1/3 • Investigate mechanisms for inter-operating grids • New mechanism: Delegated MatchMaking (DMM) • Trace-based performance evaluation through simulations • Real and model-based traces • Largest trace: 1.4M jobs • Simulate Grid’5000+DAS-2 • Explored a design space of over 1 million design points A. Iosup, D.H.J. Epema, T. Tannenbaum, M. Farrellee, M. Livny, Inter-Operating Grids through Delegated MatchMaking, ACM/IEEE SuperComputing, 2007.
4. DGSim Examples: Sample 2/3 • What is the performance impact of the dynamic grid resource availability? • Four models for grid resource availability information • Trace-based performance evaluation through simulations • Real traces • Simulate Grid’5000 • KA = AMA > HMA >> SA • Goodput decreases with intervention delay [Figure: the models SA, KA, AMA, and HMA placed by resource availability (static, dynamic) and availability information delay (on-time (0), short period, long period)] [Figure: average normalized goodput [cpuseconds/day/proc] per model: SA, KA, AMA 60s, AMA 1h, HMA 1w, HMA 1mo, HMA Never] A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, On the Dynamic Resource Availability in Grids, IEEE/ACM Grid, 2007.
4. DGSim Examples: Sample 3/3 • Analyze performance of bag-of-tasks scheduling algorithms • Information availability framework: Known, Unknown, Historical records • Trace-based performance evaluation through simulations • Real and model-based traces • Simulate Grid’5000+DAS • Evaluated 8 scheduling algorithms • Explored a design space of over 2 million design points [Figure: scheduling algorithms classified by task information and resource information, each Known (K), Historical (H), or Unknown (U); algorithm labels: ECT, FPLT, ECT-P, FPF, DFPLT, MQD, RR, WQR, STFR] A. Iosup, O.O. Sonmez, S. Anoep, D.H.J. Epema, The Performance of Bags-of-Tasks in Large-Scale Distributed Computing Systems, IEEE HPDC, 2008.
Conclusion and Future Work • The DGSim framework • Tool to generate realistic experimental setups • Tool to manage large numbers of grouped simulations • Tool proven to work with large simulations • Validated underlying models and assumptions • Resource dynamics and evolution model • Workload model • Comparing grid resource management architectures • Proven in various settings • Future work • More scenarios • Library of ready-to-use scenarios
Thank you! Questions? Remarks? Observations? • Contact: A.Iosup@gmail.com [google “Iosup”] • Web sites: • http://www.vl-e.nl : VL-e project • http://www.pds.ewi.tudelft.nl : PDS group articles & software