DGSim: Comparing Grid Resource Management Architectures Through Trace-Based Simulation

This paper presents the DGSim framework, a simulation tool for comparing grid resource management architectures. It discusses the challenges in writing grid simulators and the features of DGSim, including workload generation, inter-operation architectures, resource dynamics, and validation examples.

Presentation Transcript


  1. DGSim: Comparing Grid Resource Management Architectures Through Trace-Based Simulation • Alexandru Iosup, Ozan Sonmez, and Dick Epema • PDS Group, Delft University of Technology, The Netherlands

  2. A Grid Research Toolbox • Hypothesis: (a) is better than (b). For scenario 1, … [Figure: toolbox diagram with numbered components 1, 2, 3 and DGSim]

  4. The Problem with Grid Simulations • Three decades of writing simulators in computer science → writing the simulator is not the problem • The problem: getting from solution design to experimental results with an automated simulation tool • Experimental setup • Tool to generate realistic experimental setups • Experiment support for grid resource management • Tool to manage large numbers of related simulations • Performance • Not the simulation time (decades of optimizations there) • Tool proven to work with large simulations (number of resources, workload size, etc.)

  5. Outline • Problem Statement • The DGSim Framework • DGSim Validation • DGSim Examples • Future Work

  6. 2. The DGSim Framework: Name, Goal, and Challenges • DGSim = Delft Grid Simulator • Simulate various grid resource management architectures • Multi-cluster grids • Grids of grids (THE grid) • Challenges • Many types of architectures • Generating and replaying grid workloads • Management of the simulations • Many repetitions of a simulation for statistical relevance • Simulations with many parameters • Managing results (e.g., analysis tools) • Enabling collaborative experiments [Figure: two GRM architectures]
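The simulation-management challenge on this slide (many parameters, many repetitions per setting) can be illustrated with a small sketch. This is not DGSim code: the function `design_points`, the parameter names, and the load values are hypothetical, and the architecture names are taken from the inter-operation architectures slide. It only shows how quickly the number of runs grows when every parameter combination is repeated for statistical relevance.

```python
from itertools import product


def design_points(param_space, repetitions):
    """Yield one dict per simulation run: every parameter combination, repeated."""
    names = sorted(param_space)
    for values in product(*(param_space[name] for name in names)):
        for rep in range(repetitions):
            run = dict(zip(names, values))
            run["seed"] = rep  # a distinct random seed per repetition
            yield run


# Hypothetical parameter space, for illustration only.
space = {
    "architecture": ["independent", "centralized", "hierarchical",
                     "decentralized", "hybrid"],
    "load": [0.5, 0.7, 0.9],
}
runs = list(design_points(space, repetitions=10))
print(len(runs))  # 5 architectures x 3 loads x 10 repetitions = 150
```

Each yielded dictionary would be handed to one simulation run; keeping the seed inside the design point makes every run reproducible on its own.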

  7. 2. The DGSim Framework: Overview [Figure: framework overview, including a discrete-event simulator]

  8. 2. The DGSim Framework: Model Details: Inter-Operation Architectures • Independent • Centralized • Hierarchical • Decentralized • Hybrid hierarchical/decentralized

  9. 2. The DGSim Framework Model Details: Resource Dynamics & Evolution • Resource dynamics • Short-term changes in resource availability status • Resource evolution • Long-term changes in number & … of resources A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, On the Dynamic Resource Availability in Grids, IEEE/ACM Grid, 2007.

  10. 2. The DGSim Framework: Workloads: Generation and Model(s) • Workload Generation • Generate synthetic workload with realistic characteristics • Iterative workload generation: incur specified load on a grid • Parallel jobs • Adapting the Lublin-Feitelson model to grids • Bags-of-Tasks: groups of independent single-processor tasks • Validated with seven long-term grid traces A. Iosup, D.H.J. Epema, T. Tannenbaum, M. Farrellee, M. Livny, Inter-Operating Grids through Delegated MatchMaking, ACM/IEEE SuperComputing, 2007. A. Iosup, O.O. Sonmez, S. Anoep, D.H.J. Epema, The Performance of Bags-of-Tasks in Large-Scale Distributed Computing Systems, IEEE HPDC, 2008.
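The idea of iterative workload generation (keep generating jobs until the workload incurs a specified load) can be sketched as follows. This is an illustration under stated assumptions, not DGSim's generator: the function name, the exponential runtimes with a 600 s mean, the processor counts, and the uniform arrivals are all placeholders, and they are not the Lublin-Feitelson model mentioned on the slide. Load is taken here as requested cpu-seconds divided by the cpu-seconds the grid can supply over the period.

```python
import random


def generate_workload(target_load, num_procs, duration, seed=0):
    """Add jobs one at a time until the offered load reaches target_load.

    Returns (jobs, achieved_load); each job is (arrival, procs, runtime).
    """
    rng = random.Random(seed)
    capacity = num_procs * duration  # cpu-seconds available in the period
    jobs, demand = [], 0.0
    while demand / capacity < target_load:
        runtime = rng.expovariate(1 / 600.0)  # placeholder: mean 600 s
        procs = rng.choice([1, 2, 4, 8])      # placeholder job sizes
        arrival = rng.uniform(0, duration)    # placeholder arrival process
        jobs.append((arrival, procs, runtime))
        demand += procs * runtime
    jobs.sort()  # order by arrival time, ready for trace replay
    return jobs, demand / capacity


jobs, load = generate_workload(target_load=0.7, num_procs=100, duration=86_400)
print(len(jobs), round(load, 3))
```

The loop stops at the first job that brings the load to the target, so the achieved load overshoots the target by at most one job's demand.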

  11. Outline • Problem Statement • The DGSim Framework • DGSim Validation • DGSim Examples • Future Work

  12. 3. DGSim Validation: Functional Validation • Functional validation (simple scenario) • Workload = 100 jobs of constant size 10,000 that arrive at t=0 • System: grid scheduler over one 10-resource cluster; resource = 1 work unit/second, information delay = 0-3600s
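The value of such a simple scenario is that the expected outcome can be computed by hand and compared with the simulator. A minimal sketch of that check follows. It assumes job sizes are in work units, first-come-first-served scheduling, one job per resource at a time, and one particular reading of "information delay" (the scheduler learns that a resource is free only `info_delay` seconds after it actually frees up); none of this is confirmed by the slide, and the function `simulate` is hypothetical.

```python
import heapq


def simulate(num_jobs=100, job_size=10_000, num_resources=10,
             speed=1.0, info_delay=0.0):
    """Return (makespan, mean_wait) in seconds for jobs that all arrive at t=0."""
    # Min-heap of the times at which the scheduler sees each resource as free.
    visible_free = [0.0] * num_resources
    heapq.heapify(visible_free)
    runtime = job_size / speed
    finish_times, waits = [], []
    for _ in range(num_jobs):  # FCFS: jobs are served in submission order
        start = heapq.heappop(visible_free)
        finish = start + runtime
        waits.append(start)  # arrival time is 0, so wait == start time
        finish_times.append(finish)
        # The freed resource becomes visible only after the information delay.
        heapq.heappush(visible_free, finish + info_delay)
    return max(finish_times), sum(waits) / len(waits)


print(simulate())                   # (100000.0, 45000.0)
print(simulate(info_delay=3600.0))  # (132400.0, 61200.0)
```

With no delay, the 100 jobs run in 10 rounds of 10,000 s each, so the makespan is 100,000 s. Under the assumed delay model, each of the 9 later rounds starts 3,600 s late, which gives 100,000 + 9 × 3,600 = 132,400 s.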

  13. 3. DGSim Validation: Real vs. Simulated DAS-3 Multi-Cluster Grid • Simulator setup • Application: synthetic parallel, communication-intensive (all-gather) • Measured: runtime for various configurations (co-allocation) • System: heterogeneous clusters, Koala co-allocating scheduler • Workload: 300 jobs, submitted over a period of 6 hours • All jobs submitted through central cluster gateways • Results • Scheduling algorithm leads to similar results in real and simulated environments → can use simulator for analyzing scheduling trends • Under-estimation of waiting time (failures lead to more contention)

  14. Outline • Problem Statement • The DGSim Framework • DGSim Validation • DGSim Examples • Future Work

  15. 4. DGSim Examples: Sample 1/3 • Investigate mechanisms for inter-operating grids • New mechanism: Delegated MatchMaking (DMM) • Trace-based performance evaluation through simulations • Real and model-based traces • Largest trace: 1.4M jobs • Simulate Grid’5000+DAS-2 • Explored a design space of over 1 million design points A. Iosup, D.H.J. Epema, T. Tannenbaum, M. Farrellee, M. Livny, Inter-Operating Grids through Delegated MatchMaking, ACM/IEEE SuperComputing, 2007.

  16. 4. DGSim Examples: Sample 2/3 • What is the performance impact of the dynamic grid resource availability? • Four models for grid resource availability information • Trace-based performance evaluation through simulations • Real traces • Simulate Grid’5000 • KA = AMA > HMA >> SA • Goodput decreases with intervention delay [Figure: the four models (SA, KA, AMA, HMA) arranged by resource availability (static, dynamic) and availability information delay (on-time (0), short period, long period)] [Figure: average normalized goodput [cpuseconds/day/proc] per model: SA, KA, AMA 60s, AMA 1h, HMA 1w, HMA 1mo, HMA Never] A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, On the Dynamic Resource Availability in Grids, IEEE/ACM Grid, 2007.
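The unit of the metric on this slide, cpuseconds/day/proc, suggests a simple normalization. The sketch below shows one way to compute such a value; it assumes that goodput counts only the cpu-seconds of jobs that completed successfully, which is a common definition but is not spelled out on the slide. The function name and the sample numbers are hypothetical.

```python
def normalized_goodput(completed_jobs, num_procs, period_seconds):
    """Useful cpu-seconds per day per processor.

    completed_jobs: iterable of (procs, runtime_seconds) for jobs that
    finished successfully; failed or killed jobs are left out.
    """
    useful_cpu_seconds = sum(procs * runtime for procs, runtime in completed_jobs)
    days = period_seconds / 86_400
    return useful_cpu_seconds / days / num_procs


# Made-up example: two completed jobs on a 4-processor system over 2 days.
print(normalized_goodput([(2, 3600), (1, 7200)], num_procs=4,
                         period_seconds=2 * 86_400))  # 1800.0
```

A processor that does useful work all day contributes 86,400 cpuseconds/day/proc, so that value is the upper bound of the metric under this definition.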

  17. 4. DGSim Examples: Sample 3/3 • Analyze performance of bag-of-tasks scheduling algorithms • Information availability framework: Known, Unknown, Historical records • Trace-based performance evaluation through simulations • Real and model-based traces • Simulate Grid’5000+DAS • Evaluated 8 scheduling algorithms • Explored a design space of over 2 million design points [Table: scheduling algorithms classified by task information and resource information, each Known (K), Historical (H), or Unknown (U); algorithms listed: ECT, FPLT, ECT-P, FPF, DFPLT, MQD, RR, WQR, STFR] A. Iosup, O.O. Sonmez, S. Anoep, D.H.J. Epema, The Performance of Bags-of-Tasks in Large-Scale Distributed Computing Systems, IEEE HPDC, 2008.
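To make the role of information availability concrete, here is a minimal sketch of a scheduler that needs both task and resource information to be known. It assumes that ECT on this slide stands for earliest completion time, a well-known heuristic that places each task on the resource where it would finish soonest. The function `ect_schedule` and its inputs are hypothetical, and the version evaluated in the cited paper may differ in detail.

```python
def ect_schedule(task_sizes, speeds):
    """Greedy earliest-completion-time assignment of a bag of tasks.

    task_sizes: work units per task (task information is known).
    speeds: work units per second per resource (resource information is known).
    Returns (assignment, makespan); assignment[i] is the resource of task i.
    """
    ready = [0.0] * len(speeds)  # time at which each resource becomes free
    assignment = []
    for size in task_sizes:
        # Pick the resource on which this task would complete earliest;
        # ties go to the lowest-numbered resource.
        best = min(range(len(speeds)), key=lambda r: ready[r] + size / speeds[r])
        ready[best] += size / speeds[best]
        assignment.append(best)
    return assignment, max(ready)


print(ect_schedule([4, 4, 4], speeds=[1, 2]))  # ([1, 0, 1], 4.0)
```

When task sizes or resource speeds are unknown, this computation is not possible, which is why the framework on the slide groups algorithms by what information each one requires.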

  18. Outline • Problem Statement • The DGSim Framework • DGSim Validation • DGSim Examples • Future Work

  19. Conclusion and Future Work • The DGSim framework • Tool to generate realistic experimental setups • Tool to manage large numbers of grouped simulations • Tool proven to work with large simulations • Validated underlying models and assumptions • Resource dynamics and evolution model • Workload model • Comparing grid resource management architectures • Proven in various settings • Future work • More scenarios • Library of ready-to-use scenarios

  20. Thank you! Questions? Remarks? Observations? • Contact: A.Iosup@gmail.com [google “Iosup“] • Web sites: • http://www.vl-e.nl : VL-e project • http://www.pds.ewi.tudelft.nl : PDS group articles & software