GangSim is a discrete simulator used for studying resource scheduling in large distributed Grid systems, handling complex workload characteristics and interactions between resources and users. Derived from Ganglia Monitoring Toolkit, it enables real-time simulations and interaction with various Resource Managers.
GangSim: A Simulator for Grid Scheduling Studies Catalin L. Dumitrescu The University of Chicago Ian Foster Argonne National Laboratory & The University of Chicago
Talk Outline / Part I • Part I: • Introduction • Our Approach: GangSim, a discrete simulator • Motivating Scenarios • Architecture • Evaluation Criteria • Part II: • Simulation and Validation Results • Conclusions and Questions GangSim: A Grid Simulator for Resource Scheduling Studies
Introduction • Large distributed Grid systems pose new challenges • Overwhelming resource characteristics • Complex workload characteristics • Complex interactions and resource allocations • Analytical modeling is either impractical or impossible
Our Approach: GangSim • Derived from Ganglia Monitoring Toolkit • Real-time simulator • Focus on local – VO interactions • Mixing simulations with real testbeds • Provide simple means for result visualization • Interactions with various Resource Managers (RMs)
GangSim Novelty • Simulates (and Handles): • Sites with RMs • VO and groups • Submission hosts • Model usage allocations (SLAs) at several levels • Capacity to combine simulated results with real results collected from a real Grid • Useful for simulations of future trends
Environment Overview
Environment Details • Simulations target environments with: • large number of resources • resource owners • VOs • A few examples are: • Grid3 • OSG • TeraGrid • DataGrid
Initial Research Problems • “What site usage policies are appropriate in a Grid environment, and how do these policies impact achieved site and VO performance?” • “What usage policy may be applied at the VO level?” • “What site selection policies are best suited for various Grid environments?”
GangSim Details
GangSim Concepts • Site: characterized by various metrics for CPU, disk space, and network connectivity • VO: composed of groups and users • External Schedulers, Local Schedulers, and Data Schedulers: scheduling decision points at various levels in the grid • Policy enforcement points (S-PEP and V-PEP): responsible for gathering usage and allocation information and for controlling how many jobs should run
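The concepts above could be represented with a few small data structures. The following is a minimal sketch, not GangSim's actual implementation: the class names, the fields, and the `spep_free_slots` rule (a VO may use at most its allocated fraction of a site's CPUs) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A site described by a few metrics (hypothetical fields)."""
    name: str
    cpus: int
    disk_gb: float
    # Assumed allocation: fraction of the site's CPUs each VO may use.
    vo_share: dict = field(default_factory=dict)
    # Jobs currently running at the site, counted per VO.
    running: dict = field(default_factory=dict)


@dataclass
class VO:
    """A virtual organization composed of groups, each holding users."""
    name: str
    groups: dict = field(default_factory=dict)  # group name -> list of users


def spep_free_slots(site: Site, vo_name: str) -> int:
    """Assumed S-PEP rule: how many more jobs this VO may start at the site."""
    allowed = int(site.cpus * site.vo_share.get(vo_name, 0.0))
    used = site.running.get(vo_name, 0)
    return max(0, allowed - used)


# Made-up numbers, for illustration only.
site = Site("site-a", cpus=100, disk_gb=500.0,
            vo_share={"USATLAS": 0.5, "LIGO": 0.25},
            running={"USATLAS": 40, "LIGO": 30})
atlas = VO("USATLAS", groups={"production": ["alice", "bob"]})

print(spep_free_slots(site, atlas.name))  # 10: 50 allowed, 40 running
print(spep_free_slots(site, "LIGO"))      # 0: already over its 25 slots
```

A V-PEP could apply the same kind of check one level up, for example per group inside a VO, but the slides do not specify how.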
GangSim Strategies • Various algorithms can be used for scheduling • Site usage policy: • Simple fair share • Extensible fair share • Commitment fair share • Others • ES task assignment strategies: • Least recently used (according to available allocations) • Least used (according to available allocations) • Round robin / random assignment (…)
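Two of the task assignment strategies listed above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and "least used" is interpreted here as "the site with the largest free allocation", which is an assumption rather than something the slides define.

```python
from itertools import cycle


def round_robin(sites):
    """Return an endless iterator that rotates through the sites, ignoring load."""
    return cycle(sites)


def least_used(available):
    """Pick the site with the most free allocation (ties: first in dict order)."""
    return max(available, key=available.get)


# Hypothetical free allocations (job slots) per site for one VO.
available = {"site-a": 3, "site-b": 5, "site-c": 4}

rr = round_robin(list(available))
rr_choices = [next(rr) for _ in range(4)]

lu_choices = []
for _ in range(4):
    chosen = least_used(available)
    lu_choices.append(chosen)
    available[chosen] -= 1  # the assigned job consumes one slot

print(rr_choices)  # ['site-a', 'site-b', 'site-c', 'site-a']
print(lu_choices)  # ['site-b', 'site-b', 'site-c', 'site-a']
```

Round robin spreads jobs evenly regardless of load, while the least-used rule keeps sending jobs to the site with the most remaining allocation until the sites level out.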
Implementation Details • Various Ganglia (and VO-Centric Ganglia) components were replaced • New components: • Simulator modules: track client and provider states • Task assignment policies: various algorithms invoked at run time • Metric aggregators: monitoring sub-components used for scheduling decisions • Grid components: internal data structures • Interfaces: a set of remotely accessible CGI scripts
Interface Screenshot Example
Talk Outline / Part II • Part I: • Introduction • Our Approach: GangSim, a discrete simulator • Motivating Scenarios • Architecture • Evaluation Criteria • Part II: • Simulation and Validation Results • Conclusions and Questions
Achievable Results • Interested in three main aspects: • Task Assignment and Policies • Simulated Architecture Variations • Simulator Performance
Task Assignment and Policies • [Figures: Round Robin Assignment Policy; Least Used Site Assignment Policy]
Analytical Results • Automated performance metric computation • Example: $ART = \frac{1}{N} \sum_{i=1}^{N} RT_i$ • [Table 1: Synchronized Workloads – ART] • [Table 2: Unsynchronized Workloads – ART]
Simulated Architectures • Various architectures can be simulated • Requires changing only a few parameters • New algorithms can be considered • [Figures: Analytical Approach in Site Selection; Observational Approach in Selection]
Simulator Performance • Important to find simulator limits • 15 VOs and 100 sites on a single GangSim instance is achievable • [Figure: 15 VOs and 100 sites (6 VOs drawn)]
Validation Results • Results Comparison GangSim vs. Grid3: • Site Level Comparisons • VO Level Comparisons • Quantitative Comparisons
Site Level Comparisons • GangSim and Grid3 on a single site (FermiLab) • 4 identical workloads • The GangSim and FermiLab executions both completed in close to the same time, but show rather different execution behavior • [Figures: Per-VO, FermiLab (Grid3); Per-VO, FermiLab (GangSim)]
VO Level Comparisons • GangSim and Grid3 runs across 12 sites • Starting times: iVDGL-1 at 20 s, BTEV-1 and USATLAS-1 at 200 s, LIGO-1 at 700 s, BTEV-2 at 800 s, iVDGL-2 at 1000 s, USATLAS-2 at 1500 s, and LIGO-2 at 1700 s • [Figures: Per-VO, 12 sites (Grid3); Per-VO, 12 sites (GangSim)]
Quantitative Comparisons • Aggregated resource utilization (ARU) • Average response time (ART): $ART = \frac{1}{N} \sum_{i=1}^{N} RT_i$ • Average starvation factor (ASF): $ASF = \frac{\sum_i \min(ST_i, RT_i)}{\sum_i ET_i}$ • [Table 3: Simulation (S) vs. Grid3 (G) Metrics]
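The ART and ASF formulas above translate directly into code. The sketch below is illustrative: the function names and sample values are made up, and since the slides do not define ST, RT, and ET, they are treated simply as per-job lists of numbers in the same units.

```python
def average_response_time(rt):
    """ART: sum of the per-job RT values divided by the number of jobs N."""
    return sum(rt) / len(rt)


def average_starvation_factor(st, rt, et):
    """ASF: sum over jobs of min(ST_i, RT_i), divided by the sum of ET_i."""
    return sum(min(s, r) for s, r in zip(st, rt)) / sum(et)


# Made-up per-job values for three jobs (not taken from the slides).
rt = [10, 20, 30]
st = [5, 25, 15]
et = [20, 30, 30]

print(average_response_time(rt))              # 20.0
print(average_starvation_factor(st, rt, et))  # 0.5
```

Here ART is (10 + 20 + 30) / 3 = 20, and ASF is (5 + 20 + 15) / (20 + 30 + 30) = 40 / 80 = 0.5.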
Conclusions about GangSim • A Grid simulator for analyzing different scheduling policies in a multi-site, multi-VO environment • Designed around discrete simulation techniques and the modeling of important system components • Demonstrated through studies of different VO-level scheduling policies in the presence of different local site resource allocation policies
Addressed Questions • “What site usage policies are appropriate in a Grid environment, and how do these policies impact achieved site and VO performance?” • “What usage policy may be applied at the VO level?” • “What site selection policies are best suited for various Grid environments?”
Thanks… Questions?