Resource and Test Management in Grids
Dick Epema, Catalin Dumitrescu, Hashim Mohamed, Alexandru Iosup, Ozan Sonmez
Parallel and Distributed Systems Group, Delft University of Technology
Rapid Prototyping in e-Science VL-e Workshop, Amsterdam, NL
A Brief Introduction to Grid Computing
• Typical grid environment, e.g., the DAS
  - Applications [!]
  - Resources: compute (clusters), storage, (dedicated) network
  - Virtual Organizations, projects (e.g., VL-e), groups, users
• Grids vs. (traditional) parallel production environments
  - Dynamic
  - Heterogeneous
  - Very large-scale (world)
  - No central administration
→ Most problems are NP-hard, need experimental validation
Outline
• A Brief Introduction to Grid Computing
• Koala: Processor and Data Co-Allocation in Grids
  - The Co-Allocation Problem in Grids
  - The Koala Design
  - Koala and the DAS Community
  - The Future of Koala
• GrenchMark: Analyzing, Testing, and Comparing Grids
  - Grid Performance Evaluation Issues
  - The GrenchMark Architecture
  - Experience with GrenchMark
• Take home message
The Co-allocation Problem in Grids (1): Motivation
• Co-allocation = the simultaneous allocation of resources in multiple clusters to a single application that consists of multiple components
• Reasons
  - Use more resources than are available at a single cluster at a given time
  - Create a specific virtual environment (e.g., visualization cluster, geographically spread data)
  - Achieve reliability through replication on multiple clusters
  - Avoid resource contention on the same site (e.g., batches)
The Co-allocation Problem in Grids (2): Overall Example
[Figure labels: global queue (KOALA); local queues with local schedulers (LS); load sharing; co-allocation; clusters; global job; local jobs. Source: Dick Epema]
The Co-allocation Problem in Grids (3): Details: Processors and Data Co-Allocation
• Jobs have access to processors and data from many sites
• Files are stored at different file sites; replicas may exist
• The scheduler decides on job component placement at execution sites
• Jobs can be of high or low priority
Source: Hashim Mohamed
The Co-allocation Problem in Grids (4): Details: Co-Allocated Job Types
• Fixed jobs: job component size and placement fixed by user
• Non-fixed jobs: job component size fixed by user, placement by scheduler decision
• Semi-fixed jobs: job component size and placement by scheduler decision / fixed by user
• Flexible jobs: job component size and placement by scheduler decision
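The four job types can be told apart by what the user pins down in a request. The following is a minimal sketch, not Koala's actual request format: the `Component` and `JobRequest` classes and the `job_type` function are hypothetical names, and the sketch assumes that a semi-fixed job is one where the user fixes the placement of some components and leaves the rest to the scheduler, and that a flexible job gives only a total size.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Component:
    size: int                    # processors requested for this component
    site: Optional[str] = None   # execution site; None = scheduler decides


@dataclass
class JobRequest:
    total_size: int                                   # total processors requested
    components: List[Component] = field(default_factory=list)


def job_type(job: JobRequest) -> str:
    """Classify a request by who decides component size and placement."""
    if not job.components:
        # Only a total size is given: the scheduler splits and places the job.
        return "flexible"
    placed = [c.site is not None for c in job.components]
    if all(placed):
        return "fixed"        # user fixed both size and placement
    if any(placed):
        return "semi-fixed"   # assumption: placement fixed for some components only
    return "non-fixed"        # user fixed sizes, scheduler places


# Hypothetical site names, for illustration only.
fixed = JobRequest(8, [Component(4, "siteA"), Component(4, "siteB")])
non_fixed = JobRequest(8, [Component(4), Component(4)])
semi_fixed = JobRequest(8, [Component(4, "siteA"), Component(4)])
flexible = JobRequest(8)
```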
The Koala Design
• Selection: placing job components
• Control: transfer executable and input files
• Instantiation: claiming resources selected for each job component
• Run: submit, then monitor job execution (fault tolerance)
Source: Hashim Mohamed
The Koala Selection Step: Many Placement Policies
• Originally supported co-allocation policies:
  - Worst-Fit: balance job components across sites
  - Close-to-Files: take into account the locations of input files to minimize transfer times
  - (Flexible) Cluster Minimization: mitigate inter-cluster communication; can also split the job automatically
• But different application types require different ways of component placement
• So:
  - Modular structure with pluggable policies
  - Take into account internal structure of applications
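Two of the policies above can be sketched as interchangeable functions. This is an illustration under stated assumptions, not Koala's implementation: the function names, the `idle` map of free processors per site, and the `transfer_time` estimates are hypothetical, and the sketch assumes Worst-Fit handles components in decreasing size order and places each one on the site with the most idle processors.

```python
from typing import Dict, List, Optional


def worst_fit(sizes: List[int], idle: Dict[str, int]) -> Optional[Dict[int, str]]:
    """Place each component on the site with the most idle processors.

    Returns a map component index -> site, or None if some component
    does not fit anywhere (the job would then have to wait).
    """
    free = dict(idle)  # work on a copy; the caller's view stays untouched
    placement: Dict[int, str] = {}
    # Assumption: largest components are placed first.
    for idx in sorted(range(len(sizes)), key=lambda i: -sizes[i]):
        site = max(free, key=free.get)       # emptiest site right now
        if free[site] < sizes[idx]:
            return None
        placement[idx] = site
        free[site] -= sizes[idx]
    return placement


def close_to_files(size: int, idle: Dict[str, int],
                   transfer_time: Dict[str, float]) -> Optional[str]:
    """Pick the site with enough idle processors and the lowest
    estimated input-file transfer time (estimates are supplied by the caller)."""
    candidates = [s for s, n in idle.items() if n >= size]
    if not candidates:
        return None
    return min(candidates, key=lambda s: transfer_time[s])


# Hypothetical cluster state, for illustration only.
idle_now = {"siteA": 8, "siteB": 6}
wf = worst_fit([4, 4], idle_now)
cf = close_to_files(4, idle_now, {"siteA": 30.0, "siteB": 5.0})
```

Because both functions take the cluster state as plain arguments, a scheduler could swap one for the other without changing its own code, which is the point of the pluggable structure.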
The Koala Selection Step: HOCs: Exploiting Application Structure
• Higher-Order Components:
  - Pre-packaged software components with generic patterns of parallel behavior
  - Patterns: master-worker, pipelines, wavefront
• Benefits:
  - Facilitates parallel programming in grids
  - Enables user-transparent scheduling in grids
• Most important additional middleware:
  - Translation layer that builds a performance model from the HOC patterns and the user-supplied application parameters
• Supported by KOALA (with Univ. of Münster)
• Initial results: up to 50% reduction in runtimes
The Koala Instantiation Step: The Runners
• Problem: How to support many application types, each with specific (and difficult) requirements?
• Solution: runners (= interface modules)
• Currently supported:
  - Any type of single-component job
  - MPI/DUROC jobs
  - Ibis jobs
  - HOC applications
• API for extensions: write your own!
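The idea of one interface module per application type can be sketched as a small plug-in registry. Everything here is hypothetical and only illustrates the pattern: the `Runner` base class, the `register` decorator, the `plan` method, and the application-type keys are not Koala's real API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class Runner(ABC):
    """Hypothetical interface module for one application type."""

    @abstractmethod
    def plan(self, executable: str, sites: List[str]) -> List[str]:
        """Return human-readable launch steps (illustrative only)."""


RUNNERS: Dict[str, Type[Runner]] = {}


def register(app_type: str):
    """Class decorator that makes a runner available under app_type."""
    def decorator(cls: Type[Runner]) -> Type[Runner]:
        RUNNERS[app_type] = cls
        return cls
    return decorator


@register("single-component")
class SingleComponentRunner(Runner):
    def plan(self, executable, sites):
        # A single-component job runs on exactly one site.
        return [f"start {executable} on {sites[0]}"]


@register("multi-component")
class MultiComponentRunner(Runner):
    def plan(self, executable, sites):
        # One step per component, each on its own site.
        return [f"start {executable} component {i} on {s}"
                for i, s in enumerate(sites)]


def get_runner(app_type: str) -> Runner:
    """Look up and instantiate the runner for an application type."""
    return RUNNERS[app_type]()
```

Adding support for a new application type then means writing one more subclass and registering it; the code that looks runners up stays unchanged.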
Koala and the DAS Community
• Extensive experience gathered while assessing various co-allocation policies: over 25,000 completed jobs!
• Koala was released on the DAS in Sep 2005 [www.st.ewi.tudelft.nl/koala/]
  - Hands-on tutorials (last in Spring 2006)
  - Documentation (web site)
  - Papers: IEEE Cluster’04, Dagstuhl FGG’04, EGC’05, IEEE CCGrid’05, IEEE Cluster’06, etc.
• Koala helps you get results:
  - IEEE CCGrid’06, others submitted
The Future of Koala
• Support for more application types, e.g.,
  - Workflows, parameter sweep applications
  - Scheduling your application?
• Communication-aware and application-aware scheduling policies:
  - Take into account the communication pattern of applications when co-allocating
  - Also schedule bandwidth (in DAS3)
• Support heterogeneity
  - DAS3
  - DAS2 + DAS3
  - DAS3 + Grid’5000 + RoGRID
• Peer-to-peer structure instead of hierarchical grid scheduler
Outline
• A Brief Introduction to Grid Computing
• Koala: Processor and Data Co-Allocation in Grids
  - The Co-Allocation Problem in Grids
  - The Koala Design
  - Koala and the DAS Community
  - The Future of Koala
• GrenchMark: Analyzing, Testing, and Comparing Grids
  - Grid Performance Evaluation Issues
  - The GrenchMark Architecture
  - GrenchMark and the DAS Community
• Take home message
Grid Performance Evaluation: Current Practice
• Performance indicators
  - Define my own metrics, or use U and AWT/ART, or both
• Workload structure
  - Run my own workload; mostly under the unrealistic "all users are created equal" assumption
• Do not make comparisons (incompatible workloads)
• No repeatability of results (e.g., background load)
→ Need a common performance evaluation framework for grids: GrenchMark
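As a concrete reading of the indicators named above, the sketch below computes them from a list of finished jobs. It assumes the usual expansions, which the slide does not spell out: U as utilization, AWT as average wait time, ART as average response time. The `JobRecord` fields and the `metrics` function are hypothetical names, and utilization is taken over the span from first submission to last completion.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JobRecord:
    submit: float   # time the job entered the queue
    start: float    # time it started running
    end: float      # time it finished
    procs: int      # processors it occupied while running


def metrics(jobs: List[JobRecord], total_procs: int) -> Dict[str, float]:
    """Utilization, average wait time, and average response time."""
    n = len(jobs)
    awt = sum(j.start - j.submit for j in jobs) / n   # waiting in queue
    art = sum(j.end - j.submit for j in jobs) / n     # wait + run
    span = max(j.end for j in jobs) - min(j.submit for j in jobs)
    busy = sum((j.end - j.start) * j.procs for j in jobs)  # processor-seconds used
    return {"U": busy / (span * total_procs), "AWT": awt, "ART": art}


# Two 4-processor jobs run back to back on a 4-processor system.
result = metrics([JobRecord(0, 0, 10, 4), JobRecord(0, 10, 20, 4)], total_procs=4)
```

Here the system is fully busy for the whole span, so U is 1.0, while the second job's queueing shows up in AWT (5.0) and ART (15.0).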
GrenchMark: a Framework for Analyzing, Testing, and Comparing Grids
• What’s in a name? grid benchmark → working towards a generic tool for the whole community: help standardize the testing procedures, but it is too early for benchmarks; we use synthetic grid workloads instead
• What’s it about? A systematic approach to analyzing, testing, and comparing grid settings, based on synthetic workloads
  - A set of metrics and workload units for analyzing grid settings [JSSPP’06]
  - A set of representative grid applications (both real and synthetic)
  - Easy-to-use tools to create synthetic grid workloads
  - Flexible, extensible framework
GrenchMark Overview: Easy to Generate and Run Synthetic Workloads
… but More Complicated Than You Think
• Workload structure
  - User-defined and statistical models
  - Dynamic job arrival
  - Burstiness and self-similarity
  - Feedback, background load
  - Machine usage assumptions
  - Users, VOs
• Metrics
  - A(W) Run/Wait/Resp. Time
  - Efficiency, MakeSpan
  - Failure rate [!]
• (Grid) notions
  - Co-allocation, interactive jobs, malleable, moldable, …
• Measurement methods
  - Long workloads
  - Saturated / non-saturated system
  - Start-up, production, and cool-down scenarios
  - Scaling workload to system
• Applications
  - Synthetic
  - Real
• Workload definition language
  - Base language layer
  - Extended language layer
• Other
  - Can use the same workload for both simulations and real environments
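A statistical arrival model and a repeatable workload can be illustrated in a few lines. This sketch is not GrenchMark's workload definition language or its generator: the `generate_workload` function and its parameters are hypothetical, and it assumes exponentially distributed inter-arrival times (a Poisson arrival process) with job sizes drawn uniformly from a given list. The fixed seed is what makes the same workload reusable in a simulator and on a real system.

```python
import random
from typing import Dict, List


def generate_workload(n_jobs: int, mean_interarrival: float,
                      sizes: List[int], seed: int = 0) -> List[Dict]:
    """Return a list of synthetic job descriptions with arrival times."""
    rng = random.Random(seed)   # private generator: same seed, same workload
    t = 0.0
    jobs = []
    for job_id in range(n_jobs):
        # Exponential gaps between submissions give Poisson arrivals.
        t += rng.expovariate(1.0 / mean_interarrival)
        jobs.append({"id": job_id,
                     "submit": round(t, 3),
                     "procs": rng.choice(sizes)})
    return jobs


workload_a = generate_workload(100, mean_interarrival=30.0, sizes=[1, 2, 4, 8], seed=42)
workload_b = generate_workload(100, mean_interarrival=30.0, sizes=[1, 2, 4, 8], seed=42)
```

Burstiness, self-similarity, and per-user or per-VO behavior would need richer models than this one; the sketch covers only the simplest case on the list.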
GrenchMark and the DAS Community
• Generic performance evaluation [IEEE CCGrid’06]
• Grid system analysis
  - Performance testing, what-if analysis
• Functionality testing in grid environments
  - System functionality testing, periodic testing
• Comparing grid settings
  - Single site vs. co-allocated jobs
• Releasing the Koala Grid Scheduler on the DAS
  - 5,000+ jobs successfully run (in all workloads)
  - Functionality tests for 3 different job submission modules
• GrenchMark was released in Nov 2005 [grenchmark.st.ewi.tudelft.nl]
Take home message
• PDS Group/TU Delft: resource and test management in Grid systems
• Koala: Processor and Data Co-Allocation in Grids [www.st.ewi.tudelft.nl/koala/]
  - Grid scheduling with co-allocation and fault tolerance
  - Many placement policies available
  - Extensible runners system
  - Easy-to-use, flexible
  - Tutorials, on-line documentation, papers
• GrenchMark: Analyzing, Testing, and Comparing Grids [grenchmark.st.ewi.tudelft.nl]
  - Generic tool for the whole community
  - Generates diverse grid workloads
  - Easy-to-use, flexible, portable, extensible, …
Thank you! Questions? Remarks? Observations? All welcome! www.st.ewi.tudelft.nl/koala grenchmark.st.ewi.tudelft.nl/