350 likes | 470 Views
The Failure Trace Archive. Grid Computing: From Old Traces to New Applications. Alexandru Iosup , Ozan Sonmez, Nezih Yigitbasi, Hashim Mohamed, Catalin Dumitrescu, Mathieu Jan, Dick Epema. Parallel and Distributed Systems Group, TU Delft.
E N D
The FailureTraceArchive Grid Computing: From Old Traces to New Applications Alexandru Iosup, Ozan Sonmez, Nezih Yigitbasi, Hashim Mohamed, Catalin Dumitrescu, Mathieu Jan, Dick Epema Parallel and Distributed Systems Group, TU Delft Big thanks to our collaborators: U Wisc./Madison, U Chicago, U Dortmund, U Innsbruck, LRI/INRIA Paris, INRIA Grenoble, U Leiden, Politehnica University of Bucharest, Technion, … DGSim Fribourg, Switzerland
About the Speaker • Systems • The Koala grid scheduler • The Tribler BitTorrent-compatible P2P file-sharing • The POGGI and CAMEO gaming platforms • Performance • The Grid Workloads Archive (Nov 2006) • The Failure Trace Archive (Nov 2009) • The Peer-to-Peer Trace Archive (Apr 2010) • Tools: DGSim trace-based grid simulator, GrenchMark workload-based grid benchmarking • Team of 15+ active collaborators in NL, AT, RO, US • Happy to be in Berkeley until September
The Grid An ubiquitous, always-on computational and data storage platform on which users can seamlessly run their (large-scale) applications Shared capacity & costs, economies of scale
The Dutch Grid: DAS System and Extensions UvA/MultimediaN (46) VU (85 nodes) DAS-3: a 5-cluster grid • 272 AMD Opteron nodes 792 cores, 1TB memory • Heterogeneous: • 2.2-2.6 GHz single/dual core nodes • Myrinet-10G (excl. Delft) • Gigabit Ethernet SURFnet6 UvA/VL-e (41) Clouds 10 Gb/s lambdas • Amazon EC2+S3, Mosso, … DAS-4 (upcoming) • Multi-cores: general purpose, GPU, Cell, … Leiden (32) TU Delft (68)
Many Grids Built DAS, Grid’5000, OSG, NGS, CERN, … Why grids and not The Grid?
Agenda • Introduction • Was it the System? • Was it the Workload? • Was it the System Designer? • New Application Types • Suggestions for Collaboration • Conclusion
The Failure Trace ArchiveFailure and Recovery Events http://fta.inria.fr 20+ traces online D. Kondo, B. Javadi, A. Iosup, D. Epema, The Failure Trace Archive: Enabling Comparative Analysis of Failures in Diverse Distributed Systems, CCGrid 2010 (Best Paper Award)
System Availability CharacteristicsResource Evolution: Grids Grow by Cluster
System Availability CharacteristicsGrid Dynamics: Grids Shrink Temporarily Average availability:69% Grid-level view
Resource Availability Model • Assume no correlation of failure occurrence between clusters • Which site/cluster? • fs, fraction of failures at cluster s MTBF MTTR Correl. • Weibull distribution for IAT • the longer a node is online, the higher the chances that it will fail
Was it the System? • No • System can grow fast • Good data and models to support system designers • Yes • Grid middleware unscalable • Grid middleware failure-prone • Grid resources unavailable • Poor online information about resource availability
Agenda • Introduction • Was it the System? • Was it the Workload? • Was it the System Designer? • Was it Another Hype? • Suggestions for Collaboration • Conclusion
The Grid Workloads ArchivePer-Job Arrival, Start, Stop, Structure, etc. http://gwa.ewi.tudelft.nl 6 traces online 1.5 yrs >750K >250 A. Iosup, H. Li, M. Jan, S. Anoep, C. Dumitrescu, L. Wolters, D. Epema, The Grid Workloads Archive, FGCS 24, 672—686, 2008.
How Are Real Grids Used? Data Analysis and Modeling • Grids vs. parallel production environments such as clusters and (small) supercomputers • Bags of single-processor tasks vs. single parallel jobs • Bigger bursts of job arrivals • More jobs Grid Systems Parallel production environments
Grid WorkloadsAnalysis: Grid Workload Components Workflows (WFs) Bags-of-Tasks (BoTs) Time [units] • BoT size = 2-70 tasks, most 5-20 • Task runtime highly variable, from minutes to tens of hours • WF size = 2-1k tasks, most 30-40 • Task runtime of minutes
Grid WorkloadsModeling Grid Workloads: adding users, BoTs • Single arrival process for both BoTs and parallel jobs • Reduce over-fitting and complexity of “Feitelson adapted” by removing the RunTime-Parallelism correlated model • Validated with 7 grid workloads A. Iosup, O. Sonmez, S. Anoep, and D.H.J. Epema. The Performance of Bags-of-Tasks in Large-Scale Distributed Systems, HPDC, pp. 97-108, 2008.
Grid WorkloadsLoad Imbalance Across Sites and Grids • Overall workload imbalance: normalized daily load (5:1) • Temporary workload imbalance: hourly load (1000:1) Overall imbalance Temporary imbalance
Was it the Workload? • No • Similar workload characteristics across grids • High utilization possible due to single-node jobs • High load imbalance • Good data and models to support system designers • Yes • Too many tasks (system limitation) • Poor online information about job characteristics +High variability of job resource requirements • How to schedule BoTs, WFs, mixtures in grids?
Agenda • Introduction • Was it the System? • Was it the Workload? • Was it the System Designer? • Was it Another Hype? • Suggestions for Collaboration • Conclusion
Problems in Grid Scheduling and Resource Management The System Grid schedulers do not own resources themselves They have to negotiate with autonomous local schedulers Authentication/multi-organizational issues Grid schedulers interface to local schedulers Some may have support for reservations, others are queuing-based Grid resources are heterogeneous and dynamic Hardware (processor architecture, disk space, network) Basic software (OS, libraries) Grid software (middleware) Resources may fail Lack of complete and accurate resource information
Problems in Grid Scheduling and Resource Management The Workloads Workloads may be heterogeneous and dynamic Grid schedulers may not have control over the full workload (multiple submission points) Jobs may have performance requirements Lack of complete and accurate job information Application structure may be heterogeneous Single sequential job Bags of Tasks; parameter sweeps (Monte Carlo), pilot jobs Workflows, pipelines, chains-of-tasks Parallel jobs (MPI); malleable, coallocated
The Koala Grid Scheduler Developed in the DAS system Has been deployed on the DAS-2 in September 2005 Ported to DAS-3 in April 2007 Independent from grid middlewares such as Globus Runs on top of local schedulers Objectives: Data and processor co-allocation in grids Supporting different application types Specialized application-oriented scheduling policies Koala homepage: http://www.st.ewi.tudelft.nl/koala/
Koala in a Nutshell A bridge between theory and practice • Parallel Applications • MPI, Ibis,… • Co-Allocation • Malleability • Parameter Sweep Applications • Cycle Scavenging • Run as low-priority jobs • Workflows
How To Compare Existing and New Grid Systems?The Delft Grid Simulator (DGSim) Generate realistic workloads Automate simulation process (10,000s of tasks) Discrete event generator DGSim …tudelft.nl/~iosup/dgsim.php
OAR Condor Koala Globus GRAM Alien Independent Centralized Condor Flocking OAR2 OurGrid Moab/Torque NWIRE CCS Decentralized Hierarchical How to Inter-Operate Grids?Existing (, Working?) Alternatives Load imbalance? Resource selection? Scale? Root ownership? Node failures? Accounting? Trust? Scale?
2 3 3 3 3 3 3 Inter-Operating Grids Through Delegated MatchMaking [1/3]The Delegated MatchMaking Architecture • Start from a hierarchical architecture • Let roots exchange load • Let siblings exchange load Delegated MatchMaking Architecture=Hybrid hierarchical/decentralized architecture for grid inter-operation
Resource request Local load too high Bind remote resource Delegate Resource usage rights Inter-Operating Grids Through Delegated MatchMaking [3/3] The Delegated MatchMaking Mechanism • Deal with local load locally (if possible) • When local load is too high, temporarily bind resources from remote sites to the local environment. • May build delegation chains. • Delegate resource usage rights, do not migrate jobs. • Deal with delegations each delegation cycle (delegated matchmaking) The Delegated MatchMaking Mechanism=Delegate Resource Usage Rights, Do Not Delegate Jobs
DMM Decentralized Centralized Independent What is the Potential Gain of Grid Inter-Operation?Delegated MatchMaking vs. Alternatives (Higher is better) • DMM • High goodput • Low wait time • Finishes all jobs • Even better for load imbalance between grids • Reasonable overhead • [see thesis] Grid Inter-Operation (through DMM)delivers good performance
4.2. Studies on Grid Scheduling [5/5]Scheduling under Cycle Stealing monitors/informs idle/demanded resources • CS Policies: • Equi-All: • grid-wide basis • Equi-PerSite: • per cluster Head Node Scheduler KCM Node • Requirements • Unobtrusiveness Minimal delay for (higher priority) local and grid jobs • Fairness 3. Dynamic Resource Allocation 4. Efficiency 5. Robustness and Fault Tolerance grow/shrink messages submits launchers registers Launcher submits PSA(s) CS-Runner Launcher deploys, monitors, and preempts tasks JDF Clusters • Application Level Scheduling: • Pull-based approach • Shrinkage policy Deployed as Koala Runner O. Sonmez, B. Grundeken, H. Mohamed, A. Iosup, D. Epema: Scheduling Strategies for Cycle Scavenging in Multicluster Grid Systems. CCGRID 2009: 12-19
Was it the System Designer? • No • Mechanisms to inter-operate grids: DMM [SC|07], … • Mechanisms to run many grid application types: WFs, BoTs, parameter sweeps, cycle scavenging, … • Scheduling algorithms with inaccurate information [HPDC ‘08, ‘09, ‘10] • Tools for empirical and trace-based experimentation • Yes • Still too many tasks • What about new application types?
Agenda • Introduction • Was it the System? • Was it the Workload? • Was it the System Designer? • New Application Types • Suggestions for Collaboration • Conclusion
Sources: MMOGChart, own research. Sources: ESA, MPAA, RIAA. MSGs are a Popular, Growing Market • 25,000,000 subscribed players (from 150,000,000+ active) • Over 10,000 MSGs in operation • Market size 7,500,000,000$/year
Romeo and Juliet Massively Social Gaming as New Grid/Cloud Application Massively Social Gaming (online) games with massive numbers of players (100K+), for which social interaction helps the gaming experience [SC|08, TPDS’10] • Virtual worldExplore, do, learn, socialize, compete+ • ContentGraphics, maps, puzzles, quests, culture+ • Game analyticsPlayer stats and relationships [EuroPar09 BPAward, CPE10] [ROIA09]
Suggestions for Collaboration • Scheduling mixtures of grid/HPC/cloud workloads • Scheduling and resource management in practice • Modeling aspects of cloud infrastructure and workloads • Condor on top of Mesos • Massively Social Gaming and Mesos • Step 1: Game analytics and social network analysis in Mesos • The Grid Research Toolbox • Using and sharing traces: The Grid Workloads Archive and The Failure Trace Archive • GrenchMark: testing large-scale distributed systems • DGSim: simulating multi-cluster grids
Thank you! Questions? Observations? Alex Iosup, Ozan Sonmez, Nezih Yigitbasi, Hashim Mohamed, Dick Epema email: A.Iosup@tudelft.nl • More Information: • The Koala Grid Scheduler: www.st.ewi.tudelft.nl/koala • The Grid Workloads Archive: gwa.ewi.tudelft.nl • The Failure Trace Archive: fta.inria.fr • The DGSim simulator: www.pds.ewi.tudelft.nl/~iosup/dgsim.php • The GrenchMark perf. eval. tool: grenchmark.st.ewi.tudelft.nl • Cloud research: www.st.ewi.tudelft.nl/~iosup/research_cloud.html • Gaming research: www.st.ewi.tudelft.nl/~iosup/research_gaming.html • see PDS publication database at: www.pds.twi.tudelft.nl/ DGSim Big thanks to our collaborators: U. Wisc.-Madison, U Chicago, U Dortmund, U Innsbruck, LRI/INRIA Paris, INRIA Grenoble, U Leiden, Politehnica University of Bucharest, Technion, …