DIET Dashboard -- Motivations

• DIET hierarchies are designed to be deployed on grids and clusters of nodes
• Users need several complex tools to manage resources and client/server applications
• A distributed middleware deployment is not easily manageable:
  • If you deal with a large number of nodes
  • If you manage your resource reservations by hand
  • If you need to write each configuration file of your middleware by hand
  • If you need to launch each component of your middleware by hand

DIET Experiment workflow
[Diagram boxes: DIET Platform Design, Resources Mapping, Resources Reservation, DIET Platform Deployment, Experiment Results Retrieval, DIET Platform Generation, Workflow Design, Workflow Execution]

DIET Dashboard
• Extensible set of tools for the DIET community
• Based on seven tools (grouped on the slide as DIET Tools, Workflow Tools, and Grid Tools):
  • DIET designer
  • DIET Mapping tool
  • DIET Deployment tool
  • XML GoDIET Generator
  • Workflow designer
  • Workflow log service
  • DIET resource tool, aka GRUDU
• The DIET DashBoard is written in Java
• Provides DIET end users with user-friendly interfaces to design, deploy, and monitor the execution of client/server applications
• Also provides grid users with tools for allocating and monitoring resources on Grid'5000

DIET Resources Tool
• Manages the grid resources used by the application
• Currently used only for the Grid'5000 platform; provides several operations to facilitate access to this platform
• Main goals:
  • Displaying the status of the platform (grid/site/job level)
  • Resource allocation through OAR (v1 and v2 are supported)
  • Resource monitoring through Ganglia (site/job nodes)
  • Deployment management with a GUI for KaDeploy (multiple sites at a time)
  • A terminal emulator (access frontend/site frontend/job main node connection)
  • A file transfer manager (local/remote and synchronization features)

Grid'5000 Reservation Utility for Deployment Usage
• Web: http://grudu.gforge.inria.fr

GRUDU – Resources Allocation
• Resources can be reserved through OAR1 and OAR2
• Time parameters: date and reservation walltime
• Queue
• OARGrid sub behaviour / Script to launch

GRUDU – Monitoring
• Monitor the status of the grid, a site, or a job
• Get instantaneous or historical data with Ganglia

GRUDU - KaDeploy/JFTP
• GUI for KaDeploy jobs deployment
• File transfer interface (local<->remote, rsync on Grid'5000)

DIET Designer/Mapping
• Allows the user to design a DIET hierarchy graphically
• Only the application characteristics are defined (agent type: Master or Local, and SeD parameters)
• Allows the user to map DIET components onto the allocated Grid'5000 resources
• The mapping is done interactively by selecting the site, then the DIET agents or SeDs

XML GoDIET Generator
• Helps the end user create hierarchies from existing frameworks, based on the reserved resources
• The user is asked to choose an experiment (a hierarchy framework) from those available (personal hierarchies can be added)
• For each hierarchy, the user specifies the required elements (MA/LA/SeD)
• Finally, a platform is generated and the user can deploy it through the DIET deployment tool

DIET Deployment Tool
• This tool is a graphical interface to GoDIET
• It provides the basic GoDIET operations (open, launch, stop) and a monitoring mechanism to check whether DIET application elements are still alive (three states are available: unknown, dead, and running)

Workflow Designer/Log Service
• Compose services into a complete application workflow by drag and drop
• Monitor workflow execution by displaying the DAG nodes of each workflow and their states

Monitoring DIET experiment
• Online/Offline experiment monitoring
• DIET Data Management monitoring
• DIET Services use/selection/etc. monitoring
• DIET Platform performance evaluation

Prototype Cosmo – DIET : Gantt

Prototype Cosmo – DIET : impact DIET

Large scale experiment: the DIET/Ramses case (Grid’5000)
• Validation of the DIET architecture at large scale over different administrative domains, in the framework of the LEGO project (ANR CICG05-11)
• Goal: launch as many Ramses executions as possible (Ramses is a grid-based hydro solver application developed at DAPNIA/CEA for cosmological simulations)
• Stress DIET over a large number of machines and over a long period of time
• But also stress Grid'5000 ...
• KaDeploy image with DIET and all the mandatory tools
• 12 clusters on 7 sites: 979 machines for 48 hours
• 1 MA, 12 LA, 29 SeDs
• 1824 processors dedicated to Ramses

Large scale experiment on Grid’5000
• Requests submitted via DIET
• 1824 processors dedicated to Ramses
• 59 simulations (33 complete, 26 partial)
• Equivalent to 368 days on 1 processor
• GalaxyMaker & MoMaF:
  • Web interface for submission of parameter sweep jobs
  • Workload modeling for scheduling predictions
  • Workflow / data management

Ongoing Work
• Deploy DIET across many sites
• Improve data management
• Write a plug-in scheduler

Workflow

GalaxyMaker execution time model

GalaxyMaker output size model

MoMaF execution time model

Large scale experiment: the DIET/Ramses case
• Use of the DIET DashBoard:
  • 20 seconds for the reservation of 979 nodes
  • 25 minutes for the deployment with KaDeploy
  • 23 seconds for the deployment of the DIET platform
• Main difficulties:
  • Disk space on NFS storage
  • OmniORB not available on Itanium2
  • Sites not available for deployment

Conclusion
• DIET is a grid middleware designed for scheduling application tasks with a hierarchical architecture
• The DIET DashBoard provides DIET users with:
  • A full-featured framework for experiments
  • An easy way to manage Grid'5000
• The DIET Resources Tool provides the Grid'5000 community with a powerful tool dedicated to interaction with the grid:
  • Monitoring
  • Reservation
  • Deployment
  • etc.
• The DIET Resources Tool exists in a standalone version known as GRUDU, dedicated to the Grid'5000 community

Future Work
• Web-based version of the DIET DashBoard
  • Used on the Decrypthon project: WebBoard
• GUI for client/server applications design
• DIET Data Management interface
• Support of other batch schedulers (such as LoadLeveler or SGE)
• Plugin-based architecture

Introduction - Context
• Climate evolution
• Global warming effect
• Two problems:
  • Long-term evolution (needs a supercomputer)
  • Climate model parametrization (needs numerous simulations)

Introduction - Motivations
• The project aims to study the parametrization sensitivity of a climate model
• A better understanding of parametrization will provide better simulations
• Once good parameters have been found, we will be able to simulate the climate further into the future
• Need to perform numerous independent simulations
• The focus of this talk is the minimization of the execution time of these independent simulations

Outline
• Introduction
• Framework
  • Ocean-Atmosphere Application
  • Grid’5000
  • Diet
• Scheduling Strategies
• Experimental Results
• Conclusion & Future Work

Ocean-Atmosphere scenarios
[Figure: a scenario, drawn as a chain Month 1, Month 2, ..., Month 1799, Month 1800]
• Climate simulation over the 21st century
• An experiment is composed of several scenarios
• A scenario is a chain of 1800 monthly simulations (150 years)
• The input of the (n+1)th monthly simulation is the output of the nth one
• The scenarios are independent

Ocean-Atmosphere running
[Figure: a monthly simulation. Labels: Main-task, Post-processing task, Parallel task (4 to 11 processors). Values shown on the figure: 1, 60, >1200, 60, 60, 1]

Software environment
E. Caron - Ocean-Atmosphere scheduling within DIET - APDCT-08
• GridRPC compliant for interoperability
• Client/Agent/Server paradigm
• Middleware with a hierarchical architecture designed to provide scalability
• Resource finding for the client
• Plug-in scheduler with hierarchical behavior
• Data management with replication
• Easy to deploy
• Easy to use

Platform environment: Grid’5000
• Congregation of resources
• Composed of numerous clusters distributed over 9 sites all over France
• All nodes of a cluster have access to an NFS to store data
• Users can deploy their own system image on nodes
• Well suited to execute our independent scenarios

Outline
• Introduction
• Framework
• Scheduling Strategies
  • Cluster Level Scheduling
  • Grid Level Scheduling
• Experimental Results
• Conclusion & Future Work

Scheduling Strategies
• We use Grid’5000 as an experiment platform
  • The platform is composed of several heterogeneous clusters
  • Each cluster is homogeneous internally
• We use Diet to perform the scheduling:
  • Send request
  • Performance prediction (makespan)
  • Distribution of scenarios
  • Computation
  • Experiment end
[Figure: Client, Diet hierarchy, Cluster 1, Cluster 2, Cluster 3]

Cluster Level Scheduling (1/5)
• We consider a homogeneous platform composed of R resources (processors)
• We have NS scenarios
• Execution times take into account the time to get the data, perform the computation, and store the results
• T[i] is the time needed to execute a main-task on i processors
• All post-processing tasks are left to the end of the execution because main-tasks have a good speedup
• If there are too many resources, the post-processing tasks will be executed at the same time

Cluster Level Scheduling (2/5)
• Clusters are heterogeneous
• T[i] on 5 clusters of Grid’5000

Cluster Level Scheduling (3/5)
• We need to find the grouping of processors leading to the best makespan
• Find n_i (the number of groups with i resources) such that:
  • The portion of code executed at each time step is maximized
  • We have no more than NS groups and use no more than R resources
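The grouping problem on this slide can be sketched as a small dynamic program. This is only an illustration under stated assumptions: the slide's formula is not in the transcript, so the objective used here, maximizing the sum of n_i / T[i] (main-tasks completed per unit of time), is an assumed reading of "the portion of code executed at each time step". The function name best_grouping and the sample T values are hypothetical.

```python
from typing import Dict, Tuple


def best_grouping(T: Dict[int, float], R: int, NS: int) -> Tuple[float, Dict[int, int]]:
    """Return (throughput, {group size: number of groups}).

    Assumed objective: maximize sum(n_i / T[i]) subject to
    sum(n_i) <= NS and sum(i * n_i) <= R.
    """
    # best[g][r]: best result using at most g groups and at most r processors.
    best = [[(0.0, {}) for _ in range(R + 1)] for _ in range(NS + 1)]
    for g in range(1, NS + 1):
        for r in range(R + 1):
            candidate = best[g - 1][r]  # option: do not create a g-th group
            for size, t in T.items():
                if size <= r:
                    prev_value, prev_groups = best[g - 1][r - size]
                    value = prev_value + 1.0 / t
                    if value > candidate[0]:
                        groups = dict(prev_groups)  # copy before modifying
                        groups[size] = groups.get(size, 0) + 1
                        candidate = (value, groups)
            best[g][r] = candidate
    return best[NS][R]


# Hypothetical timings: a main-task takes 10 time units on 4 processors
# and 6 time units on 8 processors.
throughput, groups = best_grouping({4: 10.0, 8: 6.0}, R=8, NS=2)
print(groups)
```

With these made-up timings, two groups of 4 processors complete more main-tasks per unit of time than a single group of 8, which is the kind of trade-off the grouping is meant to capture.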

Cluster Level Scheduling (4/5)
• Example of grouping: 3 groups (4, 4 and 7 processors)
• Fairness among scenarios: when a group becomes idle, the next task of the least advanced scenario is scheduled
[Figure: Cluster c; axes: Resources (processors) and Time; Scenario 1, Scenario 2, Scenario 3, Scenario 4]
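The fairness rule can be illustrated with a short event-driven simulation. This is a sketch under assumptions, not the authors' implementation: it ignores post-processing tasks, it hands work to the fastest idle group first (the slide does not say how ties between idle groups are broken), and the name simulate_cluster and the timing values are hypothetical.

```python
import heapq
from typing import Dict, List


def simulate_cluster(group_sizes: List[int], T: Dict[int, float],
                     n_scenarios: int, n_months: int) -> float:
    """Makespan of the main-tasks of n_scenarios chains of n_months tasks."""
    done = [0] * n_scenarios           # months completed per scenario
    running = [False] * n_scenarios    # a chain runs one month at a time
    idle = list(range(len(group_sizes)))
    events = []                        # heap of (finish time, group, scenario)
    now = 0.0
    while True:
        # Give work to idle groups, fastest group first (assumption).
        idle.sort(key=lambda g: T[group_sizes[g]])
        for g in list(idle):
            ready = [s for s in range(n_scenarios)
                     if not running[s] and done[s] < n_months]
            if not ready:
                break
            s = min(ready, key=lambda k: done[k])  # least advanced scenario
            running[s] = True
            idle.remove(g)
            heapq.heappush(events, (now + T[group_sizes[g]], g, s))
        if not events:
            return now                 # nothing running, nothing left to start
        now, g, s = heapq.heappop(events)
        done[s] += 1
        running[s] = False
        idle.append(g)


# Hypothetical timings for the grouping of the slide (4, 4 and 7 processors).
T = {4: 10.0, 7: 7.0}
print(simulate_cluster([4, 4, 7], T, n_scenarios=4, n_months=12))
```

Because each scenario is a chain, a scenario never runs on two groups at once; the simulation only picks among scenarios whose previous month has finished.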

Cluster Level Scheduling (5/5)
• Every resource is taken into account
• Makespan is strictly decreasing when adding more resources
• The decrease rate of the makespan diminishes

Grid Level Scheduling (1/2)
• Aim: reduce the makespan by distributing the NS scenarios among nbClusters clusters
• When performance prediction is performed, the makespans for 1 to NS scenarios on cluster C are sent to the client (performance[C])
• Algorithm complexity: O(NS × nbClusters)
• One experiment: NS = 10 and nbClusters is small on Grid’5000 (≈20)

makespan = 0
initialize number of scenarios on each cluster to 0
while there are scenarios to schedule do
    find cluster C where makespan increases the least
    increment NS_C, the number of scenarios on C
    update makespan with performance[C][NS_C]
endwhile
send scenarios to SeDs
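The greedy loop above can be written as a short runnable sketch. Some details are assumptions: performance[c][k - 1] is taken to hold the predicted makespan of k scenarios on cluster c, the overall makespan is taken to be the maximum over clusters, and "increases the least" is implemented by picking the cluster with the smallest makespan after receiving one more scenario. The function name and the sample predictions are hypothetical, and the final step of sending scenarios to the SeDs is left out.

```python
from typing import Dict, List, Tuple


def distribute_scenarios(performance: Dict[str, List[float]],
                         NS: int) -> Tuple[Dict[str, int], float]:
    """Greedy distribution of NS scenarios over clusters.

    performance[c][k - 1] is the predicted makespan of k scenarios on
    cluster c, so every list must hold at least NS values.
    """
    assigned = {c: 0 for c in performance}   # scenarios per cluster
    makespan = 0.0
    for _ in range(NS):                      # NS iterations ...
        # ... each scanning every cluster once: O(NS x nbClusters).
        best = min(performance, key=lambda c: performance[c][assigned[c]])
        assigned[best] += 1
        makespan = max(makespan, performance[best][assigned[best] - 1])
    return assigned, makespan


# Hypothetical predictions for two clusters and three scenarios.
predictions = {"fast": [10.0, 20.0, 30.0], "slow": [25.0, 50.0, 75.0]}
print(distribute_scenarios(predictions, NS=3))
# prints ({'fast': 2, 'slow': 1}, 25.0)
```

In this example a round-robin split that placed two scenarios on the slow cluster would finish at 50.0, while the greedy choice finishes at 25.0.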

Grid Level Scheduling (2/2)
• Comparison with Round Robin on 5 clusters
• Maximum speedup (25%): equal to the speedup between executing one main-task on the slowest cluster and on the fastest cluster
• With a higher load, the algorithm behaves better with few resources
• Convergence on gains
• Gain of 25% ≈ 230h on a ≈ 822h long experiment

Outline
• Introduction
• Framework
• Scheduling Strategies
• Experimental Results
• Conclusion & Future Work

Experimental Results (1/2)
• Because of technical limitations, no more than one scenario can be executed on a single node
• All nodes on Grid’5000 are dual-core or quad-core
• New constraint: the size of a group has to be divisible by the number of cores per node of the cluster
• Possibility to make groups of 12 processors to reduce loss
• Loss due to this technical difficulty:
  • Few resources: loss between 1% and 13%
  • More resources: loss between 1% and 5%
  • Lots of resources: no more loss
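The divisibility constraint can be made concrete with a one-line filter. This is a sketch: the range 4 to 12 combines the 4 to 11 processors of a parallel task with the groups of 12 mentioned above, and the function name is hypothetical.

```python
from typing import List


def allowed_group_sizes(cores_per_node: int,
                        min_size: int = 4, max_size: int = 12) -> List[int]:
    """Group sizes that fill whole nodes, so no node hosts two scenarios."""
    return [size for size in range(min_size, max_size + 1)
            if size % cores_per_node == 0]


print(allowed_group_sizes(2))  # dual-core nodes
print(allowed_group_sizes(4))  # quad-core nodes
```

On quad-core nodes only groups of 4, 8 and 12 processors remain, which shows why allowing groups of 12 helps: without them, 8 would be the largest usable size.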

Experimental Results (2/2)
• Accuracy of simulations on 7 experiments:
  • Bad with all post-processing tasks at the end (20.8% difference)
  • Good if we consider only main-tasks (6.3% difference)
• Keeping a resource to execute post-processing tasks during the experiment removes the simulation inaccuracy
• A positive difference means the real execution was slower than expected

Outline
• Introduction
• Framework
• Scheduling Strategies
• Experimental Results
• Conclusion & Future Work

Conclusion
• Improve performance in a climate prediction application
  • Modeling of the application
  • Proof of usage of Grid’5000 and Diet
  • Scheduling on a real application
• Scheduling done at two levels
  • Groups of processors at cluster level
  • Distribution of scenarios at grid level
• Real implementation suffered from technical limitations
• Simulations are quite precise, but we need to keep one resource for post-processing tasks

Future Work
• Extension of this work to generic independent chains of DAGs composed of moldable tasks
• Resource reservation is done manually, so we want to use tools such as SimGrid/SimBatch to determine how many resources to reserve and then use the SeDBatch to make the reservation automatically
