Andreea Chis under the guidance of Frédéric Desprez and Eddy Caron

Scheduling for a Climate Forecast Application • Andreea Chis, under the guidance of Frédéric Desprez and Eddy Caron • ANR-05-CIGC-11

Contents • 1 Introduction • 2 Related Works • 3 Scheduling Heuristics • 4 Simulation Results • 5 Conclusions and Future Works

General Purpose • Context: global warming and climate fluctuations • Numerical simulations using general circulation models of the climate system: • atmosphere • ocean • continental surfaces • Climatologists’ goal • estimate the sensitivity of global warming simulations to the model’s parameterization • Climate forecast application provided by CERFACS within the LEGO project

Our Goal • Analyze the application • Model its needs • execution model • data access pattern • computing needs • Design, test and compare appropriate scheduling heuristics • Provide generic scheduling schemes for applications with similar dependence graphs

Application Description • “Scenario” simulations • the current climate followed by the 21st century, over 150 years (1800 months) • different parameterizations of the atmospheric model

Application Description • One monthly simulation: concatenate_atmospheric_input_files(1), modify_parameters(1), then process_coupled_run, which runs • the atmospheric model (ARPEGE) • the ocean and sea-ice model (OPA) • the runoff pathway model (TRIP) • the coupler (OASIS), followed by convert_output_format(60), compress_diagonals(30), extract_minimun_information(30)

Application Description [figure]

Related Works • Multiple DAGs Scheduling • Mixed Parallelism • Pipelined Data-Parallel Tasks

Multiple DAGs Scheduling • Directed Acyclic Graph (DAG) • Nodes – tasks • Edges – precedence constraints • Multiple DAGs Scheduling

Multiple DAGs Scheduling • Composite DAG

Multiple DAGs Scheduling • Group the DAGs’ tasks into levels of independent tasks
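The grouping of tasks into levels of independent tasks can be sketched as follows. This is a minimal Python illustration, assuming each DAG is stored as a dict mapping a task to its successors and assuming that a task's level is determined by its longest path from an entry task (one common definition); the function name `levels` and the task names are hypothetical.

```python
from collections import defaultdict


def levels(dag: dict[str, list[str]]) -> list[list[str]]:
    """Group the tasks of one or more DAGs into levels of independent tasks.

    `dag` maps every task to the list of its successors (several disjoint
    DAGs can simply be stored in the same dict). A task lands on the level
    right after its last predecessor, so no two tasks on a level depend on
    each other.
    """
    indegree: dict[str, int] = defaultdict(int)
    nodes = set(dag)
    for succs in dag.values():
        for v in succs:
            indegree[v] += 1
            nodes.add(v)

    current = sorted(n for n in nodes if indegree[n] == 0)
    result = []
    while current:
        result.append(current)
        released = []
        for u in current:
            for v in dag.get(u, []):
                indegree[v] -= 1
                if indegree[v] == 0:  # all predecessors are on earlier levels
                    released.append(v)
        current = sorted(released)
    return result


# Two independent three-task chains (hypothetical task names).
print(levels({"pre1": ["main1"], "main1": ["post1"],
              "pre2": ["main2"], "main2": ["post2"]}))
# [['pre1', 'pre2'], ['main1', 'main2'], ['post1', 'post2']]
```

Each level can then be scheduled as a batch of independent tasks.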

Related Works – Multiple DAGs Scheduling • Composite DAG with a round-robin scheduling policy among DAGs • Composite DAG with ranking-based composition
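A round-robin policy among DAGs can be illustrated by interleaving per-DAG priority lists. This minimal sketch assumes every DAG already comes with a topological order of its tasks; `round_robin_order` is a hypothetical helper, and a real scheduler would also track resource availability when it maps the merged list.

```python
from itertools import zip_longest


def round_robin_order(dag_orders: list[list[str]]) -> list[str]:
    """Merge per-DAG task lists by taking one task from each DAG in turn.

    Each inner list is assumed to be a topological order of its own DAG, so
    the merged list still respects every precedence constraint and can be
    fed to a list scheduler as a priority order.
    """
    merged = []
    for batch in zip_longest(*dag_orders):  # pads shorter DAGs with None
        merged.extend(task for task in batch if task is not None)
    return merged


print(round_robin_order([["a1", "a2", "a3"], ["b1", "b2"]]))
# ['a1', 'b1', 'a2', 'b2', 'a3']
```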

Mixed Parallelism • Parallel scientific applications exhibit • data parallelism • task parallelism • mixed parallelism • Scheduling a DAG on a finite number of resources is NP-complete, even in the simple case of mono-processor tasks • Heuristic approaches

Mixed Parallelism • A. Radulescu & A. Gemund (2001): two-step heuristic CPA (Critical Path and Area-based Scheduling) • allocation of processors to tasks, based on a compromise between the critical path length and processor utilization • mapping of tasks onto processors with a list-scheduling heuristic
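The processor-allocation step of CPA can be sketched in Python as below. This is a simplified, hypothetical version, not the original authors' code: task times are assumed to follow Amdahl's law T(p) = T1·(α + (1 − α)/p), the task dict is assumed to be listed in topological order, communications are ignored, and the task chosen on the critical path is the one whose per-processor time T(p)/p drops the most when one processor is added. The second step (list scheduling) is omitted.

```python
def amdahl(t1, alpha, p):
    """Execution time on p processors; alpha is the sequential fraction."""
    return t1 * (alpha + (1 - alpha) / p)


def gain(t1, alpha, p):
    """Drop in T(p)/p when a task goes from p to p + 1 processors."""
    return amdahl(t1, alpha, p) / p - amdahl(t1, alpha, p + 1) / (p + 1)


def critical_path(tasks, succs, alloc):
    """Return (length, tasks) of the longest path; `tasks` is in topological order."""
    preds = {t: [] for t in tasks}
    for u, vs in succs.items():
        for v in vs:
            preds[v].append(u)
    dist, parent = {}, {}
    for t in tasks:
        parent[t] = max(preds[t], key=dist.get, default=None)
        start = dist[parent[t]] if parent[t] is not None else 0.0
        dist[t] = start + amdahl(*tasks[t], alloc[t])
    node = max(dist, key=dist.get)
    length, path = dist[node], []
    while node is not None:
        path.append(node)
        node = parent[node]
    return length, path[::-1]


def cpa_allocation(tasks, succs, R):
    """Simplified CPA-like allocation phase.

    tasks: name -> (T1, alpha), listed in topological order
    succs: name -> list of successor names
    Adds processors to critical-path tasks until the critical path length
    T_CP no longer exceeds the average area T_A = sum(T(t, p_t) * p_t) / R.
    """
    alloc = {t: 1 for t in tasks}
    while True:
        t_cp, path = critical_path(tasks, succs, alloc)
        t_a = sum(amdahl(*tasks[t], alloc[t]) * alloc[t] for t in tasks) / R
        candidates = [t for t in path if alloc[t] < R]
        if t_cp <= t_a or not candidates:
            return alloc
        best = max(candidates, key=lambda t: gain(*tasks[t], alloc[t]))
        alloc[best] += 1


chain = {"pre": (50, 0.1), "main": (500, 0.1), "post": (50, 0.1)}
print(cpa_allocation(chain, {"pre": ["main"], "main": ["post"]}, R=8))
```

The loop always terminates, since every iteration adds one processor to a task that has fewer than R.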

Pipelined Data-Parallel Tasks • Computations consisting of a chain of data-parallel tasks that process successive data sets in a pipelined fashion: a particular case of mixed parallelism • 2 key metrics to optimize: • latency: the time needed to process one data set • throughput: the rate at which data sets are processed

Related Works – Pipelined Data-Parallel Tasks • Aspects to consider: • clustering successive stages into modules • reduces communications • improves latency • replicating modules • improves throughput • increases latency
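The latency/throughput trade-off of replication can be checked numerically. This is a minimal sketch, assuming a module whose time on p processors follows Amdahl's law T(p) = T1·(α + (1 − α)/p), a fixed budget of processors split evenly among the replicas, and data sets dealt to the replicas in turn; `replicated_module` and all numbers are hypothetical.

```python
def amdahl(t1, alpha, p):
    """Execution time on p processors; alpha is the sequential fraction."""
    return t1 * (alpha + (1 - alpha) / p)


def replicated_module(t1, alpha, procs, replicas):
    """Latency and throughput of a module replicated `replicas` times.

    The `procs` processors are split evenly among the replicas, and
    successive data sets go to the replicas in turn, so the replicas
    process different data sets at the same time.
    """
    if replicas < 1 or procs % replicas:
        raise ValueError("procs must be a positive multiple of replicas")
    latency = amdahl(t1, alpha, procs // replicas)  # time for one data set
    throughput = replicas / latency                 # data sets per time unit
    return latency, throughput


for r in (1, 2, 4):
    print(r, replicated_module(500, 0.1, 8, r))
```

With T1 = 500, α = 0.1 and 8 processors, going from 1 to 4 replicas raises the latency from 106.25 to 275 time units while the throughput grows from about 0.0094 to about 0.0145 data sets per time unit.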

Scheduling Heuristics • Climate Application Scheduling • Generic Scheduling Heuristics

Climate Application Scheduling • Homogeneous platform of R resources • Communications assumed contention-free, through NFS • Task execution times are assumed to include the time needed to • access the data • redistribute it to the processors • perform the actual computation • store the data back

Climate Application Scheduling • Main processing: concatenate_atmospheric_input_files(1), modify_parameters(1), process_coupled_run • Post-processing: convert_output_format(60), compress_diagonals(30), extract_minimun_information(30)

Climate Application Scheduling • We divide the processors into disjoint groups on which multi-processor tasks execute • All multi-processor tasks execute on the same number of resources G, which defines a grouping of resources • For this application, G can take 8 values (4 → 11)

Climate Application Scheduling • Case 1 • Case 2

Climate Application Scheduling • The makespan is computed analytically as a function of • the number of resources R • the grouping G • the number of months in an independent simulation (NM) • the number of independent simulations (NS) • The grouping G yielding the smallest makespan is chosen
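The choice of G can be sketched as an exhaustive search over the 8 candidate groupings. The makespan formula below is a hypothetical, simplified model, not the one derived for the CERFACS application: one resource is reserved for post-processing, months of different simulations are assumed to run on any group, and the task times T[G] and the post-processing time are invented. With these invented timings the search picks G = 5, not the G = 7 found for the real application.

```python
from math import ceil


def amdahl(t1: float, alpha: float, p: int) -> float:
    """Execution time on p processors; alpha is the sequential fraction."""
    return t1 * (alpha + (1 - alpha) / p)


def makespan(R: int, G: int, NM: int, NS: int, T: dict, t_post: float) -> float:
    """Hypothetical makespan model for groups of G resources.

    One resource is kept for post-processing, the other R - 1 form
    (R - 1) // G groups. The main phase is modeled as
    max(NM, ceil(NS * NM / groups)) rounds of length T[G], and the last
    post-processing step is added at the end.
    """
    groups = (R - 1) // G
    if groups == 0:
        return float("inf")
    rounds = max(NM, ceil(NS * NM / groups))
    return rounds * T[G] + t_post


def best_grouping(R: int, NM: int, NS: int, T: dict, t_post: float) -> int:
    """Return the grouping G (a key of T) with the smallest modeled makespan."""
    return min(T, key=lambda G: makespan(R, G, NM, NS, T, t_post))


# Invented timings: one main-processing task modeled with Amdahl's law.
T = {G: amdahl(4000, 0.1, G) for G in range(4, 12)}
print(best_grouping(R=53, NM=1800, NS=10, T=T, t_post=120))  # 5
```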

Climate Application Scheduling • Constraining all multi-processor tasks to the same number of resources is restrictive • E.g. R = 53, NS = 10, NM = 1800: • optimal grouping found: G = 7 • 49 resources for main processing (7 groups of 7) • 1 resource for the corresponding post-processing • 3 resources unused • using 3 groups of 8 resources and 4 groups of 7 resources instead yields a 4.5% gain

Climate Application Scheduling • Possible improvements: • Heuristic 1 • distribute the unused resources evenly among the existing groups • Heuristic 2 • use all resources for multi-processor tasks (distributing the extra resources evenly among the processor groups) • perform all post-processing at the end • Heuristic 3 • use all resources for multi-processor tasks and model the problem as an instance of the knapsack problem • perform all post-processing at the end
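Heuristics 1 and 2 both spread spare resources evenly over the groups; a minimal sketch of that step follows. `spread_evenly` is a hypothetical helper; the upper bound of 11 resources per group comes from the 4 → 11 range of groupings, and reserving one resource for post-processing in Heuristic 1 follows the R = 53 example.

```python
def spread_evenly(resources: int, n_groups: int, max_size: int = 11) -> list[int]:
    """Split `resources` into `n_groups` groups whose sizes differ by at most 1.

    Raises ValueError if a group would exceed `max_size`, the largest
    grouping considered for this application (G = 11).
    """
    base, extra = divmod(resources, n_groups)
    sizes = [base + 1] * extra + [base] * (n_groups - extra)
    if sizes[0] > max_size:
        raise ValueError("a group would exceed the largest grouping")
    return sizes


# R = 53, G = 7: seven groups, one resource kept for post-processing.
print(spread_evenly(53 - 1, 7))  # Heuristic 1 -> [8, 8, 8, 7, 7, 7, 7]
print(spread_evenly(53, 7))      # Heuristic 2 -> [8, 8, 8, 8, 7, 7, 7]
```

For Heuristic 1 this reproduces the 3 groups of 8 and 4 groups of 7 of the R = 53 example.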

Climate Application Scheduling • Knapsack problem model • items: the 8 possible groupings of resources for multi-processor tasks (4 → 11) • cost of an item: the number of resources in that grouping • value of a grouping G: 1/T[G], the fraction of a multi-processor task executed per time unit on G resources • unknowns $n_i$ ($i = 4 \to 11$): the number of groups with i resources in the final solution • constraint: $\sum_{i=4}^{11} i \, n_i \le R$, with $n_i \ge 0$ integer • goal: maximize $\sum_{i=4}^{11} n_i / T[i]$
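The knapsack formulation of Heuristic 3 can be solved with a textbook dynamic program for the unbounded knapsack problem. This sketch assumes the model maximizes the sum of n_i/T[i] subject to the sum of i·n_i not exceeding R, with non-negative integer n_i; the function name `knapsack_groups` and the T[i] values in the example are hypothetical.

```python
def knapsack_groups(R: int, T: dict) -> tuple[float, dict]:
    """Unbounded knapsack: how many groups n_i of each size i to form.

    T maps a grouping size i to the time T[i] of one multi-processor task
    on i resources. Maximizes sum(n_i / T[i]) subject to sum(i * n_i) <= R.
    Returns (best value, {i: n_i}).
    """
    best = [0.0] * (R + 1)       # best[r]: max value using at most r resources
    choice = [None] * (R + 1)    # group size added last to reach best[r]
    for r in range(1, R + 1):
        best[r], choice[r] = best[r - 1], None   # leave one resource unused
        for i, t in T.items():
            if i <= r and best[r - i] + 1 / t > best[r]:
                best[r], choice[r] = best[r - i] + 1 / t, i

    counts = {i: 0 for i in T}
    r = R
    while r > 0:                 # walk back through the recorded choices
        if choice[r] is None:
            r -= 1
        else:
            counts[choice[r]] += 1
            r -= choice[r]
    return best[R], counts


# Invented task times for groupings 4 -> 11 (Amdahl-shaped).
T = {i: 4000 * (0.1 + 0.9 / i) for i in range(4, 12)}
print(knapsack_groups(53, T))
```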

Climate Application Scheduling [figure]

Generic Scheduling Heuristics • We propose generic scheduling heuristics for a class of applications consisting of independent, identical chains of identical DAGs

Generic Scheduling Heuristics • First approach • create a composite DAG: • link all entry nodes to a common entry node and all exit nodes to a common exit node • apply a mixed-parallelism scheduling heuristic to the composite DAG • CPA • low complexity: O(V(V+E)R) • drawback: it is a two-step algorithm
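Building the composite DAG of the first approach takes only a few lines. This is a minimal sketch, assuming each DAG is a dict from task to successor list, with task names unique across DAGs; "ENTRY" and "EXIT" are hypothetical names for the virtual nodes, which a scheduler such as CPA would treat as zero-cost tasks.

```python
def composite_dag(dags: list[dict[str, list[str]]]) -> dict[str, list[str]]:
    """Merge DAGs into one DAG with a common entry node and a common exit node.

    Each DAG maps a task to its successors; task names are assumed unique
    across DAGs. "ENTRY" and "EXIT" are hypothetical names for the two
    virtual nodes.
    """
    merged: dict[str, list[str]] = {"ENTRY": [], "EXIT": []}
    for dag in dags:
        targets = {v for succs in dag.values() for v in succs}
        nodes = sorted(set(dag) | targets)
        for n in nodes:
            merged.setdefault(n, [])
        for u, succs in dag.items():
            merged[u].extend(succs)
        for n in nodes:
            if n not in targets:          # no predecessor: entry task
                merged["ENTRY"].append(n)
            if not dag.get(n):            # no successor: exit task
                merged[n].append("EXIT")
    return merged


print(composite_dag([{"a1": ["a2"]}, {"b1": ["b2"]}]))
```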

Generic Scheduling Heuristics • Second approach • exploit knowledge of the application’s specific structure • exploit the pipelined structure of the application • separate the independent pre- and post-processing tasks and schedule them with algorithms for independent malleable tasks (5/4 approximation in constant time)

Generic Scheduling Heuristics [figure]

Generic Scheduling Heuristics • Heuristic 1 • schedule all pre-processing tasks at the beginning • schedule inter- and main-processing tasks as an interval on the same number of resources • schedule all post-processing tasks at the end • Heuristic 2 • schedule all pre-processing tasks at the beginning • schedule inter- and main-processing tasks separately, as a pipeline • schedule all post-processing tasks at the end
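The difference between placing inter- and main-processing as an interval (Heuristic 1) and as a separate pipeline (Heuristic 2) can be illustrated with steady-state rates. This toy model is an assumption, not the authors' analysis: both stages use groups of the same size g, task times follow Amdahl's law T(p) = T1·(α + (1 − α)/p), there are enough independent chains to keep every group busy, and communications are ignored. `interval_rate` and `pipeline_rate` are hypothetical names.

```python
def amdahl(t1, alpha, p):
    """Execution time on p processors; alpha is the sequential fraction."""
    return t1 * (alpha + (1 - alpha) / p)


def interval_rate(P, g, inter, main):
    """Data sets per time unit when each group of g resources runs
    inter-processing then main-processing back to back (interval)."""
    return (P // g) / (amdahl(*inter, g) + amdahl(*main, g))


def pipeline_rate(P, g, inter, main):
    """Data sets per time unit when each group is dedicated to one stage.

    Every split of the P // g groups between the two stages is tried; the
    steady-state rate of a split is limited by its slower stage.
    """
    n = P // g
    if n < 2:
        raise ValueError("a two-stage pipeline needs at least two groups")
    t_inter, t_main = amdahl(*inter, g), amdahl(*main, g)
    return max(min(a / t_inter, (n - a) / t_main) for a in range(1, n))


# (T1, alpha) for inter-processing; main-processing is (500, 0.1).
for inter in [(500, 0.1), (500, 0.8)]:
    print(inter, interval_rate(16, 4, inter, (500, 0.1)),
          pipeline_rate(16, 4, inter, (500, 0.1)))
```

With T1 = 500 and α = 0.1 for both stages, the two variants reach the same rate (4/325 data sets per time unit). With α = 0.8 for inter-processing, four groups cannot be split between the stages in proportion to their times (425 vs 162.5), and in this toy setting the interval variant comes out ahead.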

Generic Scheduling Heuristics • Heuristic 3 • schedule inter- and main-processing tasks as an interval pipeline on the same number of resources • schedule pre- and post-processing tasks simultaneously, on resources reserved for them as well as on resources unused by the pipeline • schedule the remaining pre- and post-processing tasks at the beginning and at the end of the pipeline, respectively

Generic Scheduling Heuristics • Heuristic 4 • schedule inter- and main-processing tasks separately, as a pipeline • schedule pre- and post-processing tasks simultaneously with the pipeline, on resources reserved for them as well as on resources unused by the pipeline • schedule the remaining pre- and post-processing tasks at the beginning and at the end of the pipeline, respectively

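Simulation Results • Behavior of the 4 heuristics, tested against CPA applied to the composite DAG • Task execution times modeled by Amdahl’s law: $T(p) = T_1 \left(\alpha + \frac{1-\alpha}{p}\right)$, where $T_1$ is the execution time on 1 processor and $\alpha$ the sequential fraction of the task • Several configurations tested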

Simulation Results • Configuration 1 • all tasks have the same execution time on 1 processor ($T_1 = 500$) • all tasks have the same coefficient $\alpha = 0.1$

Simulation Results • Configuration 2 • same as Configuration 1, with $\alpha_{\text{inter}} = 0.8$ for inter-processing tasks

Simulation Results • Configuration 3 • $T_1^{\text{pre}} = T_1^{\text{post}} = 50$, $T_1^{\text{main}} = T_1^{\text{inter}} = 500$ • $\alpha_{\text{inter}} = 0.6$, $\alpha = 0.1$ for all other tasks

Simulation Results • Configuration 4 • $T_1^{\text{pre}} = T_1^{\text{post}} = 50$, $T_1^{\text{main}} = T_1^{\text{inter}} = 500$ • $\alpha_{\text{inter}} = 1.0$, $\alpha = 0.1$ for all other tasks
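The four configurations can be written down directly as (T1, α) pairs. This sketch assumes the usual form of Amdahl's law, T(p) = T1·(α + (1 − α)/p) with α the sequential fraction; the task-type keys are hypothetical labels, and evaluating on 8 processors is an arbitrary choice for illustration.

```python
def amdahl(t1: float, alpha: float, p: int) -> float:
    """Amdahl's law: alpha is the sequential fraction of the task."""
    return t1 * (alpha + (1 - alpha) / p)


# (T1, alpha) for each task type in the four simulated configurations.
CONFIGS = {
    1: {"pre": (500, 0.1), "inter": (500, 0.1), "main": (500, 0.1), "post": (500, 0.1)},
    2: {"pre": (500, 0.1), "inter": (500, 0.8), "main": (500, 0.1), "post": (500, 0.1)},
    3: {"pre": (50, 0.1), "inter": (500, 0.6), "main": (500, 0.1), "post": (50, 0.1)},
    4: {"pre": (50, 0.1), "inter": (500, 1.0), "main": (500, 0.1), "post": (50, 0.1)},
}

for cid, tasks in CONFIGS.items():
    times = {name: round(amdahl(t1, a, 8), 2) for name, (t1, a) in tasks.items()}
    print(f"Configuration {cid}, 8 processors:", times)
```

In Configuration 4, α = 1.0 makes inter-processing fully sequential: its time stays 500 whatever the number of processors.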

Conclusions • We built a model of the given real application • We proposed a basic heuristic for this model and 3 improved versions • We proposed 4 pipeline-based heuristics for the generalized problem and compared them with applying a mixed-parallelism algorithm to the composite DAG of the application

Future Works • Enhance the heuristics with a more precise communication model • Perform real experiments on Grid’5000 to validate the theoretical results • Analyze other applications with a similar approach, with the long-term goal of deriving application-dependent scheduling schemes that could be implemented as DIET plug-in schedulers
