Optimizing Workflow Scheduling Techniques Presentation

New Workflow Scheduling Techniques. Presentation: Anirban Mandal. VGrADS Workshop @UCSD, Sep 2005

Outline • Drawbacks of Workflow Scheduler v.0 • Middle-Out Scheduling • Scheduling onto systems with batch queues • Scheduling onto Abstract Resource Classes • Premise: Automating good application-level scheduling using performance models, taking advantage of vgES features

Top-Down Scheduling

For each heuristic
    Until all components mapped
        Map available components to resources
    Select mapping with minimum makespan

Map available components to resources:

While all available components not mapped
    For each (component, resource) pair
        ECT(c,r) = rank(c,r) + EAT(r)
    End For each
    Run min-min, max-min and sufferage
    Store mapping
End while
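The ECT rule above can be sketched in Python for one of the three heuristics. This is a minimal illustration of min-min only, not the VGrADS scheduler itself: the function name `min_min`, the dictionary layout of `rank`, and the assumption that every resource starts with EAT = 0 are hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple


def min_min(rank: Dict[Tuple[str, str], float],
            components: List[str],
            resources: List[str]) -> Tuple[Dict[str, str], float]:
    """Map every available component with the min-min heuristic.

    rank[(c, r)] is the performance-model estimate for running c on r.
    Returns the mapping and the resulting makespan for this batch.
    """
    # Assumption: all resources are free at time 0.
    eat = {r: 0.0 for r in resources}  # estimated availability time
    mapping: Dict[str, str] = {}
    unmapped = list(components)
    while unmapped:
        # ECT(c, r) = rank(c, r) + EAT(r); keep the best resource per component.
        best = {}
        for c in unmapped:
            r = min(resources, key=lambda res: rank[(c, res)] + eat[res])
            best[c] = (rank[(c, r)] + eat[r], r)
        # min-min: commit the component whose best ECT is smallest.
        c = min(unmapped, key=lambda comp: best[comp][0])
        ect, r = best[c]
        mapping[c] = r
        eat[r] = ect  # the resource is busy until this component completes
        unmapped.remove(c)
    return mapping, max(eat.values())
```

Max-min and sufferage differ only in which component is committed in each pass; the scheduler on the slide runs all three and keeps the mapping with the smallest makespan.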

Drawbacks of Workflow Scheduler v.0 • The Top-Down Workflow Scheduler suffers from • Myopia • Top-down traversal implies no look-ahead • Risk of poor mapping of critical steps because of decisions taken higher up in the workflow • Assumption of instant resource availability • Many systems have batch queue front ends • Jobs have to wait before they start • Scaling problems • Scheduling onto individual nodes poses scaling problems in large resource environments (issue raised at the site visit)

Addressing the Drawbacks • We address the drawbacks as follows • Myopia • Middle-Out Scheduling • Schedule the critical step first and propagate the mapping up and down • Assumption of instant resource availability • Incorporate batch queue wait times when making scheduling decisions (Joint work: Rice+UCSB) • Scaling problems • Use a two-level scheduling strategy: explicit resource pruning using vgDL or other means, followed by scheduling (Joint work: Rice+UCSD+Hawaii) • Schedule onto abstract resource classes / clusters • [Slide callout: Ryan’s talk]

Middle-Out Scheduling [Figure: Top-Down vs. Middle-Out traversal of the workflow, with the key step marked]
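The traversal idea can be sketched as a level ordering. The slides only say that the key step is scheduled first and the mapping is propagated up and down; visiting the levels above the key step before the levels below it is an assumption of this sketch, as are the function name `middle_out_order` and the representation of the workflow as numbered levels.

```python
from typing import List


def middle_out_order(num_levels: int, key_level: int) -> List[int]:
    """Return the order in which DAG levels are visited.

    The key level comes first, then the levels above it (walking up
    towards the entry nodes), then the levels below it (walking down).
    The up-before-down order is an assumption, not taken from the slides.
    """
    if not 0 <= key_level < num_levels:
        raise ValueError("key_level must index an existing level")
    upward = list(range(key_level - 1, -1, -1))
    downward = list(range(key_level + 1, num_levels))
    return [key_level] + upward + downward
```

A top-down scheduler corresponds to `key_level = 0`, where the order degenerates to 0, 1, 2, and so on.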

Middle-Out Scheduling: Results • Compared makespans for middle-out vs. top-down scheduling • Resource set: 5 clusters [2 Opteron clusters and 3 Itanium clusters] • 6 resource-topology scenarios: combinations of Opteron clusters close, normal and far with fast and slow Itaniums - {(Opteron close, Fast Itanium), ..} • Application: actual EMAN DAG with 3 different communication-to-computation ratios (CCR): 0.1, 1 and 10 • Used known performance model values for computational components • Varied file sizes to obtain the desired CCR for each pair of synchronization points
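As a rough illustration of how a file size can be chosen to hit a target CCR, the sketch below assumes that communication time is simply file size divided by bandwidth, with latency ignored. The slides do not state how the sizes were computed; the function name and parameters are hypothetical.

```python
def file_size_for_ccr(ccr: float,
                      compute_time_s: float,
                      bandwidth_bytes_per_s: float) -> float:
    """File size (bytes) that yields the requested CCR.

    Assumes CCR = communication_time / computation_time and
    communication_time = file_size / bandwidth (no latency term).
    """
    if compute_time_s <= 0 or bandwidth_bytes_per_s <= 0:
        raise ValueError("compute time and bandwidth must be positive")
    return ccr * compute_time_s * bandwidth_bytes_per_s
```

Under these assumptions, moving from CCR 0.1 to CCR 10 for the same computation means a file that is 100 times larger.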

Middle-Out Scheduling: Results • CCR: 0.1 • Computation is 10 times the communication • Fast Itanium causes the top-down scheduler to “get stuck” at the Itanium clusters • Since the key computation step is scheduled on both Opteron clusters, the makespan depends on the Opteron connectivity • In the slow Itanium case, the top-down scheduler “got lucky” • The gain from middle-out scheduling is small

Middle-Out Scheduling: Results • CCR: 1 • Equal communication and computation • Fast Itanium causes the top-down scheduler to “get stuck” at the Itanium clusters • Since the key computation step is scheduled on both Opteron clusters, the makespan depends on the Opteron connectivity • In the slow Itanium case, the top-down scheduler “got lucky”

Middle-Out Scheduling: Results • CCR: 10 • Communication is 10 times the computation • Fast Itanium causes the top-down scheduler to “get stuck” at the Itanium clusters • Since the key computation step is scheduled on both Opteron clusters, the makespan depends on the Opteron connectivity • In the slow Itanium case, the top-down scheduler “got lucky”

Middle-Out Scheduling: Results • With increasing communication, the middle-out scheduler performs better when the top-down scheduler gets stuck

Outline • Drawbacks of Workflow Scheduler v.0 • Middle-Out Scheduling • Scheduling onto systems with batch queues • Scheduling onto Abstract Resource Classes

Scheduling onto Batch-Queue Systems • Incorporated point-value predictions for batch queue wait times • Slight modification to the top-down scheduler • At every scheduling step, include the estimated time the job has to wait in the queue in the estimated completion time for the job [ECT(c,r) in the algorithm] • Keep track of the queue wait time for each cluster and the number of nodes that the queue wait time applies to • With each mapping, update the estimated availability time [EAT in the algorithm] with the queue wait time, as required • Joint work with Dan Nurmi and Rich Wolski

Scheduling onto Batch-Queue Systems: Example [Figure: input DAG mapped onto resources R0, R1, R2, R3 in Cluster 0 and Cluster 1, along a time axis T] • Queue wait time [Cluster 0] = 20; # nodes for this wait time = 1 • Queue wait time [Cluster 1] = 10; # nodes for this wait time = 2
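One possible reading of the queue-aware ECT can be sketched with the example's numbers (wait 20 for 1 node on Cluster 0, wait 10 for 2 nodes on Cluster 1). The sketch assumes that the wait is paid once per node by seeding that node's EAT with the predicted wait, and that later components on the same node run back to back. The slides do not spell out the update rule, and the helper names and per-cluster rank values used here are hypothetical.

```python
from typing import Dict, Tuple

Node = Tuple[str, int]  # (cluster name, node index)


def initial_eat(clusters: Dict[str, Tuple[float, int]]) -> Dict[Node, float]:
    """Seed each node's EAT with its cluster's predicted queue wait.

    clusters maps a cluster name to (queue_wait, num_nodes).
    """
    return {(name, n): wait
            for name, (wait, nodes) in clusters.items()
            for n in range(nodes)}


def map_component(rank_on: Dict[str, float],
                  eat: Dict[Node, float]) -> Tuple[Node, float]:
    """Pick the node with the smallest ECT = rank + EAT and update its EAT.

    rank_on maps a cluster name to the component's rank on that cluster.
    """
    node = min(eat, key=lambda nd: rank_on[nd[0]] + eat[nd])
    ect = rank_on[node[0]] + eat[node]
    eat[node] = ect  # node stays busy until the component completes
    return node, ect
```

With hypothetical ranks of 5 on Cluster 0 and 8 on Cluster 1, the first two components land on the two Cluster 1 nodes (ECT 18 each) and the third on Cluster 0 (ECT 25), even though Cluster 0 is the faster machine.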

Outline • Drawbacks of Workflow Scheduler v.0 • Middle-Out Scheduling • Scheduling onto systems with batch queues • Scheduling onto Abstract Resource Classes • Addressing the scaling problem • Modify scheduler to schedule onto clusters instead of individual nodes

Scheduling onto Clusters • Input: • Workflow DAG with restricted structure - nodes at the same level do the same computation • Set of available Clusters (numNodes, arch, CPU speed etc.) and inter-cluster network connectivity • Per-node performance models for each cluster • Output: • Mapping: for each level the number of instances mapped to each cluster • Objective: • Minimize makespan at each step

Scheduling onto Clusters: Modeling • Abstract modeling of the mapping problem for a DAG level • Given: • N instances • M clusters • r1..rM nodes per cluster • t1..tM - rank value per node per cluster (incorporates both computation and communication) • Aim: • To find a partition (n1, n2, …, nM) of N such that the overall time is minimized, with n1 + n2 + … + nM = N • Analytical solution: • No ‘obvious’ solution because of the discrete nature of the problem
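One way to write the objective down, offered as a sketch rather than the formulation used by the authors: assume that cluster j runs its n_j instances in rounds of at most r_j at a time and that each round takes t_j. The ceiling is then the source of the discreteness noted above.

```latex
\min_{n_1,\dots,n_M}\; \max_{1 \le j \le M} \left\lceil \frac{n_j}{r_j} \right\rceil t_j
\qquad \text{subject to} \qquad
\sum_{j=1}^{M} n_j = N, \quad n_j \in \mathbb{Z}_{\ge 0}
```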

Scheduling onto Clusters • Iterative approach to solve the problem • Addresses the scaling issue • Complexity: O(#instances * #clusters)

For each instance, i from 1 to N
    For each cluster, j from 1 to M
        Tentatively map i onto j
        Record makespan for each j by taking care of round(j)
    End For each
    Find cluster, p with minimum makespan increase
    Map i to p
    Update round(p), numMapped(p)
End For each
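The iterative approach can be sketched in Python as follows. The sketch assumes that a cluster with `nodes[j]` nodes runs its instances in rounds, so its finish time is the number of rounds times the per-node rank value; that reading of round(j), the function name `schedule_level`, and the tie-break in favour of the lowest-numbered cluster are assumptions of the example.

```python
import math
from typing import List


def schedule_level(num_instances: int,
                   nodes: List[int],
                   rank: List[float]) -> List[int]:
    """Greedily assign identical instances of one DAG level to clusters.

    nodes[j] is the node count of cluster j; rank[j] its per-node rank value.
    Returns how many instances each cluster receives.
    """
    num_mapped = [0] * len(nodes)

    def finish_time(j: int, count: int) -> float:
        # Assumption: a cluster runs its instances in rounds of nodes[j].
        return math.ceil(count / nodes[j]) * rank[j]

    for _ in range(num_instances):
        current = max(finish_time(j, num_mapped[j]) for j in range(len(nodes)))
        best_j, best_makespan = 0, math.inf
        for j in range(len(nodes)):
            # Tentatively place the instance on cluster j.
            tentative = max(current, finish_time(j, num_mapped[j] + 1))
            if tentative < best_makespan:
                best_j, best_makespan = j, tentative
        num_mapped[best_j] += 1  # commit to the cluster with the smallest makespan
    return num_mapped
```

For example, `schedule_level(4, [2, 1], [10.0, 4.0])` places two instances on each cluster: the single fast node takes the first two, after which a third round there (12) would exceed one round on the slower two-node cluster (10). The cost is proportional to #instances * #clusters and does not depend on the total number of nodes.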

Discussions…

