
New Workflow Scheduling Techniques Presentation: Anirban Mandal


Presentation Transcript


  1. New Workflow Scheduling Techniques Presentation: Anirban Mandal. VGrADS Workshop @UCSD, Sep 2005

  2. Outline • Drawbacks of Workflow Scheduler v.0 • Middle-Out Scheduling • Scheduling onto systems with batch queues • Scheduling onto Abstract Resource Classes. Premise: automating good application-level scheduling using performance models, taking advantage of vgES features

  3. Top-Down Scheduling
  • For each heuristic: until all components are mapped, map the currently available components to resources
  • Select the mapping with minimum makespan

    While all available components not mapped
      For each (component, resource) pair
        ECT(c,r) = rank(c,r) + EAT(r)
      End For each
      Run min-min, max-min and sufferage
      Store mapping
    End While

  [Figure: top-down traversal of the workflow DAG]
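A minimal Python sketch of one such pass using the min-min heuristic, assuming rank(c,r) comes from the performance models and EAT(r) is the estimated availability time of resource r; the function and variable names are illustrative, not taken from the actual scheduler.

    # Sketch of one top-down scheduling pass with the min-min heuristic
    # (illustrative, not the VGrADS Workflow Scheduler code).
    # rank[c][r]: performance-model estimate for component c on resource r
    # eat[r]:     estimated availability time of resource r
    def min_min_pass(available, resources, rank, eat):
        """Map all currently available components; returns {component: resource}."""
        mapping = {}
        unmapped = set(available)
        while unmapped:
            best_c, best_r, best_ect = None, None, float("inf")
            for c in unmapped:
                # ECT(c,r) = rank(c,r) + EAT(r); keep the resource with the earliest completion.
                r = min(resources, key=lambda res: rank[c][res] + eat[res])
                ect = rank[c][r] + eat[r]
                # min-min: among the per-component minima, pick the overall smallest ECT.
                if ect < best_ect:
                    best_c, best_r, best_ect = c, r, ect
            mapping[best_c] = best_r
            eat[best_r] = best_ect      # the resource is busy until that component finishes
            unmapped.remove(best_c)
        return mapping

The full scheduler runs an analogous pass for max-min and sufferage as well, repeats it level by level down the DAG, and keeps whichever heuristic yields the smallest overall makespan.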

  4. Drawbacks of Workflow Scheduler v.0
  The Top-Down Workflow Scheduler suffers from:
  • Myopia
    • Top-down traversal implies no look-ahead
    • Potential for poor mapping of critical steps because of decisions taken higher up in the workflow
  • Assumption of instant resource availability
    • Many systems have batch-queue front ends, so a job has to wait before it starts
  • Scaling problems
    • Scheduling onto individual nodes poses scaling problems in large resource environments (issue raised at the site visit)

  5. Addressing the Drawbacks (see also: Ryan’s talk)
  We address the drawbacks as follows:
  • Myopia
    • Middle-Out Scheduling: schedule the critical step first and propagate the mapping up and down
  • Assumption of instant resource availability
    • Incorporate batch-queue wait times into scheduling decisions (joint work: Rice + UCSB)
  • Scaling problems
    • Use a two-level scheduling strategy: explicit resource pruning using vgDL or other means, then scheduling (joint work: Rice + UCSD + Hawaii)
    • Schedule onto abstract resource classes / clusters

  6. Middle-Out Scheduling
  [Figure: the key step of the workflow DAG as scheduled Top-Down vs. Middle-Out]
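A rough sketch of the middle-out ordering, assuming the workflow has already been flattened into levels and that schedule_level is any per-level mapping routine (for example, the min_min_pass sketched earlier); the representation and parameter names are illustrative assumptions, not the scheduler's actual interfaces.

    # Illustrative sketch of the middle-out scheduling order (not the VGrADS implementation):
    # map the key (critical) level first, then propagate outward to the levels above and
    # below it, so that decisions made early in a top-down pass cannot starve the key step.
    # levels: list of DAG levels (each a list of components); key: index of the key step.
    def middle_out_schedule(levels, key, schedule_level):
        mapping = {}
        mapping.update(schedule_level(levels[key]))        # key step first
        for lvl in range(key - 1, -1, -1):                 # propagate upward, nearest level first
            mapping.update(schedule_level(levels[lvl]))
        for lvl in range(key + 1, len(levels)):            # propagate downward, nearest level first
            mapping.update(schedule_level(levels[lvl]))
        return mapping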

  7. Middle-Out Scheduling: Results
  • Compared makespans for middle-out vs. top-down scheduling
  • Resource set: 5 clusters (2 Opteron clusters and 3 Itanium clusters)
  • 6 resource-topology scenarios: combinations of Opteron cluster connectivity (close, normal, far) with Itanium speed (fast, slow), e.g. (Opteron close, fast Itanium)
  • Application: actual EMAN DAG with 3 different communication-to-computation ratios (CCR): 0.1, 1 and 10
  • Used known performance-model values for the computational components
  • Varied file sizes to obtain the desired CCR for each pair of synchronization points

  8. Middle-Out Scheduling: Results
  • CCR: 0.1 (computation is 10 times the communication)
  • The fast Itanium clusters cause the top-down scheduler to “get stuck” at the Itanium clusters
  • Since the key computation step is scheduled on both Opteron clusters, the makespan depends on the Opteron connectivity
  • In the slow-Itanium case, the top-down scheduler “got lucky”
  • The gain from middle-out scheduling is not large

  9. Middle-Out Scheduling: Results
  • CCR: 1 (equal communication and computation)
  • The fast Itanium clusters cause the top-down scheduler to “get stuck” at the Itanium clusters
  • Since the key computation step is scheduled on both Opteron clusters, the makespan depends on the Opteron connectivity
  • In the slow-Itanium case, the top-down scheduler “got lucky”

  10. Middle-Out Scheduling: Results
  • CCR: 10 (communication is 10 times the computation)
  • The fast Itanium clusters cause the top-down scheduler to “get stuck” at the Itanium clusters
  • Since the key computation step is scheduled on both Opteron clusters, the makespan depends on the Opteron connectivity
  • In the slow-Itanium case, the top-down scheduler “got lucky”

  11. Middle-Out Scheduling: Results • With increasing communication, the middle-out scheduler performs better when the top-down scheduler gets stuck

  12. Outline • Drawbacks of Workflow Scheduler v.0 • Middle-Out Scheduling • Scheduling onto systems with batch queues • Scheduling onto Abstract Resource Classes

  13. Scheduling onto Batch-Queue Systems
  • Incorporated point-value predictions for batch-queue wait times
  • Requires only a slight modification to the top-down scheduler:
    • At every scheduling step, include the estimated time the job has to wait in the queue in the estimated completion time for the job [ECT(c,r) in the algorithm]
    • Keep track of the queue wait time for each cluster and the number of nodes that correspond to that queue wait time
    • With each mapping, update the estimated availability time [EAT in the algorithm] with the queue wait time, as required
  Joint work with Dan Nurmi and Rich Wolski
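One plausible reading of this modification, as a hedged Python sketch: a job's completion-time estimate cannot start it before the cluster's predicted queue wait has elapsed, and the per-cluster bookkeeping tracks how many nodes that prediction still covers. The names (queue_wait, wait_nodes, cluster_of) are illustrative; the scheduler's actual data structures may differ.

    # Sketch of folding batch-queue wait predictions into ECT/EAT (illustrative only).
    # queue_wait[cl]: point-value wait-time prediction for cluster cl
    # wait_nodes[cl]: number of nodes that prediction corresponds to
    # cluster_of[r]:  cluster that resource (node) r belongs to
    def ect_with_queue(c, r, rank, eat, queue_wait, cluster_of):
        # A job on r cannot start before the resource is free and its queued request would run.
        start = max(eat[r], queue_wait[cluster_of[r]])
        return start + rank[c][r]

    def commit_mapping(c, r, rank, eat, queue_wait, wait_nodes, cluster_of):
        # Update EAT(r) with the queue wait, as required, then advance it past the new job.
        eat[r] = ect_with_queue(c, r, rank, eat, queue_wait, cluster_of)
        cl = cluster_of[r]
        if wait_nodes[cl] > 0:
            wait_nodes[cl] -= 1     # one of the nodes covered by the prediction is now claimed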

  14. Scheduling onto Batch-Queue Systems: Example
  [Figure: an input DAG being mapped onto resources R0–R3 across Cluster 0 and Cluster 1, along a timeline T]
  • Queue Wait Time [Cluster 0] = 20; # nodes for this wait time = 1
  • Queue Wait Time [Cluster 1] = 10; # nodes for this wait time = 2

  15. Scheduling onto Batch-Queue Systems: Example (continued)
  [Figure: a later step of the same two-cluster example, with the same queue wait times: Cluster 0 = 20 (1 node), Cluster 1 = 10 (2 nodes)]

  16. Outline • Drawbacks of Workflow Scheduler v.0 • Middle-Out Scheduling • Scheduling onto systems with batch queues • Scheduling onto Abstract Resource Classes • Addressing the scaling problem • Modify scheduler to schedule onto clusters instead of individual nodes

  17. Scheduling onto Clusters
  • Input:
    • Workflow DAG with a restricted structure: nodes at the same level perform the same computation
    • Set of available clusters (number of nodes, architecture, CPU speed, etc.) and inter-cluster network connectivity
    • Per-node performance models for each cluster
  • Output:
    • Mapping: for each level, the number of instances mapped to each cluster
  • Objective:
    • Minimize the makespan at each step

  18. Scheduling onto Clusters: Modeling
  • Abstract model of the mapping problem for one DAG level
  • Given:
    • N instances
    • M clusters
    • r1..rM: number of nodes in each cluster
    • t1..tM: rank value per node for each cluster (incorporates both computation and communication)
  • Aim:
    • Find a partition (n1, n2, ..., nM) of N, with n1 + n2 + ... + nM = N, that minimizes the overall time
  • Analytical solution:
    • No obvious closed-form solution because of the discrete nature of the problem
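Written out, and assuming (consistent with the round(j) bookkeeping on the next slide) that the n_j instances placed on cluster j execute in rounds of r_j at a time with each round costing t_j, the level's mapping problem can be stated as

    \min_{\substack{n_1 + n_2 + \cdots + n_M = N \\ n_j \ge 0}} \; \max_{1 \le j \le M} \left\lceil \frac{n_j}{r_j} \right\rceil t_j

The ceiling and the max over clusters make the objective discrete and non-smooth, which is why no obvious closed-form solution exists.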

  19. Scheduling onto Clusters
  • Iterative approach to solving the problem
  • Addresses the scaling issue

    For each instance i from 1 to N
      For each cluster j from 1 to M
        Tentatively map i onto j
        Record the resulting makespan for each j, taking round(j) into account
      End For each
      Find the cluster p with the minimum makespan increase
      Map i to p
      Update round(p), numMapped(p)
    End For each

  Complexity: O(#instances * #clusters)
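A compact Python sketch of this greedy loop, using the quantities from the previous slide (N instances, M clusters, per-node rank values t and node counts r); the incremental makespan bookkeeping is an illustrative choice that keeps the loop at the O(#instances * #clusters) cost quoted above, not a transcription of the actual scheduler.

    import math

    # Greedy iterative mapping of N instances of one DAG level onto M clusters
    # (a sketch of the loop above, not the VGrADS scheduler code).
    #   t[j]: rank value per node on cluster j (computation + communication)
    #   r[j]: number of nodes on cluster j
    def map_level(N, t, r):
        M = len(t)
        n = [0] * M                            # n[j] = instances currently mapped to cluster j
        makespan = 0.0
        for _ in range(N):                     # for each instance i from 1 to N
            best_j, best_key = None, (float("inf"), float("inf"))
            for j in range(M):                 # tentatively map i onto cluster j
                # round(j): instances on a cluster execute in rounds of r[j] at a time
                new_finish = math.ceil((n[j] + 1) / r[j]) * t[j]
                key = (max(makespan, new_finish), new_finish)   # least makespan, then least load
                if key < best_key:
                    best_j, best_key = j, key
            n[best_j] += 1                     # map i onto the cluster with minimum makespan increase
            makespan = best_key[0]
        return n                               # n[j] = number of instances assigned to cluster j

For example, map_level(10, t=[2.0, 3.0], r=[4, 8]) splits ten instances between a 4-node cluster whose per-instance rank is 2.0 and an 8-node cluster whose rank is 3.0.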

  20. Discussions…

  21. Middle-Out Scheduling
  [Figure: recap of the key step under Top-Down vs. Middle-Out scheduling, repeated from slide 6]
