Stochastic DAG Scheduling using Monte Carlo Approach. Heterogeneous Computing Workshop (at IPDPS) 2012. Extended version: Elsevier JPDC (accepted July 2013, in press). Wei Zheng, Department of Computer Science, Xiamen University, Xiamen, China; Rizos Sakellariou, School of Computer Science, The University of Manchester, UK
Previous Presentation (9/06/13) • Research area: scheduling workflows in heterogeneous environments with variable performance.
Introduction • General DAG scheduling assumptions: • The estimated execution time of each task is known in advance. • Several estimation techniques exist, e.g. averaging over several runs. • Similarly, the estimated data transfer time is known in advance. • A study* has shown that observed performance in Grids can deviate significantly from such estimates. • Two approaches are prevalent for addressing these deviations: • Just-in-time scheduling (high overhead) • Runtime rescheduling (static schedule + runtime changes) (hypothesis**: this may waste resources and increase the makespan if the static schedule is not very good) • * A. Lastovetsky, J. Twamley, Towards a realistic performance model for networks of heterogeneous computers, in: M. Ng, A. Doncescu, L. Yang, T. Leng (Eds.), High Performance Computational Science and Engineering, in: IFIP International Federation for Information Processing, vol. 172, Springer, Boston, 2005, pp. 39–57. • ** R. Sakellariou, H. Zhao, A low-cost rescheduling policy for efficient mapping of workflows on grid systems, Sci. Program. 12 (4) (2004) 253–262.
Problem Addressed • Generate a better “static” schedule (one with a smaller makespan) based on a stochastic model of the variation in performance (execution time) of the individual tasks in the graph.
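Background and Related Work • Heterogeneous Earliest Finish Time (HEFT) heuristic (discussed in the previous presentation) • List-based scheduling. • Prioritize tasks by their “bLevel” (essentially, tasks on the critical path get higher priority). • Once a task is chosen, map it to the “best” available resource, i.e. the one that gives it the earliest finish time. • $\mathrm{bLevel}(i) = w_i + \max_{j \in \mathrm{Succ}(i)} \{ w_{i \to j} + \mathrm{bLevel}(j) \}$, where $w_i$ is the (mean) computation cost of task $i$, $w_{i \to j}$ is the data transfer cost of edge $(i, j)$, and the max term is 0 for the exit node.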
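The bLevel recurrence can be made concrete with a short sketch. The four-task DAG, its costs and the names `w`, `comm` and `b_level` are hypothetical illustrations, not taken from the paper; `w` holds one mean cost per task, as HEFT uses for ranking.

```python
from functools import lru_cache

# Hypothetical DAG: mean computation cost w_i per task.
w = {"A": 3.0, "B": 2.0, "C": 4.0, "D": 1.0}
# Hypothetical data transfer cost w_{i->j} per edge.
comm = {("A", "B"): 1.0, ("A", "C"): 2.0, ("B", "D"): 1.0, ("C", "D"): 3.0}

succ = {task: [] for task in w}
for i, j in comm:
    succ[i].append(j)


@lru_cache(maxsize=None)
def b_level(i):
    # bLevel(i) = w_i + max over successors j of (w_{i->j} + bLevel(j));
    # the exit node has no successors, so the max term defaults to 0.
    return w[i] + max((comm[(i, j)] + b_level(j) for j in succ[i]), default=0.0)


# List scheduling visits tasks in decreasing bLevel order.
priority = sorted(w, key=b_level, reverse=True)
print({task: b_level(task) for task in w})  # → {'A': 13.0, 'B': 4.0, 'C': 8.0, 'D': 1.0}
print(priority)                             # → ['A', 'C', 'B', 'D']
```

With strictly positive task costs, every task's bLevel exceeds that of each successor, so the priority list is also a valid topological order.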
Problem Description • $G = (N, E)$: a DAG with one entry node and one exit node. • $R$: the set of heterogeneous resources. • $ET_{i,p}$: random variable for the execution time of task $i$ on resource $p$. • Assumption: network bandwidth is constant. • $M$: the makespan, i.e. the finish time of the exit node. • Goal: find a schedule $\Omega$ that minimizes the makespan (assign every task in $N$ to a resource in $R$; no overlap, no preemption, no migration).
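Because each $ET_{i,p}$ is random, the makespan $M$ of a fixed schedule $\Omega$ is random too. A minimal sketch of evaluating one realization, assuming (our choice, not stated on the slide) that $\Omega$ is given as a dispatch order of `(task, resource)` pairs and that a transfer costs nothing when both tasks share a resource. The function `makespan` and the example data are hypothetical.

```python
def makespan(order, preds, comm, t):
    """Makespan M of a static schedule under one realization t of the ET_{i,p}.

    order: [(task, resource), ...] in dispatch order (a topological order)
    preds: task -> list of predecessor tasks
    comm:  (i, j) -> transfer time, fixed because bandwidth is assumed constant
    t:     (task, resource) -> realized execution time t_{i,p}
    """
    placed, finish, free = {}, {}, {}
    for task, res in order:
        # A task may start once all of its input data has arrived ...
        ready = max((finish[j] + (0.0 if placed[j] == res else comm[(j, task)])
                     for j in preds[task]), default=0.0)
        # ... and its resource is free (no overlap, no preemption).
        finish[task] = max(ready, free.get(res, 0.0)) + t[(task, res)]
        free[res], placed[task] = finish[task], res
    # With a single exit node, its finish time is the latest finish time.
    return max(finish.values())


# Hypothetical diamond DAG A -> {B, C} -> D on two resources.
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
comm = {("A", "B"): 1.0, ("A", "C"): 2.0, ("B", "D"): 1.0, ("C", "D"): 3.0}
order = [("A", "P1"), ("C", "P1"), ("B", "P2"), ("D", "P1")]
t = {("A", "P1"): 3.0, ("C", "P1"): 4.0, ("B", "P2"): 2.0, ("D", "P1"): 1.0}
print(makespan(order, preds, comm, t))  # → 8.0
```

Feeding the same `order` different samples `t` yields different values of $M$, which is exactly the uncertainty a static schedule has to cope with.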
Methodology • Assumption: analytical methods that solve the probabilistic optimization problem are too expensive. • Use the Monte Carlo Sampling (MCS) method: • Define a space comprising the possible input values: $I_G = \{ET_{i,p} : i \in N, p \in R\}$. • Take an independent random sample from the space: $P_G = f_{smp}(I_G) = \{t_{i,p} : i \in N, p \in R\}$. • Perform a deterministic computation using the sample input and store the result: $\Omega_G = \mathrm{Static\_Scheduling}_{HEFT}(G, P_G)$. • Repeat the sampling and deterministic-computation steps until an exit condition is met (e.g. a fixed number of repetitions). • Aggregate the stored results of the individual computations into the final result.
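The MCS steps can be sketched end to end. This is a hedged illustration, not the authors' implementation. It assumes a simplified HEFT (no insertion policy) as the deterministic step, a hypothetical sampler `sample` (Gaussian around each mean, clipped to stay positive) as $f_{smp}$, and an aggregation that keeps the distinct schedules from a production loop of `m` samples and then selects the one with the lowest average makespan over `k` fresh samples. That reading of the production/selection phases is our assumption, and the paper's exact aggregation rule may differ. All function names and data are hypothetical.

```python
import random
from statistics import mean


def finish_times(order, preds, comm, t):
    """Replay a dispatch order [(task, resource), ...] under execution times t."""
    placed, finish, free = {}, {}, {}
    for task, res in order:
        # Data from a predecessor on another resource pays the (constant) transfer time.
        ready = max((finish[j] + (0.0 if placed[j] == res else comm[(j, task)])
                     for j in preds[task]), default=0.0)
        finish[task] = max(ready, free.get(res, 0.0)) + t[(task, res)]
        free[res], placed[task] = finish[task], res
    return finish


def heft(tasks, resources, succ, preds, comm, t):
    """Simplified HEFT (no insertion policy) on one deterministic sample t."""
    w = {i: mean(t[(i, p)] for p in resources) for i in tasks}
    memo = {}

    def b_level(i):
        if i not in memo:
            memo[i] = w[i] + max((comm[(i, j)] + b_level(j) for j in succ[i]), default=0.0)
        return memo[i]

    order = []
    for task in sorted(tasks, key=b_level, reverse=True):
        # Map the task to the resource that gives it the earliest finish time.
        best = min(resources,
                   key=lambda p: finish_times(order + [(task, p)], preds, comm, t)[task])
        order.append((task, best))
    return tuple(order)


def sample(mu, cv, rng):
    """Hypothetical f_smp: Gaussian around each mean, clipped to stay positive."""
    return {key: max(0.1 * avg, rng.gauss(avg, cv * avg)) for key, avg in mu.items()}


def mcs_schedule(tasks, resources, succ, preds, comm, mu, m, k, cv=0.3, seed=0):
    rng = random.Random(seed)
    # Production phase: one deterministic HEFT schedule per random sample P_G.
    candidates = {heft(tasks, resources, succ, preds, comm, sample(mu, cv, rng))
                  for _ in range(m)}
    # Selection phase: score every distinct candidate on the same k fresh samples.
    trials = [sample(mu, cv, rng) for _ in range(k)]

    def avg_makespan(order):
        return mean(max(finish_times(order, preds, comm, t).values()) for t in trials)

    return min(sorted(candidates), key=avg_makespan)


# Usage on a hypothetical four-task diamond DAG with two resources.
tasks = ["A", "B", "C", "D"]
resources = ["P1", "P2"]
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
comm = {("A", "B"): 1.0, ("A", "C"): 2.0, ("B", "D"): 1.0, ("C", "D"): 3.0}
mu = {("A", "P1"): 3.0, ("A", "P2"): 4.0, ("B", "P1"): 2.5, ("B", "P2"): 2.0,
      ("C", "P1"): 4.0, ("C", "P2"): 5.0, ("D", "P1"): 1.0, ("D", "P2"): 1.5}
best = mcs_schedule(tasks, resources, succ, preds, comm, mu, m=200, k=50)
print(best)
```

Scoring all candidates on the same `trials` keeps the comparison fair, and sorting the candidate set first makes tie-breaking deterministic. The `finish_times` call inside `heft` replays the partial schedule for clarity rather than speed.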
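MCS-Based Scheduling • Complexity: • Depends on the deterministic scheduling algorithm. • For HEFT it is $O(v + e \cdot r) = O(e \cdot r)$, where $v$, $e$ and $r$ are the numbers of tasks, edges and resources (the simplification holds because $e \ge v - 1$ in a connected DAG). • First loop (production): $O(e \cdot r \cdot m)$ • Second loop (selection): $O(e \cdot n \cdot k)$ • Total: $O(e \cdot r \cdot m + e \cdot n \cdot k)$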
Example • 10,000 iterations in the production phase (Gaussian distribution) • 200 iterations in the selection phase • 20% reduction in makespan • Absolute increase in algorithm running time: 1.2 s
Evaluation • Graphs
Makespan performance evaluation • Static: HEFT (baseline) using mean ET values • Autopsy: static HEFT with the actual ET values known in advance • MCS: static MCS-based schedule • ReStatic: Static with runtime rescheduling • ReMCS: MCS with runtime rescheduling • Graph generation: random generator for each given graph type • Task execution times for different runs: • Select a “mean” for each task. • Draw the actual execution time from a probability distribution; the variation is bounded by the Quality of Estimation (QoE), where 0 < QoE < 1.
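One way to generate per-run execution times is sketched below. The slides do not give the exact distribution, so this sketch assumes (hypothetically) that QoE bounds the relative deviation from the mean and draws each actual time uniformly from $[\mu(1 - QoE), \mu(1 + QoE)]$. The function `actual_times` and the example means are illustrative only.

```python
import random


def actual_times(mean_times, qoe, rng):
    """Draw one run's actual execution times.

    Assumption: QoE bounds the relative deviation from the selected mean,
    so each draw lies in [mean * (1 - qoe), mean * (1 + qoe)].
    """
    if not 0.0 < qoe < 1.0:
        raise ValueError("QoE must lie strictly between 0 and 1")
    return {key: rng.uniform(mu * (1 - qoe), mu * (1 + qoe))
            for key, mu in mean_times.items()}


rng = random.Random(42)
# Hypothetical selected means, one per (task, resource) pair.
means = {("A", "P1"): 3.0, ("A", "P2"): 4.0, ("B", "P1"): 2.5, ("B", "P2"): 2.0}
runs = [actual_times(means, qoe=0.2, rng=rng) for _ in range(5)]
for run in runs:
    print({key: round(value, 2) for key, value in run.items()})
```

The static schedules would be built from `means` (or samples around them), while each entry of `runs` plays the role of the execution times actually observed.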
Summary • It is possible to obtain a good full-ahead static schedule that performs well under prediction inaccuracy, without excessive overhead. • MCS, which has a more robust procedure for selecting an initial schedule, generally results in better performance when rescheduling is applied.