Exploring Issues with Workflow Scheduling on the Grid
Rizos Sakellariou, University of Manchester, UK
With thanks to: Henan Zhao and Ewa Deelman for providing slides! Also: Viktor Yarmolenko, Wei Zheng, … and Anastasios Gounaris for presenting it!
Workflow applications are widely considered a common use case of Grids. Examples range from large-scale workflows such as LIGO (Pegasus team, ISI) to small ones such as those in myGrid, Manchester.
Modelling the problem… • A workflow is a Directed Acyclic Graph (DAG) • Scheduling DAGs onto resources is well studied in the context of homogeneous systems – less so in the context of heterogeneous systems (and mostly without taking any uncertainty into account). • Needless to say, this is an NP-complete problem. • Are workflows really arbitrary DAGs or a special type of DAG? We don’t really know… (some workflows are clearly not DAGs – only DAGs are considered here…)
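To make the model concrete, here is a minimal sketch (Python, with hypothetical task names and edge costs; an illustration only, not code from the work presented) of a workflow represented as a DAG whose edges carry data-transfer costs, together with a check of the acyclicity the model assumes:

```python
# A workflow modelled as a DAG: tasks are nodes, data dependences are edges
# annotated with the cost of moving the data between two different machines.
# All names and numbers below are made up for illustration.
workflow = {
    "stage_in": [("align", 4), ("filter", 6)],   # task -> (successor, comm. cost)
    "align":    [("merge", 3)],
    "filter":   [("merge", 2)],
    "merge":    [],
}

def is_acyclic(dag):
    """Verify the DAG property with a depth-first search (grey = on the current path)."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {t: WHITE for t in dag}
    def visit(t):
        colour[t] = GREY
        for succ, _cost in dag[t]:
            if colour[succ] == GREY:                  # back edge: a cycle
                return False
            if colour[succ] == WHITE and not visit(succ):
                return False
        colour[t] = BLACK
        return True
    return all(visit(t) for t in dag if colour[t] == WHITE)

print(is_acyclic(workflow))   # True for this example
```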
DAG scheduling • An order in which tasks will be executed needs to be established (e.g., red, yellow, or blue first?) • Resources need to be chosen for each task (some resources are fast, some are not so fast!) • The cost of moving data between resources should not outweigh the benefits of parallelism.
Does the order matter? [Figure: the example DAG with tasks 0–9] • If task 6 takes comparatively long to run, we’d like to execute task 2 just after task 0 finishes (perhaps before tasks 1, 3, 4, 5). Follow the critical path! This is not really new!
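One common way to “follow the critical path” is to prioritise tasks by their bottom level (the length of the longest path to the exit task), essentially the upward rank used by list-scheduling heuristics. The sketch below uses hypothetical execution times and communication costs; it illustrates the idea only, not any specific algorithm from the talk.

```python
from functools import lru_cache

# Hypothetical execution times (say, averaged over machines) and a DAG given
# as successor lists with communication costs.
exec_time = {"stage_in": 5, "align": 20, "filter": 8, "merge": 10}
succs = {"stage_in": [("align", 4), ("filter", 6)],
         "align": [("merge", 3)], "filter": [("merge", 2)], "merge": []}

@lru_cache(maxsize=None)
def bottom_level(task):
    """Length of the longest path from `task` to the exit of the DAG."""
    tails = [cost + bottom_level(succ) for succ, cost in succs[task]]
    return exec_time[task] + (max(tails) if tails else 0)

# Scheduling tasks in non-increasing bottom-level order gives precedence to
# tasks on the critical path.
print(sorted(succs, key=bottom_level, reverse=True))
# ['stage_in', 'align', 'filter', 'merge']
```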
Our methodology… • Revisit the DAG scheduling problem for heterogeneous systems… • Start with simple static scenarios… • Even this problem is not well understood, despite the fact that there have been more than 30 heuristics published… (check the proceedings of the Heterogeneous Computing Workshop for a start…) • Try to build on existing knowledge, as we obtain a good understanding of each step!
Outline of Part I • Static DAG scheduling onto heterogeneous systems (i.e., we know computation & communication a priori) • Introduce uncertainty in computation times. • Handle multiple DAGs at the same time. [1] Rizos Sakellariou, Henan Zhao. A Hybrid Heuristic for DAG Scheduling on Heterogeneous Systems. Proceedings of the 13th IEEE Heterogeneous Computing Workshop (HCW’04) (in conjunction with IPDPS 2004), Santa Fe, April 2004, IEEE Computer Society Press, 2004. [2] Rizos Sakellariou, Henan Zhao. A low-cost rescheduling policy for efficient mapping of workflows on grid systems. Scientific Programming, 12(4), December 2004, pp. 253-262. [3] Henan Zhao, Rizos Sakellariou. Scheduling Multiple DAGs onto Heterogeneous Systems. Proceedings of the 15th Heterogeneous Computing Workshop (HCW'06) (in conjunction with IPDPS 2006), Rhodes, Apr. 2006, IEEE Computer Society Press.
The starting point for a model… A DAG, 10 tasks, 3 machines (assume we know execution times and communication costs). [Figure: the example DAG with tasks 0–9 and communication costs on its edges] Execution times of each task on machines M1–M3:
Task  M1  M2  M3
0     37  39  27
1     30  20  24
2     21  21  28
3     35  38  31
4     27  24  29
5     29  37  20
6     22  24  30
7     37  26  37
8     35  31  26
9     33  37  21
A simple idea… Assign all nodes to the fastest machine! But the communication between nodes 4 and 8 takes way too long: the makespan is > 1000! Heuristics that take into account the whole structure of the DAG are needed…
Still, if we consider the whole DAG… HEFT: a minor change (in the rank function) leads to different schedules (~15% difference). [Figure: two Gantt charts over a 0–170 time axis, with makespans of 143 and 164 respectively] H. Zhao, R. Sakellariou. An experimental study of the rank function of HEFT. Proceedings of EuroPar’03.
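The effect that a small change to the rank function can produce noticeably different schedules can be reproduced on a toy example. The sketch below (with made-up task times and edge costs, not the data of the paper) computes a HEFT-style upward rank twice, aggregating each task’s execution times over the machines with the mean and with the minimum; the resulting priority orders differ, which is the kind of sensitivity the EuroPar’03 study examines.

```python
import statistics

# Rows of `times`: execution time of a task on machines M1..M3 (hypothetical).
times = {"A": [37, 39, 27], "B": [22, 22, 22], "C": [50, 10, 10], "D": [35, 38, 31]}
succs = {"A": [("B", 18), ("C", 12)], "B": [("D", 15)], "C": [("D", 15)], "D": []}

def upward_rank(aggregate):
    """Upward rank with a pluggable aggregation of per-machine execution times."""
    rank = {}
    def rec(t):
        if t not in rank:
            tails = [cost + rec(succ) for succ, cost in succs[t]]
            rank[t] = aggregate(times[t]) + (max(tails) if tails else 0)
        return rank[t]
    for t in succs:
        rec(t)
    return rank

for aggregate in (statistics.mean, min):       # 'mean' rank vs. 'best-machine' rank
    r = upward_rank(aggregate)
    print(sorted(succs, key=r.get, reverse=True))
# ['A', 'C', 'B', 'D'] with the mean, ['A', 'B', 'C', 'D'] with the minimum
```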
Hmm… • This was a rather well-defined problem… • This was just a small change in the algorithm… • Yet there were big variations in the outcome. • What about different heuristics? • What about more generic problems?
DAG scheduling: A Hybrid Heuristic • Trying to find out why there were such differences in the outcome of HEFT… we observed problems with the ordering… to address those problems we came up with a Hybrid Heuristic… it worked quite well! • Phases: • Rank (list scheduling) • Create groups of independent tasks • Schedule independent tasks • Can be carried out using any scheduling algorithm for independent tasks, e.g. MinMin, MaxMin, … • A novel heuristic (Balanced Minimum Completion Time) R. Sakellariou, H. Zhao. A Hybrid Heuristic for DAG Scheduling on Heterogeneous Systems. Proceedings of the 13th IEEE Heterogeneous Computing Workshop (HCW’04), 2004.
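For illustration, a much-simplified sketch of the three phases follows (ranking, grouping the ranked tasks into sets of independent tasks, scheduling each group with a MinMin-style rule). The data are hypothetical, the grouping rule is deliberately naive, and the actual heuristic in the HCW’04 paper (including the BMCT step) is considerably more refined.

```python
# Phase 1 (ranking) is assumed already done and summarised by `rank_order`.
times = {"A": [37, 39, 27], "B": [22, 22, 22], "C": [50, 10, 10], "D": [35, 38, 31]}
succs = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
rank_order = ["A", "C", "B", "D"]

def group_independent(order, succs):
    """Phase 2: cut the ranked list into groups of mutually independent tasks."""
    groups, current, blocked = [], [], set()
    for t in order:
        if t in blocked:                       # t depends on a task in the current group
            groups.append(current)
            current, blocked = [], set()
        current.append(t)
        blocked |= set(succs[t])               # direct successors only, for brevity
    groups.append(current)
    return groups

def min_min(group, times, ready):
    """Phase 3: map one group of independent tasks with a MinMin-style rule."""
    remaining, assignment = set(group), {}
    while remaining:
        # pick the (task, machine) pair with the smallest completion time
        t, m = min(((t, m) for t in remaining for m in range(len(ready))),
                   key=lambda tm: ready[tm[1]] + times[tm[0]][tm[1]])
        ready[m] += times[t][m]
        assignment[t] = m
        remaining.remove(t)
    return assignment

ready = [0.0, 0.0, 0.0]                        # machine ready times
for group in group_independent(rank_order, succs):
    print(group, min_min(group, times, ready))
```

(The sketch ignores data-transfer times when placing tasks; a real implementation would take the data-arrival time at each candidate machine into account.)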
Hmm… • Yes, but, so far, you have used static task execution times… in practice such times are difficult to specify exactly… • There is an answer for run-time deviations: adjust at run-time… • But: don’t we need to understand the static case first?
Characterise the Schedule • Spare time indicates the maximum time that a node, i, may delay without affecting the start time of an immediate successor, j. • Slack indicates the maximum time that a node, i, may delay without affecting the overall makespan. • The idea: keep track of the values of the slack and/or the spare time and reschedule only when the delay exceeds the slack… (selective rescheduling) R. Sakellariou, H. Zhao. A low-cost rescheduling policy for efficient mapping of workflows on grid systems. Scientific Programming, 12(4), December 2004, pp. 253-262.
Example: FT(4) = 32.5, DAT(4,7) = 40.5, ST(7) = 45.5 → Spare_Time(4) = 5. Slack(8) = 0; Slack(7) = Slack(8) + Spare_Time(7) = 0; Slack(5) = Slack(8) + Spare_Time(5) = 6.
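The example can be reproduced with a few lines of code. In the sketch below, DAT(4,7) and ST(7) are taken from the slide (FT(4) is not needed here because the data-arrival time is the later of the two); the remaining start and data-arrival times are assumed so that the slack figures come out as quoted, and machine-availability constraints are ignored for brevity.

```python
ST  = {7: 45.5, 8: 60.0}                           # start times (8's value is assumed)
DAT = {(4, 7): 40.5, (5, 8): 54.0, (7, 8): 60.0}   # data-arrival times (partly assumed)
succ = {4: [7], 5: [8], 7: [8], 8: []}             # immediate successors in the schedule

def spare_time(i, j):
    """Max delay of node i that does not postpone the start of its successor j."""
    return ST[j] - DAT[(i, j)]

def slack(i):
    """Max delay of node i that does not postpone the overall makespan."""
    if not succ[i]:
        return 0.0                                 # delaying the exit task delays the makespan
    # with several successors, the tightest one determines the slack
    return min(slack(j) + spare_time(i, j) for j in succ[i])

print(spare_time(4, 7))                 # 5.0
print(slack(8), slack(7), slack(5))     # 0.0 0.0 6.0
```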
Lessons Learned… (simulation, with deviations of up to 100%) • Heuristics that perform better statically also perform better under uncertainty. • By using the spare time and slack metrics, one can track how far the makespan deviates from the static estimate; we can then minimise the number of times we reschedule while still achieving good results.
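The rescheduling trigger itself is then very cheap: at run time, compare each task’s observed delay with the slack recorded for it when the static schedule was built, and only call the (expensive) rescheduler when the delay can actually hurt the makespan. A minimal sketch, with assumed slack values and delays:

```python
static_slack = {"align": 6.0, "filter": 0.0}    # recorded at static scheduling time (assumed)

def needs_rescheduling(task, estimated_finish, observed_finish):
    """Selective rescheduling: only react when the delay exceeds the task's slack."""
    return (observed_finish - estimated_finish) > static_slack[task]

print(needs_rescheduling("align", 40.0, 44.5))   # False: a delay of 4.5 fits within slack 6
print(needs_rescheduling("filter", 40.0, 44.5))  # True: zero slack, so any delay matters
```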
Moving on… to multiple DAGs • It is rather idealistic to assume that we have exclusive use of the resources… • In practice, we may have multiple DAGs competing for resources at the same time… Henan Zhao, Rizos Sakellariou. Scheduling Multiple DAGs onto Heterogeneous Systems. Proceedings of the 15th Heterogeneous Computing Workshop (HCW'06) (in conjunction with IPDPS 2006), Rhodes, April 2006, IEEE Computer Society Press.
Scheduling Multiple DAGs: Approaches • Approach 1: Schedule one DAG after the other with existing DAG scheduling algorithms • Low resource utilization & long overall makespan • Approach 2: Still one after the other, but do some backfilling to fill the gaps • Which DAG to schedule first? The one with the longest makespan or the one with the shortest makespan? • Approach 3: Alternate between DAGs (either round-robin or using some other form of priority) • Much better than Approaches 1 & 2.
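A sketch of the composition behind Approach 3: instead of scheduling whole DAGs one after the other, interleave their rank-ordered task lists round-robin and feed the merged list to a single list scheduler. The task names are hypothetical; the HCW'06 paper evaluates several such composition policies (and fairness-driven variants).

```python
from itertools import zip_longest

dag_a_order = ["a0", "a1", "a2", "a3"]   # tasks of DAG A in rank order (hypothetical)
dag_b_order = ["b0", "b1"]               # tasks of DAG B in rank order (hypothetical)

def round_robin_merge(*orders):
    """Interleave several rank-ordered task lists, one task from each DAG per turn."""
    merged = []
    for turn in zip_longest(*orders):            # pads shorter lists with None
        merged.extend(t for t in turn if t is not None)
    return merged

print(round_robin_merge(dag_a_order, dag_b_order))
# ['a0', 'b0', 'a1', 'b1', 'a2', 'a3']
```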
But, is makespan optimisation a good objective when scheduling multiple DAGs?
Mission: Fairness In multiple DAGs: • The user's perspective: “I want my DAG to complete execution as soon as possible”. • The system's perspective: “I would like to keep as many users as possible happy; I would like to increase resource utilisation (and income)”. Let’s be fair to users! (The system may want to take into account different levels of quality of service agreed with each user.)
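One plausible way to quantify fairness (an illustration only, not necessarily the exact metric of the paper) is to look at each DAG’s slowdown, i.e. its finish time in the shared schedule relative to the makespan it would achieve with the resources to itself, and at how much the slowdowns diverge across DAGs:

```python
# Hypothetical numbers: makespan of each DAG when scheduled alone, and its
# finish time in the shared schedule.
own_makespan  = {"dag1": 100.0, "dag2": 40.0, "dag3": 250.0}
shared_finish = {"dag1": 180.0, "dag2": 95.0, "dag3": 300.0}

slowdown = {d: shared_finish[d] / own_makespan[d] for d in own_makespan}
spread = max(slowdown.values()) - min(slowdown.values())
print(slowdown, spread)
# dag2 suffers the largest slowdown (~2.4x); the spread of slowdowns (~1.2)
# is one possible measure of unfairness to minimise.
```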
Lessons Learned… Open questions… • It is possible to achieve reasonably good fairness without affecting the makespan. • An algorithm with good behaviour in the static case appears to make things easier in terms of achieving fairness… • What is fairness? • What should the behaviour be when run-time changes occur? • What about different notions of Quality of Service (e.g., based on SLAs…)?
Questions still unanswered… • What are the representative DAGs (workflows) in the context of Grid computing? • Extensive evaluation / analysis (theoretical too) is needed. It is not clear what the best makespan we can get is (it is not easy to find the critical path…) • What are the uncertainties involved? How good are the estimates that we can obtain for the execution time / communication cost? Performance prediction is hard… • How ‘heterogeneous’ are our Grid resources, really?
Workflows are not generic DAGs • Bioinformatics workflows are really small (10s of nodes) • There are scientific workflows with thousands of nodes (Montage, LIGO, SCEC), but they have a rather regular structure. • Experience from joint work with the Pegasus team indicates that there may not be much to gain from sophisticated heuristics (paper to be published based on the earlier studies below) • James Blythe, S. Jain, Ewa Deelman, Yolanda Gil, Karan Vahi, Anirban Mandal, Ken Kennedy: Task scheduling strategies for workflow-based applications in grids. CCGRID 2005: 759-767 • Rizos Sakellariou, Henan Zhao. A Hybrid Heuristic for DAG Scheduling on Heterogeneous Systems. Proceedings of the 13th IEEE Heterogeneous Computing Workshop (HCW’04) (in conjunction with IPDPS 2004), Santa Fe, April 2004, IEEE Computer Society Press, 2004.
Part II: But there is more (than just shortening the makespan) when scheduling DAGs (workflows)!
Efficient data handling • Workflow input data is staged dynamically; new data products are generated during execution • Large workflows can have 10,000+ input files (and a similar order of intermediate/output files) • If there is not enough disk space, failures occur • Solution: • Determine which data are no longer needed, and when • Add nodes to the workflow to clean up data along the way • Take into account the disk space available on resources • Benefits: simulations show up to 57% space improvement for LIGO-like workflows “Scheduling Data-Intensive Workflows onto Storage-Constrained Distributed Resources”, A. Ramakrishnan, G. Singh, H. Zhao, E. Deelman, R. Sakellariou, K. Vahi, K. Blackburn, D. Meyers, M. Samidi, CCGrid 2007.
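A much-simplified sketch of the cleanup idea follows (not Pegasus code; task and file names are made up): for every file, find the last task that consumes it in a topological order and attach a cleanup node to that task, so that the file is removed as early as the dependences allow. The real algorithm also has to consider where files physically reside and per-resource disk constraints.

```python
consumes = {                          # task -> files it reads (hypothetical)
    "align":  ["raw.dat"],
    "filter": ["raw.dat", "cal.dat"],
    "merge":  ["align.out", "filter.out"],
}
topo_order = ["align", "filter", "merge"]   # assumed valid topological order

def cleanup_nodes(consumes, topo_order):
    """Return one cleanup node per file, keyed to the file's last consumer."""
    last_consumer = {}
    for task in topo_order:                 # later tasks overwrite earlier entries
        for f in consumes[task]:
            last_consumer[f] = task
    return {f"cleanup_{f}": parent for f, parent in last_consumer.items()}

print(cleanup_nodes(consumes, topo_order))
# {'cleanup_raw.dat': 'filter', 'cleanup_cal.dat': 'filter',
#  'cleanup_align.out': 'merge', 'cleanup_filter.out': 'merge'}
```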
44% improvement in footprint for the Montage workflow (when adding cleanup nodes)
LIGO Inspiral Analysis Workflow • Small workflow: 164 nodes • Full-scale analysis: 185,000 nodes and 466,000 edges; 10 TB of input data and 1 TB of output data • LIGO workflow running on OSG “Optimizing Workflow Data Footprint”, G. Singh, K. Vahi, A. Ramakrishnan, G. Mehta, E. Deelman, H. Zhao, R. Sakellariou, K. Blackburn, D. Brown, S. Fairhurst, D. Meyers, G. B. Berriman, J. Good, D. S. Katz, Scientific Programming.
LIGO Workflows: 26% improvement in disk space usage; 50% slower runtime.
LIGO Workflows: 56% improvement in space usage; 3 times slower runtime. “Optimizing Workflow Data Footprint”, G. Singh, K. Vahi, A. Ramakrishnan, G. Mehta, E. Deelman, H. Zhao, R. Sakellariou, K. Blackburn, D. Brown, S. Fairhurst, D. Meyers, G. B. Berriman, J. Good, D. S. Katz, Scientific Programming.
Lesson Learned… When scheduling workflows, one may want to trade performance for storage requirements, to make it feasible to complete the execution of a workflow!
Part III: But there are other issues related to performance that have to do with the workflow execution environment and the queuing mechanisms of traditional systems!
Scheduling (Slide courtesy: Ewa Deelman, deelman@isi.edu, www.isi.edu/~deelman, pegasus.isi.edu)
Execution Environment (Slide courtesy: Ewa Deelman, deelman@isi.edu, www.isi.edu/~deelman, pegasus.isi.edu)
Queues are evil… Is Advance Reservation a solution?
Might be… But, for sure, there are several challenges with respect to workflows: e.g., given a user-specified deadline, how can we make reservations for the individual tasks? Henan Zhao, Rizos Sakellariou. Advance Reservation Policies for Workflows. Proceedings of the 12th Workshop on Job Scheduling Strategies for Parallel Processing, 2006.
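As a toy illustration of the problem (an assumption, not necessarily the policy of the JSSPP 2006 paper): one simple way to turn a user deadline into per-task reservations for a chain of tasks is to split the deadline in proportion to each task’s estimated execution time.

```python
# Hypothetical estimated execution times of a sequential three-task workflow.
est = {"stage_in": 5.0, "compute": 40.0, "stage_out": 5.0}
deadline = 100.0

total = sum(est.values())
start, reservations = 0.0, {}
for task, t in est.items():
    share = deadline * t / total        # slice of the deadline for this task
    reservations[task] = (start, start + share)
    start += share

print(reservations)
# {'stage_in': (0.0, 10.0), 'compute': (10.0, 90.0), 'stage_out': (90.0, 100.0)}
```

Real workflows are not chains, of course, which is exactly why distributing a deadline over a DAG (and co-allocating resources for parallel branches) is a challenge.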
Advance reservation still provides only a limited level of service! • Can we think of a model where: • users specify their constraints, • users make an agreement (a legally binding contract) with the resource owner (a Service Level Agreement: SLA), and • it is up to the system to do the scheduling (based on the SLAs) so as to honour the agreement? http://www.gridscheduling.org Viktor Yarmolenko, Rizos Sakellariou. Towards Increased Expressiveness in Service Level Agreements. Concurrency and Computation: Practice and Experience, 2007. Viktor Yarmolenko, Rizos Sakellariou. An Evaluation of Heuristics for SLA-based Parallel Job Scheduling. High Performance Grid Computing Workshop, IPDPS, 2006.
SLA-based job scheduling • SLA-based job scheduling can offer the levels of service that are currently missing: • It happens all the time in the real world! • But there are several key challenges to address: • Build appropriate protocols (legally binding), behaviour models, etc. for negotiation and re-negotiation • Pricing policies (income, penalties, etc.) • Manage complexity • Regulation, monitoring, dispute resolution… • Convince the users to change attitudes! • Scheduling the SLAs does not appear to be the biggest challenge… But: • How to schedule workflows using SLAs (how to deal with co-allocation problems, for instance) is a big challenge! • It needs extensive evaluation!
To summarize… • Understanding the basic static scenarios and having robust solutions for those scenarios helps the extension to more complex cases… • Pretty much everything here is addressed by heuristics, whose evaluation requires extensive experimentation. Still: • No agreement on what DAGs (workflows) look like. • No agreement on how heterogeneous resources really are. • There are indications that sophisticated DAG scheduling may not be very relevant for workflows. But there are optimization problems that relate to: • Data handling, licences?, budget?, (or multiple criteria)… and, above all…
What is the way to ease the constraints imposed by the traditional queue-based models for job scheduling?
I’d be happy to hear from anyone with an interest in these problems. You are also welcome to come and visit us in Manchester!