2nd “Scheduling in Aussois” Workshop, May 18-21, 2008
Multi-Objective Scheduling of Streaming Workflows (Bi-criteria Scheduling of Streaming Workflows)
Naga Vydyanathan 1, Umit V. Catalyurek 2,3, Tahsin Kurc 2, P. Sadayappan 1 and Joel Saltz 1,2
1 Dept. of Computer Science & Engineering, 2 Dept. of Biomedical Informatics, 3 Dept. of Electrical & Computer Engineering, The Ohio State University
Current and Emerging Applications
• Satellite Data Processing
• High Energy Physics
• Quantum Chemistry
• DCE-MRI Analysis
• Image Processing
• Multimedia
• Video Surveillance
• Montage
Challenges
• Complex and diverse processing structures
(Figure: taxonomy of data analysis applications — the bag-of-tasks model, non-streaming workflows, and streaming workflows — with a legend for tasks, files, and sequential or parallel tasks)
Bag-of-Tasks Applications
(Figure: independent sequential tasks reading input files, mapped across processors P1-P4 to exploit task-parallelism)
Challenges
• Complex and diverse processing structures
• Varied parallelism
Challenges
• Complex and diverse processing structures
• Varied parallelism
  • Bag-of-tasks applications: task-parallelism
Non-streaming Workflows
(Figure: a workflow DAG of sequential or parallel tasks mapped across processors P1-P4, exploiting task- and data-parallelism)
Challenges
• Complex and diverse processing structures
• Varied parallelism
Challenges
• Complex and diverse processing structures
• Varied parallelism
  • Bag-of-tasks: task-parallelism
  • Non-streaming workflows: task- and data-parallelism
Streaming Workflows
(Figure: a workflow DAG of sequential or parallel tasks mapped across processors P1-P4, exploiting pipelined-, data- and task-parallelism)
Challenges
• Complex and diverse processing structures
• Varied parallelism
Challenges
• Complex and diverse processing structures
• Varied parallelism
  • Bag-of-tasks: task-parallelism
  • Non-streaming workflows: task- and data-parallelism
  • Streaming workflows: task-, data- and pipelined-parallelism
Challenges
• Different performance criteria
  • Bag-of-tasks: batch execution time [CCGrid’05, HCW’05, JSSPP’06, HPDC’06]
  • Non-streaming workflows: makespan [ICPP’05, HCW’06, ICPP’06, Cluster’06]
  • Streaming workflows: latency, throughput [SC’02, EuroPar’07, ICPP’08]
• Significant communication/data transfer overheads
Scheduling Streaming Workflows
(Figure: the taxonomy of data analysis applications — bag-of-tasks applications and non-streaming/streaming workflows — with streaming workflows as the focus of this work)
Scheduling Streaming Workflows
• Image processing, multimedia, and computer vision applications often act on a stream of input data
• Scheduling challenges
  • Multiple performance criteria
    • Latency (time to process one data item)
    • Throughput (aggregate rate of processing)
  • Multiple forms of parallelism
    • Pipelined parallelism
    • Task parallelism
    • Data parallelism
An Example Pipelined Schedule
(Figure: Gantt chart of a four-task workflow — T1, T2, T3, T4 with execution times 10, 8, 12 and 15 — pipelined across processors P1-P6, illustrating pipelined, data and task parallelism; throughput = 0.1, latency = 37)
Optimizing Latency while Meeting Throughput Constraints
• Given:
  • A workflow DAG with runtime and data volume estimates
  • A collection of homogeneous processors
  • A throughput constraint
• Goal:
  • Generate a schedule that meets the throughput constraint while minimizing workflow latency
• This requires leveraging pipelined, task and data parallelism in a coordinated manner (a sketch of the problem inputs follows below)
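To make the problem inputs concrete, here is a minimal Python sketch of the data the scheduler works from. The structure and names (Task, Workflow, throughput_req) are assumptions for illustration; the task execution times loosely follow the example figure above, and the edge data volumes are invented.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    exec_time: float   # estimated runtime per data item

@dataclass
class Workflow:
    tasks: dict = field(default_factory=dict)   # task name -> Task
    edges: dict = field(default_factory=dict)   # (src, dst) -> data volume per item

# Illustrative four-task instance (execution times as in the example figure;
# edge volumes are made up for this sketch).
wf = Workflow()
for name, w in [("T1", 10), ("T2", 8), ("T3", 12), ("T4", 15)]:
    wf.tasks[name] = Task(name, w)
wf.edges = {("T1", "T2"): 2, ("T1", "T3"): 2, ("T2", "T4"): 1, ("T3", "T4"): 1}

num_processors = 6      # homogeneous processors
throughput_req = 0.1    # data items per unit time the schedule must sustain
```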
Pipelined Scheduling Heuristic
• Three-phase approach
  • Phase 1: Satisfying the throughput requirement
    • Assumes an unbounded number of processors
    • Employs clustering, replication and duplication to meet the throughput requirement
  • Phase 2: Limiting the number of processors used
    • Merges task clusters to reduce the number of processors used until a feasible schedule is obtained
    • Preference is given to merging decisions that minimize latency
  • Phase 3: Minimizing the workflow latency
    • Minimizes communication costs along the critical path by duplication and clustering
A skeleton of the three phases follows below.
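The slides name the three phases but give no pseudocode; the skeleton below is a hedged reconstruction of the control flow only. The phase implementations (satisfy_throughput, merge_step, reduce_critical_path, processors_used) are passed in as callables because they are not specified here.

```python
def pipelined_schedule(wf, num_processors, throughput_req,
                       satisfy_throughput, merge_step,
                       reduce_critical_path, processors_used):
    """Skeleton of the three-phase heuristic; phase bodies are supplied by the caller."""
    # Phase 1: meet the throughput requirement assuming unbounded processors,
    # using clustering, replication and duplication.
    clusters = satisfy_throughput(wf, throughput_req)

    # Phase 2: merge task clusters until the schedule fits on the available
    # processors, preferring merges that increase latency the least.
    while processors_used(clusters) > num_processors:
        clusters = merge_step(clusters, throughput_req)

    # Phase 3: reduce communication along the critical path by duplication
    # and further clustering, without violating the throughput requirement.
    return reduce_critical_path(clusters, wf, throughput_req)
```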
Task Clustering
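The clustering figure is not recoverable from this transcript. As a purely illustrative sketch of the general idea (co-locating a producer and consumer so the data transfer between them disappears), the greedy edge-weight rule below is an assumption, not the rule used in the paper:

```python
def cluster_heavy_edges(wf, comm_threshold):
    """Greedily co-locate tasks joined by expensive edges (illustrative rule only)."""
    # Start with every task in its own singleton cluster.
    cluster_of = {name: {name} for name in wf.tasks}
    # Consider edges from heaviest data volume to lightest.
    for (src, dst), volume in sorted(wf.edges.items(), key=lambda e: -e[1]):
        if volume >= comm_threshold and cluster_of[src] is not cluster_of[dst]:
            merged = cluster_of[src] | cluster_of[dst]   # this transfer becomes local
            for member in merged:
                cluster_of[member] = merged
    return {frozenset(c) for c in cluster_of.values()}
```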
Task Replication
• Throughput constraint = 0.1
• Replication is used to
  • Improve computation throughput
  • Improve communication throughput
(Figure: the example DAG with each task shown replicated into multiple copies so that both computation and communication can sustain the required rate)
A sketch of the replica-count calculation follows below.
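The later worked example reports nr(T1) = 0.8 for a task of weight 8 under T = 0.1, which is consistent with nr(Ti) = exec_time(Ti) × T; a task then needs ceil(nr) copies to keep up. A small sketch of that calculation (the helper name is mine):

```python
import math

def replicas_needed(exec_time, throughput_req):
    """Copies of a task needed so its aggregate processing rate meets the requirement.

    nr = exec_time * throughput_req; one copy suffices while nr <= 1,
    otherwise ceil(nr) copies are needed.
    """
    nr = exec_time * throughput_req
    return max(1, math.ceil(nr))

# Values matching the worked example slide (T = 0.1):
for name, w in [("T1", 8), ("T2", 10), ("T3", 4), ("T4", 5), ("T5", 4), ("T6", 2)]:
    print(name, "nr =", w * 0.1, "replicas =", replicas_needed(w, 0.1))
```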
Task Duplication
(Figure: a sample application DAG, (a) its schedule without duplication, and (b) its schedule with duplication)
Duplication-based Scheduling of Streaming Workflows
• In the context of streaming workflows, duplication can be used to
  • Avoid bottleneck data transfers without compromising task parallelism
  • Minimize workflow latency
(Figure: example with T = 0.1 and P = 4 in which task T1 is duplicated as T1', removing a heavy data transfer from the critical path)
Duplication-based Scheduling of Streaming Workflows
• However,
  • Duplication can require more processors due to redundant computation
    • Depends on the weight of the duplicated task and the throughput constraint
  • Extra communication is needed to broadcast input data to the duplicates
    • May increase latency too!
• Therefore, selectively duplicate ancestors: duplication is done only if
  • There are available processors
  • It proves beneficial in terms of latency
  • It does not involve expensive communications that violate the throughput requirement
A sketch of this duplication test follows below.
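A minimal sketch of the selective-duplication test described above. The state object and its predicates (free_processors, latency_with, latency_without, violates_throughput) are placeholders standing in for the heuristic's actual estimators:

```python
def should_duplicate(ancestor, target_cluster, state):
    """Duplicate `ancestor` into `target_cluster` only if all three slide conditions hold."""
    # 1. There must be processors left to absorb the redundant computation.
    if state.free_processors() <= 0:
        return False
    # 2. Duplication must actually reduce the latency estimate.
    if state.latency_with(ancestor, target_cluster) >= state.latency_without(ancestor, target_cluster):
        return False
    # 3. Broadcasting the ancestor's inputs must not break the throughput requirement.
    if state.violates_throughput(ancestor, target_cluster):
        return False
    return True
```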
Estimating Throughput and Latency
• Execution model
  • Realistic k-port communication model
  • Communication and computation overlap
• Throughput estimate = min(CompRate, CommRate)
  • Computation rate (CompRate): min of #Procs(Ci) / exec_time(Ci) over all clusters Ci
  • Communication rate (CommRate): greedy, priority-based scheduling of communications onto channels and ports; min of #ParallelTransfers(trj) / min_cycle_time(trj) over all transfers trj
• Latency estimate
  • Takes into account both communication and computation dependencies
A sketch of the throughput estimate follows below.
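A hedged sketch of the throughput estimate only; the attribute names are assumptions, and the greedy channel/port mapping that produces the transfer statistics is not reproduced here:

```python
def estimate_throughput(clusters, transfers):
    """Throughput estimate = min(CompRate, CommRate), per the slide.

    clusters : objects with .num_procs and .exec_time
    transfers: objects with .parallel_transfers and .min_cycle_time
               (assumed to come from the greedy channel/port assignment)
    """
    comp_rate = min(c.num_procs / c.exec_time for c in clusters)
    comm_rate = min(t.parallel_transfers / t.min_cycle_time for t in transfers)
    return min(comp_rate, comm_rate)
```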
An Example
• P = 4, throughput constraint T = 0.1
• Satisfying the throughput
  • nr(T1) = 0.8, nr(T2) = 1, nr(T3) = 0.4, nr(T4) = 0.5, nr(T5) = 0.4, nr(T6) = 0.2
  • Expensive communications: eT1T2, eT3T4, eT3T5
  • Cluster T1 and T2
  • Duplicate T3 (as T3')
• Limiting the number of processors
  • Pused = 5
  • Two options to reduce Pused: merging T1, T2 and T6, or merging T3', T5 and T6
  • Merge T3', T5 and T6 -> reduces latency
• Minimizing latency
  • Nothing to be done!
(Figure: the six-task example DAG with task weights T1 = 8, T2 = 10, T3 = 4, T4 = 5, T5 = 4, T6 = 2)
The Pipelined Schedule
(Figure: Gantt chart of the resulting schedule on processors P1-P4 — T1 and T2 clustered, T3 duplicated as T3', and T3', T5, T6 merged onto one processor — achieving throughput = 0.1 and latency = 28)
Performance on Synthetic Benchmarks
(Figures: latency comparison at (a) CCR = 0.1, (b) CCR = 1 and (c) CCR = 10)
• As CCR is increased, there are more instances where FCP and EXPERT do not meet the throughput requirement
• The proposed approach always meets the throughput requirement and produces lower latencies
Benefit of Task Duplication
(Figures: (a) CCR = 1 and (b) CCR = 10)
• As the throughput constraint is relaxed, a greater benefit is observed (more processors become available for duplication)
• For a negligible throughput constraint, clustering does not have much adverse impact on latency
Performance on Applications
(Figures: performance of MPEG video compression on 32 processors — (a) latency ratio and (b) utilization ratio)
• MPEG frames are processed in order of arrival, so no replication is used
• The throughput constraint is assumed to be the reciprocal of the weight of the largest task
• The proposed approach yields latency similar to FCP, but with lower resource utilization
• The proposed approach generates lower latency than EXPERT
Throughput Optimization under a Latency Constraint
• Relation between throughput and latency: monotonically increasing, from roughly (L_min, ~0) up to (L_max, T_max)
• Binary search algorithm on the inverse problem
  • L = required latency
  • If L >= L_max, output T_max
  • If L_min < L < L_max, do a binary search on the throughput constraint (starting at T = T_max/2, ...)
• However, since we use heuristics, the monotonic relation is not guaranteed
  • We use look-ahead techniques to avoid local optima
A sketch of the binary search follows below.
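A hedged sketch of the binary search on the inverse problem. Here solve_for_throughput stands in for the latency-optimization heuristic above and is assumed to return (schedule, achieved_latency); the L_min/L_max shortcuts and the look-ahead step mentioned on the slide are omitted:

```python
def maximize_throughput(wf, num_processors, latency_req, t_max,
                        solve_for_throughput, tol=1e-3):
    """Largest throughput target T whose latency-optimized schedule meets latency_req."""
    lo, hi = 0.0, t_max
    best = None
    while hi - lo > tol:
        t = (lo + hi) / 2.0
        schedule, latency = solve_for_throughput(wf, num_processors, t)
        if latency <= latency_req:
            best, lo = schedule, t    # feasible: try a larger throughput target
        else:
            hi = t                    # infeasible: lower the throughput target
    return best
```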
Throughput Optimization under a Latency Constraint
(Figures: (a) CCR = 0.1 and (b) CCR = 1)
• The proposed approach generates schedules with larger throughputs that meet the latency constraints
• It meets the latency constraints even when other schemes fail
Related Work
• Bag-of-Tasks applications
  • H. Casanova, D. Zagorodnov, F. Berman, and A. Legrand. Heuristics for scheduling parameter sweep applications in grid environments. HCW, 2000.
  • A. Giersch, Y. Robert, and F. Vivien. Scheduling tasks sharing files on heterogeneous master-slave platforms. Journal of Systems Architecture, 2006.
  • K. Kaya and C. Aykanat. Iterative-improvement-based heuristics for adaptive scheduling of tasks sharing files on heterogeneous master-slave environments. IEEE TPDS, 2006.
• Non-streaming workflows
  • S. Ramaswamy, S. Sapatnekar, and P. Banerjee. A framework for exploiting task and data parallelism on distributed memory multicomputers. IEEE TPDS, 1997.
  • A. Radulescu and A. van Gemund. A low-cost approach towards mixed task and data parallel scheduling. ICPP, 2001.
  • A. Radulescu, C. Nicolescu, A. J. C. van Gemund, and P. Jonker. CPR: Mixed task and data parallel scheduling for distributed systems. IPDPS, 2001.
  • K. Aida and H. Casanova. Scheduling mixed-parallel applications with advance reservations. HPDC, 2008.
• Streaming workflows
  • F. Guirado, A. Ripoll, C. Roig, and E. Luque. Optimizing latency under throughput requirements for streaming applications on cluster execution. Cluster Computing, 2005.
  • M. Spencer, R. Ferreira, M. Beynon, T. Kurc, U. Catalyurek, A. Sussman, and J. Saltz. Executing multiple pipelined data analysis operations in the grid. SC, 2002.
  • A. Benoit and Y. Robert. Mapping pipeline skeletons onto heterogeneous platforms. Technical Report LIP RR-2006-40, 2006.
  • A. Benoit and Y. Robert. Complexity results for throughput and latency optimization of replicated and data-parallel workflows. Technical Report LIP RR-2007-12, 2007.
  • A. Benoit, H. Kosch, V. Rehn-Sonigo, and Y. Robert. Optimizing latency and reliability of pipeline workflow applications. Technical Report LIP RR-2008-12, 2008.
Conclusions & Future Work
• Streaming workflows
  • Coordinated use of task-, data- and pipelined-parallelism
  • Multiple performance objectives (latency and throughput)
  • Consistently meets throughput requirements
  • Lower-latency schedules using fewer resources
  • Larger-throughput schedules while meeting latency requirements
• Future work
  • Scheduling for multi-core clusters
  • Deeper memory hierarchies
  • Power-aware approaches
  • Fault-tolerant approaches
Thanks
• Questions?
• Contact info: Umit Catalyurek
  • umit@bmi.osu.edu
  • http://bmi.osu.edu/~umit
  • OSU Dept. of Biomedical Informatics: http://bmi.osu.edu