Towards a Realistic Scheduling Model Oliver Sinnen, Leonel Sousa, Frode Eika Sandnes IEEE TPDS, Vol. 17, No. 3, pp. 263-275, 2006.
Parallel processing is one of the oldest disciplines in computer science, yet the general scheduling problem is far from solved.
Why is parallel processing difficult?
• "Too many cooks spoil the broth" (Norwegian: "Jo flere kokker, jo mere søl")
• Partitioning and transforming problems
• Load balancing
• Inter-processor communication
• Granularity
• Architecture
Implementing parallel systems
• Manually
  • MPI
  • PVM
  • Linda
• Automatically
  • Parallelising compilers (Fortran)
  • Static scheduling
Modelling computations
A = B + C
Data dependencies: A depends on B and C.
Valid sequences: CBA, BCA
Invalid sequences: ABC, ACB, CAB, BAC
Another example
A = (B - C) / D
F = B + G
Taskgraph: A depends on B, C and D; F depends on B and G.
Static taskgraph scheduling techniques
The scheduling process: the tasks of a taskgraph (A–E, connected by communication edges c1–c5) are allocated to processors (p1, p2) and ordered in time, producing a schedule.
Topological sorting
• Topological sorting: ordering the vertices of a graph such that the precedence constraints are not violated
• All valid schedules represent a topological sort
• Scheduling algorithms differ in how they topologically sort the graph
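The ordering constraint can be sketched with Kahn's algorithm for topological sorting; the graph below encodes the earlier A = (B - C) / D, F = B + G example, with edges pointing from a task to the tasks that depend on it (function and variable names are illustrative):

```python
from collections import deque

def topological_sort(succ):
    """Kahn's algorithm: repeatedly emit a task whose predecessors are all done."""
    # Count incoming edges (unsatisfied precedence constraints) per task.
    indeg = {v: 0 for v in succ}
    for v in succ:
        for w in succ[v]:
            indeg[w] += 1
    ready = deque(sorted(v for v in succ if indeg[v] == 0))
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for w in succ[v]:          # v is done; release its successors
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    if len(order) != len(succ):
        raise ValueError("graph has a cycle; no valid schedule exists")
    return order

# Taskgraph for A = (B - C) / D and F = B + G:
# B, C, D, G are leaves; A depends on B, C, D; F depends on B and G.
graph = {"B": ["A", "F"], "C": ["A"], "D": ["A"], "G": ["F"],
         "A": [], "F": []}
```

Any list scheduling algorithm produces one of the valid orders this procedure can emit; the algorithms differ only in which ready task they pick next.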
The importance of abstraction
• Abstraction is important to preserve generality
• Too specific:
    float sum = 0;
    for (int i = 0; i < 8; i++) {
        sum += a[i];
    }
• General and flexible:
    float sum = sumArray(a);
Communication is a major bottleneck
• Typically a 1:50 to 1:10,000 ratio between computation and communication cost
• Communication cost is not very dependent on data size
• The interconnection network topology affects the overall time
Scheduling work prior to 1995
• Assumptions:
  • Zero inter-processor communication costs
  • Fully connected processor interconnection networks
Data size is not the major factor
• Multiple single messages: connect–send, connect–send, connect–send, connect–send
• Single compound message: connect, then send–send–send
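Under a simple linear cost model, bundling small messages into one compound message pays the connection (startup) cost only once, which is why the number of messages matters more than the data size. A sketch with hypothetical cost figures (T_STARTUP and T_BYTE are illustrative, not from the paper):

```python
def comm_time(num_messages, bytes_per_message, t_startup, t_byte):
    """Total transfer time: each message pays the startup (connect) cost once."""
    return num_messages * (t_startup + bytes_per_message * t_byte)

# Hypothetical costs: startup dominates for small messages.
T_STARTUP = 100e-6   # 100 microseconds per connection
T_BYTE = 1e-9        # 1 nanosecond per byte transferred

# Four single messages vs one compound message carrying the same 4 KiB of data:
single = comm_time(4, 1024, T_STARTUP, T_BYTE)
compound = comm_time(1, 4 * 1024, T_STARTUP, T_BYTE)
```

The compound message saves three startup costs while transferring exactly the same number of bytes, so the gap between the two grows with the number of messages, not with the data volume.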
The ring
In a ring topology, a message may have to traverse several intermediate processors to travel from source to destination.
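The ring slide illustrates why topology affects communication time: the number of hops depends on how far apart the processors sit. A minimal sketch, assuming a bidirectional ring where a message takes the shorter direction:

```python
def ring_hops(src, dst, p):
    """Minimum number of hops between processors src and dst on a p-processor
    bidirectional ring (messages can travel either way around)."""
    d = abs(src - dst) % p
    return min(d, p - d)
```

On a fully connected network every pair would be one hop apart; on the ring the worst case is p // 2 hops, which is exactly the kind of topology effect the classic model ignores.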
Interprocessor communication
• Zero vs non-zero communication overheads
• Direct links vs connecting nodes
Examples: a bus-based shared-memory multiprocessor (P1–P4 on a bus with shared RAM) and a distributed-memory mesh multiprocessor (P11–P44 in a 4×4 grid).
Duplication
Taskgraph: task a (weight 1) with children b and c (weight 1 each), connected by edges of cost 1. Without duplication, a and b run on p1 (finishing at t=1 and t=2) while c on p2 must wait for a's result to arrive, finishing at t=3. Duplicating a onto p2 removes the inter-processor communication, so c finishes at t=2.
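The finish-time arithmetic behind the duplication slide can be sketched directly, using the slide's unit task and edge costs (the function names are illustrative):

```python
def finish_time_no_dup(task_cost, edge_cost):
    """a and b run on p1; c on p2 must wait for a's result to cross the network."""
    a_done = task_cost            # a finishes on p1
    c_start = a_done + edge_cost  # a's result arrives at p2 one edge-cost later
    return c_start + task_cost    # c finishes after that

def finish_time_with_dup(task_cost):
    """a is duplicated onto p2, so c needs no inter-processor communication."""
    a_done = task_cost            # the duplicated copy of a finishes on p2
    return a_done + task_cost     # c starts immediately afterwards
```

With unit costs this reproduces the slide's schedule lengths of 3 without duplication and 2 with it; duplication trades redundant computation for eliminated communication.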
Classic communication model: Assumptions
• Local communications have zero cost
• Communication is conducted by a dedicated communication subsystem
• Communications can be performed concurrently
• The network is fully connected
Implications
• Network contention is not modelled
  • Tasks compete for communication resources
• Contention can be modelled with:
  • Different types of edges
  • Switch vertices (in addition to processor vertices)
Processor involvement in communication I
Two-sided involvement: both the sending and the receiving processor take part in the transfer (e.g. a TCP/IP PC cluster).
Processor involvement in communication II
One-sided involvement: only one processor takes part in the transfer (e.g. the shared-memory Cray T3E).
Processor involvement in communication III
Third-party involvement: dedicated hardware handles the transfer, leaving the processors free (e.g. the DMA engines of the Meiko CS-2).
Problems
• All classic scheduling models assume third-party involvement
• Very few machines are equipped with dedicated hardware supporting third-party involvement
• Estimated finish times for tasks are therefore hugely inaccurate
• Scheduling algorithms are very sub-optimal
Results
Experimental platforms: Cray T3E-900, bobcat cluster, Sun E3500.