Low-Cost Task Scheduling for Distributed-Memory Machines Andrei Radulescu and Arjan J.C. Van Gemund Presented by Bahadır Kaan Özütam
Outline • Introduction • List Scheduling • Preliminaries • General Framework for LSSP • Complexity Analysis • Case Study • Extensions for LSDP • Conclusion
Introduction • Task scheduling • Scheduling heuristics are classified along several axes: • Shared-memory vs. distributed-memory machines • Bounded vs. unbounded number of processors • Multistep vs. single-step methods • Duplicating vs. non-duplicating methods • Static vs. dynamic priorities
List Scheduling • LSDP and LSSP algorithms • LSSP (List Scheduling with Static Priorities): • Tasks are scheduled in the order of their previously computed priorities, each on its “best” processor. • The best processor is... • the processor enabling the earliest start time, if performance is the main concern • the processor becoming idle the earliest, if scheduling speed is the main concern • LSDP (List Scheduling with Dynamic Priorities): • Priorities are computed for task-processor pairs • more complex
List Scheduling • Reducing LSSP time complexity • from O(V log V + (E + V)P) to O(V log P + E), where V = number of tasks, E = number of dependencies, P = number of processors • achieved by: 1. Considering only two candidate processors when selecting a processor 2. Maintaining a partially sorted task priority queue with a limited number of sorted tasks
Preliminaries • [Figure: example task graph of nodes (V) and edges (E)] • A parallel program is modeled as a directed acyclic graph (DAG) G = (V, E) • Computation cost Tw(t) • Communication cost Tc(t, t') • Communication-to-computation ratio (CCR) • The task graph width (W)
Preliminaries • Entry and exit tasks • The bottom level Tb(t) of a task • A task is ready when all its parents are scheduled • Start time Ts(t) • Finish time Tf(t) • Partial schedule • Processor ready time: Tr(p) = max { Tf(t) : t ∈ V, Pr(t) = p } • Processor becoming idle the earliest (pr): Tr(pr) = min { Tr(p) : p ∈ P }
Preliminaries • The last message arrival time: Tm(t) = max { Tf(t') + Tc(t', t) : (t', t) ∈ E } • The enabling processor pe(t): the processor from which the last message arrives • Effective message arrival time: Te(t, p) = max { Tf(t') + Tc(t', t) : (t', t) ∈ E, Pr(t') ≠ p } • The start time of a ready task, once scheduled on p: Ts(t, p) = max { Te(t, p), Tr(p) }
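These quantities map directly to code. Below is a minimal Python sketch, not from the paper, assuming the graph is stored as a parents adjacency map plus dictionaries Tc (edge communication costs), Tf (finish times of scheduled tasks), proc (processor assignments), and Tr (processor ready times); all names are illustrative.

```python
# A minimal sketch (not the authors' code) of the quantities defined above.

def last_message_arrival(t, parents, Tf, Tc):
    """Tm(t) = max over edges (t', t) of Tf(t') + Tc(t', t)."""
    return max((Tf[q] + Tc[(q, t)] for q in parents[t]), default=0.0)

def effective_message_arrival(t, p, parents, Tf, Tc, proc):
    """Te(t, p): like Tm(t), but messages from parents already mapped to p are free."""
    return max((Tf[q] + Tc[(q, t)] for q in parents[t] if proc[q] != p),
               default=0.0)

def start_time(t, p, parents, Tf, Tc, proc, Tr):
    """Ts(t, p) = max(Te(t, p), Tr(p)) for a ready task t placed on processor p."""
    return max(effective_message_arrival(t, p, parents, Tf, Tc, proc), Tr[p])
```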
General Framework for LSSP • The general LSSP algorithm has three parts: • Task priority computation: O(E + V) • Task selection: O(V log W) • Processor selection: O((E + V) P)
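To make the three-part structure concrete, here is a hedged sketch of the LSSP main loop in Python; the helpers for priority computation, task selection, and processor selection are passed in as parameters, and all names are illustrative rather than the authors' code.

```python
def lssp_schedule(tasks, parents, children, Tw, Tc, num_procs,
                  compute_priorities, select_task, select_processor):
    """Skeleton of an LSSP run: priorities computed once, then repeated
    task selection and processor selection until all tasks are scheduled."""
    prio = compute_priorities(tasks, children, Tw, Tc)   # O(E + V), e.g. bottom levels
    ready = [t for t in tasks if not parents[t]]         # entry tasks start out ready
    Tf, proc = {}, {}                                    # finish time / processor per task
    Tr = [0.0] * num_procs                               # processor ready times

    while ready:
        t = select_task(ready, prio)                     # highest static priority
        ready.remove(t)
        p, ts = select_processor(t, parents, Tf, Tc, proc, Tr)
        proc[t], Tf[t] = p, ts + Tw[t]
        Tr[p] = Tf[t]
        for c in children[t]:                            # a child becomes ready once
            if all(q in Tf for q in parents[c]):         # all of its parents are scheduled
                ready.append(c)
    return proc, Tf
```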
General Framework for LSSP • Processor selection • Only two candidate processors need to be considered: 1. The enabling processor pe(t) 2. The processor becoming idle first, pr • A ready task's start time on processor p: Ts(t, p) = max { Te(t, p), Tr(p) }
General Framework for LSSP • Lemma 1: for any p ≠ pe(t), Te(t, p) = Tm(t) • Theorem 1: for a ready task t, one of the two processors p ∈ { pe(t), pr } satisfies Ts(t, p) = min { Ts(t, px) : px ∈ P } • Processor selection thus drops from O((E + V) P) to O(V log P + E): • O(E + V) to traverse the task graph • O(V log P) to keep the processors sorted by ready time
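Lemma 1 and Theorem 1 are what make the cheap processor selection possible: every non-enabling processor sees the same message arrival time Tm(t), so only pe(t) and the earliest-idle processor pr need to be compared. A hedged sketch, assuming Tm(t), Te(t, pe(t)), and a heap of (ready time, processor) pairs are maintained elsewhere (e.g. with heapq):

```python
def select_processor_two_candidates(t, pe, Te_pe, Tm, proc_heap, Tr):
    """Pick the better of pe(t) and pr (Theorem 1) instead of scanning all P processors.

    pe        : enabling processor of t
    Te_pe     : effective message arrival time Te(t, pe)
    Tm        : last message arrival time Tm(t)
    proc_heap : heap of (Tr(p), p) pairs; its top is the earliest-idle processor pr
    """
    ts_pe = max(Te_pe, Tr[pe])        # start time on the enabling processor

    tr_pr, pr = proc_heap[0]          # earliest-idle processor
    ts_pr = max(Tm, tr_pr)            # Lemma 1: Te(t, p) = Tm(t) for any p != pe

    return (pe, ts_pe) if ts_pe <= ts_pr else (pr, ts_pr)
```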
General Framework for LSSP • Task selection • The O(V log W) cost can be reduced by sorting only some of the tasks. • The task priority queue is split into: 1. A sorted list of size H 2. A FIFO list (O(1) operations) • The cost decreases to O(V log H) • H must be tuned: H = P works best, giving O(V log P)
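A sketch of the two-level ready queue this slide describes: at most H tasks are kept sorted and the rest wait in a FIFO, so an insertion costs O(log H) instead of O(log W). This is an illustrative simplification, not the paper's exact data structure.

```python
import bisect
from collections import deque

class PartiallySortedQueue:
    """Ready-task queue: a sorted part of at most H tasks plus a FIFO overflow."""

    def __init__(self, H):
        self.H = H
        self.sorted_part = []    # (-priority, task), kept sorted, size <= H
        self.fifo = deque()      # overflow of (task, priority), unsorted

    def push(self, task, priority):
        if len(self.sorted_part) < self.H:
            bisect.insort(self.sorted_part, (-priority, task))   # highest priority first
        else:
            self.fifo.append((task, priority))

    def pop(self):
        """Return the highest-priority task of the sorted part, refilling from the FIFO."""
        if not self.sorted_part:
            while self.fifo and len(self.sorted_part) < self.H:
                task, priority = self.fifo.popleft()
                bisect.insort(self.sorted_part, (-priority, task))
        return self.sorted_part.pop(0)[1]

    def __len__(self):
        return len(self.sorted_part) + len(self.fifo)
```

With H = P, pushes and pops cost O(log P), which is where the O(V log P) task-selection term in the next slide comes from.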
Complexity Analysis • Computing task priorities: O(E + V) • Task selection: O(V log W); O(V log H) with a partially sorted priority queue; O(V log P) for a queue of size H = P • Processor selection: O(E + V) + O(V log P) • Total complexity: O(V (log W + log P) + E) fully sorted; O(V log P + E) partially sorted
Case Study • [Figure: example task graph, tasks t0–t7 labeled with computation costs, edges labeled with communication costs] • MCP (Modified Critical Path): the task with the highest bottom level has the highest priority • FCP (Fast Critical Path) • 3 processors • Partially sorted priority queue of size 2 • 7 tasks
Case Study • [Figure: the example task graph, tasks t0–t7 with computation and communication costs]
Extensions for LSDP • The approach extends to dynamic priorities: • ETF: the ready task that starts the earliest • ERT: the ready task that finishes the earliest • DLS: the task-processor pair with the highest dynamic level • General formula: f(t, p) = g(t) + max { Te(t, p), Tr(p) }, where the pair minimizing this value is selected • ETF: g(t) = 0 • ERT: g(t) = Tw(t) • DLS: g(t) = -Tb(t)
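This decomposition is what lets the low-cost machinery be reused for dynamic priorities: the task-only term g(t) is attached to a task once, and only max{Te(t, p), Tr(p)} depends on the processor. A small illustrative sketch (the function names are not from the paper):

```python
def task_term(t, policy, Tw, Tb):
    """Task-dependent part g(t) of the unified priority; lower values are better."""
    if policy == "ETF":      # minimize the start time of the ready task
        return 0.0
    if policy == "ERT":      # minimize the finish time: Tw(t) + start time
        return Tw[t]
    if policy == "DLS":      # maximize the dynamic level Tb(t) - start time
        return -Tb[t]
    raise ValueError(f"unknown policy: {policy}")

def pair_priority(t, p, policy, Tw, Tb, Te, Tr):
    """Unified priority f(t, p) = g(t) + max(Te(t, p), Tr(p));
    the task-processor pair minimizing this value is selected."""
    return task_term(t, policy, Tw, Tb) + max(Te(t, p), Tr[p])
```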
Extensions for LSDP • EP case (the candidate processor is the task's enabling processor): • on each processor, the tasks it enables are kept sorted • the processors themselves are kept sorted • non-EP case: • the candidate is the processor becoming idle first • if that processor happens to be the enabling processor, this reduces to the EP case
Extensions for LSDP • Three candidates are tried per scheduling step: • 1 for the EP case, 2 for the non-EP case • Task priority queues maintained: • P for the EP case, 2 for the non-EP case • Each ready task is added to 3 of these queues: • 1 for the EP case, 2 for the non-EP case • Processor queues: • 1 for the EP case, 1 for the non-EP case
Complexity • Originally O(W (E + V) P), now O(V (log W + log P) + E) • Can be further reduced using a partially sorted priority queue; a sorted part of size P is required to maintain comparable performance: O(V log P + E)
Conclusion • LSSP can be performed at a significantly lower cost: • Processor selection considers only two processors: the enabling processor and the processor becoming idle first • Task selection sorts only a limited number of tasks • Using an extension of this method, LSDP complexity can also be reduced • For large program and processor dimensions, this yields a superior cost-performance trade-off.
Thank You Questions?