Power Aware Scheduling for AND/OR Graphs in Multi-Processor Real-Time Systems
Dakai Zhu, Nevine AbouGhazaleh, Daniel Mossé and Rami Melhem
PARTS Group, Computer Science Department, University of Pittsburgh
Presenter: Dakai Zhu
http://www.cs.pitt.edu/PARTS
Motivation
• Complex satellite and surveillance systems
  • Real-time processing
  • Limited energy
  • Multi-processor (homogeneous/heterogeneous)
• Application: automated target recognition (ATR)
  • Workload varies across different execution paths
  • The traditional AND-model is not enough; an AND/OR model is needed
• Very few works on power management for multiprocessors
  • Static: Gruian-2000
  • Dynamic: Yang-2001, Zhu-2001
Power Management
• Why and what is power management?
  • Battery-operated devices: laptops, PDAs and cell phones
  • Heat: complex servers (multiprocessors)
  • Power aware: maintain QoS while reducing energy
• How?
  • Power off unused parts: the LCD and disk of a laptop
  • Gracefully reduce performance (CPU only)
  • Dynamic power: Pd = Cef * Vdd^2 * f [Chandrakasan92, Burd96]
    • Cef: effective switched capacitance
    • Vdd: supply voltage
    • f: processor frequency, roughly linear in Vdd
  • (A small numeric sketch of this relationship follows below.)
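Because Vdd tracks f, dynamic power grows roughly cubically with speed, so halving the speed and running twice as long costs about a quarter of the energy. A minimal sketch of that arithmetic, assuming a perfectly linear Vdd-to-f relationship and a made-up Cef value:

```python
F_MAX = 700e6     # Hz, top Transmeta level from the slides
VDD_MAX = 1.65    # V, supply voltage at F_MAX
C_EF = 1e-9       # F, made-up effective switched capacitance

def dynamic_power(f_ratio):
    """Pd = Cef * Vdd^2 * f, with Vdd assumed linear in frequency."""
    return C_EF * (VDD_MAX * f_ratio) ** 2 * (F_MAX * f_ratio)

def energy(cycles, f_ratio):
    """Energy to execute a fixed number of CPU cycles at speed f_ratio * F_MAX."""
    return dynamic_power(f_ratio) * (cycles / (F_MAX * f_ratio))

print(energy(1e9, 0.5) / energy(1e9, 1.0))   # ~0.25: half speed, quarter energy
```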
Power Aware Scheduling
• Static Power Management (SPM) on uniprocessors
  • Static slack: the idle time left before the deadline D when all tasks run at fmax
  • Uniformly slow down all tasks to fill that slack [Weiser-1994, Yao-1995, Gruian-2000]
[Figure: uniprocessor schedules of T1 and T2 with deadline D; running at fmax finishes early with energy E, while stretching the tasks over the static slack (down to fmax/2) reduces energy to 0.6E and E/4 in the examples]
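For a uniprocessor, the SPM speed is just the ratio of total worst-case work to the deadline. A minimal sketch, assuming a continuously adjustable speed (function name and numbers are illustrative):

```python
def spm_speed(wcets, deadline, f_max=1.0, f_min=0.1):
    """Static power management: uniformly slow all tasks so the
    worst-case schedule exactly fills the deadline."""
    ratio = sum(wcets) / deadline          # fraction of f_max actually needed
    return max(f_min, min(f_max, ratio * f_max))

# Two tasks of 4 time units each at f_max, deadline 16 -> run both at fmax/2.
print(spm_speed([4, 4], 16))   # 0.5
```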
Power Aware Scheduling (cont.)
• Dynamic Power Management (DPM) on uniprocessors
  • Dynamic slack: tasks usually need far less than their worst case, sometimes only ~10% of it [Ernst-1994]
  • DPM schemes: [Krishna-2000, Kumar-2000, Pillai-2001, Shin-2001]
• Multi-processors
  • SPM: slow down by the ratio of the schedule length to the deadline
  • DPM: ???
[Figure: uniprocessor schedules with deadline D; static slack reclaimed at fmax/2 gives E/4, and when T1 finishes early its dynamic slack lets T2 run at fmax/3 for about 0.12E]
Outline
• Motivation and Background
• AND/OR Model Application
• Greedy Algorithms and Slack Stealing
• Speculative Algorithms
• Evaluation and Analysis
• Conclusion
Application: AND/OR Model
• Real-time application
  • Set of tasks Ti with a single deadline
  • Directed acyclic graph (DAG)
• Node types
  • Computation node: (ci, ai) — worst-case and average execution times
  • AND node: (0,0) — all successor branches execute
  • OR node: (0,0) — one successor branch executes, chosen with known probabilities
[Figure: the sample application — T1 (1, 2/3) branches with probabilities 40%/60% to T2 (2,1) and T3 (1,1) on one side and T4 (4,2) and T5 (3,2) on the other, joining at T6 (1,1) and finishing with T7 (1,1)]
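A minimal sketch of how such an AND/OR graph might be represented, reconstructed loosely from the figure; the class names and fields are illustrative, not the authors' data structures, and (ci, ai) are read as worst-case and average execution times:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    name: str
    wcet: float                 # ci: worst-case execution time
    avg: float                  # ai: average execution time
    succs: List["Task"] = field(default_factory=list)

@dataclass
class OrBranch:
    prob: float                 # probability this branch is taken
    tasks: List[Task]           # tasks on the branch (run with AND semantics)

# Loose reconstruction of the sample application: T1, then an OR choice between
# the {T2, T3} path (40%) and the {T4, T5} path (60%), joining at T6 and T7.
t7 = Task("T7", 1, 1)
t6 = Task("T6", 1, 1, succs=[t7])
left = OrBranch(0.4, [Task("T2", 2, 1, [t6]), Task("T3", 1, 1, [t6])])
right = OrBranch(0.6, [Task("T4", 4, 2, [t6]), Task("T5", 3, 2, [t6])])
t1 = Task("T1", 1, 2/3)
or_successors = [left, right]   # the OR node that follows T1
```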
Problem Statement
• System model
  • Multi-processor with DVS-capable CPUs
  • Shared memory (ready queue, etc.)
• Scheduling algorithm?
  • Use DVS to reduce energy
  • Meet the timing requirement
  • Partition or global scheduling?
[Figure: system model — DVS-capable processors P connected to a shared memory SM(Q, etc.)]
Slack Stealing
• Shifting the static schedule: 2 processors, D = 8
  • Push each task as late as possible toward the deadline, exposing the global static slack L0 and the per-branch slack L1
  • Recursive if there are embedded OR nodes
[Figure: two-processor Gantt charts over time 0–8, before and after shifting; tasks T1–T7 are shifted right toward the deadline D, exposing the slack intervals L0 and L1]
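A minimal sketch of the shifting idea, simplified to the tasks on one processor and assuming the last task's latest finish time is the deadline; the multi-processor case also has to respect cross-processor precedence, which is omitted here:

```python
def latest_start_times(schedule, deadline, f_max=1.0):
    """Shift a static schedule (list of (name, wcet) in execution order on one
    processor) as late as possible, exposing each task's latest start time."""
    lst = {}
    finish = deadline
    for name, wcet in reversed(schedule):
        start = finish - wcet / f_max   # latest the task may start and still finish
        lst[name] = start
        finish = start
    return lst

# One processor of the example, deadline D = 8:
print(latest_start_times([("T1", 1), ("T2", 2), ("T6", 1), ("T7", 1)], 8))
# {'T7': 7.0, 'T6': 6.0, 'T2': 4.0, 'T1': 3.0} -> static slack of 3 before T1
```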
Proposed Algorithms
• Greedy slack stealing, two phases:
  • Off-line: longest-task-first heuristic for the static schedule; slack stealing via shifting to compute LSTi for every Ti
  • On-line:
    • Keep the same execution order
    • When Ti becomes ready at time ti (ti ≤ LSTi), claim the slack LSTi − ti
    • Compute Ti's reduced speed from the claimed slack (a hedged sketch follows below)
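The speed formula itself is an image in the original slide, so the sketch below uses the form implied by the text (stretch Ti's worst-case execution over its original slot plus the claimed slack); treat it as an assumption rather than the authors' exact expression:

```python
def greedy_speed(wcet, lst, t_now, f_max=1.0, f_min=0.1):
    """Greedy slack stealing: Ti starts at t_now <= lst and absorbs the
    slack lst - t_now, so its speed can drop below f_max."""
    slack = max(0.0, lst - t_now)
    allotted = wcet / f_max + slack      # worst-case slot plus claimed slack
    return max(f_min, min(f_max, wcet / allotted))

# T2 (wcet 2) may start as late as 4 but is ready at 1: run it at 0.4 * fmax.
print(greedy_speed(wcet=2, lst=4, t_now=1))   # 0.4
```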
Proposed Algorithms (cont.)
• Actual running trace: the left branch is taken and each Ti uses only its average time ai
• Possible shortcomings
  • Too greedy: early tasks run slow, forcing later tasks to run fast
  • Many speed changes (overhead)
[Figure: running trace of the left branch (T1, T2, T3, T6, T7) over time 0–8 with deadline D and slack intervals L0 and L1]
Proposed Algorithms (cont.)
• Optimal for a uniprocessor: a single speed
  • Energy is a convex function of speed
  • Energy is minimal when all tasks run at the SAME speed
• Speculation: use statistical information about the application
  • Static speculation: one speculative speed fss over all tasks; fi = max(fss, fgi)
  • Adaptive speculation: fas recomputed over the remaining tasks; fi = max(fas, fgi)
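A sketch of how the speculative floor might be formed, assuming fss spreads the expected (average-case) work over the deadline and the scheme never lets a task run slower than that floor; the paper's exact expressions are not reproduced here:

```python
def speculative_speed(avg_times, deadline, f_max=1.0, f_min=0.1):
    """Speculative speed: provision for the average-case workload
    instead of the worst case."""
    f = sum(avg_times) / deadline * f_max
    return max(f_min, min(f_max, f))

def task_speed(f_spec, f_greedy):
    """Never drop below the speculative floor: fi = max(fspec, fgi)."""
    return max(f_spec, f_greedy)

# Static speculation computes fss once over all tasks;
# adaptive speculation recomputes fas over the remaining tasks at each step.
fss = speculative_speed([2/3, 1, 1, 1, 1], deadline=8)
print(task_speed(fss, f_greedy=0.4))
```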
Simulations
• Schemes considered
  • NPM: no power management (baseline); idle power = 5% of Pmax
  • GSS: greedy slack stealing
  • SS: static speculation on top of greedy
  • AS: adaptive speculation on top of greedy
• Parameters
  • Number of processors
  • Load: worst-case schedule length over deadline (sets the global static slack L0)
  • Alpha: average task run time over WCET (sets the dynamic slack)
  • Speed-change overhead: 5 µs per change
• Processor models
  • Transmeta: 16 levels, 200 MHz (1.10 V) – 700 MHz (1.65 V)
  • Intel XScale: 5 levels, 150 MHz (0.75 V) – 1 GHz (1.80 V)
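Both processor models expose only a handful of discrete frequency levels, so any computed speed has to be rounded up to the next available level. A small sketch of that mapping, using the XScale endpoints from the slide with hypothetical intermediate levels (the real level table is not listed here):

```python
# Endpoints from the slide; the intermediate entries are illustrative only.
XSCALE_LEVELS_MHZ = [150, 400, 600, 800, 1000]

def quantize_speed(f_mhz, levels=XSCALE_LEVELS_MHZ):
    """Round a computed speed up to the next discrete level so the task
    still finishes in time; cap at the maximum level."""
    for level in levels:
        if level >= f_mhz:
            return level
    return levels[-1]

print(quantize_speed(420))   # 600: a little faster than requested, never slower
```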
Evaluation
• ATR on 2 processors, alpha = 0.95, overhead = 5 µs/change, Transmeta processor model
[Figure: energy results; panels for more static slack and less static slack]
Evaluation (cont.)
• ATR on 2 processors, alpha = 0.95, overhead = 5 µs/change, Intel XScale processor model
[Figure: energy results; panels for more static slack and less static slack]
Evaluation (cont.)
• Synthetic application on 2 processors, load = 0.8, overhead = 5 µs/change, Transmeta processor model
[Figure: energy results; panels for more dynamic slack and less dynamic slack]
Evaluation (cont.)
• Synthetic application on 2 processors, load = 0.8, overhead = 5 µs/change, Intel XScale processor model
[Figure: energy results; panels for more dynamic slack and less dynamic slack]
Conclusion and Contributions
• Conclusions
  • Significant energy savings from the dynamic schemes
  • Greedy is good
  • fmin keeps the greedy scheme from slowing early tasks down too far
  • A small number of speed levels reduces the probability of speed changes (less overhead)
• Contributions
  • Greedy slack stealing algorithm for the AND/OR model
  • Speculative algorithms for the AND/OR model
Questions?
http://www.cs.pitt.edu/PARTS