Dynamic Voltage Frequency Scaling for Multi-tasking Systems Using Online Learning
ISLPED 2007
Gaurav Dhiman, Tajana Simunic Rosing
Department of Computer Science and Engineering, University of California, San Diego
Why Dynamic Voltage Frequency Scaling? • Power consumption is a critical issue in system design today • Mobile systems face battery life issues • High-performance systems face heating issues • Dynamic Voltage Frequency Scaling (DVFS): dynamically scales the CPU supply voltage and frequency to provide “just enough” circuit speed to process the workload • An effective system-level technique for reducing power consumption • Dynamic Power Management (DPM) is another popular system-level technique; however, the focus of this work is on DVFS
Previous Work • Based on task level knowledge: • [Yao95],[Ishihara98],[Quan02] • Based on compiler/app. support: • [Azevedo02],[Hsu02],[Chung02] • Based on micro-architecture level support: • [Marculescu00],[Weissel02],[Choi04], [Choi05]
Workload Characterization and Voltage-Frequency Selection • No hard task deadlines exist in general-purpose systems. • Goal: maximize energy savings while minimizing performance delay. • Key idea: • CPU-intensive tasks don't benefit from scaling • Memory-intensive tasks are energy efficient at low v-f settings
Workload Characterization and Voltage-Frequency Selection (contd.) • Three tasks, burn_loop (CPU-intensive), mem (memory-intensive) and combo (a mix of the two), are run with static scaling. • burn_loop is energy efficient at all settings • mem is energy efficient at the lowest v-f setting
Measure CPU-intensiveness (µ) • CPI stack: CPI_avg = CPI_base + CPI_cache + CPI_tlb + CPI_branch + CPI_stall • Use the Performance Monitoring Unit (PMU) of the PXA27x to estimate the CPI stack components. • µ = CPI_base / CPI_avg • A high µ indicates high CPU-intensiveness and vice versa
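A minimal sketch of how µ might be derived from PMU readings. The function name, the counter values, and the breakdown of stall cycles into categories are illustrative assumptions, not the PXA27x register interface.

```python
# Sketch: estimate CPU-intensiveness (mu) from CPI-stack components.
# The counter values below are hypothetical PMU readings for one
# scheduler quantum; the PXA27x register interface is not shown.

def estimate_mu(instructions, total_cycles, stall_cycles):
    """Return mu = CPI_base / CPI_avg for one scheduler quantum.

    stall_cycles is a dict of cycles lost to cache, TLB, branch and
    other stalls, i.e. everything that is not base execution.
    """
    cpi_avg = total_cycles / instructions
    cpi_base = (total_cycles - sum(stall_cycles.values())) / instructions
    return cpi_base / cpi_avg  # high mu -> CPU-intensive, low mu -> memory-intensive

# Example with hypothetical numbers:
mu = estimate_mu(
    instructions=1_000_000,
    total_cycles=2_500_000,
    stall_cycles={"cache": 900_000, "tlb": 50_000, "branch": 150_000},
)
print(f"mu = {mu:.2f}")  # ~0.56 here, a moderately memory-bound quantum
```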
Dynamic Task Characterization • Dynamically estimate µ for every scheduler quantum and feed it to the online learning algorithm. • The algorithm models the CPU-intensiveness of the task and accordingly selects the best-suited v-f setting. • A theoretical guarantee is available on convergence to the best v-f setting.
Online Learning for Horse Racing (analogy) • Each expert manages money for the race • The gambler evaluates the performance of all experts for that race • He selects the best-performing expert for investing his money
Online Learning for DVFS • DVFS experts (working set): v-f setting 1, v-f setting 2, …, v-f setting n • The DVFS controller evaluates the performance of all experts and selects the best-performing one • The selected expert is applied to the CPU for the next scheduler quantum
Controller Algorithm
Parameters: β ∈ [0, 1); an initial weight vector for the experts such that the weights sum to 1.
On every scheduler tick, do for t = 1, 2, 3, …:
1. Calculate µ for the quantum that just ended.
2. Update the weight vector of the task: w_i^(t+1) = w_i^t · (1 − (1 − β) · l_i^t), where l_i^t is the loss of expert i (see the loss calculation on the next slide and the sketch below).
3. Choose the expert with the highest probability factor r_i^t = w_i^t / Σ_j w_j^t.
4. Apply the v-f setting corresponding to the selected expert to the CPU.
5. Reset and restart the PMU.
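A minimal sketch of one controller step in Python, assuming the per-expert losses for the previous quantum are already available (their computation is described on the next slide). The class name, the default value of β, and the plain-list weight vector are illustrative choices, not details from the paper.

```python
class DVFSController:
    """Sketch of the online-learning DVFS controller (multiplicative-weight update)."""

    def __init__(self, num_experts, beta=0.75):
        # beta in [0, 1) controls how strongly experts are penalized (assumed value).
        assert 0.0 <= beta < 1.0
        self.beta = beta
        # Initial weight vector: uniform, sums to 1.
        self.weights = [1.0 / num_experts] * num_experts

    def tick(self, losses):
        """One scheduler tick: update the weights with the losses l_i of the
        last quantum and return the index of the expert (v-f setting) to
        apply for the next quantum."""
        # Step 2: w_i <- w_i * (1 - (1 - beta) * l_i)
        self.weights = [w * (1.0 - (1.0 - self.beta) * l)
                        for w, l in zip(self.weights, losses)]
        # Step 3: pick the expert with the highest probability factor r_i.
        total = sum(self.weights)
        probs = [w / total for w in self.weights]
        return max(range(len(probs)), key=probs.__getitem__)
```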
Evaluation of Experts (loss calculation)
[Figure: the µ axis from 0 to 1.0 is divided into equal intervals, one per expert, each characterized by its mean value µ_mean]
• Intuition: the best-suited frequency scales linearly with µ.
• Map task characteristics to the best-suited frequency using a µ-mapper, e.g. Expert1–5 = {100, 200, 300, 400, 500} MHz.
• Evaluate the experts against the best-suited frequency, as in the sketch below.
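A sketch of how a µ-mapper and per-expert losses could look, following the slide's intuition that the best-suited frequency scales linearly with µ. The distance-based loss formula here is an illustrative assumption; the paper's exact loss function may differ.

```python
# mu-mapper and per-expert loss, following the intuition that the
# best-suited frequency scales linearly with mu. The distance-based
# loss below is an illustrative assumption.

FREQS_MHZ = [100, 200, 300, 400, 500]  # Expert1-5 from the slide

def mu_mapper(mu):
    """Map mu in [0, 1] to the best-suited frequency (linear scaling)."""
    idx = min(int(mu * len(FREQS_MHZ)), len(FREQS_MHZ) - 1)
    return FREQS_MHZ[idx]

def expert_losses(mu):
    """Loss in [0, 1] for each expert: distance of its frequency from the
    best-suited one, normalized by the total frequency range."""
    best = mu_mapper(mu)
    span = FREQS_MHZ[-1] - FREQS_MHZ[0]
    return [abs(f - best) / span for f in FREQS_MHZ]

# Example: a fairly CPU-intensive quantum.
losses = expert_losses(mu=0.85)  # best-suited frequency is 500 MHz
# losses == [1.0, 0.75, 0.5, 0.25, 0.0] -- these are the l_i fed to tick()
```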
What about Multi-tasking Systems? • Tasks with differing characteristics may execute together. • The weight vector (w^t) characterizes an executing task. • This information needs to be kept at task level for accurate characterization. • Solution: store the weight vector in a per-task data structure, as sketched below
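A sketch of per-task weight storage, reusing the DVFSController class from the earlier sketch. A dictionary keyed by process id stands in for a field added to the kernel's per-task structure; the function name is illustrative.

```python
# Keep one controller (weight vector) per task so its characterization
# survives context switches. A dict keyed by pid stands in for a field
# in the kernel's per-task structure.

controllers = {}

def on_scheduler_tick(pid, losses, num_experts=5):
    """Look up (or create) the controller state of the task that just ran
    and advance it by one tick, returning the v-f setting index to apply."""
    ctrl = controllers.setdefault(pid, DVFSController(num_experts))
    return ctrl.tick(losses)
```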
Performance Bound on Controller
• The performance of the scheme converges to that of the best-performing expert with successive scheduler ticks.
• Let N be the number of experts in the working set and T the total number of scheduler ticks.
• If l_i^t is the loss incurred by expert i for scheduler quantum t, the loss of the controller for that quantum is l_G^t = r^t · l^t, where r^t is the vector of probability factors.
• Goal: minimize the net loss L_G − min_i L_i, where L_G = Σ_t r^t · l^t and L_i = Σ_t l_i^t.
• The net loss is bounded, so the average net loss per quantum decreases as T grows.
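For reference, the standard bound for this family of multiplicative-weight updates (Freund and Schapire's Hedge analysis, which the update rule above matches) can be written as below; whether the paper states it in exactly this form is not confirmed here.

```latex
% Standard Hedge-style bound for the update w_i^{t+1} = w_i^t (1-(1-\beta) l_i^t),
% losses l_i^t in [0,1], N experts, T scheduler ticks:
L_G \le \frac{\ln N + \ln(1/\beta)\,\min_i L_i}{1-\beta}.
% With \beta tuned appropriately this yields a net loss of
L_G - \min_i L_i \le \sqrt{2\,\min_i L_i\,\ln N} + \ln N = O\!\left(\sqrt{T \ln N}\right),
% so the average net loss per scheduler tick decreases as O(\sqrt{\ln N / T}).
```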
Implementation
• Testbed: Intel PXA27x Development Platform running Linux 2.6.9
• Implemented as a Loadable Kernel Module (LKM)
[Architecture diagram: user space interacts with the DVFS LKM through the /proc file system; the Linux process manager notifies the LKM on task creation and on each scheduler tick; the LKM reads the PMU and applies the v-f setting on the Intel PXA27x]
Experiments • Setup • 1.25 samples/sec DAQ • Energy savings calculated using actual current measurements • Working set: 4 v-f setting experts • Workloads: • qsort • djpeg • blowfish • dgzip
Result: Frequency of Expert Selection for qsort
[Plot: how often each v-f setting expert is selected for qsort; annotations indicate the directions of lower performance delay and higher energy savings]
Advantages of the Scheme • Online learning algorithm: provides a theoretical guarantee that performance converges to that of the best-performing expert. • Multi-tasking systems: works seamlessly across context switches. • User preference: adapts the energy savings/performance delay tradeoff as the user preference changes.
Overhead • Process creation: measured with lat_proc from lmbench; 0% overhead • Context switch: measured with lat_ctx from lmbench; 3% overhead with 20 processes (the maximum supported by lat_ctx) • [Choi05] causes 100% overhead in context-switch times • Extremely lightweight implementation.
Conclusion • Designed and implemented a DVFS technique for general-purpose multi-tasking systems. • Based on online learning, which provides a theoretical guarantee on the convergence of overall performance to that of the best-performing expert. • Provides user control over the desired energy/performance tradeoff and is extremely lightweight.