490 likes | 637 Views
Energy Conservation and Adaptation in Mobile and Embedded Systems. Faculty: Kang G. Shin Grad students: Babu Pillai Hai Huang Sabbatical leave from Real-Time Computing Laboratory University of Michigan http://www.eecs.umich.edu/~kgshin.
E N D
Energy Conservation and Adaptationin Mobile and Embedded Systems Faculty: Kang G. Shin Grad students: Babu Pillai Hai Huang Sabbatical leave from Real-Time Computing Laboratory University of Michigan http://www.eecs.umich.edu/~kgshin
My Current Thrust Areas • Wired and Wireless QoS Networking • Internet QoS with DiffServ, MPLS, overlay networks • Dependable real-time protocols • Wireless QoS and security protocols • Embedded Systems • Model-based integration of application SW • Modeling and integration of OS and network services • Energy-aware real-time OS and applications • Embedded systems networks • Internet Servers and Adaptware • Workload differentiation and protection • Overload detection and avoidance • DDoS attacks
Outline • Motivation • Energy-efficient real-time OSs • Real-time dynamic voltage scaling • Energy-aware QoS • Conclusions
Motivation • Handheld and mobile comp and comm devices are everywhere! • Increasingly complex SW and faster HW demand more energy • Rapid increases in HW complexity, speed, and power consumption, but battery technology is not keeping up • Need to conserve energy, improve computational efficiency via OS on power-constrained systems
Real-Time & Energy Constraints • Many power-constrained embedded or mobile systems have real-time tasks • Time-critical computations, typically periodic • Need to provide guarantees for meeting deadlines • Available stored energy fundamentally limits system runtime • Need to use energy efficiently, allocate to the most critical or desirable computations, while meeting real-time requirements
Real-Time Task Characteristics • Typically well-defined task set • Canonical real-time task, Ti: • Is periodic, with period Pi • Has worst-case execution time (WCET), Ci • Has relative deadline, di typically equal to Pi • Periodic model can accommodate aperiodic and sporadic tasks • Schedulability of RT systems well-studied
Energy-Efficient RTOS • Reduce overhead services => lower computational overhead =>lower CPU power consumption • Optimized IPC for periodic RT tasks • Combined Static Dynamic (CSD) scheduling • Protocol stack layer-bypassing • Eliminate naming services • Reported at ACM SOSP’99, NOSSDAV’00, USENIX’02 • Exploit HW mechanisms, e.g., voltage scaling of CPU, power management of memory subsystem
Memory Power Management • Goal: reduce power dissipation for memory access • Main memory consists of multiple devices, each with independently-controlled power states • Switch devices not needed for current task to low-power states • Modify page allocation to reduce number of devices in use by each task • 59-94% memory power reduction with RDRAM • Will present at USENIX’03
RT-DVS • Goal: reduce per-cycle CPU energy costs • Reducing frequency permits lower voltage • Lower voltage on CPU to obtain V2 savings per cycle • Frequency change affects execution time, disrupts RT scheduling • Have developed energy-conserving algs for DVS that preserve RT guarantees (ACM SOSP’01)
CPU Operating Voltage • Predominant device technology is CMOS • E V 2 • Maximum gate delays inversely related to voltage Can reduce unit computation energy by reducing frequency and voltage
Dynamic Voltage Scaling (DVS) • [Weiser+94] busy system increase frequency idle system reduce frequency • Many algorithms for non-RT applications • Software adjustable PLL, voltage regulator • often available, but intended for other things • XScale, SpeedStep, PowerNow!, Crusoe
Our Approach • Development of real-time DVS (RT-DVS) • DVS algorithms that maintain RT guarantees • Simple enough for online scheduling • Work closely with existing RT sched algs. • In-depth simulation • Implementation in a real, working system
Static Voltage Scaling • Earliest-Deadline-First (EDF) • Worst-case utilization: ui = Ci / Pi • Frequency selection: ui f / fmax 1.00 0.75 u3= 0.4 0.50 utilization u2= 0.2 f = 0.9*fmax 0.25 u1= 0.3 0.00
Simulation • Parameterized simulation • Synthetic random real-time task sets • uniform distribution of short (1-10), medium (10-100), long (100-1000ms) periods • Vary computation time distributions • Vary hardware specifications • Compare different scheduling algorithms and theoretical bounds.
Simulation Setup • Input: task set, system parameters, sched alg • Output: energy consumption of each alg • System parameters: • list of freqs and voltages • actual fraction of WCET for each task • idle level (idle vs. normal op energy consumed) • Theoretical lower bounds: • task execution thruput only w/o timing issues
Simulation Results • 8 tasks in a task set • 100 random task sets • Workload = total worst-case utilization • 3 freq./volt. settings: • 5V, 1.0*fmax • 4V, 0.75*fmax • 3V, 0.5*fmax
Dynamic Scaling If each task uses less than WCET, we can use lower frequency during its invocation interval u3= 0.2 f = 0.7*fmax 1.00 0.75 u3= 0.4 utilization 0.50 u2= 0.2 f = 0.9*fmax 0.25 u1= 0.3 0.00 time (k-1)P3 kP3
Task 1 Task 2 Task 3 Cycle-Conserving EDF • fdesired = fmax* ui • at Ti release set ui = Ci / Pi • at Ti finish set ui = actual execution time / Pi u1 = 0.3 u2 = 0.2 u3 = 0.4 U=0.9 U=0.8 U=0.7 U=0.7 1.00 U=0.7 U=0.6 0.75 C1/3 U=0.6 U=0.5 C3/2 f / fmax 0.50 C2/2 C1/3 0.25 0.00 time D1 D2 D3
Simulation Results • 8 tasks in a set • 70% WCET each invocation • 3 freq./volt. settings: • 5V, 1.0*fmax • 4V, 0.75*fmax • 3V, 0.5*fmax
Proactive Techniques • Tasks typically use much less than WCETs • Proactively reduce frequencies • Look ahead to meet future deadlines • Considerall tasks together
Task 1 Task 2 Task 3 D1 D2 D3 D1 D2 D3 time D1 D2 D3 time Look-Ahead EDF • Minimize current frequency • Trade current savings for potential future loss • Plan to defer work beyond next immediate deadline • Ensure future deadlines with “reservation” time D1
Simulation Results • 8 tasks in a set • 70% WCET each invocation • 3 freq./volt. settings: • 5V, 1.0*fmax • 4V, 0.75*fmax • 3V, 0.5*fmax
More Simulation Results • Number of tasks is not important • Voltage and frequency settings greatly affect performance • Look-ahead does not always perform best • Algorithms can perform close to theoretical lower bound
Implementation • PC notebook computer • AMD K6-2+ processor, 550 MHz • PowerNow! Technology: • frequency can be changed 200-550MHz in 50MHz increments • voltage selection 1.4V or 2.0V • empirical mapping between voltage and frequency • switching overheads: 0.4ms (voltage), 41 micros (freq) • Processing 20W, screen backlight 7W
RT Task Set Software Architecture • Real-time extension to Linux 2.2 • Modular design, plug-in schedulers PowerNow module Real-Time module Scheduler module Linux 2.2 Kernel Hardware
Measurements • Use oscilloscope with current probe • Synthetic RT workload w/ backlighting off • Similar to simulation results (20-40% savings)
Interesting Observations • The very first invocation of a task may overrun its WCET due to ‘cold’ processor OS states • Dynamic addition of a task may cause transient deadline misses, especially with more aggressive schemes • Insert the task immediately, but release it after completion of current invocations of all existing tasks
Related work Most are loosely-coupled with OS and based on avg processor utilization • Many non-RT DVS papers (esp., UCB) • Offline WCET analysis + online heuristic • [Krishna+00], [Swaminathan+00] • Computation time probability heuristics • [Gruian01] • Compiler-based, application-level DVS • [Mosse+00]
RT-DVS Discussion • Designed and evaluated 5 DVS algorithms for real-time systems (ACM SOSP’01) • Provide deadline guarantees while scaling frequency and voltage • Simple enough to be used as online schedulers • Excellent energy savings, comparable to non-RT DVS • Implemented in real system on top of Linux • Code available at: http://kabru.eecs.umich.edu/rtos/ • But, we need a larger energy framework
Limits of Low-level Techniques • DVS & processor halt conserve energy only when extra capacity available • No general guidelines on how to make best use of limited energy • Cannot provide more energy and runtime to more critical or valuable tasks • Need to adapt app workload to maximize system gains or utility of computation
Example • A remote surveillance device transmits compressed video and audio • Solar-powered, but must run overnight • 3 real-time tasks: • Radio transmitter (critical): constant bit rate • Video codec (degradable): -high quality (30 fps, 640x480) MPEG4 -low quality (10 fps, 160x120) MPEG1 • Audio codec (noncritical): mp3, either ON or OFF
Example, cont’d • Adapt task set based on power consumption of tasks, available energy, hours until daylight, and relative value of the tasks, e.g., • During daytime or high battery levels: radio, video at high quality, audio ON • Low battery at night: radio, video low quality, audio ON • Energy is critically low: radio, video low quality, audio OFF • Dynamic adaptation needed in general, as battery levels and time until daylight are variable
EQoS • Need to maximize benefits gained from energy spent, but HOW? => Energy-aware Quality of Service (EQoS): • Vary per-task QoS, which directly affects the task’s utility and energy consumption • Select a set of task QoS levels to maximize total system utility over given runtime • Cast selection into tractable, optimization problem
EQoS Design • EQoS design goals: • Leverage low-level halt, DVS mechanisms • Meet system runtime goal • Maximize benefits of task execution • Need methods of changing task QoS levels and specifying benefits and energy requirements
How to change per-task QoS? • Adopt RT fault-tolerance techniques: • Period extension • Imprecise computation • Apply different algorithms or CODECs • Omission • Degraded service requires less energy • For EQoS, need to specify set of QoS levels and their average required energy for each task
Utility • Abstract notion of value from executing tasks • Need to specify utility for each QoS level of a task: • Increasing Rewards for Increasing Service (IRIS) • Performance Index (PI) for control applications • Perceived quality metrics for multimedia • Actual specification flexible to types of applications and systems designed
EQoS Algorithms • Given: • tasks with QoS levels defined, energy required, and utility gained for each level • remaining system energy • desired runtime, or known time until recharge • Select a QoS level for each task to: • achieve desired runtime • maximize total utility • This can be formulated as a MCKP • Each task as a category, and the set of its QoS levels as items in the category • Knapsack size = power budget • Item values and weights = utility rates and power consumption
... Task Category Value Weight Energy Energy ... Task Category Value Weight Energy Energy ... Energy Task Category Runtime Constraint Value Weight Energy Energy MCKP vs. EQoS Problem EQoS MC Knapsack Problem Item 1 Value Weight Item 2 Value Weight ... Weight Limit Item n Value Weight
Optimal Algorithms • NP-hard: all KP can be expressed as MCKP • Exponential Search - O(mn) • Branch-and-Bound (BB) • Need fast bound computation • Can use LMCKP as upper bound • May still require exponential time ...
Optimal Algorithms, cont’d • Dynamic Programming (DP) • Pseudo-polynomial time, O(mnk) • Partial solutions for 1, 2, …, n tasks for all possible power budgets (energy/runtime)
Heuristics • Linear: • Use LMCKP solution, as with BB bound • Drop fractional part • Greedy: • Start with same approach as LMCKP • Continue selecting smaller upgrades • O(nm) overhead, without upgrades sorting
Simulation • Permits exploring a large space multi-dimensional task set space • Simulate various hardware configurations, RT scheduling, DVS mechanisms • Static RM, Static EDF, ccRM, ccEDF, laEDF • Generated 1000 random task sets, each with 10 tasks, and each of which has up to 5 QoS levels • QoS degradation models period extension, imprecise computation, algorithmic change
Simulation Results • EQoS algorithms achieve desired runtime • DVS conserves extra energy, throws off estimated runtime
Simulation Results - DVS • DVS increases energy efficiency • Throws off adaptation -- extends runtime • 3 volt/freq: • 5V, 1.0*fmax • 4V, .75*fmax • 3V, .35*fmax
Simulation Results, cont’d • DVS compensation achieves desired runtime with higher utility Utility comparison between DVS compensation and w/o compensation Adaptation w/ DVS Compensation
Implementation • Implemented on Linux 2.2 • Periodic real-time support • PowerNow! driver • Real-time scheduler modules • EQoS adaptation module • Battery monitoring module • Currently supports Athlon, Duron, K6-2 processors that implement AMD’s PowerNow! Technology
Experiments • Measurements on a Compaq Presario 1200Z • Implement RT version of Lame MP3 encoder • use quality parameter to vary QoS • multiple concurrent instances • Results follow trend observed in simulations
Conclusions • RT-DVS provides low-level CPU voltage control • Maintains timing guarantees for RT tasks • Significant energy savings, comparable to non-RT DVS • EQoS provides task adaptation in energy-constrained real-time systems • Provides guidelines to best utilize available energy among tasks • Frame energy adaptation as a tractable problem • Heuristics work nearly as well as optimal algorithms in practice