Power Reduction Techniques for Microprocessor Systems by Timothy Goldberg Paper by: Vasanth Venkatachalam and Michael Franz Published 2005
Power Consumption and its Importance • Saving Power • Save money, save electricity, save the planet • Heat Dissipation • Heat density and cooling • Battery Life • Use less energy, extend battery running time
Outline • Definition of Power and Energy • Power Reduction Techniques • From the Circuit level through Hardware to Compiler and Application level techniques • Commercial Systems • Emerging Technologies
Power and Energy • Need to reduce both • Power = Work / Time • Affects heat • Energy = Power * Time • Affects battery life • Dynamic Power Consumption: caused by circuit activity • Switched capacitance (depends on supply voltage V, clock frequency f, capacitance C, and activity factor a) • Clock gating (reduces switching activity) • Short-circuit current: flows while nMOS and pMOS transistors briefly conduct at the same time (10-15% of total power)
Power and Energy • Leakage Power Consumption: Static/Idle power • Depends on supply voltage and leakage current • Sub-threshold leakage depends on supply voltage, threshold voltage, and temperature • To reduce leakage: lower the supply voltage, use fewer transistors, raise the threshold voltage
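The dependence on V, f, C, and a listed above is commonly written as P_dyn = a * C * V^2 * f, with leakage power as V * I_leak. The following is a minimal sketch of those relations; the function names and all numeric values are illustrative assumptions, not figures from the paper.

```python
def dynamic_power(activity, capacitance_f, voltage_v, frequency_hz):
    """Switched-capacitance power in watts: P = a * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


def leakage_power(voltage_v, leakage_current_a):
    """Static power in watts: P = V * I_leak."""
    return voltage_v * leakage_current_a


def energy(power_w, time_s):
    """Energy in joules: E = P * t."""
    return power_w * time_s


# Illustrative values only (assumed, not measured).
base = dynamic_power(0.5, 1e-9, 1.2, 1e9)
scaled = dynamic_power(0.5, 1e-9, 0.6, 0.5e9)  # halve both V and f
```

Halving both voltage and frequency cuts dynamic power by a factor of eight in this model, because voltage enters quadratically. Energy for a fixed amount of work falls by less, since the work takes twice as long.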
Power Reduction • From low level circuit changes • Low-Power Interconnect • Memories and Memory Hierarchies • Hardware/Architecture • Dynamic Voltage Scaling • Resource Hibernation • Compiler • Application • Cross-layer
Circuit and Logic Level Techniques • Transistor Sizing: Reduce width • less dynamic power consumption, but increases delay • Transistor Reordering: Minimize switching activity • place frequently switching transistors closer to the circuit's outputs • Logic Gate Restructuring: Reduce switching • Gates must receive inputs at the same time
Circuit and Logic Level Techniques • Technology Mapping: Software tools • Find the best configuration under the given constraints • Build the circuit out of logic gates so as to minimize total power consumption • NP-hard DAG-covering problem • Low Power Flip-Flops: • Self-gating flip-flop: Reduce switching activity • Dual-edge triggered: Reduce power dissipated by clock signal
Circuit and Logic Level Techniques • Low Power Control: Processor as a FSM • Activate only the circuitry needed for the currently executing sub-FSM • Delay-Based Dynamic Supply Voltage • Look-up tables of voltages and clock speeds are built for the worst case • Instead, adjust voltage based on measured delay and monitor for errors • Requires more hardware (shadow latches)
Low-Power Interconnect • Bus Encoding: inversion to reduce switching • Crosstalk: activity in neighbor wires (shield wire) • Low Swing Buses: +300mV and -300mV instead of +5V and -5V • Immune to crosstalk, but increased hardware at encoder and decoder • Bus Segmentation: allows most of bus to remain powered down when not communicating
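The inversion idea can be sketched with the classic bus-invert scheme: if sending a word would flip more than half of the bus lines, send its complement and raise an extra "invert" line. This is a minimal sketch under assumptions (8-bit bus, bus initially all zeros, hypothetical function names), not the exact encoding any particular paper or chip uses.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width=8):
    """Return a list of (bus_value, invert_flag) pairs."""
    mask = (1 << width) - 1
    prev = 0  # assumed initial bus state
    encoded = []
    for w in words:
        w &= mask
        if hamming(prev, w) > width // 2:
            bus, inv = (~w) & mask, 1  # fewer flips if we send the complement
        else:
            bus, inv = w, 0
        encoded.append((bus, inv))
        prev = bus
    return encoded


def bus_invert_decode(encoded, width=8):
    """Recover the original words from (bus_value, invert_flag) pairs."""
    mask = (1 << width) - 1
    return [(~v) & mask if inv else v for v, inv in encoded]


def transitions(values):
    """Total bit flips on the data lines, starting from an all-zero bus."""
    prev, total = 0, 0
    for v in values:
        total += hamming(prev, v)
        prev = v
    return total
```

For the alternating sequence 0x00, 0xFF, 0x00, 0xFF the raw bus makes 24 data-line transitions, while the encoded bus makes none on the data lines and only toggles the invert line. The cost is the extra wire plus encoder and decoder logic.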
Low-Power Interconnect • Adiabatic Buses: Reuse existing charge • Reduce total capacitance • Delay in transferring charge • Network-On-Chip: • Buses shared by functional units cannot keep up with the speed and volume of transfers • Generic interconnection networks replace buses • Allow concurrent connections
Low-Power Memories and Memory Hierarchies • Reduce power regardless of type (ROM/RAM) • Split Memories into smaller Sub-Systems: activate only the needed circuits in accesses • Specialized cache to reduce accesses • Before first cache level, store application's working set • Block Buffering – store most recently accessed cache set • Scratch Pad Memories – determined by compiler • Trace cache: store instructions in executed order • Dynamic direction prediction-based trace cache • Selective Trace Cache: compiler helps
Low-Power Processor Architecture Adaptations • Adaptive Caches: lines, blocks, or sets selectively activated based on a miss threshold • Cutting the voltage to a line entirely loses its data and adds delay on the next access • Cache Decay: turns off cache lines that go unused for a set interval • Hot Spot Detection: count taken branches, activate cache lines within the hotspot • Dead Block: powers down cache lines containing basic blocks that have reached their final use (compiler-directed)
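Cache decay can be illustrated with a per-line idle counter: every access resets it, every tick increments it, and a line that stays idle for the whole decay interval is powered off and loses its contents. The sketch below assumes a direct-mapped cache and a hypothetical `DecayCache` class; real designs use coarse hardware counters rather than software loops.

```python
class DecayCache:
    """Toy direct-mapped cache whose idle lines are powered off."""

    def __init__(self, num_lines, decay_interval):
        self.decay_interval = decay_interval
        self.idle = [0] * num_lines
        self.powered = [False] * num_lines
        self.tags = [None] * num_lines

    def access(self, address):
        """Look up an address; returns True on a hit, False on a miss."""
        index = address % len(self.tags)
        tag = address // len(self.tags)
        hit = self.powered[index] and self.tags[index] == tag
        # A miss (including one caused by decay) refills and powers the line.
        self.powered[index] = True
        self.tags[index] = tag
        self.idle[index] = 0
        return hit

    def tick(self):
        """Advance time by one step and power off lines that have decayed."""
        for i in range(len(self.idle)):
            if self.powered[i]:
                self.idle[i] += 1
                if self.idle[i] >= self.decay_interval:
                    self.powered[i] = False  # leakage saved, data lost
                    self.tags[i] = None
```

The trade-off is visible in `access`: a line that decayed too early turns what would have been a hit into a miss, so the interval has to balance leakage savings against extra misses.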
Architecture Adaptations • Adaptive Instruction Queues: partitions powered down when instructions aren't needed • Heuristics: measure IPC, with thresholds • Algorithms for reconfiguring Multiple Structures: • Adjust pipeline width and register update unit for hotspots • Tests configurations within hotspot • Offline Profiling • Occupancy-based • Selective Way Caches: measure cache hits in each way
Dynamic Voltage Scaling • Modulate clock frequency and supply voltage • Dynamic, depending on workload • Difficulties: • Unpredictable workloads (tasks and I/O requests, predicting run-time) • Indeterminism – how to decide how fast? • Running an application at slowest speed may not be best • Non-linear effect of frequency
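One possible reason the slowest speed is not always the most energy-efficient is that static power is paid for the entire, longer run time. The toy model below illustrates this; it assumes voltage scales linearly with frequency (so dynamic power grows as f^3) and a constant static power, and every constant in it is hypothetical.

```python
def task_energy(freq_ghz, cycles=2e9, k_dyn=1.0, p_static_w=0.5):
    """Energy in joules to finish `cycles` at `freq_ghz` under a toy model.

    Assumptions (not from the paper): V is proportional to f, so dynamic
    power is k_dyn * f^3; static power is the constant p_static_w.
    """
    runtime_s = cycles / (freq_ghz * 1e9)
    p_dynamic_w = k_dyn * freq_ghz ** 3
    return (p_dynamic_w + p_static_w) * runtime_s


frequencies = [0.2, 0.4, 0.6, 0.8, 1.0]
best = min(frequencies, key=task_energy)
```

With these numbers the minimum falls at an intermediate frequency: below it, the static energy accumulated over the longer run time outweighs the dynamic savings.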
Dynamic Voltage Scaling • Interval-Based approaches: measure how busy the processor was in past intervals and use that to estimate future load; weak when workloads are irregular • Idling with a threshold, thrashing • Aged Averages, weighted intervals • Intertask Approaches: assign speeds for different tasks • Monitor hardware events • Frequencies for tasks are generated offline, but task behavior cannot be known perfectly beforehand • Unaware of program structure, such as memory access
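An interval-based policy with aged averages can be sketched as an exponentially weighted average of past utilization that picks the next frequency. This is a minimal sketch, assuming utilization samples in the range 0 to 1 that were all measured at the current frequency; the function names, the decay factor, and the frequency levels are hypothetical.

```python
def aged_average(utilizations, decay=0.5):
    """Exponentially weighted average; recent intervals weigh more."""
    estimate = utilizations[0]
    for u in utilizations[1:]:
        estimate = decay * estimate + (1 - decay) * u
    return estimate


def next_frequency(utilizations, current_mhz, levels_mhz, decay=0.5):
    """Pick the lowest level expected to cover the predicted load."""
    predicted = aged_average(utilizations, decay)
    needed_mhz = predicted * current_mhz
    for level in sorted(levels_mhz):
        if level >= needed_mhz:
            return level
    return max(levels_mhz)  # saturated: run at full speed
```

Because the prediction is only a smoothed history, a sudden burst of work will be under-served for an interval or two, which is the irregular-workload weakness noted on the slide.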
Dynamic Voltage Scaling • Intratask Approaches: Adjust processor speed and voltage within tasks • Split a task into fixed length Time Slots • Slow down away from critical path, help from compiler • Memory Bounded Code: memory accesses limit how fast program can execute • Heuristics through experimentation • Cache miss counter • Stall cycle counter, PC marked as hot • Measure rate of instructions, compute-intensive
Dynamic Voltage Scaling • Multiple Clock Domain Architectures: • Globally Asynchronous Locally Synchronous chip: • Chip split into multiple domains with independent clock rates • Allows certain sections of CPU to scale down when not needed • Needs to be divided such that communication between domains doesn't waste more energy • Can scale voltage based on instruction issue queues
Resource Hibernation • Disk Drives: Stop rotating the platter during idle periods • Spin down after an acceptable idle threshold • Delay non-urgent requests in a queue • Dynamic RPM Drives for servers • Network Interfaces: can it be turned off? • Track idleness of devices, enter listening or sleep mode • Let the network card stay idle for a while before shutting it down • Displays: Dim the display when there is no input • FaceOff: recognize whether a face is in front of the display • Zoned Backlighting: Adjust brightness of display regions
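The disk threshold trade-off can be made concrete with a simplified energy model: spinning down only pays off if the idle period is long enough to recoup the energy of one spin-down and spin-up cycle. The sketch below uses hypothetical names and power figures and ignores the latency cost of waking the disk.

```python
def break_even_s(spin_cycle_energy_j, idle_power_w, standby_power_w):
    """Idle time beyond which spinning down saves energy (simplified)."""
    return spin_cycle_energy_j / (idle_power_w - standby_power_w)


def idle_energy_j(idle_periods_s, threshold_s, idle_power_w,
                  standby_power_w, spin_cycle_energy_j):
    """Energy spent over idle periods with a fixed spin-down threshold."""
    total = 0.0
    for period in idle_periods_s:
        if period <= threshold_s:
            total += period * idle_power_w        # disk never spun down
        else:
            total += threshold_s * idle_power_w   # waiting out the threshold
            total += (period - threshold_s) * standby_power_w
            total += spin_cycle_energy_j          # spin down plus spin up
    return total
```

With a 2.0 W idle draw, 0.5 W standby draw, and 6 J per spin cycle, the break-even idle time is 4 seconds. Short idle periods are left alone, and long ones are where the savings come from.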
Compiler-Level Power Management • Code that reduces execution time • No fixed relationship between performance and power • Reduce memory accesses • Remote Compilation and Remote Execution • Server compiles and mobile device downloads • Cost of download must be less than compiling • Statically Optimized Compilers • Program's runtime behavior may differ from expected • Process will run on an unpredictable system
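The remote-compilation condition, that downloading must cost less than compiling locally, can be written as a simple energy comparison. This is an illustrative sketch only; the function names, parameters, and numbers are assumptions, and a real system would also account for latency and server availability.

```python
def download_energy_j(binary_bytes, bandwidth_bytes_per_s, radio_power_w):
    """Energy the mobile device spends receiving the compiled binary."""
    return binary_bytes / bandwidth_bytes_per_s * radio_power_w


def local_compile_energy_j(compile_time_s, cpu_power_w):
    """Energy the mobile device would spend compiling by itself."""
    return compile_time_s * cpu_power_w


def should_compile_remotely(binary_bytes, bandwidth_bytes_per_s,
                            radio_power_w, compile_time_s, cpu_power_w):
    """True when fetching the binary is cheaper than compiling locally."""
    remote = download_energy_j(binary_bytes, bandwidth_bytes_per_s,
                               radio_power_w)
    local = local_compile_energy_j(compile_time_s, cpu_power_w)
    return remote < local
```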
Compiler-Level Power Management • Dynamic Compilation: Program recompiled as the runtime environment changes • Resource levels such as battery capacity and energy budgets • Trade-off: recompilation itself costs time and energy
Application-Level Power Management • Enable application to adapt to runtime environment • Trading off fidelity or quality of data to users • Lower QoS when resources are low • Interfaces to allow applications to provide hints • Allow application to communicate with OS, and OS with hardware • Expected execution of tasks, deadlines • Better DVS, power down disk for longer periods of time
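Trading fidelity for battery life can be sketched as picking the best quality level whose power draw fits the remaining energy budget. The level names, power figures, and `pick_quality` function below are hypothetical and serve only to illustrate the idea.

```python
def pick_quality(levels, battery_j, remaining_s):
    """Choose a quality level that lets the battery last `remaining_s`.

    levels: list of (name, power_w) pairs ordered from best to worst quality.
    """
    budget_w = battery_j / remaining_s
    for name, power_w in levels:
        if power_w <= budget_w:
            return name
    return levels[-1][0]  # nothing fits: fall back to the lowest quality


levels = [("high", 3.0), ("medium", 2.0), ("low", 1.0)]
```

For example, with 7200 J left and one hour of playback still to go, the budget is 2 W, so the sketch selects the "medium" level.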
Cross-Layer Adaptations • Forge: integrated power management framework • Streams videos at most efficient QoS level • Frequency and voltage scaling, network card interface • Grace: adaptation framework • Global and local adaptations • Compiler and Operating System interaction • Compiler has a worst-case deadline • OS adjusts processor speed to meet deadline
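The compiler and OS interaction can be illustrated as follows: given a worst-case cycle count for a task (the kind of information a compiler could supply), the OS picks the slowest available speed that still meets the deadline. This is a sketch under assumptions; the function name and the frequency table are hypothetical.

```python
def slowest_speed_meeting_deadline(worst_case_cycles, deadline_s, levels_hz):
    """Return the lowest frequency that finishes in time, or None."""
    for f in sorted(levels_hz):
        if worst_case_cycles / f <= deadline_s:
            return f
    return None  # even the fastest level misses the deadline


levels_hz = [4e8, 6e8, 8e8, 1e9]
```

Since the estimate is a worst case, tasks usually finish early, which leaves slack that could be reclaimed by slowing down further.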
Conclusion of Techniques • Multifaceted effort from various disciplines • From transistors to applications, and across all layers • Still ongoing research, new algorithms and heuristics • Impossible to tell what new technologies will prove most successful
Commercial Systems • Pentium 4: high performance goal • Internal temperature cap • Intel SpeedStep with 2 frequency and voltage settings • Pentium M: mobile performance and low power • Reduce switching activity in circuits, idle units, and buses • Low leakage transistors in cache • Enhanced SpeedStep with 6 frequency/voltage settings • Intel PXA27x: wireless handheld devices • Uses memory boundedness to manage power modes
Emerging Radical Technologies • Fuel Cells to replace batteries • Generate power by chemical reaction, but can in principle supply energy indefinitely as long as fuel is provided • Fuel enters the anode, splits into protons and electrons, and generates charge • Fuel is abundantly available, such as hydrogen • Micro-Electro-Mechanical Systems (MEMS) • Convert mechanical to electrical energy • Millimeter scale turbine engines, ignite air with fuel • Drawbacks: hot exhaust gases and flammability