Power Reduction Techniques for Microprocessor Systems by Timothy Goldberg

Paper by: Vasanth Venkatachalam and Michael Franz • Published 2005

Power Consumption and its Importance • Saving Power • Save money, save electricity, save the planet • Heat Dissipation • Heat density and cooling • Battery Life • Use less energy, extend battery running time

Outline • Definition of Power and Energy • Power Reduction Techniques • From the Circuit level through Hardware to Compiler and Application level techniques • Commercial Systems • Emerging Technologies

Power and Energy • Need to reduce both • Power = Work / Time • Affects heat • Energy = Power * Time • Affects battery life • Dynamic Power Consumption: caused by circuit activity • Switched capacitance (depends on supply voltage V, clock frequency f, capacitance C, and activity factor a) • Clock gating • Short-circuit current: transistors of opposite polarity briefly conduct at the same time (10-15% of total power)
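The dependence of switched-capacitance power on V, f, C and a is usually written as P = a * C * V^2 * f. The sketch below only evaluates that relation; the operating points are made-up numbers for illustration, not figures from the paper.

```python
def dynamic_power(a, c, v, f):
    """Switched-capacitance power in watts: P = a * C * V^2 * f."""
    return a * c * v ** 2 * f

# Hypothetical operating points (assumed values, not measurements).
base = dynamic_power(a=0.2, c=1e-9, v=1.2, f=1.0e9)
scaled = dynamic_power(a=0.2, c=1e-9, v=0.9, f=0.75e9)

print(f"base:   {base:.4f} W")
print(f"scaled: {scaled:.4f} W ({scaled / base:.0%} of base)")
```

Because voltage enters squared, lowering V and f together cuts power much faster than lowering f alone.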

Power and Energy • Leakage Power Consumption: Static/Idle power • Depends on Voltage and Leakage Current • Sub-threshold leakage depends on supply voltage, threshold voltage, and temperature • To reduce it: lower the supply voltage, use fewer transistors, raise the threshold voltage
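A rough way to see these dependencies is the textbook approximation for sub-threshold current with the transistor off, I = I0 * exp(-Vth / (n * kT/q)), combined with P_leak = V * I_leak. The sketch below assumes that simplified model; `i0` and the slope factor `n` are illustrative parameters, not values from the paper.

```python
import math

K_OVER_Q = 8.617333e-5  # Boltzmann constant / electron charge, volts per kelvin

def subthreshold_current(i0, vth, temp_k, n=1.5):
    """Simplified off-state sub-threshold current (assumed textbook model)."""
    thermal_voltage = K_OVER_Q * temp_k
    return i0 * math.exp(-vth / (n * thermal_voltage))

def leakage_power(vdd, i_leak):
    """Static power: supply voltage times leakage current."""
    return vdd * i_leak

# Hypothetical device: compare two threshold voltages and two temperatures.
for vth, temp in [(0.3, 300), (0.4, 300), (0.3, 350)]:
    i = subthreshold_current(i0=1e-6, vth=vth, temp_k=temp)
    print(f"Vth={vth} V, T={temp} K -> P_leak={leakage_power(1.0, i):.3e} W")
```

In this model a higher threshold voltage lowers leakage, while a higher temperature raises it.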

Power Reduction • From low level circuit changes • Low-Power Interconnect • Memories and Memory Hierarchies • Hardware/Architecture • Dynamic Voltage Scaling • Resource Hibernation • Compiler • Application • Cross-layer

Circuit and Logic Level Techniques • Transistor Sizing: Reduce width • less dynamic power consumption, but increases delay • Transistor Reordering: Minimize switching activity • place frequently switching transistors closer to the circuit's outputs • Logic Gate Restructuring: Reduce switching • Inputs should arrive at each gate at the same time to avoid spurious switching

Circuit and Logic Level Techniques • Technology Mapping: Software tools • Find best configuration, based on constraints • Design circuit out of logic gates to minimize total power consumption • NP-Hard DAG problem • Low Power Flip-Flops: • Self-gating flip-flop: Reduce switching activity • Dual-edge triggered: Reduce power dissipated by clock signal

Circuit and Logic Level Techniques • Low Power Control: Processor as a FSM • Activate only the circuitry needed for the currently executing sub-FSM • Delay-Based Dynamic Supply Voltage • Look-up tables of voltages and clock speeds are set for the worst case • Instead, adjust voltage based on the measured delay and monitor for errors • Requires more hardware (shadow latches)

Low-Power Interconnect • Bus Encoding: inversion to reduce switching • Crosstalk: activity in neighbor wires (shield wire) • Low Swing Buses: +300mV and -300mV instead of +5V and -5V • Immune to crosstalk, but increased hardware at encoder and decoder • Bus Segmentation: allows most of bus to remain powered down when not communicating
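The inversion idea in bus encoding can be sketched as bus-invert coding: if sending a word would toggle more than half of the wires, send its complement and raise one extra "invert" line. This is a minimal illustration under assumptions: an 8-bit bus that starts at all zeros, and function names that are hypothetical.

```python
def popcount(x):
    return bin(x).count("1")

def bus_invert_encode(words, width=8):
    """Return (bus_value, invert_bit) pairs for a sequence of data words."""
    mask = (1 << width) - 1
    prev_bus = 0  # assumed initial bus state
    encoded = []
    for word in words:
        flips = popcount((word ^ prev_bus) & mask)
        if flips > width // 2:
            bus, inv = ~word & mask, 1   # send the complement instead
        else:
            bus, inv = word & mask, 0
        encoded.append((bus, inv))
        prev_bus = bus
    return encoded

def bus_invert_decode(encoded, width=8):
    mask = (1 << width) - 1
    return [(~bus & mask) if inv else bus for bus, inv in encoded]

def transitions(states):
    """Count wire toggles, including the invert line, from an all-zero start."""
    total, prev_bus, prev_inv = 0, 0, 0
    for bus, inv in states:
        total += popcount(bus ^ prev_bus) + (inv ^ prev_inv)
        prev_bus, prev_inv = bus, inv
    return total

words = [0x00, 0xFF, 0x00, 0xFF]
encoded = bus_invert_encode(words)
print("plain toggles:  ", transitions([(w, 0) for w in words]))
print("encoded toggles:", transitions(encoded))
```

The cost is the extra invert wire plus the encoder and decoder logic at each end of the bus.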

Low-Power Interconnect • Adiabatic Buses: Reuse existing charge • Reduce total capacitance • Delay in transferring charge • Network-On-Chip: • Shared buses limit the speed and volume of transfers between functional units • Generic Interconnection Networks replace buses • Concurrent connections

Low-Power Memories and Memory Hierarchies • Reduce power regardless of type (ROM/RAM) • Split Memories into smaller Sub-Systems: activate only the needed circuits in accesses • Specialized cache to reduce accesses • Before first cache level, store application's working set • Block Buffering – store most recently accessed cache set • Scratch Pad Memories – determined by compiler • Trace cache: store instructions in executed order • Dynamic direction prediction-based trace cache • Selective Trace Cache: compiler helps

Low-Power Processor Architecture Adaptations • Adaptive Caches: lines, blocks, or sets selectively activated based on miss threshold • Cutting the voltage to a line loses its data and adds delay • Cache Decay turns off unused cache lines after an interval • Hot Spot Detection: count branches taken, activate cache lines within the hotspot • Dead Block: powers down cache lines containing basic blocks that have reached final use (compiler-directed)
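Cache decay can be pictured as a per-line idle counter: every access resets it, and a line that stays idle for the whole decay interval is switched off and loses its contents. The toy model below is a sketch under assumptions (no tags, one counter per line, a hypothetical `DecayCache` class); it is not how any particular processor implements the idea.

```python
class DecayCache:
    """Toy model: tracks only whether each line is powered, not its contents."""

    def __init__(self, num_lines, decay_interval):
        self.decay_interval = decay_interval
        self.idle = [0] * num_lines          # ticks since last access
        self.powered = [False] * num_lines   # all lines start powered down

    def access(self, line):
        """Touch a line; returns True if it was still powered (data retained)."""
        hit = self.powered[line]
        self.powered[line] = True
        self.idle[line] = 0
        return hit

    def tick(self):
        """Advance time by one interval unit and power down decayed lines."""
        for i in range(len(self.idle)):
            if self.powered[i]:
                self.idle[i] += 1
                if self.idle[i] >= self.decay_interval:
                    self.powered[i] = False

cache = DecayCache(num_lines=4, decay_interval=3)
cache.access(0)
for _ in range(3):
    cache.tick()
print("line 0 still powered:", cache.powered[0])
```

A short decay interval saves more leakage but turns more would-be hits into misses, which is the data-loss and delay cost noted above.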

Architecture Adaptations • Adaptive Instruction Queues: partitions powered down when instructions aren't needed • Heuristics: measure IPC, with thresholds • Algorithms for reconfiguring Multiple Structures: • Adjust pipeline width and register update unit for hotspots • Tests configurations within hotspot • Offline Profiling • Occupancy-based • Selective Way Caches: measure cache hits in each way

Dynamic Voltage Scaling • Modulate clock frequency and supply voltage • Dynamic, depending on workload • Difficulties: • Unpredictable workloads (tasks and I/O requests, predicting run-time) • Indeterminism – how to decide how fast? • Running an application at slowest speed may not be best • Non-linear effect of frequency
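One commonly cited reason the slowest speed may not be best is that static power keeps being drawn for as long as the task runs. The sketch below assumes a toy model in which supply voltage rises linearly with frequency and static power is constant; every parameter value is hypothetical.

```python
def task_energy(freq_hz, cycles, c_eff, v_min, volts_per_hz, p_static):
    """Energy in joules to run a fixed number of cycles at one frequency."""
    voltage = v_min + volts_per_hz * freq_hz   # assumed affine V(f) relation
    runtime = cycles / freq_hz
    dynamic = c_eff * voltage ** 2 * freq_hz * runtime
    static = p_static * runtime                # grows as the task runs longer
    return dynamic + static

freqs = [0.2e9, 0.4e9, 0.6e9, 0.8e9, 1.0e9]
energies = {
    f: task_energy(f, cycles=1e9, c_eff=1e-9, v_min=0.6,
                   volts_per_hz=0.6e-9, p_static=0.5)
    for f in freqs
}
best = min(energies, key=energies.get)

for f in freqs:
    print(f"{f / 1e9:.1f} GHz: {energies[f]:.3f} J")
print(f"lowest energy at {best / 1e9:.1f} GHz")
```

With these assumed numbers the minimum falls at an intermediate frequency: below it, static energy dominates; above it, the V^2 term does.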

Dynamic Voltage Scaling • Interval-Based approaches: measure how busy the processor was in past intervals and estimate future load • Workloads are not regular, so estimates can be wrong • Idling with a threshold, thrashing • Aged Averages, weighted intervals • Intertask Approaches: assign speeds for different tasks • Monitor hardware events • Frequencies for tasks are generated offline, but workloads cannot be known perfectly beforehand • Unaware of program structure, such as memory access
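An aged average can be sketched as an exponentially weighted mean of past utilization, with the processor speed for the next interval chosen from that prediction. The weight, the speed levels, and the function names below are assumptions for illustration; utilization is taken as the busy fraction measured at full speed.

```python
def aged_average(utilizations, weight=0.5):
    """Predicted utilization after each interval; recent intervals count more."""
    prediction = 0.0
    history = []
    for measured in utilizations:
        prediction = weight * measured + (1 - weight) * prediction
        history.append(prediction)
    return history

def pick_speed(predicted_util, levels=(0.25, 0.5, 0.75, 1.0)):
    """Lowest relative speed that still covers the predicted load."""
    for level in levels:
        if level >= predicted_util:
            return level
    return levels[-1]

measured = [1.0, 1.0, 0.0]          # hypothetical busy fractions per interval
for util, pred in zip(measured, aged_average(measured)):
    print(f"measured={util:.2f} predicted={pred:.3f} next speed={pick_speed(pred)}")
```

A larger weight reacts faster to load changes but makes the speed setting more prone to thrashing.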

Dynamic Voltage Scaling • Intratask Approaches: Adjust processor speed and voltage within tasks • Split a task into fixed length Time Slots • Slow down away from critical path, help from compiler • Memory Bounded Code: memory accesses limit how fast program can execute • Heuristics through experimentation • Cache miss counter • Stall cycle counter, PC marked as hot • Measure rate of instructions, compute-intensive

Dynamic Voltage Scaling • Multiple Clock Domain Architectures: • Globally Asynchronous Locally Synchronous chip: • Chip split into multiple domains with independent clock rates • Allows certain sections of CPU to scale down when not needed • Needs to be divided such that communication between domains doesn't waste more energy • Can scale voltage based on instruction issue queues

Resource Hibernation • Disk Drives: Stop rotating platter during idle • Spin down after an acceptable idle threshold • Delay non-urgent requests in a queue • Dynamic RPM Drives for servers • Network Interfaces: can it be turned off? • Track idleness of devices, enter listening or sleep mode • Allows network card to remain idle before shutting down • Displays: Dim display with no input • FaceOff: recognize a face in front of the display • Zoned Backlighting: Adjust brightness of display regions
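Choosing the idle threshold for a disk is often framed as a break-even question: spinning down only pays off when the idle period is long enough to recover the energy spent stopping and restarting the platter. The sketch below assumes a simplified model that ignores the time taken by the transition itself; all power and energy figures are hypothetical.

```python
def break_even_seconds(e_transition_j, p_idle_w, p_standby_w):
    """Idle time after which standby has saved the spin-down/up energy."""
    return e_transition_j / (p_idle_w - p_standby_w)

def idle_energy(idle_s, timeout_s, e_transition_j, p_idle_w, p_standby_w):
    """Energy over one idle period under a fixed-timeout spin-down policy."""
    if idle_s <= timeout_s:
        return p_idle_w * idle_s   # the disk never spun down
    return (p_idle_w * timeout_s
            + e_transition_j
            + p_standby_w * (idle_s - timeout_s))

timeout = break_even_seconds(e_transition_j=6.0, p_idle_w=1.0, p_standby_w=0.25)
print(f"break-even timeout: {timeout:.1f} s")
for idle in (5.0, 60.0):
    used = idle_energy(idle, timeout, 6.0, 1.0, 0.25)
    print(f"idle {idle:>4.0f} s: {used:.1f} J vs {1.0 * idle:.1f} J always spinning")
```

Queuing non-urgent requests lengthens idle periods, which moves more of them past the break-even point.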

Compiler-Level Power Management • Code that reduces execution time • No fixed relationship between performance and power • Reduce memory accesses • Remote Compilation and Remote Execution • Server compiles and mobile device downloads • Cost of download must be less than compiling • Statically Optimized Compilers • Program's runtime behavior may differ from expected • Process will run on an unpredictable system

Compiler-Level Power Management • Dynamic Compilation: Program recompiled as runtime environment changes • Resource levels such as battery capacity and energy budgets • Trade-off: recompilation itself costs time and energy

Application-Level Power Management • Enable application to adapt to runtime environment • Trading off fidelity or quality of data to users • Lower QoS when resources are low • Interfaces to allow applications to provide hints • Allow application to communicate with OS, and OS with hardware • Expected execution of tasks, deadlines • Better DVS, power down disk for longer periods of time

Cross-Layer Adaptations • Forge: integrated power management framework • Streams videos at most efficient QoS level • Frequency and voltage scaling, network card interface • Grace: adaptation framework • Global and local adaptations • Compiler and Operating System interaction • Compiler has a worst-case deadline • OS adjusts processor speed to meet deadline
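The compiler and operating system interaction can be illustrated with a simple rule: given a worst-case cycle count for the remaining work and a deadline, pick the slowest available speed that still finishes in time. This is a sketch of that general idea with hypothetical names and numbers, not the mechanism of any specific framework named above.

```python
def lowest_speed_for_deadline(worst_case_cycles, deadline_s, freqs_hz):
    """Slowest frequency that meets the deadline, or None if none does."""
    for freq in sorted(freqs_hz):
        if worst_case_cycles / freq <= deadline_s:
            return freq
    return None

available = [2e8, 4e8, 6e8, 8e8]   # hypothetical frequency settings in Hz
choice = lowest_speed_for_deadline(3e8, 0.5, available)
print("chosen frequency:", choice)
```

If the task finishes early at run time, the remaining worst-case cycles shrink and the same rule can be reapplied to slow down further.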

Conclusion of Techniques • Multifaceted effort from various disciplines • From transistors to applications, and across all layers • Still ongoing research, new algorithms and heuristics • Impossible to tell what new technologies will prove most successful

Commercial Systems • Pentium 4: high performance goal • Internal temperature cap • Intel SpeedStep: 2 frequency and voltage settings • Pentium M: mobile performance and low power • Reduce switching activity in circuit, idle units and buses • Low leakage transistors in cache • Enhanced SpeedStep with 6 frequency/voltage settings • Intel PXA27x: wireless handheld devices • Uses memory boundedness to manage power modes

Emerging Radical Technologies • Fuel Cells to replace batteries • Chemical reaction, but can supply energy indefinitely as long as fuel is supplied • Fuel enters anode, splits into proton + electron and generates charge • Fuel is abundantly available, such as hydrogen • Micro-Electro-Mechanical Systems (MEMS) • Convert mechanical to electrical energy • Millimeter scale turbine engines, ignite air with fuel • Drawbacks: hot exhaust gases and flammability
