1. Low Power Processor Design: Part II
CSV881: Low Power Design, 13th Oct. 2008
M. Balakrishnan
2. Contents
Dynamic and Leakage Power consumption
Low power processor adaptations
DVS
OS level power reduction
Power aware compiler
Application transformations for power reduction
Some specific power reduction techniques
References
3. Dynamic Power
Switched capacitance
Power consumed for charging and discharging the capacitance associated with all inputs, outputs as well as interconnects
85 to 90% of dynamic power
Short circuit current
Power consumed due to overlap in PMOS and NMOS on-time during switching
10-15% of dynamic power
4. Switched Capacitance Power Consumption
Switched capacitance
80 to 90% of dynamic power
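$P_{dynamic} \approx \alpha C V^2 f$, where α is the switching activity, C the switched capacitance, V the supply voltage and f the clock frequency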
Lower switched capacitance (low-level design techniques, mainly transistor sizing, as well as connectivity at higher levels)
Lower switching activity (all levels; synthesis, clock gating)
Reduce clock frequency
Reduce supply voltage (DVS: effective because the frequency is lowered along with the voltage, giving a roughly cubic relationship between voltage and power)
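The scaling rules on this slide can be checked numerically. The sketch below evaluates the dynamic power formula for a hypothetical operating point (activity, capacitance, voltage and frequency values are made up) and assumes, as DVS does, that frequency is lowered in proportion to voltage. Power then falls with the cube of the scale factor, while energy for a fixed amount of work falls only with its square.

```python
def dynamic_power(alpha: float, c: float, v: float, f: float) -> float:
    """Switched-capacitance power, P ~ alpha * C * V^2 * f (watts)."""
    return alpha * c * v * v * f


def dynamic_energy(alpha: float, c: float, v: float, cycles: float) -> float:
    """Energy for a fixed amount of work: E ~ alpha * C * V^2 * cycles.
    The clock frequency cancels out; only the voltage reduction saves energy."""
    return alpha * c * v * v * cycles


# Hypothetical operating point: activity 0.1, 1 nF switched, 1.2 V, 1 GHz.
ALPHA, C, V_NOM, F_NOM = 0.1, 1e-9, 1.2, 1e9
CYCLES = 1e9  # hypothetical workload size

# DVS assumption: frequency is lowered in proportion to the voltage.
scale = 0.5
p_ratio = (dynamic_power(ALPHA, C, V_NOM * scale, F_NOM * scale)
           / dynamic_power(ALPHA, C, V_NOM, F_NOM))
e_ratio = (dynamic_energy(ALPHA, C, V_NOM * scale, CYCLES)
           / dynamic_energy(ALPHA, C, V_NOM, CYCLES))

print(f"power ratio  = {p_ratio:.3f}")   # cubic in the scale factor
print(f"energy ratio = {e_ratio:.3f}")   # quadratic in the scale factor
```

Halving V and f cuts power to about 1/8, but energy for the same work only to 1/4, and the task takes twice as long. That is why DVS is applied to workloads that have slack.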
5. Leakage Current
6. Types of Leakage
Reverse-biased-junction leakage
Gate-induced-drain leakage
Sub-threshold leakage
Gate-oxide leakage
Gate-current leakage
Punch-through leakage
7. Gate-oxide Leakage
Flows from gate into the substrate
This leakage increases exponentially as thickness of the oxide decreases
Oxide thickness has to shrink along with other geometries and the supply voltage
One solution is to use a high-k dielectric material
8. Sub-threshold Leakage
Dominant leakage current
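$I_{sub} = K_1 W e^{-V_{th}/(nT)} \left(1 - e^{-V/T}\right)$, where K1 and n are empirically derived constants, W is the gate width and T denotes the thermal voltage kT/q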
Isub increases exponentially as Vth decreases
Isub increases as temperature T increases (thermal runaway)
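To see how strongly the sub-threshold current reacts to the threshold voltage and to temperature, the sketch below evaluates the formula above directly. The device constants (K1·W folded into 1.0, n = 1.5, V = 1.0 V) are hypothetical; only the ratios between results are meaningful. For simplicity the sketch holds Vth fixed across temperatures.

```python
import math


def thermal_voltage(temp_kelvin: float) -> float:
    """kT/q in volts; grows linearly with absolute temperature."""
    k_b = 1.380649e-23   # Boltzmann constant, J/K
    q = 1.602176634e-19  # elementary charge, C
    return k_b * temp_kelvin / q


def subthreshold_current(k1: float, w: float, vth: float, v: float,
                         vtherm: float, n: float) -> float:
    """I_sub = K1 * W * exp(-Vth / (n * VT)) * (1 - exp(-V / VT)),
    with VT the thermal voltage (written T on the slide)."""
    return k1 * w * math.exp(-vth / (n * vtherm)) * (1.0 - math.exp(-v / vtherm))


# Hypothetical device parameters.
K1, W, N, V = 1.0, 1.0, 1.5, 1.0
vt_room = thermal_voltage(300.0)   # about 25.9 mV
vt_hot = thermal_voltage(360.0)

base = subthreshold_current(K1, W, 0.40, V, vt_room, N)
lower_vth = subthreshold_current(K1, W, 0.30, V, vt_room, N)
hotter = subthreshold_current(K1, W, 0.40, V, vt_hot, N)

print(f"Vth 0.40 V -> 0.30 V: leakage x{lower_vth / base:.1f}")
print(f"300 K -> 360 K:       leakage x{hotter / base:.1f}")
```

A 100 mV drop in Vth raises leakage by more than an order of magnitude in this model. Because higher leakage also heats the chip, the temperature dependence is what makes thermal runaway possible.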
9. Reduction of Sub-threshold Leakage Current
Reduce supply voltage
Reduce size of the circuit
Resize transistors as per performance requirements
Dynamically cut power supply to unused circuits
Cooling
Increase threshold voltage
Stack the off-transistors in series
Isolate the supply through sleep transistors
Dual threshold; higher threshold on non-critical paths
Adaptive body biasing
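The dual-threshold idea (high Vth only where timing allows) can be sketched as a greedy assignment. This is an illustration under simplifying assumptions, not a production flow: timing paths are enumerated explicitly, gate delays simply add along a path, and the function name and netlist are hypothetical. Real tools use static timing analysis instead.

```python
from typing import Dict, List


def assign_dual_vth(
    delay_low: Dict[str, float],    # gate delay with low Vth (ns): fast but leaky
    delay_high: Dict[str, float],   # gate delay with high Vth (ns): slower, less leaky
    leak_saving: Dict[str, float],  # leakage saved if the gate uses high Vth
    paths: List[List[str]],         # timing paths, each a list of gate names
    clock_period: float,            # timing constraint (ns)
) -> Dict[str, str]:
    """Greedy dual-threshold assignment: try high Vth on each gate, largest
    leakage saving first, and keep it only if every path through the gate
    still meets the clock period."""
    vth = {g: "low" for g in delay_low}

    def path_delay(path: List[str]) -> float:
        return sum(delay_high[g] if vth[g] == "high" else delay_low[g] for g in path)

    for gate in sorted(delay_low, key=lambda g: leak_saving[g], reverse=True):
        vth[gate] = "high"
        if any(path_delay(p) > clock_period for p in paths if gate in p):
            vth[gate] = "low"  # gate lies on a critical path: revert
    return vth


# Hypothetical netlist: a-b-c is the critical path, a-d has slack.
low = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
high = {g: 1.3 for g in low}
saving = {g: 1.0 for g in low}
result = assign_dual_vth(low, high, saving, [["a", "b", "c"], ["a", "d"]], clock_period=3.2)
print(result)
```

Only gate d, which sits on the non-critical path, receives the high threshold.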
10. Low-Power Processor Adaptations
Adaptive Cache
Deactivates cache sets depending on the current application characteristics
Drowsy cache: lines in unused portions of the cache are placed in a low-leakage drowsy state
Compiler-directed power-down of blocks after their last use
Adaptive instruction queues
IPC is monitored and the queue size adjusted accordingly
Reconfiguring multiple structures
Structures such as the instruction queue, reorder buffer and load/store buffer are resized based on program hotspots
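The "monitor IPC, adjust queue size" loop for adaptive instruction queues can be sketched as a simple interval-based controller. The allowed sizes, the 2% tolerance and the class name are hypothetical. The policy shown (shrink while IPC stays close to the full-size IPC, grow back when it drops) is one plausible choice and not necessarily the one used in the cited work.

```python
class AdaptiveInstructionQueue:
    """Interval-based queue resizing: shrink the queue (disabled entries save
    power) while IPC stays within `tolerance` of the IPC measured at full size,
    and grow it back as soon as IPC drops further."""

    SIZES = (16, 32, 48, 64)  # hypothetical enabled-entry configurations

    def __init__(self, tolerance: float = 0.02):
        self.level = len(self.SIZES) - 1  # start with all entries enabled
        self.tolerance = tolerance
        self.reference_ipc = 0.0          # IPC observed at full size

    @property
    def size(self) -> int:
        return self.SIZES[self.level]

    def end_of_interval(self, ipc: float) -> int:
        """Feed the IPC of the interval just finished; return the new queue size."""
        if self.level == len(self.SIZES) - 1:
            self.reference_ipc = ipc  # refresh the baseline at full size
        if ipc >= (1.0 - self.tolerance) * self.reference_ipc:
            self.level = max(self.level - 1, 0)                    # little loss: try smaller
        else:
            self.level = min(self.level + 1, len(self.SIZES) - 1)  # too slow: grow
        return self.size


q = AdaptiveInstructionQueue()
print([q.end_of_interval(ipc) for ipc in (2.0, 1.99, 1.5)])
```

The queue shrinks while IPC holds up and grows again once IPC falls more than the tolerance below the full-size baseline.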
11. Dynamic Voltage Scaling
DVS is one of the most widely used techniques for reducing power consumption.
DVS lowers the supply voltage and the processor clock frequency for workloads that have slack time available
12. Issues in DVS
The unpredictable nature of workloads, as well as “preemption” of tasks by interrupts, makes it harder to choose the right voltage
Learning-based techniques introduce a learning lag, which reduces their utility
The voltage-to-power relationship is not purely quadratic, because I/O signals are driven from a separate supply voltage and because of how peripheral devices are managed
Inter-task dependencies must be considered when one processor or core is slowed down
13. DVS Strategies
Interval based approaches
Processor idle time in a window
Aged averages (weighted averages with lower weights for older intervals)
Inter-task approaches
A voltage is linked to each task and switched along with the context (assumes uniform task behavior and is not aware of program structure)
Intra-task approaches
Several approaches: fixed-length timeslots (hardware); splitting a task into two sub-programs, running the first at the highest clock and adjusting the clock for the second (OS); and check-pointing (compiler support)
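An interval-based scheme with aged averages can be sketched as follows. Each window reports its busy fraction (1 minus idle time) at the clock that was in effect. This is normalised to full-speed work, folded into an exponentially aged average, and mapped to the lowest operating point that covers it. The frequency levels, the weight of 0.5 and the function names are all hypothetical.

```python
LEVELS_MHZ = (200, 400, 600, 800, 1000)  # hypothetical DVS operating points


def aged_average(prev_avg: float, sample: float, weight: float) -> float:
    """Exponentially aged average: each older interval gets a geometrically
    smaller weight (weight, weight*(1-weight), ...)."""
    return weight * sample + (1.0 - weight) * prev_avg


def choose_frequency(predicted_demand: float) -> int:
    """Lowest operating point whose speed covers the predicted demand,
    expressed as a fraction of the maximum frequency."""
    f_max = max(LEVELS_MHZ)
    for f in sorted(LEVELS_MHZ):
        if f >= predicted_demand * f_max:
            return f
    return f_max


def interval_dvs(busy_fractions, weight: float = 0.5):
    """busy_fractions: 1 - idle fraction in each window, at the then-current clock."""
    f = max(LEVELS_MHZ)
    avg = 1.0  # start pessimistic: assume full demand
    chosen = []
    for busy in busy_fractions:
        demand = busy * f / max(LEVELS_MHZ)  # normalise work to full speed
        avg = aged_average(avg, demand, weight)
        f = choose_frequency(avg)
        chosen.append(f)
    return chosen


print(interval_dvs([0.3, 0.3, 0.3]))
```

The aged average lets the clock step down gradually instead of reacting to a single quiet window. A fully busy window only shows that demand is at least the current speed, so practical governors usually add a rule such as jumping to the maximum frequency when a window is nearly saturated.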
14. Resource Hibernation
Disk drives: the OS stops disk rotation during periods of inactivity. Restarting the disk adds delay and consumes extra energy, so a badly timed spin-down costs both performance and energy
Predictive dynamic adjustment of the spin-down threshold; the OS can help by clustering requests or delaying non-urgent ones
Disk controllers can modulate rotation speed based on input queue length
Network interfaces
Displays
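Dynamic adjustment of the disk spin-down threshold can be sketched with a simple adaptive timeout. The rule is: if the disk was spun down but the next request arrived before the off period exceeded the break-even time, the spin-down wasted energy, so the timeout is doubled; otherwise it is halved. The doubling/halving rule, the constants and the class name are assumptions for illustration.

```python
class AdaptiveSpinDown:
    """Adaptive idle-timeout spin-down: double the timeout after a spin-down
    that did not pay off, halve it after one that did."""

    def __init__(self, timeout_s=10.0, break_even_s=5.0, min_s=1.0, max_s=60.0):
        self.timeout = timeout_s        # idle time before the disk is spun down
        self.break_even = break_even_s  # minimum off time for a spin-down to save energy
        self.min_timeout, self.max_timeout = min_s, max_s

    def on_idle_period(self, idle_s: float) -> bool:
        """Account for one idle gap between requests; return True if the disk spun down."""
        if idle_s <= self.timeout:
            return False  # next request arrived before the timeout expired
        off_time = idle_s - self.timeout
        if off_time < self.break_even:
            # Spun down too early: spin-up energy and delay outweighed the savings.
            self.timeout = min(self.timeout * 2, self.max_timeout)
        else:
            # Long idle gap: spinning down sooner would have saved more.
            self.timeout = max(self.timeout / 2, self.min_timeout)
        return True


policy = AdaptiveSpinDown()
for gap in (3.0, 12.0, 100.0):
    print(gap, policy.on_idle_period(gap), policy.timeout)
```

When the OS clusters requests or defers non-urgent ones, idle gaps become longer and fewer, which this kind of policy turns directly into more spin-downs that pay off.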
15. Compiler-Level Power Reduction
Some performance-oriented optimizations reduce power
e.g. common sub-expression elimination
Some performance-oriented optimizations may increase power consumption (though not necessarily energy)
e.g. loop unrolling
Compilers can help reduce power through instruction selection, fewer memory accesses, structured data traversal patterns, etc.
For mobile devices, remote compilation and/or remote execution offer further opportunities
16. Application Level Power Reduction
Application transformations
Architecture aware
Software architecture graph transformations to reduce inter-process communication
Accuracy of computation traded against power
Quality of service traded against power
17. Some Specific Techniques
18. Value Cache Approach
Yang [5] first proposed a cache-based scheme for transmitting frequently occurring values. With the cache size restricted to the word length (32 entries), every hit can be transmitted by toggling just one bus bit, with a control line indicating hit or miss. On a miss, the original data value is sent.
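The hit/miss encoding described above can be sketched as a sender/receiver pair that keep identical 32-entry value caches. On a hit, the sender toggles the single bus line whose position equals the cache index and raises the hit flag. On a miss, it drives the raw value. The slide does not specify the replacement policy, so the most-recently-used ordering, the single hit/miss flag and all names here are assumptions. Values are assumed to be unsigned 32-bit words.

```python
from typing import List, Tuple

WIDTH = 32  # data bus width; the value cache has one entry per bus line


class ValueCache:
    """Recently sent values in most-recently-used order. Sender and receiver
    keep identical copies by applying the same updates."""

    def __init__(self) -> None:
        self.entries: List[int] = []

    def lookup(self, value: int) -> int:
        return self.entries.index(value) if value in self.entries else -1

    def update(self, value: int) -> None:
        if value in self.entries:
            self.entries.remove(value)
        self.entries.insert(0, value)
        del self.entries[WIDTH:]  # keep at most WIDTH entries


def encode(values: List[int]) -> List[Tuple[int, int]]:
    """Return (bus_state, hit_flag) for each transfer."""
    cache, bus, frames = ValueCache(), 0, []
    for v in values:
        idx = cache.lookup(v)
        if idx >= 0:
            bus ^= 1 << idx  # hit: toggle exactly one bus line
            frames.append((bus, 1))
        else:
            bus = v          # miss: drive the raw value
            frames.append((bus, 0))
        cache.update(v)
    return frames


def decode(frames: List[Tuple[int, int]]) -> List[int]:
    cache, prev, values = ValueCache(), 0, []
    for bus, hit in frames:
        if hit:
            idx = (bus ^ prev).bit_length() - 1  # position of the toggled line
            v = cache.entries[idx]
        else:
            v = bus
        values.append(v)
        cache.update(v)
        prev = bus
    return values


data = [7, 99, 7, 7, 99, 123456]
frames = encode(data)
print(frames, decode(frames))
```

Repeated values cost a single bus transition each, whereas sending them raw could toggle many lines whenever consecutive values differ.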
19. TUBE: Value Cache Approach
Dinesh et al. [7] proposed a tunable approach in which the bits are separated according to their activity coefficients.
Two different caches were used: one for high-activity bits and the other for low-activity bits.
Because of the small cache sizes, this could capture locality in data values more effectively.
20. Hierarchical Value Cache Encoding (HVCE) [8]
HVCE is organized into multiple levels, with level i storing $32/2^{i-1}$ values.
21. HVCE (contd.)
A match at a higher level implies matches at all lower levels as well.
The highest-level match is encoded as a bit change on the 32-bit data bus.
A control word (15 bits for 15 caches) indicates which caches have a hit; it is sent on the address bus using its spare bandwidth (cycle stealing)
The bit whose number equals the VC address is switched to indicate that VC entry (the hit itself is indicated in the control)
22. Value Cache Approaches
23. Secondary Memory Storage Management [9]
All current storage management techniques assume magnetic storage as secondary memory and performance optimization as the sole objective
Flash memory is emerging as a popular alternative for secondary storage in portable devices
Power optimization is an equally important objective as performance
24. Proposed Modifications
The page size used (4 KB to 12 KB) is too large as the unit of replacement; define sub-pages equal to the flash memory page (say 256 B) and replace at that granularity
Use a battery-backed SRAM as a hot cache for the flash to avoid frequent writes
Manage fragmentation to avoid frequent and expensive garbage collection
Hot-cache replacement policies have to be power sensitive
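The benefit of a write-back hot cache with sub-page granularity can be estimated by counting flash page programs for a stream of writes. The sketch assumes the 256 B sub-page equals the flash page, and uses plain LRU replacement as a baseline. It does not implement the power-sensitive policy the slide calls for. The function name and write trace are hypothetical.

```python
from collections import OrderedDict
from typing import List

SUBPAGE = 256  # bytes; assumed equal to the flash page size


def flash_programs(write_addrs: List[int], cache_subpages: int) -> int:
    """Count flash page programs for a stream of byte-address writes, with a
    write-back LRU hot cache (battery-backed SRAM) of `cache_subpages` sub-pages."""
    cache: "OrderedDict[int, bool]" = OrderedDict()
    programs = 0
    for addr in write_addrs:
        sp = addr // SUBPAGE
        if sp in cache:
            cache.move_to_end(sp)  # repeated write absorbed by the SRAM
            continue
        if cache_subpages == 0:
            programs += 1          # no hot cache: every write reaches flash
            continue
        if len(cache) >= cache_subpages:
            cache.popitem(last=False)  # evict the LRU dirty sub-page
            programs += 1
        cache[sp] = True
    return programs + len(cache)   # flush the remaining dirty sub-pages


writes = [0, 10, 20, 300, 5, 310]  # hypothetical trace: sub-pages 0, 0, 0, 1, 0, 1
for size in (0, 1, 2):
    print(size, flash_programs(writes, size))
```

Even a tiny hot cache cuts flash writes sharply when writes cluster on a few sub-pages. Replacing at page granularity (4 KB to 12 KB) would instead rewrite far more flash pages per eviction.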
25. Processor Pipeline: Power Reduction
Processors today have very deep pipelines.
Stalls occur due to data dependencies, control dependencies, resource contention and cache misses.
The stalls provide an opportunity for reducing power using clock gating of various stage latches.
Other techniques have also been reported for power reduction taking advantage of stalls.
26. Clock Gating
The basic technique has been to clock-gate the various stage registers during stall conditions.
27. Pipelining Using Transparent Latches [10]
Normally, stage latches are kept opaque (closed) when gated, and are called opaque latches; clock gating saves energy by not clocking them
Level-triggered latches become transparent when enabled and are called transparent latches. Energy is saved by keeping these latches enabled, which is equivalent to combining stages into longer-delay stages.
28. Transparent Latches
Transparent latches introduced in the pipeline can save energy. The saving depends on the “distance” between successive items of useful work and is therefore dynamic in nature.
29. Stall Cycles Redistribution [11]
Stall cycles, if redistributed, can help save energy. The delay an instruction can tolerate in this way is referred to as slack.
Slack longer than the pipeline depth is of no use.
This information is stored with the “fetch group” as extra bits in the BTB; both the predicted slack and its confidence level are encoded in these bits.
Fetch of instructions with slack is delayed by up to the pipeline depth.
The authors report up to 50% reduction in the front-end (I-cache, branch predictor and front-end latches) energy-delay product. Transparent latches alone give a negligible (5%) reduction, whereas slack prediction with clock gating reduces it by 25%. The performance loss is less than 2%.
30. Processors: Speculative Execution Energy Reduction
High-performance processors are all pipelined.
The current trend is to support speculative instruction execution and to discard instructions in the execution pipeline before the write-back stage if the speculation proves wrong.
This preserves correctness but is power-inefficient, since the energy consumed by the unfinished instructions is wasted.
31. Basic Strategy
The basic strategy is to reduce the pipeline feed by gating the fetch stage whenever the probability that fetched instructions will complete drops.
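One concrete form of this strategy, in the spirit of the pipeline gating of Manne et al. listed in the references, counts unresolved low-confidence branches and stalls fetch while the count exceeds a threshold. In the sketch the confidence of each branch is taken as an input (the confidence estimator itself is not modelled), and the threshold value and class name are assumptions.

```python
class PipelineGate:
    """Fetch gating: track unresolved low-confidence branches and stop
    fetching while their number exceeds `threshold`."""

    def __init__(self, threshold: int = 1):
        self.threshold = threshold
        self.low_conf_in_flight = 0

    def on_branch_fetched(self, high_confidence: bool) -> None:
        if not high_confidence:
            self.low_conf_in_flight += 1  # another likely-wrong path entered

    def on_branch_resolved(self, high_confidence: bool) -> None:
        if not high_confidence:
            self.low_conf_in_flight -= 1

    def fetch_allowed(self) -> bool:
        # Gate fetch when too many uncertain branches are still unresolved.
        return self.low_conf_in_flight <= self.threshold


gate = PipelineGate(threshold=1)
gate.on_branch_fetched(high_confidence=False)
print(gate.fetch_allowed())
gate.on_branch_fetched(high_confidence=False)
print(gate.fetch_allowed())
gate.on_branch_resolved(high_confidence=False)
print(gate.fetch_allowed())
```

High-confidence branches do not affect the count, so fetch is throttled only when it is likely to feed the pipeline with work that will be discarded.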
32. Approaches
33. Low Power Multipliers
Array multipliers and shift-and-add multipliers
Fixed-coefficient multipliers: DSP applications such as filters, FFT and DCT; functions such as sine and cosine computation
Booth multipliers; CSD coding to reduce the number of additions and subtractions
34. CSD Coding [1]
Replace a string of 1s (longer than two) by a 1 and a -1 to reduce the number of operations for that string to 2
Example: $00111101 \rightarrow 01000\bar{1}01$, i.e. $61 = 64 - 4 + 1$, where $\bar{1}$ denotes -1
Modified CSD coding considers a set of coefficients together, rather than one at a time, to increase the number of all-zero columns, which can then be removed to save area.
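The sketch below computes the canonical signed-digit form (also known as the non-adjacent form) of a non-negative coefficient and uses its digits for shift-and-add/subtract multiplication by a constant. It reproduces the slide's example, writing -1 as `N`. The function names are illustrative, and the modified multi-coefficient CSD variant is not covered.

```python
from typing import List


def to_csd(n: int) -> List[int]:
    """CSD digits of n >= 0, least significant first; each digit is in
    {-1, 0, 1} and no two adjacent digits are non-zero."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)  # n mod 4 == 1 -> +1, n mod 4 == 3 -> -1
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_string(n: int, width: int) -> str:
    """MSB-first text form of the CSD digits, writing -1 as 'N'."""
    ds = (to_csd(n) + [0] * width)[:width]
    return "".join({1: "1", 0: "0", -1: "N"}[d] for d in reversed(ds))


def multiply_by_constant(x: int, coeff: int) -> int:
    """Fixed-coefficient multiplication using only shifts, adds and subtracts."""
    acc = 0
    for shift, d in enumerate(to_csd(coeff)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


print(csd_string(61, 8))             # 00111101 in binary
print(multiply_by_constant(3, 61))   # (3 << 6) - (3 << 2) + 3
```

For 61, binary needs five non-zero digits (four additions), while CSD needs three (one addition and one subtraction). This is the saving exploited in fixed-coefficient filters.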
35. Synergistic Temperature and Energy Management [17]
Globally asynchronous, locally synchronous (GALS) architectures are becoming popular
There are many clock domains which interact asynchronously
A temperature rise in one domain can be addressed by lowering only that domain's clock frequency, thus limiting the impact on performance
In synchronous systems, lowering the clock frequency would reduce overall performance
Slowing one domain introduces “slack” (excess capacity) in other domains, which can be exploited by lowering their clocks as well to save further energy.
Cooler adjoining domains also help reduce heat in the affected domain by providing a higher temperature gradient.
36. References
For the first part:
V. Venkatachalam and M. Franz, “Power Reduction Techniques for Microprocessor Systems”, ACM Computing Surveys, Vol. 37, No. 3, Sep. 2005, pp. 195-237
Low Power Multipliers:
ISLPED 2006 paper
Register file power reduction
ISLPED 2006 paper
Associative Memory
J. Sharkey et al., “Power efficient wakeup tag broadcast”, ICCD 2005
ISLPED 2006 paper
Off-chip Bus Power Reduction
J. Yang et al., “FV encoding for low power data I/O”, ISLPED 2001
Basu et al., “Power protocol: reducing power dissipation on off-chip data buses”, MICRO 2002
Dinesh et al., “A tunable encoder for off-chip buses”, ISLPED 2005
ISLPED 2006 paper
Secondary Storage Management
ISLPED 2006 paper
37. References (contd.)
Processor Pipeline: Power Reduction
H.M. Jacobson, “Improved clock gating through transparent pipelining”, ISLPED 2004, August 2004
ISLPED 2006 paper
Speculative Execution Energy Reduction
Aragon et al., “Power-aware control speculation through selective throttling”, HPCA-9, pp. 1003-112, Feb. 2003
Baniasadi et al., “Instruction flow based front-end throttling for power-aware high-performance processors”, ISLPED ’01, pp. 6-21, Aug. 2001
Buyuktosunoglu et al., “Energy efficient co-adaptive instruction fetch and issue”, ISCA ’03, pp. 147-156, June 2003
Manne et al., “Pipeline gating: speculation control for energy reduction”, ISCA ’98, pp. 132-141, June 1998
ISLPED 2006 paper
Synergistic Temp. and Energy Management
ISLPED 2006 paper