
Low Power Processor Design: Part II


Presentation Transcript


    1. Low Power Processor Design: Part II. M. Balakrishnan. CSV881: Low Power Design, 13th Oct. 2008.

    2. Contents: dynamic and leakage power consumption; low-power processor adaptations; DVS; OS-level power reduction; power-aware compiler; application transformations for power reduction; some specific power reduction techniques; references.

    3. Dynamic Power. Switched capacitance: power consumed in charging and discharging the capacitance associated with all inputs, outputs and interconnects; 85 to 90% of dynamic power. Short-circuit current: power consumed due to the overlap in PMOS and NMOS on-time during switching; 10 to 15% of dynamic power.

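    4. Switched Capacitance Power Consumption. Switched capacitance accounts for 85 to 90% of dynamic power: $P_{dynamic} \sim \alpha C V^2 f$, where $\alpha$ is the switching activity, $C$ the switched capacitance, $V$ the supply voltage and $f$ the clock frequency. Ways to reduce it: lower the switched capacitance (low-level design techniques, mainly transistor sizes, as well as connectivity at higher levels); lower the switching activity (at all levels: synthesis, clock gating); reduce the clock frequency; reduce the supply voltage (DVS is effective because a lower $V$ also lowers the attainable $f$, giving a roughly cubic relationship).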
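The cubic relationship mentioned on this slide can be checked numerically. The sketch below is illustrative only: the operating point (activity factor, capacitance, voltage, frequency, cycle count) is invented, and it assumes the attainable clock frequency scales linearly with supply voltage, which is only an approximation.

```python
def dynamic_power(alpha, capacitance, voltage, frequency):
    """Switched-capacitance power, P = alpha * C * V^2 * f (watts)."""
    return alpha * capacitance * voltage ** 2 * frequency


def task_energy(alpha, capacitance, voltage, frequency, cycles):
    """Energy to run a fixed number of cycles: power times run time."""
    return dynamic_power(alpha, capacitance, voltage, frequency) * (cycles / frequency)


# Hypothetical operating point: activity 0.2, 1 nF switched, 1.2 V, 1 GHz.
ALPHA, CAP, V0, F0, CYCLES = 0.2, 1e-9, 1.2, 1e9, 1e9

base_power = dynamic_power(ALPHA, CAP, V0, F0)
base_energy = task_energy(ALPHA, CAP, V0, F0, CYCLES)

# Frequency scaling alone: power drops, but the task runs longer.
freq_only_power = dynamic_power(ALPHA, CAP, V0, F0 / 2)
freq_only_energy = task_energy(ALPHA, CAP, V0, F0 / 2, CYCLES)

# DVS: voltage and frequency halved together (assumed linear f-V relation).
dvs_power = dynamic_power(ALPHA, CAP, V0 / 2, F0 / 2)
dvs_energy = task_energy(ALPHA, CAP, V0 / 2, F0 / 2, CYCLES)

print(f"frequency only: power x{freq_only_power / base_power:.3f}, "
      f"energy x{freq_only_energy / base_energy:.3f}")
print(f"DVS:            power x{dvs_power / base_power:.3f}, "
      f"energy x{dvs_energy / base_energy:.3f}")
```

Under these assumptions, halving the frequency alone halves power but leaves the energy per task unchanged, while halving voltage and frequency together cuts power to one eighth and energy per task to one quarter.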

    5. Leakage Current

    6. Types of Leakage: reverse-biased-junction leakage; gate-induced-drain leakage; sub-threshold leakage; gate-oxide leakage; gate-current leakage; punch-through leakage.

    7. Gate-oxide Leakage. Flows from the gate into the substrate. This leakage increases exponentially as the thickness of the oxide decreases, and the oxide has to get thinner along with other reductions in geometry and supply voltage. One solution is to use a high-k dielectric material.

    8. Sub-threshold Leakage. The dominant leakage current: $I_{sub} = K_1 W e^{-V_{th}/nT} \left(1 - e^{-V/T}\right)$. $I_{sub}$ increases exponentially as $V_{th}$ decreases. $I_{sub}$ increases as temperature $T$ increases (thermal runaway).
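A small numerical sketch makes the two trends on this slide concrete. It reads $T$ in the formula as the thermal voltage $kT/q$ (an interpretation, not stated on the slide), and the values chosen for $K_1$, $n$, $W$, the voltages and the temperatures are placeholders picked only for illustration.

```python
import math

K_OVER_Q = 8.617333e-5  # Boltzmann constant over electron charge, in V/K


def thermal_voltage(temp_kelvin):
    """Thermal voltage kT/q in volts (assumed meaning of T in the formula)."""
    return K_OVER_Q * temp_kelvin


def subthreshold_leakage(width, vth, vdd, temp_kelvin, k1=1.0, n=1.5):
    """I_sub = K1 * W * exp(-Vth / (n*T)) * (1 - exp(-V / T)).

    k1 and n are technology constants; the defaults are placeholders.
    """
    t = thermal_voltage(temp_kelvin)
    return k1 * width * math.exp(-vth / (n * t)) * (1.0 - math.exp(-vdd / t))


# Hypothetical device: unit width, 1.0 V supply.
leak_high_vth = subthreshold_leakage(1.0, vth=0.4, vdd=1.0, temp_kelvin=300)
leak_low_vth = subthreshold_leakage(1.0, vth=0.3, vdd=1.0, temp_kelvin=300)
leak_hot = subthreshold_leakage(1.0, vth=0.4, vdd=1.0, temp_kelvin=360)

print(f"Vth 0.4 V -> 0.3 V: leakage x{leak_low_vth / leak_high_vth:.1f}")
print(f"300 K -> 360 K:     leakage x{leak_hot / leak_high_vth:.1f}")
```

Because leakage grows with temperature and higher leakage dissipates more heat, the second ratio is the feedback loop that the slide calls thermal runaway.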

    9. Reduction of Sub-threshold Leakage Current: reduce the supply voltage; reduce the size of the circuit; resize transistors as per performance requirements; dynamically cut the power supply to unused circuits; cooling; increase the threshold voltage; stack the off-transistors in series; isolate the supply through sleep transistors; dual threshold (higher threshold on non-critical paths); adaptive body biasing.

    10. Low-Power Processor Adaptations. Adaptive cache: deactivate cache sets depending on the current application characteristics; drowsy cache, in which lines in unused portions of the cache are placed in a drowsy state; power down blocks which have had their last use (compiler directed). Adaptive instruction queues: IPC is monitored and the queue size adjusted. Reconfiguring multiple structures: structures like the instruction queue, reorder buffer and load/store buffer are changed based on hotspots.

    11. Dynamic Voltage Scaling. DVS is the most widely used technique for reducing power consumption. DVS lowers the supply voltage and the processor frequency for workloads which have slack time available in which to execute.

    12. Issues in DVS. The unpredictable nature of workloads, as well as “preemption” of tasks by interrupts, creates further problems. Learning techniques introduce learning lags, which reduce their utility. The voltage-to-power relationship is not quadratic, because of how I/O signals are derived (separate voltage) and how the peripheral devices are managed. Inter-task relationships are affected if one processor or core is slowed down.

    13. DVS Strategies. Interval-based approaches: processor idle time in a window; aged averages (weighted averages with lower weights for earlier intervals). Inter-task approaches: a voltage is linked to each task and switched along with the context switch; this assumes uniform task behavior and is not aware of program structure. Intra-task approaches: many exist, such as fixed-length timeslots (hardware), splitting a task into two sub-programs and using the highest clock for the first while the second adjusts (OS), and check-pointing (compiler support).
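The interval-based "aged average" idea can be sketched as an exponentially weighted average of past utilization, followed by picking the lowest speed that covers the prediction. The decay factor, the utilization history and the set of frequency levels below are hypothetical; real governors differ in how they weight history and choose levels.

```python
def aged_average(utilizations, decay=0.5):
    """Weighted average of per-interval utilization.

    Each step halves (by default) the weight of everything seen so far,
    so earlier intervals count progressively less than recent ones.
    """
    estimate = utilizations[0]
    for u in utilizations[1:]:
        estimate = decay * estimate + (1.0 - decay) * u
    return estimate


def pick_frequency(predicted_util, levels):
    """Lowest frequency level (fraction of f_max) covering the prediction."""
    for level in sorted(levels):
        if level >= predicted_util:
            return level
    return max(levels)


# Hypothetical history: busy fraction of the last four intervals, oldest first.
history = [0.9, 0.5, 0.3, 0.3]
levels = [0.25, 0.5, 0.75, 1.0]

predicted = aged_average(history)
chosen = pick_frequency(predicted, levels)
print(f"predicted utilization {predicted:.2f}, run at {chosen:.2f} of f_max")
```

A larger `decay` makes the predictor smoother but slower to react, which is one form of the learning lag listed among the issues in DVS.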

    14. Resource Hibernation. Disk drives: the OS stops disk rotation during periods of inactivity. Restarting the disk adds delay, implying a loss in both performance and energy. Predictive dynamic threshold adjustment can be helped by the OS, which can cluster requests or delay non-urgent requests. Disk controllers can modulate speeds by looking at input queue lengths. Other candidates: network interfaces and displays.

    15. Compiler-Level Power Reduction. Some performance-oriented optimizations reduce power, e.g. common sub-expression elimination. Some performance-oriented optimizations may increase power consumption (though not necessarily energy), e.g. loop unrolling. Compilers can help reduce power during instruction set selection, by reducing memory accesses, through structured data traversal patterns etc. For mobile devices, opportunities exist for remote compilation and/or remote execution.

    16. Application Level Power Reduction. Application transformations: architecture aware; software architecture graph transformations to reduce inter-process communication; accuracy of computation traded against power; quality of service traded against power.

    17. Some Specific Techniques

    18. Value Cache Approach. Yang [5] first proposed a cache-based system for transmitting frequently occurring values. With the cache size restricted to the word length (32), every hit can be transmitted by toggling just one bit, with a control signal indicating hit or miss. On a miss, the original data value is sent.
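The one-bit-toggle idea can be illustrated with a toy encoder and decoder that keep identical 32-entry value caches at both ends of the bus. This is a sketch of the general principle only, not the scheme from [5]: the class name, the FIFO replacement policy and the sample data stream are all assumptions made for illustration.

```python
WORD_BITS = 32


class ValueCacheCodec:
    """One end of the bus; sender and receiver each hold an instance."""

    def __init__(self):
        self.entries = []  # cached values, at most WORD_BITS (FIFO, assumed)
        self.bus = 0       # last word seen on the data bus

    def _insert(self, value):
        if len(self.entries) == WORD_BITS:
            self.entries.pop(0)
        self.entries.append(value)

    def encode(self, value):
        """Return (hit, word driven on the bus)."""
        if value in self.entries:
            index = self.entries.index(value)
            self.bus ^= 1 << index  # hit: toggle exactly one bus line
            return True, self.bus
        self._insert(value)         # miss: send the raw value
        self.bus = value
        return False, self.bus

    def decode(self, hit, word):
        """Recover the value from the control bit and the bus word."""
        if hit:
            index = (word ^ self.bus).bit_length() - 1  # which line toggled
            self.bus = word
            return self.entries[index]
        self._insert(word)
        self.bus = word
        return word


def transitions(previous, current):
    """Number of bus lines that switch between two consecutive words."""
    return bin(previous ^ current).count("1")


stream = [0xDEADBEEF, 0x0, 0xDEADBEEF, 0xDEADBEEF, 0x0]
sender, receiver = ValueCacheCodec(), ValueCacheCodec()
decoded, hit_toggles, last_word = [], [], 0
for value in stream:
    hit, word = sender.encode(value)
    if hit:
        hit_toggles.append(transitions(last_word, word))
    last_word = word
    decoded.append(receiver.decode(hit, word))

print(decoded == stream, hit_toggles)
```

Both caches are updated only on misses, so the entry indices stay in step at the two ends without any extra synchronization traffic.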

    19. TUBE: Value Cache Approach. Dinesh et al. [7] proposed a tunable approach in which the bits are separated by their activity coefficients. Two different caches are used: one for high-activity bits and the other for low-activity bits. Because of the small cache size, this captures locality in data values more effectively.

    20. Hierarchical Value Cache Encoding [8]. The HVCE is organized into multiple levels, with level $i$ storing $32/2^{(i-1)}$ values.

    21. HVCE (contd.). A match at a higher level implies matches at all lower levels as well. The highest-level match is encoded as a bit change on the 32-bit data bus. A control word (15 bits for 15 caches) indicates which caches have a hit. The address bus is used where it has spare bandwidth (cycle stealing is used for this). The number of the switched bit indicates the VC address, with the hit indicated in the control.

    22. Value Cache Approaches

    23. Secondary Memory Storage Management [9]. All current storage management techniques assume magnetic storage as secondary memory and performance optimization as the sole objective. Flash memory is evolving as a popular alternative for secondary storage in portable devices. Power optimization is as important an objective as performance.

    24. Proposed Modifications. The page size used (4 KB to 12 KB) is too large for replacement; define sub-pages equal to the flash memory page (say 256 B) for replacement. Use a battery-backed SRAM as a hot-cache for the flash to avoid frequent writes. Manage fragmentation to avoid frequent and expensive garbage collection. Hot-cache replacement policies have to be power sensitive.

    25. Processor Pipeline: Power Reduction. Processors today have very deep pipelines. Stalls occur due to data dependencies, control dependencies, resource contention and cache misses. The stalls provide an opportunity to reduce power by clock gating the latches of various stages. Other techniques that take advantage of stalls for power reduction have also been reported.

    26. Clock Gating. The basic technique is clock gating of the various stage registers during a stall.

    27. Pipelining Using Transparent Latches [10]. The normal edge-triggered latches are called opaque latches; clock gating them saves energy. Level-triggered latches become transparent when enabled and are called transparent latches. Energy is saved by keeping these latches enabled, which is equivalent to combining stages to create longer-delay stages.

    28. Transparent Latches. Transparent latches introduced in the pipeline can result in energy savings. The saving depends on the “distance” between subsequent pieces of useful work and is thus dynamic in nature.

    29. Stall Cycles Redistribution [11]. Stall cycles, if redistributed, can help save energy. The delay is also referred to as slack. Slack longer than the pipeline depth is of no use. The slack information is stored with the “fetch group” as extra bits in the BTB; both the predicted slack and a confidence level are encoded in these bits. The fetch of instructions with slack is delayed by up to the pipeline depth. The authors report up to 50% reduction in the front-end (I-cache, branch predictor and front-end latches) energy-delay product. Transparent latches alone give a negligible (5%) reduction, whereas prediction with clock gating is effective, giving a 25% reduction. The loss in performance is less than 2%.

    30. Processors: Speculative Execution Energy Reduction. High-performance processors are all pipelined. The current trend is to support speculative instruction execution and to throw out the instructions in the execution pipeline before the write-back stage if the speculation proves wrong. This ensures correctness but is power inefficient, as the energy consumed by the unfinished instructions is wasted.

    31. Basic Strategy. The basic strategy is to reduce the pipeline feed by gating the fetch stage whenever the probability of an instruction finishing drops.
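One way to picture this strategy is a small model that tracks the unresolved branches in flight and gates fetch when the estimated chance that a newly fetched instruction will finish falls below a threshold. This is a minimal sketch, not a mechanism taken from the cited papers: the class, the per-branch confidence numbers, the threshold and the assumption that branch outcomes are independent are all hypothetical.

```python
class FetchGate:
    """Toy model of fetch gating driven by branch-prediction confidence."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.in_flight = {}  # branch id -> estimated P(prediction correct)

    def branch_fetched(self, branch_id, confidence):
        self.in_flight[branch_id] = confidence

    def branch_resolved(self, branch_id):
        self.in_flight.pop(branch_id, None)

    def commit_probability(self):
        """P(a newly fetched instruction finishes), assuming independence."""
        probability = 1.0
        for confidence in self.in_flight.values():
            probability *= confidence
        return probability

    def fetch_enabled(self):
        return self.commit_probability() >= self.threshold


gate = FetchGate(threshold=0.5)
trace = []

gate.branch_fetched("b1", 0.9)
trace.append(gate.fetch_enabled())   # one fairly confident branch in flight

gate.branch_fetched("b2", 0.6)
trace.append(gate.fetch_enabled())   # 0.9 * 0.6 is still above the threshold

gate.branch_fetched("b3", 0.7)
trace.append(gate.fetch_enabled())   # a third branch pushes it below: gate

gate.branch_resolved("b2")
trace.append(gate.fetch_enabled())   # resolving b2 lifts it back: resume

print(trace)
```

The threshold sets the trade-off: a higher value saves more wasted fetch energy but stalls correct-path instructions more often.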

    32. Approaches

    33. Low Power Multipliers. Array multipliers and shift-and-add multipliers. Fixed-coefficient multipliers: DSP applications like filters, FFT, DCT etc.; functions like sine and cosine computation. Booth’s multipliers; CSD coding to reduce the number of additions and subtractions.

    34. CSD Coding [1]. Replace a string of 1’s (longer than two) by a 1 and a -1 to reduce the number of operations to 2: 00111101 becomes 0 1 0 0 0 -1 0 1, i.e. 61 = 64 - 4 + 1. Modified CSD coding considers a set of coefficients together, instead of one at a time, to increase the number of all-0 columns. These can be removed to save area.
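The recoding can be sketched with the standard non-adjacent-form algorithm, which produces a canonical signed-digit representation for a non-negative integer. Note one difference from the rule as stated on the slide: this routine also recodes runs of exactly two 1's, which leaves the number of non-zero digits unchanged. The function name and the fixed 8-digit width are choices made for illustration.

```python
def to_csd(value, width=8):
    """Signed-digit form of a non-negative integer, most significant first.

    Every digit is -1, 0 or +1 and no two adjacent digits are non-zero.
    """
    digits = []  # built least significant digit first
    while value != 0:
        if value & 1:
            # +1 when the low bits are ...01, -1 when they are ...11,
            # which turns a run of 1's into a single +1 and a single -1.
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    digits += [0] * (width - len(digits))
    return digits[::-1]


coefficient = 0b00111101  # 61, the example on the slide
csd = to_csd(coefficient)
binary_terms = bin(coefficient).count("1")
csd_terms = sum(1 for d in csd if d != 0)

print(csd)
print(f"non-zero terms: {binary_terms} in binary, {csd_terms} in CSD")
```

For a fixed-coefficient multiplier, each non-zero digit is one shifted partial product, so fewer non-zero digits means fewer additions or subtractions.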

    35. Synergistic Temperature and Energy Management [17]. GALS (globally asynchronous, locally synchronous) architectures are getting popular. They have many clock domains which interact asynchronously. A temperature rise in one domain can be addressed by reducing only that domain's clock frequency, thus reducing the impact on performance. In synchronous systems, reducing the clock frequency would reduce overall performance. Slowing one domain introduces “slack” in other domains due to excess capacity; this can be exploited to reduce the clock in other domains and so reduce energy further. Cooling the adjoining domains also benefits heat reduction in the affected domain by providing a higher temperature gradient.

    36. References. For first part: V. Venkatachalam and M. Franz, “Power Reduction Techniques for Microprocessor Systems”, ACM Computing Surveys, Vol. 37, No. 3, Sep. 2005, pp. 195-237. Low Power Multipliers: ISLPED 2006 paper. Register file power reduction: ISLPED 2006 paper. Associative Memory: J. Sharkey et al., “Power efficient wakeup tag broadcast”, ICCD 2005; ISLPED 2006 paper. Off-chip Bus Power Reduction: J. Yang et al., “FV encoding for low power data I/O”, ISLPED 2001; Basu et al., “Power protocol: reducing power dissipation on off-chip data buses”, MICRO 2002; Dinesh et al., “A tunable encoder for off-chip buses”, ISLPED 2005; ISLPED 2006 paper. Secondary Storage Management: ISLPED 2006 paper.

    37. References (contd.). Processor Pipeline: Power Reduction: H.M. Jacobson, “Improved clock gating through transparent pipelining”, ISLPED 2004, Aug. 2004; ISLPED 2006 paper. Speculative Execution Energy Reduction: Aragon et al., “Power-aware control speculation through selective throttling”, HPCA-9, pp. 103-112, Feb. 2003; Baniasadi et al., “Instruction flow based front-end throttling for power-aware high-performance processors”, ISLPED ’01, pp. 6-21, Aug. 2001; Buyuktosunoglu et al., “Energy efficient co-adaptive instruction fetch and issue”, ISCA ’03, pp. 147-156, June 2003; Manne et al., “Pipeline gating: speculation control for energy reduction”, ISCA ’98, pp. 132-141, June 1998; ISLPED 2006 paper. Synergistic Temp. and Energy Management: ISLPED 2006 paper.
