Authors: Zhigang Hu, Alper Buyuktosunoglu, Viji Srinivasan, Victor Zyuban, Hans Jacobson, Pradip Bose (IBM T.J. Watson Research Center). In International Symposium on Low Power Electronics and Design (ISLPED), 2004, pp. 32-37. Presenter: Sai Raghunath T.
Microarchitectural Techniques for Power Gating of Execution Units
Sources of power dissipation
• Sub-threshold leakage
• Gate leakage current
• Circuit-level approaches for leakage power reduction
  • Body bias control
  • Dual-threshold domino circuits
  • Input vector control
  • Power gating
Architectural-level leakage power reduction in caches and buffers
• Tristating the drivers of the bitlines of SRAM
• Determining sleep-mode activation policies for the integer functional units using dual-Vt domino logic circuits
• Using the compiler to detect long idle periods for different functional units and enable power gating
Work done in the paper:
• Exploiting workload phases and characteristics to dynamically power gate selected units within a pipeline OFF/ON, using a time-based technique and a branch-prediction-guided technique
• Specifications of the out-of-order-issue superscalar processor: Turandot
Fundamentals of power gating:
• Power gating is achieved by using a suitably sized header or footer device for a circuit.
• The ‘Sleep’ signal is applied when the logic detects a sufficiently long idle period, and the macro is turned OFF.
• T1 - T0 = T(idle detect)
• T2 - T1 = T(idle delay)
• T3 - T2 = T(breakeven)
• T4 - T2 = T(full discharge)
• T5 = detection of next busy interval
• T6 - T5 = T(busy delay)
• T7 - T6 = T(wakeup)
Sequence
1. T0 -> T1: leakage energy
2. T1 -> T2: overhead energy + leakage energy (overhead energy is the energy required to generate the ‘Sleep’ signal)
   • Savings in leakage energy increase as the supply voltage decreases
3. T5 -> T6: overhead energy
4. T6 -> T7: leakage energy
T(breakeven) is the point at which the aggregate leakage energy savings, E(avg saved), equal the energy overhead of switching the header/footer device ON and OFF.
• Typically, the value of N(breakeven) is 10
• DIBL = Drain-Induced Barrier Lowering factor (typically 0.1)
• WH = (total area of header device) / (total area of clock-gated macro)
• α = switching factor
• m = 0.1
Power gating of execution units
• Quantifying the power gating potential of an out-of-order superscalar processor model using different applications from the SPEC2K suite
Assumptions:
• T(idle delay) = T(busy delay) = 0 → perfect predictor
• Only idle periods with T(idle) > T(overhead) are gated, where T(overhead) = T(wakeup) + T(breakeven)
The following equations estimate the fraction of cycles the units can be power gated.
Example: sequence of activity bits of some unit (1 = busy, 0 = idle): 1111 00000 111111 0000 1111 000000 1111
• T(overhead) = 3
• Opportunity cycles = (5-3) + (4-3) + (6-3) = 6
• Power gating potential = 6/33 = 18.18% ≈ 18%
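The worked example above can be reproduced with a short Python sketch. It assumes a perfect predictor, as on the previous slide, and counts only the part of each idle run that exceeds T(overhead). The function name `power_gating_potential` and the string encoding of the trace are illustrative choices, not taken from the paper.

```python
from itertools import groupby


def power_gating_potential(activity: str, t_overhead: int) -> float:
    """Fraction of cycles a unit could be gated, given a perfect predictor.

    activity: string of activity bits, '1' = busy, '0' = idle (spaces ignored).
    t_overhead: T(wakeup) + T(breakeven), in cycles.
    """
    bits = activity.replace(" ", "")
    opportunity = 0
    for bit, run in groupby(bits):
        length = len(list(run))
        # Only idle runs longer than the overhead are worth gating.
        if bit == "0" and length > t_overhead:
            opportunity += length - t_overhead
    return opportunity / len(bits)


trace = "1111 00000 111111 0000 1111 000000 1111"
potential = power_gating_potential(trace, t_overhead=3)
print(f"{potential:.2%}")  # 18.18%
```

Raising `t_overhead` to 6 leaves no idle run long enough to gate in this trace, so the potential drops to zero.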
Power gating potential averaged across SPEC2K FP applications for various values of T(overhead)
Power gating potential averaged across SPEC2K integer applications for various values of T(overhead)
Time-Based Power Gating:
• Assumptions:
  • T(idle delay) is folded into T(breakeven): T(breakeven) := T(breakeven) + T(idle delay)
  • T(busy delay) is folded into T(wakeup): T(wakeup) := T(wakeup) + T(busy delay)
  • One issue queue per execution unit
• Logic used:
  • Observe the state of an execution unit and turn it OFF when a long streak of idle cycles is seen
FSM: State Machine of an execution unit when power gating is engaged
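A minimal Python sketch of such a state machine follows. It is an illustration under stated assumptions, not the paper's exact FSM: the ‘Uncompensated’ state is named in the slides, while the other state names (`ACTIVE`, `COMPENSATED`, `WAKEUP`), the function `simulate`, and the fixed demand trace are hypothetical. Stalled instructions are not re-queued, so the sketch shows where cycles are spent but does not measure IPC loss.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"                 # powered on (includes idle detection)
    UNCOMPENSATED = "uncompensated"   # gated, overhead not yet recovered
    COMPENSATED = "compensated"       # gated past the break-even point
    WAKEUP = "wakeup"                 # powering back on


def simulate(demand, t_idle_detect, t_breakeven, t_wakeup):
    """Return the state occupied in each cycle of a demand trace.

    demand: iterable of bools, True when an instruction wants the unit.
    """
    state, idle_run, timer = State.ACTIVE, 0, 0
    history = []
    for busy in demand:
        # Work arriving while the unit is gated starts the wake-up sequence.
        if busy and state in (State.UNCOMPENSATED, State.COMPENSATED):
            state, timer = State.WAKEUP, 0
        history.append(state)
        if state is State.ACTIVE:
            idle_run = 0 if busy else idle_run + 1
            if idle_run >= t_idle_detect:
                state, timer, idle_run = State.UNCOMPENSATED, 0, 0
        elif state is State.UNCOMPENSATED:
            timer += 1
            if timer >= t_breakeven:
                state = State.COMPENSATED
        elif state is State.WAKEUP:
            timer += 1
            if timer >= t_wakeup:
                state = State.ACTIVE
        # COMPENSATED: stay gated until work arrives.
    return history


demand = [True] * 2 + [False] * 10 + [True] * 3
history = simulate(demand, t_idle_detect=3, t_breakeven=4, t_wakeup=2)
gated = sum(s in (State.UNCOMPENSATED, State.COMPENSATED) for s in history)
print(f"gated {gated} of {len(history)} cycles")
```

In this sketch only the cycles spent in `COMPENSATED` represent a net leakage saving; a unit that is woken while still `UNCOMPENSATED` has paid the switching overhead without recovering it.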
% of cycles in sleep mode for FPU with different T(idle detect) and T(breakeven). T(wakeup)= 3 cycles
Avg IPC of SPECFP2K suite with different T(idle detect) and T(wakeup) values. T(breakeven) = 9 cycles. IPC is normalized to the base case where power gating is disabled.
• Long idle periods coupled with smaller values of T(breakeven) and T(wakeup) help achieve large leakage reductions and limit the overall performance loss
• T(idle detect) = 6-12 cycles gives the best balance between performance and power
% of cycles in sleep mode for FXU with different T(idle detect) and T(breakeven). T(wakeup)= 3 cycles
Avg IPC of SPECINT2K suite with different T(idle detect) and T(wakeup) values. T(breakeven) = 9 cycles. IPC is normalized to the base case where power gating is disabled.
Branch-prediction-guided power gating:
• The previous graphs show that the FXU typically has short idle periods.
• So it is difficult to implement power gating efficiently in integer execution units.
• Branch mispredictions are highly disruptive events in speculative out-of-order processors, which makes them a good opportunity for power gating.
• On a branch misprediction, the pipeline is flushed and instructions from the correct path are fetched.
• During this process, the execution unit is idle.
New branch-prediction-guided power gating technique:
• As soon as a branch misprediction is detected, all idle FXUs are transferred to the ‘Uncompensated’ state
  → reduction in T(idle detect)
  → higher % of cycles in ‘sleep’ mode
  → smaller performance loss and better leakage reduction
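The effect of the misprediction trigger can be illustrated with a deliberately simplified Python sketch. The assumptions are loud ones: "asleep" lumps together the uncompensated and compensated states, wake-up latency is ignored, and the function `sleep_cycles`, its parameters, and the two traces are hypothetical, not taken from the paper.

```python
def sleep_cycles(busy, mispredict, t_idle_detect, use_branch_hint):
    """Count cycles an execution unit spends gated (simplified: wake-up is free).

    busy: per-cycle flags, truthy when the unit has work.
    mispredict: per-cycle flags, truthy when a branch misprediction is detected.
    """
    asleep = False
    idle_run = 0   # consecutive idle cycles seen while powered on
    slept = 0
    for is_busy, is_mispredict in zip(busy, mispredict):
        if is_busy:
            asleep, idle_run = False, 0   # work arrived: unit is on again
        elif asleep:
            slept += 1
        else:
            idle_run += 1
            # Gate after t_idle_detect idle cycles, or right away when an
            # idle unit sees a detected branch misprediction.
            if idle_run >= t_idle_detect or (use_branch_hint and is_mispredict):
                asleep = True             # takes effect from the next cycle
    return slept


busy       = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]
mispredict = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
time_based = sleep_cycles(busy, mispredict, t_idle_detect=4, use_branch_hint=False)
guided = sleep_cycles(busy, mispredict, t_idle_detect=4, use_branch_hint=True)
print(time_based, guided)  # 2 5
```

For this six-cycle idle window the time-based policy gates the unit for 2 cycles, while the misprediction-guided policy gates it for 5, because it skips the idle-detect wait.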
% of performance loss in sleep mode versus performance degradation techniques T(breakeven)=9 cycles; T(wakeup)= 3 cycles
Conclusions and critique:
• The time-based technique is efficient for FP execution units, which have relatively long idle times.
• The branch-prediction-guided technique is efficient for integer execution units.
• The paper does not discuss the advantages or disadvantages of power gating relative to other circuit-level approaches for leakage power reduction.
• How efficient is power gating if the above-mentioned assumptions are relaxed?
• What is the power consumption of the macro generating the ‘Sleep’ signal? What is the ratio of its power consumption to the power savings?
How is this paper relevant to the class?
• State-of-the-art microprocessors face the problem of high leakage power due to technology scaling.
• Leakage power is high in the execution units, which are the most important blocks in the microprocessor. This paper gives good insight into techniques to reduce leakage power.
• Also, various power gating techniques to reduce the power dissipation in CMP and SMT architectures can be explored.
Project:
• Consider a small integer ALU, compare various circuit-level approaches with power gating, and suggest the better technique(s). The suggested idea can be an optimal mix of two or more circuit-level approaches.
THANK YOU • Q &A?