On the Limits of Leakage Power Reduction in Caches

Yan Meng, Tim Sherwood and Ryan Kastner • UC Santa Barbara • HPCA-2005

Overview • Caches are good targets for tackling the leakage problem • Much work has been done in this field • Gated-Vdd: [Powell 01], [Agarwal 02], [Roy 02], [Hu 02], [Kaxiras 01], [Zhou 03], [Velusamy 02] • Multiple supply voltages: [Flautner 02], [Kim 02,04], [Mudge 04] • Others: [Hu 03], [Li 04], [Heo 02], [Hanson 01], [Li 03], [Bai 05], [Skadron 04], [Zhang 02], [Azizi et al. 03]

Research Question and Finding • What is the best leakage power saving we could hope to achieve with existing techniques? • Far more potential left for further reducing leakage power in caches

Outline • Motivation • Definitions • Optimal approach • The generalized model • Experimental results • Conclusions

Motivation • Why study the leakage problem? • Leakage power: dominant source of power consumption as technology scales down below 100nm • Fig: Projected leakage power consumption as a fraction of the total power consumption, according to the International Technology Roadmap for Semiconductors

Motivation • Why tackle the leakage problem through caches? • Caches: huge chip area (50% in 2005 [ITRS]) • Major source of leakage power consumption • Fig: Alpha 21364 microprocessor die photo [http://www.oracle.com/technology/products/rdb/pdf/2002_tech_forums/rdbtf_2002_opt_on_alpha_mdr.pdf]

Motivation • How to tackle the problem with existing techniques? • Keep frequently accessed cache lines active to ensure high performance • Turn off cache lines that are not used for a long time • Use low supply voltage to save power for the rest • What’s the best that the existing circuit and architecture techniques could achieve? How much room is left for further research?

Definitions – Cache Interval • Time between two successive accesses to the same cache line • Fig: timeline showing the interval Ii, of length |Ii|, between access(i) and access(i+1)
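The interval definition above can be illustrated with a small sketch. It assumes a hypothetical trace format, a list of `(cycle, line_id)` pairs sorted by cycle; the function name `cache_intervals` is made up for this example and is not part of the authors' tool.

```python
from collections import defaultdict


def cache_intervals(trace):
    """Return {line_id: [interval lengths]} for a trace of (cycle, line_id).

    An interval is the time between two successive accesses to the same
    cache line. The trace is assumed (hypothetically) to be sorted by cycle.
    """
    last_access = {}                 # line_id -> cycle of its previous access
    intervals = defaultdict(list)    # line_id -> list of interval lengths
    for cycle, line in trace:
        if line in last_access:
            intervals[line].append(cycle - last_access[line])
        last_access[line] = cycle
    return dict(intervals)


# Illustrative trace with two cache lines, "A" and "B".
trace = [(0, "A"), (3, "B"), (10, "A"), (12, "B"), (50, "A")]
print(cache_intervals(trace))
```

Each line contributes one interval per pair of consecutive accesses, so a line touched only once contributes none.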

Definitions – Operating Modes • Active mode • Power on the whole cache line • No power saving • Sleep mode [Roy01, Hu01] • Sleep/"turn off" transistors • Lose data • Refetch data with high overhead • Drowsy mode [Flautner02, Mudge04] • Use low supply voltage to save power when the line is not needed • Preserve data for fast reaccess • Wake up to the high voltage and return data

Choosing Operating Modes • Active mode • Sleep mode • Drowsy mode • Fig: the three candidate modes for an interval of length |Ii|

Optimal Approach • Differences • Studying optimality • Combining all three modes to achieve the maximal leakage power saving • Optimal policy • Oracle knowledge of future address trace • Applying the appropriate operating mode on each cache interval • Obtaining optimal leakage power saving • Formal proof of the optimality

Inflection Points • Which mode to apply on each interval? • Active-drowsy inflection point a • The least amount of time drowsy mode needs to save energy • Sleep-drowsy inflection point b • The time where sleep and drowsy modes consume the same amount of energy

Selecting Operating Modes with Inflection Points • 0 < |I| ≤ a: active interval, active mode • a < |I| ≤ b: drowsy interval, drowsy mode • |I| > b: sleep interval, sleep mode • Fig: decision flowchart on the interval length |I| (labelled "Optimality")
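The selection rule above maps directly onto a three-way comparison. A minimal sketch follows; the function name `select_mode` and the example values of `a` and `b` are hypothetical, chosen only for illustration.

```python
def select_mode(interval_length, a, b):
    """Pick the operating mode for one cache interval.

    a: active-drowsy inflection point, b: sleep-drowsy inflection point,
    both in the same time unit as interval_length (e.g. cycles).
    """
    if interval_length <= 0:
        raise ValueError("interval length must be positive")
    if interval_length <= a:
        return "active"   # 0 < |I| <= a
    if interval_length <= b:
        return "drowsy"   # a < |I| <= b
    return "sleep"        # |I| > b


# Hypothetical inflection points, for illustration only.
for length in (5, 50, 5000):
    print(length, select_mode(length, a=10, b=1000))
```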

Calculating Inflection Points • Active-drowsy inflection point a • Sleep-drowsy inflection point b
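One way the two inflection points could be computed is sketched below, assuming a simple linear energy model in which spending an interval of length t in a mode costs (leakage power × t) plus a fixed transition energy. This model, the function name `inflection_points`, and all parameter values are assumptions made for illustration; they are not necessarily the authors' formulation.

```python
def inflection_points(p_active, p_drowsy, p_sleep, e_drowsy, e_sleep):
    """Return (a, b) under an assumed linear energy model.

    p_*: leakage power per cycle in each mode (p_active > p_drowsy > p_sleep).
    e_drowsy: energy to enter drowsy mode and wake up again.
    e_sleep: energy to enter sleep mode and return, including the refetch.
    """
    if not (p_active > p_drowsy > p_sleep):
        raise ValueError("expected p_active > p_drowsy > p_sleep")
    # a: drowsy breaks even with active when p_d*t + e_d == p_a*t
    a = e_drowsy / (p_active - p_drowsy)
    # b: sleep and drowsy cost the same when p_s*t + e_s == p_d*t + e_d
    b = (e_sleep - e_drowsy) / (p_drowsy - p_sleep)
    return a, b


# Hypothetical numbers in arbitrary energy units, not measured values.
a, b = inflection_points(p_active=1.0, p_drowsy=0.25, p_sleep=0.0,
                         e_drowsy=3.0, e_sleep=103.0)
print(a, b)
```

Under this model, a larger refetch cost pushes b outward, while a larger gap between drowsy and sleep leakage pulls it inward.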

Saving Leakage Power without Performance Degradation • Deriving the interval lengths with perfect knowledge of the future address trace • Fetching any needed data just before it is needed • Avoiding any performance impact • Taking into account the power cost of just-in-time refetch

Saving Leakage Power without Performance Degradation • Fig: timeline in which data is woken up or refetched just before it is needed

The Generalized Model • Parameterized model • Inputs • Wake-up latencies • Interval distribution • Leakage power of each state • Transition energy between states • Outputs • Optimal savings of OPT-Drowsy, OPT-Sleep, and OPT-Hybrid • Can be extended to accommodate future technologies and power saving modes • Publicly available • http://express.ece.ucsb.edu/software/leakage.html
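A parameterized model of this kind might look like the sketch below. It is not the authors' published software: the names `ModeParams` and `optimal_savings`, the linear energy model, and every number are assumptions for illustration. Wake-up latency is not modeled separately here; its energy cost is assumed to be folded into each mode's transition energy, and OPT-Sleep is taken to mean "active or sleep, whichever is cheaper per interval".

```python
from dataclasses import dataclass


@dataclass
class ModeParams:
    power: float       # leakage power per cycle in this mode (arbitrary units)
    transition: float  # energy to enter the mode and return (incl. any refetch)


def interval_energy(length, mode):
    """Energy spent over one interval under the assumed linear model."""
    return mode.power * length + mode.transition


def optimal_savings(intervals, active, drowsy, sleep):
    """Fraction of leakage energy saved versus keeping every line active."""
    baseline = sum(interval_energy(t, active) for t in intervals)
    policies = {
        "OPT-Drowsy": (active, drowsy),
        "OPT-Sleep": (active, sleep),
        "OPT-Hybrid": (active, drowsy, sleep),
    }
    savings = {}
    for name, modes in policies.items():
        # With oracle knowledge, pick the cheapest allowed mode per interval.
        energy = sum(min(interval_energy(t, m) for m in modes)
                     for t in intervals)
        savings[name] = 1.0 - energy / baseline
    return savings


# Hypothetical parameters and interval lengths, not measured values.
active = ModeParams(power=1.0, transition=0.0)
drowsy = ModeParams(power=0.25, transition=3.0)
sleep = ModeParams(power=0.0, transition=103.0)
print(optimal_savings([2, 40, 1000], active, drowsy, sleep))
```

Choosing the cheapest mode per interval is equivalent to thresholding on the inflection points when the energy of each mode grows linearly with interval length, which is why the hybrid policy can never do worse than either single-technique policy in this sketch.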

Methodology • Core: Compaq Alpha 21264 [Kessler 99] • Memory • 2-way L1 instruction and data caches, 64KB • Unified direct-mapped L2 cache, 2MB • LRU replacement policy • Tools • SimAlpha simulator • HotLeakage • Leakage power and dynamic cost • Parameters: taken from HotLeakage • Averaged results over all benchmark applications

Calculating Inflection Points • The sleep-drowsy point decreases as technology scales from 180nm to 70nm • Because the leakage power consumption increases while the dynamic power consumption caused by an induced miss decreases • Our approach can be parameterized and applied to many other memory technologies • 70nm, the most advanced technology considered, is used in the rest of our study

Exploring the Upper-bound (L1 data cache) • OPT-Drowsy: no performance penalty for waking up data • Sleep(10K): turning off cache lines after 10K cycles [Hu01] • OPT-Sleep(10K): turning off cache lines over intervals longer than 10K cycles • OPT-Hybrid: optimally combining three modes w/o performance penalty • Fig: leakage savings of OPT-Drowsy, Sleep(10K), OPT-Sleep(10K), and OPT-Hybrid
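The difference between Sleep(10K) and OPT-Sleep(10K) can be sketched by counting how many cycles a line stays powered within one interval. This rests on an assumed reading of the two policies: Sleep(10K) keeps the line active for the first 10K idle cycles and then turns it off, while OPT-Sleep(10K) uses oracle knowledge to turn the line off for the whole interval when the interval exceeds 10K cycles. The function names are hypothetical, and refetch energy is ignored.

```python
DECAY = 10_000  # the 10K-cycle threshold from the slide


def active_cycles_decay(length, decay=DECAY):
    """Sleep(10K), as assumed here: powered until the decay timer expires."""
    return min(length, decay)


def active_cycles_oracle(length, threshold=DECAY):
    """OPT-Sleep(10K), as assumed here: off for the whole interval if long."""
    return 0 if length > threshold else length


# Illustrative interval lengths in cycles.
for length in (500, 10_000, 50_000):
    print(length, active_cycles_decay(length), active_cycles_oracle(length))
```

Under this reading, the two policies behave identically on short intervals and differ by exactly the 10K-cycle waiting period on long ones.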

Research Finding • Larger leakage saving can be achieved for data cache • Drowsy and sleep modes each achieve fairly high savings • Savings are complementary: potential in combining drowsy and sleep technologies

Conclusions • Why leakage? • Leakage: dominant source of power consumption as technology scales down below 100nm • Caches: primary targets to tackle the problem • Optimal approach and software • Calculating the maximal leakage savings • Quantifying how much room is left for improvement • Used to guide future power management policy research • Great potential in combining techniques • Optimally combining Active, Drowsy, and Sleep • The optimal approach reduces power dissipation • Instruction cache: by a factor of 5.3 • Data cache: by a factor of 2
