Cache Decay: Mechanisms to Reduce Leakage Power in Caches
Stefanos Kaxiras
ISCA 2001, Kaxiras, Hu, Martonosi
PACS 2000/ASPLOS, Kaxiras, Hu, Narlikar, McLellan
Cache Decay, Adaptive Cache Decay
Automatic decay-based refresh, Diodato, Kaxiras
Roadmap
• Cache Decay
• Adaptive Cache Decay
• Future work: COLDCACHE, COLDeRAM
Summary
• Static power consumption becomes important!
• In this talk:
  • Efficient mechanisms to significantly reduce (~5x) static power consumption in caches without affecting performance
  • Based on fundamental characteristics of cache-line accesses
  • Works for everything we examined: 18 Spec95 (int & fp), 18 Spec2K (int & fp), Mediabench, many cache architectures
Power
• NOW: 90%-99% dynamic + 10%-1% static
  • Dynamic: dissipated by switching transistors
  • Static (leakage): dissipated by all transistors
• $P_{dynamic} = A \cdot C_{transistor} \cdot V_{dd}^{2} \cdot Frequency$
• A, C, F ↑ and Vdd ↓ with every generation
• But Vdd ↓ ⇒ dramatic increase in static power:
  • Expected increase 5x with every generation
  • Already a problem
• Near future: 50% dynamic + 50% static
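The quadratic dependence on Vdd in the dynamic power formula is what motivates supply-voltage scaling. The short sketch below simply evaluates the formula from the slide; the function name and the numeric values are illustrative assumptions, not measurements from the talk.

```python
def dynamic_power(activity, capacitance, vdd, frequency):
    """P_dynamic = A * C * Vdd^2 * F, as written on the slide."""
    return activity * capacitance * vdd ** 2 * frequency


# Illustrative (made-up) operating points: only Vdd changes.
base = dynamic_power(activity=0.5, capacitance=1e-9, vdd=2.0, frequency=1e9)
scaled = dynamic_power(activity=0.5, capacitance=1e-9, vdd=1.0, frequency=1e9)

# Halving Vdd cuts dynamic power to one quarter, all else being equal.
print(scaled / base)
```

Under these assumed values, the ratio printed is 0.25, which reflects the Vdd-squared term only; the formula says nothing about static (leakage) power.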
Cache Decay: Reducing Static Power in Caches
• Caches contain lots of transistors
• Main idea:
  • Power down cache lines that do not contain useful data
  • Switch off Vdd to a cache line using a gated-Vdd transistor [Powell et al.]
• Which cache lines do not contain useful data?
Generational Behavior of Cache Lines
[Figure: typical behavior of a cache frame over time. A new data access (cache miss) starts a generation, multiple accesses follow in a short time, and the frame then sits in dead time until the next new data access (cache miss & replacement).]
• Fundamental observation: dead times are HUGE! (We have data for that!)
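To make the notion of a generation concrete, the following sketch splits one generation of a cache frame into the time during which it is being accessed and the dead time that follows. The function name, the "live time" label, and the trace values are assumptions made for illustration; the talk's own data is not reproduced here.

```python
def generation_times(access_cycles, eviction_cycle):
    """Split one cache-frame generation into (live_time, dead_time).

    access_cycles:  sorted cycle numbers of the accesses in this generation;
                    the first entry is the miss that brings the data in.
    eviction_cycle: cycle of the next miss that replaces the data.
    """
    if not access_cycles:
        raise ValueError("a generation starts with at least one access")
    start, last = access_cycles[0], access_cycles[-1]
    if eviction_cycle < last:
        raise ValueError("eviction cannot precede the last access")
    live_time = last - start            # burst of accesses after the miss
    dead_time = eviction_cycle - last   # idle until the data is replaced
    return live_time, dead_time


# Hypothetical trace: a short burst of accesses, then a long idle period.
live, dead = generation_times([100, 120, 150, 400], eviction_cycle=20_000)
print(live, dead)  # 300 19600
```

In this made-up trace the frame is useful for 300 cycles and then leaks for 19,600 cycles while holding data that is never touched again, which is the situation cache decay targets.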
Proposal: Cache Decay
[Figure labels: Decay Interval, Off]
• Algorithm: timer per cache line
  • If the cache line is accessed frequently, maintain power: reset the timer with every access
  • If it is not accessed for a long time, switch off Vdd: timer = decay interval ⇒ switch off Vdd
• But could be mistaken (incur decay misses)
  • Power on at the miss
• Decay interval ~ 8K cycles or more
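The per-line timer described above can be sketched as a small behavioral model. This is a simplified illustration, not the hardware design: the class and method names are hypothetical, and the model keeps the old tag after power-off purely so it can label a miss as a decay miss.

```python
class DecayLine:
    """Behavioral model of one cache line with a decay timer (a sketch)."""

    def __init__(self, decay_interval):
        self.decay_interval = decay_interval
        self.idle_cycles = 0
        self.powered = False   # frame starts empty and switched off
        self.tag = None        # kept after power-off only for bookkeeping

    def tick(self, cycles=1):
        """Advance time; switch off Vdd once the line has idled long enough."""
        if self.powered:
            self.idle_cycles += cycles
            if self.idle_cycles >= self.decay_interval:
                self.powered = False

    def access(self, tag):
        """Return 'hit', 'miss', or 'decay_miss', then (re)power the line."""
        if self.powered and self.tag == tag:
            result = "hit"
        elif not self.powered and self.tag == tag:
            result = "decay_miss"   # data would still be here without decay
        else:
            result = "miss"
        self.tag = tag
        self.powered = True         # power on at the miss
        self.idle_cycles = 0        # every access resets the timer
        return result


line = DecayLine(decay_interval=8192)
print(line.access(0xA))   # miss: first touch
line.tick(100)
print(line.access(0xA))   # hit: accessed again before the interval elapsed
line.tick(8192)
print(line.access(0xA))   # decay_miss: the line was switched off too early
```

A longer decay interval makes decay misses rarer but keeps idle lines powered for longer, which is the trade-off the results slides explore.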
Practical Implementation of Cache Decay
[Figure labels: Global N-digit cycle counter, Local 2-bit counters, Cache line + Tag]
• Local 2-bit counters count large periods signaled by the global counter
• 2 bits are enough to approximate the resolution of full counters
• Additional HW < 5%
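One way to picture the global/local counter split is the sketch below. It assumes the global counter signals a tick every quarter of the decay interval and that a line is switched off when a tick arrives while its 2-bit counter is already saturated; both choices, and all names, are assumptions for illustration rather than details given on the slide.

```python
class HierarchicalDecay:
    """Global cycle counter driving per-line 2-bit counters (a sketch)."""

    def __init__(self, num_lines, decay_interval):
        if decay_interval < 4:
            raise ValueError("decay_interval must be at least 4 cycles")
        self.tick_period = decay_interval // 4   # assumed: 4 ticks per interval
        self.global_counter = 0
        self.local = [0] * num_lines             # 2-bit counters, values 0..3
        self.powered = [True] * num_lines

    def access(self, line):
        """Any access resets the line's local counter and keeps it powered."""
        self.local[line] = 0
        self.powered[line] = True

    def cycle(self):
        """Advance one clock cycle; on a global tick, bump every local counter."""
        self.global_counter += 1
        if self.global_counter < self.tick_period:
            return
        self.global_counter = 0
        for i, count in enumerate(self.local):
            if not self.powered[i]:
                continue
            if count == 3:
                self.powered[i] = False          # saturated counter: decay
            else:
                self.local[i] = count + 1


cache = HierarchicalDecay(num_lines=2, decay_interval=8192)
cache.access(0)
cache.access(1)
for cycle in range(1, 8193):
    cache.cycle()
    if cycle % 1000 == 0:
        cache.access(1)          # line 1 stays busy, line 0 is never touched
print(cache.powered)             # [False, True]
```

With this scheme a line decays after somewhere between three and four tick periods of idleness, depending on where in the tick period its last access fell, which is the sense in which two bits only approximate a full per-line counter.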
Results: Active Size vs. Miss Rate
[Figure: active size vs. miss rate, average for Spec2000 programs, for decay intervals of 1K, 8K, 64K, and 512K cycles and Inf; one region is marked "Range of imperceptible performance impact".]
Cache Decay and Energy
• Cache decay reduces active size
  • Reduces static power consumption
• But increases miss rate (# of cache misses)
  • Increases dynamic power consumption
  • Energy per miss = ?
• To study total energy we use the ratio:
  $\text{ratio} = \dfrac{\text{dynamic energy per miss}}{\text{static energy per cycle}} = 5,\ 10,\ 20,\ 100$
[Slide annotation: Trend]
Results: Energy (Spec2000)
• Normalized leakage energy = (new leakage energy + dynamic overhead) / original leakage energy
• Dynamic overhead: decay misses + additional HW
[Figure: curves labeled (100), (20), (10), (5)]
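The normalized leakage energy metric can be worked through numerically. The sketch below measures all energy in units of the cache's static energy per cycle, so the energy of one decay miss is simply the ratio value; the function name, the parameter names, and every number in the example are illustrative assumptions, not results from the talk.

```python
def normalized_leakage_energy(active_ratio, decay_misses, cycles,
                              miss_to_leak_ratio, hw_overhead=0.0):
    """(new leakage energy + dynamic overhead) / original leakage energy.

    Energies are in units of "static energy per cycle of the whole cache".
    active_ratio:       average fraction of the cache left powered on
    decay_misses:       extra misses caused by decay
    miss_to_leak_ratio: dynamic energy per miss / static energy per cycle
    hw_overhead:        energy of the additional decay hardware (same units)
    """
    original_leakage = float(cycles)
    new_leakage = active_ratio * cycles
    dynamic_overhead = decay_misses * miss_to_leak_ratio + hw_overhead
    return (new_leakage + dynamic_overhead) / original_leakage


# Made-up run: 25% of the cache stays on, 1000 decay misses in 1M cycles.
for ratio in (5, 10, 20, 100):
    value = normalized_leakage_energy(0.25, 1000, 1_000_000, ratio)
    print(ratio, round(value, 3))
```

A value below 1.0 means decay saves energy overall; as the energy per miss grows relative to leakage, the same number of decay misses eats into more of the savings.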
Adaptive Cache Decay
• Best results require different decay intervals for different programs
• Automatically adjust decay interval
  • Independently for each cache line
  • Each cache line has its own decay interval
  • Adjust according to mistakes/successes
• No need to figure out what decay interval to use
• Simple implementation changes
Implementation: Adaptive Decay
[Figure labels: Fast Decay, Slow Decay, 2-bit counters, Cache frame, State & Control]
• For each cache line: start with fast decay
• If we make a mistake, increase the decay interval; otherwise decrease it
• Mistake if the next access comes too soon
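The per-line adjustment rule can be sketched as follows. The ladder of ten power-of-two intervals from 1K to 512K cycles mirrors the configuration reported in the adaptive results; the `mistake_window` threshold that decides what counts as "too soon", and all names, are hypothetical choices made for this illustration.

```python
# Ten decay intervals, 1K to 512K cycles, successive powers of two.
INTERVALS = [1024 << i for i in range(10)]


class AdaptiveLine:
    """Per-line adaptive decay interval (a sketch under stated assumptions)."""

    def __init__(self, mistake_window):
        self.level = 0                        # start with the fastest decay
        self.mistake_window = mistake_window  # assumed threshold for "too soon"

    @property
    def decay_interval(self):
        return INTERVALS[self.level]

    def on_access_after_decay(self, cycles_since_power_off):
        """Update the interval when a switched-off line is accessed again."""
        if cycles_since_power_off < self.mistake_window:
            # Mistake: the line was needed soon after decaying, so slow down.
            self.level = min(self.level + 1, len(INTERVALS) - 1)
        else:
            # Success: the line stayed dead for a long time, so decay faster.
            self.level = max(self.level - 1, 0)
        return self.decay_interval


line = AdaptiveLine(mistake_window=2048)
print(line.decay_interval)                  # 1024
print(line.on_access_after_decay(100))      # 2048 (mistake)
print(line.on_access_after_decay(500))      # 4096 (mistake)
print(line.on_access_after_decay(50_000))   # 2048 (success)
```

Because each line carries its own level, lines with short reuse distances drift toward slow decay while rarely reused lines stay at fast decay, with no program-wide interval to tune.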
Adaptive Results
Adaptive Cache Decay (Spec2000)
• 10 intervals, 1Kc-512Kc, successive powers of 2
• Adaptive cache decay automatically achieves lower normalized static power than simple decay
[Figure: adaptive decay compared with simple decay at 1Kc, 8Kc, 64Kc, and 512Kc decay intervals, with iso-power lines along which total power remains constant (ratio = 10).]
Summary
• Dramatic increase in static power expected
• Cache decay:
  • Turn off Vdd to cache lines that are not likely to contain useful data
  • Reduces static power (active ratio) without affecting performance (miss rate)
• Adaptive cache decay:
  • Adjust decay interval per cache frame
  • Outperforms simple decay without the need to figure out decay intervals
• Impact: U.Mich., U.T.Austin, NCS, IBM