Low Static-Power Frequent-Value Data Caches
Chuanjun Zhang*, Jun Yang, and Frank Vahid**
*Dept. of Electrical Engineering; Dept. of Computer Science and Engineering; University of California, Riverside
**Also with the Center for Embedded Computer Systems at UC Irvine
This work was in part supported by the National Science Foundation and the Semiconductor Research Corporation.
Chuanjun Zhang, UC Riverside
Leakage Power Dominates
• Growing impact of leakage power
  • Leakage power increases as transistor lengths and threshold voltages are scaled down.
  • The power budget limits the use of fast but leaky transistors.
• Caches consume much of the static power
  • Caches account for most of the transistors on a die.
• Related work
  • DRG: dynamically resizes the cache by monitoring the miss rate.
  • Cache line decay: dynamically turns off cache lines.
  • Drowsy cache: puts cache lines into a low-leakage mode.
Frequent Values in Data Cache (J. Yang and R. Gupta, MICRO 2002)
• Behavior of frequently accessed values
[Figure: Data read out from the L1 data cache, flowing between the microprocessor and the L1 data cache over the address and data lines; a few values such as 00000000 and FFFFFFFF recur many times.]
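The value locality in the figure can be quantified offline by profiling a trace of words read from the data cache and measuring how many accesses the top-k values cover. The following is a minimal Python sketch, assuming such a trace is available as a list of 32-bit integers; the function name `top_k_coverage` and the sample trace are hypothetical, not taken from the slides.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def top_k_coverage(trace: Sequence[int], k: int) -> Tuple[List[int], float]:
    """Return the k most frequent values in a trace and the fraction of
    accesses they account for."""
    if not trace:
        return [], 0.0
    counts = Counter(trace)
    top = counts.most_common(k)  # [(value, count), ...], most frequent first
    covered = sum(count for _, count in top)
    return [value for value, _ in top], covered / len(trace)


# Hypothetical trace of 32-bit words read out from the L1 data cache.
trace = [0x00000000, 0x00000000, 0xFFFFFFFF, 0x00100000, 0x00000000,
         0xFFFFFFFF, 0x00000234, 0xFF000000, 0x00000000, 0xFFFF1234]
values, coverage = top_k_coverage(trace, 2)
print([f"{v:08X}" for v in values], coverage)  # prints ['00000000', 'FFFFFFFF'] 0.6
```

Run with k = 32 over a full benchmark trace, this kind of measurement is what underlies statements such as "32 FVs account for around 36% of accesses".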
Frequent Values in Data Cache (J. Yang and R. Gupta, MICRO 2002)
• 32 FVs account for around 36% of all data cache accesses across 11 SPEC95 benchmarks.
• FVs can be captured dynamically.
• FVs are also widespread within the data cache
  • They are not just accessed frequently but also stored throughout the cache.
• FVs are stored in encoded form.
  • 4 or 5 bits represent 16 or 32 FVs, respectively.
• Non-FVs are stored in unencoded form.
• The set of frequent values remains fixed for a given program run.
[Figure: FVs accessed vs. FVs in D$ (data cache), e.g., 00000000, 00100000, FF000000, FFFFFFFF]
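The encoded/unencoded storage rule can be sketched as a small codec. This minimal Python version assumes a fixed table of at most 32 FVs for the whole run, so any FV fits in a 5-bit index. The class and field names (`FVCodec`, `StoredWord`) are hypothetical, and the flag is modeled as a boolean because the original and new cache designs use opposite flag polarities.

```python
from typing import Iterable, NamedTuple


class StoredWord(NamedTuple):
    is_fv: bool   # True if the word is stored in encoded form
    payload: int  # 5-bit FV index if is_fv, else the full 32-bit value


class FVCodec:
    """Encode 32-bit words against a fixed table of up to 32 frequent values."""

    def __init__(self, frequent_values: Iterable[int]):
        self.table = [v & 0xFFFFFFFF for v in frequent_values]
        if len(self.table) > 32:
            raise ValueError("at most 32 FVs fit in a 5-bit code")
        self.index = {v: i for i, v in enumerate(self.table)}

    def encode(self, word: int) -> StoredWord:
        word &= 0xFFFFFFFF
        if word in self.index:
            return StoredWord(True, self.index[word])  # FV: store the short code
        return StoredWord(False, word)                 # non-FV: store unencoded

    def decode(self, stored: StoredWord) -> int:
        return self.table[stored.payload] if stored.is_fv else stored.payload


codec = FVCodec([0x00000000, 0xFFFFFFFF, 0x00100000, 0xFF000000])
fv = codec.encode(0xFFFFFFFF)   # StoredWord(is_fv=True, payload=1)
nfv = codec.encode(0x00001234)  # StoredWord(is_fv=False, payload=4660)
```

With 16 FVs the same index would fit in 4 bits, matching the "4 or 5 bits" figure above.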
Original Frequent Value Data Cache Architecture
• Data cache memory is split into a low-bit array and a high-bit array.
  • 5 bits encode 32 FVs.
  • The remaining 27 bits are not accessed for FVs.
• A register file holds the decoded lines.
• Dynamic power is reduced.
• Accessing a non-FV takes two cycles.
• Flag bit: 1 = FV; 0 = NFV (non-FV)
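A behavioral model helps make the two-cycle penalty concrete. This minimal Python sketch assumes that a non-FV keeps its low 5 bits in the low-bit array and its upper 27 bits in the high-bit array, and that the flag (1 = FV, 0 = NFV) is only known after the first cycle; the names `Slot`, `store`, and `read` are hypothetical, and the register file of decoded lines is not modeled.

```python
from typing import Dict, List, NamedTuple, Tuple

LOW_BITS = 5  # width of the low-bit array; the high-bit array holds the other 27


class Slot(NamedTuple):
    flag: int  # original design: 1 = FV, 0 = NFV
    low: int   # 5-bit low-bit array: FV code, or low 5 bits of a non-FV
    high: int  # 27-bit high-bit array: upper bits of a non-FV (unused for FVs)


def store(word: int, fv_index: Dict[int, int]) -> Slot:
    word &= 0xFFFFFFFF
    if word in fv_index:
        return Slot(1, fv_index[word], 0)
    return Slot(0, word & ((1 << LOW_BITS) - 1), word >> LOW_BITS)


def read(slot: Slot, fv_table: List[int]) -> Tuple[int, int]:
    """Return (value, cycles) under the assumed original timing."""
    if slot.flag == 1:
        # Cycle 1: only the low-bit array is read and the code is decoded.
        return fv_table[slot.low], 1
    # Cycle 1 reveals flag = 0; cycle 2 reads the 27-bit high-bit array.
    return (slot.high << LOW_BITS) | slot.low, 2


fv_table = [0x00000000, 0xFFFFFFFF]
fv_index = {v: i for i, v in enumerate(fv_table)}
print(read(store(0xFFFFFFFF, fv_index), fv_table))  # prints (4294967295, 1)
print(read(store(0x12345678, fv_index), fv_table))  # prints (305419896, 2)
```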
New FV Cache Design: One-Cycle Access to Non-FVs
• No extra delay in determining whether the 27-bit portion must be accessed
• Leakage energy is proportional to program execution time
• The new driver is made as fast as the original by tuning the transistor parameters of its NAND gate
• Flag bit: 0 = FV; 1 = NFV
[Figure: (a) Original cache line architecture, with 5-bit and 27-bit portions of each 32-bit word and the original word line driver on the decoder output; (b) New cache line architecture (subbanking), with flag bits, 27-bit subbanks, and a new word line driver on the decoder output.]
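The subbanked word line can be modeled as simple Boolean logic. This is a minimal Python sketch, assuming the new driver's NAND gate combines the row decoder output with the stored flag bit (0 = FV, 1 = NFV), followed by an inverting stage, so the 27-bit subbank is enabled only for non-FVs within the same cycle; the slides do not spell out the gate inputs, and the names `drive_word_lines` and `read_one_cycle` are hypothetical.

```python
from typing import List, Tuple

LOW_BITS = 5


def drive_word_lines(decoder_output: bool, flag_bit: int) -> Tuple[bool, bool]:
    """Return (wl_5bit, wl_27bit) for one word of a row (flag: 0 = FV, 1 = NFV)."""
    wl_5bit = decoder_output  # 5-bit subbank keeps the original driver
    nand_out = not (decoder_output and flag_bit == 1)
    wl_27bit = not nand_out   # inverting driver stage after the NAND
    return wl_5bit, wl_27bit


def read_one_cycle(flag_bit: int, low: int, high: int,
                   fv_table: List[int]) -> Tuple[int, int]:
    """Return (value, cycles); both FVs and non-FVs complete in one cycle."""
    _, wl_27bit = drive_word_lines(True, flag_bit)
    if not wl_27bit:
        return fv_table[low], 1          # FV: 27-bit subbank stays idle
    return (high << LOW_BITS) | low, 1   # NFV: both subbanks read together


print(drive_word_lines(True, 0))  # prints (True, False)
print(drive_word_lines(True, 1))  # prints (True, True)
```

Compared with the original model, the non-FV path no longer needs a second cycle, which is the source of the performance gain reported later.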
Low-Leakage SRAM Cell and Flag Bit
[Figure: New cache line architecture (subbanking), with flag bits, 27-bit subbanks, and a new driver on the decoder output.]
[Figure: SRAM cell with a pMOS gated-Vdd control, and the flag bit SRAM cell whose output drives the Gated_Vdd control.]
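The effect of the gated-Vdd control on a single line can be estimated by counting powered cells. This minimal Python sketch assumes one flag bit per 32-bit word, that the flag and 5-bit cells are always powered, and that a gated 27-bit portion leaks nothing (real gated-Vdd cells retain some residual leakage); tag bits are ignored, and the constants, function names, and 8-word line are hypothetical.

```python
from typing import Sequence

FLAG_BITS = 1   # assumed: one flag bit per 32-bit word
LOW_BITS = 5    # always powered: FV code or low bits of a non-FV
HIGH_BITS = 27  # Vdd gated off by the flag bit when the word is an FV


def powered_cells(flags: Sequence[int]) -> int:
    """Count SRAM cells left powered in one line (flag: 0 = FV, 1 = NFV)."""
    return sum(FLAG_BITS + LOW_BITS + (HIGH_BITS if f == 1 else 0) for f in flags)


def gated_fraction(flags: Sequence[int]) -> float:
    """Fraction of the line's data-array cells whose Vdd is gated off."""
    if not flags:
        return 0.0
    total = len(flags) * (FLAG_BITS + LOW_BITS + HIGH_BITS)
    return 1 - powered_cells(flags) / total


line_flags = [0, 0, 1, 0, 1, 1, 0, 0]  # hypothetical line: 5 FVs, 3 non-FVs
print(powered_cells(line_flags), round(gated_fraction(line_flags), 3))  # prints 129 0.511
```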
Experiments
• Simulator: SimpleScalar
• Eleven SPEC 2000 benchmarks
• Fast-forward the first 1 billion instructions, then execute 500M instructions
[Table: Configuration of the simulated processor]
Performance Improvement of One-Cycle Access to Non-FVs
[Figure: Hit rate of FVs in data cache]
• Two-cycle access to non-FVs hurts performance and hence increases leakage energy
• One-cycle access to non-FVs achieves a 5.5% performance improvement (and reduces leakage energy correspondingly)
[Figure: Performance (IPC) improvement of one-cycle FV cache vs. two-cycle FV cache: 5.5%]
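The link between IPC and leakage energy can be made explicit with a short derivation. It assumes the instruction count N, the clock frequency f, and the leakage power P_leak are the same for both designs, so leakage energy scales with execution time T; primed symbols denote the one-cycle design.

```latex
\begin{aligned}
E_{\mathrm{leak}} &= P_{\mathrm{leak}} \cdot T, \qquad T = \frac{N}{\mathrm{IPC} \cdot f} \\
\frac{E'_{\mathrm{leak}}}{E_{\mathrm{leak}}} &= \frac{T'}{T} = \frac{\mathrm{IPC}}{\mathrm{IPC}'} = \frac{1}{1.055} \approx 0.948
\end{aligned}
```

Under these assumptions, a 5.5% IPC gain alone cuts leakage energy by roughly 5.2%, before any cells are gated off.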
Distribution of FVs in Data Cache
• FVs are widely found in data cache memory: on average, 49.2% of data cache words are FVs.
• The leakage power reduction is proportional to the percentage of words that are FVs.
[Figure: Percentage of data cache words (on average) that are FVs]
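The proportionality can be turned into a rough, idealized estimate for the data array. This assumes each word has one flag bit plus 5 always-powered bits and a 27-bit portion gated off for FVs, that gated cells leak nothing, and that tag arrays and peripheral circuits are excluded; f_FV is the average FV fraction above.

```latex
\begin{aligned}
\text{ideal gated fraction of data-array cells} &= f_{\mathrm{FV}} \cdot \frac{27}{1 + 5 + 27} \\
&= 0.492 \cdot \frac{27}{33} \approx 0.403
\end{aligned}
```

This simple ratio does not capture tags, peripheral circuitry, or the residual leakage of gated cells, so it should not be read as a prediction of the reported 33% total static energy savings.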
Static Energy Reduction
• 33% total static energy savings for data caches.
How to Determine the FVs
• Application-specific processors
  • The FVs can first be identified offline through profiling and then synthesized into the cache, so that power consumption is optimized for the hard-coded FVs.
• Processors that run multiple applications
  • The FVs can be held in a register file, to which each application can write its own set of FVs.
• Dynamically determined FVs
  • Embed the process of identifying and updating FVs into the FV registers, so that the design adapts dynamically and transparently to different workloads and inputs.
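The slides do not specify how FVs would be identified dynamically. As one possible illustration, the sketch below swaps in the Misra-Gries frequent-items algorithm, which tracks candidate FVs with a bounded number of counters (for example k = 32, matching the FV register capacity). The function name `misra_gries` is hypothetical, and the sketch does not model what happens to already-encoded cache words when the FV set changes.

```python
from typing import Dict, Iterable


def misra_gries(stream: Iterable[int], k: int) -> Dict[int, int]:
    """Track at most k candidate frequent values using k counters.

    Any value occurring more than n / (k + 1) times in a stream of length n
    is guaranteed to remain among the returned candidates.
    """
    counters: Dict[int, int] = {}
    for value in stream:
        if value in counters:
            counters[value] += 1
        elif len(counters) < k:
            counters[value] = 1
        else:
            # No free counter: decrement all counters, dropping those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


print(misra_gries([0, 0xFFFFFFFF, 0, 0xFFFFFFFF, 0, 1], k=2))  # prints {0: 2, 4294967295: 1}
```

The counters only approximate true frequencies, but the surviving candidates are a reasonable set to load into FV registers; an offline profile (as for application-specific processors) would instead use exact counts.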
Conclusion
• Two improvements to the original FV data cache:
  • One-cycle access to non-FVs
    • Improves performance (5.5%) and hence reduces static leakage energy
  • Shutting off the unused 27-bit portion of each FV
• The scheme does not increase the data cache miss rate
• The scheme further reduces data cache static energy by over 33% on average