Way-Predicting Set-Associative Cache for High Performance and Low Energy Consumption. Koji Inoue, Tohru Ishihara, and Kazuaki Murakami, Department of Computer Science and Communication Engineering, Kyushu University. ISLPED’99 (International Symposium on Low Power Electronics and Design).
Contact: ppram@c.csce.kyushu-u.ac.jp
Conventional 4-Way Set-Associative Cache
Organization: a tag subarray and a cache-line subarray, each divided into Way 0, Way 1, Way 2, and Way 3.
Step 1. Address decode (decode circuit).
Step 2. Read out a tag and a line from each way: activate the word line, precharge (and discharge) the bit lines, and activate the sense amps.
Step 3. Tag comparison.
Step 4 (hit). Provide the required data by activating the I/O pins.
Step 4 (miss). Cache replacement.
Total energy for an access: $E_{cache} = E_{decode} + E_{memory} + E_{io}$, where $E_{decode}$ is the energy for address decoding, $E_{memory}$ the energy for the SRAM access, and $E_{io}$ the energy for driving the I/O pins.
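A minimal Python sketch of the cost of one conventional access, assuming energies in arbitrary units with $E_{data} = 1$ and $E_{tag} = 0.078$ (the ratio used later in the best/worst-case analysis), and treating $E_{decode}$ and $E_{io}$ as optional inputs that default to zero. `AccessCost`, `conventional_access`, and `access_energy` are hypothetical names introduced for illustration. The point it shows is that every access activates all four tag and all four line subarrays, whether it hits or misses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessCost:
    cycles: int       # clock cycles for the access (miss penalty not counted)
    tag_reads: int    # tag subarrays activated (N_tag for this access)
    line_reads: int   # cache-line subarrays activated (N_data for this access)


def conventional_access(ways: int = 4) -> AccessCost:
    # Steps 1-3 take one cycle: the tag AND the line of every way are read in
    # parallel before the tag comparison, so all subarrays fire on a hit or a miss.
    return AccessCost(cycles=1, tag_reads=ways, line_reads=ways)


def access_energy(cost: AccessCost, e_tag: float, e_data: float,
                  e_decode: float = 0.0, e_io: float = 0.0) -> float:
    # E_cache = E_decode + E_memory + E_io, with the SRAM part
    # E_memory = N_tag * E_tag + N_data * E_data.
    e_memory = cost.tag_reads * e_tag + cost.line_reads * e_data
    return e_decode + e_memory + e_io


cost = conventional_access()
print(cost, access_energy(cost, e_tag=0.078, e_data=1.0))
```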
Phased 4-Way Set-Associative Cache for Low Energy Consumption
Energy consumption is improved by sacrificing performance.
Cycle 1: Step 1. Address decode. Step 2. Read out only the tags. Step 3. Tag comparison.
Miss: Step 4. Cache replacement.
Hit (Cycle 2): Step 4. Read out only the desired line. Step 5. Provide the required data.
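The phased access can be sketched the same way. The function names are hypothetical, and the cycles spent on cache replacement are not counted (an assumption). Averaging over a sequence of hit/miss outcomes reproduces the phased cache's trade-off: every access reads all four tags, but a line is read only on a hit, at the cost of a second cycle.

```python
def phased_access(hit: bool, ways: int = 4) -> tuple[int, int, int]:
    """Return (cycles, tag_reads, line_reads) for one phased-cache access."""
    if hit:
        # Cycle 1: read and compare all tags; cycle 2: read only the matching line.
        return 2, ways, 1
    # Miss: only the tags are read and compared; no line is read
    # (the replacement that follows is not counted here).
    return 1, ways, 0


def phased_averages(outcomes: list[bool]) -> tuple[float, ...]:
    """Average (T_cache, N_tag, N_data) over a sequence of hit/miss outcomes."""
    totals = [0, 0, 0]
    for hit in outcomes:
        for i, value in enumerate(phased_access(hit)):
            totals[i] += value
    return tuple(t / len(outcomes) for t in totals)


# nine hits and one miss, i.e. a cache hit rate of 90%
print(phased_averages([True] * 9 + [False]))
```

With a 90% hit rate the averages come to 1.9 cycles, 4 tag reads, and 0.9 line reads per access.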
Way-Predicting Set-Associative Cache: Concept
How can we achieve high performance and low energy consumption at the same time?
Fast access by reading out the tag and the line simultaneously: Conventional: good; Phased: bad.
Low energy by avoiding unnecessary line reads: Conventional: bad; Phased: good.
Idea: predict which way holds the data desired by the processor before the cache access starts.
Way-Predicting 4-Way Set-Associative Cache: Operation
Step 0. Way prediction (cache-line-based MRU algorithm).
Cycle 1: Step 1. Address decode. Step 2. Read out the predicted tag and line. Step 3. Tag comparison.
Prediction hit: Step 4. End.
Prediction miss (Cycle 2): Step 4. Read out the remaining tags and lines. Step 5. Tag comparison. If the data is found in one of the remaining ways (prediction miss), Step 6. End; on a cache miss, Step 6. Cache replacement.
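A sketch of the per-access cost under the way-predicting flow. `way_predicting_access` is a hypothetical helper; its `hit_way` argument stands for the way that actually holds the line (or `None` on a cache miss), which the hardware only learns from the tag comparisons. Cache replacement cycles are not counted.

```python
from typing import Optional


def way_predicting_access(predicted_way: int, hit_way: Optional[int],
                          ways: int = 4) -> tuple[int, int, int, str]:
    """Return (cycles, tag_reads, line_reads, outcome) for one access."""
    # Cycle 1 (steps 1-3): only the predicted way's tag and line are read.
    if hit_way == predicted_way:
        return 1, 1, 1, "prediction hit"
    # Cycle 2 (steps 4-5): the remaining ways' tags and lines are read and
    # compared, so in total every way has been read once.
    tag_reads = line_reads = ways
    if hit_way is not None:
        return 2, tag_reads, line_reads, "prediction miss"
    return 2, tag_reads, line_reads, "cache miss"   # step 6: replacement follows


print(way_predicting_access(predicted_way=2, hit_way=2))
print(way_predicting_access(predicted_way=0, hit_way=3))
print(way_predicting_access(predicted_way=0, hit_way=None))
```

A prediction hit costs one cycle and a single tag/line pair; either kind of miss costs two cycles and reads all four ways.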
Way-Predicting 4-Way Set-Associative Cache: Organization
[Figure: cache organization, including the MRU algorithm]
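The MRU prediction can be modeled as a small table that holds one way index per set. This is a sketch under assumptions: the per-set table layout, the initial prediction of way 0, and the class name `MRUWayPredictor` are illustrative and not taken from the slide. The set count follows from the evaluated configuration, 16 KB / (32 B × 4 ways) = 128 sets.

```python
class MRUWayPredictor:
    """Hypothetical sketch: one most-recently-used way index per cache set."""

    def __init__(self, num_sets: int, initial_way: int = 0):
        # Initial prediction is an assumption; the slide does not specify it.
        self.mru = [initial_way] * num_sets

    def predict(self, set_index: int) -> int:
        # Step 0: predict the way that was used most recently in this set.
        return self.mru[set_index]

    def update(self, set_index: int, used_way: int) -> None:
        # After a hit (or a refill on a miss), that way becomes the MRU way.
        self.mru[set_index] = used_way


LINE_SIZE = 32                           # bytes
NUM_SETS = 16 * 1024 // (LINE_SIZE * 4)  # 128 sets for a 16 KB, 4-way cache

predictor = MRUWayPredictor(NUM_SETS)
set_index = (0x1234 // LINE_SIZE) % NUM_SETS
predictor.update(set_index, 2)
print(set_index, predictor.predict(set_index))
```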
Evaluation Environment
Cache models:
• Conventional 4-way set-associative cache (4SACache)
• Phased 4-way set-associative cache (P4SACache)
• Way-predicting 4-way set-associative cache (WP4SACache)
Cache size: 16 KB; cache-line size: 32 bytes; replacement algorithm: LRU.
Evaluation items:
• Performance (Tcache): average number of clock cycles per access
• Energy (Ecache): average energy consumption per access, $E_{cache} \approx E_{memory} = N_{tag} \times E_{tag} + N_{data} \times E_{data}$, where $E_{tag}$ is the energy consumed for accessing a tag subarray, $E_{data}$ the energy consumed for accessing a line subarray, $N_{tag}$ the average number of tag subarrays accessed per access, and $N_{data}$ the average number of line subarrays accessed per access.
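A rough trace-driven sketch of this evaluation, assuming a plain list of byte addresses as input, per-set MRU way prediction, and the per-access cost rules of the three operation flows described earlier. It is not the simulator behind the reported results. It counts cycles and subarray reads directly for each model, and `energy` then applies the $E_{memory}$ formula with assumed units $E_{tag} = 0.078$, $E_{data} = 1$.

```python
# Evaluation parameters from the slide
CACHE_SIZE = 16 * 1024                          # bytes
LINE_SIZE = 32                                  # bytes
WAYS = 4
NUM_SETS = CACHE_SIZE // (LINE_SIZE * WAYS)     # 128 sets
MODELS = ("4SACache", "P4SACache", "WP4SACache")


def simulate(addresses):
    """Return (CHR, PHR, stats); stats[model] = (T_cache, N_tag, N_data) per access."""
    tags = [[None] * WAYS for _ in range(NUM_SETS)]       # tag held by each way
    order = [list(range(WAYS)) for _ in range(NUM_SETS)]  # ways from LRU to MRU
    totals = {m: [0, 0, 0] for m in MODELS}               # cycles, tag reads, line reads
    n = hits = pred_hits = 0

    def count(model, cycles, tag_reads, line_reads):
        t = totals[model]
        t[0] += cycles
        t[1] += tag_reads
        t[2] += line_reads

    for addr in addresses:
        line = addr // LINE_SIZE
        s, tag = line % NUM_SETS, line // NUM_SETS
        predicted = order[s][-1]                          # MRU way is the prediction
        way = tags[s].index(tag) if tag in tags[s] else None
        hit = way is not None
        pred_hit = hit and way == predicted
        n += 1
        hits += int(hit)
        pred_hits += int(pred_hit)

        count("4SACache", 1, WAYS, WAYS)                  # all tags and lines, 1 cycle
        count("P4SACache", 2 if hit else 1, WAYS, 1 if hit else 0)
        if pred_hit:
            count("WP4SACache", 1, 1, 1)                  # predicted way only
        else:
            count("WP4SACache", 2, WAYS, WAYS)            # predicted way, then the other 3

        if not hit:                                       # miss: refill the LRU way
            way = order[s][0]
            tags[s][way] = tag
        order[s].remove(way)
        order[s].append(way)                              # accessed way becomes MRU

    stats = {m: tuple(v / n for v in totals[m]) for m in MODELS}
    return hits / n, pred_hits / n, stats


def energy(n_tag, n_data, e_tag=0.078, e_data=1.0):
    # E_cache ~ E_memory = N_tag * E_tag + N_data * E_data
    return n_tag * e_tag + n_data * e_data


chr_, phr, stats = simulate([0, 4096] * 5)
for model, (t, n_tag, n_data) in stats.items():
    print(model, t, energy(n_tag, n_data))
```

Alternating between addresses 0 and 4096, which map to the same set, gives CHR = 80% but PHR = 0%: each access lands in the way that was not used most recently, the pattern MRU prediction handles worst.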
Static Analysis: Energy and Performance Expressions
4SACache: $E_{4SACache} = 4E_{tag} + 4E_{data}$, $T_{4SACache} = 1$
P4SACache: $E_{P4SACache} = 4E_{tag} + E_{data} \times CHR$, $T_{P4SACache} = 1 + 1 \times CHR$
WP4SACache: $E_{WP4SACache} = (E_{tag} + E_{data}) + (3E_{tag} + 3E_{data}) \times (1 - PHR)$, $T_{WP4SACache} = 1 + 1 \times (1 - PHR)$
where CHR is the cache hit rate and PHR is the prediction hit rate.
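The closed-form expressions translate directly into functions. A minimal sketch; the function and parameter names are hypothetical, and `cache_hit_rate` and `pred_hit_rate` stand for CHR and PHR given as fractions between 0 and 1.

```python
def e_4sa(e_tag: float, e_data: float) -> float:
    return 4 * e_tag + 4 * e_data                      # all 4 tags and 4 lines


def e_p4sa(e_tag: float, e_data: float, cache_hit_rate: float) -> float:
    return 4 * e_tag + e_data * cache_hit_rate         # 4 tags; 1 line only on a hit


def e_wp4sa(e_tag: float, e_data: float, pred_hit_rate: float) -> float:
    # predicted way always; the other 3 ways only on a prediction miss
    return (e_tag + e_data) + (3 * e_tag + 3 * e_data) * (1 - pred_hit_rate)


def t_4sa() -> float:
    return 1.0


def t_p4sa(cache_hit_rate: float) -> float:
    return 1 + 1 * cache_hit_rate


def t_wp4sa(pred_hit_rate: float) -> float:
    return 1 + 1 * (1 - pred_hit_rate)


print(e_4sa(0.078, 1.0), e_p4sa(0.078, 1.0, 0.9), e_wp4sa(0.078, 1.0, 0.9))
print(t_4sa(), t_p4sa(0.9), t_wp4sa(0.9))
```

For instance, with $E_{tag} = 0.078$, $E_{data} = 1$, and PHR = 0.9, `e_wp4sa` gives about 1.40, against 4.312 for `e_4sa`.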
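Static Analysis: Best and Worst Case
[Figure: energy consumption (assuming $E_{tag} = 0.078 E_{data}$) and performance of 4SACache (conventional), P4SACache (phased), and WP4SACache (ours)]
Compared with the conventional 4SACache:
Best case (PHR = 100%): 75% energy improvement without any performance degradation. Only the predicted way is read, so $E_{WP4SACache} = E_{tag} + E_{data}$, one quarter of $E_{4SACache} = 4E_{tag} + 4E_{data}$, and $T_{WP4SACache} = 1$.
Worst case (PHR = 0%): 100% performance overhead without any energy improvement. All four ways end up being read, so $E_{WP4SACache} = 4E_{tag} + 4E_{data}$, while $T_{WP4SACache} = 2$.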
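The best and worst cases follow from plugging PHR = 1 and PHR = 0 into the way-predicting expressions and normalizing by the conventional cache. A sketch with the slide's ratio $E_{tag} = 0.078 E_{data}$ and $E_{data}$ set to 1 (assumed units); `normalized_wp4sa` is a hypothetical name.

```python
E_TAG = 0.078   # E_tag = 0.078 * E_data, as assumed on the slide
E_DATA = 1.0    # normalize E_data to 1


def normalized_wp4sa(phr: float) -> tuple[float, float]:
    """(energy, time) of WP4SACache relative to the conventional 4SACache."""
    e_conv, t_conv = 4 * E_TAG + 4 * E_DATA, 1
    e_wp = (E_TAG + E_DATA) + (3 * E_TAG + 3 * E_DATA) * (1 - phr)
    t_wp = 1 + 1 * (1 - phr)
    return e_wp / e_conv, t_wp / t_conv


for phr in (1.0, 0.0):   # best case, worst case
    rel_energy, rel_time = normalized_wp4sa(phr)
    print(f"PHR={phr:.0%}: energy {rel_energy:.0%}, time {rel_time:.0%} of 4SACache")
```

Because both energies share the factor $(E_{tag} + E_{data})$, the normalized energy reduces to $0.25 + 0.75(1 - PHR)$ and does not depend on the 0.078 ratio at all.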
Experimental Analysis: Result of Instruction Cache
[Figure: normalized Tcache and normalized Ecache of P4SACache and WP4SACache (our approach), with 4SACache = 1.0]
Experimental Analysis: Result of Data Cache
[Figure: normalized Tcache and normalized Ecache of P4SACache and WP4SACache (our approach), with 4SACache = 1.0]
Experimental Analysis: Energy and Performance
Average of all benchmarks, normalized to the conventional cache (4SACache = 100%):
I-Cache: Phased (P4SACache) Ecache 30.3%, Tcache 199.4%; Way-Predicting (WP4SACache) Ecache 28.1%, Tcache 104.1%.
D-Cache: Phased (P4SACache) Ecache 29.4%, Tcache 195.8%; Way-Predicting (WP4SACache) Ecache 35.2%, Tcache 113.0%.
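As a rough consistency check, the static model can be inverted: if a measured normalized Tcache for the way-predicting cache is attributed entirely to prediction misses, then $PHR = 2 - T_{WP4SACache}$, and the static energy ratio follows. A sketch; `implied_phr` and `predicted_energy` are hypothetical helpers, and the attribution of all extra cycles to prediction misses is an assumption.

```python
def implied_phr(t_normalized: float) -> float:
    # Invert T_WP4SACache = 1 + (1 - PHR), with T_4SACache = 1 as the baseline.
    return 2.0 - t_normalized


def predicted_energy(t_normalized: float) -> float:
    # E_WP4SACache / E_4SACache = (1 + 3 * (1 - PHR)) / 4; the common factor
    # (E_tag + E_data) cancels, so the ratio does not depend on E_tag / E_data.
    phr = implied_phr(t_normalized)
    return (1 + 3 * (1 - phr)) / 4


for t in (1.041, 1.130):   # the two way-predicting Tcache averages (104.1%, 113.0%)
    print(f"Tcache {t:.1%}: PHR ~ {implied_phr(t):.1%}, Ecache ~ {predicted_energy(t):.1%}")
```

A Tcache of 104.1% implies PHR of about 95.9% and a static Ecache of about 28.1%, in line with the measured 28.1%; 113.0% implies about 34.8%, slightly below the measured 35.2%.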
Cache Power Consumption
[Figure: cache size trend]
Share of on-chip caches in total chip power consumption: DEC 21164 CPU*: 25%; StrongARM SA-110 CPU*: 43%; Bipolar ECL CPU**: 50%.
* Kamble et al., “Analytical Energy Dissipation Models for Low Power Caches”, ISLPED’97.
** Jouppi et al., “A 300-MHz 115-W 32-b Bipolar ECL Microprocessor”, IEEE Journal of Solid-State Circuits ’93.
Energy Consumption Model
Components of the power dissipation: bit lines, word lines, sense amps, output drivers, address input, comparators, and latches. The memory array dominates: Ememory accounts for 95.6% of the energy of a 32KB direct-mapped I-cache and 97.7% of a 32KB 4-way D-cache (Ghose et al., “Energy Efficient Cache Organizations for Superscalar Processors”, Power-Driven Microarchitecture Workshop in conjunction with ISCA’98).
Average energy consumption for an access: $E_{cache} \approx E_{memory} = N_{tag} \times E_{tag} + N_{data} \times E_{data}$, where $E_{tag}$ is the energy consumed for accessing a tag subarray, $E_{data}$ the energy consumed for accessing a line subarray, $N_{tag}$ the average number of tag subarrays accessed per access, and $N_{data}$ the average number of line subarrays accessed per access.
Experimental Analysis: Environment
Benchmarks:
SPECint95: 099.go, 124.m88ksim, 126.gcc, 129.compress, 130.li, 132.ijpeg, 134.perl, 147.vortex
SPECfp95: 101.tomcatv, 102.swim, 103.su2cor, 104.hydro2d