
ISLPED’99 International Symposium on Low Power Electronics and Design

ISLPED’99 International Symposium on Low Power Electronics and Design. Koji Inoue, Tohru Ishihara, and Kazuaki Murakami. Way-Predicting Set-Associative Cache for High Performance and Low Energy Consumption. Department of Computer Science and Communication Engineering Kyushu University.


Presentation Transcript


  1. ISLPED’99 International Symposium on Low Power Electronics and Design. Koji Inoue, Tohru Ishihara, and Kazuaki Murakami. Way-Predicting Set-Associative Cache for High Performance and Low Energy Consumption. Department of Computer Science and Communication Engineering, Kyushu University. ppram@c.csce.kyushu-u.ac.jp

  2. Conventional 4-Way Set-Associative Cache. Organization: a tag subarray and a cache-line subarray for each of Way 0 to Way 3. Step 1: address decode. Step 2: read out a tag and a line from each way. Step 3: tag comparison. Step 4 (hit): provide the required data. Step 4 (miss): cache replacement. Circuit activity annotated on the slide: decode circuit, word-line activation, bit-line pre(dis)charge, sense-amp activation, I/O pin activation. Total energy for an access: Ecache = Edecode + Ememory + Eio, where Edecode is the energy for decode, Ememory the energy for SRAM access, and Eio the energy for I/O pin drive.

  3. Phased 4-Way Set-Associative Cache for Low Energy Consumption. Energy consumption is improved by sacrificing performance. Cycle 1: Step 1, address decode; Step 2, read out only the tags; Step 3, tag comparison. Cycle 2 (hit): Step 4, read out only the desired line; Step 5, provide the required data. On a miss: Step 4, cache replacement.

  4. Way-Predicting Set-Associative Cache - Concept - How can we achieve high performance and low energy consumption at the same time? Fast access by reading out both the tag and the line simultaneously (conventional: good; phased: bad). Low energy by avoiding unnecessary line reads (conventional: bad; phased: good). Idea: predict which way holds the data desired by the processor before the cache access starts.

  5. 4-Way Way-Predicting Set-Associative Cache - Operation - Step 0: way prediction (cache-line-based MRU algorithm). Cycle 1: Step 1, address decode; Step 2, read out the predicted tag and line; Step 3, tag comparison. On a prediction hit: Step 4, end. On a miss in the predicted way, Cycle 2: Step 4, read out the remaining tags and lines; Step 5, tag comparison. Prediction miss (the line is in another way): Step 6, end. Cache miss: Step 6, cache replacement.

  6. 4-Way Way-Predicting Set-Associative Cache - Organization - (Figure: cache organization with the MRU algorithm.)

  7. Evaluation Environment. Cache models: conventional 4-way set-associative cache (4SACache); phased 4-way set-associative cache (P4SACache); way-predicting 4-way set-associative cache (WP4SACache). Cache size: 16 KB. Cache-line size: 32 bytes. Replacement algorithm: LRU. Evaluation items: performance (Tcache), the average number of clock cycles for an access; energy (Ecache), the average energy consumption for an access. Ecache ≈ Ememory = Ntag x Etag + Ndata x Edata, where Etag is the energy consumed for accessing a tag subarray, Edata the energy consumed for accessing a line subarray, Ntag the average number of tag subarrays accessed per access, and Ndata the average number of line subarrays accessed per access.

  8. Static Analysis - Energy and Performance Expression -
     4SACache: E4SACache = 4 Etag + 4 Edata; T4SACache = 1
     P4SACache: EP4SACache = 4 Etag + Edata x CHR; TP4SACache = 1 + 1 x CHR
     WP4SACache: EWP4SACache = (Etag + Edata) + (3 Etag + 3 Edata) x (1 - PHR); TWP4SACache = 1 + 1 x (1 - PHR)
     CHR: Cache Hit Rate. PHR: Prediction Hit Rate.

  9. Static Analysis - Best and Worst Case - Models compared: 4SACache (conventional), P4SACache (phased), WP4SACache (ours). Plots: energy consumption (with Etag = 0.078 Edata) and performance. Compared with the conventional cache (4SACache): in the best case (PHR = 100%), 75% energy improvement without any performance degradation; in the worst case (PHR = 0%), 100% performance overhead without any energy improvement.

  10. Experimental Analysis - Prediction Hit Rate - (Figure only.)

  11. Experimental Analysis - Result of Instruction Cache - (Figure: normalized Tcache and normalized Ecache for P4SACache and WP4SACache (our approach), with 4SACache = 1.0.)

  12. Experimental Analysis - Result of Data Cache - (Figure: normalized Tcache and normalized Ecache for P4SACache and WP4SACache (our approach), with 4SACache = 1.0.)

  13. Experimental Analysis - Energy and Performance - Average of all benchmarks. (Figure: two bar charts, I-Cache and D-Cache, showing normalized results (%) for Ecache and Tcache under the conventional (4SACache), phased (P4SACache), and way-predicting (WP4SACache) models. Data labels as extracted, with their assignment to bars not recoverable: 195.8%, 199.4%, 113.0%, 104.1%, 30.3%, 29.4%, 28.1%, 35.2%.)

  14. Cache Power Consumption. Cache size trend. Share of total chip power consumed by on-chip caches: DEC 21164 CPU*, 25%; StrongARM SA-110 CPU*, 43%; Bipolar ECL CPU**, 50%. * Kamble, et al., “Analytical Energy Dissipation Models for Low Power Caches”, ISLPED’97. ** Jouppi, et al., “A 300-MHz 115-W 32-b Bipolar ECL Microprocessor”, IEEE Journal of Solid-State Circuits, ’93.

  15. Energy Consumption Model. Components of the power dissipation: bit line, word line, sense amp, output driver, address input, comparator, latch. 32 KB direct-mapped I-Cache: Ememory = 95.6%. 32 KB 4-way D-Cache: Ememory = 97.7%. (Ghose, et al., “Energy Efficient Cache Organizations for Superscalar Processors”, Power-Driven Microarchitecture Workshop in conjunction with ISCA’98.) Average energy consumption for an access: Ecache ≈ Ememory = Ntag x Etag + Ndata x Edata, where Etag is the energy consumed for accessing a tag subarray, Edata the energy consumed for accessing a line subarray, Ntag the average number of tag subarrays accessed per access, and Ndata the average number of line subarrays accessed per access.

  16. Experimental Analysis - Environment - Benchmarks. SPECint95: 099.go, 124.m88ksim, 126.gcc, 129.compress, 130.li, 132.ijpeg, 134.perl, 147.vortex. SPECfp95: 101.tomcatv, 102.swim, 103.su2cor, 104.hydro2d.
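
As a rough illustration of the static analysis on slides 8 and 9, the sketch below evaluates the energy and cycle expressions for the three cache models. The function names, and the choice to normalize Edata to 1.0, are assumptions made for this example. Only the expressions themselves and the ratio Etag = 0.078 Edata come from the slides.

```python
# Static energy/performance expressions from slide 8, for a 4-way cache.
# Function names are hypothetical; energies are in units of Edata.

E_DATA = 1.0    # assumption: Edata normalized to 1.0
E_TAG = 0.078   # slide 9: Etag = 0.078 Edata


def conventional(e_tag=E_TAG, e_data=E_DATA):
    """4SACache: all four tags and lines are read in one cycle."""
    energy = 4 * e_tag + 4 * e_data
    cycles = 1.0
    return energy, cycles


def phased(chr_, e_tag=E_TAG, e_data=E_DATA):
    """P4SACache: four tags first, then one line only on a cache hit."""
    energy = 4 * e_tag + e_data * chr_
    cycles = 1.0 + 1.0 * chr_
    return energy, cycles


def way_predicting(phr, e_tag=E_TAG, e_data=E_DATA):
    """WP4SACache: predicted way first, remaining three ways on a mispredict."""
    miss = 1.0 - phr
    energy = (e_tag + e_data) + (3 * e_tag + 3 * e_data) * miss
    cycles = 1.0 + 1.0 * miss
    return energy, cycles


if __name__ == "__main__":
    e_conv, t_conv = conventional()
    # Sweep the prediction hit rate and report results relative to 4SACache.
    for phr in (0.0, 0.5, 0.9, 1.0):
        e_wp, t_wp = way_predicting(phr)
        print(f"PHR={phr:.1f}  E/E4SA={e_wp / e_conv:.3f}  T/T4SA={t_wp / t_conv:.3f}")
```

With PHR = 1.0 the energy ratio is exactly one quarter of the conventional cache's at the same cycle count. With PHR = 0.0 the energy equals the conventional cache's and the cycle count doubles. These match the best and worst cases stated on slide 9.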
