
Speaker: WeiZeng

Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches through Programmable Decoders. ISCA 2006, IEEE. By Chuanjun Zhang. Speaker: WeiZeng.


Presentation Transcript


  1. Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches through Programmable Decoders. ISCA 2006, IEEE. By Chuanjun Zhang. Speaker: WeiZeng

  2. Outline • Introduction • The B-Cache Organization • Experimental Results and Analysis • Related Work • Conclusion

  3. Outline • Introduction • The B-Cache Organization • Experimental Results and Analysis • Related Work • Conclusion

  4. Background • Bottleneck to achieving high performance • Increasing gap between memory latency and processor speed • Multilevel memory hierarchy • Cache acts as an intermediary between the very fast processor and the much slower main memory. • Two cache mapping schemes • Direct-Mapped Cache: • Set-Associative Cache:

  5. Comparison • Desirable cache: access time of a direct-mapped cache + low miss rate of a set-associative cache.

  6. What is B-Cache? Balanced Cache (B-Cache): a mechanism that provides the benefit of cache block replacement while maintaining the constant access time of a direct-mapped cache.

  7. New features of B-Cache • The decoder length of the direct-mapped cache is increased by 3 bits, so accesses to heavily used sets can be reduced to 1/8 of those in the original design. • A replacement policy is added. • A programmable decoder is used.

  8. The problem (an example) • 8-bit addresses 0, 1, 8, 9, ... 0, 1, 8, 9

  9. B-Cache solution • 8-bit address • Same as in a 2-way cache • X: invalid PD entry
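The conflict in slides 8 and 9 can be reproduced with a small simulation. This is only a sketch under stated assumptions: the addresses 0, 1, 8, 9 are treated as block addresses that are accessed repeatedly, the cache holds 8 blocks in total, and the low-order address bits select the set. The function name `count_misses` and the trace are hypothetical and do not come from the slides.

```python
from collections import OrderedDict

def count_misses(block_addrs, num_sets, ways):
    """Count misses for an LRU cache with `num_sets` sets of `ways` lines each."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in block_addrs:
        lines = sets[addr % num_sets]      # low-order bits select the set
        if addr in lines:
            lines.move_to_end(addr)        # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[addr] = True
    return misses

trace = [0, 1, 8, 9] * 2                   # assumed access pattern, repeated once
direct_mapped_misses = count_misses(trace, num_sets=8, ways=1)
two_way_misses = count_misses(trace, num_sets=4, ways=2)
print(direct_mapped_misses, two_way_misses)  # prints "8 4"
```

In the direct-mapped case, 0 and 8 keep evicting each other from one set (as do 1 and 9), while six of the eight sets stay idle. The 2-way configuration of the same total size misses only on the first touch of each block, which is the behavior the B-Cache aims to match.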

  10. Outline • Introduction • The B-Cache Organization • Experimental Results and Analysis • Related Work • Conclusion

  11. Terminology • Memory address mapping factor (MF) • B-Cache associativity (BAS) • PI: index length of the programmable decoder (PD) • NPI: index length of the non-programmable decoder (NPD) • OI: index length of the original direct-mapped cache • $MF = 2^{PI+NPI} / 2^{OI}$, where $MF \geq 1$ • $BAS = 2^{OI} / 2^{NPI}$, where $BAS \geq 1$

  12. B-Cache organization • $MF = 2^{PI+NPI} / 2^{OI} = 2^{6+6} / 2^{9} = 8$ • $BAS = 2^{OI} / 2^{NPI} = 2^{9} / 2^{6} = 8$
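The arithmetic on this slide can be checked with a few lines of code. The function names below are hypothetical; the formulas are the MF and BAS definitions from the terminology slide, and the values PI = 6, NPI = 6, OI = 9 are the ones used in the slide's example.

```python
def mapping_factor(pi, npi, oi):
    """MF = 2^(PI+NPI) / 2^OI, assuming PI + NPI >= OI so the result is an integer."""
    return 2 ** (pi + npi) // 2 ** oi

def bcache_associativity(oi, npi):
    """BAS = 2^OI / 2^NPI, assuming OI >= NPI so the result is an integer."""
    return 2 ** oi // 2 ** npi

mf = mapping_factor(pi=6, npi=6, oi=9)
bas = bcache_associativity(oi=9, npi=6)
print(mf, bas)  # prints "8 8"
```

Both values are powers of two: MF = 2^(PI+NPI-OI) = 2^3 and BAS = 2^(OI-NPI) = 2^3.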

  13. Replacement policy • Random policy: simple to design and needs very little extra hardware. • Least Recently Used (LRU): better hit rate but more area overhead.
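To show where a replacement policy fits into a decoder-based design, here is a toy model, not the paper's implementation. It rests on several assumptions that the slides do not state: the NPI bits are the low-order bits of the block address and select a group of BAS lines, the PI bits are the next bits up and must match a line's PD entry, a PD match with a different block replaces that same line, and a PD mismatch evicts the group's LRU line and reprograms its PD entry. The class name `ToyBCache` is hypothetical.

```python
from collections import OrderedDict

class ToyBCache:
    """Toy B-Cache model: NPI bits pick a group, PD entries must match the PI bits."""

    def __init__(self, npi_bits, pi_bits, bas):
        self.npi_bits = npi_bits
        self.pi_bits = pi_bits
        self.bas = bas
        # One OrderedDict per group: PD value -> block address, oldest entry first.
        self.groups = [OrderedDict() for _ in range(1 << npi_bits)]

    def access(self, block_addr):
        """Return True on a hit, False on a miss (the block is installed on a miss)."""
        group = self.groups[block_addr & ((1 << self.npi_bits) - 1)]
        pd = (block_addr >> self.npi_bits) & ((1 << self.pi_bits) - 1)
        if pd in group:
            hit = group[pd] == block_addr
            group[pd] = block_addr         # PD match, other block: replace that line
            group.move_to_end(pd)          # mark as most recently used
            return hit
        if len(group) >= self.bas:
            group.popitem(last=False)      # evict LRU line; its PD entry is reprogrammed
        group[pd] = block_addr
        return False

cache = ToyBCache(npi_bits=2, pi_bits=2, bas=2)   # 8 lines, MF = 2, BAS = 2
results = [cache.access(a) for a in [0, 1, 8, 9] * 2]
misses = results.count(False)
print(misses)  # prints "4"
```

Swapping the `popitem(last=False)` eviction for a random choice among the group's PD entries would model the random policy instead of LRU.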

  14. Outline • Introduction • The B-Cache Organization • Experimental Results and Analysis • Related Work • Conclusion

  15. Experimental Methodology • Primary metric: miss rate • Other metrics: latency, storage, power costs, overall performance, overall energy • Baseline: level-one cache (direct-mapped 16kB cache with a 32-byte line size for both instruction and data caches) • 26 SPEC2K benchmarks run using the SimpleScalar tool set
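The baseline geometry determines the original index length. As a quick check, assuming 16kB means 16 × 1024 bytes and that both sizes are powers of two, the following sketch derives the number of lines and the index and offset widths; the function name `cache_geometry` is hypothetical.

```python
def cache_geometry(cache_bytes, line_bytes):
    """Return (lines, index_bits, offset_bits) for a direct-mapped cache.

    Assumes both sizes are powers of two.
    """
    lines = cache_bytes // line_bytes
    index_bits = (lines - 1).bit_length()        # bits needed to select one line
    offset_bits = (line_bytes - 1).bit_length()  # bits needed to select one byte
    return lines, index_bits, offset_bits

lines, index_bits, offset_bits = cache_geometry(16 * 1024, 32)
print(lines, index_bits, offset_bits)  # prints "512 9 5"
```

The 9-bit index agrees with the OI = 9 used in the B-Cache organization slide.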

  16. Data miss-rate reductions (figure comparing a 16-entry victim buffer, set-associative caches, and B-Caches with different MFs)

  17. Latency

  18. Storage Overhead • The additional hardware for the B-Cache is the CAM-based PD. • Storage is 4.3% higher than the baseline.

  19. Power Overhead • Extra power consumption: PD of each subarray. • Power reduction: • 3-bit data length reduction • Removal of 3-input NAND gates • Power is 10.5% higher than the baseline.

  20. Overall Performance • Outperforms the baseline by an average of 5.9%. • Only 0.3% lower than an 8-way cache but 3.7% higher than a victim buffer.

  21. Overall Energy • B-Cache consumes the least energy (2% less than the baseline). • B-Cache reduces the miss rate and hence accesses to the 2nd-level cache, which is more power costly. • On a cache miss, B-Cache also reduces cache memory accesses through miss prediction by the PD, which makes the power overhead much smaller.

  22. Outline • Introduction • The B-Cache Organization • Experimental Results and Analysis • Related Work • Conclusion

  23. Related work • Reducing Miss Rate of Direct-Mapped Caches • Page allocation • Column associative cache • Adaptive group associative cache • Skewed associative cache • Reducing Access Time of Set-Associative Caches • Partial address matching: predicting hit way • Difference bit cache

  24. Compared with previous techniques, the B-Cache is: • Applicable to both high-performance and low-power embedded systems • Balanced without software intervention • Feasible and easy to implement

  25. Outline • Introduction • The B-Cache Organization • Experimental Results and Analysis • Related Work • Conclusion

  26. Conclusion • B-Cache allows accesses to cache sets to be balanced by increasing the decoder length and incorporating a replacement policy into a direct-mapped cache design. • Programmable decoders dynamically determine which memory addresses map to which cache set. • A 16kB level-one B-Cache outperforms a direct-mapped cache, with miss-rate reductions of 64.5% and 37.8% for the instruction and data caches, respectively. • Average IPC improvement: 5.9%. • Energy reduction: 2%. • Access time: same as a direct-mapped cache.

  27. Thanks!
