Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches through Programmable Decoders • ISCA 2006, IEEE • By Chuanjun Zhang • Speaker: Wei Zeng
Outline • Introduction • The B-Cache Organization • Experimental Results and Analysis • Related Work • Conclusion
Background • Bottleneck to achieving high performance • Increasing gap between memory latency and processor speed • Multilevel memory hierarchy • The cache acts as an intermediary between the very fast processor and the much slower main memory. • Two cache mapping schemes • Direct-Mapped Cache: • Set-Associative Cache:
Comparison • Desirable cache: the access time of a direct-mapped cache plus the low miss rate of a set-associative cache.
What is B-Cache? • Balanced Cache (B-Cache): a mechanism that provides the benefit of cache block replacement while maintaining the constant access time of a direct-mapped cache.
New features of B-Cache • The decoder length of the direct-mapped cache is increased by 3 bits: accesses to heavily used sets can be reduced to 1/8 of those in the original design. • A replacement policy is added. • A programmable decoder is used.
The problem (an example) • 8-bit addresses • Access sequence: 0, 1, 8, 9, ... 0, 1, 8, 9
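The conflict behind this access sequence can be reproduced with a small simulation. This is a minimal sketch, not taken from the slides: it assumes the numbers are block addresses, that the cache holds 8 blocks in total, and that the set-associative variant uses LRU replacement. The function names are hypothetical.

```python
def count_misses_direct_mapped(block_addresses, num_sets):
    """Count misses in a direct-mapped cache (one block per set)."""
    stored = [None] * num_sets            # block currently held by each set
    misses = 0
    for addr in block_addresses:
        index = addr % num_sets           # low-order bits select the set
        if stored[index] != addr:
            misses += 1
            stored[index] = addr          # the only candidate block is evicted
    return misses


def count_misses_set_associative(block_addresses, num_sets, ways):
    """Count misses in a set-associative cache with LRU replacement."""
    sets = [[] for _ in range(num_sets)]  # each list is ordered from LRU to MRU
    misses = 0
    for addr in block_addresses:
        lines = sets[addr % num_sets]
        if addr in lines:
            lines.remove(addr)            # hit: re-appended below as MRU
        else:
            misses += 1
            if len(lines) == ways:
                lines.pop(0)              # evict the least recently used block
        lines.append(addr)
    return misses


# Assumed trace: the sequence from the slide, repeated twice.
trace = [0, 1, 8, 9] * 2
print(count_misses_direct_mapped(trace, num_sets=8))            # 8
print(count_misses_set_associative(trace, num_sets=4, ways=2))  # 4
```

Under these assumptions, blocks 0 and 8 share set 0 and blocks 1 and 9 share set 1 in the direct-mapped cache, so every access misses, while the 2-way cache of the same capacity misses only on the first touch of each block.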
B-Cache solution • 8-bit address • Same as in 2-way cache • X: invalid PD entry
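One way to picture the lookup is the toy model below. It is a sketch under stated assumptions rather than the design from the talk: the low NPI bits of a block address select a cluster of sets, the next PI bits are matched against the programmable-decoder (PD) entries in that cluster, a PD miss is treated as a cache miss that reprograms a victim's PD entry, and a PD hit with a tag mismatch refills the same set. The class name, the parameters PI = 2, NPI = 2, OI = 3, and the choice of "invalid entry first, then LRU" for the victim are all hypothetical.

```python
class BCacheSketch:
    """Toy B-Cache: a fixed decoder picks a cluster, a PD picks a set in it."""

    def __init__(self, pi_bits, npi_bits, oi_bits):
        self.pi_bits = pi_bits
        self.npi_bits = npi_bits
        sets_per_cluster = 2 ** (oi_bits - npi_bits)   # this is BAS
        num_clusters = 2 ** npi_bits
        # Each set is [pd_entry, block_address]; None marks an invalid PD entry.
        self.clusters = [
            [[None, None] for _ in range(sets_per_cluster)]
            for _ in range(num_clusters)
        ]
        # Per-cluster set numbers, ordered from least to most recently used.
        self.lru = [list(range(sets_per_cluster)) for _ in range(num_clusters)]

    def access(self, addr):
        """Access one block address; return True on a hit."""
        cluster_id = addr & ((1 << self.npi_bits) - 1)
        pd_value = (addr >> self.npi_bits) & ((1 << self.pi_bits) - 1)
        cluster = self.clusters[cluster_id]
        order = self.lru[cluster_id]

        match = next((i for i, s in enumerate(cluster) if s[0] == pd_value), None)
        if match is not None:
            # PD hit: the set is fixed, so a tag mismatch refills this same set.
            hit = cluster[match][1] == addr
            cluster[match][1] = addr
            chosen = match
        else:
            # PD miss: known to be a cache miss without reading the data array.
            hit = False
            invalid = [i for i in order if cluster[i][0] is None]
            chosen = invalid[0] if invalid else order[0]
            cluster[chosen] = [pd_value, addr]         # reprogram the PD entry
        order.remove(chosen)
        order.append(chosen)                           # mark as most recently used
        return hit


cache = BCacheSketch(pi_bits=2, npi_bits=2, oi_bits=3)
hits = [cache.access(a) for a in [0, 1, 8, 9] * 2]
print(hits.count(False))   # 4
```

With these assumed parameters, MF = 2 and BAS = 2, and the trace misses only four times, which matches the miss count of a 2-way set-associative cache with the same capacity.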
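Terminology • Memory address mapping factor (MF): $\mathrm{MF} = 2^{\mathrm{PI}+\mathrm{NPI}} / 2^{\mathrm{OI}}$, where $\mathrm{MF} \geq 1$ • B-Cache associativity (BAS): $\mathrm{BAS} = 2^{\mathrm{OI}} / 2^{\mathrm{NPI}}$, where $\mathrm{BAS} \geq 1$ • PI: index length of the programmable decoder (PD) • NPI: index length of the non-programmable decoder (NPD) • OI: index length of the original direct-mapped cache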
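B-Cache organization • $\mathrm{MF} = 2^{\mathrm{PI}+\mathrm{NPI}} / 2^{\mathrm{OI}} = 2^{6+6} / 2^{9} = 8$ • $\mathrm{BAS} = 2^{\mathrm{OI}} / 2^{\mathrm{NPI}} = 2^{9} / 2^{6} = 8$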
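The two ratios are easy to check numerically. The sketch below evaluates them for PI = 6, NPI = 6, and OI = 9; the function names are hypothetical, and the range check simply enforces the MF ≥ 1 and BAS ≥ 1 conditions from the definitions.

```python
def mapping_factor(pi, npi, oi):
    """MF = 2^(PI+NPI) / 2^OI, defined only when MF >= 1."""
    if pi + npi < oi:
        raise ValueError("PI + NPI must be at least OI so that MF >= 1")
    return 2 ** (pi + npi) // 2 ** oi


def b_cache_associativity(oi, npi):
    """BAS = 2^OI / 2^NPI, defined only when BAS >= 1."""
    if npi > oi:
        raise ValueError("NPI must not exceed OI so that BAS >= 1")
    return 2 ** oi // 2 ** npi


print(mapping_factor(pi=6, npi=6, oi=9))     # 8
print(b_cache_associativity(oi=9, npi=6))    # 8
```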
Replacement policy • Random policy: • Simple to design and needs very little extra hardware. • Least Recently Used (LRU): • Better hit rate but more area overhead
Experimental Methodology • Primary metric: miss rate • Other metrics: latency, storage, power costs, overall performance, overall energy • Baseline: level-one cache (direct-mapped 16kB cache with a 32-byte line size, for both instruction and data caches) • 26 SPEC2K benchmarks run using the SimpleScalar tool set
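The baseline geometry fixes the index length of the original direct-mapped cache. As a rough sketch, assuming 16kB means 16 × 1024 bytes and that both sizes are powers of two, the set count and bit widths can be derived as follows (the function name is hypothetical):

```python
def direct_mapped_geometry(cache_bytes, line_bytes):
    """Return (number of sets, offset bits, index bits) for a direct-mapped cache."""
    num_sets = cache_bytes // line_bytes          # one line per set
    offset_bits = line_bytes.bit_length() - 1     # log2 for powers of two
    index_bits = num_sets.bit_length() - 1
    return num_sets, offset_bits, index_bits


print(direct_mapped_geometry(16 * 1024, 32))      # (512, 5, 9)
```

The 9 index bits obtained here correspond to an original index length of OI = 9.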
Data miss-rate reductions • Figure: data miss-rate reductions for a 16-entry victim buffer, set-associative caches, and B-Caches with different MFs
Storage Overhead • The additional hardware for the B-Cache is the CAM-based PD. • Storage is 4.3% higher than the baseline
Power Overhead • Extra power consumption: the PD of each subarray. • Power reduction: • 3-bit data length reduction • Removal of 3-input NAND gates • Power is 10.5% higher than the baseline
Overall Performance • Outperforms the baseline by an average of 5.9%. • Only 0.3% below an 8-way cache, but 3.7% above the victim buffer.
Overall Energy • The B-Cache consumes the least energy (2% less than the baseline). • The B-Cache reduces the miss rate and hence the accesses to the second-level cache, which are more costly in power. • On a cache miss, the B-Cache also reduces cache memory accesses through the miss prediction of the PD, which makes the power overhead much smaller.
Related work • Reducing the miss rate of direct-mapped caches • Page allocation • Column-associative cache • Adaptive group-associative cache • Skewed-associative cache • Reducing the access time of set-associative caches • Partial address matching: predicting the hit way • Difference-bit cache
Compared with previous techniques, the B-Cache • Applies to both high-performance and low-power embedded systems • Is balanced without software intervention • Is feasible and easy to implement
Conclusion • The B-Cache allows accesses to cache sets to be balanced by increasing the decoder length and incorporating a replacement policy into a direct-mapped cache design. • Programmable decoders dynamically determine which memory addresses map to each cache set. • A 16kB level-one B-Cache outperforms a direct-mapped cache, reducing the miss rate by 64.5% and 37.8% for the instruction and data caches, respectively. • Average IPC improvement: 5.9% • Energy reduction: 2% • Access time: same as a direct-mapped cache