Exploiting Spatial Locality in Data Caches using Spatial Footprints
Sanjeev Kumar, Princeton University
Christopher Wilkerson, MRL, Intel
Spatial Locality in Caches
• Current approach: exploit spatial locality within a cache line
• Small cache line
  • Lower bandwidth
  • Less pollution
• Big cache line
  • Exploits more locality
  • Fewer tags
• Current caches use 32-byte lines
[Diagram: an access fetches a whole cache line from memory]
Spatial Locality in Caches (contd.)
• The spatial locality exhibited varies
  • Across applications
  • Within an application
• Using a fixed-size line to exploit locality
  • Makes inefficient use of cache resources: less than half the data fetched is used
  • Wastes bandwidth and pollutes the cache
  • Limits the benefit of spatial locality
Outline
• Introduction
• Spatial Footprint Predictors
• Practical Considerations
• Future Work
• Related Work
• Conclusions
Spatial Footprint Predictor (SFP)
• Exploit more spatial locality
• Need to reduce pollution
• Fetch words selectively
• Requires accurately predicting the spatial footprint of a block
• A footprint is a bit vector, e.g. 0100 1011 0010 0010 (see the sketch below)
[Diagram: an access to memory annotated with its footprint 0100101100100010]
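As a concrete illustration (not the paper's code), a spatial footprint can be kept as a 16-bit vector with one bit per 8-byte word of a 128-byte block; the word and block sizes are the ones assumed later in the talk, and the helper names are ours.

```python
# A minimal sketch of a spatial footprint: one bit per 8-byte word of a
# 128-byte block, set when an access touches that word.

WORD_BYTES = 8
WORDS_PER_BLOCK = 16   # 128-byte block

def word_index(addr: int) -> int:
    """Index of the word within its block (0..15)."""
    return (addr // WORD_BYTES) % WORDS_PER_BLOCK

def record_access(footprint: int, addr: int) -> int:
    """Set the footprint bit for the word touched by this access."""
    return footprint | (1 << word_index(addr))

footprint = 0
for addr in (0x08, 0x18, 0x30, 0x38, 0x50, 0x68, 0x70):   # example accesses
    footprint = record_access(footprint, addr)
print(f"{footprint:016b}")   # prints the 16-bit footprint for these accesses
```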
Spatial Footprint Predictor (contd.)
• Record spatial footprints
• Use footprint history to make predictions
• Lookup table based on (see the sketch below)
  • Nominating data address
  • Nominating instruction address
  • A combination of the two
• Targeted at L1 data caches
[Diagram: a nominating access (NA) to memory and its recorded footprint 0100101100100010]
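The three lookup options above can be sketched as alternative key functions into a footprint history table; the table shape and the way the two addresses are combined below are illustrative assumptions, not the paper's exact scheme.

```python
# Hedged sketch of the three ways to index the footprint history table.
SECTOR_BYTES = 128

def key_by_data_address(addr: int) -> int:
    return addr // SECTOR_BYTES          # nominating data address (sector granularity)

def key_by_instruction(pc: int) -> int:
    return pc                            # nominating instruction address

def key_by_combination(addr: int, pc: int) -> int:
    return (addr // SECTOR_BYTES) ^ pc   # one simple combination; the real hash may differ

history: dict[int, int] = {}             # key -> most recently recorded footprint

def predict(key: int) -> int | None:
    return history.get(key)              # None means no prediction (fall back to the default)
```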
Simple Approach
• Use large cache lines
• Fetch specific words; leave holes for the words that were not fetched
• Might decrease bandwidth
• Increases the miss ratio
  • Misses on the lines left as holes
  • Under-utilization of the cache
[Diagram: a large cache line in the cache with holes for unfetched words]
Our Approach
• Regular cache with small lines
  • 8 bytes, i.e. one word
• Exploit spatial locality at sector granularity
  • 16 lines, i.e. 128 bytes
• Spatial Footprint Predictor
  • Fetch 1-16 lines in a sector on a miss (sketched below)
[Diagram: a cache of small lines grouped into sectors, backed by memory]
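A minimal sketch of the geometry this slide assumes (8-byte lines, 16-line sectors), and of turning a predicted footprint into the set of lines to fetch on a miss; the helper names are ours.

```python
# Sector/line geometry assumed on this slide: 8-byte lines, 128-byte sectors.
LINE_BYTES = 8
LINES_PER_SECTOR = 16
SECTOR_BYTES = LINE_BYTES * LINES_PER_SECTOR   # 128

def sector_base(addr: int) -> int:
    """Byte address of the start of addr's sector."""
    return addr & ~(SECTOR_BYTES - 1)

def line_in_sector(addr: int) -> int:
    """Index of addr's line within its sector (0..15)."""
    return (addr & (SECTOR_BYTES - 1)) // LINE_BYTES

def lines_to_fetch(addr: int, footprint: int) -> list[int]:
    """Addresses of the 1-16 lines selected by the predicted footprint,
    always including the line that actually missed."""
    footprint |= 1 << line_in_sector(addr)
    base = sector_base(addr)
    return [base + i * LINE_BYTES
            for i in range(LINES_PER_SECTOR) if footprint & (1 << i)]
```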
When to Record/Predict Footprints?
• Sectors in memory are either active or inactive
• Inactive sector: a cache miss in it uses the footprint history to predict, and activates the sector
• Active sector: its spatial footprint is recorded
• A cache miss on a line that is already marked used in an active sector's footprint ends the interval: the footprint is stored and the sector becomes inactive (see the sketch below)
• Use infinite-size tables
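One way to read the protocol above is as a small state machine per sector. The sketch below keys the history by sector address for simplicity (the previous slide lists other options) and is an illustration rather than the paper's implementation.

```python
# Hedged sketch: a miss in an inactive sector consults the history and starts
# a new recording interval; a miss on a line already marked used ends the
# interval and commits the recorded footprint to the history table.

active_footprints: dict[int, int] = {}   # sector -> footprint being recorded
history: dict[int, int] = {}             # sector -> last committed footprint

def on_cache_miss(sector: int, line: int) -> int | None:
    """Handle a miss on `line` (0..15) of `sector`; return a predicted
    footprint, or None if this miss does not start a new interval."""
    bit = 1 << line
    if sector not in active_footprints:
        # Inactive sector: predict, then begin recording with this line marked.
        active_footprints[sector] = bit
        return history.get(sector)
    if active_footprints[sector] & bit:
        # Line already marked used: end the interval, commit, start a new one.
        history[sector] = active_footprints[sector]
        active_footprints[sector] = bit
        return history[sector]
    # Otherwise just record this line in the current interval.
    active_footprints[sector] |= bit
    return None

def on_cache_hit(sector: int, line: int) -> None:
    """Accesses that hit also mark their line as used in an active sector."""
    if sector in active_footprints:
        active_footprints[sector] |= 1 << line
```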
Recording Footprints
[Diagram: on a nominating access (NA), the Active Sector Table records the sector's footprint as the cache is accessed; when the sector is done, the footprint is stored in the Spatial Footprint History Table]
Predicting Footprints
[Diagram: on an access that misses, the Spatial Footprint History Table supplies a predicted footprint and the selected lines are fetched from memory into the cache]
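Putting the two diagrams together, miss handling might look roughly like the sketch below: consult the history table, fall back to a default when there is no entry, and fetch only the selected lines. The names, the simplified sector-keyed table, and the single-size fallback encoding are illustrative assumptions.

```python
# Hedged sketch of miss handling with a spatial footprint prediction.
LINE_BYTES, LINES_PER_SECTOR = 8, 16
SECTOR_BYTES = LINE_BYTES * LINES_PER_SECTOR

history: dict[int, int] = {}   # Spatial Footprint History Table (simplified: keyed by sector)

def default_footprint(miss_line: int, lines: int = 4) -> int:
    """Default-predictor fallback: a contiguous, aligned group of `lines`
    lines around the miss (a stand-in for the chosen per-application size)."""
    start = (miss_line // lines) * lines
    return ((1 << lines) - 1) << start

def handle_miss(addr: int) -> list[int]:
    sector, line = addr // SECTOR_BYTES, (addr % SECTOR_BYTES) // LINE_BYTES
    footprint = history.get(sector, default_footprint(line))
    footprint |= 1 << line                       # always fetch the missing line itself
    base = sector * SECTOR_BYTES
    return [base + i * LINE_BYTES                # addresses of the lines to fetch
            for i in range(LINES_PER_SECTOR) if footprint & (1 << i)]

print(handle_miss(0x1234))   # with no history, fetches an aligned 4-line group
```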
The Default Footprint Predictor
• Used when the SFP has no prediction
  • No history
  • Entry evicted from the Spatial Footprint History Table
• Picks a single line size for the application
  • Based on the footprints observed (one possible policy is sketched below)
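How the single size is chosen is not spelled out on this slide. As one hedged illustration only, the application-wide fetch size could track the average number of lines actually used in the footprints observed so far; this is a stand-in for the paper's default predictor, not its actual policy.

```python
# Illustrative only: pick one fetch size (in 8-byte lines) from the observed footprints.

def popcount(x: int) -> int:
    return bin(x).count("1")

def pick_default_size(observed_footprints: list[int]) -> int:
    """Choose 1, 2, 4, 8, or 16 lines based on the average number of
    lines used in the footprints seen so far."""
    if not observed_footprints:
        return 4                                   # arbitrary starting size
    avg_used = sum(popcount(f) for f in observed_footprints) / len(observed_footprints)
    for size in (1, 2, 4, 8, 16):
        if size >= avg_used:
            return size
    return 16

print(pick_default_size([0b0000000000000110, 0b0000111100000000]))   # -> 4
```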
Experimental Setup
• Cache parameters
  • 16 KB L1, 4-way associative
  • 8 bytes per line, 16 lines per sector
• Cache simulator measuring
  • Miss ratios
  • Fetch bandwidth
• 12 Intel MRL traces
  • gcc and go (SPEC)
  • Transaction processing
  • Web server
  • PC applications (word processors, spreadsheets)
• Results normalized to a 16 KB conventional cache with 32-byte lines
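For reference, the simulated geometry above implies 512 sets of 8-byte lines, and all reported numbers are ratios against the 32-byte-line baseline. The snippet below just restates that; the constant names are ours and the example values are placeholders, not measured results.

```python
# Simulated cache geometry from this slide (constant names are ours).
CACHE_BYTES = 16 * 1024     # 16 KB L1 data cache
WAYS = 4                    # 4-way set associative
LINE_BYTES = 8              # 8-byte lines
LINES_PER_SECTOR = 16       # 128-byte sectors
SETS = CACHE_BYTES // (LINE_BYTES * WAYS)          # 512 sets

def normalized(value: float, baseline: float) -> float:
    """Miss ratio or fetch bandwidth relative to the 16 KB conventional
    cache with 32-byte lines (values below 1.0 are improvements)."""
    return value / baseline

# Placeholder numbers only, to show the normalization; not measured results.
print(normalized(0.045, 0.050))   # 0.9 would mean 10% fewer misses than the baseline
```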
Experimental Evaluation
[Chart: normalized miss ratios]
GCC Comparison
[Chart: SFP cache vs. conventional caches for gcc]
GCC Comparison (contd.)
• Comparing the SFP cache to
  • Conventional caches with varying line sizes
    • Comparable to the best miss ratio (obtained with 64-byte lines)
    • Close to the lowest bandwidth (obtained with 8-byte lines)
  • A bigger conventional cache
    • Comparable to a 32 KB cache
Outline
• Introduction
• Spatial Footprint Predictors
• Practical Considerations
• Future Work
• Related Work
• Conclusions
Decoupled Sectored Cache
• Seznec et al.
• Proposed to improve sectored L2 caches
• Decouples the tag array from the data array (see the sketch below)
  • Dynamic mapping: no longer one-to-one
  • Multiple lines from the same sector share a tag
• Flexible: the data and tag arrays can have different sizes and associativities
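The tag sharing can be pictured with a toy model of one cache set: a few sector tags serve a larger number of line frames, each frame carrying a pointer to the tag it belongs to. The field names, sizes, and the naive replacement policy below are illustrative assumptions, not Seznec's design or the configuration evaluated in this talk.

```python
# Conceptual sketch of one set of a decoupled sectored cache.
from dataclasses import dataclass
from typing import Optional

@dataclass
class LineFrame:
    valid: bool = False
    tag_way: int = 0     # which sector tag in this set the line belongs to
    offset: int = 0      # line offset within the sector (0..15)

class DecoupledSectoredSet:
    def __init__(self, tag_ways: int = 4, data_ways: int = 16):
        self.tags: list[Optional[int]] = [None] * tag_ways   # shared sector tags
        self.frames = [LineFrame() for _ in range(data_ways)]

    def lookup(self, sector_tag: int, offset: int) -> bool:
        """Hit if some valid frame has the right offset and points at a
        tag way whose stored sector tag matches."""
        return any(f.valid and f.offset == offset and
                   self.tags[f.tag_way] == sector_tag for f in self.frames)

    def fill(self, sector_tag: int, offset: int) -> None:
        """Install one line: reuse this sector's tag way if present, else claim
        one (naively), then place the line in a (naively chosen) frame."""
        if sector_tag in self.tags:
            t = self.tags.index(sector_tag)
        else:
            t = self.tags.index(None) if None in self.tags else 0
            # Evicting a sector tag: invalidate frames still pointing at it.
            for f in self.frames:
                if f.tag_way == t:
                    f.valid = False
            self.tags[t] = sector_tag
        invalid = [i for i, f in enumerate(self.frames) if not f.valid]
        v = invalid[0] if invalid else 0
        self.frames[v] = LineFrame(valid=True, tag_way=t, offset=offset)
```

Because a hit needs both a matching sector tag and a frame pointing at it with the right offset, lines from different sectors can share one pool of frames, which is what keeps the tag count at the level of a conventional 32-byte-line cache in the practical configuration.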
Practical Considerations
• Reasonably sized Spatial Footprint History Table
  • 1024 entries
• Reduce tag storage
  • Use a Decoupled Sectored Cache
  • Same number of tags as a conventional cache with 32-byte lines
  • Both the data and tag arrays are 4-way associative
Experimental Evaluation
[Chart: normalized miss ratios for the practical configuration]
Cost
• Additional space
  • 9 KB
  • Can be reduced by
    • Using partial tags
    • Compressing footprints
• Time
  • Most predictor actions are off the critical path
  • Little impact on cache access latency
Future Work
• Improve miss ratios further
  • Infinite tables: 30%
  • Practical implementation: 18%
• Reduce the additional memory required
• Better coarse-grained predictor
• L2 caches
Related Work
• Przybylski et al., Smith et al.
  • Statically chosen best line size and fetch size
• Gonzalez et al.
  • Dual data cache: temporal and spatial locality
  • Numeric codes
• Johnson et al.
  • Dynamically pick a line size per block (1 KB)
Conclusions
• Spatial Footprint Predictors
  • Decrease the miss ratio (18% on average)
  • Reduce bandwidth usage
  • Have little impact on cache access latency
• A fine-grained predictor can be used for data caches