ReCast: Boosting L2 Tag Line Buffer Coverage “for Free” www.eecg.toronto.edu/aenao Won-Ho Park, Toronto Andreas Moshovos, Toronto Babak Falsafi, CMU
Power-Aware High-Level Caches • AENAO target: High-Performance & Power-Aware Memory Hierarchies • Much work on L1 / Our focus is on L2 power • Much opportunity at L2 and higher caches • L2 power will increase • In absolute terms: • L2 size and associativity will grow with application footprints • L1 size is latency-limited • In relative terms: • As the L1 and the core are further optimized
ReCast: Caching a Few Tag Sets • Revisit the "line buffer" concept for L2 • Increase coverage via S-Shift • 50%, up from 32% with conventional indexing • L2 tag power savings • 38% for writeback L1D / 85% for writethrough L1D [Figure: L1I/L1D backed by the L2 tag and data arrays; conventional indexing vs. S-Shift indexing f() with ReCast in front of the L2 tags]
Roadmap • ReCast Concept and Organization • S-Shift indexing / Trade-offs • Experimental Results
ReCast Concept [Figure: the address from the L1 is split into tag / set / offset; the set field indexes ReCast, whose entries each hold one L2 tag set (tag0 … tag7); a ReCast hit answers L2 hit/miss directly, a ReCast miss falls back to the L2 tag arrays]
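To make the lookup path concrete, here is a minimal C sketch of the ReCast probe, not the authors' hardware: the L2 geometry follows the methodology slide (1 MB, 64 B blocks, 8-way), the ReCast shape follows the configuration shown there (8 banks × 4 sets, 2-way), and the bank/set mapping, entry layout, and the name recast_lookup are our own illustrative assumptions.

```c
/* Sketch of the ReCast lookup path (illustrative model, not the authors' RTL).
 * L2: 1 MB, 64 B blocks, 8-way  =>  2048 sets, 6 offset bits, 11 set bits.
 * ReCast: 8 banks x 4 sets, 2-way; each entry caches one full L2 tag set.
 * Per-way L2 valid bits are omitted for brevity. */
#include <stdbool.h>
#include <stdint.h>

#define L2_WAYS      8
#define L2_SET_BITS  11
#define OFFSET_BITS  6

#define RC_BANKS     8
#define RC_SETS      4
#define RC_WAYS      2

typedef struct {
    bool     valid;
    uint32_t l2_set;            /* which L2 set this entry mirrors        */
    uint32_t l2_tags[L2_WAYS];  /* cached copy of that set's tags         */
} recast_entry_t;

static recast_entry_t recast[RC_BANKS][RC_SETS][RC_WAYS];

/* Returns true when ReCast can decide the L2 outcome by itself, i.e. the
 * requested L2 set is cached; *l2_hit then holds the L2 hit/miss answer. */
bool recast_lookup(uint64_t addr, bool *l2_hit)
{
    uint32_t l2_set = (addr >> OFFSET_BITS) & ((1u << L2_SET_BITS) - 1);
    uint32_t l2_tag = (uint32_t)(addr >> (OFFSET_BITS + L2_SET_BITS));

    /* low set-index bits pick the ReCast bank and set (one plausible mapping) */
    uint32_t bank = l2_set % RC_BANKS;
    uint32_t rset = (l2_set / RC_BANKS) % RC_SETS;

    for (int w = 0; w < RC_WAYS; w++) {
        recast_entry_t *e = &recast[bank][rset][w];
        if (e->valid && e->l2_set == l2_set) {      /* ReCast hit            */
            *l2_hit = false;
            for (int i = 0; i < L2_WAYS; i++)
                if (e->l2_tags[i] == l2_tag)
                    *l2_hit = true;                 /* L2 hit/miss known     */
            return true;                            /* without reading the   */
        }                                           /* large L2 tag arrays   */
    }
    return false;   /* ReCast miss: the L2 tag arrays must be accessed */
}
```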
ReCast Power Tradeoffs • ReCast Hit • Entry determines L2 cache hit or miss • No need to access L2 tags: Power Reduced • Latency can be reduced • ReCast Miss • Need to access the L2 tags • Power Increased by ReCast overhead • Latency is increased • A win for typical applications
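A back-of-the-envelope energy model (our notation, not figures from the paper) makes this trade-off explicit: f is the ReCast filter rate, E_RC the energy of one ReCast probe, and E_tag the energy of one full L2 tag-array lookup.

```latex
% Expected energy per L2 tag check (illustrative model, symbols are ours):
%   f     : ReCast filter (hit) rate
%   E_RC  : energy of one ReCast probe
%   E_tag : energy of one full L2 tag-array lookup
\[
  E_{\text{with ReCast}} \;=\; E_{RC} + (1 - f)\,E_{tag}
  \qquad\text{vs.}\qquad
  E_{\text{baseline}} \;=\; E_{tag}
\]
% ReCast wins whenever f * E_tag > E_RC, i.e. when the filter rate exceeds
% the relative overhead of probing ReCast.
```

Under this model, raising the filter rate (as S-Shift does, from 32% to 50%) translates directly into larger tag-power savings.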
ReCast Organization • Distributed over the L2 tag subarrays [Figure: the address is broadcast to the L2 tag subarrays, each with its own ReCast bank in front of it]
Increasing L2 Set Locality • Goal: make consecutive L1 blocks map onto the same L2 set • Exploit spatial locality • Larger L2 blocks: won't work • Change the L2 indexing function: S-Shift [Figure: the conventional Tag | Set | offset split becomes New Tag | New Set | S | offset; the set index is taken S bits higher and the displaced bits join the tag] Affects L2 hit rate – net win for most applications
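A C sketch of the two index functions under the assumed L2 geometry (64 B blocks, 2048 sets); S is the shift amount (1 or 2 in the evaluation), and the bit positions and helper names are illustrative rather than taken from the paper.

```c
/* Conventional vs. S-Shift L2 indexing (illustrative bit layout).
 * Conventional: index = addr[OFFSET_BITS + SET_BITS - 1 : OFFSET_BITS]
 * S-Shift:      take the set index s bits higher, so 2^s consecutive
 *               blocks fall into the same L2 set; the s displaced bits
 *               must be kept in the tag to preserve correctness. */
#include <stdint.h>

#define OFFSET_BITS 6   /* 64-byte L2 blocks */
#define SET_BITS    11  /* 2048 L2 sets      */

uint32_t conventional_index(uint64_t addr)
{
    return (addr >> OFFSET_BITS) & ((1u << SET_BITS) - 1);
}

uint32_t s_shift_index(uint64_t addr, unsigned s)
{
    /* shift the index field up by s bit positions */
    return (addr >> (OFFSET_BITS + s)) & ((1u << SET_BITS) - 1);
}

uint64_t s_shift_tag(uint64_t addr, unsigned s)
{
    /* the tag covers the high bits plus the s bits skipped by the index */
    uint64_t high = addr >> (OFFSET_BITS + s + SET_BITS);
    uint64_t low  = (addr >> OFFSET_BITS) & ((1u << s) - 1);
    return (high << s) | low;
}
```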
How S-Shift Increases Locality • Stream of sequential references, e.g., a[i++] [Figure: with conventional indexing consecutive blocks spread across sets 0 … n; with 1-Shift each pair of consecutive blocks maps to the same set] • Not the same as increasing the L2 block size • May increase or decrease set pressure / the L2 miss rate
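A toy program (hypothetical addresses, geometry as in the sketches above) that prints which L2 set each block of a sequential stream maps to under the two schemes, illustrating why 1-Shift doubles the number of consecutive blocks a cached ReCast tag set can cover.

```c
/* Toy demonstration: L2 set of each block in a sequential stream,
 * with conventional indexing and with 1-Shift (assumed geometry). */
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 6
#define SET_BITS    11

int main(void)
{
    for (uint64_t addr = 0; addr < 8 * 64; addr += 64) {  /* 8 consecutive blocks */
        uint32_t conv  = (addr >> OFFSET_BITS)       & ((1u << SET_BITS) - 1);
        uint32_t shift = (addr >> (OFFSET_BITS + 1)) & ((1u << SET_BITS) - 1);
        printf("block %2llu: conventional set %u, 1-shift set %u\n",
               (unsigned long long)(addr >> OFFSET_BITS), conv, shift);
    }
    /* Conventional indexing visits sets 0..7; with 1-Shift the same blocks
     * visit sets 0,0,1,1,2,2,3,3, so two consecutive blocks share each set
     * and one cached ReCast tag set filters twice as many lookups. */
    return 0;
}
```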
Experimental Results • Filter Rates • How often we find the set in ReCast • L2 Miss Rate • L2 Power Savings • More in the paper • Performance with various latency models • Fixed or variable latency
Methodology • SPEC CPU 2000 (subset) • Up to 30 Billion Committed Instructions • 8-way OOO core • Up to 128 in-flight instructions • L1: 32K, 32-byte blocks, 2-way SA • L2: 1M, 64-byte blocks, 8-way SA • L3: 4M, 128-byte blocks, 8-way SA • ReCast organization shown: 8 banks, each with 4 sets, 2-way SA
ReCast Filter Rate • 1-Shift Increases Filter Rate from 32% to 50% • 2-Shift Increases Filter Rate further… [Chart: filter rate per benchmark, higher is better]
L2 Miss Rate • Mostly unchanged / but varies for some programs • Application analysis in the paper [Chart: L2 miss rate per benchmark, lower is better]
L2 Power Savings: Writeback L1D • L2 tag power reduced by 38% • Overall L2 power reduced by 16% [Chart: L2 power per benchmark, lower is better]
L2 Power Savings: Writethrough L1D • L2 tag power reduced by 85%
ReCast • Revisited the concept of “Line Buffers” for L2 • L2 power increasingly important • In Absolute and Relative Terms • ReCast: • An L2 Tag Set Cache • S-Shift: • Improves L2 Set Locality “for free” • Results • 1-Shift Filter Rate: 50% • L2 Tag Power Savings: 38%
ReCast: L2 Power and Latency [Table: power and latency for each ReCast/L2 outcome: ReCast hit & L2 hit, ReCast hit & L2 miss, ReCast miss & L2 miss, ReCast miss & L2 hit] • ReCast Hit: set found in ReCast • L2 Hit: data found in L2 • Reduces power on both L2 hits and L2 misses • Needs set locality