Euro-Par 2009, Delft (The Netherlands) - August 27, 2009
Last Bank: Dealing with Address Reuse in Non-Uniform Cache Architecture for CMPs
Javier Lira ψ, Carlos Molina ф, Antonio González λ
ф Dept. Enginyeria Informàtica, Universitat Rovira i Virgili, Tarragona, Spain, carlos.molina@urv.net
ψ Dept. Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain, javier.lira@ac.upc.edu
λ Intel Barcelona Research Center, Intel Labs - UPC, Barcelona, Spain, antonio.gonzalez@intel.com
Outline • Introduction • Methodology • Last Bank • Characterization of replacements in NUCA • Last Bank Optimizations • Conclusions
Introduction • CMPs have emerged as a dominant paradigm in system design. • They sustain performance improvements while reducing power consumption. • They take advantage of thread-level parallelism. • Commercial CMPs are currently available. • CMPs incorporate larger, shared last-level caches. • Wire delay is a key constraint.
NUCA • Non-Uniform Cache Architecture (NUCA) was first proposed at ASPLOS 2002 by Kim et al. [1]. • NUCA divides a large cache into smaller and faster banks. • Banks close to the cache controller have lower latencies than banks farther away. [1] C. Kim, D. Burger and S. W. Keckler. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. ASPLOS '02
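The non-uniform latency idea can be illustrated with a minimal sketch. It assumes a hypothetical grid of banks with the cache controller sitting just below row 0; the class name `BankGrid`, the 4x4 size and the cycle counts are illustrative assumptions, not values from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BankGrid:
    """Toy NUCA latency model (all parameters are assumed, not measured)."""
    rows: int
    cols: int
    bank_cycles: int = 3  # assumed access time of one small bank
    hop_cycles: int = 1   # assumed wire delay per hop between neighbouring banks

    def latency(self, row: int, col: int, ctrl_col: int = 0) -> int:
        """Cycles to reach bank (row, col) from a controller below row 0 at ctrl_col."""
        hops = (row + 1) + abs(col - ctrl_col)
        return self.bank_cycles + self.hop_cycles * hops


grid = BankGrid(rows=4, cols=4)
near = grid.latency(0, 0)  # bank next to the controller
far = grid.latency(3, 3)   # bank in the opposite corner
```

In this model the latency grows with the number of hops, so `near` is smaller than `far`, which is the property that makes data placement matter in a NUCA.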
Methodology • Simulation tools: • Simics + GEMS • CACTI v6.0 • PARSEC Benchmark Suite
Baseline NUCA cache architecture • 8 cores • 256 banks [2] B. M. Beckmann and D. A. Wood. Managing wire delay in large chip-multiprocessor caches. MICRO '04
Last Bank • Data movements concentrate the most accessed data in a few banks. • Data replacements in these HOT banks are unfair.
Last Bank • An extra bank is included in the NUCA cache. • It acts as a victim cache, but it is not fully associative. • It gives evicted data a second chance to stay in the NUCA.
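A minimal sketch of how such an extra bank could behave, assuming a small set-associative structure with LRU replacement inside each set. The class name `LastBank`, the set-indexing by `addr % num_sets`, and the choice to remove a block from the Last Bank when it is reused are assumptions made for illustration; the slides do not specify these details.

```python
from collections import OrderedDict


class LastBank:
    """Toy set-associative victim bank (sizes and policy are assumed)."""

    def __init__(self, num_sets=4, ways=2):
        self.num_sets = num_sets
        self.ways = ways
        # One OrderedDict per set; the oldest entry is the LRU block.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def _set_for(self, addr):
        return self.sets[addr % self.num_sets]

    def insert(self, addr):
        """Store a block evicted from the NUCA.

        Returns the address pushed out to main memory, or None.
        """
        s = self._set_for(addr)
        victim = None
        if addr not in s and len(s) >= self.ways:
            victim, _ = s.popitem(last=False)  # drop the LRU block of this set
        s[addr] = True
        s.move_to_end(addr)  # mark as most recently stored
        return victim

    def take(self, addr):
        """On a NUCA miss: if the block is here, remove it and report a hit."""
        s = self._set_for(addr)
        if addr in s:
            del s[addr]
            return True
        return False
```

With `LastBank(num_sets=2, ways=1)`, inserting address 4 and then address 6 (both map to set 0) pushes 4 out to memory, while a later `take(6)` hits and hands the block back to the NUCA.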
Last Bank • Performance benefits are limited by the Last Bank size. • Significant performance potential. • Analysis of reused addresses to find improvement points.
Characterization of replacements in NUCA • How many evicted addresses are later reused? • How many cycles does a reused address usually spend out of the NUCA before being reinserted? • Where were reused addresses located within the NUCA just before being evicted? • What action caused the eviction of reused addresses from the NUCA?
Reused address statistics • Nearly 70% of evicted addresses return to the NUCA cache. • Most reused addresses return to the NUCA at least twice.
Time between Eviction and Reinsertion • Nearly 30% of evicted addresses return in less than 100,000 cycles. • In blackscholes, almost 50% of reused addresses return to NUCA in less than 1,000 cycles.
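Statistics of this kind could be gathered from a simulator trace along the following lines. This is a sketch under stated assumptions: the trace format `(cycle, kind, addr)` with `kind` equal to `"evict"` or `"insert"` is hypothetical, and the function counts per eviction event rather than per unique address, which may differ from how the authors computed their figures.

```python
def reuse_stats(events):
    """Summarise reuse of evicted addresses.

    events: iterable of (cycle, kind, addr) tuples sorted by cycle,
            where kind is "evict" or "insert" (assumed trace format).
    Returns (fraction of evictions followed by a reinsertion,
             list of cycles spent outside the cache per reinsertion).
    """
    pending = {}   # addr -> cycle of its latest unmatched eviction
    evictions = 0
    gaps = []
    for cycle, kind, addr in events:
        if kind == "evict":
            evictions += 1
            pending[addr] = cycle
        elif kind == "insert" and addr in pending:
            gaps.append(cycle - pending.pop(addr))
    reused_fraction = len(gaps) / evictions if evictions else 0.0
    return reused_fraction, gaps


# Made-up trace: address 0xA is evicted twice and returns both times,
# address 0xB is evicted and never comes back.
trace = [
    (10, "evict", 0xA),
    (50, "insert", 0xA),
    (60, "evict", 0xB),
    (70, "evict", 0xA),
    (5000, "insert", 0xA),
]
fraction, gaps = reuse_stats(trace)
short_returns = sum(1 for g in gaps if g < 1000)
```

Bucketing `gaps` by thresholds such as 1,000 or 100,000 cycles gives the kind of distribution discussed on this slide.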
Last location within the NUCA • Most reused addresses were evicted from Local Banks. • Most addresses replaced from Central Banks are not reused later.
Selective Last Bank • Target: To reduce pollution in Last Bank. • This mechanism selects which evicted data blocks are stored in the Last Bank. • Implemented Selective Last Bank: • Stores data blocks if and only if they were evicted from a Local Bank. • Otherwise, sends them back to the main memory.
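The filter described on this slide can be sketched as a single routing decision. The names `BankKind` and `route_evicted_block` are hypothetical, and the Last Bank is modelled as a plain Python set to keep the example self-contained.

```python
from enum import Enum


class BankKind(Enum):
    LOCAL = "local"
    CENTRAL = "central"


def route_evicted_block(addr, source, last_bank):
    """Decide where a block evicted from the NUCA goes.

    addr:      address of the evicted block
    source:    BankKind of the bank it was evicted from
    last_bank: set standing in for the Last Bank contents (assumption)
    Returns "last_bank" or "memory".
    """
    if source is BankKind.LOCAL:
        last_bank.add(addr)  # only Local Bank victims get a second chance
        return "last_bank"
    return "memory"          # Central Bank victims bypass the Last Bank


last_bank = set()
first = route_evicted_block(0x40, BankKind.LOCAL, last_bank)
second = route_evicted_block(0x80, BankKind.CENTRAL, last_bank)
```

The rule follows from the characterization results: blocks evicted from Local Banks are the ones most likely to be reused, so they are the only ones allowed to occupy Last Bank space.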
LRU Prioritising Last Bank • Target: To maintain reused addresses in the NUCA cache. • Modification of the data eviction policy of NUCA banks. • Prioritises lines that come from Last Bank during the data replacement process. [Figure: a cache set holding lines @D, @B, @A and @C, each with priority bit P:0, shown between the MRU and LRU positions]
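One possible reading of this policy is sketched below. It assumes each line carries a priority bit that is set when the line arrives from the Last Bank, and that a prioritised line is skipped once during victim selection, losing its bit in the process. The exact rule used by the authors is not given on the slide, so the function `choose_victim` and its second-chance behaviour are assumptions.

```python
def choose_victim(lines):
    """Pick and remove the line to evict from one cache set.

    lines: list of dicts {"addr": ..., "prio": bool}, ordered MRU first
           and LRU last (assumed representation).
    Returns the address of the evicted line.
    """
    for i in range(len(lines) - 1, -1, -1):  # walk from LRU towards MRU
        if not lines[i]["prio"]:
            return lines.pop(i)["addr"]
        lines[i]["prio"] = False             # skipped once: priority is used up
    return lines.pop()["addr"]               # all lines were prioritised: plain LRU


# Made-up set: @C sits in the LRU position but came from the Last Bank.
cache_set = [
    {"addr": "D", "prio": False},
    {"addr": "B", "prio": False},
    {"addr": "A", "prio": False},
    {"addr": "C", "prio": True},
]
first_victim = choose_victim(cache_set)   # "A": @C is skipped and loses its bit
second_victim = choose_victim(cache_set)  # "C": no priority left
```

Clearing the bit when a line is skipped keeps a prioritised line from staying in the set indefinitely, at the cost of evicting a more recently used line once.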
Results • Both optimizations increase Last Bank performance benefits. • There is still room for improvement. • Adaptive filters will be analysed in future work.
Conclusions • Data movements provoke unfair replacements in HOT banks. • Last Bank reduces the access latency of promptly reused addresses. • Huge performance potential. • Two optimizations are proposed: • Selective Last Bank: Reduces pollution in Last Bank. • LRU Prioritising Last Bank: Maintains reused addresses in the NUCA cache.
Last Bank: Dealing with Address Reuse in Non-Uniform Cache Architecture for CMPs. Questions?