ICCD 2009, Lake Tahoe, CA (USA) - October 6, 2009. LRU-PEA: A Smart Replacement Policy for NUCA caches on Chip Multiprocessors. Javier Lira ψ, Carlos Molina ψ,ф, Antonio González ψ,λ. ф Dept. Enginyeria Informàtica, Universitat Rovira i Virgili, Tarragona, Spain, carlos.molina@urv.net.
ICCD 2009, Lake Tahoe, CA (USA) - October 6, 2009. LRU-PEA: A Smart Replacement Policy for NUCA caches on Chip Multiprocessors. Javier Lira ψ, Carlos Molina ψ,ф, Antonio González ψ,λ. ф Dept. Enginyeria Informàtica, Universitat Rovira i Virgili, Tarragona, Spain, carlos.molina@urv.net. ψ Dept. Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain, javier.lira@ac.upc.edu. λ Intel Barcelona Research Center, Intel Labs - UPC, Barcelona, Spain, antonio.gonzalez@intel.com.
Outline • Introduction • Methodology • LRU-PEA • Results • Conclusions
Introduction • Chip multiprocessors (CMPs) have emerged as a dominant paradigm in system design. • They keep improving performance while reducing power consumption. • They take advantage of thread-level parallelism. • Commercial CMPs are currently available. • CMPs incorporate larger, shared last-level caches. • Wire delay is a key constraint.
NUCA • Non-Uniform Cache Architecture (NUCA) was first proposed at ASPLOS 2002 by Kim et al. [1]. • NUCA divides a large cache into smaller, faster banks. • Banks close to the cache controller have lower latencies than banks farther away. [1] C. Kim, D. Burger and S.W. Keckler. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. ASPLOS '02
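The distance-dependent latency described on this slide can be illustrated with a minimal sketch. The grid layout, the Manhattan-distance model, and the cycle counts (`bank_access`, `hop_delay`) are all assumptions made for illustration and are not taken from the talk.

```python
def bank_latency(bank_xy, controller_xy=(0, 0), bank_access=3, hop_delay=1):
    """Cycles to reach a bank: fixed bank access time plus wire delay per hop.

    All numbers are hypothetical; only the trend (farther means slower) matters.
    """
    hops = abs(bank_xy[0] - controller_xy[0]) + abs(bank_xy[1] - controller_xy[1])
    return bank_access + hop_delay * hops


near = bank_latency((0, 1))  # bank adjacent to the controller
far = bank_latency((7, 7))   # bank at the opposite corner of an 8x8 grid
```

Under these assumed numbers, the far bank costs several times more cycles than the near one, which is why data placement inside a NUCA matters.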
NUCA Policies • Bank Placement Policy • Bank Access Policy • Bank Migration Policy • Bank Replacement Policy
Outline • Introduction • Methodology • LRU-PEA • Results • Conclusions
Methodology • Simulation tools: • Simics + GEMS • CACTI v6.0 • PARSEC Benchmark Suite
Baseline NUCA cache architecture • CMP-DNUCA • 8 cores • 256 banks • Non-inclusive [2] B. M. Beckmann and D. A. Wood. Managing wire delay in large chip-multiprocessor caches. MICRO '04
Outline • Introduction • Methodology • LRU-PEA • Background • How does it work? • Results • Conclusions
Background • Entrance into the NUCA • Off-chip memory • L1 cache replacements • Migration movements • Promotion • Demotion
Data categories • Off-chip • L1 cache replacements • Promoted data • Demoted data
LRU-PEA • LRU with Priority Eviction Approach • Replacement policy for CMP-NUCA architectures. • Data Eviction Policy: • Chooses data to evict from a NUCA bank. • Data Target Policy: • Determines the destination bank of the evicted data. • Globalizes replacement decisions to the whole NUCA.
Data Eviction Policy • Based on the LRU replacement policy. • Static prioritisation of NUCA data categories. • Lowest-category data is evicted from the NUCA bank. • PROBLEM: • The highest category could monopolize the NUCA cache. • Category comparison is restricted to the LRU and the LRU-1 positions.
Data Eviction Policy • Example (NUCA bank, 4-way)**: [Figure: animated LRU-PEA example on a 4-way set, ways 0 to 3 ordered from MRU to LRU; labels shown: @A Promoted, @B Demoted, @C Offchip, @D Promoted, Available.] ** The set associativity assumed in this work for NUCA banks is 8-way.
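The eviction rule from the preceding slides (compare categories only at the LRU and LRU-1 positions, evict the lower-category line) can be sketched as follows. The numeric priority order in `Category` is an assumption, since the slides state that the prioritisation is static but this transcript does not give the order. The tie-break (evict the LRU line) and the contents of the example set are also illustrative assumptions.

```python
from collections import namedtuple
from enum import IntEnum


class Category(IntEnum):
    # ASSUMED order, lowest priority first; the transcript does not state it.
    DEMOTED = 0
    OFFCHIP = 1
    L1_REPLACEMENT = 2
    PROMOTED = 3


Line = namedtuple("Line", ["addr", "category"])


def choose_victim(ways):
    """Return the index of the way to evict.

    `ways` is ordered from MRU (index 0) to LRU (last index). Only the LRU
    and LRU-1 positions are compared; on a tie, plain LRU is used (assumed).
    """
    lru = len(ways) - 1
    if len(ways) < 2:
        return lru
    lru_1 = lru - 1
    if ways[lru_1].category < ways[lru].category:
        return lru_1  # LRU-1 holds lower-category data, so it goes first
    return lru


# Hypothetical 4-way set, MRU first.
example_set = [
    Line("@A", Category.PROMOTED),
    Line("@B", Category.DEMOTED),
    Line("@C", Category.OFFCHIP),
    Line("@D", Category.PROMOTED),
]
victim = choose_victim(example_set)
```

In this example the LRU way holds promoted data and the LRU-1 way holds off-chip data, so under the assumed order the off-chip line is evicted. The demoted line at position 1 is untouched because it lies outside the two compared positions, which is what prevents the highest category from monopolizing the bank.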
Data Target Policy • Migration movements provoke bank usage imbalance in the NUCA cache. • Replacements in the most accessed banks are unfair. • LRU-PEA globalizes replacement decisions to evict the most appropriate data from the NUCA cache.
Data Target Policy • Example (256 NUCA banks, 16 possible placements): Cascade mode vs. …
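One possible reading of the Data Target Policy is sketched below; it is not the authors' algorithm, whose details are not recoverable from this transcript. The sketch assumes that a line evicted from one bank is offered in turn to the other banks where it may reside, that it displaces a bank's own eviction candidate when it has a higher priority, and that in cascade mode the displaced line keeps travelling. The function name `relocate`, the bank representation, and the visiting order are hypothetical.

```python
def relocate(evicted, banks, cascade=True):
    """Offer an evicted line to other candidate banks (assumed behaviour).

    evicted: (addr, priority) tuple, higher priority means more valuable.
    banks:   list with one entry per candidate bank, holding that bank's
             current eviction candidate as (addr, priority), or None if the
             bank has a free way. The list is updated in place.
    Returns the line that finally leaves the NUCA, or None if none does.
    """
    for i, candidate in enumerate(banks):
        if candidate is None:
            banks[i] = evicted      # free way: the line stays on chip
            return None
        if candidate[1] < evicted[1]:
            banks[i] = evicted      # incoming line outranks the local candidate
            if not cascade:
                return candidate    # displaced line leaves the NUCA immediately
            evicted = candidate     # cascade: displaced line continues onward
    return evicted


# Hypothetical candidates in three other banks.
banks = [("x", 2), ("y", 0), ("z", 1)]
leaving = relocate(("a", 1), banks)
```

Here line `a` skips the first bank, replaces the lower-priority `y` in the second, and `y` then fails to displace `z`, so `y` is the line that leaves the cache. The effect is that the replacement decision made in one bank ends up evicting the least valuable line among several banks.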
Outline • Introduction • Methodology • LRU-PEA • Results • Conclusions
Conclusions • LRU-PEA is proposed as an alternative to the traditional LRU replacement policy in CMP-NUCA architectures. • Defines four novel NUCA categories and prioritises them to find the most appropriate data to evict. • In a D-NUCA architecture, data movements provoke unfair replacements in the most accessed banks. • LRU-PEA globalizes replacement decisions taken in a single bank to the whole NUCA cache. • LRU-PEA reduces miss rate, increases performance with parallel applications, and reduces energy consumed per instruction, compared to the traditional LRU policy.
LRU-PEA: A Smart Replacement Policy for NUCA caches on Chip Multiprocessors Questions?