
LRU-PEA: A Smart Replacement Policy for NUCA caches on Chip Multiprocessors

ICCD 2009, Lake Tahoe, CA (USA) - October 6, 2009. Javier Lira ψ, Carlos Molina ψ,ф, Antonio González ψ,λ. ф Dept. Enginyeria Informàtica, Universitat Rovira i Virgili, Tarragona, Spain. carlos.molina@urv.net.




Presentation Transcript


  1. ICCD 2009, Lake Tahoe, CA (USA) - October 6, 2009. LRU-PEA: A Smart Replacement Policy for NUCA caches on Chip Multiprocessors. Javier Lira ψ, Carlos Molina ψ,ф, Antonio González ψ,λ. ф Dept. Enginyeria Informàtica, Universitat Rovira i Virgili, Tarragona, Spain. carlos.molina@urv.net. ψ Dept. Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain. javier.lira@ac.upc.edu. λ Intel Barcelona Research Center, Intel Labs - UPC, Barcelona, Spain. antonio.gonzalez@intel.com

  2. Outline • Introduction • Methodology • LRU-PEA • Results • Conclusions

  3. Introduction • CMPs have emerged as a dominant paradigm in system design. • Keep performance improvement while reducing power consumption. • Take advantage of Thread-level parallelism. • Commercial CMPs are currently available. • CMPs incorporate larger and shared last-level caches. • Wire delay is a key constraint.

  4. NUCA • Non-Uniform Cache Architecture (NUCA) was first proposed in ASPLOS 2002 by Kim et al. [1]. • NUCA divides a large cache into smaller, faster banks. • Banks close to the cache controller have smaller latencies than farther banks. [1] C. Kim, D. Burger and S. W. Keckler. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. ASPLOS '02
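The distance-dependent latency idea on this slide can be sketched as follows. This is a minimal illustration, not the paper's model: the controller position, base bank latency, and per-hop cost are made-up numbers, and a simple Manhattan-distance mesh is assumed.

```python
# Illustrative NUCA latency model: access time grows with a bank's distance
# (in network hops) from the cache controller. All parameters are assumed
# values for illustration only.

def nuca_access_latency(bank_row, bank_col, bank_latency=3, hop_latency=1):
    """Cycles to reach bank (bank_row, bank_col) from a controller at (0, 0)."""
    hops = bank_row + bank_col          # Manhattan distance on a 2D mesh
    return bank_latency + hops * hop_latency

# A bank next to the controller is cheaper to access than a distant one.
assert nuca_access_latency(0, 0) < nuca_access_latency(7, 7)
```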

  5. NUCA Policies • Bank Placement Policy • Bank Access Policy • Bank Migration Policy • Bank Replacement Policy

  6. Outline • Introduction • Methodology • LRU-PEA • Results • Conclusions

  7. Methodology • Simulation tools: • Simics + GEMS • CACTI v6.0 • PARSEC Benchmark Suite

  8. Baseline NUCA cache architecture • CMP-DNUCA [2] • 8 cores • 256 banks • Non-inclusive [2] B. M. Beckmann and D. A. Wood. Managing wire delay in large chip-multiprocessor caches. MICRO '04

  9. Outline • Introduction • Methodology • LRU-PEA • Background • How does it work? • Results • Conclusions

  10. Background • Entrance into the NUCA • Off-chip memory • L1 cache replacements • Migration movements • Promotion • Demotion

  11. Data categories • Off-chip • L1 cache replacements • Promoted data • Demoted data

  12. LRU-PEA • LRU with Priority Eviction Approach • Replacement policy for CMP-NUCA architectures. • Data Eviction Policy: • Chooses data to evict from a NUCA bank. • Data Target Policy: • Determines the destination bank of the evicted data. • Globalizes replacement decisions to the whole NUCA.

  13. Data Eviction Policy • Based on the LRU replacement policy. • Static prioritisation of NUCA data categories. • Lowest-category data is evicted from the NUCA bank. • PROBLEM: • Highest-category data could monopolize the NUCA cache. • Category comparison is restricted to the LRU and the LRU-1 positions.
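The Data Eviction Policy described on this slide can be sketched as below. This is a hypothetical illustration under assumptions: the category names come from the slides, but the numeric priority values, the list-of-ways model, and the function name are mine, not the paper's implementation.

```python
# Sketch of LRU-PEA's Data Eviction Policy: pick a victim by comparing the
# data categories of the blocks at the LRU and LRU-1 positions only, so a
# high-priority category cannot monopolize the set.

# Lower value = lower-priority category = preferred victim (assumed ordering).
PRIORITY = {"offchip": 0, "l1_replacement": 1, "demoted": 2, "promoted": 3}

def choose_victim(ways):
    """ways: list of (address, category) ordered from MRU (index 0) to LRU."""
    lru, lru1 = ways[-1], ways[-2]
    if PRIORITY[lru[1]] <= PRIORITY[lru1[1]]:
        return lru      # LRU block's category is not higher: evict it as usual
    return lru1         # otherwise evict the lower-category block at LRU-1

# 4-way example: the off-chip block at LRU-1 is evicted instead of the
# promoted block at the LRU position.
ways = [("@A", "promoted"), ("@B", "demoted"),
        ("@C", "offchip"), ("@D", "promoted")]
victim = choose_victim(ways)  # -> ("@C", "offchip")
```

Restricting the comparison to the two least-recently-used positions keeps the policy cheap while still preventing, say, promoted data from crowding out every other category.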

  14. Data Eviction Policy • Example (NUCA bank, 4-way)**: Ways from MRU (position 0) to LRU (position 3): @A Promoted, @B Demoted, @C Offchip, @D Promoted. LRU-PEA evicts the off-chip block @C at the LRU-1 position instead of the promoted block @D at the LRU position, leaving that way available. ** The set associativity assumed in this work for NUCA banks is 8-way.

  15. Data Target Policy • Migration movements provoke bank usage imbalance in the NUCA cache. • Replacements in most accessed banks are unfair. • LRU-PEA globalizes replacement decisions to evict the most appropriate data from the NUCA cache.

  16. Data Target Policy • Example (256 NUCA banks, 16 possible placements): Cascade mode vs. …
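The cascading idea behind the Data Target Policy can be sketched as follows. This is a loose illustration under assumptions: rather than silently dropping the victim chosen in a heavily accessed bank, the policy looks for a candidate destination bank whose own LRU data belongs to a lower category, so the globally least valuable block is the one that leaves the NUCA. The bank model, function name, and priority values are illustrative, not the paper's implementation.

```python
# Sketch of a cascading Data Target Policy: forward a bank's victim to the
# candidate bank holding the lowest-category LRU block, if any candidate's
# LRU block ranks strictly below the victim. Priorities are assumed values.

PRIORITY = {"offchip": 0, "l1_replacement": 1, "demoted": 2, "promoted": 3}

def cascade_target(victim_category, candidate_lru_categories):
    """Return the index of the candidate bank that should absorb the victim,
    or None if the victim itself is already the least valuable block."""
    best, best_prio = None, PRIORITY[victim_category]
    for i, cat in enumerate(candidate_lru_categories):
        if PRIORITY[cat] < best_prio:   # strictly less valuable than victim
            best, best_prio = i, PRIORITY[cat]
    return best

# A promoted victim displaces an off-chip block in bank 1 instead of leaving
# the NUCA; an off-chip victim finds no lower-category target and is evicted.
assert cascade_target("promoted", ["demoted", "offchip", "promoted"]) == 1
assert cascade_target("offchip", ["promoted", "demoted"]) is None
```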

  17. Outline • Introduction • Methodology • LRU-PEA • Results • Conclusions

  18. Increasing network congestion

  19. NUCA miss rate analysis

  20. Performance analysis

  21. Dynamic EPI analysis

  22. Outline • Introduction • Methodology • LRU-PEA • Results • Conclusions

  23. Conclusions • LRU-PEA is proposed as an alternative to the traditional LRU replacement policy in CMP-NUCA architectures. • Defines four novel NUCA categories and prioritises them to find the most appropriate data to evict. • In a D-NUCA architecture, data movements provoke unfair replacements in most accessed banks. • LRU-PEA globalizes replacement decisions taken in a single bank to the whole NUCA cache. • LRU-PEA reduces miss rate, increases performance with parallel applications, and reduces energy consumed per instruction, compared to the traditional LRU policy.

  24. LRU-PEA: A Smart Replacement Policy for NUCA caches on Chip Multiprocessors Questions?
