Achieving Non-Inclusive Cache Performance with Inclusive Caches

MICRO 2010. Aamer Jaleel, Eric Borch, Malini Bhandaru, Simon Steely Jr., Joel Emer (Intel Corporation, VSSAD). Presented by Soon-Won Hong.

Presentation Transcript


  1. Achieving Non-Inclusive Cache Performance with Inclusive Caches
     MICRO 2010
     Aamer Jaleel, Eric Borch, Malini Bhandaru, Simon Steely Jr., Joel Emer
     Intel Corporation, VSSAD
     Presented by Soon-Won Hong

  2. Motivation
     • Factors making caching important
        • CPU speed >> memory speed
        • Chip Multi-Processors (CMPs)
     • Goal:
        • High-performing LLC
     [Figure: two cores, each with private iL1/dL1 and L2 caches, sharing a Last Level Cache (LLC)]

  3. Motivation
     • Factors making caching important
        • CPU speed >> memory speed
        • Chip Multi-Processors (CMPs)
     • Goal:
        • High-performing LLC
        • High-performing cache hierarchy
     [Figure: two cores, each with private iL1/dL1 and L2 caches, sharing a Last Level Cache (LLC)]

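  4. Cache Hierarchy
     • Inclusive hierarchy
        • L1 is a subset of the LLC
     [Figure: core request → L1 → LLC → memory; fills flow back up the hierarchy, L1 evicts lines, and LLC victims trigger a back-invalidate (BackInval) of L1]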

  5. Cache Hierarchy
     • Inclusive hierarchy
        • L1 is a subset of the LLC
     • Exclusive hierarchy
        • L1 is NOT in the LLC
     [Figure: inclusive and exclusive hierarchies side by side, showing request, fill, victim, evict, and back-invalidate (BackInval) paths between core, L1, LLC, and memory]

  6. Cache Hierarchy
     • Inclusive hierarchy
        • L1 is a subset of the LLC
     • Non-inclusive hierarchy
        • L1 is not a subset of the LLC
     • Exclusive hierarchy
        • L1 is NOT in the LLC
     [Figure: inclusive, non-inclusive, and exclusive hierarchies side by side, showing request, fill, victim, evict, and back-invalidate (BackInval) paths between core, L1, LLC, and memory]

  7. Cache Hierarchy: In a Nutshell
     [Figure: inclusive, non-inclusive, and exclusive hierarchies side by side, showing request, fill, victim, evict, and back-invalidate (BackInval) paths between core, L1, LLC, and memory]

     Property        | Inclusive             | Non-Inclusive                                 | Exclusive
     Definition      | L1 subset of LLC      | L1 not subset of LLC                          | L1 is NOT in LLC
     Total capacity  | LLC                   | >= LLC and <= (L1 + LLC)                      | L1 + LLC
     Back-invalidate | YES                   | NO                                            | NO
     Coherence       | LLC acts as directory | LLC miss snoops ALL L1$ (or use snoop filter) | LLC miss snoops ALL L1$ (or use snoop filter)

     • Inclusive caches:
        (+) simplify cache coherence
        (−) waste cache capacity
        (−) back-invalidates limit performance
     • Non-inclusive caches:
        (+) do not waste cache capacity
        (−) complicate cache coherence
        (−) extra hardware for snoop filtering

  8. Performance of Non-Inclusive and Exclusive LLCs
     [Chart (legend: AMD, INTEL): performance relative to baseline inclusion; 2-core CMP with 32KB L1, 256KB L2, and LLC size set by the hierarchy ratio]
     • Enforcing inclusion is bad when the LLC is not significantly larger than the mid-level cache (MLC)
     • Why do non-inclusive (NI) and exclusive LLCs perform better?
        • They make use of extra cache capacity by avoiding duplication
        • They avoid the problems caused by harmful back-invalidates
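Ignoring back-invalidates entirely, the capacity side of the first bullet can be put in numbers. The sketch below is a rough, capacity-only model, assuming that the hierarchy ratio means total core-cache (MLC) capacity to LLC capacity, that an exclusive hierarchy holds MLC + LLC distinct lines, and that an inclusive one holds only LLC lines; the function name is hypothetical.

```python
from fractions import Fraction


def inclusion_capacity_loss(llc_to_mlc_ratio):
    """Fraction of distinct-line capacity given up by inclusion vs. exclusion.

    Assumes an MLC:LLC capacity ratio of 1:llc_to_mlc_ratio, where MLC is the
    total core-cache capacity (a hypothetical convention for this sketch).
    """
    mlc = Fraction(1)
    llc = Fraction(llc_to_mlc_ratio)
    inclusive_capacity = llc        # every core-cache line is duplicated in the LLC
    exclusive_capacity = mlc + llc  # each line lives in exactly one level
    return 1 - inclusive_capacity / exclusive_capacity


for ratio in (2, 4, 8):
    loss = float(inclusion_capacity_loss(ratio))
    print(f"1:{ratio} hierarchy -> inclusion loses {loss:.1%} of distinct capacity")
```

Under these assumptions inclusion gives up 1/3, 1/5, and 1/9 of the distinct-line capacity at 1:2, 1:4, and 1:8. The summary slide argues that this capacity loss is not the main reason non-inclusive caches perform better.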

  9. Back-Invalidate Problem with Inclusive Caches
     • Inclusion victims: lines evicted from the core caches because of an LLC eviction
     • Small caches filter temporal locality
        • Small-cache hits do not update the LLC LRU state
        • "Hot" small-cache lines → LRU in the LLC
     • Example reference pattern: … a, b, a, c, a, d, a, e, a, f …
     [Figure: L1 and L2 contents, ordered LRU to MRU, as the pattern is processed]
        • Reference 'e' misses and evicts 'a' from the hierarchy
        • The next reference to 'a' misses
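The example pattern can be replayed in a few lines. This is a minimal sketch, assuming a fully associative 2-entry L1 and a 4-entry LLC with true LRU (sizes chosen to reproduce the slide's example, not taken from the paper); `LRUCache` and `run_inclusive` are hypothetical names.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache; OrderedDict order runs LRU -> MRU."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def lookup(self, addr):
        """Return True on a hit and promote the line to MRU."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        return False

    def insert(self, addr):
        """Insert addr as MRU; return the evicted LRU victim, or None."""
        victim = None
        if len(self.lines) >= self.capacity:
            victim, _ = self.lines.popitem(last=False)
        self.lines[addr] = True
        return victim

    def invalidate(self, addr):
        self.lines.pop(addr, None)


def run_inclusive(refs, l1_size=2, llc_size=4):
    """Inclusive two-level hierarchy where L1 hits never touch LLC LRU state."""
    l1, llc = LRUCache(l1_size), LRUCache(llc_size)
    log = []
    for addr in refs:
        if l1.lookup(addr):
            log.append((addr, "L1 hit"))
            continue
        if llc.lookup(addr):
            log.append((addr, "LLC hit"))
        else:
            log.append((addr, "miss"))
            victim = llc.insert(addr)
            if victim is not None:
                l1.invalidate(victim)  # back-invalidate keeps L1 a subset of the LLC
        l1.insert(addr)
    return log


for addr, outcome in run_inclusive("a b a c a d a e a f".split()):
    print(addr, outcome)
```

With these sizes, 'a' hits in L1 three times, but none of those hits reach the LLC, so 'a' is still the LLC's LRU line when 'e' misses. The LLC evicts it, the back-invalidate removes it from L1, and the following reference to 'a' misses.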

  10. Inclusion Problem Exacerbated on CMPs!
      • Types of applications:
         • Core Cache Fitting (CCF) apps: working set fits in the core caches
         • LLC Fitting (LLCF) apps: working set fits in the LLC
         • LLC Thrashing (LLCT) apps: working set is larger than the LLC
      [Figure: two-core hierarchy (iL1/dL1, L2, shared LLC) annotated with the working-set extents of CCF, LLCF, and LLCT apps]

  11. Inclusion Problem Exacerbated on CMPs!
      [Figure: two-core hierarchy (iL1/dL1, L2, shared LLC)]

  12. Inclusion Problem Exacerbated on CMPs!
      • CCF apps are serviced from the L2 cache and rarely from the LLC
      • The replacement state of CCF apps becomes LRU at the LLC
      [Figure: two-core hierarchy (iL1/dL1, L2, shared LLC)]

  13. Inclusion Problem Exacerbated on CMPs!
      • CCF apps are serviced from the L2 cache and rarely from the LLC
      • The replacement state of CCF apps becomes LRU at the LLC
      • An LLCF app evicts the CCF working set from the LLC
      • Inclusion mandates removing the CCF working set from the entire hierarchy
      [Figure: two-core hierarchy (iL1/dL1, L2, shared LLC)]
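The CCF/LLCF interaction can be reproduced with a toy model. This sketch assumes two cores with 2-entry private caches and a shared 8-entry LLC, a CCF app cycling over two lines on core 0, and an LLCF app cycling over seven lines on core 1; all names and sizes are hypothetical. It counts core 0's private-cache misses with and without back-invalidates.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache; OrderedDict order runs LRU -> MRU."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def lookup(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)  # promote to MRU
            return True
        return False

    def insert(self, addr):
        """Insert addr as MRU and return the evicted victim, if any."""
        victim = None
        if len(self.lines) >= self.capacity:
            victim, _ = self.lines.popitem(last=False)
        self.lines[addr] = True
        return victim


def ccf_core_misses(inclusive, steps=70, core_size=2, llc_size=8):
    """Count core-cache misses of a CCF app sharing the LLC with an LLCF app."""
    cores = [LRUCache(core_size), LRUCache(core_size)]
    llc = LRUCache(llc_size)
    ccf = ["x", "y"]                    # fits in core 0's private cache
    llcf = [f"p{i}" for i in range(7)]  # fits in the LLC, not in core 1's cache
    ccf_misses = 0
    for i in range(steps):
        for core_id, addr in ((0, ccf[i % 2]), (1, llcf[i % 7])):
            core = cores[core_id]
            if core.lookup(addr):
                continue                # core hit: LLC LRU state is NOT updated
            if core_id == 0:
                ccf_misses += 1
            if not llc.lookup(addr):
                victim = llc.insert(addr)
                if inclusive and victim is not None:
                    for c in cores:     # back-invalidate to preserve inclusion
                        c.lines.pop(victim, None)
            core.insert(addr)
    return ccf_misses


print("non-inclusive:", ccf_core_misses(inclusive=False))
print("inclusive:    ", ccf_core_misses(inclusive=True))
```

Without back-invalidates, core 0 takes only its two cold misses. With inclusion, core 1's fills age core 0's lines to LRU in the LLC (they are never refreshed there), and their eviction pulls them out of core 0 as well.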

  14. Main Idea of Temporal Locality Aware (TLA) Cache Management
      • Temporal Locality Hints (TLH)
      • Early Core Invalidation (ECI)
      • Query Based Selection (QBS)

  15. Eliminate "Inclusion Victims" Using Temporal Hints
      • Baseline policies only update replacement state at the level of the hit
      • Proposal: convey the temporal locality seen in small caches to the LLC
      • Temporal Locality Hints:
         • Non-data requests sent to update the LLC replacement state
      [Figure: a core request hits in L1 and updates the L1 LRU state; a TLH also updates the LLC LRU state]
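Replaying the reference pattern from the back-invalidate example shows the effect of hints. In this sketch (hypothetical names; an assumed fully associative 2-entry L1 and 4-entry LLC with true LRU), every L1 hit optionally sends a non-data hint that promotes the line to MRU in the LLC, so 'a' is no longer the LLC's LRU line when 'e' arrives. The hint counter illustrates the bandwidth cost: one message per L1 hit.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache; OrderedDict order runs LRU -> MRU."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def lookup(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)  # promote to MRU
            return True
        return False

    def insert(self, addr):
        victim = None
        if len(self.lines) >= self.capacity:
            victim, _ = self.lines.popitem(last=False)
        self.lines[addr] = True
        return victim

    def invalidate(self, addr):
        self.lines.pop(addr, None)


def run_tlh(refs, tlh, l1_size=2, llc_size=4):
    """Inclusive hierarchy; if tlh is True, L1 hits send an LRU hint to the LLC."""
    l1, llc = LRUCache(l1_size), LRUCache(llc_size)
    log, hints = [], 0
    for addr in refs:
        if l1.lookup(addr):
            if tlh:
                llc.lookup(addr)  # non-data hint: refresh the LLC LRU position only
                hints += 1
            log.append((addr, "L1 hit"))
            continue
        if llc.lookup(addr):
            log.append((addr, "LLC hit"))
        else:
            log.append((addr, "miss"))
            victim = llc.insert(addr)
            if victim is not None:
                l1.invalidate(victim)  # back-invalidate
        l1.insert(addr)
    return log, hints


pattern = "a b a c a d a e a f".split()
print(run_tlh(pattern, tlh=False)[0][8])  # ('a', 'miss')
print(run_tlh(pattern, tlh=True)[0][8])   # ('a', 'L1 hit')
```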

  16. Improving Inclusive Cache Performance
      • Eliminate back-invalidates (i.e., build non-inclusive caches)
         • Increases coherence complexity
      • Goal: retain the benefits of inclusion yet avoid inclusion victims
      • Solution:
         • Ensure the LLC DOES NOT evict "hot" lines from the core caches
         • Must identify LLC lines that have high temporal locality in the core caches

  17. Early Core Invalidate (ECI)
      • Main idea: derive temporal locality by removing a line early from the core caches
      • Early Core Invalidate (ECI):
         • Send an early invalidate for the next victim in the same set
         • If the line is "hot", it will be "rescued" from the LLC → the "rescue" updates the LLC replacement state as a side effect
      [Figure: on a miss, L3 (the LLC) back-invalidates its victim from L1/L2 and sends an Early Core Invalidate for the next victim in the set; LLC set lines a to e ordered between LRU and MRU]
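A sketch of ECI on the same example pattern, assuming (hypothetically) a fully associative 2-entry L1, a 4-entry LLC with true LRU, and that an early invalidate is sent only once the LLC set is full, so that a real next victim exists. The next-victim line is dropped from L1 but kept in the LLC; if the core touches it again before the next LLC miss, the request hits in the LLC, and that hit promotes the line to MRU, which is the rescue. All names are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache; OrderedDict order runs LRU -> MRU."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def lookup(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)  # promote to MRU
            return True
        return False

    def insert(self, addr):
        victim = None
        if len(self.lines) >= self.capacity:
            victim, _ = self.lines.popitem(last=False)
        self.lines[addr] = True
        return victim

    def invalidate(self, addr):
        self.lines.pop(addr, None)


def run_eci(refs, l1_size=2, llc_size=4):
    """Inclusive hierarchy with Early Core Invalidate on each LLC miss."""
    l1, llc = LRUCache(l1_size), LRUCache(llc_size)
    log, early_invals = [], 0
    for addr in refs:
        if l1.lookup(addr):
            log.append((addr, "L1 hit"))
            continue
        if llc.lookup(addr):            # a rescued line lands here and becomes MRU
            log.append((addr, "LLC hit"))
            l1.insert(addr)
            continue
        log.append((addr, "miss"))
        victim = llc.insert(addr)
        if victim is not None:
            l1.invalidate(victim)       # normal back-invalidate
        l1.insert(addr)
        if len(llc.lines) == llc.capacity:
            next_victim = next(iter(llc.lines))  # current LRU line in the LLC
            l1.invalidate(next_victim)  # early invalidate; line stays in the LLC
            early_invals += 1
    return log, early_invals


log, early = run_eci("a b a c a d a e a f".split())
print(log[6], log[8], early)
```

Here 'a' is early-invalidated when 'd' fills the set, is rescued by the very next reference, and therefore survives the miss on 'e'. Had another miss to the set arrived before that re-reference, 'a' would have been evicted: the short rescue window that limits ECI.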

  18. Query Based Selection (QBS)
      • Main idea: replace lines that are NOT resident in the core caches
      • Query Based Selection (QBS):
         • The LLC sends a back-inval request
         • The core rejects the back-inval if the line is resident in the core caches
      [Figure: on a miss, L3 (the LLC) sends a back-invalidate request to L1/L2 for its LRU line; the core REJECTS it]

  19. Query Based Selection (QBS)
      • Main idea: replace lines that are NOT resident in the core caches
      • Query Based Selection (QBS):
         • The LLC sends a back-inval request
         • The core rejects the back-inval if the line is resident in the core caches
         • If the core rejects, the line is updated to MRU in the LLC
         • The LLC repeats the back-inval process until the core accepts a back-inval request (or a timeout occurs)
      [Figure: after the rejected line is promoted to MRU, the LLC sends a back-invalidate request for the new LRU line; the core ACCEPTS it]
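A sketch of QBS on the same example pattern, under the same assumed sizes (fully associative 2-entry L1, 4-entry LLC, true LRU). The LLC queries its LRU candidate; if the line is resident in L1, the core rejects the back-invalidate and the LLC promotes the line to MRU, then queries the next candidate. `max_queries` is a hypothetical stand-in for the timeout, after which this sketch simply evicts the current LRU line; all names are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache; OrderedDict order runs LRU -> MRU."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def lookup(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)  # promote to MRU
            return True
        return False

    def insert(self, addr):
        victim = None
        if len(self.lines) >= self.capacity:
            victim, _ = self.lines.popitem(last=False)
        self.lines[addr] = True
        return victim

    def invalidate(self, addr):
        self.lines.pop(addr, None)


def run_qbs(refs, l1_size=2, llc_size=4, max_queries=4):
    """Inclusive hierarchy whose LLC victim selection queries the core first."""
    l1, llc = LRUCache(l1_size), LRUCache(llc_size)
    log, rejects = [], 0
    for addr in refs:
        if l1.lookup(addr):
            log.append((addr, "L1 hit"))
            continue
        if llc.lookup(addr):
            log.append((addr, "LLC hit"))
        else:
            log.append((addr, "miss"))
            if len(llc.lines) >= llc.capacity:
                victim = None
                for _ in range(max_queries):
                    candidate = next(iter(llc.lines))  # current LRU line
                    if candidate not in l1.lines:
                        victim = candidate             # core ACCEPTS the back-inval
                        break
                    llc.lines.move_to_end(candidate)   # core REJECTS: promote to MRU
                    rejects += 1
                if victim is None:
                    victim = next(iter(llc.lines))     # timeout: evict LRU anyway
                del llc.lines[victim]
                l1.invalidate(victim)                  # back-invalidate the victim
            llc.insert(addr)
        l1.insert(addr)
    return log, rejects


log, rejects = run_qbs("a b a c a d a e a f".split())
print(log[8], rejects)
```

When 'e' misses, 'a' is the LRU candidate but is resident in L1, so the query is rejected once and 'b' is evicted instead. Unlike ECI, nothing depends on the core re-referencing the line within a time window.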

  20. Example of TLH, ECI and QBS

  21. Performance of L1 Temporal Locality Hints
      • L1 hints close 85% of the gap between inclusion and non-inclusion
      • Limitations of L1 hints:
         • Very high BW: num messages = num L1 hits
      [Chart: 2T workloads on a 1:4 hierarchy, relative to baseline inclusion: L1 hints 5.2%, non-inclusion 6.1%]
      • *Our studies do not model TLH BW

  22. Performance of Early Core Invalidate (ECI)
      • ECI closes 55% of the gap between inclusion and non-inclusion
      • Pros:
         • No HW overhead, low BW: num messages = num LLC misses
      • Limitations:
         • Short time to rescue: the rescue must occur BEFORE the next miss to the set
      [Chart: 2T workloads on a 1:4 hierarchy, relative to baseline inclusion: ECI 3.4%, non-inclusion 6.1%]

  23. Performance of Query Based Selection (QBS)
      • QBS outperforms non-inclusion
      • Pros:
         • No HW overhead, low BW: num messages = num LLC misses
         • Does not depend on a time window for rescue
      [Chart: 2T workloads on a 1:4 hierarchy, relative to baseline inclusion: QBS 6.6%, non-inclusion 6.1%]
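The gap-closure figures on the last three performance slides appear to be each policy's gain over baseline inclusion divided by non-inclusion's 6.1% gain; this reading is an inference from the chart values, not something the slides state. The arithmetic, as a small sketch with a hypothetical helper name:

```python
def gap_closed(policy_gain, noninclusive_gain):
    """Fraction of the inclusion -> non-inclusion gap that a policy closes."""
    return policy_gain / noninclusive_gain


NON_INCLUSIVE = 6.1                                 # % over baseline inclusion
results = {"L1 TLH": 5.2, "ECI": 3.4, "QBS": 6.6}   # % over baseline inclusion

for name, gain in results.items():
    print(f"{name}: closes {gap_closed(gain, NON_INCLUSIVE):.0%} of the gap")
```

This gives about 85% for L1 hints, about 56% for ECI (the slide's 55% presumably reflects rounding down), and over 100% for QBS, consistent with "QBS outperforms non-inclusion".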

  24. Summary of TLA Cache Management (2-core CMP)
      [Chart: performance relative to baseline inclusion]

  25. Summary
      • Problem: the inclusive cache problem becomes WORSE on CMPs
         • E.g., core cache fitting + LLC fitting/thrashing apps
      • Conventional wisdom: the primary benefit of non-inclusive caches is their higher capacity
      • We show: the primary benefit is NOT capacity but avoiding back-invalidates
      • Proposal: Temporal Locality Aware (TLA) cache management
         • Retains the benefits of inclusion while minimizing the back-invalidate problem
         • TLA-managed inclusive cache = performance of a non-inclusive cache

More Related