Cache-Miss Prediction

Cache-Miss Prediction Mostly No Machine (MNM) Robert Kenney, Kai Ting, Ezra Harrington

Just Say No: Benefits of Early Cache Miss Determination • Memik, Reinman, and Mangione-Smith • 2003 HPCA

Outline • Motivation • MNM Overview • Details and Analysis of MNMs • Replacement MNM (RMNM) • Common-Address MNM (CMNM) • Simulation Environment • Simulation Results • Additional Experiments

Motivation for Cache Miss Prediction • Clock speed increases => Memory latency more harmful • Levels of cache increasing • Predicting misses results in fewer cache accesses

Our Motivation • Verify MNM results presented in paper • Miss coverage • CPI reductions • Study benefits of MNM on 3 levels of cache • Study nature of cache miss prediction

MNM Operation • Stores information about current or previous cache contents • Produces a “Miss” or “Maybe” • Never produces a false “Miss” • A false “Miss” would skip a cache that holds the data and force an unnecessary access to a slower cache

MNM Operation (cont) • Separate MNM for each level of cache contained in one module • No MNM for L1 • MNM accessed in parallel with L1, or • MNM accessed after miss in L1 (saves power) • MNM produces information about which caches to access

MNM Operation (cont) • When a cache level is skipped • Next “Maybe” level is searched • If that level misses, the next “Maybe” level is searched, etc. • Retrieved data is written to the bypassed caches
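The lookup flow on the MNM Operation slides can be sketched as follows. This is a minimal illustration, not the SimpleScalar implementation: the names `Level`, `access`, and `perfect` are hypothetical, caches are modeled as unbounded sets with no eviction, and the predictor used is an oracle in the spirit of the Perfect MNM described later.

```python
MISS, MAYBE = "Miss", "Maybe"

class Level:
    """One cache level, modeled as an unbounded set of block addresses."""
    def __init__(self, name, predictor=None):
        self.name = name
        self.blocks = set()
        self.predictor = predictor  # None means no MNM for this level (as for L1)

    def predict(self, block):
        if self.predictor is None:
            return MAYBE
        return self.predictor(self, block)

def perfect(level, block):
    """Oracle predictor (assumption): answers Miss exactly when the block is absent."""
    return MAYBE if block in level.blocks else MISS

def access(levels, block):
    """Return the names of the caches actually probed for this block."""
    probed = []
    hit_at = len(levels)  # index past the last cache means main memory
    for i, level in enumerate(levels):
        if level.predict(block) == MISS:
            continue  # predicted miss: skip this level entirely
        probed.append(level.name)
        if block in level.blocks:
            hit_at = i
            break
    # Retrieved data is written to every level above the hit,
    # including the levels that were bypassed.
    for level in levels[:hit_at]:
        level.blocks.add(block)
    return probed
```

With `levels = [Level("L1"), Level("L2", perfect), Level("L3", perfect)]`, a block that is present only in L3 is found by probing L1 and L3; L2 is never accessed but still receives the block on the way back.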

Replacement MNM Operation • Contains information about previous cache contents • Cache of addresses • Address of replaced block cached in RMNM • Incoming block is invalidated in RMNM, if necessary
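A rough sketch of the RMNM idea: a small set-associative cache of block addresses that were replaced and have not come back. The class name, the method names, the modulo set indexing, and the oldest-first replacement inside the RMNM are all assumptions made for illustration; the slides do not specify them.

```python
from collections import OrderedDict

class RMNM:
    """Set-associative store of addresses of blocks replaced from the cache."""
    def __init__(self, num_sets=512, ways=4):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def _set(self, block_addr):
        # Assumed indexing: low-order bits of the block address pick the set.
        return self.sets[block_addr % self.num_sets]

    def on_replace(self, block_addr):
        """The cache replaced this block: remember that it is no longer present."""
        s = self._set(block_addr)
        s.pop(block_addr, None)
        if len(s) >= self.ways:
            s.popitem(last=False)  # dropping the oldest entry only loses coverage
        s[block_addr] = True

    def on_fill(self, block_addr):
        """The block enters the cache again: invalidate its RMNM entry, if any."""
        self._set(block_addr).pop(block_addr, None)

    def predict(self, block_addr):
        return "Miss" if block_addr in self._set(block_addr) else "Maybe"
```

Because an entry exists only between a replacement and the next fill of the same block, a “Miss” answer is never false. Blocks that were never in the cache have no entry, so this structure cannot predict compulsory misses.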

Common-Address MNM • Uses spatial locality of accesses to improve miss prediction. • Two-level prediction scheme • Virtual-tag finder • Virtual tag registers with masks • Table of saturating counters • Indexed by {index of VT reg | N index bits} • Counters reset to zero on cache flushes

Common-Address MNM (cont)

Common-Address MNM (cont) • On access to CMNM: • Two ways to predict a “Miss” • No match in VT finder • Table entry is “000” • On update to cache and CMNM: • Masks reduced until match found • Counter incremented when data added to cache • Counter decremented when data evicted from cache
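The two-level CMNM lookup can be sketched in simplified form. Several parts below are assumptions and not taken from the slides: dynamic mask reduction is replaced by a fixed mask (`tag_shift`) that ignores the bits between the virtual tag and the table index; when the virtual-tag registers run out, the sketch falls back to answering “Maybe” for everything; and a saturated counter is never decremented so that it cannot reach zero while blocks are still cached. All names are hypothetical.

```python
class CMNM:
    """Simplified Common-Address MNM: virtual-tag finder plus counter table."""
    def __init__(self, num_regs=8, index_bits=10, tag_shift=16, counter_bits=3):
        self.num_regs = num_regs
        self.index_bits = index_bits
        self.tag_shift = tag_shift            # assumed fixed mask, no reduction
        self.max_count = (1 << counter_bits) - 1
        self.flush()

    def flush(self):
        """Cache flush: counters reset to zero, finder cleared."""
        self.vtags = []                       # one virtual tag per register
        self.overflow = False
        self.counters = [0] * (self.num_regs << self.index_bits)

    def _slot(self, block_addr, allocate=False):
        vtag = block_addr >> self.tag_shift
        low = block_addr & ((1 << self.index_bits) - 1)
        if vtag not in self.vtags:
            if not allocate:
                return None
            if len(self.vtags) == self.num_regs:
                self.overflow = True          # untracked block: stop predicting
                return None
            self.vtags.append(vtag)
        # Table index is {index of VT register | N index bits}.
        return (self.vtags.index(vtag) << self.index_bits) | low

    def predict(self, block_addr):
        if self.overflow:
            return "Maybe"
        slot = self._slot(block_addr)
        if slot is None or self.counters[slot] == 0:
            return "Miss"                     # no VT match, or counter is 000
        return "Maybe"

    def on_fill(self, block_addr):
        slot = self._slot(block_addr, allocate=True)
        if slot is not None and self.counters[slot] < self.max_count:
            self.counters[slot] += 1

    def on_evict(self, block_addr):
        slot = self._slot(block_addr)
        # A saturated counter has lost count, so it stays saturated (assumption).
        if slot is not None and 0 < self.counters[slot] < self.max_count:
            self.counters[slot] -= 1
```

Unlike a structure that only records replaced blocks, this one answers “Miss” for regions that were never touched, which is one way to read the later observation that CMNM handles compulsory misses better.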

Perfect MNM • 100% coverage • Ideal performance gain obtainable by cache miss prediction

Simulation Environment • Used SimpleScalar to simulate MNMs • Modified sim-cache and sim-outorder to handle up to 5 levels of cache • Implemented three MNM modules • Recreated exact simulation environment used in paper…as best we could • Six benchmarks run • Four integer • Two floating point

Quantifying MNM Benefits • Coverage • Misses predicted / Total misses • Cycle Savings
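As a small worked example of the coverage metric, the function below computes misses predicted divided by total misses from a trace of predictions and actual outcomes. The function name and the trace representation are assumptions for illustration only.

```python
def coverage(predictions, missed):
    """Fraction of actual misses that the MNM predicted.

    predictions: sequence of "Miss" / "Maybe" answers, one per access.
    missed: sequence of bools, True where the cache actually missed.
    """
    pairs = list(zip(predictions, missed))
    # An MNM must never say "Miss" for an access that would have hit.
    if any(p == "Miss" and not m for p, m in pairs):
        raise ValueError("false Miss in trace")
    total_misses = sum(1 for _, m in pairs if m)
    predicted = sum(1 for p, m in pairs if m and p == "Miss")
    return predicted / total_misses if total_misses else 0.0
```

For a trace with three actual misses of which two were answered “Miss”, the coverage is 2/3; a Perfect MNM would score 1.0 on any trace that contains misses.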

RMNM Results

CMNM Results

Paper Critiques • Not enough information in paper to reproduce results • Fast-forward info not stated • Latency of MNM access was uniform • RMNM invalidations with updates • CMNM update can be cumbersome and variable • Cache update latency info not stated • Little in-depth analysis of nature and characteristics of MNM • Left out cases when performance degraded

Did we match the paper's results?

Saturation Study • Ran gcc and mesa with CMNM_8_10 and RMNM_2048_4 for varying instruction lengths (no flushes) • CMNM handles compulsory and capacity cache misses better than RMNM

Performance gain vs. MNM access latency • Measured benefits of miss prediction vs. cost of predicting for equake • DL1 hit latency: 2 cycles, DL2 hit latency: 8 cycles • PMNM: 100% coverage

Coverage vs. Cache Size & Associativity • One MNM for DL2 only • Varied properties of DL2 • MNM and DL1 remained constant • Sims run on equake • RMNM works best for low associativity (Conflict Cache Misses) • CMNM handles different associativities better

Summary • Mostly No Machine predicts cache misses • Replacement MNM • Common-Address MNM • Installation in SimpleScalar • Results • Analyses

Questions?
