
Exploiting Load Latency Tolerance for Relaxing Cache Design Constraints

Ramu Pyreddy, Gary Tyson. Advanced Computer Architecture Laboratory, University of Michigan.


Presentation Transcript


  1. Exploiting Load Latency Tolerance for Relaxing Cache Design Constraints
     Ramu Pyreddy, Gary Tyson
     Advanced Computer Architecture Laboratory, University of Michigan

  2. Motivation
     • Increasing Memory – Processor frequency Gap
     • Large Data Caches to hide Long Latencies
     • Larger caches – Longer Access Latencies [McFarland 98]
     • Processor Cycle determines Cache Size
        • Intel Pentium III – 16K DL1 Cache, 3 cycle access
        • Intel Pentium 4 – 8K DL1 Cache, 2 cycle access
     • Need Large AND Fast Caches!

  3. Related Work
     • Load Latency Tolerance [Srinivasan & Lebeck, MICRO 98]
        • All Loads are NOT equal
        • Determining Criticality – Very Complex
        • Sophisticated Simulator with Rollback
     • Non-Critical Buffer [Fisk & Bahar, ICCD 99]
        • Determining Criticality – Performance Degradation/Dependency Chains
        • Non-Critical Buffer – Victim Cache for non-critical loads
        • Small Performance Improvements (up to 4%)

  4. Related Work (contd.)
     • Locality vs. Criticality [Srinivasan et al., ISCA 01]
        • Determining Criticality – Practical Heuristics
        • Potential for Improvement – 40%
        • Locality is better than Criticality
     • Non-Vital Loads [Rakvic et al., HPCA 02]
        • Determining Criticality – Run-time Heuristics
        • Small and fast Vital cache for Vital Loads
        • 17% Performance Improvement

  5. Load Latency Tolerance

  6. Criticality
     • Criticality – Effect of Load Latency on Performance
     • Two thresholds – Performance and Latency
     • A Very Direct Estimation of Criticality
     • Computation Intensive!
     • Static

  7. Determining Criticality – A Closer Look
     • IPC Threshold = 99.6%
     • Latency Threshold = 8 cycles
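The slides give the two thresholds but not the exact classification procedure. The following is a minimal sketch of one plausible reading: a static load counts as latency tolerant if forcing its latency up to the latency threshold still keeps IPC at or above the IPC threshold fraction of the baseline. The `LoadProfile` record, its field names, and the `is_critical` helper are hypothetical and exist only for illustration; only the two threshold values come from slide 7.

```python
from dataclasses import dataclass

# Threshold values taken from slide 7.
IPC_THRESHOLD = 0.996        # fraction of baseline IPC that must be retained
LATENCY_THRESHOLD = 8        # cycles of forced load latency


@dataclass
class LoadProfile:
    """Hypothetical per-load measurement record (not from the slides)."""
    pc: int                  # address of the static load instruction
    baseline_ipc: float      # IPC with the normal load latency
    delayed_ipc: float       # IPC with this load forced to LATENCY_THRESHOLD cycles


def is_critical(profile: LoadProfile, ipc_threshold: float = IPC_THRESHOLD) -> bool:
    """Assumed rule: a load is critical if delaying it drops IPC below the threshold."""
    return profile.delayed_ipc < ipc_threshold * profile.baseline_ipc


def critical_loads(profiles):
    """Return the PCs of the loads classified as critical under the assumed rule."""
    return [p.pc for p in profiles if is_critical(p)]


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements from the presentation.
    profiles = [
        LoadProfile(pc=0x400, baseline_ipc=2.0, delayed_ipc=1.999),  # tolerant
        LoadProfile(pc=0x404, baseline_ipc=2.0, delayed_ipc=1.950),  # critical
    ]
    print([hex(pc) for pc in critical_loads(profiles)])
```

Because each load needs its own delayed run, a scheme like this requires one simulation per static load, which fits the "Computation Intensive!" and "Static" remarks on slide 6.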

  8. Most Frequently Executed Loads

  9. Criticality (contd.)

  10. Critical Cache Configuration

  11. Effectiveness?
     • Load Reference Distribution
     • What percentage of Loads are Identified as Critical
     • Miss Rate for Critical Load References
     • Critical Cache Configuration compared with Faster Conventional Cache Configuration
        • DL1/DL2 Latencies – 3/10, 6/20, 9/30 cycles
     • Critical Cache Configuration compared with Larger Conventional Cache Configuration
        • DL1 Sizes – 8KB, 16KB, 32KB, 64KB

  12. Processor Configuration
     • Similar to Alpha 21264, using SimpleScalar-3.0 [Austin, Burger 97]

  13. Results

  14. Results: Comparison with a faster Conventional Cache Configuration
     • IPCs normalized to 16K-1cycle Configuration
     • 25-66% of the Penalty due to a slower cache is eliminated

  15. Results: Comparison with a faster Conventional Cache Configuration
     • IPCs normalized to 32K-1cycle Configuration
     • 25-70% of the Penalty due to a slower cache is eliminated
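The slides do not define how the "penalty eliminated" percentages are computed. A minimal sketch, assuming the metric is the share of the IPC gap between the fast and the slow conventional cache that the critical cache configuration recovers, is shown below. The function name and the sample IPC values are hypothetical.

```python
def penalty_eliminated(ipc_fast: float, ipc_slow: float, ipc_critical: float) -> float:
    """Fraction of the slow-cache IPC penalty recovered (assumed definition).

    ipc_fast:     IPC of the faster conventional cache (the normalization base)
    ipc_slow:     IPC of the slower conventional cache
    ipc_critical: IPC of the critical cache configuration
    """
    penalty = ipc_fast - ipc_slow
    if penalty <= 0:
        raise ValueError("the slower cache must have lower IPC than the faster one")
    return (ipc_critical - ipc_slow) / penalty


if __name__ == "__main__":
    # Illustrative normalized IPCs, not values from the presentation:
    # fast cache = 1.00, slow cache = 0.80, critical cache configuration = 0.90.
    share = penalty_eliminated(1.00, 0.80, 0.90)
    print(f"{share:.0%} of the penalty eliminated")
```

Under this definition, a result of 0 means the critical cache configuration performs no better than the slow conventional cache, and 1 means it fully matches the fast one.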

  16. Results: Comparison with a larger Conventional Cache Configuration
     • IPCs normalized to 16K-3cycle Configuration

  17. Results: Comparison with a larger Conventional Cache Configuration
     • IPCs normalized to 32K-6cycle Configuration
     • Critical cache Configuration outperforms a larger conventional cache

  18. Conclusions & Future Work
     • Conclusions
        • Compares well with a faster conventional cache
        • Outperforms a larger conventional cache in most cases
     • Future Work
        • More heuristics to refine “criticality”
        • Why are “critical loads” critical?
        • Criticality of a memory address vs. criticality of a load instruction
        • Criticality for low-power Caches
