1 / 16

Efficient Memory Prefetching with Perceptron Filtering System

Explore enhanced Signature Path Prefetching paired with Perceptron Filtering for optimized prefetching in multi-core configurations. Learn about Aggressive L2C Prefetching, Hashed Perceptron Model, and Enhanced SPP. Discover the practicality of Prefetch Queue management and the impactful coordination between prefetchers.

flippo
Download Presentation

Efficient Memory Prefetching with Perceptron Filtering System

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Enhancing Signature Path Prefetching with Perceptron Prefetch Filtering Eshan Bhatia1, Gino Chacon1, Elvira Teran2, Paul V. Gratz1, Daniel A. Jiménez3 1Texas A&M University 2Texas A&M International University 3Texas A&M University / Barcelona Supercomputing Center

  2. Introduction Design Space: Standalone L1D, L2C and LLC Prefetchers Distribution of hardware budget across three prefetchers Interaction among the prefetchers Control over placing the incoming prefetch line (L1D vs L2C vs LLC)

  3. Key Ideas • Aggressive L2C Prefetching • Signature Path Prefetcher (SPP)[Kim, MICRO ‘16] • Perceptron-based Prefetch Filtering (PPF)[Bhatia, ISCA ‘19] • Optimizing Prefetch Queue Sharing • Page based resource sharing • Minimal LLC Prefetching • Lack of information • LLC is a shared resource among cores • Coordination between levels • Minimizing impact of noisy prefetches on lower level prefetchers

  4. Page Based Resource Sharing • Prefetch Queue (PQ) limited in number • Valuable resource for L1D / L2C • Aggressive (but still accurate) prefetching • Takes the current page deep into the speculation path • Blocks PQ resources for other pages • Timing disparity between multiple pages with interleaved accesses • Efficient Resource Utilization • Track number of distinct pages in last few memory accesses • Divide PQ resource over those pages

  5. L1D Prefetcher: Next-N-Lines • Fetches N consecutive lines wrt current demand address • N determined through PQ resource availability • Page level throttling • Tracks per page access pattern for the last two accesses • Scores page as +1 delta friendly or averse • Throttles prefetching for averse pages

  6. L2C Underlying Prefetcher: SPP • Lookahead Prefetcher • Uses previous prefetch suggestion to trigger new speculation • Recursively iterate and keep compounding the confidence • Stop when the confidence falls below a certain threshold • Threshold (hyperparameter) is an indication of aggressiveness • Less threshold -> more aggressive -> more coverage -> less accuracy • Pre-defined trade-off between coverage and accuracy

  7. Enhanced SPP • Decoupled coverage and accuracy concerns • SPP enhanced to its most aggressive extreme • Helps capture complex memory access patterns • Increases coverage • Perceptron Filtering (PPF) takes care of accuracy

  8. Hashed Perceptron Model • Use feature values to index into distinct tables • Example: PC, memory address etc • Prediction: Lookup, summation, threshold • Use xi value to index into table of corresponding Wi • Learning occurs when ground truth known • Positive Outcome: Increment each feature’s partial prediction weight • Negative Outcome: Decrement each feature’s partial prediction weight • No multiplication, no division, no complex back-propagation

  9. PPF Architecture • Baseline prefetcher: SPP • Modified for high coverage • Perceptron Weights Tables • Tables of 5-bit up-down saturating counters • 1 table per feature • Variable depth, independent indexing • Prefetch and Reject Tables • Record prefetches for future training

  10. PPF Design • Subsequent feedback of a prior prefetch • Same perceptron weights re-indexed and updated by +1 / -1 • Prefetch suggestions tested using PPF • Outcome and indexing metadata recorded in Prefetch / Reject Table

  11. Putting Pieces Together Single Core Configuration L1D: Enhanced Next-N-line L2C: PPF with SPP • Triggered on all accesses to L2C • Can place prefetches in L2C or LLC LLC: Next Line prefetcher • Triggered on demand accesses and only last prefetch from L1D reaching LLC • Uses the metadata communication path between the prefetchers Overhead: 49.94 KBs Multi Core Configuration L1D: No Prefetching L2C: PPF with SPP • Triggered on all accesses to L2C • Can place prefetches in L2C or LLC LLC: SPP (without PPF) • Separate tables for each core • Modified to be less aggressive than the original SPP (LLC is a shared resource) Overhead: 62.83 KBs

  12. Results Improvement reported over no prefetching Single Core: 40.4% Multi Core: 20.3%

  13. Future Works • Better baseline prefetchers for PPF • Interaction between the prefetchers • Metadata communication path between the levels

  14. Thank you!

  15. Backup Slides

  16. L2C Underlying Prefetcher: SPP

More Related