
Feedback Directed Prefetching

Feedback Directed Prefetching. Santhosh Srinath, Onur Mutlu, Hyesoon Kim, Yale N. Patt. Problem: prefetching can significantly improve performance when prefetches are accurate and timely. However, prefetching can also significantly degrade performance.


Presentation Transcript


  1. Feedback Directed Prefetching • Santhosh Srinath, Onur Mutlu, Hyesoon Kim, Yale N. Patt

  2. Problem and Solution • Problem: prefetching can significantly improve performance when prefetches are accurate and timely • However, prefetching can also significantly degrade performance, due to its memory bandwidth impact and pollution of the cache • Solution: Feedback Directed Prefetching is a comprehensive mechanism that reduces the negative effects of prefetching and improves its positive effects

  3. Outline • Background and Motivation • Feedback Directed Prefetching (FDP) • Metrics and How to collect • How to adapt • Prefetcher Aggressiveness • Cache Insertion Policy for Prefetches • Results

  4. Background (Prefetcher Aggressiveness) • [Figure: an access stream at block X and its predicted stream, annotated with Prefetch Distance and Prefetch Degree up to Pmax, shown for the Very Conservative, Middle of the Road, and Very Aggressive configurations]

  5. Background (Prefetcher Aggressiveness) • Very Aggressive • Well ahead of the load access stream • Hides memory access latency better • More speculative • Very Conservative • Closer to the load access stream • Might not hide memory access latency completely • Reduces potential for cache pollution and bandwidth contention
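The trade-off between the two extremes can be sketched in code. This is only an illustration of how distance and degree might bound what a stream prefetcher requests for an ascending stream. The `Aggressiveness` type, the `blocks_to_prefetch` helper, and every numeric value are assumptions made for the example, since the slides give no numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aggressiveness:
    name: str
    distance: int  # how far ahead of the demand stream prefetches may run (in blocks)
    degree: int    # how many blocks are requested per trigger


# Hypothetical values for illustration only; not taken from the slides.
VERY_CONSERVATIVE = Aggressiveness("Very Conservative", distance=4, degree=1)
MIDDLE_OF_THE_ROAD = Aggressiveness("Middle of the Road", distance=16, degree=2)
VERY_AGGRESSIVE = Aggressiveness("Very Aggressive", distance=64, degree=4)


def blocks_to_prefetch(demand_block, last_prefetched, cfg):
    """Return the next blocks to request for an ascending stream.

    At most cfg.degree blocks are issued, and none beyond
    demand_block + cfg.distance.
    """
    limit = demand_block + cfg.distance
    start = max(last_prefetched, demand_block) + 1
    stop = min(start + cfg.degree, limit + 1)
    return list(range(start, stop))


if __name__ == "__main__":
    for cfg in (VERY_CONSERVATIVE, MIDDLE_OF_THE_ROAD, VERY_AGGRESSIVE):
        print(cfg.name, blocks_to_prefetch(100, 100, cfg))
```

A larger distance lets the prefetcher run further ahead (better latency hiding, more speculation), while a larger degree raises the burst of requests per trigger (more bandwidth pressure).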

  6. Motivation • Very Aggressive improves average performance by 84% • However, it can also significantly reduce performance on some benchmarks (the chart marks 48% and 29%)

  7. Outline • Background and Motivation • Feedback Directed Prefetching (FDP) • Metrics and How to collect • How to adapt • Prefetcher Aggressiveness • Cache Insertion Policy for Prefetches • Results

  8. Feedback Directed Prefetching • Comprehensive mechanism which takes into account: • Prefetcher Accuracy • Prefetcher Lateness • Prefetcher-caused Cache Pollution • Adapts • Prefetcher Aggressiveness • Cache Insertion Policy for Prefetches

  9. Metrics • Prefetch Accuracy • Prefetch Lateness • Prefetcher-caused Cache Pollution

  10. Prefetch Accuracy • Useful prefetches are those referenced by demand requests while they are in the L2 cache

  11. Prefetch Accuracy • Low accuracy: more likely that prefetching reduces performance

  12. Prefetch Accuracy • Implementation • pref-bit added to each L2 tag-store entry • Tracked using two counters: pref_total, used_total
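A minimal software sketch of this bookkeeping follows. The counter names `pref_total` and `used_total` come from the slide. The `AccuracyTracker` class, its method names, the point at which `pref_total` is incremented, and the ratio `used_total / pref_total` are assumptions made for illustration.

```python
class AccuracyTracker:
    """Toy model of the pref-bit plus the pref_total and used_total counters."""

    def __init__(self):
        self.pref_total = 0   # prefetches brought into the L2 (assumed counting point)
        self.used_total = 0   # prefetched blocks later touched by a demand request
        self.pref_bit = {}    # block address -> pref-bit of its L2 tag-store entry

    def prefetch_fill(self, block):
        """A prefetched block is inserted into the L2: set its pref-bit."""
        self.pref_total += 1
        self.pref_bit[block] = True

    def demand_access(self, block):
        """A demand request touches a block: count the first use of a prefetch."""
        if self.pref_bit.get(block, False):
            self.used_total += 1
            self.pref_bit[block] = False  # count each prefetch at most once

    def evict(self, block):
        """The block leaves the L2, so its tag-store entry (and pref-bit) is gone."""
        self.pref_bit.pop(block, None)

    def accuracy(self):
        """Assumed metric: fraction of prefetches that were used."""
        return self.used_total / self.pref_total if self.pref_total else 0.0


if __name__ == "__main__":
    tracker = AccuracyTracker()
    for block in (1, 2, 3, 4):
        tracker.prefetch_fill(block)
    for block in (1, 1, 3, 9):
        tracker.demand_access(block)
    print(tracker.accuracy())
```

Clearing the pref-bit on first use keeps a block that is referenced many times from inflating the count of useful prefetches.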

  13. Prefetch Lateness • Measure of how timely prefetches are • Used to determine if increasing the aggressiveness helps • Implementation • pref-bit added to each L2 MSHR entry • New counter: late_total
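The MSHR pref-bit can be sketched the same way. The slide names only the `late_total` counter. The `LatenessTracker` class, its methods, and the choice to report lateness as `late_total / used_total` are assumptions for this example, on the reading that a prefetch is late when a demand request arrives while the prefetch is still in flight.

```python
class LatenessTracker:
    """Toy model of the MSHR pref-bit and the late_total counter."""

    def __init__(self):
        self.late_total = 0      # prefetches that a demand request caught in flight
        self.used_total = 0      # all useful prefetches, late or timely
        self.mshr_pref_bit = {}  # in-flight block address -> pref-bit of its MSHR entry

    def prefetch_issued(self, block):
        """A prefetch allocates an MSHR entry with its pref-bit set."""
        self.mshr_pref_bit[block] = True

    def demand_request(self, block):
        """Return True if the demand caught a prefetch that had not yet arrived."""
        if self.mshr_pref_bit.get(block, False):
            self.late_total += 1
            self.used_total += 1                # a late prefetch is still a useful one
            self.mshr_pref_bit[block] = False   # count it only once
            return True
        return False

    def fill(self, block):
        """The data arrives from memory and the MSHR entry is freed."""
        self.mshr_pref_bit.pop(block, None)

    def timely_use(self):
        """A demand request used a prefetched block that was already in the L2."""
        self.used_total += 1

    def lateness(self):
        """Assumed metric: fraction of useful prefetches that arrived late."""
        return self.late_total / self.used_total if self.used_total else 0.0


if __name__ == "__main__":
    lt = LatenessTracker()
    lt.prefetch_issued(10)
    lt.demand_request(10)   # late: the prefetch for block 10 is still in flight
    lt.fill(10)
    for _ in range(3):
        lt.timely_use()
    print(lt.lateness())
```

A high value suggests prefetches are issued too close to the demand stream, which is the case where more aggressiveness can help.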

  14. Prefetcher-caused Cache Pollution • Measure of the disturbance caused by prefetched data in the cache • Used to determine if the prefetcher is evicting useful data from the cache

  15. Prefetcher-caused Cache Pollution (2) • Hardware Implementation • Insight: this does not need to be exact • Track pollution using a Pollution Filter • Based on the Bloom filter concept • Bit set when a prefetch evicts a demand miss • Bit reset when a prefetch is serviced • Two counters: pollution_total, demand_total
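The pollution filter can be approximated in software as a single bit vector. The sketch below follows the set and reset rules on the slide, reading "evicts a demand miss" as "evicts a block that was brought in by a demand miss". The `PollutionFilter` class, the filter size, the XOR-based index function, and the ratio `pollution_total / demand_total` are assumptions for illustration.

```python
class PollutionFilter:
    """Approximate tracker of demand misses caused by prefetch evictions."""

    def __init__(self, num_bits=4096):  # size is an assumed value
        self.bits = [False] * num_bits
        self.pollution_total = 0  # demand misses attributed to prefetch evictions
        self.demand_total = 0     # all demand misses

    def _index(self, block):
        # Assumed hash: fold the high address bits onto the low ones.
        n = len(self.bits)
        return (block ^ (block // n)) % n

    def prefetch_evicts_demand_block(self, evicted_block):
        """A prefetch displaced a demand-fetched block: remember its address."""
        self.bits[self._index(evicted_block)] = True

    def prefetch_serviced(self, block):
        """A prefetch for this block returned from memory: clear the bit."""
        self.bits[self._index(block)] = False

    def demand_miss(self, block):
        """A demand request missed; charge it to the prefetcher if the bit is set."""
        self.demand_total += 1
        if self.bits[self._index(block)]:
            self.pollution_total += 1

    def pollution(self):
        """Assumed metric: fraction of demand misses caused by the prefetcher."""
        return self.pollution_total / self.demand_total if self.demand_total else 0.0


if __name__ == "__main__":
    pf = PollutionFilter(num_bits=16)
    pf.prefetch_evicts_demand_block(5)
    pf.demand_miss(5)   # counted as pollution
    pf.demand_miss(6)   # ordinary miss
    print(pf.pollution())
```

Because several addresses share one bit, the filter can over- or under-count. That matches the insight on the slide that the measurement does not need to be exact.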

  16. Feedback Directed Prefetching • Comprehensive mechanism which takes into account: • Prefetcher Accuracy • Prefetcher Lateness • Prefetcher-caused Cache Pollution • Adapts • Prefetcher Aggressiveness • Cache Insertion Policy

  17. How to adapt? Prefetcher Aggressiveness • [Table: Dynamic Configuration Counter value and the corresponding current aggressiveness; table contents not preserved]

  18. How to adapt? Prefetcher Aggressiveness (2) • For the current phase, based on static thresholds, classify: • Accuracy • Lateness • Cache pollution caused by prefetches • [Decision tree: branches on accuracy (High, Med, Low), lateness (Late, Not-Late), and pollution (Polluting, Not-Poll); leaf actions are Increase, Decrease, and No Change; annotated goals: reduce memory bandwidth usage and cache pollution, improve timeliness, reduce cache pollution]
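Since the decision tree did not survive in this transcript, the sketch below shows only one plausible rule set that is consistent with the three annotated goals: decrease when accuracy is low and prefetches cause trouble, increase when accurate prefetches are late, and decrease when an otherwise timely prefetcher pollutes. The exact branch-to-action mapping, all threshold values, the five-level counter range, and the function names are assumptions.

```python
MIN_LEVEL, MAX_LEVEL = 1, 5  # assumed range of the Dynamic Configuration Counter


def classify_accuracy(accuracy, high=0.75, low=0.40):
    """Map a measured accuracy to a class; both thresholds are hypothetical."""
    if accuracy >= high:
        return "high"
    if accuracy >= low:
        return "medium"
    return "low"


def counter_delta(accuracy_class, late, polluting):
    """Return +1 (increase), -1 (decrease), or 0 (no change). Assumed policy."""
    if accuracy_class == "low":
        # Inaccurate prefetches: back off to save bandwidth and reduce pollution.
        return -1 if (late or polluting) else 0
    if late:
        # Useful but late prefetches: run further ahead to improve timeliness,
        # unless a medium-accuracy prefetcher is also polluting the cache.
        return 1 if (accuracy_class == "high" or not polluting) else -1
    # Timely prefetches: only back off if they pollute the cache.
    return -1 if polluting else 0


def next_level(level, accuracy, lateness, pollution, t_late=0.01, t_poll=0.005):
    """Update the saturating counter at the end of a phase (thresholds hypothetical)."""
    delta = counter_delta(classify_accuracy(accuracy),
                          lateness > t_late,
                          pollution > t_poll)
    return min(MAX_LEVEL, max(MIN_LEVEL, level + delta))


if __name__ == "__main__":
    print(next_level(3, accuracy=0.9, lateness=0.05, pollution=0.0))
    print(next_level(3, accuracy=0.2, lateness=0.05, pollution=0.0))
```

The counter saturates at both ends, so repeated "increase" decisions at the most aggressive level leave the configuration unchanged.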

  19. How to adapt? Cache Insertion Policy for Prefetches • Why adapt? • Reduce the potential for cache pollution • Classify cache pollution based on static thresholds: • Low: insert at MID (n/2) position • E.g., for a 16-way cache, MID = 8 in the LRU stack • Medium: insert at LRU-4 (n/4) position • E.g., for a 16-way cache, LRU-4 = 4 in the LRU stack • High: insert at LRU position
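The mapping from pollution class to insertion position is simple enough to write out. In this sketch, position 0 is taken to be the LRU end of the stack and `ways - 1` the MRU end. That numbering and the function name `insertion_position` are assumptions for the example. The n/2 and n/4 positions come from the slide.

```python
def insertion_position(pollution_class, ways):
    """Return the LRU-stack position for a prefetched block (0 = LRU end)."""
    if pollution_class == "low":
        return ways // 2   # MID position
    if pollution_class == "medium":
        return ways // 4   # LRU-4 position
    if pollution_class == "high":
        return 0           # LRU position: first candidate for eviction
    raise ValueError(f"unknown pollution class: {pollution_class!r}")


if __name__ == "__main__":
    for cls in ("low", "medium", "high"):
        print(cls, insertion_position(cls, ways=16))
```

For a 16-way cache this yields positions 8, 4, and 0, matching the examples on the slide. The more the prefetcher pollutes, the sooner an unused prefetched block becomes the eviction victim.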

  20. Outline • Background and Motivation • Feedback Directed Prefetching • Metrics and How to collect • How to adapt • Prefetcher Aggressiveness • Cache Insertion Policy for Prefetches • Results

  21. Evaluation Methodology • Execution-driven Alpha simulator • Aggressive out-of-order superscalar processor • 1 MB, 16-way, 10-cycle unified L2 cache • 500-cycle minimum main memory latency • Detailed memory model • Prefetchers modeled: • Stream Prefetcher tracking 64 different streams • Global History Buffer Prefetcher (in paper) • PC-based Stride Prefetcher (in paper)

  22. Results: Adjusting Only Aggressiveness • 4.7% higher average IPC over the Very Aggressive configuration • Most of the performance losses have been eliminated

  23. Results: Adjusting Only Cache Insertion Policy (Very Aggressive Prefetcher) • 5.1% better than inserting prefetches in MRU position • 1.9% better than inserting prefetches in LRU-4 position

  24. Results: Putting it all together (FDP) • 6.5% IPC improvement over the Very Aggressive configuration • Performance losses converted to performance gains (the chart marks 13% and 11%)

  25. Bandwidth Impact • BPKI: Memory Bus Accesses per 1000 retired Instructions • Includes effects of L2 demand misses as well as pollution-induced misses and prefetches • FDP significantly improves bandwidth efficiency • 6.5% higher performance and 18.7% less bandwidth • 13.6% higher performance with similar bandwidth usage
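The BPKI definition on this slide reduces to a one-line calculation. The function name `bpki` and the sample numbers are made up for illustration.

```python
def bpki(bus_accesses, retired_instructions):
    """Memory bus accesses per 1000 retired instructions."""
    if retired_instructions <= 0:
        raise ValueError("retired_instructions must be positive")
    return bus_accesses * 1000 / retired_instructions


if __name__ == "__main__":
    # Hypothetical run: 5,000 bus accesses over 1,000,000 retired instructions.
    print(bpki(5_000, 1_000_000))
```

Because demand misses, pollution-induced misses, and prefetches all appear on the bus, a single BPKI figure captures both the cost of useless prefetches and the misses they cause.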

  26. Hardware Cost • Total hardware cost: 20784 bits = 2.54 KB • Percentage area overhead compared to baseline 1 MB L2 cache: 2.5 KB / 1024 KB = 0.24% • NOT on the critical path
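The arithmetic on this slide can be checked directly. The helper names below are made up for the example. The inputs are the slide's own figures.

```python
def bits_to_kb(bits):
    """Convert a bit count to kilobytes (1 KB = 1024 bytes)."""
    return bits / 8 / 1024


def overhead_percent(cost_kb, cache_kb):
    """Storage overhead of the mechanism relative to the cache size."""
    return cost_kb / cache_kb * 100


if __name__ == "__main__":
    cost_kb = bits_to_kb(20784)
    print(round(cost_kb, 2))                          # total cost in KB
    print(round(overhead_percent(2.5, 1024), 2))      # using the rounded 2.5 KB
    print(round(overhead_percent(cost_kb, 1024), 2))  # using the unrounded cost
```

With the cost rounded down to 2.5 KB, as on the slide, the overhead comes out at 0.24%. With the unrounded 2.54 KB it is closer to 0.25%.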

  27. Outline • Background and Motivation • Feedback Directed Prefetching • Metrics and collecting this information in Hardware • How to adapt • Results • Conclusions

  28. Contributions • Comprehensive and low-cost feedback mechanism for hardware prefetchers • Uses: Prefetcher Accuracy, Prefetcher Lateness, Prefetcher-caused Cache Pollution • Adapts: Aggressiveness, Cache Insertion Policy for prefetches • 6.5% higher performance and 18.7% less bandwidth compared to Very Aggressive Prefetching • Eliminates negative impact of prefetching • Applicable to any data prefetch algorithm

  29. Questions?

  30. Backups

  31. FDP vs Prefetch Cache • Prefetch caches eliminate prefetcher-induced cache pollution • However, prefetches are now limited to the size of the prefetch cache • FDP: 5.3% higher performance than Very Aggressive + 32 KB prefetch cache • Within 2% of Very Aggressive + 64 KB prefetch cache • Memory bandwidth of FDP is 16% less than with the 32 KB prefetch cache and 9% less than with the 64 KB prefetch cache

  32. Performance on Other Prefetch Algorithms • Global History Buffer Prefetcher • 20.8% less memory bandwidth than Very Aggressive with similar performance • 9.9% better performance than Middle of the Road with similar bandwidth usage • PC-based Stride Prefetcher • 4% better performance than Very Aggressive • 24% reduction in bandwidth usage

  33. IPC Performance

  34. Dynamic Prefetcher Accuracy

  35. Prefetch Lateness

  36. Pollution Filter

  37. Thresholds

  38. Prefetches Sent

  39. Distribution of dynamic aggressiveness level

  40. Distribution of insertion position of prefetched blocks

  41. Effect of FDP on memory bandwidth consumption

  42. Performance of Prefetch cache vs FDP

  43. Bandwidth consumption of prefetch cache vs. FDP

  44. Effect of FDP on GHB

  45. Effect of FDP on GHB (Bandwidth)

  46. Effect of varying L2 size and memory latency

  47. IPC on other benchmarks

  48. BPKI on other benchmarks
