Feedback Directed Prefetching
Santhosh Srinath, Onur Mutlu, Hyesoon Kim, Yale N. Patt
Problem
• Prefetching can significantly improve performance
  • when prefetches are accurate
  • and timely
• However, prefetching can also significantly degrade performance
  • due to its memory bandwidth impact
  • due to pollution of the cache
Solution
• Feedback Directed Prefetching is a comprehensive mechanism that reduces the negative effects of prefetching and also improves its positive effects
Outline
• Background and Motivation
• Feedback Directed Prefetching (FDP)
  • Metrics and How to Collect
  • How to Adapt
    • Prefetcher Aggressiveness
    • Cache Insertion Policy for Prefetches
• Results
Background (Prefetcher Aggressiveness)
[Figure: an access stream and the predicted stream, illustrating prefetch distance (up to Pmax) and prefetch degree for the Very Conservative, Middle of the Road, and Very Aggressive configurations]
Background (Prefetcher Aggressiveness)
• Very Aggressive
  • Well ahead of the load access stream
  • Hides memory access latency better
  • More speculative
• Very Conservative
  • Closer to the load access stream
  • Might not hide memory access latency completely
  • Reduces the potential for cache pollution and bandwidth contention
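The trade-off between aggressive and conservative prefetching can be made concrete with a small model of a stream prefetcher. Here, prefetch distance is taken to be how far ahead of the demand stream the prefetcher may run, and prefetch degree is taken to be how many blocks it issues per trigger. This is a minimal sketch under those assumptions. The (distance, degree) pairs in `LEVELS` and the function name `stream_prefetches` are hypothetical and chosen for illustration. They are not taken from these slides.

```python
# Hypothetical (prefetch distance, prefetch degree) pairs per aggressiveness
# level; assumed values for illustration, not taken from these slides.
LEVELS = {
    "very_conservative": (4, 1),
    "middle_of_the_road": (16, 2),
    "very_aggressive": (64, 4),
}


def stream_prefetches(demand_block, last_prefetched, distance, degree):
    """Blocks to prefetch for an ascending stream after one demand access.

    At most `degree` new blocks are issued, continuing from the last block
    already prefetched, and none runs more than `distance` blocks ahead of
    the current demand access.
    """
    limit = demand_block + distance              # furthest block allowed
    start = max(last_prefetched, demand_block) + 1
    return [b for b in range(start, start + degree) if b <= limit]


for name, (distance, degree) in LEVELS.items():
    # Prefetcher has already run 3 blocks ahead of demand block 100.
    print(name, stream_prefetches(100, 103, distance, degree))
```

In this model, a larger distance lets the prefetcher stay further ahead and hide more latency. A larger degree sends more requests per trigger, which increases speculation, bandwidth use, and the potential for pollution.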
Motivation
• Very Aggressive prefetching improves average performance by 84%
• However, it can also significantly reduce performance on some benchmarks
[Figure: per-benchmark performance of Very Aggressive prefetching, with annotations of 48% and 29%]
Feedback Directed Prefetching
• A comprehensive mechanism that takes into account:
  • Prefetcher accuracy
  • Prefetcher lateness
  • Prefetcher-caused cache pollution
• Adapts:
  • Prefetcher aggressiveness
  • Cache insertion policy for prefetches
Metrics
• Prefetch Accuracy
• Prefetch Lateness
• Prefetcher-caused Cache Pollution
Prefetch Accuracy
• A prefetch is useful if a demand request references the prefetched block while it is in the L2 cache
Prefetch Accuracy
• With low accuracy, prefetching is more likely to reduce performance
Prefetch Accuracy: Implementation
• A pref-bit is added to each L2 tag-store entry
• Tracked using two counters: pref_total and used_total
Prefetch Lateness
• A measure of how timely prefetches are
• Used to determine whether increasing the aggressiveness would help
• Implementation
  • A pref-bit is added to each L2 MSHR entry
  • New counter: late_total
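The pref-bits and counters for accuracy and lateness can be modeled in software. This is a toy sketch, not the hardware design. It makes the following assumptions:
• accuracy = used_total / pref_total
• lateness = late_total / used_total
• a demand request that hits an in-flight prefetch in the MSHR counts as both useful and late
• capacity and evictions are ignored

The class and method names (`L2Model`, `issue_prefetch`, `fill`, `demand_access`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FeedbackCounters:
    pref_total: int = 0  # prefetches sent to memory
    used_total: int = 0  # prefetched blocks referenced by a demand request
    late_total: int = 0  # useful prefetches still in flight when demanded


class L2Model:
    """Toy L2 cache + MSHRs that track only the pref-bits (no capacity, no evictions)."""

    def __init__(self):
        self.tag_pref_bit = {}   # resident block -> pref-bit in the tag store
        self.mshr_pref_bit = {}  # outstanding block -> pref-bit in the MSHR
        self.c = FeedbackCounters()

    def issue_prefetch(self, addr):
        if addr in self.tag_pref_bit or addr in self.mshr_pref_bit:
            return  # already present or already requested
        self.mshr_pref_bit[addr] = True
        self.c.pref_total += 1

    def fill(self, addr):
        """Memory returns a block: move it (and its pref-bit) from the MSHR to the tag store."""
        self.tag_pref_bit[addr] = self.mshr_pref_bit.pop(addr)

    def demand_access(self, addr):
        if addr in self.tag_pref_bit:            # L2 hit
            if self.tag_pref_bit[addr]:
                self.c.used_total += 1
                self.tag_pref_bit[addr] = False  # count each prefetch once
            return "hit"
        if addr in self.mshr_pref_bit:           # block still in flight
            if self.mshr_pref_bit[addr]:
                self.c.used_total += 1           # assumed: late prefetches are still useful
                self.c.late_total += 1
                self.mshr_pref_bit[addr] = False
            return "mshr"
        self.mshr_pref_bit[addr] = False         # ordinary demand miss
        return "miss"

    def accuracy(self):
        return self.c.used_total / self.c.pref_total if self.c.pref_total else 0.0

    def lateness(self):
        return self.c.late_total / self.c.used_total if self.c.used_total else 0.0
```

For example, prefetching blocks 1, 2, and 3 and filling 1 and 2 gives the following when a demand stream then touches 1 (hit) and 3 (still in the MSHR):
• pref_total = 3
• used_total = 2
• late_total = 1

That is an accuracy of 2/3 and a lateness of 1/2.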
Prefetcher-caused Cache Pollution
• A measure of the disturbance caused by prefetched data in the cache
• Used to determine whether the prefetcher is evicting useful data from the cache
Prefetcher-caused Cache Pollution (2)
• Hardware implementation insight: this does not need to be exact
• Track pollution using a pollution filter, based on the Bloom filter concept
  • Bit set when a prefetch evicts a block that was brought in by a demand request
  • Bit reset when a prefetch is serviced
• Two counters: pollution_total and demand_total
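The pollution filter can be sketched as a bit vector indexed by a hash of the block address. Because it is hashed, unrelated addresses may alias, which is acceptable since the estimate need not be exact. This is a minimal sketch under these assumptions:
• a demand miss whose filter bit is set counts toward pollution_total
• every demand miss counts toward demand_total
• pollution = pollution_total / demand_total

The filter size (4096 entries), the XOR-folding hash, and the method names are hypothetical.

```python
class PollutionFilter:
    """Bloom-filter-style bit vector; size and hash are hypothetical choices."""

    def __init__(self, entries=4096):
        self.entries = entries
        self.bits = [False] * entries
        self.pollution_total = 0  # demand misses attributed to the prefetcher
        self.demand_total = 0     # all demand misses

    def _index(self, block_addr):
        # Hypothetical hash: XOR the low 12 bits with the next 12 bits.
        return (block_addr ^ (block_addr >> 12)) % self.entries

    def on_prefetch_evicts_demand_block(self, evicted_addr):
        # A prefetch displaced a block that a demand request had brought in.
        self.bits[self._index(evicted_addr)] = True

    def on_prefetch_serviced(self, prefetched_addr):
        # The prefetched block is now in the cache, so clear its bit.
        self.bits[self._index(prefetched_addr)] = False

    def on_demand_miss(self, block_addr):
        self.demand_total += 1
        if self.bits[self._index(block_addr)]:
            self.pollution_total += 1  # this miss was probably caused by a prefetch

    def pollution(self):
        return self.pollution_total / self.demand_total if self.demand_total else 0.0


f = PollutionFilter()
f.on_prefetch_evicts_demand_block(0x40)
f.on_demand_miss(0x40)  # counted as prefetcher-caused
f.on_demand_miss(0x80)  # ordinary miss
print(f.pollution())
```

The filter only needs one bit per entry plus two counters, which is why it can be small and approximate rather than tracking exact eviction history.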
How to Adapt? Prefetcher Aggressiveness
[Figure: a Dynamic Configuration Counter holds the current aggressiveness level of the prefetcher]
How to Adapt? Prefetcher Aggressiveness (2)
• For the current phase, classify each metric using static thresholds:
  • Accuracy: High, Medium, or Low
  • Lateness: Late or Not-Late
  • Cache pollution caused by prefetches: Polluting or Not-Polluting
• Then update the aggressiveness:
  • Increase, to improve timeliness
  • Decrease, to reduce memory bandwidth usage and cache pollution, or to reduce cache pollution
  • No Change
[Figure: decision tree mapping each accuracy, lateness, and pollution combination to Increase, Decrease, or No Change]
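The decision step can be encoded as an update to a saturating counter. The slide supplies only the three metrics and the three outcomes. Everything else in this block is an assumption made for illustration:
• the five level names
• the threshold values (`A_HIGH`, `A_LOW`, `T_LATENESS`, `T_POLLUTION`)
• the exact mapping from each combination to Increase, Decrease, or No Change

Read it as one plausible encoding, not the mechanism's actual table.

```python
from enum import IntEnum


class Level(IntEnum):
    """Values of a saturating Dynamic Configuration Counter (assumed 5 levels)."""
    VERY_CONSERVATIVE = 1
    CONSERVATIVE = 2
    MIDDLE_OF_THE_ROAD = 3
    AGGRESSIVE = 4
    VERY_AGGRESSIVE = 5


# Hypothetical static thresholds, for illustration only.
A_HIGH, A_LOW = 0.75, 0.40
T_LATENESS = 0.01
T_POLLUTION = 0.005


def update_aggressiveness(level, accuracy, lateness, pollution):
    """Return the next counter value for one sampling interval (assumed mapping)."""
    late = lateness > T_LATENESS
    polluting = pollution > T_POLLUTION
    if accuracy >= A_HIGH:
        # Accurate prefetches: run further ahead if late, back off only if polluting.
        delta = +1 if late else (-1 if polluting else 0)
    elif accuracy >= A_LOW:
        # Medium accuracy: pollution outweighs lateness.
        delta = -1 if polluting else (+1 if late else 0)
    else:
        # Low accuracy: never increase; decrease to save bandwidth and cache space.
        delta = -1 if (late or polluting) else 0
    # Saturate at both ends of the counter.
    new = min(max(level + delta, Level.VERY_CONSERVATIVE), Level.VERY_AGGRESSIVE)
    return Level(new)


print(update_aggressiveness(Level.MIDDLE_OF_THE_ROAD, 0.9, 0.05, 0.0).name)
```

A saturating counter keeps the adjustment gradual: each interval moves the prefetcher at most one level, so a single noisy interval cannot swing it from one extreme to the other.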
How to Adapt? Cache Insertion Policy for Prefetches
• Why adapt?
  • To reduce the potential for cache pollution
• Classify cache pollution using static thresholds:
  • Low: insert prefetched blocks at the MID (n/2) position
    • e.g., for a 16-way cache, MID = 8 in the LRU stack
  • Medium: insert at the LRU-4 (n/4) position
    • e.g., for a 16-way cache, LRU-4 = 4 in the LRU stack
  • High: insert at the LRU position
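The insertion positions can be checked with a single cache set modeled as an LRU stack. Index 0 is the LRU end, so MID = n/2 and LRU-4 = n/4 match the 16-way examples above. This is a minimal sketch under these assumptions:
• demand misses are inserted at MRU
• demand hits move a block to MRU
• prefetches are placed by pollution level

The two pollution thresholds and all names here are hypothetical, because the slide only says "static thresholds".

```python
# Hypothetical pollution thresholds; the slides only say "static thresholds".
POLLUTION_LOW = 0.005
POLLUTION_HIGH = 0.25


def prefetch_insert_position(pollution, ways):
    """LRU-stack index for an incoming prefetched block (0 = LRU, ways - 1 = MRU)."""
    if pollution < POLLUTION_LOW:
        return ways // 2        # low pollution: MID (n/2)
    if pollution < POLLUTION_HIGH:
        return ways // 4        # medium pollution: LRU-4 (n/4)
    return 0                    # high pollution: LRU


class LRUSet:
    """One set of an n-way cache kept as an LRU stack (index 0 = LRU)."""

    def __init__(self, ways):
        self.ways = ways
        self.stack = []

    def _make_room(self):
        if len(self.stack) == self.ways:
            self.stack.pop(0)   # evict the LRU block

    def demand_access(self, tag):
        """Demand hits move to MRU; demand misses are inserted at MRU."""
        if tag in self.stack:
            self.stack.remove(tag)
            self.stack.append(tag)
            return True
        self._make_room()
        self.stack.append(tag)
        return False

    def insert_prefetch(self, tag, pollution):
        """Insert a prefetched block at the position chosen by the pollution level."""
        if tag in self.stack:
            return
        self._make_room()
        pos = prefetch_insert_position(pollution, self.ways)
        self.stack.insert(min(pos, len(self.stack)), tag)


cache_set = LRUSet(16)
for t in range(16):
    cache_set.demand_access(t)
cache_set.insert_prefetch(100, pollution=0.0)
print(cache_set.stack.index(100))  # prints 8 (MID position)
```

Inserting closer to the LRU end means a useless prefetch is evicted sooner, so it displaces fewer demand-fetched blocks. A useful prefetch is still promoted to MRU on its first demand hit.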
Evaluation Methodology
• Execution-driven Alpha simulator
• Aggressive out-of-order superscalar processor
• 1 MB, 16-way, 10-cycle unified L2 cache
• 500-cycle minimum main memory latency
• Detailed memory model
• Prefetchers modeled:
  • Stream prefetcher tracking 64 different streams
  • Global History Buffer prefetcher (in paper)
  • PC-based stride prefetcher (in paper)
Results: Adjusting Only Aggressiveness
• 4.7% higher average IPC than the Very Aggressive configuration
• Most of the performance losses are eliminated
Results: Adjusting Only Cache Insertion Policy (Very Aggressive Prefetcher)
• 5.1% better than inserting prefetches at the MRU position
• 1.9% better than inserting prefetches at the LRU-4 position
Results: Putting It All Together (FDP)
• 6.5% IPC improvement over the Very Aggressive configuration
• Performance losses are converted to performance gains
[Figure: per-benchmark IPC, with annotations of 13% and 11%]
Bandwidth Impact
• BPKI: memory bus accesses per 1000 retired instructions
  • Includes the effects of L2 demand misses as well as pollution-induced misses and prefetches
• FDP significantly improves bandwidth efficiency
  • 6.5% higher performance and 18.7% less bandwidth than Very Aggressive prefetching
  • 13.6% higher performance with similar bandwidth usage
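Written as a formula, the bandwidth metric normalizes bus traffic to the amount of work done. The bus-access count includes demand misses (pollution-induced ones among them) and prefetch requests:

```latex
\mathrm{BPKI} = \frac{\text{number of memory bus accesses}}{\text{number of retired instructions}} \times 1000
```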
Hardware Cost
• Total hardware cost: 20784 bits = 2.54 KB
• Area overhead relative to the baseline 1 MB L2 cache: 2.5 KB / 1024 KB = 0.24%
• Not on the critical path
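The cost figures follow from unit conversion. The block below takes 1 KB = 1024 bytes, which is consistent with the slide's 1024 KB for the 1 MB cache. Using the unrounded 2.54 KB instead gives about 0.25%, so the 0.24% figure reflects rounding to 2.5 KB first.

```latex
\begin{aligned}
20784~\text{bits} &= \tfrac{20784}{8}~\text{bytes} = 2598~\text{bytes} = \tfrac{2598}{1024}~\text{KB} \approx 2.54~\text{KB} \\
\frac{2.5~\text{KB}}{1024~\text{KB}} &\approx 0.0024 = 0.24\%
\end{aligned}
```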
Outline
• Background and Motivation
• Feedback Directed Prefetching
  • Metrics and Collecting This Information in Hardware
  • How to Adapt
• Results
• Conclusions
Contributions
• Comprehensive, low-cost feedback mechanism for hardware prefetchers
• Uses:
  • Prefetcher accuracy
  • Prefetcher lateness
  • Prefetcher-caused cache pollution
• Adapts:
  • Aggressiveness
  • Cache insertion policy for prefetches
• 6.5% higher performance and 18.7% less bandwidth compared to Very Aggressive prefetching
• Eliminates the negative impact of prefetching
• Applicable to any data prefetch algorithm
Questions?
Backups
FDP vs. Prefetch Cache
• Prefetch caches eliminate prefetcher-induced cache pollution
• However, prefetched data is then limited by the size of the prefetch cache
• FDP performs 5.3% better than Very Aggressive prefetching with a 32 KB prefetch cache
• FDP performs within 2% of Very Aggressive prefetching with a 64 KB prefetch cache
• FDP's memory bandwidth is 16% lower than with the 32 KB prefetch cache and 9% lower than with the 64 KB prefetch cache
Performance on Other Prefetch Algorithms
• Global History Buffer prefetcher
  • 20.8% less memory bandwidth than Very Aggressive, with similar performance
  • 9.9% better performance than Middle-of-the-Road, with similar bandwidth usage
• PC-based stride prefetcher
  • 4% better performance than Very Aggressive
  • 24% reduction in bandwidth usage
IPC Performance
Dynamic Prefetcher Accuracy
Prefetch Lateness
Pollution Filter
Thresholds
Prefetches Sent
Distribution of Dynamic Aggressiveness Level
Distribution of Insertion Position of Prefetched Blocks
Effect of FDP on Memory Bandwidth Consumption
Performance of Prefetch Cache vs. FDP
Bandwidth Consumption of Prefetch Cache vs. FDP
Effect of FDP on GHB
Effect of FDP on GHB (Bandwidth)
Effect of Varying L2 Size and Memory Latency
IPC on Other Benchmarks
BPKI on Other Benchmarks