Predictor-Directed Stream Buffers

Timothy Sherwood, Suleyman Sair, Brad Calder

  1. Predictor-Directed Stream Buffers
     Timothy Sherwood, Suleyman Sair, Brad Calder

  2. Overview
     • Introduction
     • Past Stream Buffer work
     • Predictor-Directed Stream Buffers
     • Policy Improvements
     • Results
     • Contributions
     Sherwood, Sair, and Calder

  3. Introduction
     • Memory Wall
     • Latency reduction through prefetching
       • without consuming too much bandwidth
     • Stream Buffers are one of the most widely used prefetchers
       • simple to implement
       • very efficient
     • Pointer-based codes

  4. Past Stream Buffer work
     • Jouppi 1990
       • consecutive cache line FIFO
     • Palacharla and Kessler 1994
       • non-unit stride (based on memory chunk)
       • allocation filters
     • Farkas et al. 1997
       • PC-based stride
       • fully associative / non-overlapping
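The slide describes the original stream buffer only as a "consecutive cache line FIFO". As a rough illustration, here is a minimal Python sketch of that idea. Assumptions: addresses are cache-line numbers, the depth of 4 is arbitrary, and only the head entry is compared on a lookup, as in Jouppi's original design; the class and method names are hypothetical.

```python
from collections import deque


class SequentialStreamBuffer:
    """Jouppi-style stream buffer: a FIFO of consecutive cache-line addresses.

    Addresses are cache-line numbers (byte address // line size), an
    assumption made to keep the sketch short.
    """

    def __init__(self, depth=4):
        self.depth = depth
        self.fifo = deque()
        self.next_line = 0

    def allocate(self, miss_line):
        # On a miss, restart the stream at the line after the missing one.
        self.fifo.clear()
        self.next_line = miss_line + 1
        self._fill()

    def _fill(self):
        # Keep the FIFO full of consecutive lines (each one a prefetch request).
        while len(self.fifo) < self.depth:
            self.fifo.append(self.next_line)
            self.next_line += 1

    def lookup(self, line):
        # Only the head of the FIFO has a comparator in this design.
        if self.fifo and self.fifo[0] == line:
            self.fifo.popleft()  # the line moves into the cache
            self._fill()
            return True
        return False
```

Because only the head is compared, a reference that skips ahead in the stream (line 104 while 102 is at the head) misses even though the line is already buffered.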

  5. Past Stream Buffer work
     [Figure: PC-based stride stream buffer. N buffers, each entry holding a tag, a cache block, and a comparator; a per-load Last Address and Predicted Stride; the predicted stride is stored in the stream buffer on allocation; the buffers connect to the data cache, register file, and MSHRs, and to the next lower level of memory.]
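The figure shows the PC-based stride variant: a per-load record of last address and predicted stride, with the stride copied into the buffer when it is allocated. A minimal sketch under assumed details: a dictionary-backed table, block-granularity addresses, and hypothetical class names. Lookup compares every entry, reflecting the fully associative buffers of Farkas et al.

```python
class StrideTable:
    """Hypothetical PC-indexed table holding (last address, stride) per load."""

    def __init__(self):
        self.entries = {}  # load PC -> [last_addr, stride]

    def update(self, pc, addr):
        if pc in self.entries:
            last, _ = self.entries[pc]
            self.entries[pc] = [addr, addr - last]
        else:
            self.entries[pc] = [addr, 0]

    def get(self, pc):
        return self.entries.get(pc)


class StrideStreamBuffer:
    """Stream buffer that prefetches last + k * stride for one load."""

    def __init__(self, depth=4):
        self.depth = depth
        self.entries = []
        self.stride = 0
        self.next_addr = 0

    def allocate(self, table, pc):
        info = table.get(pc)
        if info is None or info[1] == 0:
            return False  # nothing to stream for this load
        last, stride = info
        self.stride = stride  # the stride is copied into the buffer on allocation
        self.next_addr = last + stride
        self.entries = []
        self._fill()
        return True

    def _fill(self):
        while len(self.entries) < self.depth:
            self.entries.append(self.next_addr)
            self.next_addr += self.stride

    def lookup(self, addr):
        # Fully associative: every entry is compared, not just the head.
        if addr in self.entries:
            self.entries.remove(addr)  # the block moves into the cache
            self._fill()
            return True
        return False
```

For a load at PC 0x40 touching 1000, 1064, 1128, the table learns a stride of 64 and an allocated buffer holds 1192, 1256, 1320, 1384.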

  6. Past Stream Buffer work
     • Past work targeted streaming in arrays
       • either in sequential order
       • or in stride order (multidimensional arrays)
     • Could not handle pointer codes
       • repetitive non-striding references
     • Need a more general predictor

  7. Predictor-Directed Stream Buffer
     • The goal: simple and efficient hardware-based prefetching of complex but predictable streams
     • Approach: take a general predictor and hook it up to the well-established stream buffer front end
       • Separate the predictor from the prefetcher
       • Can use almost any predictor
         • 2-Delta
         • Context
         • Markov
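Separating the predictor from the prefetcher amounts to giving the stream buffer a narrow predictor interface. The sketch below assumes one such interface, with `update` to train on a load and `predict` to extend a stream from its last address (both names hypothetical), and puts a 2-delta stride predictor behind it. A 2-delta predictor only changes its stride after seeing the same delta twice in a row.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Tuple


class AddressPredictor(ABC):
    """Hypothetical interface between a PSB front end and any address predictor."""

    @abstractmethod
    def update(self, pc: int, addr: int) -> None:
        """Train on a load (PC, address) arriving from write-back."""

    @abstractmethod
    def predict(self, pc: int, addr: int) -> Optional[int]:
        """Given a stream's last address, return the next address or None."""


class TwoDeltaStride(AddressPredictor):
    def __init__(self) -> None:
        # load PC -> (last address, last observed delta, predicted stride)
        self.table: Dict[int, Tuple[int, int, int]] = {}

    def update(self, pc: int, addr: int) -> None:
        if pc not in self.table:
            self.table[pc] = (addr, 0, 0)
            return
        last, last_delta, stride = self.table[pc]
        delta = addr - last
        if delta == last_delta:
            stride = delta  # same delta twice in a row: adopt it
        self.table[pc] = (addr, delta, stride)

    def predict(self, pc: int, addr: int) -> Optional[int]:
        entry = self.table.get(pc)
        if entry is None or entry[2] == 0:
            return None  # no confirmed stride yet
        return addr + entry[2]
```

After training on 100, 108, 116 the stride is 8; a single outlier (500) leaves it at 8, so predictions continue at 508, 516. Any other class implementing the same two methods (a context or Markov predictor) could be swapped in without touching the buffer side.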

  8. PSB Generalized Architecture
     [Figure: load info (PC, address) from the write-back stage updates the Address Predictor's prediction info (Load PC, History, Stride, Confidence, Last Address); each of the N stream buffers keeps a subset of the prediction info and receives predicted addresses; buffer entries (tag, cache block, comparator) connect to the data cache, register file, and MSHRs, and to the next lower level of memory.]

  9. PSB Stages
     • Allocation
     • Prediction
     • Probe
     • Prefetching
     • Lookup

  10. Stage Descriptions
     • Allocation
       • a stream buffer is allocated to a particular load
       • the buffer is initialized
       • subject to allocation filters
     • Prediction
       • an empty buffer entry asks the predictor for an address
       • limited by how fast the predictor can produce addresses
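A rough sketch of the Allocation and Prediction stages. It assumes a buffer records its load PC and its own last predicted address, and models limited predictor speed as at most `max_preds` predictions per cycle. The predictor is passed in as a plain function; `PSBuffer` and `prediction_stage` are hypothetical names, and visiting buffers in list order is a simplification (which buffer gets served first is a separate priority policy).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# (load PC, stream's last address) -> next predicted address, or None
Predictor = Callable[[int, int], Optional[int]]


@dataclass
class PSBuffer:
    depth: int = 4
    pc: Optional[int] = None
    last_addr: Optional[int] = None
    entries: List[int] = field(default_factory=list)  # predicted addresses

    def allocate(self, pc: int, addr: int) -> None:
        # Allocation: bind the buffer to one load and reset its state.
        self.pc, self.last_addr, self.entries = pc, addr, []

    def has_free_entry(self) -> bool:
        return self.pc is not None and len(self.entries) < self.depth


def prediction_stage(buffers: List[PSBuffer], predict: Predictor, max_preds: int) -> int:
    """Fill empty entries, making at most `max_preds` predictions this cycle."""
    made = 0
    for buf in buffers:
        if made == max_preds:
            break
        if buf.has_free_entry():
            nxt = predict(buf.pc, buf.last_addr)
            if nxt is not None:
                buf.entries.append(nxt)
                buf.last_addr = nxt  # the buffer advances its own copy of the stream
                made += 1
    return made
```

With a budget of one prediction per cycle, the first buffer in the list is served every cycle while the others wait, which is exactly the kind of imbalance a scheduling policy has to address.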

  11. Stage Descriptions (Continued)
     • Probe
       • if there are free ports, remove useless prefetches
       • not mandatory
     • Prefetching
       • prefetches are sent to memory, subject to port scheduling and priority
     • Lookup
       • when a load performs an L1 access, the stream buffers are checked in parallel
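The remaining three stages can be sketched the same way. All details here are assumptions with hypothetical names: each entry records whether its prefetch has been issued; Probe drops not-yet-issued entries whose block is already cached; Prefetching issues at most `ports` requests per cycle; Lookup scans the buffers one after another where hardware would compare them in parallel, and counts an issued prefetch as a hit without modeling its remaining latency.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Entry:
    addr: int           # predicted block address
    sent: bool = False  # has the prefetch been issued to memory?


@dataclass
class Buffer:
    entries: List[Entry] = field(default_factory=list)


def probe_stage(buffers: List[Buffer], cache: Set[int]) -> None:
    # Optional stage: drop not-yet-issued prefetches for blocks already cached.
    for b in buffers:
        b.entries = [e for e in b.entries if e.sent or e.addr not in cache]


def prefetch_stage(buffers: List[Buffer], ports: int) -> List[int]:
    # Issue unsent prefetches to memory, at most `ports` per cycle.
    issued: List[int] = []
    for b in buffers:
        for e in b.entries:
            if len(issued) == ports:
                return issued
            if not e.sent:
                e.sent = True
                issued.append(e.addr)
    return issued


def lookup_stage(buffers: List[Buffer], addr: int) -> bool:
    # Hardware compares all buffers in parallel; this loop checks them in turn.
    for b in buffers:
        for e in b.entries:
            if e.sent and e.addr == addr:
                b.entries.remove(e)  # the block moves into the L1 cache
                return True
    return False
```

Probing before prefetching means a block that is already in the cache never consumes a memory port.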

  12. PSB Implementation
     • Tried many different address predictors
     • Best is Stride-Filtered Markov (SFM)
       • similar to Joseph and Grunwald's predictor
       • first-order Markov
       • striding behavior is filtered out
       • the difference is stored to reduce size
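A minimal sketch of a stride-filtered first-order Markov predictor, including the MUX from the PSB-with-SFM design: a Markov hit wins, otherwise the per-load stride is used. Assumptions not stated on the slides: the Markov table is a dictionary indexed by the previous address (hardware would use a finite table; the evaluation used 2K entries), and a transition is recorded only when the per-load stride predictor got it wrong, which is one way of filtering out striding behavior. Names are hypothetical.

```python
from typing import Dict, Optional, Tuple


class StrideFilteredMarkov:
    """Sketch of a stride-filtered first-order Markov predictor."""

    def __init__(self) -> None:
        self.stride: Dict[int, Tuple[int, int]] = {}  # load PC -> (last addr, stride)
        self.markov: Dict[int, int] = {}              # addr -> next addr

    def update(self, pc: int, addr: int) -> None:
        if pc not in self.stride:
            self.stride[pc] = (addr, 0)
            return
        last, stride = self.stride[pc]
        if last + stride != addr:
            # Filter: only transitions the stride predictor missed reach the Markov table.
            self.markov[last] = addr
        self.stride[pc] = (addr, addr - last)

    def predict(self, pc: int, addr: int) -> Optional[int]:
        # MUX: a Markov hit wins; otherwise fall back to last address + stride.
        if addr in self.markov:
            return self.markov[addr]
        _, stride = self.stride.get(pc, (0, 0))
        return addr + stride if stride != 0 else None
```

A pointer-chasing load visiting 1000, 3000, 2000, 5000 fills the Markov table with those transitions, so a later traversal from 1000 is predicted as 3000, 2000, 5000. A striding load (100, 164, 228, 292) adds only its first transition to the table, and its stream continues through the stride path (292 predicts 356).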

  13. Difference Storing
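Storing the difference between consecutive addresses instead of the full next address lets each Markov entry use a narrower field. The sketch assumes a 16-bit signed difference field (a hypothetical width) and simply declines to store transitions whose difference does not fit; `DiffMarkovTable` and `encode_diff` are hypothetical names.

```python
from typing import Dict, Optional

DIFF_BITS = 16  # hypothetical width of the stored difference field


def encode_diff(cur: int, nxt: int, bits: int = DIFF_BITS) -> Optional[int]:
    """Return nxt - cur if it fits in a signed `bits`-bit field, else None."""
    diff = nxt - cur
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return diff if lo <= diff <= hi else None


class DiffMarkovTable:
    """Markov table whose entries hold address differences, not full addresses."""

    def __init__(self, bits: int = DIFF_BITS) -> None:
        self.bits = bits
        self.table: Dict[int, int] = {}  # current addr -> signed difference

    def insert(self, cur: int, nxt: int) -> bool:
        diff = encode_diff(cur, nxt, self.bits)
        if diff is None:
            return False  # too far apart for the narrow field: not stored
        self.table[cur] = diff
        return True

    def lookup(self, cur: int) -> Optional[int]:
        diff = self.table.get(cur)
        return None if diff is None else cur + diff
```

Under these assumptions, and with 32-bit addresses, the next-address field of a 2K-entry table shrinks from 2048 × 32 bits = 8 KB to 2048 × 16 bits = 4 KB, ignoring tags.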

  14. PSB with SFM
     [Figure: load info (PC, address) from the write-back stage trains a Stride Predictor (Last Address, Predicted Stride) and a Markov Predictor; the predicted stride is stored in the stream buffer on allocation; a MUX returns the predicted Markov address on a Markov hit and the predicted stride address otherwise; 8 buffers (tag, cache block, comparator) connect to the data cache, register file, and MSHRs, and to the next lower level of memory.]

  15. Methods
     • SimpleScalar 3.0
       • rewrote the memory hierarchy
       • modeled bandwidth between all levels
       • added perfect store sets
     • Ran over a set of pointer benchmarks
     • 2K-entry predictor table
     • 8 stream buffers with 4 entries each
     • 32 KB 4-way set-associative cache

  16. Speedup from PSB

  17. Allocation Filtering
     • Farkas et al. showed how two-miss filtering
       • prevents too many streams from requesting resources
     • Does not work as well for pointer codes
       • irregular miss patterns
     • We use priority and accuracy counters
       • track the behavior of loads
       • allocate to loads that are behaving well
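Accuracy-based allocation filtering can be sketched with one saturating counter per load PC. Assumptions, all hypothetical: 3-bit counters, a threshold of 4, unseen loads starting at the threshold so they get one chance, and a counter that goes up when a prefetch from the load's stream is used and down when one is not.

```python
from typing import Dict


class AccuracyFilter:
    """Per-load saturating accuracy counters that gate stream buffer allocation."""

    def __init__(self, bits: int = 3, threshold: int = 4) -> None:
        self.max = (1 << bits) - 1
        self.threshold = threshold
        self.counters: Dict[int, int] = {}  # load PC -> counter

    def _get(self, pc: int) -> int:
        # Assumption: an unseen load starts at the threshold, so it gets one chance.
        return self.counters.get(pc, self.threshold)

    def record(self, pc: int, useful: bool) -> None:
        # Up when a prefetch from this load's stream is used, down when it is not.
        c = self._get(pc)
        self.counters[pc] = min(c + 1, self.max) if useful else max(c - 1, 0)

    def may_allocate(self, pc: int) -> bool:
        return self._get(pc) >= self.threshold
```

Unlike a miss-count filter, this decision depends on how well a load's past streams were predicted, so a pointer-chasing load with an irregular miss pattern can still earn a buffer.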

  18. Allocation Filtering Speedup

  19. Stream Buffer Priority
     • Round robin
       • give each active buffer equal resources
       • for both prediction and prefetching
     • Priority counters
       • a small counter is kept with each buffer
       • the counters are used to rank buffers
       • more resources go to better-performing buffers
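The two policies differ only in how the buffer that gets the predictor or a prefetch port is chosen each cycle. A minimal sketch with hypothetical names, assuming 3-bit priority counters that rise on a buffer hit and fall on a useless prefetch; ties between equal priorities go to the lowest-numbered buffer.

```python
from dataclasses import dataclass
from typing import List, Optional

PRIORITY_MAX = 7  # hypothetical 3-bit saturating counter


@dataclass
class BufState:
    priority: int = 0
    needs_service: bool = True  # wants a prediction or a prefetch port this cycle


def on_buffer_hit(b: BufState) -> None:
    b.priority = min(b.priority + 1, PRIORITY_MAX)


def on_useless_prefetch(b: BufState) -> None:
    b.priority = max(b.priority - 1, 0)


def round_robin(bufs: List[BufState], start: int) -> Optional[int]:
    """Equal shares: the first buffer needing service, scanning from `start`."""
    n = len(bufs)
    for i in range(n):
        idx = (start + i) % n
        if bufs[idx].needs_service:
            return idx
    return None


def by_priority(bufs: List[BufState]) -> Optional[int]:
    """Rank by counter: the highest-priority buffer needing service wins."""
    ready = [i for i, b in enumerate(bufs) if b.needs_service]
    if not ready:
        return None
    return max(ready, key=lambda i: bufs[i].priority)  # first maximum on ties
```

Round robin ignores how useful a buffer has been, while the priority policy keeps steering predictions and prefetches toward buffers whose prefetches are actually hit.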

  20. Priority Scheduling Speedup

  21. Latency Reduction

  22. Contributions
     • Predictor-Directed Stream Buffers decouple the stream buffer front end from address generation
     • Accuracy-based allocation filtering and priority scheduling can make a large difference in performance
     • With some simple compression, even small Markov tables can be very effective

  23. Accuracy

  24. Bus Results
