Hardware-Only Stream Prefetching and Dynamic Access Ordering

Charles Zhang and Sally A. McKee

Presentation Transcript


  1. Hardware-Only Stream Prefetching and Dynamic Access Ordering • Charles Zhang and Sally A. McKee

  2. Motivation • Memory system bottleneck • Streamed computations • Poor cache behavior • Good regularity • How to make stream computations faster?

  3. Stream Prefetching + Dynamic Access Ordering • Stream detection • Stream prefetching • Dynamic access ordering (DAO) • Can DAO improve performance without pattern information from software? • How much performance difference is possible?

  4. Implementation Choices • Prefetching: next-line vs. stride vs. pointer-based; cache vs. stream buffers; fixed vs. adaptive distances • Access ordering: with vs. without prefetching; optimality vs. implementability

  5. System Model • [Diagram components: CPU, IL1, DL1, L2, RPT prefetcher, reordering memory controller, bus, Direct RDRAMs]
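
The system model names an RPT (reference prediction table) prefetcher but the slide gives no detail about it. As a rough illustration of how stride-based stream detection generally works, the sketch below keeps one table entry per load instruction and issues a prefetch once the same stride has been seen repeatedly. Everything here is an assumption made for illustration: the class and field names, the simple hit counter (a real RPT typically uses a small state machine), and the `distance` and `threshold` parameters are hypothetical and not taken from the presentation.

```python
from dataclasses import dataclass


@dataclass
class RPTEntry:
    last_addr: int
    stride: int = 0
    confirmed: int = 0  # consecutive accesses that matched the stored stride


class StridePrefetcher:
    """Simplified, hypothetical RPT-style stride detector."""

    def __init__(self, distance=1, threshold=2):
        self.table = {}             # load PC -> RPTEntry
        self.distance = distance    # how many strides ahead to prefetch
        self.threshold = threshold  # matches required before prefetching

    def access(self, pc, addr):
        """Record one load; return an address to prefetch, or None."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = RPTEntry(last_addr=addr)
            return None
        stride = addr - entry.last_addr
        if stride != 0 and stride == entry.stride:
            entry.confirmed += 1
        else:
            # Stride changed: remember the new one and start over.
            entry.stride = stride
            entry.confirmed = 0
        entry.last_addr = addr
        if entry.confirmed >= self.threshold:
            return addr + entry.stride * self.distance
        return None


prefetcher = StridePrefetcher(distance=4)
results = [prefetcher.access(0x40, a) for a in (0, 80, 160, 240, 320)]
```

With a constant 80-byte stride, the first three accesses only train the entry; from the fourth access on, the sketch returns the address four strides ahead. The `distance` parameter plays the role of the prefetch distance that the results slides vary.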

  6. Reordering Algorithm • AEAP (As Early As Possible) • Maintains a next-issue candidate • Chooses between the new request and the candidate • Recomputes the candidate with each issue • Consistency
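
The slide lists only the steps of the AEAP policy, so the following is a minimal sketch of one way those steps could fit together, not the authors' implementation. It assumes a toy DRAM model in which a request to an already-open row is cheaper than one that opens a new row, and it treats "earliest" as earliest completion time with ties broken by arrival order. The constants `ROW_BYTES`, `ROW_HIT`, `ROW_MISS`, the address-to-bank mapping, and all class and method names are hypothetical.

```python
from dataclasses import dataclass

ROW_BYTES = 1024  # hypothetical DRAM row (page) size
ROW_HIT = 2       # hypothetical latency when the row is already open
ROW_MISS = 6      # hypothetical latency when a new row must be opened


@dataclass
class Request:
    seq: int              # arrival order
    addr: int
    is_write: bool = False


class AEAPController:
    def __init__(self, num_banks=2):
        self.num_banks = num_banks
        self.bank_free_at = [0] * num_banks
        self.open_row = [None] * num_banks
        self.pending = []
        self.candidate = None  # next request to issue

    def _locate(self, addr):
        row_id = addr // ROW_BYTES
        return row_id % self.num_banks, row_id // self.num_banks

    def _key(self, req, now):
        # Assumed cost: completion time, then age as a tie-break.
        bank, row = self._locate(req.addr)
        start = max(now, self.bank_free_at[bank])
        latency = ROW_HIT if self.open_row[bank] == row else ROW_MISS
        return (start + latency, req.seq)

    def _blocked(self, req):
        # Consistency: never pass an older request to the same address
        # when either of the two is a write.
        return any(o.seq < req.seq and o.addr == req.addr
                   and (o.is_write or req.is_write) for o in self.pending)

    def enqueue(self, req, now):
        # Choose between the new request and the current candidate.
        self.pending.append(req)
        if self._blocked(req):
            return
        if (self.candidate is None
                or self._key(req, now) < self._key(self.candidate, now)):
            self.candidate = req

    def issue(self, now):
        # Issue the candidate, then recompute it from what is left.
        if self.candidate is None:
            return None
        req = self.candidate
        self.pending.remove(req)
        bank, row = self._locate(req.addr)
        done, _ = self._key(req, now)
        self.bank_free_at[bank] = done
        self.open_row[bank] = row
        ready = [r for r in self.pending if not self._blocked(r)]
        self.candidate = min(ready, key=lambda r: self._key(r, now),
                             default=None)
        return req


ctrl = AEAPController(num_banks=2)
for seq, addr in enumerate([0, 4096, 64]):
    ctrl.enqueue(Request(seq, addr), now=0)
order = [ctrl.issue(now=t).seq for t in (0, 6, 8)]
```

In this example the third request targets the row opened by the first, so the sketch issues it ahead of the second request, which would need a different row in the same bank. In this simplified model the age tie-break already keeps same-address requests in order; the explicit `_blocked` check is there only to make the consistency rule visible.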

  7. Experimental Setup • SimpleScalar • Direct Rambus DRAMs • Benchmark suites • SPEC95: int & fp • Microbenchmarks [Hong et al., HPCA ’99] • Pointer benchmarks [Austin et al., PLDI ’94]

  8. Results for Hydro2d • [Plot; axes: speedup, prefetch distance]

  9. Results for Copy (stride 10) • [Plot; axes: speedup, prefetch distance]

  10. Results for Anagram • [Plot; axes: speedup, prefetch distance]

  11. Conclusion • Hardware-only access ordering can deliver nontrivial speedups • Prefetching and access ordering benefit each other

  12. Future Work • Comparison with non-spatial-locality cache prefetching • Comparison with software and with other hardware approaches • Exploring performance for other DRAMs
