Hardware-Only Stream Prefetching and Dynamic Access Ordering Charles Zhang and Sally A. McKee
Motivation • Memory system bottleneck • Streamed computations • Poor cache behavior • Good regularity • How to make stream computations faster?
Stream Prefetching + Dynamic Access Ordering • Stream detection • Stream prefetching • Dynamic access ordering (DAO) • Can DAO improve performance w/o pattern info from software? • How much performance difference is possible?
Implementation Choices • Prefetching • Next-line vs stride vs pointer-based • Cache vs stream buffers • Fixed vs adaptive distances • Access ordering • With vs without prefetching • Optimality vs implementability
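To make the "stride" and "fixed distance" options above concrete, here is a minimal sketch of a stride prefetcher with a fixed prefetch distance. It is loosely modeled on a reference prediction table indexed by the load's program counter, but it is a simplified two-observation version written for illustration, not the design evaluated in the talk. The class name `StridePrefetcher`, the `distance` parameter, and the table layout are all assumptions.

```python
class StridePrefetcher:
    """Illustrative per-PC stride detector (hypothetical, simplified)."""

    def __init__(self, distance=1):
        # Fixed prefetch distance, measured in strides ahead of the access.
        self.distance = distance
        # pc -> (last address seen, last stride seen)
        self.table = {}

    def access(self, pc, addr):
        """Record an access; return a prefetch address or None."""
        entry = self.table.get(pc)
        if entry is None:
            # First time this instruction is seen: nothing to predict yet.
            self.table[pc] = (addr, 0)
            return None
        last_addr, last_stride = entry
        stride = addr - last_addr
        self.table[pc] = (addr, stride)
        # Prefetch only when the same nonzero stride is seen twice in a row.
        if stride != 0 and stride == last_stride:
            return addr + self.distance * stride
        return None


def run_stream(prefetcher, pc, addrs):
    """Feed a sequence of addresses from one instruction; collect predictions."""
    return [prefetcher.access(pc, a) for a in addrs]
```

For example, `run_stream(StridePrefetcher(distance=2), 0x40, [100, 140, 180, 220])` issues no prefetch for the first two accesses and then prefetches two strides ahead of each later access. An adaptive scheme would adjust `distance` at run time instead of fixing it.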
System Model • [Block diagram; labeled components: CPU, IL1, DL1, RPT prefetcher, L2, bus, reordering memory controller, Direct RDRAMs]
Reordering Algorithm • AEAP (As Early As Possible) • Maintains next-issue candidate • Chooses between new request & candidate • Recomputes candidate w/ each issue • Consistency
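The bullets above can be sketched as a small software model. The slide gives no timing rules or tie-breaking policy, so the following is an assumption-laden illustration rather than the authors' controller: it treats a request to an already-open DRAM row as issuable earlier than one that needs a new row, keeps a single next-issue candidate, compares each new request against it, recomputes the candidate after every issue, and models "consistency" as never letting a request pass an older request to the same address when either is a write. The names `Request`, `AEAPController`, `ROW_HIT_DELAY`, and `ROW_MISS_DELAY` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical delays in cycles; not taken from the slides.
ROW_HIT_DELAY = 0    # target row is already open in the bank
ROW_MISS_DELAY = 6   # a different row must be opened first


@dataclass(frozen=True)
class Request:
    seq: int             # arrival order
    addr: int
    is_write: bool = False


class AEAPController:
    def __init__(self, num_banks=4, row_size=1024):
        self.num_banks = num_banks
        self.row_size = row_size
        self.open_row = {}     # bank -> row currently open
        self.pending = []      # requests not yet issued
        self.candidate = None  # next-issue candidate

    def _bank_row(self, addr):
        row_index = addr // self.row_size
        return row_index % self.num_banks, row_index // self.num_banks

    def _key(self, req):
        # Smaller key means the request can issue earlier; ties go to age.
        bank, row = self._bank_row(req.addr)
        hit = self.open_row.get(bank) == row
        return (ROW_HIT_DELAY if hit else ROW_MISS_DELAY, req.seq)

    def _blocked(self, req):
        # Consistency: do not pass an older conflicting access to the same address.
        return any(o.seq < req.seq and o.addr == req.addr
                   and (o.is_write or req.is_write) for o in self.pending)

    def enqueue(self, req):
        # Choose between the new request and the current candidate.
        blocked = self._blocked(req)
        self.pending.append(req)
        if self.candidate is None or (
                not blocked and self._key(req) < self._key(self.candidate)):
            self.candidate = req

    def issue(self):
        # Issue the candidate, then recompute it over the remaining requests.
        req = self.candidate
        if req is None:
            return None
        self.pending.remove(req)
        bank, row = self._bank_row(req.addr)
        self.open_row[bank] = row
        ready = [r for r in self.pending if not self._blocked(r)]
        self.candidate = min(ready, key=self._key) if ready else None
        return req


def drain(controller):
    """Issue until empty; return the addresses in issue order."""
    order = []
    while (req := controller.issue()) is not None:
        order.append(req.addr)
    return order
```

With a single bank, enqueuing reads to addresses 0, 4096, 8, and 4104 and then calling `drain` groups the two accesses to each row together instead of alternating rows. A real controller would also track per-bank busy time and bus occupancy, which this sketch omits.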
Experimental Setup • SimpleScalar • Direct Rambus DRAMs • Benchmark suites • SPEC95: int & fp • Microbenchmarks [Hong et al., HPCA ’99] • Pointer benchmarks [Austin et al., PLDI ’94]
Results for Hydro2d • [Plot; axis labels: Speedup, Prefetch distance]
Results for Copy (stride 10) • [Plot; axis labels: Speedup, Prefetch distance]
Results for Anagram • [Plot; axis labels: Speedup, Prefetch distance]
Conclusion • Hardware-only access ordering can deliver nontrivial speedups • Prefetching and access ordering benefit each other
Future Work • Comparison w/ non-spatial locality cache prefetching • Comparison w/ software and with other hardware approaches • Exploring performance for other DRAMs