On the Importance of Optimizing the Configuration of Stream Prefetchers
Ilya Ganusov, Martin Burtscher
Computer Systems Laboratory, Cornell University
Introduction
• Memory wall
  • Increasing gap between processor and memory speeds
  • Concentration on bandwidth at the expense of latency
• Prefetch important data
  • Do not wait until the processor requests data
  • Pro-actively fetch the data that is likely to be consumed in the near future
MSP 2005
Stream Prefetching
• Prefetching with outcome-based prediction
  • Use the history of previous misses to guess data addresses that are likely to miss soon
• Stream prefetching
  • A special case of outcome-based prediction
  • Proposed 15 years ago
  • The only hardware prefetching scheme used in modern microprocessors
Contributions
• Detailed sensitivity analysis of main prefetcher parameters on SPECcpu2000 programs
  • No such study in the literature
  • Many research papers fail to specify prefetcher parameters in comparative studies
• Case study
  • Evaluate performance of Runahead execution on a baseline with different stream prefetcher parameters
Outline
• Introduction
• Stream Prefetcher Operation
• Evaluation Methodology
• Experimental Results
• Conclusion
How Stream Prefetchers Work
[Diagram labels: global miss history; stream table; "Stream exists?" check; AGU = addr + stride * lookahead]
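The address-generation step named on this slide can be written out directly. This is a minimal sketch, assuming byte addresses and a stride measured in bytes; the function name `prefetch_address` and the example values are hypothetical and do not come from the slides.

```python
def prefetch_address(addr: int, stride: int, lookahead: int) -> int:
    """Address-generation step from the slide: addr + stride * lookahead."""
    return addr + stride * lookahead


# Hypothetical example: a stream currently at 0x1000 that walks 64-byte
# blocks and prefetches 4 strides ahead.
target = prefetch_address(0x1000, 64, 4)
print(hex(target))  # 0x1100
```

A larger `lookahead` issues the prefetch earlier relative to the demand access, which is the knob the later slides call the prefetch distance.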
Measured Parameters
• Miss history length
• Number of supported streams
• Prefetch distance
[Diagram labels: AGU = addr + stride * lookahead; "Stream exists?" check]
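To make the three measured parameters concrete, here is a toy model of a stream prefetcher in which each one is an explicit constructor argument. It is a simplified sketch, not the design evaluated in the talk: the class name `StreamPrefetcher`, the `on_miss` method, the `max_stride` detection window, the LRU replacement policy, and the default value of `distance` are all assumptions made for illustration.

```python
from collections import OrderedDict, deque


class StreamPrefetcher:
    """Toy stream prefetcher with the three measured parameters exposed."""

    def __init__(self, history_len=16, num_streams=8, distance=4, max_stride=256):
        self.history = deque(maxlen=history_len)  # global miss history
        self.streams = OrderedDict()  # next expected miss address -> stride (LRU order)
        self.num_streams = num_streams  # number of supported streams
        self.distance = distance  # prefetch distance (lookahead, in strides)
        self.max_stride = max_stride  # assumed window for detecting a new stream

    def on_miss(self, addr):
        """Return the address to prefetch for this miss, or None."""
        # "Stream exists?": does a tracked stream expect this address next?
        stride = self.streams.pop(addr, None)
        if stride is None:
            # No stream: search the miss history, newest first, for a nearby miss.
            for prev in reversed(self.history):
                delta = addr - prev
                if delta != 0 and abs(delta) <= self.max_stride:
                    stride = delta
                    break
        self.history.append(addr)
        if stride is None:
            return None
        key = addr + stride
        self.streams.pop(key, None)  # re-inserting refreshes the LRU position
        if len(self.streams) >= self.num_streams:
            self.streams.popitem(last=False)  # evict the least recently used stream
        self.streams[key] = stride
        return addr + stride * self.distance  # AGU = addr + stride * lookahead


pf = StreamPrefetcher()
print([pf.on_miss(a) for a in (0, 64, 128)])  # [None, 320, 384]
```

In this model a history that is too short can miss a stream entirely: with `history_len=1`, the miss sequence 0, 5000, 64 allocates no stream, while `history_len=2` detects the 64-byte stride on the third miss.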
Evaluation Methodology
• Benchmarks
  • 22 SPECcpu2000 programs, highly optimized
  • All F77, C, and C++ programs
  • Multiple reference inputs per program
  • SimPoint interval of 500 million instructions
• Simulated architecture
  • SimpleScalar v4.0 cycle-accurate simulator
  • Aggressive superscalar Alpha 21264-like core
Simulated System
Outline
• Introduction
• Motivation
• Implementation
• Experimental Results
• Conclusion
Miss History Length
• 7 programs are very sensitive
• 16-entry history is enough
Number of Stream Table Entries
• Only 3 programs are sensitive
• More than 8 streams provide little benefit
L2 Cache Prefetch Distance
• 11 programs are very sensitive
• FP speedup ranges from 80% to 140%, depending on the prefetch distance
Case Study: Runahead Execution
• Performance of stream prefetching is highly dependent on parameter choice
• Another proposal: Runahead execution
  • Pseudo-retire long latency loads stalling the pipeline and continue executing
  • Roll back to checkpoint after load comes back from memory
Speedup over Stream Prefetching
• SPECfp speedup drops by more than 2x
Conclusion
• Key observations
  • The performance of the stream prefetcher is highly dependent on its configuration
  • Varying the prefetch distance alone almost doubles the average performance benefit
  • Choosing a non-optimal stream prefetcher as a baseline can distort results by a factor of two
• Conclusion
  • Parameter optimizations are imperative when comparing stream prefetchers to other prefetching techniques