Analyzing Branch Mispredictions
Bernardo Toninho, Ligia Nistor and Filipe Militão
{btoninho, lnistor, filipe.militao}@cs.cmu.edu
Carnegie Mellon University
Introduction

Motivation
• Branch prediction is a fundamental component of modern pipelined architectures: it keeps the pipeline full in the presence of changes in control flow.
• Performance is highly sensitive to predictor accuracy. Mispredictions require flushing the pipeline!
• Can we analyze and classify mispredictions and obtain new insights into predictor behavior that can lead to better, more specialized solutions?

Trade-Offs
• Data that informs predictions: static vs. dynamic, local vs. global, taken history vs. path history, or both?
• Table sizes: cost/area vs. aliasing

Analyzed Predictors
• Try to exploit correlations among branches to predict future behavior

Questions

Simulator
• The memory trace file is generated by a PIN tool
• The file is read by the cache simulator
• The cache simulator implements a memory system with a two-level cache hierarchy and our prefetchers

Previous Approaches
• Why do previous approaches not work for our case?
  • CDP (Content Directed Prefetcher): our data structures contain many pointers, so it issues too many useless prefetches
  • NL (Next Line Prefetcher): does not work for our irregular pointer accesses
  • Stream Buffers: only some of our data accesses are streaming accesses

Microarchitecture of Existing NL
Once there is an L1 miss, the NL prefetcher issues a request to prefetch the cache line that follows the missed line. This request is placed in the L2 request queue.

Proposed Prefetching Methods

Microarchitecture of SCDP and SCDP-NL
SCDP-NL performs CDP if there are consecutive addresses in the incoming cache line. Otherwise, it increments the incoming cache line's address to the next line and prefetches that line.

Microarchitecture of CDSB-NL
CDSB-NL improves on SCDP-NL. It decouples NL and SCDP prefetch requests by providing a stream buffer for SCDP prefetches. On an L1 miss, the CPU checks the stream buffer for the missed address.

Results
• Comparison of prefetching methods on the NetScience and DBLP Authors graphs, when running egonet queries over all the nodes. [Figure panels: NetScience, DBLP Authors]
• Recommendations:
  • Smaller graphs on systems where extra memory bandwidth is expensive: SCDP, because it gives the best accuracy with small coverage
  • Smaller graphs on other systems: CDSB-NL, since it provides higher coverage with reasonable accuracy
  • Larger graphs: CDSB-NL, because it gives the best accuracy and coverage due to the dominant array access patterns

Conclusion
1. Analysis of the graph mining access patterns.
2. SCDP and CDSB-NL: advanced prefetching methods for access patterns that include both sequential and pointer accesses.
3. A cache simulator that provides cache block contents to the prefetchers.
4. Experiments on real-world graphs show good accuracy and coverage for our proposed methods.
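The NL, SCDP-NL and CDSB-NL descriptions above can be illustrated with a small Python sketch. It is not the poster's simulator. The line size, the address width, the number of compared bits, the stream buffer capacity and every function and class name are assumptions made for illustration. The pointer test is the usual CDP-style heuristic (a word is a candidate pointer if its upper bits match those of the miss address), and "consecutive addresses" is read here as two adjacent pointer-like words in the incoming line, which is only one possible interpretation.

```python
from collections import deque

LINE_SIZE = 64      # bytes per cache line (assumed)
ADDR_BITS = 48      # virtual address width (assumed)
COMPARE_BITS = 16   # upper bits compared by the pointer heuristic (assumed)


def line_base(addr):
    """Address of the first byte of the cache line containing addr."""
    return addr - (addr % LINE_SIZE)


def next_line(addr):
    """NL policy: the line immediately after the one that missed."""
    return line_base(addr) + LINE_SIZE


def looks_like_pointer(word, miss_addr):
    """CDP-style heuristic: the word's upper bits match the miss address."""
    shift = ADDR_BITS - COMPARE_BITS
    return word != 0 and (word >> shift) == (miss_addr >> shift)


def scdp_nl_prefetches(miss_addr, line_words):
    """Line addresses to prefetch for one incoming cache line.

    Assumed reading of SCDP-NL: if two adjacent words look like pointers,
    prefetch the lines they point to (CDP); otherwise fall back to NL.
    """
    flags = [looks_like_pointer(w, miss_addr) for w in line_words]
    has_run = any(a and b for a, b in zip(flags, flags[1:]))
    if has_run:
        targets = [line_base(w) for w, f in zip(line_words, flags) if f]
        return list(dict.fromkeys(targets))  # drop duplicates, keep order
    return [next_line(miss_addr)]


class StreamBuffer:
    """Small FIFO of prefetched line addresses, checked on an L1 miss."""

    def __init__(self, capacity=8):
        self.lines = deque(maxlen=capacity)  # oldest entry is evicted first

    def fill(self, line_addr):
        self.lines.append(line_addr)

    def hit(self, miss_addr):
        return line_base(miss_addr) in self.lines
```

In a CDSB-NL-like arrangement, the addresses returned by the CDP branch would be passed to `StreamBuffer.fill`, while next-line requests would go to a separate L2 request queue, so the two kinds of prefetch do not compete for the same slots.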