Access Map Pattern Matching Prefetch: Optimization Friendly Method
Yasuo Ishii (1), Mary Inaba (2), and Kei Hiraki (2)
(1) NEC Corporation, (2) The University of Tokyo
Background
• The speed gap between processor and memory has widened
• Many techniques have been proposed to hide long memory latency
• Hardware data prefetching has become increasingly important
• Many hardware prefetchers have been proposed
Conventional Methods
• Prefetchers use
  • Instruction address
  • Memory access order
  • Memory address
• Optimizations scramble this information
  • Out-of-order memory access
  • Loop unrolling
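Limitation of Stride Prefetch [Chen+ 95]: Out-of-Order Memory Access
• Example loop, where (A) labels the load instruction: for (int i=0; i<N; i++) { load A[2*i]; }
• [Figure: memory address space at cache-line granularity, from 0xAAFF to 0xABFF. Accesses 1 to 4 are marked at 0xAB00, 0xAB02, 0xAB04, and 0xAB06, with an "Out of Order" label on the access sequence.]
• Stride table entry shown: Tag A, Address 0xAB04, Stride 2, State steady
• When the accesses arrive out of order, the prefetcher cannot detect the stride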
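The failure mode on this slide can be reproduced with a small model. The following is a minimal sketch, assuming a single table entry with a simple three-state training rule; the names (stride_entry_t, stride_train, count_steady) and the exact state machine are hypothetical and are not taken from [Chen+ 95].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stride-table entry: last address, last stride, training state. */
typedef enum { STRIDE_INIT, STRIDE_TRANSIENT, STRIDE_STEADY } stride_state_t;

typedef struct {
    bool           valid;
    uint64_t       last_addr;
    int64_t        stride;
    stride_state_t state;
} stride_entry_t;

/* Train the entry on one access. Returns true if the entry is steady
 * after the update, i.e. the same non-zero stride was seen twice in a row. */
bool stride_train(stride_entry_t *e, uint64_t addr)
{
    if (!e->valid) {
        e->valid = true;
        e->last_addr = addr;
        e->stride = 0;
        e->state = STRIDE_INIT;
        return false;
    }
    int64_t delta = (int64_t)(addr - e->last_addr);
    if (delta != 0 && delta == e->stride) {
        e->state = STRIDE_STEADY;      /* stride confirmed */
    } else {
        e->stride = delta;             /* retrain on the new stride */
        e->state = STRIDE_TRANSIENT;
    }
    e->last_addr = addr;
    return e->state == STRIDE_STEADY;
}

/* Count how many accesses in a sequence leave the entry in the steady state. */
int count_steady(const uint64_t *addrs, int n)
{
    assert(n >= 0);
    stride_entry_t e = {0};
    int steady = 0;
    for (int i = 0; i < n; i++) {
        if (stride_train(&e, addrs[i]))
            steady++;
    }
    return steady;
}
```

With the in-order sequence 0xAB00, 0xAB02, 0xAB04, 0xAB06 this model reaches the steady state; if the middle two accesses are swapped, the observed deltas become +4, -2, +4 and the entry never confirms a stride.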
Weakness of Conventional Methods
• Out-of-order memory access
  • Scrambles the memory access order
  • The prefetcher cannot detect address correlations
• Loop unrolling
  • Requires additional table entries
  • Each entry is trained slowly
• An optimization-friendly prefetcher is required
Access Map Pattern Matching
• Pattern matching
  • Order-free prefetching
  • Optimization-friendly prefetching
• Access map
  • Map-based history
  • 2-bit state map
  • Each state is attached to a cache block
State Diagram for Each Cache Block
• [Figure: state diagram. Init moves to Access on an access, Init moves to Prefetch on a prefetch, and Prefetch moves to Success on an access.]
• Init: initialized state
• Access: already accessed
• Prefetch: prefetch request issued
• Success: prefetched data accessed
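The per-block state machine can be written as a small transition function. This is a sketch under two assumptions: the numeric encoding of the four states is arbitrary, and any transition not drawn on the slide is treated as leaving the state unchanged. The names block_state_t, block_event_t, and next_state are hypothetical.

```c
#include <assert.h>

/* Four states fit in 2 bits; this particular encoding is an assumption. */
typedef enum { ST_INIT = 0, ST_ACCESS = 1, ST_PREFETCH = 2, ST_SUCCESS = 3 } block_state_t;

/* Events that drive the diagram: a demand access or an issued prefetch. */
typedef enum { EV_ACCESS, EV_PREFETCH } block_event_t;

/* Next state for one cache block. Transitions not shown on the slide are
 * assumed to be self-loops. */
block_state_t next_state(block_state_t s, block_event_t ev)
{
    assert((int)s >= 0 && (int)s <= 3);
    if (ev == EV_ACCESS) {
        switch (s) {
        case ST_INIT:     return ST_ACCESS;   /* first demand access */
        case ST_PREFETCH: return ST_SUCCESS;  /* prefetched data was used */
        default:          return s;           /* assumed self-loop */
        }
    }
    /* EV_PREFETCH: only an untouched block becomes a prefetched block. */
    return (s == ST_INIT) ? ST_PREFETCH : s;
}
```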
Memory Access Pattern Map
• [Figure: the memory address space is split into zones of a fixed zone size. Each zone has a memory access pattern map that holds one state (I, A, P, or S) per cache line and feeds the pattern match logic.]
• The map corresponds to the memory address space
• Cache-line granularity
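The mapping from an address to a position in a map, and the 2-bit packing of states, might look as follows. This is a sketch only: the 64-byte cache line is an assumption, LINES_PER_ZONE uses the "256 states / map" figure from the configuration slide, and the names map_pos_t, locate, access_map_t, map_get, and map_set are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_BYTES      64u   /* assumption: cache line size in bytes */
#define LINES_PER_ZONE  256u  /* "256 states / map" on the configuration slide */

/* Position of an address: which zone it falls in, and which line of that zone. */
typedef struct {
    uint64_t zone_tag;
    unsigned line_index;
} map_pos_t;

map_pos_t locate(uint64_t byte_addr)
{
    uint64_t line = byte_addr / LINE_BYTES;   /* drop the offset within the line */
    map_pos_t p;
    p.zone_tag   = line / LINES_PER_ZONE;
    p.line_index = (unsigned)(line % LINES_PER_ZONE);
    return p;
}

/* One map: 256 states at 2 bits each, packed four per byte (64 bytes total). */
typedef struct {
    uint8_t bits[LINES_PER_ZONE / 4u];
} access_map_t;

unsigned map_get(const access_map_t *m, unsigned i)
{
    assert(i < LINES_PER_ZONE);
    return (m->bits[i / 4u] >> ((i % 4u) * 2u)) & 3u;
}

void map_set(access_map_t *m, unsigned i, unsigned state)
{
    assert(i < LINES_PER_ZONE);
    unsigned shift = (i % 4u) * 2u;
    unsigned cleared = m->bits[i / 4u] & ~(3u << shift);  /* clear the old state */
    m->bits[i / 4u] = (uint8_t)(cleared | ((state & 3u) << shift));
}
```

Under these assumptions one zone spans 256 x 64 = 16,384 bytes of address space.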
Pattern Matching Logic
• [Figure: block diagram of the pattern matching pipeline. Components: Memory Access Pattern Map, Access Map Shifter, Pattern Detector, Pipeline Register, Prefetch Selector (Priority Encoder & Adder), and a Feedback Path.]
• In the example, the map (I I I A I A A A I A A ...) is shifted according to the access address Addr, candidate offsets +1, +2, +3, ... are checked, and a prefetch request for (Addr+2) is issued
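A software model of the matching step helps make the figure concrete. The slide does not spell out the matching rule, so this sketch assumes a stride-style rule: offset k is a candidate when the lines at t-k and t-2k were both accessed and the line at t+k is still in the Init state. The hardware checks all offsets in parallel and uses a priority encoder to pick one; the sequential loop below only models "nearest candidate first". All names here are hypothetical, and the map uses one byte per state for readability.

```c
#include <assert.h>

enum { INIT = 0, ACCESS = 1, PREFETCH = 2, SUCCESS = 3 };
#define MAP_LEN 256

/* A line counts as accessed if it was demanded directly (ACCESS) or was
 * prefetched and later used (SUCCESS). Out-of-range indices are not accessed. */
static int was_accessed(const unsigned char *map, int i)
{
    return i >= 0 && i < MAP_LEN && (map[i] == ACCESS || map[i] == SUCCESS);
}

/* Return the smallest offset k > 0 such that t-k and t-2k were accessed and
 * t+k has not been touched yet, or 0 if no such offset exists.
 * The matching rule is an assumption, not taken from the slide. */
int nearest_forward_candidate(const unsigned char *map, int t)
{
    assert(t >= 0 && t < MAP_LEN);
    for (int k = 1; t + k < MAP_LEN; k++) {
        if (was_accessed(map, t - k) &&
            was_accessed(map, t - 2 * k) &&
            map[t + k] == INIT)
            return k;
    }
    return 0;
}
```

For example, with lines 10, 12, and 14 marked as accessed and the current access at line 14, the function returns 2, which corresponds to the (Addr+2) request in the figure. Because only the set of accessed lines matters, the order in which 10, 12, and 14 were touched does not change the result.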
Parallel Pattern Matching
• [Figure: a memory access pattern map with per-line states such as A I I A I A I I I A I A S I A.]
• Detects patterns from the memory access map
• Detects address correlations in parallel
• Searches for candidates effectively
AMPM Prefetch
• [Figure: the memory address space is divided into zones, some of which are hot zones. The memory access map table holds one map (P S A ... I) per hot zone. An access selects the map of its zone, and the pattern match logic issues prefetch requests.]
• The memory address space is divided into zones
• Detects hot zones
• Memory access map table
• LRU replacement
• Pattern matching
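The map table with LRU replacement can be sketched as a small fully associative structure. This is an illustrative model, not the authors' implementation: NUM_MAPS and LINES_PER_ZONE use the figures from the configuration slide in this deck, the timestamp-based LRU is one of several possible realizations, and the names zone_map_t, map_table_t, and table_lookup are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_MAPS        52   /* "Full-assoc 52 maps" on the configuration slide */
#define LINES_PER_ZONE  256  /* "256 states / map" on the configuration slide */

typedef struct {
    int      valid;
    uint64_t zone_tag;
    uint64_t last_use;                 /* timestamp used for LRU */
    uint8_t  state[LINES_PER_ZONE];    /* one byte per 2-bit state, for clarity */
} zone_map_t;

typedef struct {
    zone_map_t maps[NUM_MAPS];
    uint64_t   clock;                  /* incremented on every lookup */
} map_table_t;

/* Return the map for zone_tag. On a miss, replace an invalid entry if one
 * exists, otherwise the least recently used entry, and clear its history. */
zone_map_t *table_lookup(map_table_t *t, uint64_t zone_tag)
{
    assert(t != NULL);
    zone_map_t *victim = &t->maps[0];
    t->clock++;
    for (int i = 0; i < NUM_MAPS; i++) {
        zone_map_t *m = &t->maps[i];
        if (m->valid && m->zone_tag == zone_tag) {
            m->last_use = t->clock;    /* hit: refresh recency */
            return m;
        }
        if (!m->valid) {
            if (victim->valid)
                victim = m;            /* prefer the first free entry */
        } else if (victim->valid && m->last_use < victim->last_use) {
            victim = m;                /* otherwise track the oldest entry */
        }
    }
    memset(victim, 0, sizeof *victim); /* a new zone starts with all states Init */
    victim->valid = 1;
    victim->zone_tag = zone_tag;
    victim->last_use = t->clock;
    return victim;
}
```

With this policy, zones that keep receiving accesses stay resident, which is one way to read "detects hot zones": a zone that goes cold is eventually evicted and its history is discarded.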
Features of AMPM Prefetcher
• Pattern-matching-based prefetching
  • Map-based history
  • Optimization-friendly prefetching
• Parallel pattern matching
  • Searches for candidates effectively
  • Complexity-effective implementation
Configuration for DPC Competition
• AMPM prefetcher
  • Fully associative, 52 maps, 256 states per map
• Adaptive stream prefetcher [Hur+ 2006]
  • 16 Histograms, 8 Stream Length
• MSHR configuration
  • 16 entries for demand requests (default)
  • 32 entries for prefetch requests (additional)
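The state storage implied by this configuration is a short calculation. The sketch below counts only the 2-bit states (tags, LRU bits, and the stream prefetcher are not included) and, for the address coverage, takes the cache line size as a parameter because the slide does not give it. The function names are hypothetical.

```c
#include <assert.h>

enum { NUM_MAPS = 52, STATES_PER_MAP = 256, BITS_PER_STATE = 2 };

/* Bits needed for the state maps alone: 52 * 256 * 2. */
unsigned state_storage_bits(void)
{
    return (unsigned)(NUM_MAPS * STATES_PER_MAP * BITS_PER_STATE);
}

/* Bytes of address space tracked when every map is in use.
 * line_bytes is an input because the line size is not stated on the slide. */
unsigned long covered_bytes(unsigned long line_bytes)
{
    assert(line_bytes > 0);
    return (unsigned long)NUM_MAPS * STATES_PER_MAP * line_bytes;
}
```

This gives 26,624 bits, or 3,328 bytes, of state. If a 64-byte line is assumed, the 52 maps together cover 851,968 bytes (832 KB) of address space.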
Methodology
• Simulation environment
  • DPC framework
  • Skips the first 4000M instructions and evaluates the following 100M instructions
• Benchmark
  • SPEC CPU2006 benchmark suite
  • Compile options: "-O3 -fomit-frame-pointer -funroll-all-loops"
IPC Measurement
• Improves performance by 53%
• Improves performance in all benchmarks
L2 Cache Miss Count
• Reduces L2 cache misses by 76%
Related Work
• Sequence-based prefetching
  • Sequential Prefetch [Smith+ 1978]
  • Stride Prefetching Table [Fu+ 1992]
  • Markov Predictor [Joseph+ 1997]
  • Global History Buffer [Nesbit+ 2004]
• Adaptive prefetching
  • AC/DC [Nesbit+ 2004]
  • Feedback Directed Prefetch [Srinath+ 2007]
  • Focus Prefetching [Manikantan+ 2008]
Conclusion
• Access Map Pattern Matching Prefetch
  • Order-free prefetching
  • Optimization-friendly prefetching
  • Parallel pattern matching
  • Complexity-effective implementation
• The optimized AMPM prefetcher achieves good performance
  • Improves IPC by 53%
  • Reduces L2 cache misses by 76%
Q & A
• [Figure: timeline of prefetching research and commercial processors. Chart labels: Commercial Processors, Software, Adaptive, Spatial, Hybrid, Sequence-Base (Order Sensitive).]
• Methods shown, in chronological order
  • Buffer Block [Gindele 1977]
  • Sequential [Smith+ 1978]
  • Software Support [Mowry+ 1992]
  • Stride Prefetch [Fu+ 1992]
  • Adaptive Seq. [Dahlgren+ 1993]
  • HW/SW Integrate [Gornish+ 1994]
  • RPT [Chen+ 1995]
  • Markov Prefetch [Joseph+ 1997]
  • Hybrid [Hsu+ 1998]
  • Locality Detect [Johnson+ 1998]
  • Tag Correlation [Hu+ 2003]
  • AC/DC [Nesbit+ 2004]
  • GHB [Nesbit+ 2004]
  • Spatial Pat. [Chen+ 2004]
  • Adaptive Stream [Hur+ 2006]
  • SMS [Somogyi 2006]
  • FDP [Srinath+ 2007]
  • AMPM Prefetch [Ishii+ 2009]
  • Feedback based [Honjo 2009]
• Commercial processors shown: SuperSPARC, PA7200, R10000, Pentium 4, Power4