
Access Map Pattern Matching Prefetch: Optimization Friendly Method

Yasuo Ishii (NEC Corporation), Mary Inaba and Kei Hiraki (The University of Tokyo)


Presentation Transcript


  1. Access Map Pattern Matching Prefetch: Optimization Friendly Method
     Yasuo Ishii (NEC Corporation), Mary Inaba and Kei Hiraki (The University of Tokyo)

  2. Background
     • The speed gap between processor and memory has widened
     • Many techniques have been proposed to hide long memory latency
     • Hardware data prefetching has become increasingly important
     • Many hardware prefetchers have been proposed

  3. Conventional Methods
     • Prefetchers use
        • Instruction address
        • Memory access order
        • Memory address
     • Optimizations scramble this information
        • Out-of-order memory access
        • Loop unrolling

  4. Limitation of Stride Prefetch [Chen+95]: Out-of-Order Memory Access
     for (int i=0; i<N; i++) { load A[2*i]; ... (A) }
     [Figure: memory address space from 0xAAFF to 0xABFF at cache-line granularity; Accesses 1 to 4 of load (A) at 0xAB00, 0xAB02, 0xAB04 and 0xAB06, with out-of-order arrival marked; stride table entry: Tag A, Address 0xAB04, Stride 2, State steady]
     • Cannot detect strides
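The failure mode on this slide can be reproduced in a few lines. This is a minimal sketch of a PC-indexed stride table in the spirit of [Chen+95], not the original design: the entry fields mirror the table in the figure (tag, address, stride, state), but the two-state rule (an entry becomes `steady` after seeing the same non-zero stride twice in a row), the line-granularity addresses, and the reordered trace are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class StrideEntry:
    """One entry of a simplified, PC-indexed stride table (hypothetical layout)."""
    last_addr: int          # last address seen for this instruction
    stride: int = 0         # last observed stride
    state: str = "init"     # becomes "steady" after the same stride repeats


def run_stride_table(trace):
    """Feed (pc, line_addr) pairs to the table and return the prefetch addresses issued."""
    table = {}
    prefetches = []
    for pc, addr in trace:
        entry = table.get(pc)
        if entry is None:                      # first access by this instruction
            table[pc] = StrideEntry(last_addr=addr)
            continue
        new_stride = addr - entry.last_addr
        if new_stride != 0 and new_stride == entry.stride:
            entry.state = "steady"             # stride confirmed
        else:
            entry.state = "init"               # stride changed: retrain
            entry.stride = new_stride
        entry.last_addr = addr
        if entry.state == "steady":
            prefetches.append(addr + entry.stride)
    return prefetches


# load A[2*i] touching every other cache line, in program order
in_order = [("A", a) for a in (0xAB00, 0xAB02, 0xAB04, 0xAB06)]
# the same four accesses with two of them swapped (hypothetical out-of-order issue)
reordered = [("A", a) for a in (0xAB00, 0xAB04, 0xAB02, 0xAB06)]

print([hex(a) for a in run_stride_table(in_order)])  # ['0xab06', '0xab08']
print(run_stride_table(reordered))                   # []
```

Because the table compares each address only with the previous one from the same instruction, a single swap in arrival order resets the stride, so no prefetch is ever issued for the reordered trace.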

  5. Weaknesses of Conventional Methods
     • Out-of-order memory access
        • Scrambles the memory access order
        • The prefetcher cannot detect address correlations
     • Loop unrolling
        • Requires additional table entries
        • Each entry is trained slowly
     • An optimization-friendly prefetcher is required
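Loop unrolling hurts a PC-indexed table in a different way: each copy of the load has its own instruction address, so the table needs one entry per copy, and each entry sees only every `unroll`-th access. This sketch assumes a simplified stride rule (an entry starts prefetching once it sees the same non-zero stride twice in a row) and hypothetical PC names `A0`, `A1`, ...; it reports how many entries are used and at which access the first prefetch would fire.

```python
def first_prefetch_index(trace):
    """Return (entries used, index of the first access that would trigger a prefetch).

    Simplified PC-indexed stride table (assumption): an entry starts
    prefetching once it sees the same non-zero stride twice in a row.
    """
    table = {}      # pc -> (last_addr, last_stride)
    first = None
    for i, (pc, addr) in enumerate(trace):
        if pc not in table:
            table[pc] = (addr, 0)
            continue
        last_addr, last_stride = table[pc]
        stride = addr - last_addr
        if first is None and stride != 0 and stride == last_stride:
            first = i
        table[pc] = (addr, stride)
    return len(table), first


def loop_trace(n_iters, unroll):
    """Address trace of `load A[2*i]` with the loop body unrolled `unroll` times.

    Each copy of the load gets its own hypothetical PC: "A0", "A1", ...
    """
    return [(f"A{i % unroll}", 2 * i) for i in range(n_iters)]


print(first_prefetch_index(loop_trace(32, unroll=1)))  # (1, 2)
print(first_prefetch_index(loop_trace(32, unroll=4)))  # (4, 8)
```

With the loop unrolled four times, four entries are needed and the first prefetch arrives at the ninth access instead of the third, which is the slow training effect listed on the slide.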

  6. Access Map Pattern Matching
     • Pattern matching
        • Order-free prefetching
        • Optimization-friendly prefetching
     • Access map
        • Map-based history
        • 2-bit state map
        • Each state is attached to a cache block

  7. State Diagram for Each Cache Block
     [Figure: Init goes to Access on a demand access and to Prefetch when a prefetch is issued; Prefetch goes to Success on a demand access]
     • Init: initialized state
     • Access: already accessed
     • Prefetch: prefetch request issued
     • Success: prefetched data accessed
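The four states and the transitions in the diagram can be written down as a small state machine. This sketch encodes only the three transitions shown (Init to Access, Init to Prefetch, Prefetch to Success); the numeric 2-bit encoding and the choice to leave every other state unchanged are assumptions, not details taken from the slides.

```python
from enum import Enum


class BlockState(Enum):
    """Per-cache-block state in the access map; the 2-bit values are assumed."""
    INIT = 0      # initialized, not touched yet
    ACCESS = 1    # already accessed by a demand request
    PREFETCH = 2  # a prefetch request has been issued
    SUCCESS = 3   # prefetched data was later accessed


def on_demand_access(state):
    """Transition when the core accesses the block."""
    if state is BlockState.INIT:
        return BlockState.ACCESS
    if state is BlockState.PREFETCH:
        return BlockState.SUCCESS
    return state  # ACCESS and SUCCESS are left unchanged (assumption)


def on_prefetch_issued(state):
    """Transition when the prefetcher issues a request for the block."""
    if state is BlockState.INIT:
        return BlockState.PREFETCH
    return state  # only untouched blocks change state (assumption)


s = on_prefetch_issued(BlockState.INIT)
s = on_demand_access(s)
print(s.name)  # SUCCESS
```

Four states fit exactly in the 2 bits per cache block mentioned on the previous slide.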

  8. Memory Access Pattern Map
     [Figure: one zone-sized region of the memory address space mapped onto the memory access pattern map, one state (A, S, P, I, ...) per cache line, feeding the pattern match logic]
     • Corresponds to the memory address space
     • Cache-line granularity
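Locating a block's state in the map amounts to splitting the address into a zone tag and a cache-line index within the zone. This sketch assumes 64-byte cache lines and 256 lines per zone (matching the 256 states per map in the DPC configuration), which gives 16 KiB zones; both constants are assumptions for illustration.

```python
LINE_BYTES = 64        # assumed cache-line size in bytes
LINES_PER_ZONE = 256   # one state per line; 256 states per map as in the DPC setup


def split_address(addr):
    """Split a byte address into (zone tag, cache-line index within the zone)."""
    line = addr // LINE_BYTES
    return divmod(line, LINES_PER_ZONE)


zone, offset = split_address(0x12345678)
print(hex(zone), offset)  # 0x48d1 89
```

The zone tag selects a map and the offset selects one 2-bit state inside it, so consecutive cache lines land in adjacent map positions.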

  9. Pattern Matching Logic
     [Figure: pipeline of Access Map Shifter, Pattern Detector, Pipeline Register, Priority Encoder & Adder, and Prefetch Selector, with a feedback path; the memory access pattern map is shifted so that the accessed address (Addr) is aligned, the detector marks candidate offsets (+1, +2, +3, ...), and the priority encoder and adder emit a prefetch request such as Addr+2]

  10. Parallel Pattern Matching
     [Figure: example memory access pattern map]
     • Detects patterns from the memory access map
     • Detects address correlations in parallel
     • Searches prefetch candidates effectively
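The pattern detection can be sketched in software as a loop over candidate distances; the hardware evaluates all distances at once, but each check is independent, so the result is the same. The matching rule here is an assumption drawn from common descriptions of AMPM rather than spelled out on these slides: prefetch `addr + k` when the lines at `addr - k` and `addr - 2k` have both been accessed (state `A` or `S`) and the target is still `I`, and symmetrically `addr - k` for descending patterns. Smaller distances are reported first, standing in for the priority encoder; `build_map` is a hypothetical helper.

```python
def match_candidates(access_map, addr, max_dist=None):
    """Return prefetch candidates for a demand access to line `addr` of one zone map.

    access_map holds one state per line: 'I' (init), 'A' (accessed),
    'P' (prefetch issued) or 'S' (prefetch success).
    Assumed rule: prefetch addr+k if lines addr-k and addr-2k were both
    accessed, and addr-k if lines addr+k and addr+2k were, as long as the
    target line is still 'I'. Each distance k is checked independently,
    as parallel hardware would; smaller k comes first, like a priority encoder.
    """
    n = len(access_map)
    if max_dist is None:
        max_dist = n // 2

    def accessed(i):
        return 0 <= i < n and access_map[i] in ("A", "S")

    def untouched(i):
        return 0 <= i < n and access_map[i] == "I"

    candidates = []
    for k in range(1, max_dist + 1):
        if accessed(addr - k) and accessed(addr - 2 * k) and untouched(addr + k):
            candidates.append(addr + k)   # ascending pattern
        if accessed(addr + k) and accessed(addr + 2 * k) and untouched(addr - k):
            candidates.append(addr - k)   # descending pattern
    return candidates


def build_map(accessed_lines, size=12):
    """Build a zone map (as a string) with the given lines marked 'A'."""
    states = ["I"] * size
    for line in accessed_lines:
        states[line] = "A"
    return "".join(states)


in_order = build_map([0, 2, 4, 6])
reordered = build_map([0, 4, 2, 6])   # hypothetical out-of-order arrival
print(match_candidates(in_order, 6), match_candidates(reordered, 6))  # [8] [8]
```

Both traces produce the same map and therefore the same candidate (Addr+2): the map records which lines were touched, not the order in which they were touched, which is what makes the method order-free.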

  11. AMPM Prefetch
     [Figure: the memory address space divided into zones; hot zones have entries in the memory access map table, and the map of the accessed zone feeds the pattern match logic, which issues prefetch requests]
     • The memory address space is divided into zones
     • Detects hot zones
     • Memory access map table
        • LRU replacement
     • Pattern matching
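The memory access map table can be modeled as a fully associative, LRU-managed dictionary from zone tag to map, so zones that keep being touched (hot zones) stay resident. In this sketch `OrderedDict` provides the LRU order; the 52-entry size and 256 states per map come from the DPC configuration, while the 64-byte line size, the class and method names, and the single-character states are assumptions.

```python
from collections import OrderedDict

LINE_BYTES = 64        # assumed cache-line size in bytes
LINES_PER_ZONE = 256   # states per map (DPC configuration)
NUM_MAPS = 52          # fully associative table size (DPC configuration)


class AccessMapTable:
    """Fully associative table of per-zone access maps with LRU replacement (sketch)."""

    def __init__(self, num_maps=NUM_MAPS):
        self.num_maps = num_maps
        self.maps = OrderedDict()  # zone tag -> list of per-line states, LRU first

    def lookup(self, addr):
        """Return (zone map, line offset) for addr, allocating a map on a miss."""
        zone, offset = divmod(addr // LINE_BYTES, LINES_PER_ZONE)
        if zone in self.maps:
            self.maps.move_to_end(zone)          # now the most recently used zone
        else:
            if len(self.maps) >= self.num_maps:
                self.maps.popitem(last=False)    # evict the least recently used zone
            self.maps[zone] = ["I"] * LINES_PER_ZONE
        return self.maps[zone], offset

    def record_access(self, addr):
        """Mark a demand access: Init -> Access, Prefetch -> Success."""
        zone_map, offset = self.lookup(addr)
        if zone_map[offset] == "I":
            zone_map[offset] = "A"
        elif zone_map[offset] == "P":
            zone_map[offset] = "S"
        return zone_map, offset


zone_bytes = LINE_BYTES * LINES_PER_ZONE
table = AccessMapTable(num_maps=2)
for addr in (0, zone_bytes, LINE_BYTES, 2 * zone_bytes):  # zones 0, 1, 0, 2
    table.record_access(addr)
print(list(table.maps))  # [0, 2]
```

With a two-entry table, touching zones 0, 1, 0, 2 evicts zone 1, the least recently used one, while zone 0 survives because it was touched again.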

  12. Features of AMPM Prefetcher
     • Pattern-matching-based prefetching
        • Map-based history
        • Optimization-friendly prefetching
     • Parallel pattern matching
        • Searches candidates effectively
        • Complexity-effective implementation

  13. Configuration for DPC Competition
     • AMPM prefetcher
        • Fully associative, 52 maps, 256 states per map
     • Adaptive Stream Prefetcher [Hur+ 2006]
        • 16 histograms, 8 stream length
     • MSHR configuration
        • 16 entries for demand requests (default)
        • 32 entries for prefetch requests (additional)
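As a rough check on the cost of this configuration, the state bits of the access map table alone follow directly from the slide's numbers. This sketch counts only the 2-bit states; tags, LRU bits, the stream prefetcher and the extra MSHR entries are left out, since their sizes are not recoverable from this transcript.

```python
NUM_MAPS = 52         # fully associative access maps (DPC configuration)
STATES_PER_MAP = 256  # one state per cache line in a zone
BITS_PER_STATE = 2    # Init / Access / Prefetch / Success

state_bits = NUM_MAPS * STATES_PER_MAP * BITS_PER_STATE
state_bytes = state_bits // 8
print(state_bits, state_bytes)  # 26624 3328
```

So the map states by themselves take about 3.25 KiB.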

  14. Budget Count

  15. Methodology
     • Simulation environment
        • DPC framework
        • Skips the first 4000M instructions and evaluates the following 100M instructions
     • Benchmark
        • SPEC CPU2006 benchmark suite
        • Compile options: "-O3 -fomit-frame-pointer -funroll-all-loops"

  16. IPC Measurement
     • Improves performance by 53%
     • Improves performance in all benchmarks

  17. L2 Cache Miss Count
     • Reduces L2 cache misses by 76%

  18. Related Work
     • Sequence-based prefetching
        • Sequential Prefetch [Smith+ 1978]
        • Stride Prefetching Table [Fu+ 1992]
        • Markov Predictor [Joseph+ 1997]
        • Global History Buffer [Nesbit+ 2004]
     • Adaptive prefetching
        • AC/DC [Nesbit+ 2004]
        • Feedback Directed Prefetch [Srinath+ 2007]
        • Focus Prefetching [Manikantan+ 2008]

  19. Conclusion
     • Access Map Pattern Matching Prefetch
        • Order-free prefetching
        • Optimization-friendly prefetching
        • Parallel pattern matching
        • Complexity-effective implementation
     • Optimized AMPM achieves good performance
        • Improves IPC by 53%
        • Reduces L2 cache misses by 76%

  20. Q & A
     [Backup figure: timeline of prefetching research, grouping sequence-based (order-sensitive), adaptive, software, and hybrid methods alongside commercial processors]
     • Buffer Block [Gindele 1977]
     • Sequential [Smith+ 1978]
     • Software Support [Mowry+ 1992]
     • Stride Prefetch [Fu+ 1992]
     • Adaptive Sequential [Dahlgren+ 1993]
     • HW/SW Integrated [Gornish+ 1994]
     • Spatial RPT [Chen+ 1995]
     • Markov Prefetch [Joseph+ 1997]
     • Hybrid [Hsu+ 1998]
     • Locality Detection [Johnson+ 1998]
     • Tag Correlation [Hu+ 2003]
     • AC/DC [Nesbit+ 2004], GHB [Nesbit+ 2004], Spatial Pattern [Chen+ 2004]
     • Adaptive Stream [Hur+ 2006], SMS [Somogyi 2006]
     • FDP [Srinath+ 2007]
     • AMPM Prefetch [Ishii+ 2009], Feedback-based [Honjo 2009]
     • Commercial processors: SuperSPARC, PA7200, R10000, Pentium 4, Power4
