2011 International Symposium on Performance Analysis of Systems and Software (ISPASS)
Characterization and Dynamic Mitigation of Intra-Application Cache Interference
Carole-Jean Wu and Margaret Martonosi
Princeton University
4/11/2011
Today’s CMP systems
[Figure: a four-core CMP. Each SMT CPU core (Core 0 to Core 3) has private L1I$, L1D$, and L2$ caches; all cores share an 8MB L3 cache and connect to a memory controller, a communication bridge, and IO & QPI links. The operating system runs several applications (App. 1 to App. 4) across the cores.]
Within a single application, cache interference can stem from…
[Figure: a four-core CMP (private L1I$, L1D$, and L2$ per SMT core; shared 8MB L3 cache) running a single application, App. 1. Four request streams compete for the shared L3 cache:]
• App. data ld/st
• HW prefetch requests
• TLB miss handling
• Other OS requests
Real-System LLC Miss Characterization
• More than 50% of LLC misses are due to prefetching, TLB miss handling, other OS refs, etc.
Prior Work for Intra-Application Cache Interference
• System-induced Cache Interference
  • Characterization indicates significant OS/user cache interference [Agarwal et al. TOC ’88] [Torrellas et al. ASPLOS ’92]
  • Reduce TLB miss handling effects [Jacob, Mudge ASPLOS ’98] [Bhargava et al. ASPLOS ’08] [Barr, Cox, and Rixner ISCA ’10]
• Prefetch-induced Cache Interference
  • Prefetch buffer/filter [Peir et al. ICS ’02] [Hur and Lin MICRO ’06]
  • Replacement policies (prefetch bit per cache line) [Alameldeen and Wood ISCA ’07] [Lin et al. HPCA ’01]
  • Prefetching algorithms [Ebrahimi et al. MICRO ’09] [Nesbit et al. ISCA ’07] [Iacobovici et al. ICS ’04]
• But all require hardware modification
Contributions of This Paper
• Cache interference within an application is a problem
  • Real-system characterization
  • Detailed full-system simulation
• Dynamic management mechanisms
  • System-aware cache management
  • Real-system, real-time prefetch manager
Talk Outline
• Motivation and Prior Work
• Measurement Methodology
• Intra-Application Interference Characterization
• Dynamic Mitigation of LLC Interference
  • System-Aware Cache Management
  • Real-System Dynamic Prefetch Manager
• Conclusion
Measurement Methodology
• Real-system infrastructure
  • Intel Nehalem-based Core i7 (Bloomfield)
  • perfmon2 to access hardware PMCs
• Full-system simulation: Simics/GEMS
• Benchmarks
  • SPEC CPU2006 benchmark suite
System-Mode Reference Breakdown
• 80% of system references are due to TLB miss handling (details in the paper).
Memory Reuse Characteristics Analysis for User References
[Figure: memory reuse characteristics of user references, with data labeled User and System.]
• System cache lines destroy the good data locality of user lines when they share the cache!
Memory Reuse Characteristics Analysis for System References
[Figure: memory reuse characteristics of system references, with data labeled User and System.]
• The majority of system cache lines are not reused.
• Bypass system cache lines?
System-Aware Cache Management
[Figure: a cache set drawn as an LRU stack running from MRU through a MID position to LRU, holding lines such as 0XDADA, 0XEEAF, 0X1234, and 0XDFAE; incoming references (e.g. 0xEEEA, 0xBEEF) are tagged as user or system and inserted into the stack.]
• Baseline LRU: every incoming reference is inserted at the MRU position.
• SYS-LRUinsert: user references are inserted at MRU; system references are inserted at LRU.
• SYS-MIDinsert: user references are inserted at MRU; system references are inserted at MID.
• SYS-DYNAMIC: the insertion position for system references is chosen at run time.
  * Set sampling: DIP [Qureshi et al. ISCA ’07]
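The insertion policies on these slides can be illustrated with a small software model. This is a minimal sketch, not the authors' simulator code: the class name `SystemAwareSet`, the `system_insert` parameter, the toy trace, and the choice to promote every hit to MRU are all assumptions made for illustration. SYS-DYNAMIC's set sampling is not modeled.

```python
class SystemAwareSet:
    """One cache set as a recency stack: index 0 is MRU, the last index is LRU.

    Hypothetical model for illustration only.
    """

    def __init__(self, ways, system_insert="lru"):
        self.ways = ways
        self.system_insert = system_insert  # "mru" (baseline), "mid", or "lru"
        self.stack = []

    def access(self, tag, is_system):
        """Return True on a hit, False on a miss."""
        if tag in self.stack:
            # Assumption: a hit promotes the line to MRU for both kinds of reference.
            self.stack.remove(tag)
            self.stack.insert(0, tag)
            return True
        if len(self.stack) == self.ways:
            self.stack.pop()  # evict the line in the LRU position
        if not is_system or self.system_insert == "mru":
            pos = 0                      # user lines always enter at MRU
        elif self.system_insert == "mid":
            pos = len(self.stack) // 2   # SYS-MIDinsert
        else:
            pos = len(self.stack)        # SYS-LRUinsert
        self.stack.insert(pos, tag)
        return False


def count_hits(system_insert):
    """Replay a toy trace: three user lines, two system lines, then the user lines again."""
    trace = [("A", False), ("B", False), ("C", False),
             ("S1", True), ("S2", True),
             ("A", False), ("B", False), ("C", False)]
    cache_set = SystemAwareSet(ways=4, system_insert=system_insert)
    return sum(cache_set.access(tag, is_system) for tag, is_system in trace)


if __name__ == "__main__":
    for policy in ("mru", "mid", "lru"):
        print(policy, count_hits(policy))
```

On this toy trace, inserting system lines at MRU pushes the user lines out before they are reused, while inserting them at LRU keeps all three user lines resident. The trace is contrived to show the effect and says nothing about the magnitude seen on real workloads.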
IPC Performance Improvement
• SYS-DYNAMIC improves performance for all applications, by as much as 10% (3% on average).
Talk Outline
• Motivation and Prior Work
• Measurement Methodology
• Intra-Application Interference Characterization
• Dynamic Mitigation of LLC Interference
  • System-Aware Cache Management
  • Real-System Dynamic Prefetch Manager
• Conclusion
Intra-application cache interference can also stem from hardware prefetching
[Figure: a four-core CMP (private L1I$, L1D$, and L2$ per SMT core; shared 8MB L3 cache; memory controller, communication bridge, and IO & QPI links) with its hardware prefetchers marked:]
• L1 Instruction & Streamer Prefetchers
• Mid-Level Cache (MLC) Spatial & Streamer Prefetchers
Intra-Application Interference Caused by Hardware Prefetching
[Figure: LLC misses with the MLC prefetcher OFF.]
• Turning the MLC prefetcher OFF yields fewer LLC misses for libquantum and sphinx3.
Dynamic Prefetch Management
• Use Nehalem’s Precise Event Based Sampling (PEBS).
• Sample the application instruction count periodically.
[Figure: timeline of intervals of K instructions each (N marked on the time axis). The MLC prefetchers are toggled ON, OFF, ON across the intervals, and RDTSC is read at the boundaries t0, t1, t2.]
    if (t2 - t1 > t1 - t0)
        Turn ON MLC prefetchers;
    else
        Turn OFF MLC prefetchers;
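The decision rule on this slide can be sketched in software as follows. This is a hedged illustration, not the authors' implementation: the real manager relies on PEBS sampling and RDTSC on Nehalem, whereas here a hypothetical callback `cycles_for_k_insts(prefetch_on)` stands in for "run K instructions and report the elapsed cycles". The sketch also assumes the first interval of each round runs with the MLC prefetchers ON and the second with them OFF, as the timeline suggests.

```python
def choose_prefetchers_on(t0, t1, t2):
    """Apply the slide's rule to three timestamps.

    Assumption: the prefetchers were ON during [t0, t1] and OFF during [t1, t2],
    and both intervals cover the same number of instructions (K).
    Returns True to turn the MLC prefetchers ON, False to turn them OFF.
    """
    return (t2 - t1) > (t1 - t0)


def manage(cycles_for_k_insts, num_rounds):
    """Run num_rounds ON/OFF trial pairs and record the decision after each pair.

    cycles_for_k_insts is a hypothetical stand-in for executing K instructions
    on real hardware; it takes the prefetcher state and returns elapsed cycles.
    """
    clock = 0
    decisions = []
    for _ in range(num_rounds):
        t0 = clock
        clock += cycles_for_k_insts(True)    # K instructions with prefetchers ON
        t1 = clock
        clock += cycles_for_k_insts(False)   # K instructions with prefetchers OFF
        t2 = clock
        decisions.append(choose_prefetchers_on(t0, t1, t2))
    return decisions


if __name__ == "__main__":
    # Toy workload in which prefetching helps: the OFF interval takes longer.
    print(manage(lambda on: 100 if on else 150, 2))
    # Toy workload in which prefetching hurts: the ON interval takes longer.
    print(manage(lambda on: 120 if on else 90, 2))
```

Comparing elapsed cycles for equal instruction counts means the manager needs no cache-miss counters to decide; it only needs a timestamp at each interval boundary. Note that a tie keeps the prefetchers OFF, because the slide's comparison is strict.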
Dynamic Management Mitigating Prefetch-Induced LLC Interference
• Dynamic modulation of the MLC prefetchers outperforms the static ON/OFF prefetch options.
Summary
• Dynamic System-Aware Cache Management
  • Full-system evaluation (OS effects)
  • Performance improvement by as much as 10% (3% on average)
• Real-time Dynamic Prefetch Manager
  • Real-system implementation on Nehalem PEBS
  • 25% LLC miss count reduction: better performance, plus bandwidth and energy savings
Characterization and Dynamic Mitigation of Intra-Application Cache Interference
[Figure: a four-core CMP (private L1I$, L1D$, and L2$ per SMT core; shared 8MB L3 cache) running App. 1, with its request streams to the shared L3 cache: App. data ld/st, HW prefetch requests, TLB miss handling, and other OS requests.]
• Intra-application cache interference from modern hardware prefetching and the OS influences application performance significantly!
2011 International Symposium on Performance Analysis of Systems and Software (ISPASS)
Characterization and Dynamic Mitigation of Intra-Application Cache Interference
Carole-Jean Wu and Margaret Martonosi
{carolewu, mrm}@princeton.edu