Prefetch-Aware Shared-Resource Management for Multi-Core Systems

Prefetch-Aware Shared-Resource Management for Multi-Core Systems
Eiman Ebrahimi*, Chang Joo Lee*+, Onur Mutlu‡, Yale N. Patt*
* HPS Research Group, The University of Texas at Austin
‡ Computer Architecture Laboratory, Carnegie Mellon University
+ Intel Corporation, Austin

Background and Problem
[Figure: multi-core system. On-chip: Core 0 through Core N, each with its own prefetcher, share the memory resources (shared cache and memory controller). Off-chip, across the chip boundary: DRAM Bank 0 through DRAM Bank K.]

Background and Problem
• Understand the impact of prefetching on previously proposed shared resource management techniques
  • Fair cache management techniques
  • Fair memory controllers
    • Network Fair Queuing (Nesbit et al., MICRO’06)
    • Parallelism-Aware Batch Scheduling (Mutlu et al., ISCA’08)
  • Fair management of on-chip interconnect
  • Fair management of multiple shared resources
    • Fairness via Source Throttling (Ebrahimi et al., ASPLOS’10)

Background and Problem
• Fair memory scheduling technique: Network Fair Queuing (NFQ)
  • Improves fairness and performance with no prefetching
  • Significant degradation of performance and fairness in the presence of prefetching
[Figure: NFQ results; panels: No Prefetching, Aggressive Stream Prefetching]

Background and Problem
• Understand the impact of prefetching on previously proposed shared resource management techniques
  • Fair cache management techniques
  • Fair memory controllers
  • Fair management of on-chip interconnect
  • Fair management of multiple shared resources
• Goal: Devise general mechanisms for taking prefetch requests into account in fairness techniques

Background and Problem
• Prior work addresses inter-application interference caused by prefetches
  • Hierarchical Prefetcher Aggressiveness Control (Ebrahimi et al., MICRO’09)
    • Dynamically detects interference caused by prefetches and throttles down overly aggressive prefetchers
• Even with controlled prefetching, fairness techniques should be made prefetch-aware

Outline
• Problem Statement
• Motivation for Special Treatment of Prefetches
• Prefetch-Aware Shared Resource Management
• Evaluation
• Conclusion

Parallelism-Aware Batch Scheduling (PAR-BS) [Mutlu & Moscibroda, ISCA’08]
• Principle 1: Parallelism-awareness
  • Schedules requests from each thread to different banks back to back
  • Preserves each thread’s bank parallelism
• Principle 2: Request batching
  • Marks a fixed number of oldest requests from each thread to form a “batch”
  • Eliminates starvation and provides fairness
[Figure: request queues for Bank 0 and Bank 1 holding requests from threads T0 to T3, with the oldest requests grouped into a batch]
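The batching principle can be illustrated with a minimal sketch. It assumes a single request queue, a per-thread cap called `marking_cap`, and oldest-first ordering inside and outside the batch; the names `Request`, `form_batch`, and `service_order` are hypothetical, and Principle 1 (bank parallelism) is not modeled here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    thread: int
    bank: int
    arrival: int          # smaller value = older request
    marked: bool = False  # True once the request belongs to the batch


def form_batch(queue, marking_cap):
    """Mark the `marking_cap` oldest requests of each thread (assumed cap)."""
    per_thread = defaultdict(list)
    for req in queue:
        per_thread[req.thread].append(req)
    for reqs in per_thread.values():
        for req in sorted(reqs, key=lambda r: r.arrival)[:marking_cap]:
            req.marked = True


def service_order(queue):
    """Serve marked (batched) requests first, oldest first within each group."""
    return sorted(queue, key=lambda r: (not r.marked, r.arrival))


# Thread 0 floods the queue; thread 1 arrives last with a single request.
queue = [Request(thread=0, bank=0, arrival=t) for t in range(4)]
queue.append(Request(thread=1, bank=0, arrival=4))
form_batch(queue, marking_cap=2)
order = [(r.thread, r.arrival) for r in service_order(queue)]
print(order)
```

In this toy run, thread 1's only request is served ahead of thread 0's unmarked requests, which is the starvation-avoidance effect the slide attributes to batching.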

Impact of Prefetching on Parallelism-Aware Batch Scheduling
• Policy (a): Include prefetches and demands alike when generating a batch
• Policy (b): Do not include prefetches alongside demands when generating a batch

Impact of Prefetching on Parallelism-Aware Batch Scheduling
[Figure: DRAM service order at Bank 1 and Bank 2 (D = demand, P = prefetch, numbered by core) and the resulting compute/stall timelines of Core 1 and Core 2. Legend: Accurate Prefetch, Inaccurate Prefetch.
Policy (a), Mark Prefetches in PAR-BS. Annotations: Hit P2, Saved Cycles.
Policy (b), Don’t Mark Prefetches in PAR-BS. Annotations: Accurate Prefetches Too Late, Miss, Saved Cycles.]

Impact of Prefetching on Parallelism-Aware Batch Scheduling
• Policy (a): Include prefetches and demands alike when generating a batch
  • Pros: Accurate prefetches will be more timely
  • Cons: Inaccurate prefetches from one thread can unfairly delay demands and accurate prefetches of others
• Policy (b): Do not include prefetches alongside demands when generating a batch
  • Pros: Inaccurate prefetches cannot unfairly delay demands of other cores
  • Cons: Accurate prefetches will be less timely
    • Less performance benefit from prefetching

Prefetch-Aware Shared Resource Management
• Three key ideas:
  • Fair memory controllers: Extend underlying prioritization policies to distinguish between prefetches based on prefetch accuracy
  • Fairness via source-throttling technique: Coordinate core and prefetcher throttling decisions
  • Demand boosting for memory non-intensive applications

Prefetch-aware PARBS (P-PARBS)
[Figure: DRAM service order at Bank 1 and Bank 2 and compute/stall timelines of Core 1 and Core 2 under Policy (a), Mark Prefetches in PAR-BS. Legend: Accurate Prefetch, Inaccurate Prefetch. Annotation: Hit P2.]

Prefetch-aware PARBS (P-PARBS)
[Figure: DRAM service order at Bank 1 and Bank 2 and compute/stall timelines of Core 1 and Core 2 under two policies. Legend: Accurate Prefetch, Inaccurate Prefetch.
Policy (b), Don’t Mark Prefetches in PAR-BS. Annotations: Accurate Prefetches Too Late, Miss.
Our Policy, Mark Accurate Prefetches. Annotations: Hit P2, Saved Cycles.]
• Underlying prioritization policies need to distinguish between prefetches based on accuracy
• Our policy: Mark accurate prefetches
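The three marking policies discussed so far can be compared side by side in a small sketch. It assumes that each core has a measured prefetch accuracy (for example, useful prefetches divided by prefetches sent) and that a fixed threshold separates accurate from inaccurate prefetchers; the threshold value `0.8`, the policy names, and the helpers `should_mark` and `batch` are hypothetical and not taken from the slides. The per-thread marking cap is left out to keep the comparison short.

```python
from dataclasses import dataclass


@dataclass
class Request:
    core: int
    is_prefetch: bool
    arrival: int


def should_mark(req, policy, accuracy, threshold=0.8):
    """Decide whether a request joins the batch under a given policy."""
    if not req.is_prefetch:
        return True                       # demands are always batched
    if policy == "mark_all":              # Policy (a): prefetches treated like demands
        return True
    if policy == "demands_only":          # Policy (b): prefetches never marked
        return False
    if policy == "mark_accurate":         # accuracy-based marking
        return accuracy[req.core] >= threshold  # assumed threshold test
    raise ValueError(f"unknown policy: {policy}")


def batch(queue, policy, accuracy):
    """Return (core, kind) labels of the requests that would be marked."""
    return [(r.core, "P" if r.is_prefetch else "D")
            for r in queue if should_mark(r, policy, accuracy)]


# Assumed accuracies: core 1's prefetcher is inaccurate, core 2's is accurate.
accuracy = {1: 0.10, 2: 0.95}
queue = [
    Request(core=1, is_prefetch=False, arrival=0),
    Request(core=1, is_prefetch=True, arrival=1),
    Request(core=2, is_prefetch=False, arrival=2),
    Request(core=2, is_prefetch=True, arrival=3),
]
for policy in ("mark_all", "demands_only", "mark_accurate"):
    print(policy, batch(queue, policy, accuracy))
```

Under the accuracy-based policy, core 2's prefetch is batched with the demands while core 1's inaccurate prefetch is not, so it keeps the timeliness benefit of Policy (a) without letting inaccurate prefetches delay other cores.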

[Figure: service order at Bank 1 and Bank 2 in two panels, No Demand Boosting and With Demand Boosting. Core 1 is memory non-intensive; Core 2 is memory intensive. Legend: Core 1 demands, Core 2 demands, Core 2 prefetches. Core 1's demand is annotated Serviced Last and Serviced First in the respective panels.]
• Demand boosting eliminates starvation of memory non-intensive applications
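A minimal sketch of the idea, assuming the set of memory non-intensive cores is already known and using plain oldest-first ordering as a stand-in for the underlying scheduler. The names `Request`, `service_order`, and `non_intensive_cores` are hypothetical, and how a core is classified as non-intensive is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Request:
    core: int
    is_prefetch: bool
    arrival: int


def service_order(queue, non_intensive_cores, boosting):
    """Order requests; boosted demands go first, then oldest first."""
    def key(req):
        boosted = (boosting
                   and not req.is_prefetch
                   and req.core in non_intensive_cores)
        return (not boosted, req.arrival)
    return sorted(queue, key=key)


# Core 2 (memory intensive) fills the queue with demands and prefetches;
# core 1 (memory non-intensive) issues a single demand afterwards.
queue = [Request(core=2, is_prefetch=(t % 2 == 1), arrival=t) for t in range(6)]
queue.append(Request(core=1, is_prefetch=False, arrival=6))

without = [r.core for r in service_order(queue, {1}, boosting=False)]
with_boost = [r.core for r in service_order(queue, {1}, boosting=True)]
print(without)     # core 1's demand is at the end of the order
print(with_boost)  # core 1's demand is at the front of the order
```

The only change between the two orders is the position of core 1's demand, which moves from last to first once boosting is enabled.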

Evaluation Methodology
• x86 cycle-accurate simulator
• Baseline processor configuration
  • Per-core
    • 4-wide issue, out-of-order, 256-entry ROB
  • Shared (4-core system)
    • 128 MSHRs
    • 2MB, 16-way L2 cache
  • Main memory
    • DDR3 1333 MHz
    • Latency of 15ns per command (tRP, tRCD, CL)
    • 8B-wide core-to-memory bus
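As a worked example of what the 15ns-per-command figure implies, the sketch below adds up the commands needed for the three usual DRAM row-buffer cases. It is a back-of-the-envelope estimate that assumes an open-row model and ignores data burst transfer time and queueing delay; the function name `access_latency_ns` and the row-state labels are hypothetical.

```python
# Timing parameters from the baseline configuration, in nanoseconds.
T_RP = 15.0   # precharge: close the currently open row
T_RCD = 15.0  # activate: open the requested row
CL = 15.0     # column access: read from the open row


def access_latency_ns(row_state):
    """Approximate command latency for one DRAM read, by row-buffer state."""
    if row_state == "hit":        # requested row is already open
        return CL
    if row_state == "closed":     # no row is open in the bank
        return T_RCD + CL
    if row_state == "conflict":   # a different row is open
        return T_RP + T_RCD + CL
    raise ValueError(f"unknown row state: {row_state}")


for state in ("hit", "closed", "conflict"):
    print(state, access_latency_ns(state))
```

Under these assumptions, a row conflict costs three times as much as a row hit, which is one reason the order in which a memory controller services requests matters.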

System Performance Results
[Figure: system performance results; annotated values: 11%, 10.9%, 11.3%]

Max Slowdown Results
[Figure: max slowdown results; annotated values: 9.9%, 14.5%, 18.4%]

Conclusion
• State-of-the-art fair shared resource management techniques can be harmful in the presence of prefetching
• Their underlying prioritization techniques need to be extended to differentiate prefetches based on accuracy
• Core and prefetcher throttling should be coordinated with source-based resource management techniques
• Demand boosting eliminates starvation of memory non-intensive applications
• Our mechanisms improve both fair memory schedulers and source throttling in both system performance and fairness by >10%
