Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture
Seongbeom Kim, Dhruba Chandra, and Yan Solihin
Dept. of Electrical and Computer Engineering
North Carolina State University
{skim16, dchandr, solihin}@ncsu.edu
Cache Sharing in CMP
[Figure: two processor cores, each with a private L1 cache, share a single L2 cache; threads t1 and t2 run on cores 1 and 2 and compete for L2 space.]
t2's throughput is significantly reduced due to unfair cache sharing.
Shared L2 cache space contention
[Figure: how co-scheduled threads contend for shared L2 cache space.]
Impact of unfair cache sharing
[Figure: uniprocessor scheduling gives time slices t1, t2, t3, t1, t4; on a 2-core CMP, P1 runs t1 in every slice while P2 runs t3, t3, t2, t2, t4.]
• Uniprocessor scheduling
• 2-core CMP scheduling
• Problems of unfair cache sharing:
  • Sub-optimal throughput
  • Thread starvation
  • Priority inversion
  • Thread-mix dependent throughput
• Fairness: uniform slowdown for co-scheduled threads
Contributions
• Cache fairness metrics
  • Easy to measure
  • Approximate uniform slowdown well
• Fair caching algorithms
  • Static/dynamic cache partitioning
  • Optimizing fairness
  • Simple hardware modifications
• Simulation results
  • Fairness: 4x improvement
  • Throughput: 15% improvement, comparable to the cache miss minimization approach
Related Work
• Cache miss minimization in CMP:
  • G. Suh, S. Devadas, L. Rudolph, HPCA 2002
• Balancing throughput and fairness in SMT:
  • K. Luo, J. Gummaraju, M. Franklin, ISPASS 2001
  • A. Snavely and D. Tullsen, ASPLOS 2000
• …
Outline
• Fairness Metrics
• Static Fair Caching Algorithms (See Paper)
• Dynamic Fair Caching Algorithms
• Evaluation Environment
• Evaluation
• Conclusions
Fairness Metrics
• Uniform slowdown: co-scheduled threads should be slowed down by the same factor, i.e.
  T_shared(i) / T_alone(i) = T_shared(j) / T_alone(j)
  where T_alone(i) is the execution time of ti when it runs alone, and T_shared(i) is the execution time of ti when it shares the cache with others.
• We want to minimize: M0 = Σ_{i,j} |X_i − X_j|, where X_i = T_shared(i) / T_alone(i)
• Ideally: X_i = X_j for all co-scheduled threads, so M0 = 0. Since execution-time ratios are hard to measure online, miss-based metrics (e.g., M1, M3) are used to approximate M0.
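To make the metrics concrete, here is a minimal sketch (mine, not from the paper's artifact) of computing the pairwise-difference fairness metric from per-thread measurements. The function name fairnessMetric and the sample values are illustrative; the M3 ratios reuse the numbers from the worked example later in the talk.

```cpp
// Minimal sketch: the fairness metric sums |X_i - X_j| over all thread pairs,
// where X_i is a per-thread slowdown proxy (execution-time ratio for M0,
// miss-rate ratio for M3). A value of 0 means perfectly uniform slowdown.
#include <cmath>
#include <cstdio>
#include <vector>

double fairnessMetric(const std::vector<double>& X) {
    double m = 0.0;
    for (size_t i = 0; i < X.size(); ++i)
        for (size_t j = i + 1; j < X.size(); ++j)
            m += std::fabs(X[i] - X[j]);
    return m;
}

int main() {
    // M0 uses execution-time ratios (ideal, but hard to measure online).
    std::vector<double> x_m0 = {1.2, 1.2};                  // T_shared / T_alone per thread
    // M3 uses miss-rate ratios (measurable with per-thread miss counters).
    std::vector<double> x_m3 = {20.0 / 20.0, 15.0 / 5.0};   // MissRate_shared / MissRate_alone
    std::printf("M0 = %.2f, M3 = %.2f\n", fairnessMetric(x_m0), fairnessMetric(x_m3));
    return 0;
}
```

The same pairwise form works for any of the metric variants; only the definition of X_i changes.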
Outline
• Fairness Metrics
• Static Fair Caching Algorithms (See Paper)
• Dynamic Fair Caching Algorithms
• Evaluation Environment
• Evaluation
• Conclusions
Partitionable Cache Hardware
• Modified LRU cache replacement policy (G. Suh et al., HPCA 2002)
• Example: current partition P1: 448B, P2: 576B; target partition P1: 384B, P2: 640B
• On a P2 miss, P1 holds more than its target, so the LRU line belonging to P1 is evicted instead of one of P2's lines
• After the replacement, the current partition (P1: 384B, P2: 640B) matches the target partition
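A minimal sketch of this victim-selection idea, assuming each cache line is tagged with its owning thread and per-thread line counts are tracked per cache; the names (Line, pickVictim) are illustrative and not the paper's hardware design.

```cpp
// Sketch of modified-LRU replacement that steers the current partition toward
// the target partition: on a miss, prefer evicting the LRU line of a thread
// that currently holds more than its target share.
#include <cstdint>
#include <vector>

struct Line {
    bool     valid   = false;
    int      owner   = -1;   // thread id that brought the line in
    uint64_t lruTick = 0;    // smaller value == older (LRU)
};

// current[t] / target[t]: number of lines thread t currently holds / should hold.
int pickVictim(const std::vector<Line>& set,
               const std::vector<int>& current,
               const std::vector<int>& target) {
    int victim = -1;
    uint64_t oldest = UINT64_MAX;
    for (size_t w = 0; w < set.size(); ++w) {
        if (!set[w].valid) return static_cast<int>(w);   // free way: no eviction needed
        int owner = set[w].owner;
        // Prefer the oldest line of any over-quota thread; this gradually
        // converts the current partition into the target partition.
        if (current[owner] > target[owner] && set[w].lruTick < oldest) {
            oldest = set[w].lruTick;
            victim = static_cast<int>(w);
        }
    }
    if (victim >= 0) return victim;
    // No over-quota candidate: fall back to plain LRU over the whole set.
    for (size_t w = 0; w < set.size(); ++w)
        if (set[w].lruTick < oldest) { oldest = set[w].lruTick; victim = static_cast<int>(w); }
    return victim;
}
```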
Dynamic Fair Caching Algorithm (example: optimizing the M3 metric)
• Setup: MissRate alone is profiled as P1: 20%, P2: 5%; the initial target partition is P1: 256KB, P2: 256KB; the partition is re-evaluated every repartitioning interval at a granularity of 64KB.
• 1st interval: measured MissRate shared is P1: 20%, P2: 15%.
• Repartition: evaluate M3 with X_i = MissRate_shared / MissRate_alone: P1: 20% / 20% = 1.0, P2: 15% / 5% = 3.0. P2 suffers the larger relative slowdown, so 64KB moves from P1 to P2; the target partition becomes P1: 192KB, P2: 320KB.
• 2nd interval: measured MissRate shared is P1: 20%, P2: 10%.
• Repartition: P1: 20% / 20% = 1.0, P2: 10% / 5% = 2.0. P2 is still hurt more, so the target partition becomes P1: 128KB, P2: 384KB.
• 3rd interval: measured MissRate shared is P1: 25%, P2: 9%.
• Rollback: a repartitioning is undone if the beneficiary's improvement Δ = MRold − MRnew falls below the rollback threshold Trollback. Here P2's miss rate improved only from 10% to 9%, so the target partition rolls back to P1: 192KB, P2: 320KB.
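The walkthrough above can be condensed into a small repartitioning routine. This is a hedged sketch of the policy, not the paper's exact algorithm: the names (repartition, ThreadStats) are invented, and whether Trollback is an absolute or relative miss-rate improvement is an assumption noted in the code.

```cpp
// Sketch of one invocation of the dynamic fair-caching repartitioner for a
// 2-core CMP, run at the end of each repartitioning interval while
// optimizing the M3 metric.
#include <array>

constexpr int    kGranularity = 64;    // KB moved per repartitioning step
constexpr double kTrollback   = 0.20;  // rollback threshold (e.g., 20%)

struct ThreadStats {
    double missRateAlone;    // from profiling the thread running alone
    double missRateShared;   // measured during the last interval
    double prevMissRate;     // measured during the interval before that
};

// partition[i] is thread i's target L2 share in KB; returns the new partition.
std::array<int, 2> repartition(std::array<int, 2> partition,
                               std::array<int, 2> prevPartition,
                               const std::array<ThreadStats, 2>& s) {
    // Roll back if the thread that received space last time barely improved.
    // NOTE: treating Trollback as a *relative* improvement is an assumption.
    for (int i = 0; i < 2; ++i) {
        bool grewLastTime = partition[i] > prevPartition[i];
        double delta = s[i].prevMissRate - s[i].missRateShared;   // MRold - MRnew
        if (grewLastTime && delta < kTrollback * s[i].prevMissRate)
            return prevPartition;                                 // undo the last move
    }
    // Otherwise evaluate M3's per-thread ratios and grow the more-hurt thread.
    double x0 = s[0].missRateShared / s[0].missRateAlone;
    double x1 = s[1].missRateShared / s[1].missRateAlone;
    int grow   = (x1 > x0) ? 1 : 0;
    int shrink = 1 - grow;
    partition[grow]   += kGranularity;
    partition[shrink] -= kGranularity;
    return partition;
}
```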
Fair Caching Overhead
• Partitionable cache hardware
• Profiling
  • Static profiling for M1, M3
  • Dynamic profiling for M1, M3, M4
• Storage: per-thread registers
  • Miss rate/count for the "alone" case
  • Miss rate/count for the "shared" case
• Repartitioning algorithm
  • < 100 cycles overhead in a 2-core CMP
  • Invoked at every repartitioning interval
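As a rough illustration of the per-thread storage listed above, the state kept for each thread might look like the following; the struct and field names are hypothetical, not taken from the paper.

```cpp
// Per-thread profiling registers for fair caching (illustrative names).
#include <cstdint>

struct PerThreadFairCacheRegs {
    // "Alone" statistics, filled by static profiling or a sampling phase.
    uint64_t missCountAlone;
    uint64_t accessCountAlone;
    // "Shared" statistics, counted during the current repartitioning interval.
    uint64_t missCountShared;
    uint64_t accessCountShared;

    double missRateAlone()  const { return double(missCountAlone)  / accessCountAlone; }
    double missRateShared() const { return double(missCountShared) / accessCountShared; }

    // Cleared when a new repartitioning interval begins.
    void newInterval() { missCountShared = accessCountShared = 0; }
};
```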
Outline
• Fairness Metrics
• Static Fair Caching Algorithms (See Paper)
• Dynamic Fair Caching Algorithms
• Evaluation Environment
• Evaluation
• Conclusions
Evaluation Environment
• UIUC's SESC simulator (cycle-accurate)
• 18 benchmark pairs
• Algorithm parameters
  • Static algorithms: FairM1
  • Dynamic algorithms: FairM1Dyn, FairM3Dyn, FairM4Dyn
Outline
• Fairness Metrics
• Static Fair Caching Algorithms (See Paper)
• Dynamic Fair Caching Algorithms
• Evaluation Environment
• Evaluation
  • Correlation results
  • Static fair caching results
  • Dynamic fair caching results
  • Impact of rollback threshold
  • Impact of time interval
• Conclusions
Correlation Results
M1 and M3 show the best correlation with M0.
Static Fair Caching Results
• FairM1 achieves throughput comparable to MinMiss, with better fairness.
• Opt confirms that better fairness is achieved without throughput loss.
Dynamic Fair Caching Results
• FairM1Dyn and FairM3Dyn show the best fairness and throughput.
• Improvements in fairness result in throughput gains.
• Fair caching sometimes degrades throughput (2 out of 18 pairs).
Impact of Rollback Threshold in FairM1Dyn
A 20% Trollback gives the best fairness and throughput.
Impact of Repartitioning Interval in FairM1Dyn
A repartitioning interval of 10K L2 accesses gives the best fairness and throughput.
Outline
• Fairness Metrics
• Static Fair Caching Algorithms (See Paper)
• Dynamic Fair Caching Algorithms
• Evaluation Environment
• Evaluation
• Conclusions
Conclusions
• Problems of unfair cache sharing:
  • Sub-optimal throughput
  • Thread starvation
  • Priority inversion
  • Thread-mix dependent throughput
• Contributions:
  • Cache fairness metrics
  • Static/dynamic fair caching algorithms
• Benefits of fair caching:
  • Fairness: 4x improvement
  • Throughput: 15% improvement, comparable to the cache miss minimization approach
  • Fair caching simplifies scheduler design
  • Simple hardware support
Partitioning Histogram
• Partitions mostly oscillate between two partitioning choices.
• A Trollback of 35% can still find a better partition.
Impact of Partition Granularity in FairM1Dyn
A 64KB granularity gives the best fairness and throughput.
Impact of Initial Partition in FairM1Dyn
Differences across initial partitions are tolerable.