Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture
Dhruba Chandra, Fei Guo, Seongbeom Kim, Yan Solihin
Electrical and Computer Engineering, North Carolina State University
HPCA-2005
Cache Sharing in CMP [diagram: Processor Core 1 and Processor Core 2, each with a private L1 $, sharing an L2 $] Chandra, Guo, Kim, Solihin - Contention Model
Impact of Cache Space Contention • Application-specific (what) • Coschedule-specific (when) • Significant: up to 4X more cache misses, 65% IPC reduction → Need a model to understand the impact of cache sharing
Related Work • Uniprocessor miss estimation: Cascaval et al., LCPC 1999; Chatterjee et al., PLDI 2001; Fraguela et al., PACT 1999; Ghosh et al., TOPLAS 1999; J. Lee et al., HPCA 2001; Vera and Xue, HPCA 2002; Wassermann et al., SC 1997 • Context switch impact on a time-shared processor: Agarwal, ACM Trans. on Computer Systems, 1989; Suh et al., ICS 2001 • No model for the impact of cache sharing: • Relatively new phenomenon: SMT, CMP • Many possible access interleaving scenarios
Contributions • Inter-thread cache contention models • 2 heuristic models (refer to the paper) • 1 analytical model • Input: a circular sequence profile for each thread • Output: predicted number of cache misses per thread in a co-schedule • Validation • Against a detailed CMP simulator • 3.9% average error for the analytical model • Insight • Temporal reuse patterns determine the impact of cache sharing
Outline • Model Assumptions • Definitions • Inductive Probability Model • Validation • Case Study • Conclusions
Assumptions • One circular sequence profile per thread • An average profile yields high prediction accuracy • Phase-specific profiles may improve accuracy further • LRU replacement algorithm • Other policies are usually LRU approximations • Threads do not share data • Mostly true for serial apps • Parallel apps: threads likely to be impacted uniformly
Outline • Model Assumptions • Definitions • Inductive Probability (Prob) Model • Validation • Case Study • Conclusions
Definitions • seqX(dX,nX) = a sequence of nX accesses to dX distinct addresses, all made by thread X to the same cache set • cseqX(dX,nX) (circular sequence) = a sequence in which the first and the last accesses are to the same address • Example: the access sequence A B C D A E E B is seq(5,8); it contains the circular sequences cseq(4,5) (A … A), cseq(1,2) (E E), and cseq(5,7) (B … B)
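The two definitions can be made concrete with a short sketch (illustrative Python, not the paper's profiling tool): it scans an access trace to one cache set and emits the (d, n) pair of every circular sequence, reproducing the slide's example.

```python
def circular_sequences(trace):
    """Yield (d, n) for every circular sequence in a per-set access trace:
    n = number of accesses between two accesses to the same address
    (both endpoints included), d = distinct addresses among them."""
    last_pos = {}  # address -> index of its most recent access
    result = []
    for i, addr in enumerate(trace):
        if addr in last_pos:
            window = trace[last_pos[addr]:i + 1]
            result.append((len(set(window)), len(window)))
        last_pos[addr] = i
    return result

# The slide's sequence A B C D A E E B is seq(5,8); its circular
# sequences are cseq(4,5), cseq(1,2), and cseq(5,7):
print(circular_sequences(list("ABCDAEEB")))  # [(4, 5), (1, 2), (5, 7)]
```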
Circular Sequence Properties • Thread X runs alone in the system: • Given a circular sequence cseqX(dX,nX), the last access is a cache miss iff dX > Assoc • Thread X shares the cache with thread Y: • If a sequence of intervening accesses seqY(dY,nY) occurs during cseqX(dX,nX)'s lifetime, the last access of thread X is a miss iff dX + dY > Assoc
Example • Assume a 4-way associative cache • X's circular sequence: cseqX(2,3) = A B A (its lifetime spans from the first A to the last A) • Y's intervening access sequence: U V V W • Without cache sharing: the last access to A is a cache hit • With cache sharing: is A a cache hit or a miss? Chandra, Guo, Kim, Solihin - Contention Model
Example (cont.) • Assume a 4-way associative cache: • Interleaving A U B V V A W: seqY(2,3) intervenes in cseqX(2,3)'s lifetime, dX + dY = 2 + 2 ≤ 4 → the last access to A is a cache hit • Interleaving A U B V V W A: seqY(3,4) intervenes in cseqX(2,3)'s lifetime, dX + dY = 2 + 3 > 4 → the last access to A is a cache miss
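Both interleavings can be checked by replaying them against a single 4-way LRU set (a minimal sketch; an OrderedDict stands in for one set's LRU stack):

```python
from collections import OrderedDict

def last_access_hit(interleaving, assoc=4):
    """Replay accesses on one LRU-managed cache set; report whether
    the final access hits."""
    cache = OrderedDict()
    hit = False
    for addr in interleaving:
        hit = addr in cache
        if hit:
            cache.move_to_end(addr)        # refresh LRU position
        else:
            cache[addr] = None
            if len(cache) > assoc:
                cache.popitem(last=False)  # evict the LRU line
    return hit

print(last_access_hit("AUBVVA"))   # True:  dX + dY = 2 + 2 <= 4
print(last_access_hit("AUBVVWA"))  # False: dX + dY = 2 + 3 > 4
```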
Outline • Model Assumptions • Definitions • Inductive Probability Model • Validation • Case Study • Conclusions
Inductive Probability Model • For each cseqX(dX,nX) of thread X • Compute Pmiss(cseqX): the probability that the last access is a miss • Steps: • Compute E(nY): the expected number of intervening accesses from thread Y during cseqX's lifetime • For each possible dY, compute P(seq(dY, E(nY))): the probability of occurrence of seq(dY, E(nY)) • If dY + dX > Assoc, add P(seq(dY, E(nY))) to Pmiss(cseqX) • Misses = old_misses + Σ Pmiss(cseqX) × F(cseqX)
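The aggregation step can be sketched as follows (hypothetical Python, not the paper's code; `p_dy` stands for a precomputed distribution P(seq(dY, E(nY))) over dY):

```python
def pmiss(d_x, p_dy, assoc):
    """Probability that the closing access of cseqX(dX, nX) misses,
    given p_dy[dY] = P(seq(dY, E(nY))): sum the probability mass of
    every dY for which dX + dY exceeds the associativity."""
    return sum(p for d_y, p in p_dy.items() if d_x + d_y > assoc)

def predicted_misses(old_misses, profile, p_dy, assoc):
    """profile maps (dX, nX) -> F(cseqX), the frequency of that
    circular sequence; implements
    Misses = old_misses + sum Pmiss(cseqX) * F(cseqX)."""
    return old_misses + sum(pmiss(d_x, p_dy, assoc) * f
                            for (d_x, _n_x), f in profile.items())
```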
Computing P(seq(dY, E(nY))) • Basic idea: • P(seq(d,n)) = A × P(seq(d,n-1)) + B × P(seq(d-1,n-1)) • where A and B are transition probabilities • Detailed steps in the paper • [diagram: seq(d,n-1) + 1 access to a non-distinct address → seq(d,n); seq(d-1,n-1) + 1 access to a distinct address → seq(d,n)]
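The recurrence can be evaluated bottom-up; the sketch below holds the transition probabilities constant (B = p_distinct for an access to a new address, A = 1 − p_distinct for a repeat), whereas the paper derives them from the thread's reuse profile:

```python
from functools import lru_cache

def seq_prob(d, n, p_distinct):
    """P(seq(d, n)) under the recurrence
    P(seq(d,n)) = A*P(seq(d,n-1)) + B*P(seq(d-1,n-1)),
    with constant A = 1 - p_distinct and B = p_distinct
    (a simplification of the paper's derivation)."""
    @lru_cache(maxsize=None)
    def P(d, n):
        if d < 1 or d > n:
            return 0.0   # impossible: need 1 <= d <= n
        if n == 1:
            return 1.0   # one access, one distinct address
        return ((1 - p_distinct) * P(d, n - 1)
                + p_distinct * P(d - 1, n - 1))
    return P(d, n)

# All four accesses distinct requires three "new address" steps:
print(seq_prob(4, 4, 0.5))  # 0.125
```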
Outline • Model Assumptions • Definitions • Inductive Probability Model • Validation • Case Study • Conclusions
Validation • SESC simulator • Detailed CMP + memory hierarchy • 14 co-schedules of benchmarks (Spec2K and Olden) • A co-schedule is terminated when an app completes
Validation • Error = (PM − AM) / AM, where PM = predicted misses and AM = actual (simulated) misses • Larger errors occur when the miss increase is very large • Overall, the model is accurate
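For reference, the error metric spelled out (reading PM as the model's predicted miss count and AM as the simulator's actual miss count is an inference from context):

```python
def relative_error(pm, am):
    """Relative prediction error as defined on the slide: (PM - AM) / AM."""
    return (pm - am) / am

print(relative_error(104, 100))  # 0.04, i.e. a 4% overprediction
```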
Other Observations • Based on how vulnerable applications are to the impact of cache sharing: • Highly vulnerable (mcf, gzip) • Not vulnerable (art, apsi, swim) • Somewhat / sometimes vulnerable (applu, equake, perlbmk, mst) • Prediction error: • Very small, except for highly vulnerable apps • 3.9% (average), 25% (maximum) • Also small for different cache associativities and sizes
Outline • Model Assumptions • Definitions • Inductive Probability Model • Validation • Case Study • Conclusions
Case Study • Profile approximated by a geometric progression: F(cseq(1,*)) = Z, F(cseq(2,*)) = Zr, F(cseq(3,*)) = Zr², …, F(cseq(A,*)) = Zr^(A−1), … • Z = amplitude • 0 < r < 1 = common ratio • Larger r → larger working set • Impact of the interfering thread on the base thread? • Fix the base thread • Vary the interfering thread • Miss frequency = # misses / time • Reuse frequency = # hits / time
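The synthetic profile can be generated directly (a sketch, assuming the progression F(cseq(d,*)) = Z·r^(d−1) implied by the Z, Zr, Zr², … pattern):

```python
def geometric_profile(Z, r, max_d):
    """Synthetic circular-sequence profile with amplitude Z and
    common ratio r: F(cseq(d, *)) = Z * r**(d - 1)."""
    return {d: Z * r ** (d - 1) for d in range(1, max_d + 1)}

# Larger r -> frequencies decay more slowly -> larger working set:
print(geometric_profile(100, 0.5, 3))  # {1: 100.0, 2: 50.0, 3: 25.0}
```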
Base Thread: r = 0.5 (Small WS) • Base thread: • Not vulnerable to interfering thread's miss frequency • Vulnerable to interfering thread's reuse frequency
Base Thread: r = 0.9 (Large WS) • Base thread: • Vulnerable to interfering thread's miss and reuse frequency
Outline • Model Assumptions • Definitions • Inductive Probability Model • Validation • Case Study • Conclusions
Conclusions • New inter-thread cache contention models • Simple to use: • Input: circular sequence profiling per thread • Output: number of misses per thread in co-schedules • Accurate • 3.9% average error • Useful • Temporal reuse patterns determine the cache sharing impact • Future work: • Predict and avoid problematic co-schedules • Release the tool at http://www.cesr.ncsu.edu/solihin