An Analytical Model for CMPs Spring 2003 ECE/CS 757 University of Wisconsin - Madison Peter McClone Kim-Huei Low
Overview • Introduction • Simulation Environment • Analytical Model • Results • Future work • Conclusion
Introduction • Performance limits of superscalar processors • Relative chip area increases • Chip Multiprocessors (CMP) • Design by simulation data • Time constraint • Analytical Model
CMP System Model • [Figure: block diagram showing CPU0 … CPUN, an iL1 and dL1 cache per CPU, L2, and Memory] • Like Piranha • Multi-programmed workload • SimpleMP?
Processor Model • Most area-efficient model • Area vs performance • SimpleScalar is used
SimpleScalar • Generate address traces • Ignore instruction addresses • ~100% hit rate • Fetched directly from memory, no allocation in L2 • Very little interference in L2 • Performance
Simulator Combination • 1. Generate address traces using word-size cache blocks at all levels and minimal cache sizes • 2. Feed the traces into CacheTracer to simulate cache interference at the L2 level and generate the statistics needed for the model computation • 3. Use the model to compute performance estimates for variations of all of the cache parameters
Analytical Model • A mathematical equation for IPC • Combination of observations made in research papers and from intuitive knowledge • Based mainly on a detailed cache model that was the focus of the project • Processor model is very simple
Analytical Model • Three-part miss-rate model: $M(C,t) = M(C,t)_{startup} + M(C,t)_{nonstationary} + M(C,t)_{intrinsic}$ • C is a cache configuration • t is a time granule of r references
Analytical Model • Startup effects are the number of unique blocks accessed in the first time granule, u(B) • The miss rate is u(B) divided by the total number of references: • $M(C,t)_{startup} = \frac{u(B)}{rt}$
Analytical Model • Nonstationary misses are caused by changes in the working set of a process • They are the difference between the total number of unique blocks accessed to this point, U(B), and the blocks attributed to startup: • $M(C,t)_{nonstationary} = \frac{U(B) - u(B)}{rT}$ • The sum of the startup and nonstationary misses is simply the number of unique blocks in the trace.
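The startup and nonstationary terms can be illustrated with a minimal Python sketch. It is not the authors' CacheTracer code: the function names, the toy trace, and the choice of taking T to be the number of time granules in the whole trace are all assumptions made for illustration, since the slides do not define T.

```python
def trace_stats(addresses, block_size, granule_refs):
    """Return (u_B, U_B, T) for a list of byte addresses.

    u_B: unique blocks touched in the first time granule
    U_B: unique blocks touched in the whole trace
    T:   number of time granules in the trace (assumed meaning of T)
    """
    blocks = [addr // block_size for addr in addresses]
    u_B = len(set(blocks[:granule_refs]))
    U_B = len(set(blocks))
    T = -(-len(blocks) // granule_refs)  # ceiling division
    return u_B, U_B, T


def startup_miss_rate(u_B, r, t):
    """Startup term: u(B) / (r * t)."""
    return u_B / (r * t)


def nonstationary_miss_rate(u_B, U_B, r, T):
    """Nonstationary term: (U(B) - u(B)) / (r * T)."""
    return (U_B - u_B) / (r * T)


# Hypothetical 8-reference trace, 64-byte blocks, granules of r = 4 references.
trace = [0, 4, 8, 64, 68, 128, 0, 4]
u_B, U_B, T = trace_stats(trace, block_size=64, granule_refs=4)
print(u_B, U_B, T)
print(startup_miss_rate(u_B, r=4, t=1))
print(nonstationary_miss_rate(u_B, U_B, r=4, T=T))
```

In this toy trace the first granule touches two blocks and the whole trace touches three, so one block is attributed to the nonstationary term.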
Analytical Model • Intrinsic misses are caused inherently by program behavior • The natural sequence of accesses causes cache lines to displace each other • $M(C,t)_{intrinsic} = \frac{c}{r}\left[u(B) - \sum_{d=0}^{D} S \cdot d \cdot P(B,d)\right]$ • S is the number of sets • P(B,d) is the probability that d blocks of size B map into a cache set • The sum counts the blocks that fall in sets with no more than D blocks mapped to them, where D is the set size. These blocks cannot cause misses, because they all fit in the set. • However, the other blocks (u(B) minus the sum) may cause misses. This is estimated by multiplying by the measured collision rate c.
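A rough Python sketch of the intrinsic term follows. The slides do not say how P(B,d) is obtained; the binomial form used here (each of the u(B) blocks maps independently and uniformly to one of S sets) is an assumption borrowed from the usual presentation of analytical cache models, and the function names are hypothetical.

```python
from math import comb


def set_occupancy_prob(u_B, S, d):
    """Assumed binomial P(B,d): probability that exactly d of u_B blocks
    map to one given set, with uniform, independent mapping over S sets."""
    p = 1.0 / S
    return comb(u_B, d) * p**d * (1.0 - p) ** (u_B - d)


def intrinsic_miss_rate(u_B, S, D, c, r):
    """Intrinsic term: (c / r) * [u(B) - sum_{d=0}^{D} S * d * P(B,d)]."""
    # Blocks in sets holding at most D blocks cannot collide.
    # d never needs to exceed u_B, since P(B,d) is zero beyond it.
    non_colliding = sum(
        S * d * set_occupancy_prob(u_B, S, d)
        for d in range(min(D, u_B) + 1)
    )
    return c * (u_B - non_colliding) / r


# Two blocks, two sets, direct-mapped (D = 1): on average one block collides.
print(intrinsic_miss_rate(u_B=2, S=2, D=1, c=1.0, r=1))
```

When D is at least u(B), every block fits in its set and the term collapses to zero (up to rounding), which is a quick sanity check on the formula.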
Analytical Model • Combining all three components of the miss rate results in the equation: $M(C,t) = \frac{u(B)}{rt} + \frac{U(B) - u(B)}{rT} + \frac{c}{r}\left[u(B) - \sum_{d=0}^{D} S \cdot d \cdot P(B,d)\right]$ • The average memory access time (AMA) then follows directly: $AMA = (1 - M_{L1}) \cdot HitTime_{L1} + M_{L1}(1 - M_{L2}) \cdot HitTime_{L2} + M_{L1} M_{L2} \cdot HitTime_{main}$
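The AMA expression can be checked with a short Python sketch. The miss rates and latencies below are made-up values chosen for easy arithmetic, not results from the project.

```python
def average_memory_access_time(m_l1, m_l2, hit_l1, hit_l2, hit_main):
    """AMA = (1 - M_L1) * HitTime_L1
           + M_L1 * (1 - M_L2) * HitTime_L2
           + M_L1 * M_L2 * HitTime_main
    """
    return (
        (1 - m_l1) * hit_l1
        + m_l1 * (1 - m_l2) * hit_l2
        + m_l1 * m_l2 * hit_main
    )


# Hypothetical configuration: 10% L1 miss rate, 50% L2 miss rate,
# and latencies of 1, 10, and 100 cycles.
ama = average_memory_access_time(0.1, 0.5, hit_l1=1, hit_l2=10, hit_main=100)
print(round(ama, 3))  # 0.9 + 0.5 + 5.0 = 6.4
```

Each term weights a level's hit time by the fraction of references serviced at that level, so the three weights sum to one.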
Analytical Model • The final model then incorporates the AMA into a simple processor model based on: • Issue Width (IW) • Issue Capabilities (IC) • Average memory penalty (AMP) • Average memory access time (AMA) • $P_i = IW_i \cdot \left[\frac{AMA_i + 2}{3} \cdot AMP_i\right]$ • Total performance is the sum over all processors: • $P_{cmp} = \sum_{i=1}^{N_c} P_i$
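A small Python sketch of the per-processor estimate and the CMP total is given below. It evaluates the per-processor expression exactly as written on the slide, using standard operator precedence; the grouping in the flattened slide text is ambiguous, so this reading is an assumption, and the input values are hypothetical.

```python
def processor_performance(iw, ama, amp):
    """P_i = IW_i * [(AMA_i + 2) / 3 * AMP_i], read with standard precedence.

    The grouping is an assumption: the slide text does not make it explicit.
    """
    return iw * ((ama + 2) / 3 * amp)


def cmp_performance(processors):
    """P_cmp: sum of P_i over all processors.

    `processors` is an iterable of (IW, AMA, AMP) tuples.
    """
    return sum(processor_performance(iw, ama, amp) for iw, ama, amp in processors)


# Two identical hypothetical cores: IW = 2, AMA = 4, AMP = 0.5.
print(cmp_performance([(2, 4.0, 0.5), (2, 4.0, 0.5)]))
```

Because the total is a plain sum, the model treats processors as independent once their individual AMA values are known.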
Future work • Modify SimpleMP or RSIM • Modify SMP or DSM system to CMP system • Fix or create the multi-programmed loader • Incorporate processor parameters into model • Issue width, instruction window size, branch predictor policy, # of ALUs, etc.
Conclusion • ### • Questions?
References • [1] A. Agarwal, M. Horowitz, and J. Hennessy. An analytical cache model. Computer Systems Lab. Rep. TR 86-304, Stanford Univ., Stanford, Calif., September 1986. • [2] J. Huh, D. Burger, and S. W. Keckler. Exploring the design space of future CMPs. In The 10th International Conference on Parallel Architectures and Compilation Techniques, pages 199–210, September 2001. • [3] L. A. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese. Piranha: A scalable architecture based on single-chip multiprocessing. In The 27th Annual International Symposium on Computer Architecture, pages 282–293, June 2000.