An Analytical Model for CMPs

An Analytical Model for CMPs • ECE/CS 757, Spring 2003 • University of Wisconsin-Madison • Peter McClone, Kim-Huei Low

Overview • Introduction • Simulation Environment • Analytical Model • Results • Future work • Conclusion

Introduction • Performance limits of superscalar processors • Relative chip area increases • Chip multiprocessors (CMPs) • Design is usually driven by simulation data • Simulation is limited by time constraints • Alternative: an analytical model

CMP System Model • [Figure: CPU0 … CPUN, each with private iL1 and dL1 caches, sharing one L2 cache backed by main memory] • Similar to Piranha [3] • Multi-programmed workload • SimpleMP?

CacheTracer

Processor Model • Most area-efficient processor model • Trade-off: area vs. performance • SimpleScalar is used

SimpleScalar • Generates the address traces • Instruction addresses are ignored: • ~100% iL1 hit rate • Instructions are fetched directly from memory, with no allocation in L2 • Very little interference in L2 • Performance

Simulator Combination • 1. Generate address traces using word-sized cache blocks at all levels and minimal cache sizes • 2. Feed the traces into CacheTracer to simulate cache interference at the L2 level and to generate the statistics the model needs • 3. Use the model to compute performance estimates for variations of all the cache parameters
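CacheTracer's internals are not described in these slides. As a rough, hypothetical illustration of step 2 only, the sketch below models a shared L2 as a set-associative cache and feeds it several per-core reference streams, counting misses per core. The class `SetAssocLRU`, the function `simulate_shared_l2`, LRU replacement, and round-robin interleaving of the cores are all assumptions made for illustration, not CacheTracer's actual design.

```python
from collections import OrderedDict
from itertools import zip_longest


class SetAssocLRU:
    """Hypothetical set-associative cache with LRU replacement (stand-in for a shared L2)."""

    def __init__(self, num_sets: int, assoc: int, block_size: int):
        self.num_sets = num_sets
        self.assoc = assoc
        self.block_size = block_size
        # One OrderedDict per set; key order tracks recency (last entry = most recently used).
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.accesses = 0

    def access(self, addr: int) -> bool:
        """Look up a byte address; return True on a hit, False on a miss."""
        block = addr // self.block_size
        lines = self.sets[block % self.num_sets]
        self.accesses += 1
        if block in lines:
            lines.move_to_end(block)  # refresh recency on a hit
            return True
        if len(lines) >= self.assoc:
            lines.popitem(last=False)  # evict the least recently used block
        lines[block] = None
        return False


def simulate_shared_l2(traces, l2):
    """Interleave per-core traces round-robin (an assumption) and return per-core miss counts."""
    misses = [0] * len(traces)
    for refs in zip_longest(*traces):  # shorter traces are padded with None
        for core, addr in enumerate(refs):
            if addr is not None and not l2.access(addr):
                misses[core] += 1
    return misses


# Two cores whose blocks map to the same set: they thrash a direct-mapped L2
# but coexist in a 2-way L2.
print(simulate_shared_l2([[0, 0, 0], [256, 256, 256]], SetAssocLRU(4, 1, 64)))  # [3, 3]
print(simulate_shared_l2([[0, 0, 0], [256, 256, 256]], SetAssocLRU(2, 2, 64)))  # [1, 1]
```

The two calls show the kind of inter-core interference the L2 simulation is meant to capture: the same traces produce very different miss counts depending on how the shared cache is organized.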

Analytical Model • A mathematical equation for IPC • Combines observations from published research papers with intuition • Based mainly on a detailed cache model, which was the focus of the project • The processor model is very simple

Analytical Model • Three-part cache miss-rate model: $M(C,t) = M_{\text{startup}}(C,t) + M_{\text{nonstationary}}(C,t) + M_{\text{intrinsic}}(C,t)$ • $C$ is a cache configuration • $t$ counts time granules of $r$ references each, so $rt$ references have been issued

Analytical Model • Startup misses are the unique blocks of size $B$ accessed in the first time granule, $u(B)$ • The startup miss rate is $u(B)$ divided by the total number of references, $rt$: • $M_{\text{startup}}(C,t) = \frac{u(B)}{rt}$
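A minimal sketch of the startup term, assuming the trace is a list of byte addresses recorded at word granularity and that $t$ is the number of complete granules in the trace (so $rt$ is the number of references considered). Because a block number is just the address divided by $B$, the same word-level trace can be re-blocked for any block size; the function names `unique_blocks` and `startup_miss_rate` are hypothetical.

```python
def unique_blocks(refs, block_size):
    """Count the distinct blocks of block_size bytes touched by the byte addresses in refs."""
    return len({addr // block_size for addr in refs})


def startup_miss_rate(trace, r, block_size):
    """M_startup(C, t) = u(B) / (r * t), with t = number of complete granules in the trace."""
    t = len(trace) // r
    if t == 0:
        raise ValueError("trace is shorter than one time granule")
    u = unique_blocks(trace[:r], block_size)  # u(B): unique blocks in the first granule
    return u / (r * t)


# A word-granularity trace (4-byte words), re-blocked for two block sizes.
trace = [0, 4, 8, 64, 0, 128, 192, 4]
print(startup_miss_rate(trace, r=4, block_size=64))  # u(64) = 2 -> 0.25
print(startup_miss_rate(trace, r=4, block_size=4))   # u(4) = 4 -> 0.5
```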

Analytical Model • Nonstationary misses are caused by changes in the working set of a process • They are the difference between the total number of unique blocks accessed up to this point, $U(B)$, and the blocks attributed to startup: • $M_{\text{nonstationary}}(C,t) = \frac{U(B) - u(B)}{rt}$ • Together, the startup and nonstationary misses are simply the number of unique blocks in the trace, $U(B)$.
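The following sketch computes both cold-miss terms and checks the slide's observation that they sum to $U(B)/(rt)$. It assumes a trace of byte addresses and takes $t$ as the number of complete granules of $r$ references; the function name `cold_miss_components` is hypothetical.

```python
def cold_miss_components(trace, r, block_size):
    """Return (M_startup, M_nonstationary) over the first t = len(trace) // r granules."""
    t = len(trace) // r
    if t == 0:
        raise ValueError("trace is shorter than one time granule")
    refs = trace[: r * t]                         # the t complete granules
    u = len({a // block_size for a in refs[:r]})  # u(B): unique blocks in the first granule
    U = len({a // block_size for a in refs})      # U(B): unique blocks up to granule t
    total = r * t                                 # rt: total references considered
    return u / total, (U - u) / total


startup, nonstationary = cold_miss_components([0, 4, 8, 64, 0, 128, 192, 4], r=4, block_size=64)
# The two components sum to U(B) / (rt): here 4 unique blocks over 8 references.
print(startup, nonstationary, startup + nonstationary)  # 0.25 0.25 0.5
```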

Analytical Model • Intrinsic misses are inherent to program behavior: the natural sequence of accesses causes cache lines to displace each other • $M_{\text{intrinsic}}(C,t) = \frac{c}{r}\left[u(B) - \sum_{d=0}^{D} S\,d\,P(B,d)\right]$ • $S$ is the number of sets • $D$ is the set size (associativity) • $P(B,d)$ is the probability that exactly $d$ blocks of size $B$ map into a given cache set • The sum $\sum_{d=0}^{D} S\,d\,P(B,d)$ is the expected number of blocks that map to sets receiving no more blocks than the set size; these blocks fit in their set and cannot cause misses • The remaining blocks, $u(B)$ minus the sum, may cause misses; this is estimated by multiplying by the measured collision rate $c$
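A sketch of the intrinsic term, assuming each of the $u(B)$ unique blocks maps to a set uniformly and independently, so that $P(B,d)$ is binomial with success probability $1/S$. The collision rate $c$ is treated as a measured input (for example, from the trace simulation); the function names are hypothetical.

```python
from math import comb


def p_blocks_in_set(u, num_sets, d):
    """P(B, d) under a binomial assumption: each of u unique blocks lands in a
    given set independently with probability 1 / num_sets."""
    p = 1.0 / num_sets
    return comb(u, d) * p**d * (1.0 - p) ** (u - d)


def intrinsic_miss_rate(u, num_sets, assoc, c, r):
    """M_intrinsic(C, t) = (c / r) * [u(B) - sum_{d=0}^{D} S * d * P(B, d)], with D = assoc."""
    # Expected number of blocks in sets holding at most D blocks; these cannot collide.
    # d never exceeds u, since at most u blocks can land in any set.
    fits = sum(num_sets * d * p_blocks_in_set(u, num_sets, d)
               for d in range(min(assoc, u) + 1))
    return c * (u - fits) / r


# 64 unique blocks over 16 sets of a 4-way cache, collision rate c = 0.5, granule r = 1000.
print(intrinsic_miss_rate(u=64, num_sets=16, assoc=4, c=0.5, r=1000))
```

When the associativity is at least $u(B)$, every block fits and the sum equals $S \cdot u(B)/S = u(B)$, so the intrinsic term vanishes, which is a useful sanity check on the formula.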

Analytical Model • Combining all three components of the miss rate gives: $M(C,t) = \frac{u(B)}{rt} + \frac{U(B) - u(B)}{rt} + \frac{c}{r}\left[u(B) - \sum_{d=0}^{D} S\,d\,P(B,d)\right]$ • The average memory access time can then be computed directly, where $M_{L2}$ is the local L2 miss rate: $AMA = (1 - M_{L1})\,HitTime_{L1} + M_{L1}(1 - M_{L2})\,HitTime_{L2} + M_{L1}M_{L2}\,HitTime_{main}$
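The AMA expression translates directly into code. This sketch assumes $M_{L1}$ and $M_{L2}$ have already been obtained (for instance, from the miss-rate model for each level), that $M_{L2}$ is the local miss rate of L2, and that each hit time is the full latency of an access satisfied at that level. The function name and the example latencies are illustrative only.

```python
def average_memory_access_time(m_l1, m_l2, hit_l1, hit_l2, hit_main):
    """AMA = (1 - M_L1) HitTime_L1 + M_L1 (1 - M_L2) HitTime_L2 + M_L1 M_L2 HitTime_main.

    m_l2 is the local L2 miss rate: the fraction of L1 misses that also miss in L2.
    """
    return ((1 - m_l1) * hit_l1             # satisfied by L1
            + m_l1 * (1 - m_l2) * hit_l2    # misses L1, hits L2
            + m_l1 * m_l2 * hit_main)       # misses both, goes to main memory


# Illustrative numbers only: 10% L1 misses, half of those miss in L2, latencies in cycles.
print(average_memory_access_time(0.10, 0.50, hit_l1=1, hit_l2=10, hit_main=100))
```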

Analytical Model • The final model incorporates the AMA into a simple processor model based on: • Issue width (IW) • Issue capabilities (IC) • Average memory penalty (AMP) • Average memory access time (AMA) • $P_i = IW_i \cdot \left[\frac{AMA_i + 2}{3} \cdot AMP_i\right]$ • Total CMP performance is the sum over all $N_c$ processors: $P_{cmp} = \sum_{i=1}^{N_c} P_i$
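The per-core sum can be sketched as follows. The per-processor expression is transcribed exactly as printed on the slide; the slide does not say how IC enters the formula or what units AMP uses, so treat `per_core_performance` as a placeholder that could be swapped for a refined processor model. Both function names are hypothetical.

```python
def per_core_performance(iw, ama, amp):
    """P_i = IW_i * [(AMA_i + 2) / 3 * AMP_i], transcribed as written on the slide."""
    return iw * ((ama + 2) / 3 * amp)


def cmp_performance(cores):
    """P_cmp = sum of P_i over the N_c cores; cores is a list of (IW, AMA, AMP) tuples."""
    return sum(per_core_performance(iw, ama, amp) for iw, ama, amp in cores)


# Two hypothetical cores with different issue widths and memory behavior.
print(cmp_performance([(4, 1.0, 0.5), (2, 4.0, 0.25)]))  # 2.0 + 1.0 = 3.0
```

Keeping the per-core term in its own function means the CMP total stays a plain sum, matching the slide, even if the single-processor model changes.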

Results

Future work • Modify SimpleMP or RSIM • Convert an SMP or DSM system model into a CMP system model • Fix or create the multi-programmed loader • Incorporate processor parameters into the model: • Issue width, instruction window size, branch predictor policy, number of ALUs, etc.

Conclusion • Questions?

References • [1] A. Agarwal, M. Horowitz, and J. Hennessy. An analytical cache model. Computer Systems Laboratory Report TR 86-304, Stanford University, Stanford, CA, September 1986. • [2] J. Huh, D. Burger, and S. W. Keckler. Exploring the design space of future CMPs. In The 10th International Conference on Parallel Architectures and Compilation Techniques, pages 199–210, September 2001. • [3] L. A. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese. Piranha: A scalable architecture based on single-chip multiprocessing. In The 27th Annual International Symposium on Computer Architecture, pages 282–293, June 2000.
