SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Eyerman, Lieven Eeckhout
Summary of this paper • Establishes a theoretical, mathematical foundation for measuring the performance of a given system • From whose perspective should we measure the performance of a given system? • User • System • Combination of both
Current performance measurement • Researchers have reached a consensus that the performance metric of choice for assessing a single program’s performance is its execution time • For single-threaded programs, execution time is proportional to CPI (Cycles Per Instruction) and inversely proportional to IPC (Instructions Per Cycle)
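The proportionality above is just the standard iron-law relationship (time = instructions x CPI / clock rate). A minimal sketch with made-up numbers for illustration:

```python
def execution_time(instructions, cpi, clock_hz):
    """Execution time in seconds for a single-threaded program:
    time = instruction_count * cycles_per_instruction / clock_rate."""
    return instructions * cpi / clock_hz

# Hypothetical program: 2 billion instructions, CPI of 1.5, 3 GHz clock.
t = execution_time(instructions=2_000_000_000, cpi=1.5, clock_hz=3_000_000_000)
# IPC is simply 1/CPI, so halving CPI (doubling IPC) halves execution time.
```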
Performance for multithreaded programs • CPI alone is a poor performance metric for multithreaded programs • Total execution time should be used when measuring performance
System-level performance criteria • The criteria for evaluating multiprogram computer systems are based on the user’s perspective and the system’s perspective • What is the user’s perspective? • How fast a single program is executed • What is the system’s perspective? • Throughput
Terminologies • Turnaround time: Quantifies the time between submitting a job and its completion • Response time: Measures the time between submitting a job and receiving its first response; this metric is important for interactive applications • Throughput: Quantifies the number of programs completed per unit of time
Continues…. • Single-program mode: A single program has exclusive access to the computer system. It has all system resources at its disposal and is never interrupted or preempted during its execution. • Multiprogram mode: Multiple programs are coexecuting on the computer system.
Turnaround Time • Normalized Turnaround Time (NTT): NTT_i = T_i^MP / T_i^SP, the slowdown program i experiences in multiprogram mode relative to single-program mode (lower is better) • Average NTT: ANTT = (1/n) · Σ NTT_i • Max NTT: max_i NTT_i, the worst-case slowdown among the co-executing programs
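The turnaround-time metrics on this slide can be sketched in a few lines of Python; the paired single-program (SP) and multiprogram (MP) times below are invented illustration values, not measurements from the paper.

```python
def ntt(t_mp, t_sp):
    # Normalized turnaround time: slowdown of one program under co-execution.
    return t_mp / t_sp

def antt(t_mp_list, t_sp_list):
    # Average NTT over the n co-executing programs (lower is better).
    ntts = [ntt(mp, sp) for mp, sp in zip(t_mp_list, t_sp_list)]
    return sum(ntts) / len(ntts)

def max_ntt(t_mp_list, t_sp_list):
    # Worst-case slowdown among the co-executing programs.
    return max(ntt(mp, sp) for mp, sp in zip(t_mp_list, t_sp_list))

# Two programs: one runs 2x slower, the other 4x slower when co-executing.
t_sp = [10.0, 20.0]
t_mp = [20.0, 80.0]
antt(t_mp, t_sp)     # (2 + 4) / 2 = 3.0
max_ntt(t_mp, t_sp)  # 4.0
```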
System throughput • Normalized Progress: NP_i = T_i^SP / T_i^MP, the fraction of its single-program speed that program i sustains under co-execution • System Throughput: STP = Σ NP_i (higher is better; at most n for n co-executing programs)
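STP is the complementary, system-oriented view of the same SP/MP time pairs. A short sketch with the same invented example times as before:

```python
def normalized_progress(t_sp, t_mp):
    # NP_i: fraction of single-program speed a program retains when co-executing.
    return t_sp / t_mp

def stp(t_sp_list, t_mp_list):
    # System throughput: sum of normalized progress over all co-executing
    # programs; n means no interference, values near 0 mean heavy interference.
    return sum(normalized_progress(sp, mp) for sp, mp in zip(t_sp_list, t_mp_list))

# Same two hypothetical programs: 2x and 4x slowdown under co-execution.
stp([10.0, 20.0], [20.0, 80.0])  # 0.5 + 0.25 = 0.75
```

Note the duality: ANTT averages the per-program slowdowns (user view), while STP sums their reciprocals (system view), which is why the paper recommends reporting both.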
Practical (why do I say practical?) • Per-program turnaround times are hard to obtain directly, so the practical forms use per-program cycle counts from simulation or hardware performance counters • Adjusted ANTT: ANTT = (1/n) · Σ (C_i^MP / C_i^SP) • Adjusted STP: STP = Σ (C_i^SP / C_i^MP)
IPC Throughput (…keep this in mind…): • IPC throughput: the raw sum of per-program IPCs, Σ IPC_i^MP • Weighted Speedup: Σ (IPC_i^MP / IPC_i^SP) • Harmonic Average (Hmean): n / Σ (IPC_i^SP / IPC_i^MP)
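The two IPC-based metrics on this slide can be sketched directly from per-program IPC values; the IPC numbers below are hypothetical, chosen only to exercise the formulas.

```python
def weighted_speedup(ipc_mp, ipc_sp):
    # Sum of per-program IPC ratios (multiprogram over single-program).
    return sum(m / s for m, s in zip(ipc_mp, ipc_sp))

def hmean(ipc_mp, ipc_sp):
    # Harmonic mean of the per-program IPC ratios.
    n = len(ipc_mp)
    return n / sum(s / m for m, s in zip(ipc_mp, ipc_sp))

# Hypothetical two-program workload: both programs run at half speed.
ipc_sp = [2.0, 1.0]
ipc_mp = [1.0, 0.5]
weighted_speedup(ipc_mp, ipc_sp)  # 0.5 + 0.5 = 1.0
hmean(ipc_mp, ipc_sp)             # 2 / (2 + 2) = 0.5
```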
Fairness • Ideal: co-executing programs in multiprogram mode experience equal relative progress with respect to single-program mode • Fairness: F = min_i NP_i / max_j NP_j; F = 1 means perfectly fair, and F near 0 means some program is starved • Proportional Progress (for different priorities): programs should make progress in proportion to their priorities rather than equally
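The equal-priority fairness metric is a one-liner over the normalized-progress values defined earlier; the NP values below are made up for illustration.

```python
def fairness(np_list):
    # Ratio of the slowest program's normalized progress to the fastest's.
    # 1.0 = perfectly fair (equal relative progress); values near 0 mean
    # one co-executing program is being starved relative to another.
    return min(np_list) / max(np_list)

fairness([0.5, 0.25])  # 0.5  -- one program progresses twice as fast
fairness([0.4, 0.4])   # 1.0  -- equal relative progress, perfectly fair
```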
Enough theory… how can I apply this to real-world performance measurement?
Case study: Evaluating SMT fetch policies • What should be used in performance measurements? • Researchers should use multiple metrics for characterizing multiprogram system performance • The combination of ANTT and STP provides a clear picture of overall system performance, balancing user-oriented program turnaround time against system-oriented throughput • Although the case study involves user-level single-threaded workloads, this does not affect the general applicability of the multiprogram performance metrics • The ANTT-STP characterization is also applicable to multithreaded and full-system workloads • The study uses the ANTT and STP metrics to evaluate performance; for multithreaded full-system workloads, it uses the cycle-count-based equations
Six SMT fetch policies • Icount: • Strives to keep an equal number of instructions from all co-executing programs in the pipeline • Stall fetch: • Stalls the fetch of a program that experiences a long-latency load until data returns from memory • Predictive stall fetch: • Extends the stall fetch policy by predicting long-latency loads in the front-end pipeline • MLP-aware stall fetch: • Predicts long-latency loads and their associated memory-level parallelism • Flush: • Flushes on long-latency loads • MLP-aware flush: • Extends the MLP-aware stall fetch policy by flushing instructions if more than m instructions have been fetched since the first burst of long-latency loads
Simulation environment • Simulation points chosen with SimPoint for SPEC CPU2000 benchmarks (200 million instructions each) • 36 two-program workloads • 30 four-program workloads • Four-wide superscalar, out-of-order SMT processor with an aggressive hardware data prefetcher and eight stream buffers
MLP-aware flush outperforms Icount for both the two- and four-program workloads • That is, it achieves a higher system throughput and a lower average normalized turnaround time, while achieving a comparable fairness level.
The same is true when we compare MLP-aware flush against flush for the two-program workloads; for the four-program workloads, MLP-aware flush achieves a much lower normalized turnaround time than flush at a comparable system throughput. • MLP-aware stall fetch achieves a smaller ANTT, whereas predictive stall fetch achieves a higher STP.
What does this show? • Delicate balance between user-oriented and system-oriented views of performance • If user-perceived performance is the primary objective, MLP-aware stall fetch is the better fetch policy • If system-perceived performance is the primary objective, predictive stall fetch is the policy of choice
While introducing the terminology, I said to keep IPC throughput in mind… remember?
IPC throughput as a performance metric is misleading • Using IPC throughput as a performance metric, you would conclude that the MLP-aware flush policy is merely comparable to the flush policy • However, MLP-aware flush in fact achieves a significantly higher system throughput (STP) • Thus, IPC throughput is a potentially misleading performance metric
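The slide's point can be demonstrated numerically: two policies with identical IPC throughput can deliver different system throughput, because raw IPC sums ignore each program's standalone speed. All IPC values below are invented for illustration and are not the paper's measurements.

```python
def ipc_throughput(ipc_mp):
    # Raw sum of per-program IPCs: treats a fast and a slow program's
    # instructions as interchangeable.
    return sum(ipc_mp)

def stp(ipc_mp, ipc_sp):
    # System throughput as sum of normalized progress, computed here from
    # IPC ratios (assuming equal instruction counts per program).
    return sum(m / s for m, s in zip(ipc_mp, ipc_sp))

ipc_sp = [2.0, 1.0]    # hypothetical standalone IPCs
policy_a = [1.5, 0.5]  # hypothetical fetch policy A: favors the fast program
policy_b = [1.0, 1.0]  # hypothetical fetch policy B: favors the slow program

ipc_throughput(policy_a) == ipc_throughput(policy_b)  # True: both sum to 2.0
stp(policy_a, ipc_sp)  # 0.75 + 0.5 = 1.25
stp(policy_b, ipc_sp)  # 0.5 + 1.0 = 1.50 -- B actually uses the machine better
```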
Summary • Gives a theoretical foundation for measuring system performance • Don’t judge the performance of multicore systems merely on IPC throughput or CPI • Use a quantitative approach to performance measurement for multicore systems; a few such metrics are presented in this paper