220 likes | 420 Views
Benchmarks. Programs specifically chosen to measure performance Must reflect typical workload of the user Benchmark types Real applications Small benchmarks Benchmark suites Synthetic benchmarks. Real Applications. Workload: Set of programs a typical user runs day in and day out .
E N D
Benchmarks • Programs specifically chosen to measure performance • Must reflect typical workload of the user • Benchmark types • Real applications • Small benchmarks • Benchmark suites • Synthetic benchmarks
Real Applications • Workload: Set of programs a typical user runs day in and day out. • To use these real applications for metrics is a direct way of comparing the execution time of the workload on two machines. • Using real applications for metrics has certain restrictions: • They are usually big • Takes time to port to different machines • Takes considerable time to execute • Hard to observe the outcome of a certain improvement technique
Comparing & Summarizing Performance • A is 100 times faster than B for program 1 • B is 10 times faster than A for program 2 • For total performance, arithmetic mean is used:
Arithmetic Mean • If each program, in the workload, are not run equal # times, then we have to use weighted arithmetic mean: • Suppose that the program 1 runs 10 times as often as the program 2. Which machine is faster?
Small Benchmarks • Small code segments which are common in many applications • For example, loops with certain instruction mix • for (j = 0; j<8; j++) S = S + Aj Bi-j • Good for architects and designers • Since small code segments are easy to compile and simulate even by hand, designers use these kind of benchmarks while working on a novel machine • Can be abused by compiler designers by introducing special-purpose optimizations targeted at specific benchmark.
Benchmark Suites • SPEC (Standard Performance Evaluation Corporation) • Non-profit organization that aims to produce "fair, impartial and meaningful benchmarks for computers” • Began in 1989 - SPEC89 (CPU intensive) • Companies agreed on a set of real programs and inputs which they hope reflect a typical user’s workload best. • Valuable indicator of performance • Can still be abused • Updates are required as the applications and their workload change by time
SPEC Benchmark Sets • CPU Performance (SPEC CPU2006) • Graphics (SPECviewperf) • High-performance computing (HPC2002, MPI2007, OMP2001) • Java server applications (jAppServer2004) • a multi-tier benchmark for measuring the performance of Java 2 Enterprise Edition (J2EE) technology-based application servers. • Mail systems (MAIL2001, SPECimap2003) • Network File systems (SFS97_R1 (3.0)) • Web servers (SPEC WEB99, SPEC WEB99 SSL) • More information: http://www.spec.org/
SPEC CPU2006 – Summarizing • SPEC ratio: the execution time measurements are normalized by dividing the measured execution time by the execution time on a reference machine • Sun Microsystems Fire V20z, which has anAMD Opteron 252 CPU, running at 2600 MHz. • 164.gzip benchmark executes in 90.4 s. • The reference time for this benchmark is 1400 s, • benchmark is 1400/90.4 × 100 = 1548 (a unitless value) • Performances of different programs in the suites are summarized using “geometric mean” of SPEC ratios.
Comparing Pentium III and Pentium 4 Implementation efficiency?
Power Consumption Concerns • Performance studied at different levels: • Maximum power • Intermediate level that conserves battery life • Minimum power that maximizes battery life • Intel Mobile Pentium & Pentium M: two available clock rates • Maximum • Reduced clock rate • Pentium M @ 1.6/0.6 GHz • Pentium 4-M @ 2.4/1.2 GHz • Pentium III-M @ 1.2/0.8 GHz
Synthetic Benchmarks • Artificial programs constructed to try to match the characteristics of a large set of program. • Goal: Create a single benchmark program where the execution frequency of instructions in the benchmark simulates the instruction frequency in a large set of benchmarks. • Examples: • Dhrystone, Whetstone • They are not real programs • Compiler and hardware optimizations can inflate the improvement far beyond what the same optimization would do with real programs
Amdahl’s Law in Computing • Improving one aspect of a machine by a factor of n does not improve the overall performance by the same amount. • Speedup = (Performance after imp.) / (Performance before imp.) • Speedup = (Execution time before imp.)/(Execution time after imp.) • Execution Time After Improvement = Execution Time Unaffected +(Execution Time Affected/n)
Amdahl’s Law • Example: Suppose a program runs in 100 s on a machine, with multiplication responsible for 80 s of this time. • How much do we have to improve the speed of multiplication if we want the program to run 4 times faster? • Can we improve the performance by a factor 5?
Amdahl’s Law • The performance enhancement possible due to a given improvement is limited by the amount that the improved feature is used. • In previous example, it makes sense to improve multiplication since it takes 80% of all execution time. • But after certain improvement is done, the further effort to optimize the multiplication more will yield insignificant improvement. • Law of Diminishing Returns • A corollary to Amdahl’s Law is to make a common case faster.
Examples • Suppose we enhance a machine making all floating-point instructions run five times faster. If the execution time of some benchmark before the floating-point enhancement is 10 seconds, what will the speedup be if half of the 10 seconds is spent executing floating-point instructions? • We are looking for a benchmark to show off the new floating-point unit described above, and want the overall benchmark to show a speedup of 3. One benchmark we are considering runs for 90 seconds with the old floating-point hardware. How much of the execution time would floating-point instructions have to account for in this program in order to yield our desired speedup on this benchmark?
Remember • Total execution time is a consistent summary of performance • Execution Time = (IC CPI)/f • For a given architecture, performance increases come from: • increases in clock rate (without too much adverse CPI effects) • improvements in processor organization that lower CPI • compiler enhancements that lower CPI and/or IC