190 likes | 365 Views
CS1104: Computer Organisation http://www.comp.nus.edu.sg/~cs1104. School of Computing National University of Singapore. PII Lecture 3: Benchmarking. Definitions. SPEC ’95. Amdahl’s Law. Reading: Chapter 2 of Patterson’s book: 2.4 – 2.7. Benchmarking.
E N D
CS1104: Computer Organisation http://www.comp.nus.edu.sg/~cs1104 School of Computing National University of Singapore
PII Lecture 3: Benchmarking • Definitions. • SPEC ’95. • Amdahl’s Law. • Reading: Chapter 2 of Patterson’s book: 2.4 – 2.7. Benchmarking
Benchmarking • Benchmarking:Choosing programs to evaluate performance. • Measure the performance of a machine using a set of programs which will hopefully emulate the workload generated by the user’s programs. • Benchmarks: programs designed to measure performance. Benchmarking
very specific • non-portable • difficult to run or measure • hard to identify cause • representative • portable • widely used • improvements useful in reality • less representative • easy to run, early in design cycle • easy to “fool” • identify peak capability and potential bottlenecks • “peak” may be a long way from application performance Benchmarks Pros Cons Actual Target Workload Full Application Benchmarks Small “Kernel” Benchmarks Microbenchmarks Benchmarking
SPEC ’95 • SPEC (System Performance Evaluation Cooperative) • Companies have agreed on a set of real program and inputs • Eighteen application benchmarks (with inputs) reflecting a technical computing workload • Eight integer • go, m88ksim, gcc, compress, li, ijpeg, perl, vortex • Ten floating-point intensive • tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fppp, wave5 • Must run with standard compiler flags • Eliminate special undocumented incantations that may not even generate working code for real programs • Can still be abused (Intel’s “other” bug) • Valuable indicator of performance (and compiler technology) Benchmarking
SPEC ’95 (2) Benchmarking
SPEC ’95 (3) • For a given ISA, increases in CPU performance can come from three sources: • Increase in clock rate • Improvements in processor organization that lower that CPI • Compiler enhancements that lower the instruction count or generate instructions with a lower average CPI (e.g., by using simpler instructions) • Next slide shows the SPECint95 and SPECfp95 measurements for a series of Intel Pentium processors and Pentium Pro processors. • Does doubling the clock rate double performance? • Can a machine with a slower clock rate have better performance? Benchmarking
SPEC ’95 (4) • At same clock rate, Pentium Pro is 1.4 to 1.5 times faster (for SPECint95) and 1.7 to 1.8 times faster (for SPECfp95) – improvements come from organizational enhancements (pipelining, memory system) to the Pentium Pro. • Performance increases at a slower rate than increase in clock rate – bottleneck at memory system, Amdahl’s law at play here. Benchmarking
Amdahl’s Law • Pitfall: Expecting the improvement of one aspect of a machine to increase performance by an amount proportional to the size of the improvement. • Example: • Suppose a program runs in 100 seconds on a machine, with multiply operations responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster? 100 (total time) = 80 (for multiply) + UA (unaffected) 100/4 (new total time) = 80/Speedup (for multiply) + UA Speedup = 80/5 = 16(meaning multiply now takes only 5 seconds) Benchmarking
Amdahl’s Law (2) • Example (continued) • How about making it 5 times faster? 100 (total time) = 80 (for multiply) + UA (unaffected) 100/5 (new total time) = 80/Speedup (for multiply) + UA Speedup = 80/0 = ??? (impossible!) • There is no way we can enhance multiply to achieve a fivefold increase in performance, if multiply accounts for only 80% of the workload. Benchmarking
Amdahl’s Law (3) • This concept is the Amdahl’s law. Performance is limited to the non-speedup portion of the program. • Execution time after improvement = Execution time of unaffected part + (Execution time of affected part / Speedup) • Corollary of Amdahl’s law: Make the common case fast. Benchmarking
Example 1 • Suppose we enhance a machine making all floating-point instructions run five times faster. If the execution time of some benchmark before the floating-point enhancement is 12 seconds, what will the speedup be if half of the 12 seconds is spent executing floating-point instructions? Time = 6 (UA) + 6 (fl-pt) / 5 = 7.2 seconds. Speedup = 12/7.2 = 1.67 Benchmarking
Example 2 • We are looking for a benchmark to show off the new floating-point unit described in the previous example, and we want the overall benchmark to show a speedup of 3. One benchmark we are considering runs for 100 seconds with the old floating-point hardware. How much of the execution time would floating-point instructions have to account for in this program in order to yield our desired speedup on this benchmark? Speedup = 3 = 100 / (Time_FI / 5 + 100 – Time_Fl) Time_FI = 83.33 seconds Benchmarking
Sample Question (1) • Which of the following is/are true for SPEC benchmarking? • Higher SPEC number implies that the clock speed must be faster. • SPEC number is an indication of the performance of the processor hardware only. • SPEC benchmark is a single program used to measure and compare computer system performance. • All of the above. • None of the above. [Answer] Benchmarking
Sample Question (2) • Which of the following is/are true for SPEC benchmarking? • The SPEC benchmark consists of 18 actual client-target workload programs that every computer system should optimize for. • The overall SPEC number of a system is affected by the quality of the compiler. • A system that has a higher clock speed must always have a higher SPEC number. • A system that has a lower overall CPI for SPEC benchmarks must always have a higher SPEC number. • None of the above. [Answer] Benchmarking
Sample Question (3) • Suppose a program runs in 8642 seconds on a machine, with “rotate” operation responsible for 4321 seconds. How much do we have to improve the speed of “rotate” operation if we want the program to run 4 times faster? • Insufficient data to determine. • The speed of the “rotate” operation is improved by a factor of 4. • The speed of the “rotate” operation is improved by a factor of 8. • It is impossible to achieve the proposed speedup. • None of the above. [Answer] Benchmarking
Sample Question (4) • Which of the following is true for benchmarking? • Benchmarking is a mechanism to compare the relative performance of computer systems. • Small kernel is always the best choices for benchmarking the actual performance of a computer system because it is easy to run. • Actual workload of a targeted application is always the best choice for benchmarking the performance of a future system to be designed. • All (a), (b), (c). • None of the above. [Answer] Benchmarking