4. Assessing and Understanding Performance
4. Performance 4.1 Introduction 4.2 CPU Performance and Its Factors 4.3 Evaluating Performance 4.4 Real Stuff: Two SPEC Benchmarks and the Performance of Recent Intel Processors 4.5 Fallacies and Pitfalls 4.6 Concluding Remarks 4.7 Historical Perspective and Further Reading 4.8 Exercises
4.1 Introduction • How to measure, report, and summarize performance Defining Performance • An analogy Figure 4.1
Performance of a Computer • Response time ( = execution time ) • The time between the start and completion of a task • Throughput • The total amount of work done in a given time • Performance and execution time • $\text{Performance}_X = 1 / \text{Execution time}_X$ • X is n times faster than Y when $\text{Performance}_X / \text{Performance}_Y = \text{Execution time}_Y / \text{Execution time}_X = n$
Measuring Performance • Definitions of time • Wall-clock time = Response time = Elapsed time • Total time to complete a task • Including disk accesses, memory accesses, I/O activities, OS overhead, etc. • CPU execution time = CPU time • The time the CPU spends computing for this task • CPU time = User CPU time + System CPU time • UNIX time command • 90.7u 12.9s 2:39 65% (user CPU time 90.7 s, system CPU time 12.9 s, elapsed time 2 min 39 s, CPU time is 65% of elapsed time) • Definitions of performance • System performance: based on elapsed time • CPU performance: based on user CPU time
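The figures in the UNIX time example can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `cpu_utilization` is hypothetical, and the inputs are the four values quoted on the slide.

```python
def cpu_utilization(user_s, system_s, elapsed_s):
    """Return (CPU time, CPU time as a fraction of elapsed time)."""
    cpu_time = user_s + system_s  # CPU time = user CPU time + system CPU time
    return cpu_time, cpu_time / elapsed_s


# Values from the slide: 90.7u 12.9s 2:39 65%
elapsed = 2 * 60 + 39  # 2:39 is 159 seconds of wall-clock time
cpu_time, fraction = cpu_utilization(90.7, 12.9, elapsed)
print(f"CPU time = {cpu_time:.1f} s, {fraction:.0%} of elapsed time")
```

The remaining share of the elapsed time was spent waiting (for I/O, for example) or running other programs.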
4.2 CPU Performance and Its Factors • CPU execution time = CPU clock cycles x clock cycle time = CPU clock cycles / clock rate • Example: Improving Performance • Same instruction sets • Computer A : 4 GHz, 10 seconds • Computer B : ? GHz, 6 seconds • B requires 1.2 times as many clock cycles as A.
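[Answer] • $\text{CPU time}_A = \text{CPU clock cycles}_A / \text{clock rate}_A$ • $10 \text{ s} = \text{CPU clock cycles}_A / (4 \times 10^9 \text{ cycles/s})$ • $\text{CPU clock cycles}_A = 10 \text{ s} \times 4 \times 10^9 \text{ cycles/s} = 40 \times 10^9 \text{ cycles}$ • $\text{CPU time}_B = \text{CPU clock cycles}_B / \text{clock rate}_B = 1.2 \times \text{CPU clock cycles}_A / \text{clock rate}_B$ • $6 \text{ s} = 1.2 \times 40 \times 10^9 \text{ cycles} / \text{clock rate}_B$ • $\text{clock rate}_B = 1.2 \times 40 \times 10^9 \text{ cycles} / 6 \text{ s} = 8 \times 10^9 \text{ cycles/s} = 8 \text{ GHz}$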
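The same calculation can be written as a short script. This is a sketch of the arithmetic only; the helper name `required_clock_rate` is hypothetical, and the inputs are the numbers given in the example.

```python
def required_clock_rate(clock_cycles, target_time_s):
    """Clock rate (Hz) needed to finish clock_cycles in target_time_s seconds."""
    return clock_cycles / target_time_s


clock_rate_a = 4e9   # Computer A: 4 GHz
time_a = 10.0        # Computer A: 10 seconds

# CPU clock cycles = CPU time x clock rate
cycles_a = time_a * clock_rate_a
# B requires 1.2 times as many clock cycles as A
cycles_b = 1.2 * cycles_a

clock_rate_b = required_clock_rate(cycles_b, target_time_s=6.0)
print(f"Computer B needs {clock_rate_b / 1e9:.1f} GHz")
```

B must run at twice A's clock rate to cut the run time from 10 to 6 seconds, because it also needs 20% more cycles.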
Hardware Software Interface • CPU clock cycles = IC x CPI • IC (Instruction Count) • Dependent on compilers and architectures • CPI (Cycles Per Instruction) • Dependent on implementations • Performance equation Execution Time = IC x CPI x clock cycle time = (IC x CPI) / clock rate
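Example: Using the Performance Equation • Same instruction set architecture, same program • $\text{Clock cycle time}_A = 250 \text{ ps}$, $\text{CPI}_A = 2.0$ • $\text{Clock cycle time}_B = 500 \text{ ps}$, $\text{CPI}_B = 1.2$ • Which is faster, and by how much ? [Answer] • Let I = instruction count for the program. • $\text{CPU time}_A = \text{IC}_A \times \text{CPI}_A \times \text{clock cycle time}_A = I \times 2.0 \times 250 \text{ ps} = 500 \times I \text{ ps}$ • $\text{CPU time}_B = I \times 1.2 \times 500 \text{ ps} = 600 \times I \text{ ps}$ • Then $\text{CPU performance}_A / \text{CPU performance}_B = \text{CPU time}_B / \text{CPU time}_A = (600 \times I \text{ ps}) / (500 \times I \text{ ps}) = 1.2$ • Thus, A is 1.2 times faster than B for this program.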
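A small sketch of the performance equation applied to this example follows. The instruction count cancels in the ratio, so the value used here (one million) is an arbitrary assumption chosen for illustration, and the function name `cpu_time` is hypothetical.

```python
def cpu_time(instruction_count, cpi, clock_cycle_time_s):
    """Execution time = IC x CPI x clock cycle time."""
    return instruction_count * cpi * clock_cycle_time_s


I = 1_000_000  # assumed instruction count; any positive value gives the same ratio

time_a = cpu_time(I, cpi=2.0, clock_cycle_time_s=250e-12)  # 250 ps
time_b = cpu_time(I, cpi=1.2, clock_cycle_time_s=500e-12)  # 500 ps

# Performance is the reciprocal of execution time, so the speedup of A over B
# is time_b / time_a.
speedup = time_b / time_a
print(f"A is {speedup:.1f} times faster than B")
```

Although B has the lower CPI, its longer clock cycle more than offsets that advantage.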
Example: Comparing Code Segments • Which will be faster ? • What is the CPI for each sequence ?
[Answer] • $\text{Instruction count}_1 = 2 + 1 + 2 = 5$ and $\text{Instruction count}_2 = 4 + 1 + 1 = 6$. Thus sequence 1 executes fewer instructions. • $\text{CPU clock cycles}_1 = 2 \times 1 + 1 \times 2 + 2 \times 3 = 10$ and $\text{CPU clock cycles}_2 = 4 \times 1 + 1 \times 2 + 1 \times 3 = 9$. Thus sequence 2 is faster. • $\text{CPI}_1 = \text{CPU clock cycles}_1 / \text{Instruction count}_1 = 10 / 5 = 2$ and $\text{CPI}_2 = 9 / 6 = 1.5$. Sequence 2 has the lower CPI.
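The answer can be reproduced programmatically. The sketch below assumes, as the arithmetic in the answer implies, three instruction classes costing 1, 2, and 3 cycles, with per-class instruction counts of (2, 1, 2) for sequence 1 and (4, 1, 1) for sequence 2. The names `CPI_BY_CLASS` and `summarize` are hypothetical.

```python
# Cycles per instruction for the three instruction classes (inferred from the answer)
CPI_BY_CLASS = (1, 2, 3)


def summarize(counts):
    """Return (instruction count, CPU clock cycles, CPI) for one code sequence."""
    instructions = sum(counts)
    cycles = sum(n * cpi for n, cpi in zip(counts, CPI_BY_CLASS))
    return instructions, cycles, cycles / instructions


seq1 = summarize((2, 1, 2))
seq2 = summarize((4, 1, 1))

for name, (ic, cycles, cpi) in (("sequence 1", seq1), ("sequence 2", seq2)):
    print(f"{name}: {ic} instructions, {cycles} cycles, CPI = {cpi}")
```

Sequence 2 executes more instructions but fewer cycles, which is why instruction count alone is a poor predictor of execution time.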
4.3 Evaluating Performance • Benchmarking • The process of performance comparison for two or more systems by measurements • Benchmark • Programs specifically chosen to measure performance • A workload that the user hopes will predict the performance of the actual workload • Compiler tricks • Optimizations in either the architecture or compiler
Comparing and Summarizing Performance • Difficulties with summarizing performance • A is 10 times faster than B for program 1. • B is 10 times faster than A for program 2. • Total execution time: A Consistent Summary Measure • AM: Arithmetic Mean = $\frac{1}{n} \sum_{i=1}^{n} \text{Time}_i$ • Weighted arithmetic mean = $\sum_{i=1}^{n} w_i \times \text{Time}_i$, where the weights $w_i$ sum to 1 • Figure 4.4
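The two summary measures can be sketched as follows. The execution times below are hypothetical values chosen only to match the slide's scenario (A 10 times faster on program 1, B 10 times faster on program 2). They are not taken from Figure 4.4, and the weights are likewise an assumption.

```python
def arithmetic_mean(times):
    """Unweighted mean of execution times; proportional to total execution time."""
    return sum(times) / len(times)


def weighted_arithmetic_mean(times, weights):
    """Mean of execution times with per-program weights that must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * t for w, t in zip(weights, times))


# Hypothetical execution times in seconds for (program 1, program 2)
times_a = [1.0, 1000.0]
times_b = [10.0, 100.0]

print("AM:", arithmetic_mean(times_a), arithmetic_mean(times_b))

# Assumed workload mix: program 1 makes up 99% of the runs
weights = [0.99, 0.01]
print("Weighted AM:",
      weighted_arithmetic_mean(times_a, weights),
      weighted_arithmetic_mean(times_b, weights))
```

With equal weighting B has the smaller mean, while a mix dominated by program 1 brings the two machines much closer together, so the summary depends on how well the weights reflect the real workload.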
4.6 Concluding Remarks • Three design criteria • High-performance design • Supercomputers and high-end servers • Low-cost design • Embedded systems • Cost/performance design • Desktop computers • Execution time of real programs as the metric