4. Assessing and Understanding Performance

Presentation Transcript


  1. 4. Assessing and Understanding Performance

  2. 4. Performance 4.1 Introduction 4.2 CPU Performance and Its Factors 4.3 Evaluating Performance 4.4 Real Stuff: Two SPEC Benchmarks and the Performance of Recent Intel Processors 4.5 Fallacies and Pitfalls 4.6 Concluding Remarks 4.7 Historical Perspective and Further Reading 4.8 Exercises

  3. 4.1 Introduction • How to measure, report, and summarize performance • Defining Performance • An analogy: Figure 4.1

  4. Performance of a Computer • Response time (= execution time) • The time between the start and completion of a task • Throughput • The total amount of work done in a given time • Performance and execution time • Performance_X = 1 / Execution time_X • "X is n times faster than Y" means Performance_X / Performance_Y = Execution time_Y / Execution time_X = n

  5. Measuring Performance • Definitions of time • Wall-clock time = Response time = Elapsed time • Total time to complete a task • Including disk accesses, memory accesses, I/O activities, and OS overhead • CPU execution time = CPU time • The time the CPU spends computing for this task • CPU time = User CPU time + System CPU time • UNIX time command • 90.7u 12.9s 2:39 65% (90.7 s of user CPU time, 12.9 s of system CPU time, 2 min 39 s elapsed; CPU time is 65% of elapsed time) • Definitions of performance • System performance: based on elapsed time • CPU performance: based on user CPU time
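The relationship between the UNIX time figures on this slide can be checked with a few lines of Python. This is only a sketch: the helper name `cpu_fraction` is an assumption made for illustration, and the numbers are the ones quoted on the slide.

```python
def cpu_fraction(user_s, system_s, elapsed_s):
    """Fraction of elapsed (wall-clock) time that the CPU spent on this task."""
    return (user_s + system_s) / elapsed_s


# Figures from the slide: 90.7u 12.9s 2:39 65%
user_s = 90.7                 # user CPU time in seconds
system_s = 12.9               # system CPU time in seconds
elapsed_s = 2 * 60 + 39       # 2:39 elapsed, converted to seconds

fraction = cpu_fraction(user_s, system_s, elapsed_s)
print(f"CPU time = {user_s + system_s:.1f} s "
      f"({fraction:.0%} of {elapsed_s} s elapsed)")
```

The remaining share of the elapsed time was spent waiting (I/O, other processes), which is why system performance and CPU performance are defined on different clocks.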

  6. 4.2 CPU Performance and Its Factors • CPU execution time = CPU clock cycles x clock cycle time = CPU clock cycles / clock rate • Example: Improving Performance • Same instruction set • Computer A: 4 GHz, 10 seconds • Computer B: ? GHz, 6 seconds • B requires 1.2 times as many clock cycles as A.

  7. [Answer] • CPU time_A = CPU clock cycles_A / clock rate_A • 10 seconds = CPU clock cycles_A / (4 x 10^9 cycles/sec) • CPU clock cycles_A = 10 sec x 4 x 10^9 cycles/sec = 40 x 10^9 cycles • CPU time_B = CPU clock cycles_B / clock rate_B = 1.2 x CPU clock cycles_A / clock rate_B • 6 seconds = 1.2 x 40 x 10^9 cycles / clock rate_B • clock rate_B = 1.2 x 40 x 10^9 cycles / 6 seconds = 8 x 10^9 cycles/sec = 8 GHz
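The same arithmetic can be reproduced as a short Python sketch. The variable names are illustrative assumptions; the inputs are the ones given for computers A and B.

```python
# CPU time = CPU clock cycles / clock rate, rearranged as needed.
clock_rate_a = 4e9        # Computer A: 4 GHz, in cycles per second
cpu_time_a = 10.0         # Computer A: 10 seconds
cpu_time_b = 6.0          # Computer B: target of 6 seconds
cycle_ratio = 1.2         # B needs 1.2 times as many clock cycles as A

cycles_a = cpu_time_a * clock_rate_a     # clock cycles used on A
cycles_b = cycle_ratio * cycles_a        # clock cycles needed on B
clock_rate_b = cycles_b / cpu_time_b     # clock rate B must reach

print(f"cycles on A: {cycles_a:.3g}")
print(f"clock rate for B: {clock_rate_b / 1e9:.1f} GHz")
```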

  8. Hardware Software Interface • CPU clock cycles = IC x CPI • IC (Instruction Count) • Dependent on compilers and architectures • CPI (Cycles Per Instruction) • Dependent on implementations • Performance equation Execution Time = IC x CPI x clock cycle time = (IC x CPI) / clock rate

  9. Example: Using the Performance Equation • Same instruction set architecture, same program • Clock cycle time_A = 250 ps, CPI_A = 2.0 • Clock cycle time_B = 500 ps, CPI_B = 1.2 • Which is faster, and by how much? [Answer] • Let I = instruction count for the program. • CPU time_A = IC_A x CPI_A x clock cycle time_A = I x 2.0 x 250 ps = 500 x I ps • CPU time_B = I x 1.2 x 500 ps = 600 x I ps • Then CPU performance_A / CPU performance_B = CPU time_B / CPU time_A = (600 x I ps) / (500 x I ps) = 1.2 • Thus, A is 1.2 times faster than B for this program.
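A minimal Python sketch of the performance equation applied to this example. The function name `cpu_time` and the instruction count of one million are assumptions chosen for illustration; the ratio does not depend on the instruction count because both machines run the same program.

```python
def cpu_time(instruction_count, cpi, clock_cycle_time_s):
    """Execution time = IC x CPI x clock cycle time (result in seconds)."""
    return instruction_count * cpi * clock_cycle_time_s


PS = 1e-12                      # one picosecond in seconds
instruction_count = 1_000_000   # hypothetical; cancels out in the ratio

time_a = cpu_time(instruction_count, 2.0, 250 * PS)
time_b = cpu_time(instruction_count, 1.2, 500 * PS)

# Performance is the reciprocal of execution time, so the speedup of A
# over B is time_B / time_A.
speedup_a_over_b = time_b / time_a
print(f"A is {speedup_a_over_b:.2f} times faster than B")
```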

  10. The Big Picture

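  11. Example: Comparing Code Segments • Three instruction classes with CPIs of 1, 2, and 3 • Code sequence 1 uses 2, 1, and 2 instructions of these classes; code sequence 2 uses 4, 1, and 1 • Which sequence executes fewer instructions? • Which will be faster? • What is the CPI for each sequence?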

  12. [Answer] • Instruction count_1 = 2 + 1 + 2 = 5 and instruction count_2 = 4 + 1 + 1 = 6. Thus sequence 1 executes fewer instructions. • CPU clock cycles_1 = 2x1 + 1x2 + 2x3 = 10 and CPU clock cycles_2 = 4x1 + 1x2 + 1x3 = 9. Thus sequence 2 is faster. • CPI_1 = CPU clock cycles_1 / instruction count_1 = 10 / 5 = 2 and CPI_2 = 9 / 6 = 1.5. Sequence 2 has the lower CPI.
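The answer can be recomputed with a small Python sketch. The per-class CPIs (1, 2, 3) and the instruction counts are read off the arithmetic in the answer; the names `CPI_BY_CLASS` and `summarize` are assumptions made for illustration.

```python
# CPI of each of the three instruction classes, as used in the answer.
CPI_BY_CLASS = (1, 2, 3)


def summarize(counts):
    """Return (instruction count, clock cycles, average CPI) for a sequence.

    counts holds the number of instructions of each class, in the same
    order as CPI_BY_CLASS.
    """
    instructions = sum(counts)
    cycles = sum(n * cpi for n, cpi in zip(counts, CPI_BY_CLASS))
    return instructions, cycles, cycles / instructions


seq1 = summarize((2, 1, 2))
seq2 = summarize((4, 1, 1))

for name, (instructions, cycles, cpi) in (("1", seq1), ("2", seq2)):
    print(f"sequence {name}: {instructions} instructions, "
          f"{cycles} cycles, CPI = {cpi}")
```

Sequence 2 executes more instructions yet needs fewer cycles, which is why instruction count alone is a poor predictor of execution time.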

  13. 4.3 Evaluating Performance • Benchmarking • Comparing the performance of two or more systems by measurement • Benchmark • Programs specifically chosen to measure performance • A workload that the user hopes will predict the performance of the actual workload • Compiler tricks • Optimizations in either the architecture or the compiler

  14. Compiler Tricks by IBM

  15. Comparing and Summarizing Performance • Difficulties with summarizing performance • A is 10 times faster than B for program 1. • B is 10 times faster than A for program 2. • Total execution time: a consistent summary measure • AM (arithmetic mean) = (1/n) x Σ_{i=1..n} Time_i • Weighted arithmetic mean = Σ_{i=1..n} Weight_i x Time_i, where the weights sum to 1 • Figure 4.4
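A small Python sketch shows how total execution time, the arithmetic mean, and the weighted arithmetic mean summarize two machines that each win on one program. The execution times and the weights below are hypothetical values chosen only to match the "10 times faster" statements on the slide; they are not taken from Figure 4.4.

```python
def arithmetic_mean(times):
    """Plain average of the execution times."""
    return sum(times) / len(times)


def weighted_arithmetic_mean(times, weights):
    """Sum of weight_i * time_i; the weights are expected to sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * t for w, t in zip(weights, times))


# Hypothetical execution times in seconds: (program 1, program 2).
times_a = (1.0, 1000.0)    # A is 10 times faster than B on program 1
times_b = (10.0, 100.0)    # B is 10 times faster than A on program 2
weights = (0.9, 0.1)       # hypothetical: program 1 is 90% of the workload

for name, times in (("A", times_a), ("B", times_b)):
    print(f"{name}: total = {sum(times)} s, "
          f"AM = {arithmetic_mean(times)} s, "
          f"weighted AM = {weighted_arithmetic_mean(times, weights):.1f} s")
```

With these assumed numbers, B has the smaller total and the smaller means even though each machine is 10 times faster on one program, so the summary depends on the actual times and on how the workload is weighted.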

  16. 4.6 Concluding Remarks • Three design criteria • High-performance design • Supercomputers and high-end servers • Low-cost design • Embedded systems • Cost/performance design • Desktop computers • Execution time of real programs as the metric
