22343 - Computer Organization & Design
Chapter 4: Assessing & Understanding Performance
Defining Performance
[Figure: timeline (t = 0 to 10) of Task A and Task B, each made up of Read File, Calculate, and Save File steps]
• Response Time (Execution Time): the time it takes to do a task
• Throughput: the total amount of work done in a given time
• Difference?
Measuring Performance
[Figure: task timeline of Read File, Calculate, and Save File steps; the Calculate steps are CPU time]
• Performance = 1 / Execution Time
• Response Time = Wall-Clock Time = Elapsed Time, which is the sum of:
  • Processor Time
  • Memory Access Time
  • Disk and I/O Access Time
  • Operating System Time, etc.
Measuring Performance
• CPU Time
  • User CPU Time: executing user code
  • System CPU Time: calling OS functions, e.g. malloc
• System Performance: elapsed time on an unloaded system
• CPU Performance: CPU time
• Clock Cycles
  • Clock Period
  • Clock Rate
CPU Performance
• Program CPU Execution Time
  = Number of CPU Clock Cycles × Clock Cycle Time (e.g. cycles × nanoseconds)
  = Number of CPU Clock Cycles / Clock Rate (e.g. cycles / GHz)
Exercise: A program takes 10 seconds to run on a 4 GHz CPU. On another CPU the same program needs 20% more clock cycles, yet it finishes in 6 seconds. What is the clock rate of the other CPU?
CPU-1: # of CPU clock cycles = 10 sec × 4×10^9 cycles/sec = 40×10^9 cycles
CPU-2: # of CPU clock cycles = 1.2 × 40×10^9 cycles = 48×10^9 cycles
Clock rate = cycles / seconds = 48×10^9 cycles / 6 sec = 8 GHz
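The exercise above can be checked numerically. The following is a minimal Python sketch of the two forms of the CPU time equation; the function names are illustrative and not part of the slides.

```python
def clock_cycles(exec_time_s: float, clock_rate_hz: float) -> float:
    """CPU clock cycles = CPU execution time x clock rate."""
    return exec_time_s * clock_rate_hz


def required_clock_rate(cycles: float, exec_time_s: float) -> float:
    """Clock rate = CPU clock cycles / CPU execution time."""
    return cycles / exec_time_s


# Values from the exercise: 10 s on a 4 GHz CPU, 20% extra cycles, 6 s target.
cycles_cpu1 = clock_cycles(10.0, 4e9)
cycles_cpu2 = 1.2 * cycles_cpu1
rate_cpu2_hz = required_clock_rate(cycles_cpu2, 6.0)

print(f"CPU-1 cycles: {cycles_cpu1:.3e}")
print(f"CPU-2 cycles: {cycles_cpu2:.3e}")
print(f"CPU-2 clock rate: {rate_cpu2_hz / 1e9:.1f} GHz")
```

The second CPU has to run at twice the clock rate of the first to finish in 6 seconds despite the extra cycles.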
CPU Performance
• Clocks Per Instruction, CPI: the average number of clock cycles each instruction takes to execute.
Exercise: Which computer is faster?
# of instructions in a program = ___
CPU-A: CPU Execution Time = ___ × ___ × ___ ps = ___ ps
CPU-B: CPU Execution Time = ___ × ___ × ___ ps = ___ ps
Computer A is (___ / ___) = ___ times faster than B
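The comparison in this exercise follows CPU Execution Time = instruction count × CPI × clock cycle time. The sketch below shows the calculation in Python. The numbers of the original exercise are not preserved in this transcript, so the CPI values, cycle times, and instruction count used here are assumptions chosen only for illustration.

```python
def cpu_time_ps(instruction_count: float, cpi: float, cycle_time_ps: float) -> float:
    """CPU execution time (ps) = instruction count x CPI x clock cycle time (ps)."""
    return instruction_count * cpi * cycle_time_ps


# Assumed values, NOT taken from the slides.
instruction_count = 1_000_000   # same program, so same count on both machines
time_a_ps = cpu_time_ps(instruction_count, cpi=2.0, cycle_time_ps=250.0)
time_b_ps = cpu_time_ps(instruction_count, cpi=1.2, cycle_time_ps=500.0)

# "A is n times faster than B" means n = time_B / time_A.
speedup_a_over_b = time_b_ps / time_a_ps

print(f"CPU-A: {time_a_ps:.3e} ps")
print(f"CPU-B: {time_b_ps:.3e} ps")
print(f"A is {speedup_a_over_b:.2f} times faster than B")
```

Because the instruction count is the same on both machines, it cancels out of the ratio; only CPI × cycle time decides which computer is faster.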
CPU Performance
Exercise: Given 3 groups of instructions, A, B and C, it takes a different number of clock cycles to execute an instruction within each group. Given the shown instruction mix, which code sequence is faster to execute?
Seq1: CPU Execution Time = ___ × ___ + ___ × ___ + ___ × ___ = ___ cycles
Seq2: CPU Execution Time = ___ × ___ + ___ × ___ + ___ × ___ = ___ cycles
Seq2 is ___ / ___ = ___ times faster than Seq1
Seq1 average CPI = ___ = ___ cycles / instruction
Seq2 average CPI = ___ = ___ cycles / instruction
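The pattern of this exercise, summing (instruction count × CPI) over the instruction groups, can be sketched in Python as follows. The instruction mix table is not preserved in this transcript, so the per-group CPI values and the instruction counts below are assumptions for illustration only.

```python
# Assumed CPI per instruction group, NOT taken from the slides.
CPI_PER_GROUP = {"A": 1, "B": 2, "C": 3}


def total_cycles(mix: dict) -> int:
    """Clock cycles = sum over groups of (instruction count x group CPI)."""
    return sum(count * CPI_PER_GROUP[group] for group, count in mix.items())


def average_cpi(mix: dict) -> float:
    """Average CPI = total clock cycles / total instruction count."""
    return total_cycles(mix) / sum(mix.values())


# Assumed instruction counts for the two code sequences.
seq1 = {"A": 2, "B": 1, "C": 2}
seq2 = {"A": 4, "B": 1, "C": 1}

for name, mix in (("Seq1", seq1), ("Seq2", seq2)):
    print(f"{name}: {total_cycles(mix)} cycles, average CPI = {average_cpi(mix):.2f}")
```

With these assumed values the sequence that executes more instructions still needs fewer cycles, because its mix leans on the cheaper instruction group.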
Evaluating Performance
• Workload: the set of user programs to be executed.
• Benchmark: a program specifically chosen to measure performance.
• Target: benchmarks form a workload that the user hopes will predict the performance of the actual workload.
• Today: benchmarks are real applications, from various environments.
Evaluating Performance
Which computer is faster?
• Total Execution Time
  Performance B / Performance A = ___ seconds / ___ seconds
  Computer B = ___ times faster than Computer A
• Arithmetic Mean
• Weighted Arithmetic Mean
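The three summary measures named on this slide can be sketched in Python as below. The execution times of the original comparison are not preserved in this transcript, so the per-program times and the weights are assumptions used only to show how the measures are computed.

```python
def total_time(times: list) -> float:
    """Total execution time of a workload."""
    return sum(times)


def arithmetic_mean(times: list) -> float:
    """Arithmetic mean of the execution times (every program counts equally)."""
    return sum(times) / len(times)


def weighted_arithmetic_mean(times: list, weights: list) -> float:
    """Mean with each program weighted by its share of the workload."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * t for w, t in zip(weights, times))


# Assumed execution times in seconds for two programs, NOT from the slides.
times_a = [1.0, 1000.0]   # Computer A
times_b = [10.0, 100.0]   # Computer B

ratio = total_time(times_a) / total_time(times_b)
print(f"By total time, B is {ratio:.1f} times faster than A")
print(f"Arithmetic means: A = {arithmetic_mean(times_a)}, B = {arithmetic_mean(times_b)}")

# Assumed weights: the first program makes up 90% of the workload.
weights = [0.9, 0.1]
print(f"Weighted means: A = {weighted_arithmetic_mean(times_a, weights):.1f}, "
      f"B = {weighted_arithmetic_mean(times_b, weights):.1f}")
```

The weights stand for how often each program runs in the actual workload, so changing them changes the size of the gap between the two computers.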
SPEC Benchmarks
• Standard Performance Evaluation Corporation
  • CPU Performance
  • Graphics/Workstations Performance
  • High Performance Computing
  • Java Client/Server
  • Mail Servers
  • Network File System
  • Power
  • SIP
  • Virtualization
  • Web Servers
SPEC CPU Benchmarks
• SPEC CPU2006 Suite
  • CINT2006: 12 Integer Benchmarks
  • CFP2006: 17 Floating Point Benchmarks
• The benchmarks exercise:
  • CPU
  • Memory Systems
  • Compilers (Fortran, C, C++)
One benchmark has ½ million lines of C++.
CPU Efficiency
• Core i7: 6073
• Core 2 Duo: 2321
• Pentium IV: 539
• Pentium III: 152
Is the increase in performance due to higher clocks?
CPU Efficiency
• Implementation Efficiency
• Clock-Normalized Scores
Example:
Pentium 3 @ 800 MHz: 152
Pentium 4 @ 3.4 GHz: 539
Normalized Scores: ___ / ___ = ___ and ___ / ___ = ___
• It takes more clocks: CPI was sacrificed to enhance clock rate.
• New instructions: Streaming SIMD Extensions 2 (SSE2)
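The clock-normalized comparison in the example can be sketched in Python using the two scores and clock rates given on the slide. Expressing the result as score per GHz is an assumption here; the slide's own normalized values and units are not preserved in this transcript.

```python
def clock_normalized_score(score: float, clock_rate_ghz: float) -> float:
    """Benchmark score divided by clock rate (score per GHz, an assumed unit)."""
    return score / clock_rate_ghz


# Scores and clock rates from the slide.
pentium3 = clock_normalized_score(152.0, 0.8)   # Pentium 3 @ 800 MHz
pentium4 = clock_normalized_score(539.0, 3.4)   # Pentium 4 @ 3.4 GHz

print(f"Pentium 3: {pentium3:.1f} per GHz")
print(f"Pentium 4: {pentium4:.1f} per GHz")
print("Pentium 4 does less work per clock" if pentium4 < pentium3
      else "Pentium 4 does more work per clock")
```

The raw score of the Pentium 4 is higher, but its score per GHz is lower, which matches the slide's remark that CPI was sacrificed to enhance clock rate.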
Amdahl’s Law
• When introducing an improvement, Execution Time is divided into 2 parts:
  • Affected by the improvement
  • Not affected
Execution Time After Improvement = (Execution Time Affected / Amount of Improvement) + Execution Time Unaffected
Example: How much improvement is required for the multiply hardware to make the program run 5 times faster?
___ = ___ / ___ + ___ : Not Possible
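The relation above can be solved for the required amount of improvement, which also shows when a target is not possible. The Python sketch below does this; the program and multiply times of the original example are not preserved in this transcript, so the 100 s total and 80 s multiply time are assumptions for illustration, and the function names are not from the slides.

```python
from typing import Optional


def time_after_improvement(affected_s: float, unaffected_s: float,
                           improvement: float) -> float:
    """Amdahl's Law: affected time / amount of improvement + unaffected time."""
    return affected_s / improvement + unaffected_s


def required_improvement(affected_s: float, unaffected_s: float,
                         target_s: float) -> Optional[float]:
    """Improvement factor needed to reach target_s, or None if not possible."""
    remaining_s = target_s - unaffected_s
    if remaining_s <= 0:
        # The unaffected part alone already takes at least the target time.
        return None
    return affected_s / remaining_s


# Assumed values, NOT from the slides: 100 s program, 80 s spent in multiplies.
total_s = 100.0
multiply_s = 80.0
other_s = total_s - multiply_s

print(required_improvement(multiply_s, other_s, total_s / 4))  # 4 times faster
print(required_improvement(multiply_s, other_s, total_s / 5))  # 5 times faster
```

With these assumed times, a 5 times faster program would have to finish in 20 s, which the unaffected 20 s already uses up, so no amount of multiply improvement is enough.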
MIPS: Million Instructions Per Second
• No Regard to Instruction Type
  • Instructions Have Different Capabilities
  • Different Computers Have Different Architectures
• Different MIPS for Different Programs, Same CPU
• MIPS Can Vary Inversely With Performance
Example: Which code is faster?
MIPS: Million Instructions Per Second
CPU Clock Cycles1 = (___ + ___ + ___) × ___ = ___
CPU Clock Cycles2 = (___ + ___ + ___) × ___ = ___
Execution Time1 = ___ / ___ = ___ seconds
Execution Time2 = ___ / ___ = ___ seconds
MIPS1 = (___ + ___ + ___) million instr / ___ seconds = ___
MIPS2 = (___ + ___ + ___) million instr / ___ seconds = ___
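The steps above (clock cycles, then execution time, then MIPS) can be sketched in Python as follows. The instruction counts, CPI values, and clock rate of the original example are not preserved in this transcript, so every number below is an assumption chosen to illustrate how MIPS can vary inversely with performance.

```python
# Assumed machine parameters, NOT taken from the slides.
CLOCK_RATE_HZ = 4e9
CPI_PER_GROUP = {"A": 1, "B": 2, "C": 3}


def clock_cycles(counts: dict) -> float:
    """CPU clock cycles = sum of (instruction count x CPI) over all groups."""
    return sum(n * CPI_PER_GROUP[group] for group, n in counts.items())


def execution_time_s(counts: dict) -> float:
    """Execution time = CPU clock cycles / clock rate."""
    return clock_cycles(counts) / CLOCK_RATE_HZ


def mips(counts: dict) -> float:
    """MIPS = instruction count / (execution time x 10^6)."""
    return sum(counts.values()) / (execution_time_s(counts) * 1e6)


# Assumed instruction counts for two versions of the same program.
code1 = {"A": 5e9, "B": 1e9, "C": 1e9}
code2 = {"A": 10e9, "B": 1e9, "C": 1e9}

for name, counts in (("Code 1", code1), ("Code 2", code2)):
    print(f"{name}: {execution_time_s(counts):.2f} s, {mips(counts):.0f} MIPS")
```

With these assumed values the second code has the higher MIPS rating yet the longer execution time, so execution time, not MIPS, is the measure to compare.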
Chapter 4 The End