CS-447– Computer Architecture M,W 10-11:20am Lecture 7 Performance (Cont’d). September 19, 2007 Karem Sakallah ksakalla@qatar.cmu.edu www.qatar.cmu.edu/~msakr/15447-f07/. Today. Lecture & Discussion Next Lecture: Review. Done by now. Read the chapters & slides.
Today • Lecture & Discussion • Next Lecture: Review Done by now • Read the chapters & slides. • Practice the performance examples in the Patterson book.
Assessing & Understanding Performance This chapter discusses how to measure, report, and summarize performance of a computer.
Motivation It is often helpful to have some yardstick by which to compare systems • During development to evaluate different algorithms or optimizations • During purchasing to compare between product offerings • …
Performance • Measure, Report, and Summarize • Make intelligent choices • See through the marketing hype • Key to understanding underlying organizational motivation
Performance • Why is some hardware better than others for different programs? • What factors of system performance are hardware related? (e.g., Do we need a new machine, or a new operating system?) • How does the machine's instruction set affect performance?
Which of these airplanes has the best performance?
Airplane            Passengers  Range (mi)  Speed (mph)
Boeing 737-100      101         630         598
Boeing 747          470         4150        610
BAC/Sud Concorde    132         4000        1350
Douglas DC-8-50     146         8720        544
• How much faster is the Concorde compared to the 747? • How much bigger is the 747 than the Douglas DC-8?
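The two questions on this slide come down to simple ratios. A minimal sketch, using only the figures from the airplane table; the dictionary layout and variable names are illustrative choices, not part of the lecture:

```python
# Figures copied from the airplane table on the slide.
planes = {
    "Boeing 737-100":   {"passengers": 101, "range_mi": 630,  "speed_mph": 598},
    "Boeing 747":       {"passengers": 470, "range_mi": 4150, "speed_mph": 610},
    "BAC/Sud Concorde": {"passengers": 132, "range_mi": 4000, "speed_mph": 1350},
    "Douglas DC-8-50":  {"passengers": 146, "range_mi": 8720, "speed_mph": 544},
}

# "How much faster is the Concorde compared to the 747?"
speed_ratio = planes["BAC/Sud Concorde"]["speed_mph"] / planes["Boeing 747"]["speed_mph"]

# "How much bigger is the 747 than the Douglas DC-8?"
capacity_ratio = planes["Boeing 747"]["passengers"] / planes["Douglas DC-8-50"]["passengers"]

print(f"Concorde is {speed_ratio:.2f}x as fast as the 747")
print(f"747 carries {capacity_ratio:.2f}x the passengers of the DC-8")
```

Which plane is "best" depends on the metric: the Concorde wins on speed, the 747 on capacity, the DC-8 on range.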
Computer Performance • Response Time (latency) — How long does it take for my job to run? — How long does it take to execute a job? — How long must I wait for the database query? • Throughput — How many jobs can the machine run at once? — What is the average execution rate? — How much work is getting done?
Execution Time • Elapsed Time • counts everything (disk and memory accesses, I/O , etc.) • a useful number, but often not good for comparison purposes
Execution Time • CPU time • doesn't count I/O or time spent running other programs • can be broken up into system time, and user time • Our focus: user CPU time • time spent executing the lines of code that are "in" our program
Definition of Performance • For some program running on machine X: Performance_X = 1 / Execution time_X • "X is n times faster than Y" means: Performance_X / Performance_Y = n
Definition of Performance Problem: • machine A runs a program in 20 seconds • machine B runs the same program in 25 seconds
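Applying the definition of performance to machines A and B gives the ratio directly. A small sketch; the variable names are illustrative:

```python
# Execution times from the slide, in seconds.
time_a = 20.0
time_b = 25.0

# Performance is the reciprocal of execution time.
perf_a = 1.0 / time_a
perf_b = 1.0 / time_b

# "A is n times faster than B" means perf_a / perf_b = n,
# which is the same as time_b / time_a.
n = perf_a / perf_b

print(f"A is {n:.2f} times faster than B")
```

Because performance is the reciprocal of time, the ratio of performances is the inverse of the ratio of execution times.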
Comparing and Summarizing Performance How to compare the performance? Total Execution Time : A Consistent Summary Measure
Clock Cycles • Instead of reporting execution time in seconds, we often use cycles • Clock “ticks” indicate when to start activities (one abstraction). [Figure: clock ticks along a time axis]
Clock cycles • cycle time = time between ticks = seconds per cycle • clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec) • A 4 GHz clock has a 250 ps cycle time
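The 4 GHz figure can be checked by taking the reciprocal of the clock rate. A minimal sketch; the helper name `cycle_time_seconds` is made up for illustration:

```python
def cycle_time_seconds(clock_rate_hz: float) -> float:
    """Cycle time is the reciprocal of the clock rate."""
    return 1.0 / clock_rate_hz

clock_rate_hz = 4e9                       # 4 GHz, from the slide
cycle_time = cycle_time_seconds(clock_rate_hz)
cycle_time_ps = cycle_time * 1e12         # seconds -> picoseconds

print(f"{clock_rate_hz / 1e9:.0f} GHz -> {cycle_time_ps:.0f} ps per cycle")
```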
CPU Execution Time • CPU execution time for a program = (CPU clock cycles for a program) x (clock cycle time) • Seconds/Program = (Cycles/Program) x (Seconds/Cycle) • Equivalently, since clock rate = cycles/second: Seconds/Program = (Cycles/Program) / (clock rate)
How to Improve Performance • So, to improve performance (everything else being equal) you can either (increase or decrease?) ________ the # of required cycles for a program, or ________ the clock cycle time or, said another way, ________ the clock rate.
How to Improve Performance • So, to improve performance (everything else being equal) you can either _decrease_ the # of required cycles for a program, or _decrease_ the clock cycle time or, said another way, _increase_ the clock rate.
How many cycles are required for a program? • Could assume that the # of cycles equals the # of instructions [Figure: 1st, 2nd, 3rd, ... 6th instructions, one per cycle, along a time axis] • This assumption is incorrect: different instructions take different amounts of time on different machines.
Different numbers of cycles for different instructions • Multiplication takes more time than addition • Floating point operations take longer than integer ones • Accessing memory takes more time than accessing registers • Important point: changing the cycle time often changes the number of cycles required for various instructions
CPI • CPU clock cycles = Instructions for a program x Average clock cycles per instruction (CPI) • CPU time = Instruction count x CPI x clock cycle time
Performance • Performance is determined by execution time • Do any of the other variables equal performance? • # of cycles to execute program? • # of instructions in program? • # of cycles per second? • average # of cycles per instruction? • average # of instructions per second? • Common pitfall: thinking one of the variables is indicative of performance when it really isn’t.
CPU Clock Cycles • CPU clock cycles = CPI_1 x C_1 + … + CPI_n x C_n (the sum of CPI_i x C_i over all instruction classes i) • CPI_i: the average number of cycles per instruction for instruction class i • C_i: the number of instructions of class i executed • n: the number of instruction classes
Example • Instruction Classes: • Add • Multiply • Average Clock Cycles per Instruction: • Add 1cc • Mul 3cc • Program A executed: • 10 Add instructions • 5 Multiply instructions
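Program A can be worked through with the per-class sum of cycles. A short sketch using the slide's numbers; the dictionary names are illustrative:

```python
# Average clock cycles per instruction for each class (from the slide).
cpi_by_class = {"add": 1, "mul": 3}

# Instructions of each class executed by Program A (from the slide).
count_by_class = {"add": 10, "mul": 5}

# CPU clock cycles = sum over classes of (CPI of class) x (count of class).
total_cycles = sum(cpi_by_class[c] * count_by_class[c] for c in count_by_class)
total_instructions = sum(count_by_class.values())

# Overall CPI is total cycles divided by total instructions.
average_cpi = total_cycles / total_instructions

print(f"cycles = {total_cycles}, instructions = {total_instructions}, CPI = {average_cpi:.2f}")
```

The overall CPI lands between the two class CPIs, weighted toward the more frequent Add instructions.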
Quiz An application using a desktop client and a remote server is limited by network performance. What happens to response time and throughput when: • An extra network channel is added • Networking software is upgraded to reduce communications delay • More memory is added to the desktop computer
Formula Summary • T: Execution Time (seconds) • C: Total Number of Cycles • f: Clock Frequency (cycles/second) • I: (Dynamic) Instruction Count • I_j: Count for Instructions of type j • C_j: Cycles per Instruction of type j • T = C / f • C = I_1 x C_1 + … + I_k x C_k • I = I_1 + I_2 + … + I_k • CPI = C / I • T = (I x CPI) / f
Performance Calculation Example: fact(4)
fact:
    pushl %ebp            # Setup
    movl  %esp,%ebp       # Setup
    movl  $1,%eax         # eax = 1
    movl  8(%ebp),%edx    # edx = x
L11:
    imull %edx,%eax       # result *= x
    decl  %edx            # x--
    cmpl  $1,%edx         # Compare x:1
    jg    L11             # if > repeat
    movl  %ebp,%esp       # Finish
    popl  %ebp            # Finish
    ret                   # Finish
f = 1 GHz
Calculate: T, C, I, & CPI when fact is executed with input x = 4
Performance Calculation Example: fact(4)
fact:
    pushl %ebp
    movl  %esp,%ebp
    movl  $1,%eax
    movl  8(%ebp),%edx
L11:
    imull %edx,%eax
    decl  %edx
    cmpl  $1,%edx
    jg    L11
    movl  %ebp,%esp
    popl  %ebp
    ret
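The dynamic instruction count I for fact(4) can be found by tracing the listing. The sketch below mirrors the loop in Python. The per-instruction cycle costs do not survive in this transcript, so `assumed_cpi = 1.0` is a placeholder assumption used only to show how C and T follow from I; the function name is also made up for illustration.

```python
def count_fact_instructions(x: int):
    """Trace the fact listing and return (dynamic instruction count, result)."""
    setup = 4       # pushl, movl, movl, movl
    loop_body = 4   # imull, decl, cmpl, jg
    finish = 3      # movl, popl, ret

    result = 1
    iterations = 0
    while True:               # the loop body always runs at least once
        result *= x           # imull %edx,%eax
        x -= 1                # decl %edx
        iterations += 1
        if not x > 1:         # cmpl $1,%edx ; jg falls through when x <= 1
            break
    return setup + loop_body * iterations + finish, result

I, result = count_fact_instructions(4)

f = 1e9             # 1 GHz, from the slide
assumed_cpi = 1.0   # ASSUMPTION: placeholder, not a value from the lecture
C = I * assumed_cpi
T = C / f

print(f"I = {I} instructions, fact(4) = {result}")
print(f"With the assumed CPI of {assumed_cpi}: C = {C:.0f} cycles, T = {T * 1e9:.0f} ns")
```

For x = 4 the loop runs three times, so I = 4 + 3 x 4 + 3 = 19. With real per-instruction cycle counts, C would be the per-class sum and CPI = C / I.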
Benchmarks • Performance best determined by running a real application • Use programs typical of expected workload • Or, typical of expected class of applications, e.g.: compilers/editors, scientific applications, graphics • Small benchmarks • nice for architects and designers • easy to standardize • can be abused
Benchmarks (2) • SPEC (Standard Performance Evaluation Corporation) • companies have agreed on a set of real programs and inputs • valuable indicator of performance (and compiler technology) • can still be abused
Standard Performance Evaluation Corporation • SPEC is supported by a number of computer vendors to create standard sets of benchmarks for modern computer systems. • The SPEC benchmark sets include CPU performance, graphics, High-performance computing, Object-oriented computing, Java applications, Client-server models, Mail systems, File systems, and Web servers.
SPEC ‘89 • Compiler “enhancements” and performance
SPEC CPU Benchmarks CINT2000 : the SPEC ratio for the integer benchmark sets CFP2000 : the SPEC ratio for the floating-point benchmark sets.
SPEC 2000 Does doubling the clock rate double the performance? Can a machine with a slower clock rate have better performance?
Amdahl's Law • Execution Time After Improvement = Execution Time Unaffected + (Execution Time Affected / Amount of Improvement)
Example • Application execution time = 20sec • 12 seconds are spent performing add operations • If we improve the add operation to run twice as fast, how much faster will the application run?
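This example can be worked through with Amdahl's Law as stated on the previous slide. A minimal sketch; the function name `time_after_improvement` is illustrative:

```python
def time_after_improvement(total: float, affected: float, improvement: float) -> float:
    """Amdahl's Law: unaffected time + (affected time / amount of improvement)."""
    unaffected = total - affected
    return unaffected + affected / improvement

total_time = 20.0   # seconds, whole application
add_time = 12.0     # seconds spent in add operations
new_time = time_after_improvement(total_time, add_time, improvement=2.0)
speedup = total_time / new_time

print(f"New execution time: {new_time:.0f} s, overall speedup: {speedup:.2f}x")
```

Doubling the speed of adds yields well under a 2x overall speedup, because the other 8 seconds are untouched.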
Amdahl’s Law • Example:"Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?"
Amdahl's Law Execution time after improvement
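For the multiply example, Amdahl's Law can be solved for the required amount of improvement. A sketch under the slide's numbers; the function name is hypothetical, and the 5x case is an extra illustration not taken from the slide:

```python
def required_improvement(total: float, affected: float, target_speedup: float):
    """Solve Amdahl's Law for the improvement factor needed on the affected part.

    Returns None when the target cannot be reached by speeding up
    the affected part alone.
    """
    target_time = total / target_speedup
    unaffected = total - affected
    budget = target_time - unaffected   # time left for the improved part
    if budget <= 0:
        return None
    return affected / budget

# 100 s program, 80 s in multiply, want the program to run 4 times faster.
n = required_improvement(total=100.0, affected=80.0, target_speedup=4.0)
print(f"Multiply must become {n:.0f}x faster")

# Extra illustration: a 5x target leaves no time at all for multiplication.
print(required_improvement(total=100.0, affected=80.0, target_speedup=5.0))
```

The second call shows the limit: the 20 seconds not spent in multiply bound the achievable overall speedup.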
MIPS (million instructions per second) Example • Which code sequence will execute faster according to MIPS? • According to execution time?
Execution time & MIPS • CPU clock cycles_1 = (5 x 1 + 1 x 2 + 1 x 3) x 10^9 = 10 x 10^9 • CPU clock cycles_2 = (10 x 1 + 1 x 2 + 1 x 3) x 10^9 = 15 x 10^9 • Execution time_1 = (10 x 10^9) / (4 x 10^9) = 2.5 seconds • Execution time_2 = (15 x 10^9) / (4 x 10^9) = 3.75 seconds
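The MIPS ratings for the two code sequences can be derived from the same numbers. The sketch below reads the instruction counts (in billions), the class CPIs of 1, 2 and 3, and the 4 x 10^9 Hz clock rate off the arithmetic on this slide; that reading is an interpretation, since the table behind the example is not in the transcript.

```python
cpi = [1, 2, 3]                      # cycles per instruction, per class
counts_billion = {
    "sequence 1": [5, 1, 1],         # instructions per class, in billions
    "sequence 2": [10, 1, 1],
}
clock_rate_hz = 4e9

results = {}
for name, counts in counts_billion.items():
    instructions = sum(counts) * 1e9
    cycles = sum(n * c for n, c in zip(counts, cpi)) * 1e9
    exec_time = cycles / clock_rate_hz            # seconds
    mips = instructions / (exec_time * 1e6)       # million instructions per second
    results[name] = (exec_time, mips)
    print(f"{name}: time = {exec_time:.2f} s, MIPS = {mips:.0f}")
```

Sequence 2 earns the higher MIPS rating yet takes longer to run, which is why MIPS alone is a misleading measure.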
Performance Evaluation • Performance depends on • Hardware architecture • Software environment • Meaning of performance depends on viewpoint • User: time • System Manager: throughput
Performance Evaluation • Kinds of Performance • Graphics • Network • Transactional • Multi-user system • I/O • Scientific/Engineering codes
Example on the MIPS R10K
Prof run at: Tue Apr 28 15:50:26 1998
Command line: prof suboptim.ideal.m28293
  109148754: Total number of cycles
   0.55974s: Total execution time
   77660914: Total number of instructions executed
      1.405: Ratio of cycles / instruction
        195: Clock rate in MHz
     R10000: Target processor modelled
cycles(%)          cum %    secs   instrns    calls   procedure
61901843(56.71)    56.71    0.32   45113360   1       pdot
47212563(43.26)    99.97    0.24   32523280   1       init
   31767( 0.03)   100.00    0.00      21523   1       vsum
    1069( 0.00)   100.00    0.00        887   3       fflush
:                 :         :      :          :       :
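The summary lines of this prof report can be cross-checked against the formulas T = C / f and CPI = C / I. A small sketch using only the numbers printed in the report:

```python
# Values copied from the prof report.
total_cycles = 109_148_754
total_instructions = 77_660_914
clock_rate_hz = 195e6            # 195 MHz

cpi = total_cycles / total_instructions      # CPI = C / I
exec_time = total_cycles / clock_rate_hz     # T = C / f

# Share of cycles spent in pdot, the top procedure in the report.
pdot_share = 61_901_843 / total_cycles * 100

print(f"CPI  = {cpi:.3f}")
print(f"T    = {exec_time:.5f} s")
print(f"pdot = {pdot_share:.2f}% of cycles")
```

The computed values match the report's 1.405 cycles per instruction, 0.55974 s, and 56.71% for pdot.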