CHAPTER 2

CHAPTER 2 THE ROLE OF PERFORMANCE

Performance
• Measure, report, and summarize
• Make intelligent choices
• Why is some hardware better than others for different programs?
• What factors of system performance are hardware related? (e.g., do we need a new machine, or a new operating system?)
• How does the machine's instruction set affect performance?

Objectives: Performance and Benchmarks
• What do we mean by the performance of a computer, and why are we concerned with it?
• What's the best way to compare the performance of two machines?
• What are benchmarks? How useful are they?
• Performance can be used to:
  • Guide design decisions
  • Compare architectures/implementations/compilers
• However, performance is in the eye of the beholder!
  • Response/execution time: time between the start and completion of a task
  • Throughput: total amount of work done in a given time (number of jobs processed per unit time)

Computer Performance: TIME, TIME, TIME
• Response time (latency)
  • How long does it take for my job to run?
  • How long does it take to execute a job?
  • How long must I wait for the database query?
• Throughput
  • How many jobs can the machine run at once?
  • What is the average execution rate?
  • How much work is getting done?
• If we upgrade a machine with a new processor, what do we increase?
• If we add a new machine to the lab, what do we increase?

Measuring Performance
• Factors that affect performance:
  • How well the program uses the instructions of the machine
  • How well the underlying hardware implements the instructions
  • How well the memory and I/O systems perform
• We will compare the performance of different machines on the same task.
• Performance of machine X for a given program is defined as:
  Performance(X) = 1 / Execution Time(X)
• If the performance of X is better than that of Y:
  Execution Time(Y) > Execution Time(X)
  Performance(X) > Performance(Y)
  because 1 / Execution Time(X) > 1 / Execution Time(Y)
• Speedup of architecture X over Y:
  Performance(X) / Performance(Y) = Execution Time(Y) / Execution Time(X) = n
  meaning X is n times faster than Y

Examples
• Example 1: Machine A does a task in 20 s; machine B does the same task in 25 s.
  • What is the performance of each machine? (P_A = 1/20, P_B = 1/25)
  • How much faster is A than B, i.e., what is the speedup? (T_B/T_A = 25/20 = 5/4 = 1.25)
  • Is "performance" a meaningful metric? (Not on its own: it depends on the task)
• Example 2: Machine A executes a program in 10 s.
  • If machine B is 1.3x faster than A, what is the execution time on machine B? (1.3 = P_B/P_A = T_A/T_B, so T_B = 10/1.3 ≈ 7.7 s)
  • If machine C is 1.5x slower than A, what is the execution time on machine C? (1.5 = P_A/P_C = T_C/T_A, so T_C = 15 s)
• But how do we measure time?
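The definitions of performance and speedup can be checked against the two examples above with a few lines of Python. This is only a sketch: the function and variable names are illustrative and do not come from the slides.

```python
def performance(execution_time_s: float) -> float:
    """Performance is defined as the reciprocal of execution time."""
    return 1.0 / execution_time_s


def speedup(time_slower_s: float, time_faster_s: float) -> float:
    """Speedup of the faster machine: Perf(fast) / Perf(slow) = T(slow) / T(fast)."""
    return performance(time_faster_s) / performance(time_slower_s)


# Example 1: machine A takes 20 s, machine B takes 25 s.
speedup_a_over_b = speedup(time_slower_s=25.0, time_faster_s=20.0)

# Example 2: machine A takes 10 s.
time_a_s = 10.0
time_b_s = time_a_s / 1.3   # B is 1.3x faster than A
time_c_s = time_a_s * 1.5   # C is 1.5x slower than A

print(f"A is {speedup_a_over_b:.2f}x faster than B")
print(f"T_B = {time_b_s:.2f} s, T_C = {time_c_s:.2f} s")
```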

Measuring Computer Time
• The Unix time command reports, for a program:
  • Real time: elapsed time from invocation to termination
  • User CPU time: time the CPU spends executing this task's own code
  • System CPU time: time the operating system spends working on behalf of this task
• These measures (especially real, i.e., elapsed, time) are what users perceive. Is this response time or throughput?
• How do you measure portions of a program? How do you measure time on Windows?
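One way to time a portion of a program, as the last bullet asks, is to read a clock before and after the region of interest. The sketch below uses Python's standard library, which is available on both Unix and Windows: time.perf_counter() for elapsed (real) time and time.process_time() for the combined user and system CPU time of the current process. The measure helper and the busy_loop workload are hypothetical names chosen for illustration.

```python
import time


def measure(task):
    """Run task() and return (elapsed_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # real (elapsed) time
    cpu_start = time.process_time()    # user + system CPU time of this process
    task()
    cpu_s = time.process_time() - cpu_start
    wall_s = time.perf_counter() - wall_start
    return wall_s, cpu_s


def busy_loop():
    """A small CPU-bound workload to time (illustrative only)."""
    total = 0
    for i in range(200_000):
        total += i * i
    return total


wall_s, cpu_s = measure(busy_loop)
print(f"real: {wall_s:.4f} s, CPU (user + system): {cpu_s:.4f} s")
```

Elapsed time also includes time spent waiting (for I/O or for other processes), so it is normally at least as large as the CPU time for a single-threaded task.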

Clock Cycles, Clock Rate and Execution Time
• Computers are constructed using a clock that runs at a constant rate and determines when events take place in hardware. These discrete time intervals are called clock cycles, ticks, clock periods, or simply cycles.
• The length of a clock period is the time for a complete clock cycle (e.g., 2 nanoseconds, 2 ns).
• Clock rate is the number of cycles per second, often expressed in megahertz (MHz). Clock rate is the inverse of clock period: clock rate = 1 / cycle time.
• What is the clock rate for a 2 ns cycle? 1/(2 × 10^-9) = 500 × 10^6 Hz = 500 MHz
• What is the clock period for a machine with a clock rate of 800 MHz? (Answer: 1/(800 × 10^6) = 1.25 × 10^-9 s)
• What is the clock period for a machine with a clock rate of 400 MHz? (Answer: 1/(400 × 10^6) = 2.5 × 10^-9 s)
• Relationship: the faster the clock rate, the shorter the clock period.

Clock Cycles, Clock Rate and Execution Time
• Instead of reporting execution time in seconds, we often use cycles.
• Clock "ticks" indicate when to start activities (one abstraction):
  • cycle time (clock period) = time between ticks = seconds per cycle
  • clock rate (frequency) = cycles per second (1 Hz = 1 cycle/s)
• A 200 MHz clock ticks 200 × 10^6 times per second.
• A 200 MHz clock has cycle time 1/(200 × 10^6) s = 5 × 10^-9 s = 5 ns.
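Because clock rate and clock period are reciprocals, converting between them is a single division. The sketch below (helper names are illustrative, not from the slides) reproduces the conversions discussed on these slides, including the 200 MHz clock.

```python
def period_from_rate(clock_rate_hz: float) -> float:
    """Clock period in seconds for a given clock rate in hertz."""
    return 1.0 / clock_rate_hz


def rate_from_period(clock_period_s: float) -> float:
    """Clock rate in hertz for a given clock period in seconds."""
    return 1.0 / clock_period_s


MHZ = 1e6   # hertz per megahertz
NS = 1e-9   # seconds per nanosecond

print(f"2 ns cycle  -> {rate_from_period(2 * NS) / MHZ:.0f} MHz")
for rate_mhz in (200, 400, 800):
    period_ns = period_from_rate(rate_mhz * MHZ) / NS
    print(f"{rate_mhz} MHz -> {period_ns:.2f} ns per cycle")
```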

Clock Cycles, Clock Rate and Execution Time
• How do we calculate execution time?
• Factors:
  • How many cycles does it take to do all the work?
  • How long does each cycle take (clock period)?
• Calculation of time using clock period (cycle period, cycle length):
  CPU Exec Time = # clock cycles × clock period
  [Units] seconds = cycles × seconds/cycle
• Example: Assume a program requires 200 × 10^6 cycles on a machine where each cycle takes 2 ns. What is the execution time? (200 × 10^6 × 2 × 10^-9 = 0.4 s)
• Calculation of time using clock rate (cycle frequency, clock frequency):
  Clock period = 1 / clock rate
  Therefore: Execution Time = # clock cycles / clock rate
  [Units] seconds = cycles / (cycles/second)
• Example: Assume a program requires 200 × 10^6 cycles on a machine with a clock rate of 500 MHz. What is the execution time? (200 × 10^6 / (500 × 10^6) = 0.4 s)
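The two forms of the execution-time formula are equivalent, which a short sketch can confirm using the example numbers from this slide. The function names are illustrative.

```python
def exec_time_from_period(clock_cycles: float, clock_period_s: float) -> float:
    """seconds = cycles * seconds/cycle"""
    return clock_cycles * clock_period_s


def exec_time_from_rate(clock_cycles: float, clock_rate_hz: float) -> float:
    """seconds = cycles / (cycles/second)"""
    return clock_cycles / clock_rate_hz


cycles = 200e6
time_via_period_s = exec_time_from_period(cycles, 2e-9)   # 2 ns per cycle
time_via_rate_s = exec_time_from_rate(cycles, 500e6)      # 500 MHz

print(f"via period: {time_via_period_s:.2f} s, via rate: {time_via_rate_s:.2f} s")
```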

Examples
• Example 1: Machine A runs at 500 MHz. Machine B runs at 650 MHz. Program 1 requires 100 × 10^6 clock cycles on machine A and 1.2 times that many on machine B. Which machine is faster? By how much?
  Exec(A) = 100 × 10^6 / (500 × 10^6) = 0.2 s, or equivalently 100 × 10^6 × 2 × 10^-9 = 200 × 10^-3 = 0.2 s
  Exec(B) = 120 × 10^6 / (650 × 10^6) ≈ 0.185 s
  Machine B is 0.2/0.185 ≈ 1.08 times faster than A.
  Compare: B has 650/500 = 1.3 times the clock rate.
• Example 2: A program takes 10 seconds on a 500 MHz machine.
  • How many cycles must it require?
    Cycles = 10 s × 500 × 10^6 cycles/s = 5000 × 10^6 cycles
  • What clock rate would be needed to achieve a 1.2 times speedup (assuming the number of clock cycles stays the same)?
    Target execution time: 10/1.2 ≈ 8.33 s
    Clock rate: 5000 × 10^6 / 8.33 ≈ 600 MHz
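Rounding intermediate results can noticeably shift the final answer in these examples, so it may help to carry full precision. The sketch below recomputes both examples that way, giving about 1.08 for the speedup and 600 MHz for the required clock rate. The function name required_clock_rate is hypothetical, and it assumes (as the example does) that the cycle count does not change with the clock rate.

```python
def required_clock_rate(clock_cycles: float, current_rate_hz: float,
                        target_speedup: float) -> float:
    """Clock rate needed for a target speedup, assuming the cycle count is unchanged."""
    current_time_s = clock_cycles / current_rate_hz
    target_time_s = current_time_s / target_speedup
    return clock_cycles / target_time_s


# Example 1: same program, different cycle counts and clock rates.
time_a_s = 100e6 / 500e6           # machine A
time_b_s = (1.2 * 100e6) / 650e6   # machine B needs 1.2x the cycles
speedup_b_over_a = time_a_s / time_b_s

# Example 2: 10 s on a 500 MHz machine, target speedup 1.2.
cycles = 10 * 500e6
needed_rate_hz = required_clock_rate(cycles, 500e6, 1.2)

print(f"B is {speedup_b_over_a:.3f}x faster than A")
print(f"needed clock rate: {needed_rate_hz / 1e6:.0f} MHz")
```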

How many cycles are required for a program?
[Figure: the 1st through 6th instructions laid out one after another along a time axis]
• We could assume that # of cycles = # of instructions.
• This assumption is incorrect: different instructions take different amounts of time on different machines. Why? Hint: remember that these are machine instructions, not lines of C code.

Different numbers of cycles for different instructions
• Multiplication takes more time than addition.
• Floating-point operations take longer than integer ones.
• Accessing memory takes more time than accessing registers.
• Important point: changing the cycle time often changes the number of cycles required for various instructions (more later).

Cycles per Instruction (CPI)
• Knowing the number of cycles per instruction (CPI) helps software designers avoid instructions with a high CPI in favor of those with a low CPI.
• Program CPI = average number of clock cycles per instruction.
• CPI depends on the hardware implementation and the instruction mix. We may calculate it from instruction counts or from relative instruction frequencies.
• Example 1: Assume 3 types of instructions:
  • Arithmetic (=, +, -, *, /) takes 4 cycles
  • Conditional (if) takes 3 cycles
  • I/O takes 5 cycles
  Consider the following code segment:
    cin >> num1;
    cin >> num2;
    num3 = num1 + num2;
    if (num3 > 10) cout << "yes"; else cout << "no";
  a) How many cycles to complete? (5 + 5 + 8 + 3 + 5 = 26 cycles; the 8 covers one addition and one assignment)
  b) What's the average number of cycles per instruction? (Counting each of the 5 executed statements as one instruction: 26/5 = 5.2 cycles. Counting the addition and the assignment separately gives 6 instructions and 26/6 ≈ 4.3 cycles.)

Program Cycles per Instruction (CPI)
• CPI calculation with instruction counts: since CPI = CPU Clock Cycles / Instruction Count, the overall program CPU Clock Cycles = Σ(CPI_i × Count_i), so that CPI = Overall Program Cycles / # Instructions.
• Example 2: Assume Class A CPI = 1, Class B CPI = 2, Class C CPI = 3. The program requires 5 A, 3 B, and 2 C instructions. What is the CPI?
  # CPU Cycles = 5 × 1 + 3 × 2 + 2 × 3 = 17
  # Instructions = 5 + 3 + 2 = 10
  Therefore CPI = 17 cycles / 10 instructions = 1.7 cycles/instruction
• CPI calculation with relative frequencies: let f_i be the relative frequency of instruction class i, which takes CPI_i cycles per instruction. Then Program CPI = Σ(CPI_i × f_i).
• Example 3: Assume Class A CPI = 1, Class B CPI = 2, Class C CPI = 3, and the program uses 50% A, 30% B, and 20% C instructions. What is the CPI?
  CPI = 0.5 × 1 + 0.3 × 2 + 0.2 × 3 = 1.7
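Both ways of computing program CPI can be sketched in a few lines, using the class CPIs, counts, and frequencies from Examples 2 and 3. The function names and the tuple layout are illustrative choices, not part of the slides.

```python
def cpi_from_counts(classes):
    """classes: iterable of (cpi_i, count_i) pairs, one per instruction class."""
    classes = list(classes)
    total_cycles = sum(cpi * count for cpi, count in classes)
    total_instructions = sum(count for _, count in classes)
    return total_cycles / total_instructions


def cpi_from_frequencies(classes):
    """classes: iterable of (cpi_i, f_i) pairs, where the f_i sum to 1."""
    return sum(cpi * freq for cpi, freq in classes)


# Example 2: 5 class-A, 3 class-B, and 2 class-C instructions.
cpi_counts = cpi_from_counts([(1, 5), (2, 3), (3, 2)])

# Example 3: 50% A, 30% B, 20% C.
cpi_freqs = cpi_from_frequencies([(1, 0.5), (2, 0.3), (3, 0.2)])

print(f"CPI from counts: {cpi_counts:.1f}, CPI from frequencies: {cpi_freqs:.1f}")
```

The two results agree because dividing each count by the total instruction count gives exactly the relative frequency of that class.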

Program Cycles per Instruction (CPI)
• Why is CPI = Σ(CPI_i × f_i) true?
  CPI = CPU Clock Cycles / Instr. Count
      = Σ(CPI_i × Count_i) / Instr. Count
      = Σ(CPI_i × Count_i / Instr. Count)
      = Σ(CPI_i × f_i)

Execution Time
• Execution Time = # Cycles × cycle time
                 = (CPI × Instr. Count) × cycle time
                 = Instruction Count × CPI × cycle time
                 = (Instruction Count × CPI) / Clock Rate
• Example 1: How long would it take to execute a program with 100 × 10^6 instructions if the CPI is 3 and the clock rate is 500 MHz? (Answer: Time = 100 × 10^6 × 3 / (500 × 10^6) = 3/5 = 0.6 s)

Improving Computer Performance
• Time = Instruction Count × CPI × cycle time
  Time = (Instructions / Program) × (# Cycles / Instruction) × (Seconds / Cycle)
• For a given instruction set architecture, increases in CPU performance come from three sources:
  • Increases in clock rate
  • Improvements in processor organization that lower the CPI
  • Compiler enhancements that lower the instruction count or generate a lower average CPI
• Which source was used to improve performance by:
  • Using an Intel Pentium III 933 MHz instead of an Intel Pentium III 800 MHz?
  • Using an Intel Pentium IV instead of an Intel Pentium III?
  • Using release versions instead of debug versions of programs?
• Very important: when comparing two machines, you must consider all three components of execution time. If some factors are identical, the comparison can be based on just the non-identical factors.
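The effect of each of the three sources can be seen by changing one factor at a time in Time = Instruction Count × CPI / Clock Rate. The sketch below starts from the earlier example (100 × 10^6 instructions, CPI 3, 500 MHz, 0.6 s). The three "improved" configurations are hypothetical numbers picked purely for illustration, and the function name cpu_time is likewise an assumption.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """Execution time in seconds: instructions * cycles/instruction / (cycles/second)."""
    return instruction_count * cpi / clock_rate_hz


# Baseline from the earlier example.
base_s = cpu_time(100e6, 3.0, 500e6)

# Hypothetical improvements, each changing exactly one factor.
faster_clock_s = cpu_time(100e6, 3.0, 600e6)        # higher clock rate
lower_cpi_s = cpu_time(100e6, 2.5, 500e6)           # better processor organization
fewer_instructions_s = cpu_time(80e6, 3.0, 500e6)   # better compiler output

for label, t in [("faster clock", faster_clock_s),
                 ("lower CPI", lower_cpi_s),
                 ("fewer instructions", fewer_instructions_s)]:
    print(f"{label}: {t:.2f} s, speedup {base_s / t:.2f}x")
```

In practice the factors interact (for example, a higher clock rate may raise the CPI), which is why all three must be considered together when comparing machines.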

Improving Computer Performance: RISC vs. CISC
• Time = (Instructions / Program) × (# Cycles / Instruction) × (Seconds / Cycle)
• Computer architectures can be categorized as RISC or CISC (Reduced Instruction Set Computer vs. Complex Instruction Set Computer).
• The CISC approach attempts to minimize the number of instructions per program, at the cost of more cycles per instruction.
  • Emphasizes hardware
  • Includes complex instructions that take multiple clock cycles
• RISC does the opposite, reducing the cycles per instruction at the cost of more instructions per program.
  • Emphasizes software
  • Uses only simple (reduced) instructions that execute in a single clock cycle
• Modern architectures emphasize RISC.

Improving Computer Performance
• Example 2: Machine 1 and Machine 2 both have clock speeds of 500 MHz. On Machine 1, program P requires 100 × 10^6 instructions and has a CPI of 2.5. On Machine 2, program P requires 90 × 10^6 instructions and has a CPI of 3. Which machine is faster? By how much? (T1 = 0.5 s, T2 = 0.54 s; Machine 1 is 1.08 times faster)

Evaluating Computer Performance:
• A company that runs the same set of programs day in, day out can use those programs (its workload) to compare systems (e.g., old vs. new).
• What if a company does not fall into this category? Use some kind of rating.

Evaluating Computer Performance:
• Goal: a simple metric where a higher rating means better performance. Some ratings are:
  • Native MIPS
  • Peak MIPS
  • Relative MIPS
  • MOPS, MFLOPS
• For all these measures, there is a tendency to generalize, which is not valid.
• Benchmarks: programs specifically chosen to measure performance. The organization in charge of benchmarks is the System Performance Evaluation Cooperative (SPEC). The rating is the SPEC ratio with respect to some standard machine.
• The higher the SPEC ratio, the better the machine.
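The warning about generalizing from a rating can be illustrated with native MIPS. The slides do not define it, so the sketch below assumes the usual textbook definition: instruction count divided by (execution time × 10^6). The two machines, X and Y, are hypothetical and their numbers are invented for illustration. Machine X earns the higher MIPS rating yet takes longer to run the program, because it needs more instructions to do the same work.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """Execution time in seconds."""
    return instruction_count * cpi / clock_rate_hz


def native_mips(instruction_count: float, exec_time_s: float) -> float:
    """Millions of instructions executed per second (assumed textbook definition)."""
    return instruction_count / (exec_time_s * 1e6)


CLOCK_HZ = 500e6  # both hypothetical machines run at 500 MHz

# Hypothetical machine X: many simple instructions.
time_x_s = cpu_time(150e6, 1.5, CLOCK_HZ)
mips_x = native_mips(150e6, time_x_s)

# Hypothetical machine Y: fewer, slower instructions for the same program.
time_y_s = cpu_time(90e6, 2.0, CLOCK_HZ)
mips_y = native_mips(90e6, time_y_s)

print(f"X: {time_x_s:.2f} s at {mips_x:.0f} MIPS")
print(f"Y: {time_y_s:.2f} s at {mips_y:.0f} MIPS")
```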

SPEC ’89 for IBM Powerstation 550 • Compiler “enhancements” and performance

Summary
• The performance of a computer can be measured by response/execution time (time between the start and completion of a task) and by throughput (total amount of work done in a given time).
• The factors determining execution time are the number of cycles needed to do all the work and how long each cycle takes (clock period).
• CPI helps software designers avoid instructions with a high CPI in favor of those with a low CPI where possible.
• Program CPI can be obtained from instruction counts or from relative instruction frequencies.
• Improving performance means decreasing
  Time = Instruction Count × CPI × cycle time = (Instr. / Program) × (# Cycles / Instr.) × (Seconds / Cycle)
  by:
  • Increases in clock rate
  • Improvements in processor organization that lower the CPI
  • Compiler enhancements that lower the instruction count or generate a lower average CPI
• Ratings of computer performance include MIPS, MOPS, and MFLOPS, as well as benchmark results.

Performance Formulas
