Performance
• What is performance?
• How to measure performance?
• Performance metrics
• Performance evaluation
• Why does some hardware perform better than others for different programs?
• What factors in hardware are related to performance?
• How does the machine's instruction set affect performance?

Airplane Analogy

Airplane          Passenger capacity   Range (miles)   Speed (m.p.h.)   Passenger throughput (passengers × m.p.h.)
Boeing 777        375                  4630            610              228,750
Boeing 747        470                  4150            610              286,700
Airbus A3xx       656                  8400            600              393,600
Concorde          132                  4000            1350             178,200
Douglas DC-8-50   146                  8720            544              79,424

• Which of these airplanes has the best performance?

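The point of the analogy is that "best" depends on the metric chosen. The sketch below is only an illustration: it uses the capacity, range, and speed figures from the table, and the helper name `best_by` is made up for this example.

```python
# (passenger capacity, range in miles, speed in m.p.h.), taken from the slide's table.
AIRPLANES = {
    "Boeing 777": (375, 4630, 610),
    "Boeing 747": (470, 4150, 610),
    "Airbus A3xx": (656, 8400, 600),
    "Concorde": (132, 4000, 1350),
    "Douglas DC-8-50": (146, 8720, 544),
}


def best_by(metric):
    """Return the name of the airplane that maximizes metric(capacity, range, speed).

    best_by is a hypothetical helper, not something defined in the slides.
    """
    return max(AIRPLANES, key=lambda name: metric(*AIRPLANES[name]))


fastest = best_by(lambda cap, rng, speed: speed)
longest_range = best_by(lambda cap, rng, speed: rng)
largest = best_by(lambda cap, rng, speed: cap)
highest_throughput = best_by(lambda cap, rng, speed: cap * speed)

print("Fastest:           ", fastest)
print("Longest range:     ", longest_range)
print("Largest capacity:  ", largest)
print("Highest throughput:", highest_throughput)
```

Each metric can pick a different winner, which is why a performance claim has to state what is being measured.
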
Computer Performance
• Response time (latency)
  • How long does it take for my job to run?
  • How long does it take to execute a program?
  • How long must I wait for a database query?
• Throughput
  • How many jobs can the machine run at once?
  • What is the average execution rate?
  • How much work is getting done?
• If we upgrade the processor of a machine, which metric do we improve?
• If we add a new machine to a network, which metric do we improve?

Which Time to Measure?
• Elapsed time (wall-clock time, response time)
  • Counts everything (disk and memory access, I/O, operating system overhead, work on other processes)
  • Useful, but not always good for comparison purposes
• CPU (execution) time
  • The time the CPU spends computing for the user task
  • Does not include time spent waiting for I/O or running other programs
  • User CPU time: CPU time spent within the program
  • System CPU time: CPU time spent in the operating system performing tasks on behalf of the program

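The difference between elapsed time and CPU time can be observed from inside a program. The following is a minimal sketch using Python's standard `time` module; the function names `measure` and `busy_then_wait` are invented for this illustration, and the measured values will differ from machine to machine.

```python
import time


def measure(task):
    """Run task() and return (elapsed_seconds, cpu_seconds) for this process."""
    wall_start = time.perf_counter()   # wall-clock (elapsed) timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    task()
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return elapsed, cpu


def busy_then_wait():
    """Hypothetical workload: some computation followed by waiting."""
    sum(i * i for i in range(200_000))  # CPU-bound work, counts toward both times
    time.sleep(0.1)                     # waiting, counts toward elapsed time only


elapsed, cpu = measure(busy_then_wait)
print(f"elapsed = {elapsed:.3f} s, CPU = {cpu:.3f} s")
```

The sleep stands in for I/O waits: it lengthens the elapsed time but adds almost nothing to the CPU time.
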
CPU Time
• The Unix time command reflects this breakdown by returning output such as:
  90.7u 12.9s 2:39 65%
• Interpretation:
  • User CPU time is 90.7 s
  • System CPU time is 12.9 s
  • Elapsed time is 2 min 39 s = 159 s
  • CPU time is (90.7 + 12.9)/159 ≈ 65% of total elapsed time

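The 65% figure can be checked by hand. The sketch below redoes the arithmetic; `cpu_utilization` is a hypothetical helper, and it assumes the elapsed field is in the "m:ss" form shown on the slide.

```python
def cpu_utilization(user_s, system_s, elapsed):
    """Return CPU time (user + system) as a fraction of elapsed time.

    elapsed is a string in "m:ss" form, e.g. "2:39" (assumed format).
    """
    minutes, seconds = elapsed.split(":")
    elapsed_s = int(minutes) * 60 + float(seconds)
    return (user_s + system_s) / elapsed_s


share = cpu_utilization(90.7, 12.9, "2:39")
print(f"{share:.0%}")  # prints 65%
```

The remaining 35% of the elapsed time went to waiting for I/O or to other processes.
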
A Definition of Performance
• For some program running on machine X:
  Performance_X = 1 / Execution_time_X
• Machine X is said to be “n times faster” than machine Y if
  Performance_X / Performance_Y = Execution_time_Y / Execution_time_X = n
• Example: Machine A runs a program in 10 seconds and machine B runs the same program in 15 seconds. How much faster is A than B?

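The example can be worked directly from the definition. This is a small sketch; the function names `performance` and `speedup` are chosen here and do not come from the slides.

```python
def performance(execution_time_s):
    """Performance is defined as the inverse of execution time."""
    return 1.0 / execution_time_s


def speedup(time_x_s, time_y_s):
    """Return n such that X is n times faster than Y (Performance_X / Performance_Y)."""
    return performance(time_x_s) / performance(time_y_s)


n = speedup(10, 15)  # machine A: 10 s, machine B: 15 s
print(f"A is {n:.2f} times faster than B")
```

The ratio equals Execution_time_B / Execution_time_A = 15/10, so A is 1.5 times faster than B.
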
Metrics of Performance
• “Time to execute a program” is the ultimate metric for determining performance
• However, it is convenient to inspect other metrics as well when we examine the details of a machine
• Computers use a clock that runs at a constant rate and determines when an event takes place in hardware
• These discrete time intervals are called clock cycles (or ticks, clock ticks, clock periods)
• Clock rate (frequency) is the inverse of clock period

Clock Cycles
[Figure: clock signal along a time axis; events often start on the rising edge of the clock]
• Clock “ticks” indicate when to start activities
• Instead of reporting execution time in seconds, we often use cycles

Clock Cycle
• Cycle time (CT) = time between ticks = seconds per cycle
• Cycle count (CC): the number of clock cycles to execute a program
• Clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)
• A 200 MHz clock has a 1/(200 × 10^6) s = ? nanosecond cycle time
• A 4 GHz clock has a 1/(4 × 10^9) s = ? nanosecond cycle time

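The two cycle-time questions reduce to inverting the clock rate and converting seconds to nanoseconds. A minimal sketch, with `cycle_time_ns` as an illustrative name:

```python
def cycle_time_ns(clock_rate_hz):
    """Cycle time is the inverse of the clock rate; 1 s = 1e9 ns."""
    return 1e9 / clock_rate_hz


ct_200mhz = cycle_time_ns(200e6)
ct_4ghz = cycle_time_ns(4e9)
print(f"200 MHz -> {ct_200mhz} ns per cycle")
print(f"4 GHz   -> {ct_4ghz} ns per cycle")
```

So a 200 MHz clock ticks every 5 ns and a 4 GHz clock every 0.25 ns.
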
The CPI Metric
• CPI: Clocks Per Instruction
  • The number of cycles spent on an instruction, on average
  • CC = IC × CPI, where IC is the number of instructions in the program
  • Hard to compute
  • Useful when comparing the performance of two machines with the same ISA. (Why?)
• Example: two machines with the same ISA. For a certain program we have
  • Machine A: CPI = 2.0
  • Machine B: CPI = 1.2
• Which machine is faster?
• What if machine A uses a 250 ps and machine B a 500 ps cycle time?

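One way to approach the example: with the same ISA and the same program, both machines execute the same number of instructions, so it is enough to compare the average time per instruction, CPI × cycle time. The sketch below assumes exactly that; the function name is illustrative.

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average time per instruction in picoseconds: CPI x cycle time."""
    return cpi * cycle_time_ps


# Cycle times from the last bullet of the slide.
t_a = time_per_instruction_ps(2.0, 250)
t_b = time_per_instruction_ps(1.2, 500)
ratio = t_b / t_a  # how many times faster A is than B

print(f"A: {t_a:.0f} ps/instruction, B: {t_b:.0f} ps/instruction")
print(f"A is {ratio:.2f} times faster than B")
```

With equal cycle times, B would win on its lower CPI; with these cycle times, A's faster clock more than makes up for its higher CPI.
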
Improving Performance
So, to improve performance:
• Increase the clock frequency (i.e. decrease the clock period)
• Reduce the number of clock cycles per program (IC × CPI)

Instruction = Cycle?
• No!
• The number of cycles per instruction depends on how the instructions are implemented in hardware
• The number differs for each processor (even with the same ISA)

The Reason
• Operations take different numbers of cycles
• Multiplication takes longer than addition
• Floating-point operations take longer than integer operations
• Access to a register is much faster than access to main memory

Simple Formulae for CPU Time
• CPU execution time = CPU clock cycles for a program × Clock cycle time (CC × CT)
• CPU execution time = CPU clock cycles for a program / Clock rate
• We can write: CPU clock cycles for a program = IC × CPI
• Then: CPU execution time = (IC × CPI) / Clock rate

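These formulae translate directly into code. The sketch below is illustrative only: the function name and the sample numbers (10^9 instructions, CPI of 2, 500 MHz) are hypothetical and do not come from the slides.

```python
def cpu_time_s(instruction_count, cpi, clock_rate_hz):
    """CPU execution time = (IC x CPI) / clock rate."""
    clock_cycles = instruction_count * cpi   # CC = IC x CPI
    return clock_cycles / clock_rate_hz


# Hypothetical program and machine, chosen only to exercise the formula.
ic = 1_000_000_000
cpi = 2.0
clock_rate_hz = 500e6

t = cpu_time_s(ic, cpi, clock_rate_hz)

# The same result via CC x CT, with CT = 1 / clock rate.
t_via_cycle_time = (ic * cpi) * (1.0 / clock_rate_hz)

print(f"CPU time = {t} s")
```

Both forms give the same answer because the clock cycle time is the inverse of the clock rate.
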
Example
• Computer A runs at 800 MHz
  • It runs our favorite program in 15 s
• Our goal
  • Design computer B with the same ISA
  • It will run the same program in 8 s
• We may use a new process technology (> GHz)
  • It can increase the clock rate;
  • however, it will also increase CPI by a factor of 1.25
• What clock rate should we aim to use?

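A possible way to work this example, assuming that "increase CPI by 1.25" means B needs 1.25 times as many clock cycles as A for the same program (the same ISA implies the same instruction count). Variable names are chosen for this sketch.

```python
# Given values from the slide.
clock_rate_a_hz = 800e6
time_a_s = 15.0
target_time_b_s = 8.0
cpi_factor = 1.25  # assumption: CPI_B = 1.25 x CPI_A

# Clock cycles A spends on the program: time x clock rate.
cycles_a = time_a_s * clock_rate_a_hz

# Same instruction count, CPI scaled by 1.25, so cycles scale by 1.25 too.
cycles_b = cpi_factor * cycles_a

# Clock rate B needs in order to finish those cycles in the target time.
clock_rate_b_hz = cycles_b / target_time_b_s

print(f"Computer B needs about {clock_rate_b_hz / 1e9:.3f} GHz")
```

Under that assumption the target is 1.875 GHz, more than twice A's clock rate even though the run time is not quite halved.
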
Performance
• Performance is determined by execution time (CPU time)
• We also have other indicators
  • # of cycles to execute a program
  • # of instructions in a program (IC)
  • # of cycles per second
  • Average # of cycles per instruction (CPI)
  • Average # of instructions per second
• Common pitfall: thinking one of the above is indicative of performance when it really isn’t

Number of Instructions Example
• A compiler designer has the following two alternatives for generating a certain piece of code with instructions A (1 cycle), B (2 cycles), and C (3 cycles):
  • 2 × 10^6 of A, 10^6 of B, and 2 × 10^6 of C (IC = 5 × 10^6)
  • 4 × 10^6 of A, 10^6 of B, and 10^6 of C (IC = 6 × 10^6)
• Which code sequence is faster?

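On a single machine the clock rate is fixed, so the faster sequence is the one that needs fewer clock cycles. The sketch below tallies cycles for both alternatives; the names `CYCLES`, `totals`, `seq1`, and `seq2` are chosen for illustration.

```python
# Cycles per instruction class, as given on the slide.
CYCLES = {"A": 1, "B": 2, "C": 3}


def totals(mix):
    """Return (instruction count, clock cycles, CPI) for an instruction mix."""
    ic = sum(mix.values())
    cc = sum(CYCLES[cls] * count for cls, count in mix.items())
    return ic, cc, cc / ic


seq1 = {"A": 2_000_000, "B": 1_000_000, "C": 2_000_000}
seq2 = {"A": 4_000_000, "B": 1_000_000, "C": 1_000_000}

ic1, cc1, cpi1 = totals(seq1)
ic2, cc2, cpi2 = totals(seq2)

print(f"Sequence 1: IC = {ic1}, cycles = {cc1}, CPI = {cpi1}")
print(f"Sequence 2: IC = {ic2}, cycles = {cc2}, CPI = {cpi2}")
```

The second sequence executes more instructions but fewer cycles (9 × 10^6 versus 10 × 10^6), so it is faster. Instruction count alone is misleading.
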
The MIPS Metric
• Millions of Instructions Per Second = MIPS
  MIPS = IC / (Execution_time × 10^6)
  MIPS = IC / (CC × cycle time × 10^6)
  MIPS = (IC × clock rate) / (IC × CPI × 10^6)
  MIPS = clock rate / (CPI × 10^6)
• For the same instruction count, a faster machine has a higher MIPS rating:
  Execution_time = IC / (MIPS × 10^6)

A MIPS Example
• A computer with a 500 MHz clock
• Three different classes of instructions:
  • A (1 cycle), B (2 cycles), C (3 cycles)
• Two compilers are used to produce code for a large piece of software
• Compiler 1:
  • 5 billion A, 1 billion B, and 1 billion C instructions
• Compiler 2:
  • 10 billion A, 1 billion B, and 1 billion C instructions
• Which sequence will be faster according to MIPS?
• Which sequence will be faster according to execution time?

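Both questions can be answered by computing execution time and MIPS for each compiler's output. A minimal sketch using the slide's numbers; `evaluate` and the dictionary names are illustrative.

```python
CLOCK_RATE_HZ = 500e6
CYCLES = {"A": 1, "B": 2, "C": 3}  # cycles per instruction class


def evaluate(mix):
    """Return (execution time in seconds, MIPS rating) for an instruction mix."""
    ic = sum(mix.values())
    cc = sum(CYCLES[cls] * count for cls, count in mix.items())
    exec_time_s = cc / CLOCK_RATE_HZ
    mips = ic / (exec_time_s * 1e6)
    return exec_time_s, mips


compiler1 = {"A": 5_000_000_000, "B": 1_000_000_000, "C": 1_000_000_000}
compiler2 = {"A": 10_000_000_000, "B": 1_000_000_000, "C": 1_000_000_000}

time1, mips1 = evaluate(compiler1)
time2, mips2 = evaluate(compiler2)

print(f"Compiler 1: {time1:.0f} s, {mips1:.0f} MIPS")
print(f"Compiler 2: {time2:.0f} s, {mips2:.0f} MIPS")
```

Compiler 2's code has the higher MIPS rating (400 versus 350) yet takes longer to run (30 s versus 20 s), so the two metrics rank the sequences in opposite order.
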
CPI example
• CPI (cycles and instructions in billions)
  • Compiler 1: CPI = 10/7 ≈ 1.43
  • Compiler 2: CPI = 15/12 = 1.25
• CPU time
  • CPU time = (IC × CPI) / clock rate
• CPI changes according to instruction mix and frequency
• IC × CPI multiplied by the clock cycle time gives the accurate execution time

Problems of MIPS
• MIPS specifies the instruction execution rate
• MIPS does not take into account the capabilities of the instructions
• Thus, it is impossible to compare computers with different ISAs using MIPS
• MIPS is not constant, even on a single machine; it depends on the application
• As we saw in the previous example, MIPS can vary inversely with performance

Overview
• A given program will require
  • Some number of instructions
  • Some number of clock cycles
  • Some number of seconds
• Vocabulary
  • Cycle time: (micro or nano) seconds per cycle
  • Clock rate (frequency): cycles per second
  • CPI: clock cycles per instruction
  • MIPS: millions of instructions per second
  • MFLOPS: millions of floating-point operations per second

Performance
• Performance is ultimately determined by EXECUTION TIME
• Is any of the following metrics a good measure of performance by itself? Why?
  • # of cycles to execute a program
  • # of instructions in a program
  • # of cycles per second
  • Average # of cycles per instruction
  • Average # of instructions per second

Question
• Assuming two machines have the same ISA, which of the following quantities are identical?
  • Clock rate
  • CPI
  • Execution time
  • # of instructions
  • MIPS

Program Performance

HW or SW component     Affects what?              How?
Algorithm              IC, possibly CPI
Programming language   IC, CPI
Compiler               IC, CPI
ISA                    IC, clock rate, CPI