Performance Evaluation of Architectures, Vittorio Zaccaria
Performance Evaluation • From the client perspective: • response time (or latency): time to run the task. • From the server perspective: • Throughput (or bandwidth): tasks executed per second.
Speedup • X is n% faster than Y if: $\text{Speedup}(x,y) = \frac{\text{ExTime}(y)}{\text{ExTime}(x)} = 1 + \frac{n}{100}$
Performance and Speedup • Performance(A)=1/ExTime(A). • Speedup(x,y)= Performance(x)/Performance(y)
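The two definitions of speedup above (ratio of execution times, ratio of performances) can be cross-checked numerically. This is a minimal sketch; the helper names `performance`, `speedup` and `percent_faster` are hypothetical, and the execution times are made up.

```python
def performance(ex_time: float) -> float:
    """Performance is the reciprocal of execution time."""
    return 1.0 / ex_time


def speedup(ex_time_x: float, ex_time_y: float) -> float:
    """Speedup of X over Y: ExTime(Y) / ExTime(X)."""
    return ex_time_y / ex_time_x


def percent_faster(ex_time_x: float, ex_time_y: float) -> float:
    """n such that 'X is n% faster than Y', from Speedup = 1 + n/100."""
    return (speedup(ex_time_x, ex_time_y) - 1.0) * 100.0


# Hypothetical times: X runs the task in 2 s, Y in 3 s.
by_time = speedup(2.0, 3.0)
by_perf = performance(2.0) / performance(3.0)
print(by_time, by_perf, percent_faster(2.0, 3.0))
```

Both routes give the same speedup, so either definition can be used depending on whether times or performance figures are at hand.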
Exercise: • A executes a task in 10 secs. • B executes the same task in 15 secs. • Which statement is true? • A is 50% faster than B • A is 33% faster than B
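A short check of this exercise using Speedup = ExTime(B)/ExTime(A) = 1 + n/100. The second computation shows where the tempting 33% comes from; this is only an illustrative sketch.

```python
t_a = 10.0  # seconds taken by A (from the exercise)
t_b = 15.0  # seconds taken by B (from the exercise)

# Speedup of A over B and the corresponding n in "A is n% faster than B".
speedup_a_over_b = t_b / t_a
n = (speedup_a_over_b - 1.0) * 100.0

# Dividing the time difference by B's (slower) time gives about 33%:
# A needs about 33% LESS TIME than B, which is not the same as being 33% faster.
time_saved_pct = (t_b - t_a) / t_b * 100.0

print(f"A is {n:.0f}% faster than B; A saves {time_saved_pct:.1f}% of B's time")
```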
Exercise (15 min) • Linpack and Dhrystone benchmarks on several VAX models:
Exercise: • Calculate: • In the Linpack case: • Total speedup and average per-year speedup between the VAX780 and the VAX8600 • The same for the VAX8600 and the VAX8550 • In the Dhrystone case: • Total speedup and average per-year speedup between the VAX780 and the VAX8600 • The same for the VAX8600 and the VAX8550
Exercise • Results: total speedup and average per-year speedup for each pair of models
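The benchmark table is not reproduced here, so the following sketch uses made-up numbers purely to show the arithmetic. It assumes the benchmark results are rates (higher is faster, as with Linpack MFLOPS or Dhrystones per second) and takes the average per-year speedup as the compound (geometric) rate, i.e. total = per_year ** years. The helper names and sample values are hypothetical.

```python
def total_speedup(rate_new: float, rate_old: float) -> float:
    """Speedup of the newer model over the older one, from benchmark rates."""
    return rate_new / rate_old


def per_year_speedup(total: float, years: float) -> float:
    """Compound average yearly speedup, so that per_year ** years == total."""
    return total ** (1.0 / years)


# Hypothetical example: a model scoring 8.0 versus an older model scoring 2.0,
# introduced 2 years apart.
total = total_speedup(8.0, 2.0)
yearly = per_year_speedup(total, 2.0)
print(f"total speedup = {total:.2f}, average per-year speedup = {yearly:.2f}")
```

The geometric form is used because yearly improvements multiply rather than add.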
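Amdahl’s Law • $\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}\right]$ • $\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$ • If $\text{Speedup}_{\text{enhanced}} \to \infty$, $\text{Speedup}_{\text{overall}}$ approaches $1/(1 - \text{Fraction}_{\text{enhanced}})$.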
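Amdahl’s law and its limit translate directly into two small functions. This is a minimal sketch; `amdahl_speedup` and `amdahl_limit` are hypothetical names, and the loop shows how the overall speedup saturates when half of the execution time is enhanced.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    new_over_old = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_over_old


def amdahl_limit(fraction_enhanced: float) -> float:
    """Bound reached as speedup_enhanced goes to infinity: 1 / (1 - F)."""
    if fraction_enhanced >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - fraction_enhanced)


# With half of the time enhanced, the overall speedup never exceeds 2.
for s in (2.0, 10.0, 100.0, 1e6):
    print(f"S_enh = {s:>9g}: overall = {amdahl_speedup(0.5, s):.4f}")
print(f"limit = {amdahl_limit(0.5):.4f}")
```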
Exercise on Amdahl’s Law • Floating-point instructions are improved to run 2X faster, but only 10% of the actual instructions are FP. • $\text{Speedup}_{\text{overall}} = ?$
Exercise on Amdahl’s Law • Solution: $\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times (0.9 + 0.1/2) = 0.95 \times \text{ExTime}_{\text{old}}$ • $\text{Speedup}_{\text{overall}} = \frac{1}{0.95} = 1.053$
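Plugging the exercise values into Amdahl’s law as a quick numerical check. Like the solution, this sketch uses the 10% directly as the enhanced fraction of execution time, which is what Amdahl’s law requires.

```python
fraction_fp = 0.10  # fraction of execution time affected by the FP improvement
speedup_fp = 2.0    # FP instructions run 2X faster

# ExTime_new / ExTime_old from Amdahl's law
time_ratio = (1.0 - fraction_fp) + fraction_fp / speedup_fp
speedup_overall = 1.0 / time_ratio

print(f"ExTime_new = {time_ratio:.2f} x ExTime_old")
print(f"Speedup_overall = {speedup_overall:.3f}")
```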
2nd Exercise on Amdahl’s Law • Suppose we make the CPU 5X faster at 5X its cost. • Suppose that the CPU is used 50% of the time and that the base CPU cost is 1/3 of the entire system cost. • Is it worth upgrading the CPU? Compare speedup and costs!
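2nd Exercise on Amdahl’s Law • Speedup $= 1/(0.5 + 0.5/5) = 1/0.6 \approx 1.67$ • Cost increase $= (2/3) + (1/3) \times 5 = 7/3 \approx 2.33$ • The system cost grows 2.33X for only a 1.67X speedup: it is not worth upgrading the CPU!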
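The same comparison in code. Dividing speedup by relative cost, to obtain a performance-per-cost figure, is one reasonable way to "compare speedup and costs"; it is an assumption of this sketch, not a criterion stated in the slides.

```python
cpu_time_fraction = 0.5   # CPU used 50% of the time
cpu_speedup = 5.0         # new CPU is 5X faster
cpu_cost_fraction = 1 / 3  # CPU is 1/3 of the system cost
cpu_cost_factor = 5.0     # new CPU costs 5X as much

speedup = 1.0 / ((1.0 - cpu_time_fraction) + cpu_time_fraction / cpu_speedup)
cost_increase = (1.0 - cpu_cost_fraction) + cpu_cost_fraction * cpu_cost_factor

# Performance per unit cost relative to the old system (> 1 means the upgrade pays off).
perf_per_cost = speedup / cost_increase
worth_it = perf_per_cost > 1.0

print(f"speedup = {speedup:.2f}, cost x{cost_increase:.2f}, perf/cost = {perf_per_cost:.2f}")
print("worth upgrading" if worth_it else "not worth upgrading")
```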
Performance Indexes • Response time: the latency to complete a task, including disk accesses, memory accesses, I/O activity and time spent on other parallel tasks. • CPU time: excludes I/O wait time; it is the sum of the user CPU time and the system CPU time (OS).
CPU time • $\text{CPUtime}(P) = \frac{\text{clock cycles needed to execute } P}{\text{clock frequency}}$
Average CPI • The average number of Clock cycles Per Instruction (CPI) can be defined as: $\text{CPI}(P) = \frac{\text{clock cycles needed to execute } P}{\text{number of instructions}}$ • $\text{CPUtime} = T_{\text{clock}} \times \text{CPI} \times N_{\text{inst}} = \frac{\text{CPI} \times N_{\text{inst}}}{f}$
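Aspects of CPU performance • $\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$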
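The definitions of CPI and CPU time can be written as two small functions. A minimal sketch: `cpi` and `cpu_time` are hypothetical helper names, and the program size and clock rate are made-up values.

```python
def cpi(clock_cycles: float, instructions: float) -> float:
    """Average clock cycles per instruction."""
    return clock_cycles / instructions


def cpu_time(instructions: float, cpi_value: float, clock_hz: float) -> float:
    """CPU time in seconds: Instructions x (Cycles/Instruction) x (Seconds/Cycle)."""
    clock_period = 1.0 / clock_hz
    return instructions * cpi_value * clock_period


# Hypothetical program: 1e9 instructions taking 3e9 cycles on a 1.5 GHz clock.
n_inst = 1e9
cycles = 3e9
f_clock = 1.5e9

avg_cpi = cpi(cycles, n_inst)
t = cpu_time(n_inst, avg_cpi, f_clock)
print(f"CPI = {avg_cpi:.2f}, CPU time = {t:.3f} s, cycles / f = {cycles / f_clock:.3f} s")
```

Computing the time from the three factors gives the same result as dividing the total cycle count by the clock frequency.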
Aspects of CPU performance • The CPI can vary among instructions: • $\text{CPI}_i$ is the number of clock cycles needed by instruction type $i$ • $IC_i$ is the number of times that instructions of type $i$ are executed. • $\text{CPU time} = \text{CycleTime} \times \sum_{i=1}^{n} \text{CPI}_i \times IC_i$
Overall CPI • The overall CPI can be expressed as (CPU clock cycles)/(instruction count): $\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times \frac{IC_i}{\text{Instruction count}}$ • Invest resources where time is spent!
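A sketch of the per-class sums above, using a hypothetical instruction mix given as {class: (IC_i, CPI_i)}. The extra `cycle_share` helper (also hypothetical) shows which class consumes the most cycles, i.e. where time is spent and where resources are best invested.

```python
Mix = dict[str, tuple[float, float]]  # class -> (IC_i, CPI_i)


def total_cycles(mix: Mix) -> float:
    return sum(ic * c for ic, c in mix.values())


def cpu_time(mix: Mix, cycle_time: float) -> float:
    """CPU time = CycleTime * sum_i CPI_i * IC_i."""
    return cycle_time * total_cycles(mix)


def overall_cpi(mix: Mix) -> float:
    """CPI = sum_i CPI_i * IC_i / instruction count."""
    instruction_count = sum(ic for ic, _ in mix.values())
    return sum(c * ic / instruction_count for ic, c in mix.values())


def cycle_share(mix: Mix) -> dict[str, float]:
    """Fraction of all clock cycles spent in each instruction class."""
    cycles = total_cycles(mix)
    return {name: ic * c / cycles for name, (ic, c) in mix.items()}


# Hypothetical mix: 1000 instructions, 1 ns cycle time.
mix: Mix = {"alu": (600, 1.0), "mem": (300, 4.0), "branch": (100, 2.0)}
print(overall_cpi(mix), cpu_time(mix, 1e-9), cycle_share(mix))
```

In this made-up mix, memory instructions are only 30% of the instructions but take 60% of the cycles.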
Exercise • A RISC processor (base machine, reg/reg) shows the following statistics: • ALU: frequency 50%, 1 cycle • Load: frequency 20%, 5 cycles • Store: frequency 10%, 3 cycles • Branch: frequency 20%, 2 cycles • Calculate the average CPI and the speedup with respect to: • the same machine with an improved data cache, D$ (Load cycles = 2) • the same machine with a branch CPI of 1 • the same machine with 2 ALUs working in parallel.
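Solution • Average CPI: 0.5×1 + 0.2×5 + 0.1×3 + 0.2×2 = 2.2 • Use Amdahl’s law with the fraction of execution time (clock cycles) spent in each instruction class, not its frequency: • Cache improved: loads take 1.0/2.2 ≈ 45% of the cycles and become 5/2 = 2.5X faster; new CPI = 1.6, Speedup = 2.2/1.6 ≈ 1.38 • Branch improved: branches take 0.4/2.2 ≈ 18% of the cycles and become 2X faster; new CPI = 2.0, Speedup = 1.10 • ALU improved: ALU instructions take 0.5/2.2 ≈ 23% of the cycles and become 2X faster; new CPI = 1.95, Speedup ≈ 1.13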
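The speedups can be reproduced from the statistics table. Since clock rate and instruction count are unchanged in every variant, the speedup is the ratio of CPIs. Modelling two parallel ALUs as halving the average ALU cost (0.5 cycles) is an assumption of this sketch. The `amdahl` helper cross-checks the cache case using the fraction of cycles spent in loads, which is the quantity Amdahl’s law needs (using the 20% instruction frequency instead would give a different, wrong result).

```python
Mix = dict[str, tuple[float, float]]  # op -> (frequency, cycles)

# Base machine, from the exercise table.
BASE: Mix = {"ALU": (0.50, 1.0), "Load": (0.20, 5.0), "Store": (0.10, 3.0), "Branch": (0.20, 2.0)}


def avg_cpi(mix: Mix) -> float:
    return sum(freq * cycles for freq, cycles in mix.values())


def with_cycles(mix: Mix, op: str, cycles: float) -> Mix:
    """Copy of the mix with a new cycle count for one instruction class."""
    new = dict(mix)
    new[op] = (mix[op][0], cycles)
    return new


def amdahl(fraction: float, speedup: float) -> float:
    return 1.0 / ((1.0 - fraction) + fraction / speedup)


cpi_base = avg_cpi(BASE)
variants = {
    "cache (Load = 2)": with_cycles(BASE, "Load", 2.0),
    "branch (CPI = 1)": with_cycles(BASE, "Branch", 1.0),
    "2 ALUs (ALU = 0.5)": with_cycles(BASE, "ALU", 0.5),
}
speedups = {name: cpi_base / avg_cpi(mix) for name, mix in variants.items()}

# Cross-check: loads use 0.2*5/2.2 of the cycles and become 5/2 = 2.5X faster.
load_fraction = BASE["Load"][0] * BASE["Load"][1] / cpi_base
assert abs(amdahl(load_fraction, 2.5) - speedups["cache (Load = 2)"]) < 1e-9

print(f"base CPI = {cpi_base:.2f}")
for name, s in speedups.items():
    print(f"{name}: speedup = {s:.2f}")
```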
Exercise • Procedure calls in architecture A are very expensive. • Suppose we introduce a new architecture B, similar to A, such that: • A has a clock 5% faster than B. • The fraction of loads/stores of A is 30%. • B executes 30% fewer loads/stores than A. • Every instruction, loads/stores included, requires 1 clock cycle. • Compare the CPU times of A and B.
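Solution • Number of instructions of B: $N_B = [1 - (0.3 \times 0.3)] \times N_A = 0.91 \times N_A$ • Clock period of B: $T_B = 1.05 \times T_A$ • $\text{CPUtime}_A = 1 \times N_A \times T_A$ • $\text{CPUtime}_B = 1 \times 0.91 \times N_A \times 1.05 \times T_A \approx 0.956 \times \text{CPUtime}_A$, so B is about 4.7% faster than A.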
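The comparison can be checked with normalized values ($N_A = T_A = 1$). Note that $1 - 0.3 \times 0.3 = 0.91$; the sketch assumes, as in the solution, that every instruction takes 1 cycle on both machines.

```python
n_a = 1.0              # instruction count of A (normalized)
t_a = 1.0              # clock period of A (normalized)
ls_fraction_a = 0.30   # 30% of A's instructions are loads/stores
ls_reduction_b = 0.30  # B executes 30% fewer loads/stores
cpi = 1.0              # every instruction takes 1 cycle (assumption shared with the solution)

n_b = n_a * (1.0 - ls_fraction_a * ls_reduction_b)  # 0.91 * N_A
t_b = t_a * 1.05  # A's clock is 5% faster, so B's period is 5% longer

cpu_a = n_a * cpi * t_a
cpu_b = n_b * cpi * t_b
print(f"CPUtime_B / CPUtime_A = {cpu_b / cpu_a:.4f}; B is {cpu_a / cpu_b - 1:.1%} faster")
```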
MIPS • MIPS = millions of instructions per second: $\text{MIPS} = \frac{\text{number of instructions}}{\text{execution time (s)} \times 10^6} = \frac{\text{clock frequency}}{\text{CPI} \times 10^6}$
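Both forms of the MIPS formula give the same rating. A minimal sketch with hypothetical helper names and a made-up machine (500 MHz clock, CPI = 2, a program of 10^9 instructions).

```python
def mips_from_time(instructions: float, exec_time_s: float) -> float:
    """MIPS = instruction count / (execution time * 10^6)."""
    return instructions / (exec_time_s * 1e6)


def mips_from_clock(clock_hz: float, cpi: float) -> float:
    """MIPS = clock frequency / (CPI * 10^6)."""
    return clock_hz / (cpi * 1e6)


# Hypothetical machine and program.
f_clock, cpi, n_inst = 500e6, 2.0, 1e9
exec_time = n_inst * cpi / f_clock  # seconds
print(mips_from_time(n_inst, exec_time), mips_from_clock(f_clock, cpi))
```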
MIPS (cont.) • Problem: it depends heavily on the ISA, which makes it difficult to compare different ISAs. • It depends on the program. • It can even vary inversely with performance! A complex instruction set can have a lower MIPS rating than a simple instruction set and yet execute programs in less time.
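A hypothetical illustration of the last point: two machines with the same clock run the same program, one with a simple ISA (more instructions, low CPI) and one with a complex ISA (fewer instructions, higher CPI). All numbers are invented for the example.

```python
CLOCK_HZ = 1e9  # both machines run at 1 GHz (hypothetical)

machines = {
    # name: (instructions executed for the program, average CPI)
    "simple ISA": (10e9, 1.0),
    "complex ISA": (4e9, 2.0),
}

results = {}
for name, (n_inst, cpi) in machines.items():
    exec_time = n_inst * cpi / CLOCK_HZ  # seconds
    mips = CLOCK_HZ / (cpi * 1e6)
    results[name] = (mips, exec_time)
    print(f"{name}: {mips:.0f} MIPS, {exec_time:.1f} s")
```

Here the complex-ISA machine has half the MIPS rating yet finishes the program 20% sooner, so ranking by MIPS picks the slower machine.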
Relative MIPS • Relative MIPS of an architecture A: $\text{MIPS}_{\text{rel}}(A) = \frac{T_{\text{CPU,reference}}}{T_{\text{CPU},A}} \times \text{MIPS}_{\text{reference}}$ • In the 1980s the reference architecture was the VAX-11/780.
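A sketch of relative MIPS. The rating must grow as A gets faster, so the ratio is reference time over A's time. The sketch assumes the reference machine is rated at 1 MIPS, the conventional rating of the VAX-11/780; `relative_mips` is a hypothetical name and the benchmark times are made up.

```python
def relative_mips(t_cpu_a: float, t_cpu_ref: float, mips_ref: float = 1.0) -> float:
    """Relative MIPS of A = (T_CPU,reference / T_CPU,A) * MIPS_reference."""
    return t_cpu_ref / t_cpu_a * mips_ref


# Hypothetical: the reference machine needs 50 s for a benchmark, A needs 10 s.
print(relative_mips(t_cpu_a=10.0, t_cpu_ref=50.0))
```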