Learn essential performance terminology in computing, such as response time, throughput, latency, CPU time, clock frequency, and more. Understand how these factors impact system performance and program speed. Dive into examples and benchmarks to grasp performance evaluation methods. Discover the relationship between clock speed, CPI, and instruction count.
Terminology • Response Time: Time to do a task • Throughput: Work done per second • Latency: Time required to start a process • Throughput vs Latency: https://what-if.xkcd.com/31/
Terminology • Elapsed time • Total response time, including all aspects • Processing, I/O, OS overhead, idle time • Determines system performance • CPU time • Time spent processing a given job • Discounts I/O time, other jobs’ shares • Can be split into “user” time and “system” time
Terminology • A program used to take 20 minutes to run. Now it takes 15. What is the speedup? • Speedup = old time / new time = 20 / 15 = 1.33 times, or 33% faster
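The speedup arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name `speedup` is illustrative and not from the slides.

```python
def speedup(old_time: float, new_time: float) -> float:
    """Return how many times faster the new run is than the old run."""
    return old_time / new_time


# 20 minutes before, 15 minutes after (the units cancel out).
s = speedup(20, 15)
print(f"{s:.2f} times, or {100 * (s - 1):.0f}% faster")
```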
CPU Time • Performance: Response Time - How fast you can run programs • Performance = 1 / Execution time
CPU Clocking • CPU governed by clock • Within each clock period: data transfer and computation, then update state • Clock period: duration of a clock cycle, e.g., 250 ps = 0.25 ns = 250×10^-12 s • Clock frequency (rate): cycles per second, e.g., 4.0 GHz = 4000 MHz = 4.0×10^9 Hz
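Clock period and clock frequency are reciprocals, so the two examples on this slide describe the same clock. A minimal sketch of the conversion, with illustrative function names:

```python
def clock_period_seconds(frequency_hz: float) -> float:
    """Clock period is the reciprocal of clock frequency."""
    return 1.0 / frequency_hz


def clock_frequency_hz(period_seconds: float) -> float:
    """Clock frequency is the reciprocal of clock period."""
    return 1.0 / period_seconds


period = clock_period_seconds(4.0e9)  # the 4.0 GHz clock from the slide
print(f"{period * 1e12:.0f} ps")      # period expressed in picoseconds
```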
Clock • Different subsystems, different clocks (examples: Pentium II, i7)
GHz Myth • Different processors = different work/clock
Clocks vs Instructions • Different jobs take different amounts of time • Common for different instructions to take differing number of clocks:
CPI • CPI = Clocks Per Instruction • Inverse of instructions per cycle • Clocks = clock ticks = machine cycles • Instructions / cycle – maximize • CPI – minimize
CPI • Programs involve different mixes of types:
CPI • CPI: Weighted average clocks per instruction • Sequence 1: IC = 5 • Clock Cycles = 2×1 + 1×2 + 2×3 = 10 • Avg. CPI = 10/5 = 2.0
CPI • CPI: Weighted average clocks per instruction • Sequence 2: IC = 6 • Clock Cycles = 4×1 + 1×2 + 1×3 = 9 • Avg. CPI = 9/6 = 1.5
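The two sequences can be recomputed from their instruction counts. In this sketch each sequence is a mapping from cycles-per-instruction to the number of instructions of that kind, read off the sums on the slides; the name `average_cpi` and the dictionary layout are assumptions made for illustration.

```python
def average_cpi(counts_by_cycles: dict) -> float:
    """Weighted average CPI: total clock cycles / total instruction count."""
    instruction_count = sum(counts_by_cycles.values())
    clock_cycles = sum(cycles * count for cycles, count in counts_by_cycles.items())
    return clock_cycles / instruction_count


# Keys are cycles per instruction, values are how many such instructions run.
sequence_1 = {1: 2, 2: 1, 3: 2}  # IC = 5, 10 cycles
sequence_2 = {1: 4, 2: 1, 3: 1}  # IC = 6, 9 cycles

print(average_cpi(sequence_1))  # 2.0
print(average_cpi(sequence_2))  # 1.5
```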
CPI • A program is: 40% 1-cycle data ops, 20% 2-cycle data ops, 25% 3-cycle loads, 15% 1-cycle stores • What is CPI? • CPI = 0.4 × 1 + 0.2 × 2 + 0.25 × 3 + 0.15 × 1 = 0.4 + 0.4 + 0.75 + 0.15 = 1.7
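When the mix is given as percentages rather than counts, CPI is the sum of fraction × cycles over the instruction types. The sketch below assumes stores take 1 cycle, as the slide's own calculation does; the name `cpi_from_mix` is illustrative.

```python
def cpi_from_mix(mix) -> float:
    """mix is a list of (fraction_of_instructions, cycles) pairs summing to 1."""
    total_fraction = sum(fraction for fraction, _ in mix)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(fraction * cycles for fraction, cycles in mix)


program = [
    (0.40, 1),  # 1-cycle data ops
    (0.20, 2),  # 2-cycle data ops
    (0.25, 3),  # 3-cycle loads
    (0.15, 1),  # stores, assumed to take 1 cycle
]

# 0.4 + 0.4 + 0.75 + 0.15 = 1.70
print(f"{cpi_from_mix(program):.2f}")
```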
CPI Myths • Compiler A builds program: 2 million 1-cycle + 1 million 2-cycle = 3 million instructions over 4 million cycles • Compiler B builds program: 1.5 million 1-cycle + 1.2 million 2-cycle = 2.7 million instructions over 3.9 million cycles (faster program)
CPI Myths • Compiler A: 3 million instructions & 4 million cycles, 3 / 4 = 0.75 instructions per cycle • Compiler B: 2.7 million instructions & 3.9 million cycles, 2.7 / 3.9 = 0.69 instructions per cycle • B is the faster program despite its worse IPC
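The compiler comparison can be reproduced by totalling instructions and cycles for each build. This is a sketch with illustrative names; it assumes both builds run on the same clock, so fewer cycles means less time.

```python
def summarize(mix):
    """mix: list of (millions_of_instructions, cycles_each) pairs.

    Returns (total instructions, total cycles, instructions per cycle),
    with the first two in millions.
    """
    instructions = sum(count for count, _ in mix)
    cycles = sum(count * cycles_each for count, cycles_each in mix)
    return instructions, cycles, instructions / cycles


compiler_a = [(2.0, 1), (1.0, 2)]
compiler_b = [(1.5, 1), (1.2, 2)]

for name, mix in (("A", compiler_a), ("B", compiler_b)):
    instructions, cycles, ipc = summarize(mix)
    print(f"Compiler {name}: {instructions:.1f}M instructions, "
          f"{cycles:.1f}M cycles, IPC {ipc:.2f}")
```

Compiler B needs fewer total cycles even though its instructions-per-cycle figure is lower, which is why IPC or CPI on its own does not rank programs.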
Instruction Count • Static Instruction Count : Number of instructions in compiled program • Dynamic Instruction Count : Number of instructions executed while running • Real run time • Loops, skipped instructions, etc…
Measurement • Reliable performance measurement must measure all three factors: instruction count, CPI, and clock rate • Performance = 1 / CPU time = Clock rate / (Instruction count × CPI)
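The three factors combine into CPU time in the usual way: instruction count × CPI gives clock cycles, and dividing by the clock rate gives seconds. A minimal sketch; the function names and the sample numbers are hypothetical, not from the slides.

```python
def cpu_time_seconds(instruction_count: float, cpi: float, clock_hz: float) -> float:
    """CPU time = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_hz


def performance(instruction_count: float, cpi: float, clock_hz: float) -> float:
    """Performance is the reciprocal of CPU time."""
    return 1.0 / cpu_time_seconds(instruction_count, cpi, clock_hz)


# Hypothetical program: 1e9 dynamic instructions, CPI of 2.0, 4 GHz clock.
print(cpu_time_seconds(1e9, 2.0, 4e9))  # 0.5 seconds
```

Changing any one factor while holding the other two fixed changes CPU time proportionally, which is why a measurement that leaves one out can mislead.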
Real World • SPEC CPU Benchmark
Example • Example 1: Calculate execution time ???
Example • Example 2: Calculate CPI ???
Example • Processor A runs at 3GHz. Program P runs on it with a dynamic instruction count of 1.2E8 with a CPI of 1.8. • Processor B runs at 3.4GHz. Program P runs on it with a dynamic instruction count of 1.5E8 with a CPI of 1.6. • How much faster is the program on machine A?
Example • CPU time = Instruction count × CPI / Clock rate • Processor A: 1.2E8 × 1.8 / 3E9 = 0.072 s • Processor B: 1.5E8 × 1.6 / 3.4E9 ≈ 0.0706 s • Speedup = 0.072 / 0.0706 ≈ 1.02: the program runs about 2% faster on machine B, so machine A is not the faster one
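The Processor A and Processor B comparison can be recomputed without intermediate rounding. This sketch assumes the usual relation CPU time = instruction count × CPI / clock rate; the names are illustrative.

```python
def cpu_time_seconds(instruction_count: float, cpi: float, clock_hz: float) -> float:
    """CPU time = instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_hz


time_a = cpu_time_seconds(1.2e8, 1.8, 3.0e9)  # Processor A
time_b = cpu_time_seconds(1.5e8, 1.6, 3.4e9)  # Processor B

print(f"A: {time_a:.4f} s, B: {time_b:.4f} s")
# A ratio above 1 means the program finishes sooner on B than on A.
print(f"time on A / time on B = {time_a / time_b:.3f}")
```

With unrounded values the ratio comes out at about 1.02 in favor of Processor B, despite B executing more instructions.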
Pitfall 1 • Clock speed, CPI and instruction count all interact
Clock Speedup Issue • Clock speed up not guaranteed to increase performance
Limiting Circuits • Increasing clock may outpace time required by some circuits • This clock is too fast for the memory access:
Limiting Circuits • Increasing clock may outpace time required by some circuits • Memory access would need two cycles:
Clock Speedup Issue • Situation 1: 100 ms per clock • 3 cycles × 100 ms = 300 ms total
Clock Speedup • Situation 2: 60 ms per clock • 3 cycles × 60 ms = 180 ms total
Clock Speedup • Situation 3: 50 ms per clock • 2 cycles for part 3 • 4 cycles × 50 ms = 200 ms total
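The three situations can be tabulated by listing how many clock cycles each part of the job needs. The sketch assumes the job has three parts that each fit in one cycle unless stated otherwise, which is what the slide totals imply; the names are illustrative.

```python
def total_time_ms(clock_period_ms: int, cycles_per_part: list) -> int:
    """Total time = clock period x total cycles across all parts."""
    return clock_period_ms * sum(cycles_per_part)


situations = {
    "Situation 1": (100, [1, 1, 1]),
    "Situation 2": (60, [1, 1, 1]),
    "Situation 3": (50, [1, 1, 2]),  # part 3 no longer fits in one cycle
}

for name, (period_ms, cycles) in situations.items():
    print(f"{name}: {total_time_ms(period_ms, cycles)} ms")
```

The fastest clock gives a slower result than the 60 ms clock because the slowest part spills into a second cycle.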
MIPS • MIPS = Millions of Instructions Per Second • Blends CPI and clock speed: MIPS = Clock rate / (CPI × 10^6)
MIPS Issues • Different architectures have instructions of different power: x += y * z; // x = r1, y = r2, z = r3 • Computer A (1 instruction): MLA r1, r2, r3, r1 • Computer B (2 instructions, r4 as a temporary): MUL r4, r2, r3 then ADD r1, r1, r4
MIPS Issues • MIPS can't compare different architectures because it leaves out instruction count • Performance = Clock rate / (Instruction count × CPI)
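A small numeric sketch of why MIPS misleads across architectures. The cycle counts and the 1 GHz clock below are hypothetical values chosen for illustration, not figures from the slides: Computer A's single multiply-accumulate is assumed to take 2 cycles, and each of Computer B's two instructions is assumed to take 1 cycle.

```python
def mips(instruction_count: float, seconds: float) -> float:
    """Millions of instructions executed per second."""
    return instruction_count / (seconds * 1e6)


CLOCK_HZ = 1.0e9  # assumed: both computers run at 1 GHz

# Computer A: 1 instruction taking 2 cycles (assumed).
instructions_a, cycles_a = 1, 2
# Computer B: 2 instructions taking 1 cycle each (assumed).
instructions_b, cycles_b = 2, 2

time_a = cycles_a / CLOCK_HZ
time_b = cycles_b / CLOCK_HZ

print(f"A: {time_a * 1e9:.0f} ns, {mips(instructions_a, time_a):.0f} MIPS")
print(f"B: {time_b * 1e9:.0f} ns, {mips(instructions_b, time_b):.0f} MIPS")
```

Under these assumptions both computers finish the statement in the same time, yet B reports twice the MIPS because it needed more instructions to do the same work.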
The Complete System • Performance depends on much besides CPU:
System Issues • Many technologies have followed exponential pattern like Moore’s law:
System Issues • But not all:
Amdahl's Law • Describes overall speedup of a system when we speed up one part of a system • f : fraction of time part is limiting factor • k : speedup of that part • 1 - f : fraction of time doing other stuff • S : overall speedup • Version 1: S = 1 / ((1 - f) + f / k)
Amdahl's Law • Describes overall speedup of a system when we speed up one part of a system Version 2
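Using the definitions of f and k given on the slides, the commonly stated form of Amdahl's Law is S = 1 / ((1 - f) + f / k). The sketch below assumes that form, since the slides' own equations are not reproduced in this transcript; the function name and the sample values are illustrative.

```python
def amdahl_speedup(f: float, k: float) -> float:
    """Overall speedup when a fraction f of the time is sped up by a factor k."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if k <= 0.0:
        raise ValueError("k must be positive")
    return 1.0 / ((1.0 - f) + f / k)


# Hypothetical case: half of the run time is made twice as fast.
print(f"{amdahl_speedup(0.5, 2.0):.2f}")   # 1.33

# Even a huge speedup of that half cannot beat 1 / (1 - f) = 2 overall.
print(f"{amdahl_speedup(0.5, 1e9):.2f}")   # 2.00
```

The second call shows the law's main consequence: the part that is not improved bounds the overall gain.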