Reading: 2.4, 3.1-3.5. Measuring and Discussing Computer System Performance, or “My computer is faster than your computer”.
Reading: 2.4, 3.1-3.5. Measuring and Discussing Computer System Performance, or “My computer is faster than your computer”. Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Match (Best) Performance Metric to Domain. Prep with explanation of metrics and domains. Performance Metrics: 1. Network Bandwidth (data/sec) 2. Network Latency (ms) 3. Frame Rate (frames/sec) 4. Throughput (ops/sec). Domains. Jack’s car buying analogy: we care about many….
Measures of “Performance” • Execution Time • Frame Rate • Throughput (operations/time) • Responsiveness • Performance / Cost • Performance / Power • Performance / Power^2
Recall our O(n) discussion • Much of computer science focuses on execution time, and much of our class will as well. • Ultimately, we want much of what we do to be fast (response time). • So is time a reasonable metric? People sit in front of computers / iPhones / etc. • User time / clock time / CPU time: we use CPU time for this class, for now.
All Together Now: CPU Execution Time (seconds) = Instruction Count (instructions) × CPI (cycles/instruction) × Clock Cycle Time (seconds/cycle)
CPU Execution Time = Instruction Count × CPI × Clock Cycle Time
• IC = 1 billion, 500 MHz processor, execution time of 3 seconds. What is the CPI for this program?
3 sec = (1 × 10^9 inst) × CPI × (1 sec / (5 × 10^8 cycles))
1.5 × 10^9 cycles = (10^9 inst) × CPI
CPI = 1.5
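The arithmetic on this slide can be checked with a few lines of Python. This is only a minimal sketch: the function names `cpu_time` and `cpi_from_time` are made up for illustration and are not part of the lecture materials.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU execution time in seconds: IC x CPI x clock cycle time.

    Clock cycle time is 1 / clock rate, so we divide by the rate.
    """
    return instruction_count * cpi / clock_rate_hz


def cpi_from_time(exec_time_s, instruction_count, clock_rate_hz):
    """Solve the same equation for CPI."""
    total_cycles = exec_time_s * clock_rate_hz  # seconds x cycles/second
    return total_cycles / instruction_count     # cycles per instruction


# Numbers from the slide: IC = 1 billion, 500 MHz, 3 seconds.
cpi = cpi_from_time(exec_time_s=3.0, instruction_count=1e9, clock_rate_hz=500e6)
print(f"CPI = {cpi:.2f}")

# Plugging the CPI back in should recover the 3-second execution time.
print(f"time = {cpu_time(1e9, cpi, 500e6):.2f} s")
```

Dividing by the clock rate instead of multiplying by a separately computed cycle time avoids one floating-point rounding step, but either form expresses the same equation.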
Who Affects Performance? (Individual only) CPU Execution Time = Instruction Count (IC) × CPI × Clock Cycle Time (CT) • There are a number of people involved in processor / programming design • Each of these elements of the performance equation can be impacted by different designer(s) • Next slides will be about who can impact what. We’ll do speed voting (1 min ind, 1 min group) then discuss each slide.
Who Affects Performance? (1 min ind / 1 min group) CPU Execution Time = Instruction Count (IC) × CPI × Clock Cycle Time (CT) • What can a programmer influence?
Who Affects Performance? (1 min ind / 1 min group) CPU Execution Time = Instruction Count (IC) × CPI × Clock Cycle Time (CT) • What can a compiler influence?
Who Affects Performance? (1 min ind / 1 min group) CPU Execution Time = Instruction Count (IC) × CPI × Clock Cycle Time (CT) • What can an instruction set architect influence?
Who Affects Performance? (1 min ind / 1 min group) CPU Execution Time = Instruction Count (IC) × CPI × Clock Cycle Time (CT) • What can a hardware designer influence?
Performance Variation
Row | CPU Execution Time | Instruction Count | CPI | Clock Cycle Time
1 | DIFF | Same | Diff | Same
2 | DIFF | Same | Same | Diff
3 | DIFF | Same | Diff | Diff
Other Performance Metrics • Time is useful, but how else might we try to measure the “performance” of a machine? • MIPS • MFLOPS
MIPS: MIPS = Millions of Instructions Per Second = Instruction Count / (Execution Time × 10^6) = Clock Rate / (CPI × 10^6) • program-independent • deceptive: just crank up the clock rate and have it execute tons of no-ops. But we need to sell processors, so what do we market?
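To see why MIPS can be deceptive, here is a rough sketch that evaluates both forms of the MIPS formula. It reuses the earlier 500 MHz, 1-billion-instruction, CPI 1.5 program and then pads it with no-ops. The padded program (1 billion extra no-ops at CPI 1.0) is a hypothetical made up for illustration, and it varies the slide's point by padding with no-ops at a fixed clock rate rather than raising the clock rate. The function names are also illustrative.

```python
def mips_from_counts(instruction_count, exec_time_s):
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (exec_time_s * 1e6)


def mips_from_clock(clock_rate_hz, cpi):
    """MIPS = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


CLOCK_RATE_HZ = 500e6

# Original program: 1e9 useful instructions at CPI 1.5.
useful_ic, useful_cpi = 1e9, 1.5
base_time = useful_ic * useful_cpi / CLOCK_RATE_HZ
base_mips = mips_from_counts(useful_ic, base_time)

# Hypothetical padded program: the same work plus 1e9 no-ops at CPI 1.0.
noop_ic, noop_cpi = 1e9, 1.0
padded_ic = useful_ic + noop_ic
padded_cycles = useful_ic * useful_cpi + noop_ic * noop_cpi
padded_time = padded_cycles / CLOCK_RATE_HZ
padded_mips = mips_from_counts(padded_ic, padded_time)

print(f"original: {base_time:.1f} s, {base_mips:.1f} MIPS")
print(f"padded:   {padded_time:.1f} s, {padded_mips:.1f} MIPS")
# Both forms of the formula agree for the original program.
print(f"clock-rate form: {mips_from_clock(CLOCK_RATE_HZ, useful_cpi):.1f} MIPS")
```

Under these assumed numbers the padded program reports a higher MIPS rating while taking longer to finish the same useful work, which is the sense in which the metric misleads.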
Trying to market the “performance” of a processor. • “Speed Demons” vs. “Brainiacs”: Intel vs. Alpha. Intel wins… but ends up remarketing. If we can’t use something like raw processor speed (CT) alone and we also want to capture CPI, we need to look at performance on benchmarks.
Benchmarks - Which Programs? • Peak throughput measures (simple programs)? • Synthetic benchmarks (Whetstone, Dhrystone, ...)? • Real applications? • SPEC (best of both worlds, but with problems of its own) • System Performance Evaluation Cooperative • Provides a common set of real applications along with strict guidelines for how to run them. • Provides a relatively unbiased means to compare machines.
Danger in Benchmark-Specific Performance Measures • measures compiler as much as architecture!
SPEC Performance on Pentium III and Pentium 4. Focus on clock rate relative to the change in INT vs. FP performance. x87 floating point on the P3 used a register stack; SSE2 on the P4 added independently addressable registers.
Speedup • Often want to compare performance of one machine against another. Performance = 1 / Execution Time. Speedup (A over B) = Performance_A / Performance_B = ET_B / ET_A
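As a small sketch of the speedup definition, the code below computes speedup both ways (as a performance ratio and as an inverted execution-time ratio) to show they agree. The two execution times are hypothetical values chosen for illustration, not measurements from the lecture.

```python
def performance(exec_time_s):
    """Performance is defined as the reciprocal of execution time."""
    return 1.0 / exec_time_s


def speedup(exec_time_a, exec_time_b):
    """Speedup of A over B = ET_B / ET_A (greater than 1 means A is faster)."""
    return exec_time_b / exec_time_a


# Hypothetical measurements: machine A takes 10 s, machine B takes 15 s.
et_a, et_b = 10.0, 15.0

ratio_of_performance = performance(et_a) / performance(et_b)
ratio_of_times = speedup(et_a, et_b)

print(f"Performance_A / Performance_B = {ratio_of_performance:.3f}")
print(f"ET_B / ET_A                   = {ratio_of_times:.3f}")
```

Note that the execution times swap places relative to the performance ratio: the slower machine's time goes in the numerator.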
Amdahl’s Law: Execution time after improvement = (Execution Time Affected / Amount of Improvement) + Execution Time Unaffected
Amdahl’s Law and Parallelism: Execution time after improvement = (Execution Time Affected / Amount of Improvement) + Execution Time Unaffected • Our program is 90% parallelizable (segment of code executable in parallel on multiple cores) and runs in 100 seconds with a single core. What is the execution time if you use 4 cores (assume no overhead for parallelization)? (ISOMORPHIC)
Amdahl’s Law and Parallelism: Execution time after improvement = (Execution Time Affected / Amount of Improvement) + Execution Time Unaffected • Our program is 90% parallelizable (segment of code executable in parallel on multiple cores) and runs in 100 seconds with a single core. What is the execution time if you use 2 cores (assume no overhead for parallelization)? (ISOMORPHIC)
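One way to work the two parallelism questions is to plug the numbers straight into the Amdahl's Law formula. The sketch below assumes, as the questions do, that the parallel part speeds up by exactly the number of cores with no overhead. The function name `time_after_improvement` is illustrative only.

```python
def time_after_improvement(affected_s, unaffected_s, improvement):
    """Amdahl's Law: (time affected / amount of improvement) + time unaffected."""
    return affected_s / improvement + unaffected_s


TOTAL_S = 100      # single-core run time from the question
AFFECTED_S = 90    # 90% of the run is parallelizable
UNAFFECTED_S = TOTAL_S - AFFECTED_S  # the serial 10 seconds

for cores in (4, 2):
    new_time = time_after_improvement(AFFECTED_S, UNAFFECTED_S, cores)
    print(f"{cores} cores: {new_time:.1f} s (speedup {TOTAL_S / new_time:.2f}x)")
```

With 4 cores this gives 90/4 + 10 = 32.5 seconds, and with 2 cores 90/2 + 10 = 55 seconds. In both cases the 10 serial seconds are untouched.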
Amdahl’s Law • So what does Amdahl’s Law mean at a high level?
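One way to build intuition for this question is to tabulate speedup for the 90% parallelizable program as the core count grows. This is a sketch under the same no-overhead assumption as the earlier questions. The function name `amdahl_speedup` and the particular core counts are chosen for illustration.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Overall speedup when only the parallel fraction scales with core count."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


PARALLEL_FRACTION = 0.9
# Upper bound as cores grow without limit: only the serial part remains.
limit = 1.0 / (1.0 - PARALLEL_FRACTION)

for cores in (1, 2, 4, 8, 16, 64, 1024):
    print(f"{cores:5d} cores: {amdahl_speedup(PARALLEL_FRACTION, cores):.2f}x")
print(f"limit as cores grow without bound: about {limit:.0f}x")
```

Each doubling of cores buys less than the one before, and the speedup never exceeds roughly 10x because the serial 10% of the program comes to dominate the run time.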
Point out Phenom II x4 and x2 get same performance – but one has 4 cores the other 2. What does this tell us? (Note – with some slower speeds of the Phenom this isn’t the case – 4 cores help.)
Speedup vs. Sizeup • Speedup runs into problems for parallelization because of diminishing returns and dominance of serial execution. • What if time were a constant? • Human perception (graphics) • Earthquake prediction • Weather prediction
Key Points • Be careful how you specify “performance” • Execution time = IC * CPI * CT • Use real applications, if possible • Use standards, if possible • Make the common case fast