Lecture 1: Performance. EEN 312: Processors: Hardware, Software, and Interfacing. Department of Electrical and Computer Engineering Spring 2013, Dr. Rozier (UM). PERFORMANCE TRENDS. Growth in Processor Performance since 1978. Logarithmic Scale!. Moore’s Law.
Lecture 1: Performance EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2013, Dr. Rozier (UM)
Growth in Processor Performance since 1978. Logarithmic Scale!
Moore’s Law • Gordon Moore – One of the founders of Intel • Famously predicted in 1965 that the number of transistors on an integrated circuit would keep doubling at a regular interval, commonly quoted as every 18-24 months. • Not really a law, but has largely held true. • Generally translates into increased performance and decreased cost.
How do we get to Performance? • Do more transistors really mean more performance? • Is it a one-to-one correlation? • How might transistors NOT correlate to increased performance?
A simple example • Say we have two computers. You know one is rated at 1GHz and another is rated at 800MHz. • Which computer has a higher performance?
A simple example? • What do GHz and MHz even mean? • What else could differ about the machines? • What else could differ about the context of performance?
First, Some Measure Theory • What is a measure? Formally? • A way of assigning a number to each suitable subset of some set, which can be interpreted (intuitively) as the size of that subset. • Measures require measurable spaces and measurable sets. • Not all sets are measurable!
Measurable Sets/Spaces • One reason a space or set may be unmeasurable is if it is ill-defined.
Defining Performance • We can define performance in several ways. • Response time • How long does it take to accomplish a task? • We send input to a black box, and measure how long it takes to get the output.
Defining Performance • We can define performance in several ways. • Throughput • How much work gets done during a certain amount of time? • Watch a system, count the number of jobs finished during a certain amount of time.
Throughput Example • What is the fastest way you can think to deliver a large amount of data? • Never underestimate the throughput of a Mack Truck loaded with hard drives! TRUCK YEAH!
What’s the Response time of our Truck? TRUCK YEAH???
Response time as Execution Time • Start a program, wait for it to return results.
Comparing Performance • Given the performance or execution time of a computer (A) and a different computer (B) running the same program, we can compare performance.
Comparing Performance • Relative performance
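Relative performance is conventionally defined through execution time: performance is the reciprocal of execution time, so Performance A / Performance B = Execution time B / Execution time A = n, read as "A is n times as fast as B". A minimal sketch of that ratio follows; the function name and the 10s and 15s timings are illustrative assumptions, not values from the lecture.

```python
def relative_performance(exec_time_a: float, exec_time_b: float) -> float:
    """Return n such that computer A is n times as fast as computer B.

    Performance is taken as 1 / execution time, so the ratio of
    performances is the inverse ratio of execution times.
    """
    if exec_time_a <= 0 or exec_time_b <= 0:
        raise ValueError("execution times must be positive")
    return exec_time_b / exec_time_a


# Hypothetical timings for the same program on two machines.
n = relative_performance(exec_time_a=10.0, exec_time_b=15.0)
print(f"A is {n:.1f}x as fast as B")
```

A value of n below 1 means B is the faster machine.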
So How Do We Measure Performance • First let’s define performance: • Execution time • What is our measurable space? • What is our measurable set?
Measuring Execution Time • CPU execution time • Wall clock time • How might these differ?
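One way to see how the two clocks can differ is to time a task that mostly waits. The sketch below uses Python's standard `time.perf_counter` (wall clock) and `time.process_time` (CPU time of the current process); the helper names `measure` and `mostly_waiting` and the 0.2s sleep are assumptions made for illustration.

```python
import time


def measure(task):
    """Run task() once and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # elapsed real time
    cpu_start = time.process_time()    # CPU time charged to this process
    task()
    cpu_elapsed = time.process_time() - cpu_start
    wall_elapsed = time.perf_counter() - wall_start
    return wall_elapsed, cpu_elapsed


def mostly_waiting():
    # Sleeping consumes wall clock time but almost no CPU time.
    time.sleep(0.2)


wall, cpu = measure(mostly_waiting)
print(f"wall clock: {wall:.3f}s, CPU: {cpu:.3f}s")
```

Waiting on I/O, or on other processes sharing the machine, behaves like the sleep here: it adds to wall clock time without adding to the program's CPU time.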
Measuring Execution Time • Clock cycles • Instruction count
Clock Cycles • Clock period – duration of a clock cycle • Clock frequency – number of cycles per second [Figure: clock timing diagram showing the clock period, clock cycles, data transfer and computation, and state update]
CPU Time • We can improve performance by • Reducing the number of clock cycles • Increasing clock rate • Often there is a trade-off
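The two levers on this slide come from the usual textbook relation between CPU time, cycle count, and clock rate, written out here because no formula survives in the slide text:

```latex
\begin{align*}
\text{CPU Time} &= \text{CPU Clock Cycles} \times \text{Clock Cycle Time} \\
                &= \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}
\end{align*}
```

Fewer cycles shrink the numerator and a faster clock grows the denominator. The trade-off arises because a design change that raises the clock rate often raises the cycle count as well.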
CPU Example • Computer A: 2 GHz clock, 10s CPU time • Computer B • Aim for 6s CPU time. Increasing the clock rate increases the number of clock cycles by 1.2×. Break Into Groups: find the necessary clock rate for Computer B.
CPU Example • Computer A: 2 GHz clock, 10s CPU time • Computer B • Aim for 6s CPU time. Increasing the clock rate increases the number of clock cycles by 1.2×.
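The exercise can be worked through with CPU Time = Clock Cycles / Clock Rate. The sketch below uses only the numbers given on the slide; the variable names are my own.

```python
# Computer A, as given on the slide.
clock_rate_a_hz = 2e9      # 2 GHz
cpu_time_a_s = 10.0        # 10 s

# Cycles the program takes on A: time x rate.
cycles_a = cpu_time_a_s * clock_rate_a_hz

# Computer B needs 1.2x as many cycles and must finish in 6 s.
cycles_b = 1.2 * cycles_a
target_time_b_s = 6.0

# Rearranging CPU Time = Cycles / Rate gives Rate = Cycles / CPU Time.
clock_rate_b_hz = cycles_b / target_time_b_s

print(f"Computer B needs a {clock_rate_b_hz / 1e9:.1f} GHz clock")
```

B has to run its clock at twice A's rate to cut the time from 10s to 6s, because part of the gain is lost to the extra cycles.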
Instruction Count and CPI • Instruction count • How many instructions the program has • Depends on the ISA and compiler • CPI • Cycles per instruction • Determined by hardware
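Instruction count (IC) and CPI combine with the clock to give CPU time. The relations below are the standard textbook form, stated here because the slide text lists the terms without the equations:

```latex
\begin{align*}
\text{Clock Cycles} &= \text{IC} \times \text{CPI} \\
\text{CPU Time} &= \text{IC} \times \text{CPI} \times \text{Clock Cycle Time}
                 = \frac{\text{IC} \times \text{CPI}}{\text{Clock Rate}}
\end{align*}
```

When two machines share an ISA and run the same compiled program, IC is the same on both, so comparing them reduces to comparing CPI × Clock Cycle Time.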
CPI Example • Computer A: Cycle Time = 250ps, CPI = 2.0 • Computer B: Cycle Time = 500ps, CPI = 1.2 • Same ISA • Which is faster? By how much? Break Into Groups
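CPI Example • Computer A: Cycle Time = 250ps, CPI = 2.0 • Computer B: Cycle Time = 500ps, CPI = 1.2 • Same ISA • Which is faster? By how much? • CPU Time A = IC × 2.0 × 250ps = IC × 500ps • CPU Time B = IC × 1.2 × 500ps = IC × 600ps • A is faster, by a factor of 600ps / 500ps = 1.2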
CPI Detail • Sometimes different instructions take differing amounts of time. • Often we will want to weight each instruction class’s CPI by its relative frequency in the program.
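Written out, the weighting takes the usual form below, where the index i ranges over the n instruction classes, CPI_i is the cycle cost of class i, and IC_i is how many instructions of that class the program executes (the symbol names are my own choice):

```latex
\begin{align*}
\text{Clock Cycles} &= \sum_{i=1}^{n} \text{CPI}_i \times \text{IC}_i \\
\text{CPI} &= \frac{\text{Clock Cycles}}{\text{IC}}
            = \sum_{i=1}^{n} \text{CPI}_i \times \frac{\text{IC}_i}{\text{IC}}
\end{align*}
```

The fraction IC_i / IC is the relative frequency of class i in the program.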
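CPI Example • Have instruction classes A, B, and C, with CPI of 1, 2, and 3 respectively. Two ways to compile our code: • Sequence 1 instruction counts: A = 2, B = 1, C = 2 • Sequence 2 instruction counts: A = 4, B = 1, C = 1 • Give the average CPI for each program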
CPI Example • Sequence 1: IC = 5 • Clock Cycles= 2×1 + 1×2 + 2×3= 10 • Avg. CPI = 10/5 = 2.0 • Sequence 2: IC = 6 • Clock Cycles= 4×1 + 1×2 + 1×3= 9 • Avg. CPI = 9/6 = 1.5
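The arithmetic above can be checked with a short script. The per-class CPIs (A = 1, B = 2, C = 3) and the per-class instruction counts are read off the products in the clock-cycle sums, on the assumption that the terms are listed in class order A, B, C; the function and variable names are my own.

```python
# Cycles per instruction for each instruction class (inferred from the sums above).
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def cycles_and_average_cpi(instruction_counts):
    """Return (total clock cycles, average CPI) for one compiled sequence."""
    total_cycles = sum(
        CPI_BY_CLASS[cls] * count for cls, count in instruction_counts.items()
    )
    total_instructions = sum(instruction_counts.values())
    return total_cycles, total_cycles / total_instructions


sequence_1 = {"A": 2, "B": 1, "C": 2}   # IC = 5
sequence_2 = {"A": 4, "B": 1, "C": 1}   # IC = 6

for name, seq in (("Sequence 1", sequence_1), ("Sequence 2", sequence_2)):
    cycles, cpi = cycles_and_average_cpi(seq)
    print(f"{name}: {cycles} cycles, average CPI {cpi:.1f}")
```

Sequence 2 executes more instructions but fewer cycles, which is why instruction count alone is a poor predictor of execution time.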
Performance Summary • Performance depends on • Algorithm: affects IC, possibly CPI • Programming language: affects IC, CPI • Compiler: affects IC, CPI • Instruction set architecture: affects IC, CPI, Tc
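The Power Wall • In CMOS IC technology: Power = Capacitive load × Voltage² × Frequency • Over the period shown, power grew ×30 while voltage dropped 5V → 1V and frequency grew ×1000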
The Power Wall • Suppose a new CPU has • 85% of capacitive load of old CPU • 15% voltage and 15% frequency reduction • The power wall • We can’t reduce voltage further • We can’t remove more heat • How else can we improve performance?
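The new-CPU scenario can be evaluated with the standard CMOS dynamic power relation, Power = Capacitive load × Voltage² × Frequency. In this sketch, a 15% reduction is taken to mean the new value is 85% of the old one; the function name is hypothetical.

```python
def dynamic_power_ratio(cap_ratio: float, volt_ratio: float, freq_ratio: float) -> float:
    """Return P_new / P_old given new/old ratios of load, voltage, and frequency.

    Dynamic power scales linearly with capacitive load and frequency,
    and with the square of the supply voltage.
    """
    return cap_ratio * volt_ratio ** 2 * freq_ratio


# 85% of the capacitive load, 15% lower voltage, 15% lower frequency.
ratio = dynamic_power_ratio(cap_ratio=0.85, volt_ratio=0.85, freq_ratio=0.85)
print(f"new CPU draws about {ratio:.0%} of the old CPU's power")
```

The ratio is 0.85⁴, roughly 0.52, so the new CPU draws about half the power. Most of the saving comes from the squared voltage term, which is the lever the power wall says is no longer available.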
Multiprocessors • Multicore microprocessors • More than one processor per chip • Requires explicitly parallel programming • Compare with instruction level parallelism • Hardware executes multiple instructions at once • Hidden from the programmer • Hard to do • Programming for performance • Load balancing • Optimizing communication and synchronization
Amdahl’s Law • Pitfall: improving one aspect of a computer and expecting a proportional improvement in overall performance • Example: multiply accounts for 80s of a 100s run • How much must multiply performance improve to get 5× overall? • Break into Groups!
Amdahl’s Law • Pitfall: improving one aspect of a computer and expecting a proportional improvement in overall performance • Example: multiply accounts for 80s of a 100s run • How much must multiply performance improve to get 5× overall? • Can’t be done! • Corollary: make the common case fast
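To see why the 5× target is out of reach, use the usual statement of Amdahl's Law: improved time = affected time / improvement factor + unaffected time. The sketch below applies it to the slide's numbers; the helper names are my own.

```python
def improved_time(t_affected: float, t_unaffected: float, factor: float) -> float:
    """Execution time after speeding up the affected part by `factor`."""
    return t_affected / factor + t_unaffected


def required_factor(t_affected: float, t_unaffected: float, target_time: float):
    """Improvement factor needed to reach target_time, or None if unreachable."""
    remaining = target_time - t_unaffected
    if remaining <= 0:
        # Even an infinitely fast affected part leaves t_unaffected behind.
        return None
    return t_affected / remaining


total, multiply = 100.0, 80.0
other = total - multiply                 # 20 s not spent in multiply

# 5x overall means finishing in 100 / 5 = 20 s, all of which "other" already uses.
print(required_factor(multiply, other, total / 5))   # None: can't be done

# A 4x overall target (25 s) is reachable, but needs a 16x faster multiply.
print(required_factor(multiply, other, total / 4))   # 16.0
```

The unaffected 20s caps the overall speedup below 5× no matter how fast multiply becomes, which is the reasoning behind "make the common case fast".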
Consider the following processors, P1, P2, and P3 executing the same instruction set with clock rates and CPI as indicated • Which processor has the highest performance in terms of instructions per second? • If the processors each execute a program in 10s, find the number of cycles and the number of instructions • We are trying to reduce the execution time by 30% but this leads to an increase in CPI of 20%. What clock rate should we have to get this reduction?
Consider a computer running code with four main routines, A, B, C, and D. • How much is the total time reduced if the time for Routine A is reduced by 20%? • How much is the time for Routine B reduced if the total time is reduced by 20%? • Can the total time be reduced by 20% by only reducing the time for Routine D?
Consider a computer running code with four main routines, A, B, C, and D. • How much must we improve the CPI of Routine A if we want the program to run twice as fast? • How much must we improve the CPI of Routine C if we want the program to run twice as fast? • How much is the execution time improved if the CPI of routines A and B is reduced by 40%, and the CPI of routines C and D is reduced by 30%?
For next time • Read Chapter 2, Sections 2.1 – 2.3 • Finish Lab 0 by next lab session.