Performance Models And Evaluation Dr. Anilkumar K.G

Textbook and References
Textbook(s):
• Computer Architecture and Implementation, Harvey G. Cragon, Cambridge University Press, 2000 (ISBN: 0-521-65168-9)
• Digital Design and Computer Architecture, 2nd Edition, David Money Harris and Sarah L. Harris, Morgan Kaufmann, Elsevier, 2013 (ISBN: 978-0-12-394424-5)
Reference(s):
• Computer Architecture and Implementation, Harvey G. Cragon, Cambridge University Press, 2000 (ISBN: 0-521-65168-9)
• Computer Architecture: A Quantitative Approach, 4th Edition, John L. Hennessy and David A. Patterson, Morgan Kaufmann Publishers, 2005 (ISBN: 0-12-370490-1)
Simulator: TASM (Turbo Assembler)

Objective(s)
• The objective of this section is to familiarize each student with the performance evaluation of computer systems using various analytical methods.

Introduction
• Two techniques are commonly used to evaluate the performance of a computer and the effect of design changes to the computer or its subsystems:
• Simulation: simulation has the disadvantage of hiding the effects of the workload or architectural parameters that are the input to the simulator.
• Analytic models: analytic models must explicitly comprehend each of the workload and architectural parameters.
• Some parameters of an analytical model may be obtained by simulating a small portion of the system.

Performance Models
• For a computer or computer subsystem, a designer is generally interested in two measures of performance:
• The time to do a task
• The rate at which given tasks are performed
• Example: Task A executes in 3 microseconds per task, which is the same as a rate of $3.33 \times 10^5$ executions per second.
• Time and rate have a reciprocal relationship.

Performance Models (Cont.)
• Because the computer has a clock that controls all of its functions, the number of clocks is frequently used as a measure of time.
• If the clock period of the processor is 10 ns, Task A (3 microseconds per task) requires 300 clocks per execution.
• That is, Task A executes at a rate of 1/300 = 0.0033 tasks per clock.

CPI (Clocks per Instruction)
• A basic time model that is widely used in evaluating processors is clocks per instruction (CPI):
  CPI = number of CPU clocks / number of instructions executed
• The reciprocal of CPI is instructions per clock (IPC):
  IPC = number of instructions executed / number of CPU clocks = 1 / CPI
• IPC is a measure of processing rate and is a useful measure of performance in some situations.

CPI (Cont.)
• A task that takes $1 \times 10^6$ clocks to execute $5 \times 10^5$ instructions has a CPI of 2.
• Small values of CPI indicate higher performance.
• For many processors, different instructions require a different number of clocks for execution; thus CPI is an average value over all instructions executed.
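
The CPI and IPC definitions can be checked with a few lines of Python. This is a minimal sketch; the function names `cpi` and `ipc` are illustrative and not taken from the slides.

```python
def cpi(clocks: float, instructions: float) -> float:
    """Average clocks per instruction (a time measure)."""
    return clocks / instructions


def ipc(clocks: float, instructions: float) -> float:
    """Instructions per clock (a rate measure), the reciprocal of CPI."""
    return instructions / clocks


# Slide example: 1e6 clocks to execute 5e5 instructions.
task_cpi = cpi(1_000_000, 500_000)
task_ipc = ipc(1_000_000, 500_000)
print(task_cpi, task_ipc)
```

Because CPI is a time and IPC is a rate, the two values always multiply to 1.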

CPU Performance Equations
• CPU time (execution time) = CPU clock cycles × clock cycle time (1)
• CPU time = CPU clock cycles / clock rate (2)
• CPI = CPU clock cycles / instruction count (IC) (3), where IC is the number of instructions executed by the CPU
• From (3): CPU clock cycles = CPI × IC (4)
• From (1), (2), and (4):
• CPU time = CPI × IC × clock cycle time (5)
• CPU time = CPI × IC / clock rate (6)
• MIPS rate = clock frequency / (CPI × $10^6$) (7)

Example: Calculating CPI and CPU time
• An 800 MHz multicycle MIPS (Microprocessor without Interlocked Pipeline Stages) system has 5 types of instructions: Load (5 cycles), Store (4 cycles), R-type (4 cycles), Branch (3 cycles), Jump (3 cycles).
• If a program has 50% R-type, 15% Load, 25% Store, 8% Branch, and 2% Jump instructions, what is the CPI?
• Ans: taking IC = 50 + 15 + 25 + 8 + 2 = 100 instructions, CPI = (4×50 + 5×15 + 4×25 + 3×8 + 3×2) / 100 = 4.05
• What is its CPU time (execution time)?
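
The closing question can be worked with equations (6) and (7). The sketch below assumes the program consists of exactly the 100 instructions used in the CPI calculation. That count is an assumption for illustration, since the slide gives only percentages, and the names `MIX`, `effective_cpi`, `cpu_time`, and `mips_rate` are hypothetical.

```python
CLOCK_HZ = 800e6  # 800 MHz clock from the example

# instruction type -> (cycles per instruction, count out of 100 instructions)
MIX = {
    "R-type": (4, 50),
    "Load": (5, 15),
    "Store": (4, 25),
    "Branch": (3, 8),
    "Jump": (3, 2),
}


def effective_cpi(mix):
    """Weighted average of cycles over the instruction mix."""
    total_instructions = sum(count for _, count in mix.values())
    total_cycles = sum(cycles * count for cycles, count in mix.values())
    return total_cycles / total_instructions


def cpu_time(cpi, instruction_count, clock_hz):
    """Equation (6): CPU time = CPI * IC / clock rate, in seconds."""
    return cpi * instruction_count / clock_hz


def mips_rate(cpi, clock_hz):
    """Equation (7): MIPS = clock frequency / (CPI * 10^6)."""
    return clock_hz / (cpi * 1e6)


avg_cpi = effective_cpi(MIX)                    # 4.05
exec_time = cpu_time(avg_cpi, 100, CLOCK_HZ)    # about 506 ns for 100 instructions
rate = mips_rate(avg_cpi, CLOCK_HZ)             # about 197.5 MIPS
print(avg_cpi, exec_time, rate)
```

For a real program, replace 100 with the actual instruction count; the CPU time scales linearly with IC.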

Example: Calculating CPI, CPU time, and MIPS
• A 3 GHz processor was used to execute a benchmark program with the following instruction mix and clock cycle count:
• Determine the effective CPI, MIPS rate, and execution time (CPU time) of this program.

Speedup
• We use speedup to evaluate the effect of modifying a design. (Is the modified design B better than design A?)
• Speedup = Time A / Time B = CPI A / CPI B = IPC B / IPC A = Rate B / Rate A
• Consider speedup in terms of rate (Rate B / Rate A):
• If design B is an improvement, the speedup will be greater than 1.
• If design B hurts performance, the speedup will be less than 1.
• If the speedup is equal to 1, there is no performance change.
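
As a small illustration of the speedup ratios, the sketch below computes the speedup of design B over design A from either times or rates. The numbers are hypothetical and chosen only to show the direction of each ratio.

```python
def speedup_from_times(time_a: float, time_b: float) -> float:
    """Speedup of B over A when both are measured as times (or CPI)."""
    return time_a / time_b


def speedup_from_rates(rate_a: float, rate_b: float) -> float:
    """Speedup of B over A when both are measured as rates (or IPC)."""
    return rate_b / rate_a


def verdict(speedup: float) -> str:
    """Interpret a speedup value as on the slide."""
    if speedup > 1:
        return "B improves performance"
    if speedup < 1:
        return "B hurts performance"
    return "no performance change"


# Hypothetical measurements: A takes 4 s, B takes 2 s.
s_time = speedup_from_times(4.0, 2.0)
# Hypothetical measurements: A does 2 tasks/s, B does 3 tasks/s.
s_rate = speedup_from_rates(2.0, 3.0)
print(s_time, verdict(s_time))
print(s_rate, verdict(s_rate))
```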

Percent Difference and Change
• The percent difference in performance between two systems, A and B, measured as rates.
• Assume that Rate A > Rate B:
  % difference = 100 × (Rate A − Rate B) / Rate B
• Example: Processor A performs a task at the rate of 3 tasks per second while processor B performs the same task at 2 tasks per second. By what percentage is A faster than B?

Percent Difference and Change
• The percent difference in performance between two systems, A and B, measured as times.
• Assume that Time A < Time B:
  % difference = 100 × (Time B − Time A) / Time A
• Example: Processor A performs a given task in 3 s while processor B requires 4 s for the same task. By what percentage is A faster than B?
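
The two percent-difference examples can be worked as below. This sketch assumes the usual definitions: the difference is taken relative to the slower system B, so for rates it is 100 × (Rate A − Rate B) / Rate B and for times it is 100 × (Time B − Time A) / Time A. The function names are illustrative.

```python
def percent_faster_by_rate(rate_a: float, rate_b: float) -> float:
    """Percent by which A is faster than B, given rates with rate_a > rate_b."""
    return 100.0 * (rate_a - rate_b) / rate_b


def percent_faster_by_time(time_a: float, time_b: float) -> float:
    """Percent by which A is faster than B, given times with time_a < time_b."""
    return 100.0 * (time_b - time_a) / time_a


# Rate example: A runs 3 tasks/s, B runs 2 tasks/s.
by_rate = percent_faster_by_rate(3.0, 2.0)   # 50 percent
# Time example: A needs 3 s, B needs 4 s.
by_time = percent_faster_by_time(3.0, 4.0)   # about 33.3 percent
print(by_rate, by_time)
```

Both forms give the same answer for the same pair of systems, because a time ratio of 4/3 is a rate ratio of 4/3.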

Means and Weighted Means
• We need to express the performance of a computer, which depends on many factors, as a single number that can be compared with the corresponding number for another computer or used to evaluate a design modification.
• We can make three types of measurements:
  1) The time needed to perform a task.
  2) The rate at which a task is performed.
  3) The ratio of times or rates, as needed to find speedup.

Means and Weighted Means (Cont.)
• There are three types of means used to find the central tendency of the measurements:
• Arithmetic mean for times
• Harmonic mean for rates
• Geometric mean for ratios
• These three means may have equal weights or be weighted.

Means and Weighted Means (Cont.)
• A weight is the fractional occurrence of an event, expressed as x/100, or equivalently the frequency of the event, also expressed as x/100.
• In general, weight, fraction, and frequency are synonymous, and the sum of the weights, fractions, or frequencies is equal to one.

Time-Based Means
• The time required to perform a specific task is a fundamental measurement in the field of performance modeling.
• When other measures are used, their validity can usually be checked by converting them to time (such as latency time, access time, response time, etc.).
• There are two time-based measures of central tendency:
• Arithmetic mean
• Weighted arithmetic mean

Time-Based Means (Cont.)
• The arithmetic mean of the time per event is: $\bar{T} = \frac{1}{n} \sum_{i=1}^{n} T_i$
• The weighted arithmetic mean is: $\bar{T}_w = \sum_{i=1}^{n} W_i T_i$
• $T_i$ is the execution time of the $i$th program in a workload of $n$ programs.
• $W_i$ is the weight of the $i$th program, with $W_1 + W_2 + \dots + W_n = 1$.
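
A short sketch of the two time-based means follows. The job times and weights are hypothetical values chosen for illustration, not data from the slides.

```python
import math


def arithmetic_mean(times):
    """Equal-weight arithmetic mean of a list of times."""
    return sum(times) / len(times)


def weighted_arithmetic_mean(times, weights):
    """Weighted arithmetic mean; the weights must sum to 1."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * t for w, t in zip(weights, times))


# Hypothetical job times in hours and hypothetical usage weights.
job_times = [2.0, 2.2, 1.9, 2.3]
job_weights = [0.1, 0.2, 0.3, 0.4]

mean_time = arithmetic_mean(job_times)
weighted_time = weighted_arithmetic_mean(job_times, job_weights)
print(mean_time, weighted_time)
```

With equal weights of 1/n the weighted form reduces to the plain arithmetic mean.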
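
Rate-Based Means
• Rate-based means are important when the performance is measured as rates, where $T_i = 1/R_i$.
• Harmonic mean (reciprocal of the arithmetic mean of the times): $\bar{R} = \frac{n}{\sum_{i=1}^{n} 1/R_i}$
• Weighted harmonic mean (reciprocal of the weighted arithmetic mean of the times): $\bar{R}_w = \frac{1}{\sum_{i=1}^{n} W_i / R_i}$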

Rate-Based Means (Cont.)
• Ex. Consider the example in the subsection on arithmetic means of jobs being run on the corporate computer. We express the observations in a rate measure of jobs per hour. These data are 0.5, 0.45, 0.53, and 0.43 jobs per hour. What is the central tendency of these measurements in jobs per hour?
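
Since these observations are rates, the harmonic mean applies. The sketch below works the example with the four rates given above; the function names are illustrative.

```python
def harmonic_mean(rates):
    """Equal-weight harmonic mean: n divided by the sum of reciprocals."""
    return len(rates) / sum(1.0 / r for r in rates)


def weighted_harmonic_mean(rates, weights):
    """Weighted harmonic mean; the weights are assumed to sum to 1."""
    return 1.0 / sum(w / r for w, r in zip(weights, rates))


job_rates = [0.5, 0.45, 0.53, 0.43]  # jobs per hour, from the example

mean_rate = harmonic_mean(job_rates)
print(round(mean_rate, 3))  # about 0.474 jobs per hour
```

Taking the arithmetic mean of the rates instead would give about 0.478, which does not correspond to the average time per job.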

Ratio Based Means
• Some data observations are ratios of either times or rates. The geometric mean is the central tendency of ratios: $G = \left( \prod_{i=1}^{n} \text{ratio}_i \right)^{1/n}$
• In the weighted geometric mean, the observed ratios are weighted by their fraction of use $w_i$: $G_w = \prod_{i=1}^{n} \text{ratio}_i^{\,w_i}$

Ratio Based Means (Cont.)
• Ex. Two computers execute four loops of a scientific program in the number of clocks shown below. What is the central tendency of the speedup for the loops?
Table 2.4 Loop execution time (clocks)
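
Ratio Based Means (Cont.)
• Geometric mean with equal weights, using the per-loop speedups 1.95, 1.96, 2.08, and 2.38 from Table 2.4:
  Mean speedup = $(1.95 \times 1.96 \times 2.08 \times 2.38)^{1/4} \approx 2.09$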

Weighted geometric mean example
• From the previous example (Table 2.4), assume that for a particular benchmark, loop 1 is executed 20 times, loop 2 is executed 30 times, loop 3 is executed 50 times, and loop 4 is executed 100 times. What is the mean speedup of the four loops?
• Ans: Total number of loop executions = 20 + 30 + 50 + 100 = 200
• The weight for loop 1 = 20/200 = 0.1, for loop 2 is 0.15, for loop 3 is 0.25, and for loop 4 is 0.5.
• Apply the weighted geometric mean to the speedups of the 4 loops: $1.95^{0.1} \times 1.96^{0.15} \times 2.08^{0.25} \times 2.38^{0.5} \approx 2.19$
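
Both geometric means for the four loops can be reproduced as follows. This is a minimal sketch using the per-loop speedups and execution counts from the example; the function names are illustrative.

```python
import math


def geometric_mean(ratios):
    """Equal-weight geometric mean: n-th root of the product."""
    return math.prod(ratios) ** (1.0 / len(ratios))


def weighted_geometric_mean(ratios, weights):
    """Weighted geometric mean: product of ratio_i ** w_i, weights summing to 1."""
    return math.prod(r ** w for r, w in zip(ratios, weights))


loop_speedups = [1.95, 1.96, 2.08, 2.38]
loop_counts = [20, 30, 50, 100]

# Convert execution counts into weights that sum to 1.
total_runs = sum(loop_counts)
loop_weights = [count / total_runs for count in loop_counts]

equal_mean = geometric_mean(loop_speedups)
weighted_mean = weighted_geometric_mean(loop_speedups, loop_weights)
print(round(equal_mean, 2), round(weighted_mean, 2))
```

The weighted result is higher than the equal-weight result because loop 4, which has the largest speedup, carries half of the weight.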

Compound Growth Rate
• If each year we can have a processor that is twice as fast as last year's processor, the compound growth rate of performance is 100% per year.
• A doubling in performance every 2 years results from a compound growth rate of 41% per year.
• Compound growth rate is identical to compound interest earned on money. Each year the interest is computed on the principal plus the prior years' interest.
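
Compound Growth Rate (Cont.)
• The formula for compound growth rate is:
  Compound growth rate (%) = $100 \times \left( \sqrt[n]{\text{last value} / \text{starting value}} - 1 \right)$, where $n$ is the number of time periods.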

Compound Growth Rate (Cont.)
• Ex. A transaction server performed at the rate of 100 transactions per second. After upgrades over a 5-year period, the server performs at the rate of 700 transactions per second. Assuming the transactions are identical, what is the compound growth rate of the transactions per second over this 5-year period?

Compound Growth Rate (Cont.)
• Speedup = last value / starting value = 700/100 = 7
• Compound growth rate = $100 \times (\sqrt[n]{\text{speedup}} - 1) = 100 \times (\sqrt[5]{7} - 1) \approx 47.6\%$ per year
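
The compound growth calculation can be sketched in Python. The function name `compound_growth_rate` is illustrative; it implements 100 × (speedup^(1/n) − 1) for n time periods.

```python
def compound_growth_rate(start: float, end: float, periods: float) -> float:
    """Compound growth rate in percent per period."""
    speedup = end / start
    return 100.0 * (speedup ** (1.0 / periods) - 1.0)


# Transaction server: 100 -> 700 transactions per second over 5 years.
server_growth = compound_growth_rate(100, 700, 5)

# Checks against the earlier statements about doubling.
doubling_yearly = compound_growth_rate(1, 2, 1)      # 100 percent per year
doubling_two_years = compound_growth_rate(1, 2, 2)   # about 41 percent per year

print(round(server_growth, 1), doubling_yearly, round(doubling_two_years, 1))
```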

Amdahl's Law
• Amdahl's law models the speedup of a computer when it is processing two classes of tasks; one class can be sped up whereas the other class cannot (Amdahl 1967).
• This situation is frequently encountered in computer system design.
• As per Amdahl's law: speedup = performance with enhancement / performance without enhancement
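
Amdahl's Law (Cont.)
• Speedup = execution time without enhancement / execution time with enhancement
• $\text{Speedup}_{\text{overall}} = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}} = \frac{1}{(1 - \text{fraction}_{\text{enhanced}}) + \frac{\text{fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$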

Amdahl's Law (Cont.)
• Assume that a program has two components with execution times $t_1$ and $t_2$, and that the component $t_2$ can be sped up by a factor $n$. Then the overall speedup of the system is:
  $\text{speedup} = \frac{t_1 + t_2}{t_1 + t_2 / n}$

Amdahl's Law (Cont.)
• For some problems we may not know $t_1$ and $t_2$. However, we do know the fraction of time that can or cannot be sped up. We define $a$ to be the fraction of time that cannot be sped up and $1 - a$ as the fraction of time that can be sped up by a factor $n$. Then the speedup can be calculated as:
  $\text{speedup} = \frac{1}{a + (1 - a)/n}$ (1)

Amdahl's Law (Cont.)
• If instead $a$ is the fraction of time that can be sped up, substitute $a$ for $(1 - a)$ and $(1 - a)$ for $a$ in equation (1):
  speedup = 1 / ((1 − a) + a / n) (2)
• Here $a$ is the fraction of time that is enhanced.

Amdahl's Law (Cont.)
• Ex1. An executing program is timed, and it is found that the serial portion (which cannot be sped up) consumes 30 s whereas the other portion, which can be sped up, consumes 70 s. You believe that by using parallel processors, you can speed up this latter portion by a factor of 8. What is the speedup of the system?
• Ex2. A new CPU is 10 times faster on computation than the original CPU. Assume that the original CPU is busy with computation 40% of the time and is waiting for I/O 60% of the time. What is the overall speedup gained by incorporating the enhancement?
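
Both exercises can be worked directly from the two forms of Amdahl's law. The sketch below is one way to do it; the function names are illustrative, with one taking the component times and the other taking the enhanced fraction.

```python
def amdahl_from_times(t_serial: float, t_enhanced: float, n: float) -> float:
    """Speedup when t_enhanced can be sped up by n and t_serial cannot."""
    return (t_serial + t_enhanced) / (t_serial + t_enhanced / n)


def amdahl_from_fraction(fraction_enhanced: float, n: float) -> float:
    """Speedup when a fraction of the original time is sped up by n."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / n)


# Ex1: 30 s serial, 70 s parallelizable, sped up by a factor of 8.
ex1 = amdahl_from_times(30.0, 70.0, 8.0)   # 100 / 38.75, about 2.58

# Ex2: computation is 40% of the time and becomes 10 times faster.
ex2 = amdahl_from_fraction(0.4, 10.0)      # 1 / 0.64 = 1.5625

print(round(ex1, 2), round(ex2, 4))
```

In Ex2 the overall speedup stays well below 10 because the 60% of time spent waiting for I/O is unaffected; even an infinitely fast CPU would give at most 1/0.6, about 1.67.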

Moore's Law
• (1965) "The number of circuits per die had doubled every year."
• "I expect a change in the slope to occur at about the present time. From a doubling of the slope of the curve annually for the first 15 years or so, the slope drops to about one-half its previous value to a doubling once every two years," Moore said.

Moore's Law (Cont.)
• Moore's law is a useful tool for the designer to ensure that a new design will not be "behind the curve" when it is completed.
• Why Moore's law?
• Increased die area and decreased geometry allow more devices per die.
• Dies per wafer = $\frac{\pi \times (\text{wafer radius})^2}{\text{die area}} - \frac{\pi \times \text{wafer diameter}}{\sqrt{2 \times \text{die area}}}$
• Moore's law is now observed to hold true in hard disks as well as semiconductors.

Moore's Law (Cont.)
• Example: Find the number of dies per 20 cm wafer for a die with a 1.5 cm side.
• Ans: Die area = 1.5 cm × 1.5 cm = 2.25 cm²
• Dies per wafer = $\frac{3.14 \times 10^2}{2.25} - \frac{3.14 \times 20}{\sqrt{2 \times 2.25}} \approx 139.6 - 29.6 = 110$
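
The dies-per-wafer estimate is easy to check numerically. This sketch assumes the formula with π in both terms and a square root in the second term, which is the form that reproduces the answer of 110; the function name `dies_per_wafer` is illustrative.

```python
import math


def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    """Wafer area over die area, minus an edge-loss term along the circumference."""
    radius = wafer_diameter_cm / 2.0
    whole_dies = math.pi * radius ** 2 / die_area_cm2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2.0 * die_area_cm2)
    return whole_dies - edge_loss


die_side_cm = 1.5
estimate = dies_per_wafer(20.0, die_side_cm ** 2)
print(round(estimate))  # about 110 dies
```

The second term approximates the partial dies lost around the wafer edge, so it grows with the wafer circumference rather than its area.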

Grosch's Law
• "… giving added economy only as the square root of the increase in speed – that is, to do a calculation ten times as cheaply you must do it one hundred times as fast."
• $C = K P^{1/2}$ or $P = K C^2$; P is the computing power, K is a constant, and C is the system cost.
• For twice the money, one obtains four times the computing power.

Steady-State Performance
• One common measure is millions of instructions per second (MIPS).
• Ex. A given processor executes 2 million instructions in 3 s. What is the MIPS measure of performance?
• MIPS = number of instructions executed / (CPU time × $10^6$) = $2 \times 10^6 / (3 \times 10^6) \approx 0.67$ MIPS
• CPU time = CPI × IC × clock cycle time

Steady-State Performance (Cont.)
• The MIPS measure of processor performance is best used when comparing the performance of two processors with the same instruction set executing the same program.

Steady-State Performance (Cont.)
• The other measure is "time to execute a task".
• Task time (execution time or CPU time):
  = number of instructions executed × CPI × time per clock
  = number of instructions executed × CPI / clock frequency

Steady-State Performance (Cont.)
• Ex. A processor executes 10 million floating-point and 1 million overhead instructions in 50 ms. What is the MFLOPS rating of this processor?
• MFLOPS = $10 \times 10^6 / (50 \times 10^{-3} \times 10^6) = 200$
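
The MIPS and MFLOPS examples use the same rate formula: a count of operations divided by the execution time, scaled by 10^6. A minimal sketch follows, with an illustrative function name.

```python
def millions_per_second(operation_count: float, seconds: float) -> float:
    """Rate in millions of operations per second (MIPS or MFLOPS)."""
    return operation_count / (seconds * 1e6)


# MIPS example: 2 million instructions in 3 s.
mips = millions_per_second(2e6, 3.0)

# MFLOPS example: only the 10 million floating-point operations count,
# although the 50 ms also covers 1 million overhead instructions.
mflops = millions_per_second(10e6, 50e-3)

print(round(mips, 2), round(mflops, 1))
```

Note that the overhead instructions lower the MFLOPS rating indirectly: they add to the 50 ms but not to the floating-point count.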

Transient Performance
• Steady-state performance measures are of little use for many computer applications today.
• We need transient performance, or response time.
• Consider a computer that has a single user of a resource, executing only one task:
• Response time = OS time + disk access time + data transfer time
• A computer accessed over a network has additional factors, which are studied later.

Exercises
• Processor A performs a given task in 4 s while processor B requires 8 s for the same task. By what percentage is A faster than B?
• Processor A performs a given task at a rate of 4 tasks per second while processor B performs the same task at 2 tasks per second. By what percentage is A faster than B?
• A processor executes 200M instructions per second. With an improved memory, the execution rate increases to 225M instructions per second. What is the percentage change in performance?