Lecture 9: Performance Analysis
Goals of Performance Analysis • Compare alternatives • Determine the impact of a feature • System tuning • Identify relative performance • Performance debugging • Set expectations for the next generation of computers
Performance Metric Basic characteristics of a computer system that we need to measure: • A count of how many times an event occurs • The duration of some time interval • The size of some parameter
Performance Metric Characteristics of a good performance metric • Linearity • The metric should be linearly proportional to the machine performance • Reliability • If the metric says A outperforms B, then A should actually outperform B • Ex: MIPS is unreliable: MIPS_A > MIPS_B, but B may execute the program in less time • Repeatability (deterministic) • The same value is measured each time the same experiment is performed • Ease of measurement • Consistency • The definition of the metric must be the same across different systems • Ex: MIPS/s and MFLOPS/s are not consistent • Independence • The metric should not be open to manufacturer influence; manufacturers may optimize the performance of specific functions to improve a particular metric
Performance Metrics • Clock rate • MIPS/s • MFLOPS/s • SPEC • Execution time • Response time, throughput, bandwidth • Speedup
Indices of Central Tendency • Mean • Median • Mode
The Sample Mean (arithmetic mean or average) Samples $\{x_1, x_2, \ldots, x_n\}$ Sample mean $\bar{x}_A = \frac{1}{n}\sum_{i=1}^{n} x_i$ Gives equal weight to all measurements. Ex: mean = 15.8
The Sample Median Reduces the skewing effect of outliers. • Order all n measurements • The middle value is the median. If n is even, the median is the mean of the middle 2 values Ex: mean = 46.5, median = (16 + 18) / 2 = 17
The Sample Mode Mode is the value that occurs most frequently. If all values occur once, there is no mode. If there are several samples that all have the same value, there would be several modes.
Mean, Median, Mode Mean • Incorporates information from all of the measured values • Sensitive to outliers Median and Mode • Less sensitive to outliers • Do not effectively use all information Ex: • The arithmetic mean of execution times is meaningful • The arithmetic mean of the Mflop/s rates calculated from these execution times is not meaningful
Mean, Median, Mode Ex: • 25 machines contain 16 MBytes of memory • 38 machines contain 32 MBytes of memory • 4 machines contain 64 MBytes of memory • 1 machine contains 1024 MBytes of memory • Total size of memory = 2896 MBytes: meaningful • Mean = 2896 / 68 = 42.6 MBytes: not meaningful, because 63 of 68 machines have a memory size ≤ 32 MBytes • Median = 32 MBytes: more meaningful Use the appropriate indices
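The memory example above can be checked with a few lines of Python. This is only a sketch using the standard `statistics` module; the variable names are illustrative and not part of the lecture.

```python
from statistics import mean, median, multimode

# Memory sizes (MBytes) of the 68 machines from the example above.
memory_mb = [16] * 25 + [32] * 38 + [64] * 4 + [1024] * 1

total_mb = sum(memory_mb)          # total installed memory
mean_mb = mean(memory_mb)          # pulled upward by the single 1024-MByte machine
median_mb = median(memory_mb)      # middle of the ordered values
modes_mb = multimode(memory_mb)    # most frequent value(s), returned as a list

print(f"machines = {len(memory_mb)}")
print(f"total    = {total_mb} MBytes")
print(f"mean     = {mean_mb:.1f} MBytes")
print(f"median   = {median_mb} MBytes")
print(f"mode(s)  = {modes_mb}")
```

The single large machine moves the mean well above the memory size of 63 of the 68 machines, while the median and mode stay at 32 MBytes.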
Arithmetic Mean • Time measurements: $T_1, \ldots, T_n$ The arithmetic mean $T_A$ is directly proportional to the total execution time => correct mean to use • Execution rates (MFLOPS/s): $M_1, \ldots, M_n$ where $M_i = F/T_i$ The arithmetic mean $M_A = \frac{F}{n}\sum_{i=1}^{n} \frac{1}{T_i}$ is proportional to the sum of the inverses of the execution times, not to the inverse of the total execution time => inappropriate Arithmetic mean $= \frac{1}{n}\sum_{i=1}^{n} x_i$
Harmonic Mean • Time measurements: $T_1, \ldots, T_n$ The harmonic mean $T_H$ is inappropriate • Execution rates (MFLOPS/s): $M_1, \ldots, M_n$ The harmonic mean $M_H$ is inversely proportional to the total execution time => appropriate Harmonic mean $= \frac{n}{\sum_{i=1}^{n} 1/x_i}$
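A small numerical check of why the harmonic mean suits rates, under the slide's assumption that every run performs the same number of operations F. The values of `F` and `times` below are made up for illustration.

```python
from statistics import fmean, harmonic_mean

# Hypothetical data (not from the lecture): every benchmark run performs
# the same number of floating-point operations F.
F = 100e6                                  # operations per run (assumed)
times = [2.0, 4.0, 5.0, 10.0]              # execution times in seconds (assumed)
rates = [F / t / 1e6 for t in times]       # per-run rate in MFLOPS, M_i = F / T_i

# Rate actually achieved over the whole workload: total operations / total time.
overall_rate = len(times) * F / sum(times) / 1e6

arithmetic = fmean(rates)
harmonic = harmonic_mean(rates)

print(f"per-run rates   = {rates}")
print(f"overall rate    = {overall_rate:.2f} MFLOPS")
print(f"arithmetic mean = {arithmetic:.2f} MFLOPS")
print(f"harmonic mean   = {harmonic:.2f} MFLOPS")
```

With these numbers the harmonic mean of the rates matches the overall rate, while the arithmetic mean overstates it.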
Geometric Mean Often advocated for use with normalized numbers. However, the geometric mean is not appropriate for summarizing times or rates, irrespective of whether they are normalized. Geometric mean $= \left(\prod_{i=1}^{n} x_i\right)^{1/n}$
Weighted Mean Weights $w_i$ with $\sum_{i=1}^{n} w_i = 1$ Arithmetic mean $\bar{x}_{A,W} = \sum_{i=1}^{n} w_i x_i$ Harmonic mean $\bar{x}_{H,W} = \frac{1}{\sum_{i=1}^{n} w_i / x_i}$
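A minimal sketch of the two weighted means, assuming the weights are fractions that sum to 1. The function names and the sample values are illustrative only.

```python
from math import isclose

def weighted_arithmetic_mean(values, weights):
    """Sum of w_i * x_i; assumes the weights sum to 1."""
    if not isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * x for w, x in zip(weights, values))

def weighted_harmonic_mean(values, weights):
    """1 / sum(w_i / x_i); assumes the weights sum to 1."""
    if not isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return 1.0 / sum(w / x for w, x in zip(weights, values))

# Hypothetical workload: three programs run 50%, 30% and 20% of the time.
weights = [0.5, 0.3, 0.2]
times = [2.0, 4.0, 10.0]        # seconds (assumed)
rates = [50.0, 25.0, 10.0]      # MFLOPS (assumed)

mean_time = weighted_arithmetic_mean(times, weights)
mean_rate = weighted_harmonic_mean(rates, weights)
print(f"weighted arithmetic mean of times = {mean_time:.2f} s")
print(f"weighted harmonic mean of rates   = {mean_rate:.2f} MFLOPS")
```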
Histogram • Each cell contains 4 or 5 measurements
Range Difference between the maximum and minimum values: $R_{max} = \max_i(x_i) - \min_i(x_i)$ (does not use all information) Maximum difference of any measurement from the mean: $\Delta_{max} = \max_i |x_i - \bar{x}|$ (does not use all information)
Variance The sample variance is the calculated estimate of the actual variance of the underlying distribution from which the measurements are taken. $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ (a better metric)
Standard Deviation $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$ Coefficient of variation (COV) $= s / \bar{x}$
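The variance, standard deviation and COV formulas can be sketched as follows. The measurements are hypothetical values chosen for illustration, and the hand-written formula is compared against `statistics.variance`, which also divides by n - 1.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical measurements in seconds (not taken from the lecture).
samples = [8.0, 7.0, 5.0, 9.0, 9.5, 11.3, 5.2, 8.5]

n = len(samples)
x_bar = mean(samples)

# Sample variance: squared deviations from the mean, divided by n - 1.
s2 = sum((x - x_bar) ** 2 for x in samples) / (n - 1)
s = sqrt(s2)                 # standard deviation
cov = s / x_bar              # coefficient of variation (dimensionless)

print(f"mean     = {x_bar:.4f}")
print(f"variance = {s2:.4f}")
print(f"std dev  = {s:.4f}")
print(f"COV      = {cov:.3f}")

# The library routine uses the same n - 1 definition.
library_s2 = variance(samples)
```

Because the COV is normalized by the mean, it allows the spread of measurements with different magnitudes or units to be compared.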
Quality of Measurement Depends on the following characteristics of the measurement tool: • Accuracy • Absolute difference between a measured value and the corresponding reference value. Ex: duration of a second • Precision • Highly precise measurements would be very tightly clustered around a single value • Resolution • Smallest incremental change that can be detected. Ex: interval between clock ticks
Quality of Measurement [Figure: accuracy and precision of a set of measurements, shown relative to the mean value and the true value]
Errors Sources of errors • Accuracy, precision, resolution of the measurement tool • Time required to read and store the current time value • Time-sharing among multiple programs • Processing of interrupts • Cache misses, page faults
Errors Types of errors • Systematic errors • Are the result of some experimental mistake • Usually constant across all measurements Ex: temperature may affect the clock period • Random errors • Unpredictable, nondeterministic • Affect the precision of the measurement Ex: timer resolution ±T, measurements vary by ±T with equal probability
Errors Experimental errors are modeled as Gaussian Ex: x – mean value, E – random error Two independent sources of error, each shifting the measurement by +E or −E with 50% probability Measurements are: • x − 2E with probability 25% • x with probability 50% (the two errors cancel) • x + 2E with probability 25%
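The error model on this slide can be enumerated directly. The sketch below assumes each error source independently shifts a measurement by +E or -E with equal probability; `error_distribution` is an illustrative helper, not part of the lecture.

```python
from collections import Counter
from itertools import product

def error_distribution(num_sources, E=1.0):
    """Probability of each total error when every source adds +E or -E
    with probability 1/2, independently of the others (assumed model)."""
    counts = Counter(
        sum(signs) * E for signs in product((-1, +1), repeat=num_sources)
    )
    total = 2 ** num_sources
    return {err: c / total for err, c in sorted(counts.items())}

two = error_distribution(2)
print("2 sources:", two)

# With more sources the binomial shape approaches a bell curve.
for err, prob in error_distribution(8).items():
    print(f"{err:+5.1f}E  {'#' * round(prob * 100)}  {prob:.3f}")
```

With two sources the errors cancel half of the time; as the number of sources grows, the distribution of the total error becomes increasingly bell-shaped, which motivates the Gaussian model.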
Confidence Intervals Used to find a range of values that has a given probability of including the actual value. Case 1: the number of measurements is large (n ≥ 30) $\{x_1, x_2, \ldots, x_n\}$ – samples, $\bar{x}$ – sample mean Gaussian distribution with: $\mu$ – mean, $\sigma$ – standard deviation
Confidence Intervals Central-limit Theorem Confidence interval: $[c_1, c_2]$ • Used to quantify the precision of the measurements. $\Pr[c_1 \le x \le c_2] = 1 - \alpha$, $\Pr[x < c_1] = \Pr[x > c_2] = \alpha/2$
Confidence Intervals Normalization: $z = \frac{\bar{x} - x}{s/\sqrt{n}}$ follows a Gaussian distribution with $\mu = 0$, $\sigma^2 = 1$ $\bar{x}$ – sample mean, $x$ – actual mean
Confidence Intervals Central-limit Theorem Confidence interval: $[c_1, c_2]$ $c_1 = \bar{x} - z_{1-\alpha/2}\, \frac{s}{\sqrt{n}}$ $c_2 = \bar{x} + z_{1-\alpha/2}\, \frac{s}{\sqrt{n}}$ $s$ – standard deviation, $n$ – number of measurements $z_{1-\alpha/2}$ – value of a standard unit normal distribution that has an area of $1-\alpha/2$ to the left of $z_{1-\alpha/2}$: $\Pr[Z \le z_{1-\alpha/2}] = 1 - \alpha/2$ $z_{1-\alpha/2}$ is obtained from a precomputed table.
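A minimal sketch of the large-sample interval. Instead of a printed table it obtains the z value from `statistics.NormalDist().inv_cdf`; the function name and the 35 sample values are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval_z(samples, confidence=0.95):
    """Interval [c1, c2] for the mean, intended for n >= 30 measurements."""
    n = len(samples)
    x_bar = mean(samples)
    s = stdev(samples)                            # uses the n - 1 definition
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # z_{1-alpha/2}
    half_width = z * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical measurements (n = 35), generated deterministically.
samples = [10.0 + 0.1 * (i % 7) for i in range(35)]

c1_90, c2_90 = confidence_interval_z(samples, 0.90)
c1_95, c2_95 = confidence_interval_z(samples, 0.95)
print(f"90% interval: [{c1_90:.3f}, {c2_90:.3f}]")
print(f"95% interval: [{c1_95:.3f}, {c2_95:.3f}]")
```

Asking for higher confidence widens the interval: the 95% interval contains the 90% interval.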
Confidence Intervals Case 2: the number of measurements is small (n < 30) The sample variance $s^2$ can vary significantly from one group of measurements to another $z = \frac{\bar{x} - x}{s/\sqrt{n}}$ follows the t distribution with $n - 1$ degrees of freedom
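Confidence Intervals $c_1 = \bar{x} - t_{1-\alpha/2;\, n-1}\, \frac{s}{\sqrt{n}}$ $c_2 = \bar{x} + t_{1-\alpha/2;\, n-1}\, \frac{s}{\sqrt{n}}$ $t_{1-\alpha/2;\, n-1}$ – value from the t distribution with $n-1$ degrees of freedom that has an area of $1-\alpha/2$ to the left of $t_{1-\alpha/2;\, n-1}$. $t_{1-\alpha/2;\, n-1}$ is obtained from a precomputed table.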
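A sketch of the small-sample interval. The Python standard library has no t distribution, so the two t values needed here are hard-coded from a standard t table for 7 degrees of freedom; both the tiny table and the eight measurements are assumptions made for this illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Excerpt of a standard t table: (area to the left, degrees of freedom) -> t.
# Only the entries needed for n = 8 are included in this sketch.
T_TABLE = {(0.95, 7): 1.895, (0.975, 7): 2.365}

def confidence_interval_t(samples, confidence):
    """Interval [c1, c2] for the mean when n < 30, using a table lookup."""
    n = len(samples)
    alpha = 1.0 - confidence
    key = (round(1.0 - alpha / 2.0, 3), n - 1)
    t = T_TABLE[key]                     # KeyError if the entry is not in the excerpt
    half_width = t * stdev(samples) / sqrt(n)
    x_bar = mean(samples)
    return x_bar - half_width, x_bar + half_width

# Hypothetical measurements in seconds (not taken from the lecture).
samples = [8.0, 7.0, 5.0, 9.0, 9.5, 11.3, 5.2, 8.5]

c1_90, c2_90 = confidence_interval_t(samples, 0.90)
c1_95, c2_95 = confidence_interval_t(samples, 0.95)
print(f"90% interval: [{c1_90:.2f}, {c2_90:.2f}]")
print(f"95% interval: [{c1_95:.2f}, {c2_95:.2f}]")
```

For the same data, the t values are larger than the corresponding z values, so the interval is wider than the Gaussian formula would suggest.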
Confidence Intervals T distribution How long does it take to write a file of a particular size to disk?
Confidence Intervals Ex: How long does it take to write a file of a particular size to disk?
Confidence Intervals Determining the Number of Measurements Needed The size of the interval $[c_1, c_2]$ is proportional to $1/\sqrt{n}$ $(c_1, c_2) = ((1-e)\bar{x}, (1+e)\bar{x})$ $c_1 = (1-e)\bar{x} = \bar{x} - z_{1-\alpha/2}\, \frac{s}{\sqrt{n}}$ $n = \left(\frac{z_{1-\alpha/2}\, s}{e\, \bar{x}}\right)^2$ Initially make a small number of measurements and find an estimate for $s$, then calculate $n$.
Confidence Intervals Ex: How many measurements are required for 90% confidence that the mean value is within 7% of the actual value?
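The calculation for a question of this kind can be sketched as below. The pilot estimates `x_bar` and `s` are assumed values, since the slide's data is not reproduced here, and the helper name is illustrative.

```python
from math import ceil
from statistics import NormalDist

def measurements_needed(x_bar, s, e, confidence):
    """n = (z_{1-alpha/2} * s / (e * x_bar))^2, rounded up to an integer."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return ceil((z * s / (e * x_bar)) ** 2)

# Assumed pilot estimates from a small initial set of measurements.
x_bar = 7.94      # sample mean (hypothetical)
s = 2.14          # sample standard deviation (hypothetical)

n = measurements_needed(x_bar, s, e=0.07, confidence=0.90)
print(f"measurements needed: {n}")

# Halving the allowed error roughly quadruples the required measurements.
n_half = measurements_needed(x_bar, s, e=0.035, confidence=0.90)
print(f"with e = 3.5%: {n_half}")
```

Because the interval width shrinks only as 1/√n, tightening the allowed error quickly becomes expensive.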
Confidence Intervals for Proportions Finding a confidence interval for the proportion p. Sample proportion: $p = m / n$ $n$ – total number of events, $m$ – number of times the desired outcome occurs out of n events If $np \ge 10$, the binomial distribution can be approximated by a Gaussian distribution with mean $p$ and variance $p(1-p)/n$
Confidence Intervals Confidence interval for the proportion p: $c_1 = p - z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}}$ $c_2 = p + z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}}$
Confidence Intervals Ex: In a multitasking operating system, how much time does the processor spend executing the operating system compared to how much time it spends executing the users' applications? The system interrupts the processor every 10 ms. The interrupt service routine maintains 2 counters: • Increments n every time the routine is executed • Increments m if the OS was executing when the interrupt occurred Running the experiment for 1 minute gives: m = 658, n = 6000
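Applying the proportion formula to the counter values from this example gives the following sketch. The slide does not state a confidence level, so 95% is assumed here; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(m, n, confidence=0.95):
    """Interval for p = m / n using the Gaussian approximation (needs np >= 10)."""
    p = m / n
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sqrt(p * (1.0 - p) / n)
    return p, p - half_width, p + half_width

# Counter values from the example: 6000 interrupts in 1 minute,
# 658 of which found the operating system executing.
m, n = 658, 6000
p, c1, c2 = proportion_interval(m, n, confidence=0.95)   # 95% is an assumption

print(f"p = {p:.4f}")
print(f"95% interval: [{c1:.4f}, {c2:.4f}]")
```

Under these assumptions the processor spends roughly 10% to 12% of its time executing the operating system.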
Confidence Intervals Determining the Number of Measurements Needed $c_1 = (1-e)p = p - z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}}$ $n = \frac{z_{1-\alpha/2}^2\, p(1-p)}{(e\,p)^2}$ Initially make a small number of measurements to estimate $p$.
Confidence Intervals Ex: How long must the above experiment be run to know, with 95% confidence, the fraction of time the processor spends executing the OS to within a range of ±5%?
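A sketch of this calculation, reading ±5% as a relative error e = 0.05 on p (the form used in c1 = (1 - e)p) and taking p from the earlier counters m = 658, n = 6000. Both readings are assumptions, as is the helper name.

```python
from math import ceil
from statistics import NormalDist

def samples_for_proportion(p, e, confidence):
    """n = z_{1-alpha/2}^2 * p * (1 - p) / (e * p)^2, rounded up."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return ceil(z ** 2 * p * (1.0 - p) / (e * p) ** 2)

p = 658 / 6000                 # estimate from the 1-minute pilot run
interrupt_period_s = 0.010     # one sample every 10 ms

n = samples_for_proportion(p, e=0.05, confidence=0.95)
run_time_s = n * interrupt_period_s

print(f"samples needed: {n}")
print(f"run time: {run_time_s:.0f} s")
```

Under this reading the experiment needs to run for about two minutes. A tighter bound costs far more: the required n grows as 1/e².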
Confidence Intervals Normalizing Data for Confidence Intervals A confidence interval is an indication of the precision of the measuring process, not its accuracy. If the error distribution in the measuring process is not Gaussian, then the data must be normalized. Normalization: • Find the arithmetic mean of each group of 4 or more randomly selected measurements • Apply confidence intervals to the overall mean of these averaged values (the central-limit theorem assures that these averaged values approximately follow a Gaussian distribution)
Confidence Intervals Ex: If the duration of an event is too short to measure directly • Measure the total duration $T_j$ of $m_j$ repetitions: $x_j = T_j / m_j$ • Repeat the experiment n times: $x_1, x_2, \ldots, x_n$ • Apply the confidence interval formula to these n means After aggregating the short events, confidence intervals can only be provided for the mean value of the aggregated event, not for individual events.