Statistics CSE 807
Experimental Design and Analysis How to: • Design a proper set of experiments for measurement or simulation. • Develop a model that best describes the data obtained. • Estimate the contribution of each alternative to the performance. • Isolate the measurement errors. • Estimate confidence intervals for model parameters. • Check if the alternatives are significantly different. • Check if the model is adequate.
Example • Personal workstation design. • Processor: 68000, Z80, or 8086. • Memory size: 512K, 2M, or 8M bytes. • Number of disks: one, two, three, or four. • Workload: secretarial, managerial, or scientific. • User education: high school, college, or post-graduate level.
Terminology • Response variable: The outcome of an experiment. E.g., throughput, response time. • Factors: Variables that affect the response variable. E.g., CPU type, memory size, number of disk drives, workload used, and user’s educational level. Also called predictor variables or predictors. • Levels: The values that a factor can assume. E.g., the CPU type has three levels: 68000, 8086, or Z80. The number of disk drives has four levels. Also called treatments.
Terminology (cont’d) • Primary factors: The factors whose effects need to be quantified. E.g., CPU type, memory size, and number of disk drives. • Secondary factors: Factors whose impact need not be quantified. E.g., the workloads. • Replication: Repetition of all or some experiments.
Terminology (cont’d) • Design: The number of experiments, the factor levels, and the number of replications for each experiment. E.g., a full factorial design with 5 replications: 3 × 3 × 4 × 3 × 3 = 324 experiments, each repeated five times (1620 runs in total). • Experimental unit: Any entity that is used for experiments. E.g., users. Generally, there is no interest in comparing the units. Goal: minimize the impact of variation among the units.
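A minimal Python sketch of the full factorial design above, enumerating every combination of the workstation levels listed earlier. The dictionary keys and the `REPLICATIONS` constant are illustrative names, not part of the original slides.

```python
from itertools import product

# Factor levels from the personal-workstation example (key names are illustrative).
factors = {
    "processor": ["68000", "Z80", "8086"],
    "memory": ["512K", "2M", "8M"],
    "disks": [1, 2, 3, 4],
    "workload": ["secretarial", "managerial", "scientific"],
    "education": ["high school", "college", "post-graduate"],
}
REPLICATIONS = 5  # each experiment is repeated five times

# A full factorial design visits every combination of levels exactly once.
design = list(product(*factors.values()))

n_experiments = len(design)
n_runs = n_experiments * REPLICATIONS
print(n_experiments, n_runs)  # 324 1620
```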
Terminology (cont’d) • Interaction: The effect of one factor depends upon the level of the other. [Figure: non-interacting factors vs. interacting factors]
Common Mistakes in Experimentation 1. The variation due to experimental error is ignored. 2. Important parameters are not controlled. 3. Effects of different factors are not isolated. 4. Simple one-factor-at-a-time designs are used. 5. Interactions are ignored. 6. Too many experiments are conducted. Better: conduct the study in two phases.
Types of Experimental Designs • Simple designs: Vary one factor at a time. • Number of experiments: $n = 1 + \sum_{i=1}^{k}(n_i - 1)$, where factor $i$ has $n_i$ levels. • Not statistically efficient. • Wrong conclusions if the factors have interaction. • Not recommended.
Types of Experimental Designs (cont’d) • Full factorial design: All combinations of all levels of all factors. • Number of experiments: $n = \prod_{i=1}^{k} n_i$. • Can find the effect of all factors and their interactions. • Too much time and money. • May try a $2^k$ design first.
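To compare the two experiment counts, here is a small Python sketch; `simple_design_runs` and `full_factorial_runs` are hypothetical helper names, and the level counts are those of the workstation example.

```python
from math import prod

def simple_design_runs(levels):
    """One factor at a time: one base run plus (n_i - 1) runs per factor."""
    return 1 + sum(n - 1 for n in levels)

def full_factorial_runs(levels):
    """Full factorial: one run per combination of levels."""
    return prod(levels)

# Levels per factor: processor, memory, disks, workload, education.
workstation = [3, 3, 4, 3, 3]
print(simple_design_runs(workstation), full_factorial_runs(workstation))  # 12 324
```

The gap grows quickly with more factors, which is why a $2^k$ design is a cheaper first step.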
Types of Experimental Designs (cont’d) • Fractional factorial designs: Use only a fraction of the full factorial design. • Save time and expense. • Less information. • May not get all interactions. • Not a problem if the interactions are negligible.
Exercise • The performance of a system being designed depends upon the following three factors: a. CPU type: 68000, 8086, 80286 b. Operating system type: CP/M, MS-DOS, UNIX c. Disk drive type: A, B, C. How many experiments are required to analyze the performance if a. there is significant interaction among factors; b. there is no interaction among factors; c. the interactions are small compared to main effects?
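$2^k$ Factorial Designs • k factors, each at two levels. • Easy to analyze. • Helps in sorting out the impact of factors. • Good at the beginning of a study. • Valid only if the effect is unidirectional, i.e., performance either keeps increasing or keeps decreasing as the factor goes from its minimum to its maximum level. E.g., memory size, the number of disk drives.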
$2^2$ Factorial Designs • Two factors, each at two levels.
Performance in MIPS:
Cache Size | Memory 4M Bytes | Memory 16M Bytes
1K | 15 | 45
2K | 25 | 75
$x_A = -1$ if 4M bytes memory, $+1$ if 16M bytes memory
$x_B = -1$ if 1K bytes cache, $+1$ if 2K bytes cache
Model
$$y = q_0 + q_A x_A + q_B x_B + q_{AB} x_A x_B$$
$$\begin{aligned} 15 &= q_0 - q_A - q_B + q_{AB} \\ 45 &= q_0 + q_A - q_B - q_{AB} \\ 25 &= q_0 - q_A + q_B - q_{AB} \\ 75 &= q_0 + q_A + q_B + q_{AB} \end{aligned}$$
Solution: $y = 40 + 20x_A + 10x_B + 5x_Ax_B$
Interpretation: Mean performance = 40 MIPS. Effect of memory = 20 MIPS. Effect of cache = 10 MIPS. Interaction between memory and cache = 5 MIPS.
Computation of Effects
Model: $y = q_0 + q_A x_A + q_B x_B + q_{AB} x_A x_B$
Substitution:
$$\begin{aligned} y_1 &= q_0 - q_A - q_B + q_{AB} \\ y_2 &= q_0 + q_A - q_B - q_{AB} \\ y_3 &= q_0 - q_A + q_B - q_{AB} \\ y_4 &= q_0 + q_A + q_B + q_{AB} \end{aligned}$$
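Computation of Effects (cont’d)
Solution:
$$\begin{aligned} q_0 &= \tfrac{1}{4}(y_1 + y_2 + y_3 + y_4) \\ q_A &= \tfrac{1}{4}(-y_1 + y_2 - y_3 + y_4) \\ q_B &= \tfrac{1}{4}(-y_1 - y_2 + y_3 + y_4) \\ q_{AB} &= \tfrac{1}{4}(y_1 - y_2 - y_3 + y_4) \end{aligned}$$
Notice that the effects are linear combinations of the responses. For $q_A$, $q_B$, and $q_{AB}$ the coefficients sum to zero, so these combinations are contrasts.
Notice also that, with the ±1 sign columns multiplied row by row and summed:
$q_A$ = (Column A · Column y)/4; $q_B$ = (Column B · Column y)/4; $q_{AB}$ = (Column A × Column B · Column y)/4.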
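The sign-table rule can be checked numerically. This Python sketch uses the memory-cache responses with rows in the order $y_1 \ldots y_4$; the helper name `effect` is illustrative.

```python
# Sign columns of the 2^2 design; rows follow y1..y4:
# (x_A, x_B) = (-,-), (+,-), (-,+), (+,+).
x_I = [1, 1, 1, 1]
x_A = [-1, 1, -1, 1]
x_B = [-1, -1, 1, 1]
x_AB = [a * b for a, b in zip(x_A, x_B)]
y = [15, 45, 25, 75]  # performance in MIPS

def effect(column, response):
    """Inner product of a sign column with the responses, divided by the run count."""
    return sum(c * r for c, r in zip(column, response)) / len(response)

q0, qA, qB, qAB = (effect(col, y) for col in (x_I, x_A, x_B, x_AB))
print(q0, qA, qB, qAB)  # 40.0 20.0 10.0 5.0
```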
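Allocation of Variation • Importance of a factor = proportion of the variation explained. • Sample variance of $y$: $s_y^2 = \frac{\sum_{i=1}^{2^2}(y_i - \bar{y})^2}{2^2 - 1}$ • Variation of $y$: the numerator, $\sum_{i=1}^{2^2}(y_i - \bar{y})^2$, called the sum of squares total (SST).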
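Allocation of Variation (cont’d) For a $2^2$ design:
$$SST = 2^2 q_A^2 + 2^2 q_B^2 + 2^2 q_{AB}^2 = SSA + SSB + SSAB$$
where SSA is the variation due to A, SSB the variation due to B, and SSAB the variation due to interaction.
Fraction explained by A = SSA/SST.
Variation ≠ Variance: variation is the sum of squares, while variance divides it by the degrees of freedom.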
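Derivation Model: $y_i = q_0 + q_A x_{Ai} + q_B x_{Bi} + q_{AB} x_{Ai} x_{Bi}$
Notice:
1. The sum of entries in each column is zero: $\sum_{i=1}^{4} x_{Ai} = \sum_{i=1}^{4} x_{Bi} = \sum_{i=1}^{4} x_{Ai}x_{Bi} = 0$
2. The sum of the squares of entries in each column is 4: $\sum_{i=1}^{4} x_{Ai}^2 = \sum_{i=1}^{4} x_{Bi}^2 = \sum_{i=1}^{4} (x_{Ai}x_{Bi})^2 = 4$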
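Derivation (cont’d) 3. The columns are orthogonal (the inner product of any two columns is zero): $\sum_{i=1}^{4} x_{Ai}x_{Bi} = \sum_{i=1}^{4} x_{Ai}(x_{Ai}x_{Bi}) = \sum_{i=1}^{4} x_{Bi}(x_{Ai}x_{Bi}) = 0$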
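The three column properties can be checked mechanically on the ±1 sign table. A short Python sketch, with `dot` as an illustrative helper:

```python
from itertools import combinations

# +/-1 sign columns of the 2^2 design, rows in the order y1..y4.
x_A = [-1, 1, -1, 1]
x_B = [-1, -1, 1, 1]
x_AB = [a * b for a, b in zip(x_A, x_B)]
columns = {"A": x_A, "B": x_B, "AB": x_AB}

def dot(u, v):
    """Inner product of two columns."""
    return sum(a * b for a, b in zip(u, v))

for col in columns.values():
    assert sum(col) == 0       # 1. each column sums to zero
    assert dot(col, col) == 4  # 2. each column's squares sum to 4
for (_, u), (_, v) in combinations(columns.items(), 2):
    assert dot(u, v) == 0      # 3. any two columns are orthogonal
print("all three properties hold")
```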
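Derivation (cont’d) Sample mean: $\bar{y} = \frac{1}{4}\sum_{i=1}^{4} y_i = \frac{1}{4}\sum_{i=1}^{4}(q_0 + q_A x_{Ai} + q_B x_{Bi} + q_{AB} x_{Ai}x_{Bi}) = q_0$, since each column sums to zero.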
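Derivation (cont’d) Variation of $y$:
$$\sum_{i=1}^{4}(y_i - \bar{y})^2 = \sum_{i=1}^{4}(q_A x_{Ai} + q_B x_{Bi} + q_{AB} x_{Ai}x_{Bi})^2 = q_A^2\sum_{i=1}^{4} x_{Ai}^2 + q_B^2\sum_{i=1}^{4} x_{Bi}^2 + q_{AB}^2\sum_{i=1}^{4} (x_{Ai}x_{Bi})^2 + \text{product terms}$$
The product terms vanish because the columns are orthogonal, so
$$SST = 4q_A^2 + 4q_B^2 + 4q_{AB}^2$$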
Example Memory-cache study: $SST = 2^2(q_A^2 + q_B^2 + q_{AB}^2) = 4(20^2 + 10^2 + 5^2) = 2100$ • Total variation = 2100 • Variation due to memory = 1600 (76%) • Variation due to cache = 400 (19%) • Variation due to interaction = 100 (5%)
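As a cross-check of the allocation, this Python sketch computes SST directly from the four responses and compares it with the sum of $2^2 q^2$ terms built from the fitted effects.

```python
y = [15, 45, 25, 75]
q = {"A": 20, "B": 10, "AB": 5}  # effects from the 2^2 model

mean = sum(y) / len(y)
sst = sum((yi - mean) ** 2 for yi in y)             # total variation from the data
ss = {name: 4 * qi ** 2 for name, qi in q.items()}  # SSA, SSB, SSAB = 2^2 * q^2
assert sst == sum(ss.values())                      # SST = SSA + SSB + SSAB

for name, s in ss.items():
    print(f"{name}: {s} ({100 * s / sst:.0f}%)")
# A: 1600 (76%)
# B: 400 (19%)
# AB: 100 (5%)
```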
Case Study: Interconnection Networks • Memory interconnection networks: Omega and Crossbar. • Memory reference patterns: Random and Matrix. • Fixed factors: 1. Number of processors was fixed at 16. 2. Queued requests were not buffered but blocked. 3. Circuit switching instead of packet switching. 4. Random arbitration instead of round robin. 5. Infinite interleaving of memory => no memory bank contention.
$2^2$ Design for Interconnection Networks [Table: Factors Used in the Interconnection Network Study (factor, level, response)]
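Interconnection Network Study (cont’d)
Parameter | Mean Estimate T | Mean Estimate N | Mean Estimate R | Variation Explained T | Variation Explained N | Variation Explained R
q0 | 0.5725 | 3.5 | 1.871 | | |
qA | 0.0595 | -0.5 | -0.145 | 17.2% | 20% | 10.9%
qB | -0.1257 | 1.0 | 0.413 | 77.0% | 80% | 87.8%
qAB | -0.0346 | 0.0 | 0.051 | 5.8% | 0% | 1.3%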
Interpretation of Results • Average throughput = 0.5725. • Most effective factor = B = reference pattern => the address patterns chosen are very different. • Reference pattern effect = -0.1257; it explains 77% of the variation in throughput. • Effect of network type = 0.0595: Omega networks = average + 0.0595; Crossbar networks = average - 0.0595; difference between the two = 0.119. • Slight interaction (-0.0346) between reference pattern and network type.
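A Python sketch reproducing the throughput (T) column of the table from the published effect estimates, assuming the $2^2$ allocation rule $SS = 2^2 q^2$ and the coding Omega = +1, Crossbar = -1 implied by the interpretation above. The dictionary labels are illustrative.

```python
# Throughput (T) estimates copied from the table; labels are illustrative.
q0_T = 0.5725
effects_T = {"A (network type)": 0.0595, "B (reference pattern)": -0.1257, "AB": -0.0346}

# In a 2^2 design each term contributes 2^2 * q^2 to SST.
ss = {name: 4 * q ** 2 for name, q in effects_T.items()}
sst = sum(ss.values())
for name, s in ss.items():
    print(f"{name}: {100 * s / sst:.1f}% of the variation")
# A (network type): 17.2% of the variation
# B (reference pattern): 76.9% of the variation
# AB: 5.8% of the variation

# Network type enters as x_A = +1 (Omega) or -1 (Crossbar).
omega = q0_T + effects_T["A (network type)"]
crossbar = q0_T - effects_T["A (network type)"]
print(f"Omega {omega:.3f}, Crossbar {crossbar:.3f}, difference {omega - crossbar:.3f}")
# Omega 0.632, Crossbar 0.513, difference 0.119
```

The B share comes out as 76.9% rather than 77.0%, presumably because the estimates in the table are rounded to four decimals.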
General $2^k$ Factorial Designs • k factors at two levels each. • $2^k$ experiments. • $2^k$ effects (including the mean $q_0$): k main effects, $\binom{k}{2}$ two-factor interactions, $\binom{k}{3}$ three-factor interactions, ...
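The effect counts follow binomial coefficients, and together with the mean they add up to $2^k$. A quick Python check, with `effect_counts` as a hypothetical helper:

```python
from math import comb

def effect_counts(k):
    """Number of j-factor terms in a 2^k design, j = 0 (the mean) .. k."""
    return [comb(k, j) for j in range(k + 1)]

for k in (2, 3, 4):
    counts = effect_counts(k)
    assert sum(counts) == 2 ** k  # mean + main effects + all interactions
    print(k, counts)
# 2 [1, 2, 1]
# 3 [1, 3, 3, 1]
# 4 [1, 4, 6, 4, 1]
```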
$2^k$ Design Example Three factors in designing a machine: • Cache size • Memory size • Number of processors
$2^k$ Design Example (cont’d)
Cache Size | 4M Bytes, 1 Proc | 4M Bytes, 2 Proc | 16M Bytes, 1 Proc | 16M Bytes, 2 Proc
1K Byte | 14 | 46 | 22 | 58
2K Byte | 10 | 50 | 34 | 86
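Analysis
$$SST = 2^3(q_A^2 + q_B^2 + q_C^2 + q_{AB}^2 + q_{AC}^2 + q_{BC}^2 + q_{ABC}^2)$$
Fractions of variation explained (A, B, C, AB, AC, BC, ABC): 18% + 4% + 71% + 4% + 1% + 2% + 0% = 100%.
Number of processors (C) is the most important factor.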
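The same sign-table method extends to any $2^k$ design. Below is a Python sketch with a hypothetical helper `analyze_2k`, applied to the $2^3$ data. It assumes A = memory size, B = cache size, C = number of processors (this assignment reproduces the slide's percentages) and lists the responses in standard order, where the first factor alternates fastest.

```python
from itertools import combinations
from math import prod

def analyze_2k(factor_names, y):
    """Effects and fractions of variation for an unreplicated 2^k design.

    Runs must be in standard order: in run i, factor j is at +1 when bit j
    of i is set and at -1 otherwise (so the first factor alternates fastest).
    """
    k = len(factor_names)
    n = 2 ** k
    if len(y) != n:
        raise ValueError(f"expected {n} responses, got {len(y)}")
    runs = [[1 if (i >> j) & 1 else -1 for j in range(k)] for i in range(n)]

    effects = {"I": sum(y) / n}  # q0 is the mean response
    for r in range(1, k + 1):
        for subset in combinations(range(k), r):
            name = "".join(factor_names[j] for j in subset)
            # Sign column of this term = row-wise product of its factor columns.
            signs = [prod(run[j] for j in subset) for run in runs]
            effects[name] = sum(s * yi for s, yi in zip(signs, y)) / n

    # Each term contributes 2^k * q^2 to SST.
    ss = {name: n * q ** 2 for name, q in effects.items() if name != "I"}
    sst = sum(ss.values())
    fractions = {name: s / sst for name, s in ss.items()}
    return effects, fractions

# Assumed coding: A = memory (4M -> -1, 16M -> +1), B = cache (1K -> -1, 2K -> +1),
# C = processors (1 -> -1, 2 -> +1); responses in standard order.
y = [14, 22, 10, 34, 46, 58, 50, 86]
effects, fractions = analyze_2k(["A", "B", "C"], y)
for name, f in fractions.items():
    print(f"{name}: q = {effects[name]:g}, {100 * f:.0f}%")
# A: q = 10, 18%
# B: q = 5, 4%
# C: q = 20, 71%
# AB: q = 5, 4%
# AC: q = 2, 1%
# BC: q = 3, 2%
# ABC: q = 1, 0%
```

Deriving the sign columns from the bits of the run index keeps the helper independent of k, at the cost of requiring the responses to be supplied in standard order.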
Exercise Analyze the $2^3$ design:
Level of B | A1, C1 | A1, C2 | A2, C1 | A2, C2
B1 | 100 | 15 | 120 | 10
B2 | 40 | 30 | 20 | 50
a. Quantify main effects and all interactions. b. Quantify percentages of variation explained. c. Sort the variables in the order of decreasing importance.