Statistical Analysis Professor Lynne Stokes Department of Statistical Science Lecture 4 Fundamental Statistical Concepts and Methods, Statistical Design Principles
Problem Formulation → Experiment / Observation → Data → Inference
Figure 1.1 Critical stages of statistical input in scientific investigations.
"Experimental observations are only experience carefully planned in advance." (R.A. Fisher, The Design of Experiments)
ESL programs: Reading scores. How should the experiment be designed for statistical optimality? • 40 classrooms in 8 schools • Immersion vs. bilingual • New curriculum vs. old • 40 teachers • Boys and girls in each classroom • Individual differences among children (IQ, SES, etc.)
Statistical Experimental Design Principles • Systematically change known, controllable influences on the response • Factor effects, determined by the statistical design • Control known, extraneous sources of variability • Block designs • Measure known, uncontrollable influences on the response • Quantitative covariates • Estimate the effects of all sources of variability • Fixed and random factor effects • Estimate experimental variability • Random effects, experimental error and measurement error
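The blocking and randomization principles above can be sketched in code. The setup below is hypothetical (schools as blocks, classrooms as experimental units, counts loosely echoing the ESL example), not a prescription from the slides:

```python
import random

# Sketch of a randomized block assignment (hypothetical setup):
# schools act as blocks of homogeneous classrooms, and the four
# program/curriculum combinations are randomized within each block.
random.seed(1)  # fixed seed so the layout is reproducible

treatments = ["immersion/new", "immersion/old", "bilingual/new", "bilingual/old"]
schools = [f"school_{i}" for i in range(1, 9)]  # 8 blocks

design = {}
for school in schools:
    classrooms = [f"{school}_class_{j}" for j in range(1, 6)]  # 5 units per block
    random.shuffle(classrooms)  # randomization within the block
    # Cycle through the treatments so every treatment appears in every block
    design[school] = {room: treatments[k % len(treatments)]
                      for k, room in enumerate(classrooms)}
```

Blocking on school removes school-to-school differences from treatment comparisons; the within-block shuffle guards against systematic assignment error.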
Experimental Design Terminology
• Block: group of homogeneous experimental units or test runs
• Confounding: one or more effects that cannot unambiguously be attributed to a single factor or interaction
• Covariate: uncontrollable variable that is believed to influence the response but is not affected by experimental factors
• Design (layout): complete specification of test runs, including the levels of the factors, blocking, repeat test runs, and randomization
• Experimental Region (factor space): all possible factor-level combinations for which experimentation is possible
• Factor: controllable experimental variable believed to influence the response
• Homogeneous Experimental Units: units that are as uniform as possible on all characteristics that might influence the response
• Interaction: existence of joint factor effects in which the effect of each factor depends on the levels of the other factors
• Levels: specific values of a factor
• Repeat Tests: two or more test runs with the same factor-level combination, taken under homogeneous experimental conditions
• Replication: repetition of an entire experiment or a portion of it under two or more sets of non-homogeneous conditions
• Response: outcome or result of a test run
• Test Run: single combination of factor levels that yields an observation on the response
• Unit, Item: physical entity on which a measurement or observation is made; sometimes refers to the actual measurement or observation
MGH Table 4.1
Pilot Plant Experiment: Investigate the Effects of Plant Conditions [Annotated data table illustrating Response, Factors, Levels, Factor-Level Combinations, Test Runs, and Repeat Tests] MGH Table 6.1
Experimental Units: physical entities such as bolts, nuts, fields on a farm, airplane wings, ...
Requirements of Good Experiments • Absence of Systematic Error • Controls (Factors, Covariates) • Type of design, randomization • Precision • Blocking, choice of experimental units • Type of design, replication, repeat tests • Range of Validity • Choice of factors, levels • Choice of experimental units • Simplicity • Design, analysis • Estimation • Effects • Uncertainty D.R. Cox, Planning of Experiments
Lawnmower Stopping Times: Overall Variation [Dot plot of cutoff times (.01 sec), 150-300] The standard deviation measures variability from several sources; it is biased for random uncertainty.
Manufacturer Comparison: Repeat Test Variation. Blocking on Lawnmowers (6) and Speeds (2) [Dot plot of cutoff times (.01 sec), 150-300, points labeled H/L by speed] The pooled standard deviation measures only repeat-test variation; it is unbiased for random uncertainty.
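The pooled standard deviation used on this slide is easy to compute directly. The cutoff-time values below are hypothetical repeat tests, not the slide's data:

```python
import math
import statistics

# Pooled standard deviation over groups of repeat tests: pooling the
# within-group variances estimates only repeat-test variation, so it
# stays unbiased for random uncertainty even when group means differ.
def pooled_sd(groups):
    num = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    den = sum(len(g) - 1 for g in groups)
    return math.sqrt(num / den)

# Hypothetical repeat tests: cutoff times (.01 sec) for three
# lawnmower/speed combinations
groups = [[215, 220, 218], [262, 255, 259], [181, 178, 184]]
sp = pooled_sd(groups)
```

Unlike the overall standard deviation, `pooled_sd` ignores the spread between group means, which is exactly the blocking idea on this slide.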
Common Experimental Design Problems • Response variation can mask factor effects MGH Table 4.3
Figure 4.3 Experimental variability and factor effects [Two cases comparing the averages at Levels 1 and 2 of an experimental factor under different amounts of response variation]
Implications • Greater variation requires larger experiment sizes • Greater variation requires designs that control major sources of variation • Design to control variability through blocking and the choice of factors
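The first implication can be quantified with the standard normal-approximation sample-size formula (a sketch, not from the slides): the required per-group size grows with the square of the response standard deviation.

```python
import math
from statistics import NormalDist

# Approximate per-group sample size for detecting a mean difference delta
# in a two-sample comparison: n ~ 2 * (z_{a/2} + z_b)^2 * (sigma/delta)^2.
# Doubling sigma quadruples the required experiment size.
def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_b = NormalDist().inv_cdf(power)          # power requirement
    return math.ceil(2 * (z_a + z_b) ** 2 * (sigma / delta) ** 2)
```

Comparing `n_per_group(10, 5)` with `n_per_group(20, 5)` shows the fourfold growth directly, which is why controlling variation through design is cheaper than buying more runs.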
Common Experimental Design Problems • Response variation can mask factor effects • Variation in uncontrolled factors can compromise conclusions
Common Design Problems: Uncontrolled Factors [Scatterplot of reading score vs. IQ] Confounded factor effects
Common Design Problems: Uncontrolled Factors. IQ scores, Immersion/Old vs. Bilingual/Old: 101 98 110 82 99 115 90 92 100 92 98 115 120 105 105 87 103 104 97 95 96 103. t-Test: group means significantly different.
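The t-test on this slide can be reproduced with a pooled two-sample t statistic. The group split below is hypothetical: the flattened table above does not unambiguously assign scores to the two curricula, so the lists are for illustration only.

```python
import math
import statistics

# Pooled two-sample t statistic, as used on this slide to compare
# group mean IQ scores.
def two_sample_t(x, y):
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x) +
           (ny - 1) * statistics.variance(y)) / (nx + ny - 2)  # pooled variance
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))

immersion_old = [101, 110, 99, 90, 100, 98, 120, 105, 103, 97, 96]  # hypothetical split
bilingual_old = [98, 82, 115, 92, 92, 115, 105, 87, 104, 95, 103]   # hypothetical split
t_stat = two_sample_t(immersion_old, bilingual_old)
```

A significant difference in mean IQ between the groups is exactly the confounding problem: any curriculum effect on reading scores is entangled with the IQ difference.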
Implications • Include controllable factors in the design that are known to or might influence the response • Measure uncontrollable covariates that are known to or might influence the response
Common Experimental Design Problems • Response variation can mask factor effects • Variation in uncontrolled factors can compromise conclusions • Erroneous principles of efficiency can lead to poor choices of design • Inexpensive factor changes result in more "data" • Effects of factors cannot be isolated when more than one factor is changed at a time
Common Design Problems : Erroneous Principles of Efficiency • Change factor levels in the most convenient manner, time-wise or budget-wise • Test many levels of inexpensive factors, few levels of expensive ones • Run duplicate tests (if any) back-to-back
Perceived Advantages of One-Factor-at-a-Time (OFAT) Testing
1. Fix the levels of all but one factor
2. Vary the levels of that factor
3. Identify the optimum
4. Fix the factor at its optimum; fix the levels of all but one of the remaining factors
5. Repeat Steps 2-4 until all factors have been tested
Perceived Advantages of One-Factor-at-a-Time Testing (cont'd) • Number of test runs close to the minimum • Ignores the efficiency of testing several factors simultaneously • Quick assessment of each factor as it is tested • Can give quick information for screening purposes • Cannot adequately assess joint factor effects
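The joint-effects limitation can be demonstrated on a hypothetical response surface with a factor interaction; the surface and level grids below are illustrative only, not the chemical-reaction example from the figures.

```python
# One-factor-at-a-time (OFAT) vs. a complete factorial search on a
# hypothetical response surface with interacting factors. The one-pass
# OFAT optimum falls short of the factorial optimum.
def response(x, y):
    # Hypothetical yield surface; its maximum is 0 at (x, y) = (4, 8)
    return -(2 * x - y) ** 2 - (x + y - 12) ** 2

levels = range(13)

# OFAT: optimize x with y fixed at its starting level, then optimize y
x_best = max(levels, key=lambda x: response(x, 0))
y_best = max(levels, key=lambda y: response(x_best, y))
ofat_best = response(x_best, y_best)

# Complete factorial: evaluate every factor-level combination
factorial_best = max(response(x, y) for x in levels for y in levels)
```

Because the best x depends on y (an interaction), fixing y first leads OFAT to a suboptimal ridge point, exactly the failure sketched in the contour-plot figure.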
Figure 4.6 Contours of constant yield (%) of a chemical reaction. [Contour plot: temperature (°C), 200-320, vs. reaction time (min), 4-18; yield contours 20-65%]
Complete Factorial in Three Factors, Each Having Three Levels [Factor space: Factors A, B, C, each at levels 1, 2, 3; 27 combinations]
OFAT: Middle Levels Selected as Optimal [3×3×3 factor-space diagram showing the OFAT path]
OFAT: Highest Levels Selected as Optimal [3×3×3 factor-space diagram showing the OFAT path]
One-Third Fractional Factorial [3×3×3 factor-space diagram; 9 of the 27 combinations selected]
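The 3³ factorial and a one-third fraction of it can be enumerated directly. The fraction below uses the standard condition A + B + C ≡ 0 (mod 3), which gives a balanced nine-run subset; it is not necessarily the particular fraction pictured on the slide.

```python
from itertools import product

# Complete 3^3 factorial: every factor-level combination of A, B, C
levels = (1, 2, 3)
full = list(product(levels, repeat=3))  # 27 runs

# One-third fraction: keep runs whose level codes sum to 0 mod 3;
# each level of each factor still appears equally often
fraction = [run for run in full if sum(run) % 3 == 0]  # 9 runs
```

Balance is what makes the fraction usable: every level of every factor is tested the same number of times, at one third of the full cost.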
Design Efficiency: Suspended Particulate Study (n = 16) [Factor space: Test Fluid (1, 2) × Flow Rate (60, 90) × Pipe Angle (15, 30, 45, 60)] MGH Table 4.2
Design Efficiency: Suspended Particulate Study (n = 16)
Factor      Level  Number of Test Runs
Test Fluid  1      8
Test Fluid  2      8
Flow Rate   60     8
Flow Rate   90     8
Pipe Angle  15     4
Pipe Angle  30     4
Pipe Angle  45     4
Pipe Angle  60     4
Equivalent single-factor experiment size: 32-48 runs
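The run-count bookkeeping on this slide follows from enumerating the 2 × 2 × 4 factorial:

```python
from itertools import product

# n = 16 factorial for the suspended particulate study:
# 2 test fluids x 2 flow rates x 4 pipe angles
fluids = (1, 2)
flow_rates = (60, 90)
angles = (15, 30, 45, 60)

runs = list(product(fluids, flow_rates, angles))
n_total = len(runs)                             # 16 test runs in all
per_fluid = sum(1 for r in runs if r[0] == 1)   # 8 runs at each fluid
per_flow = sum(1 for r in runs if r[1] == 60)   # 8 runs at each flow rate
per_angle = sum(1 for r in runs if r[2] == 15)  # 4 runs at each angle
```

Each factor level is supported by 4-8 runs from the same 16 tests, which is why a one-factor-at-a-time study would need the 32-48 runs quoted above to match this precision.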
Implications • Design experiments with known, accepted statistical properties to eliminate bias and control variability • Design experiments with known efficiency properties when a complete, comprehensive experiment cannot be conducted • Know the properties of any design recommended for use in an experiment