Analysis of Variance (ANOVA) Part I
Why study Analysis of Variance (ANOVA)? • ANOVA was developed by R. A. Fisher for use in the agricultural trials run at the Rothamsted Experimental Station, where he was employed for a period of time. • The research tested the effects of different types of organic and inorganic fertilizers on crop yield. • For this reason, many terms used to describe ANOVA models still reflect the terminology of agricultural settings (e.g., split-plot designs).
Why study Analysis of Variance (ANOVA)? • ANOVA methods were specifically developed for the analysis of data from experimental studies. • The models assume that random assignment has taken place as a means of assuring that the IID assumptions of the statistical test are met. • These designs are still the staple of certain social science disciplines where much experimental research is conducted (e.g., psychology).
Why study Analysis of Variance (ANOVA)? • But what is the utility for other disciplines? • There are several reasons why knowledge of ANOVA is important: • Clinical trials in all areas of research involving human participants still employ these designs. • Many of the techniques employed in testing hypotheses in the ANOVA context are generalizable to other statistical methods. • There are other statistical procedures, e.g., variance components models that use a similar approach to variance decomposition as is employed in ANOVA analysis and can be more readily understood with an understanding of ANOVA. • Possible involvement in interdisciplinary research.
Overview of ANOVA • ANOVA models distinguish between independent variables (IVs), dependent variables (DVs), blocking variables, and levels of those variables. • All IVs and blocking variables are categorical. DVs are continuous and assumed to be independent and identically distributed (IID) normal. • We have already discussed IVs and DVs. • Blocking variables are variables that need to be controlled in an analysis but are not manipulated by the researchers and hence are not true IVs (e.g., gender, grade in school, etc.).
“Levels” of IVs and Blocking Variables • Each IV and Blocking variable in an ANOVA model must have two or more “levels.” • Example: the independent variable may be a type of therapy, a drug, the induction of anger or frustration or some other experimental manipulation. • Levels could include different types of therapies or therapies of differing degrees of intensity, different doses of a drug, different induction procedures, etc. • It is assumed that participants are randomly assigned to levels of the IV but not to levels of any blocking variables.
Overview of ANOVA • IVs and Blocking variables are referred to as “Factors” and each factor is assumed to have two or more levels. • In ANOVA we distinguish between-groups factors from within-groups factors.
Between vs. Within Factors • A between-groups factor is one in which each subject appears at only one level of the IV. • A within-groups factor (of which a repeated measures factor is an example) is one in which each subject appears at each level of the IV. • It is possible to have a design with a mixture of between-groups and within-groups factors or effects.
Fixed vs. Random Effects: Expected Mean Squares (EMS) • Effects or factors are fixed when all levels of an IV or blocking variable we are interested in generalizing to are included in the analysis. • Effects are random when they represent a sampling of levels from the universe of possible values.
Examples of Random and Fixed Effects • Drug dosages of 2, 4, or 10 mg: random, since not all levels are represented. • Different raters providing observational ratings of behavior: random. • Gender (male and female): fixed. • MST treatment versus usual services: fixed.
Fixed vs. Random Effects (cont.) • The distinction between fixed and random effects is important since it has implications for the way in which the treatment effects are estimated and for the generalizability of the results.
• For fixed effect models, we have complete information (all levels of a variable or factor are observed) and we can calculate the effect of the treatment by taking the average across the groups present. • In the case of random factors, we have incomplete information (not all levels of the factor are included in the design). For random factors, we are estimating the treatment effect at the level of the population given only the information available from the levels we included in our design. The formulas are designed to represent this uncertainty.
Fixed versus Random Effects (cont.) • In the case of a fixed effect, we can generalize the results only to the levels of the variables included in our analyses. • Random effects assume that the results will be generalized to other levels between the endpoint values included in our analyses.
“Levels” of IVs and Blocking Variables • The ANOVA model is a “means model” • i.e., it assumes that any observed differences in behavior can be completely described using only information in the means. • The ANOVA model is also a population-averaged model. • It evaluates the effects of treatments and blocking variables at the group rather than at the individual level.
Hypothesis Testing • ANOVA involves the same 4-step hypothesis testing procedure we applied in the case of the z-test, t-tests, and tests for correlation coefficients. • We will, however, use a different sampling distribution to determine the critical values of our statistic. • This sampling distribution is called the F-distribution and the significance test is now called an F-test.
F-test Basics • An F-statistic is formed as the ratio of two independent chi-square variables, each divided by its degrees of freedom: $F(\mathrm{d.f.}_1, \mathrm{d.f.}_2) = \frac{\chi^2_1 / \mathrm{d.f.}_1}{\chi^2_2 / \mathrm{d.f.}_2}$ • As a result, unlike the t-distribution, whose shape is determined by one degrees-of-freedom parameter, the F-distribution is determined by two degrees-of-freedom parameters. • When there are only 2 groups (numerator d.f. = 1), F is equal to t²: $F(1, \mathrm{d.f.}_{Denom}) = t^2$
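The two-group identity F = t² can be checked numerically. The sketch below is only an illustration: the scores are hypothetical, and the helper names `pooled_t` and `one_way_f` are made up for this example rather than taken from any package.

```python
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic using a pooled variance estimate."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled_var * (1 / na + 1 / nb)) ** 0.5

def one_way_f(groups):
    """One-way ANOVA F: mean square between groups / mean square within groups."""
    scores = [x for g in groups for x in g]
    grand_mean = mean(scores)
    ss_bg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_wg = sum((len(g) - 1) * variance(g) for g in groups)
    df_bg = len(groups) - 1
    df_wg = len(scores) - len(groups)
    return (ss_bg / df_bg) / (ss_wg / df_wg)

# Hypothetical scores for two groups of five subjects each
group_a = [4.0, 6.0, 5.0, 7.0, 8.0]
group_b = [9.0, 7.0, 10.0, 8.0, 11.0]

t = pooled_t(group_a, group_b)
f = one_way_f([group_a, group_b])
print(t ** 2, f)  # both are 9.0 (up to floating-point rounding)
```

With these numbers t = -3, so t² = 9, which matches the F computed from the sums of squares.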
The F Statistic (cont.) • A very important property of the F-test is that, under the null hypothesis of no differences between the groups, the numerator and denominator are independent estimates of the same population variance. • The denominator always measures only “error” or “noise,” while the numerator measures both error and any treatment effect. • Under the null hypothesis of no effect of treatment, the expected value of the F-statistic is approximately 1.0 (exactly d.f.2/(d.f.2 - 2) for d.f.2 > 2). As the treatment effect increases in size, F tends to become greater than 1.0. • Note: Although the ratio of the expected mean squares is never less than 1.0, with “real” data the observed F will fall below 1.0 at times because of sampling variability.
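The behavior of F under the null hypothesis can be illustrated with a small simulation. This is a sketch under assumed settings (3 groups of 10, normal scores with mean 50 and SD 10, 2000 replications, an arbitrary seed); none of these values come from the slides. For F(2, 27) the theoretical mean is 27/25 = 1.08.

```python
import random
from statistics import mean, variance

def f_ratio(groups):
    """Mean square between groups divided by mean square within groups."""
    scores = [x for g in groups for x in g]
    grand_mean = mean(scores)
    ss_bg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_wg = sum((len(g) - 1) * variance(g) for g in groups)
    return (ss_bg / (len(groups) - 1)) / (ss_wg / (len(scores) - len(groups)))

random.seed(1)            # arbitrary seed so the run is repeatable
p, n, reps = 3, 10, 2000  # assumed design: 3 groups, 10 per group

f_values = []
for _ in range(reps):
    # Every group is drawn from the same population, so Ho is true
    groups = [[random.gauss(50, 10) for _ in range(n)] for _ in range(p)]
    f_values.append(f_ratio(groups))

mean_f = mean(f_values)
share_below_one = sum(f < 1.0 for f in f_values) / reps
print(f"average F = {mean_f:.2f}, share of F below 1 = {share_below_one:.2f}")
```

The average F should land close to 1, and a large share of the simulated F values should fall below 1.0 even though no treatment effect is present.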
Error Term • The error term in the denominator of the F-statistic is an extension of the two sample t-test error term. • In the two sample t-test we saw that since two independent estimates of the population variance were available - one from each sample - we could improve on the estimate of the population parameter by averaging across the two estimates. The error term which resulted was called a “pooled error term.” • In ANOVA, we will have at least two but possibly three or more groups. Regardless, the process is the same. To improve on our estimate of the population parameter, we pool the variance estimates together - one from each cell or sample - and use this mean squared error as the error term in our F-test.
Tabled Values of F • The F-distribution is positively skewed rather than normal, and the table of critical values includes only values in the upper tail of the distribution. • Lower-tail values can be obtained by taking the reciprocal of the tabled value of F with the numerator and denominator degrees of freedom reversed, i.e., 1/F, but these values are rarely used. • The F-distribution changes shape depending on the numerator and denominator degrees of freedom as can be seen in the next slide:
ANOVA Hypotheses • Ho: μ1 = μ2 = μ3 • H1: not all of the μj are equal (at least two means differ) • As with the other statistics covered in this course, the F-test can be run working from definitional formulas or computational formulas. • We will work through the definitional formulas in class examples. • One reason for this is that it is easy to calculate the statistic using these formulas. More importantly, it is easier to see what each component of the statistic represents.
One-Way ANOVA Design Model • Simplest ANOVA model – single factor: Yij = μ + αj + εij • Where i = person and j=group. • This model says that each person's score can be described by: • μ, the overall or grand mean • αj, an average group level treatment effect, and • εij, a parameter describing each individual's deviation from the average group effect.
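The design model can be made concrete by splitting each observed score into its three parts. The sketch below uses hypothetical scores and estimates μ by the grand mean, αj by the group mean minus the grand mean, and εij by the score minus its group mean; the group labels and variable names are illustrative only.

```python
from statistics import mean

# Hypothetical scores for three groups of three people each
scores = {"A": [3.0, 5.0, 4.0], "B": [6.0, 8.0, 7.0], "C": [9.0, 11.0, 10.0]}

grand_mean = mean([y for group in scores.values() for y in group])   # estimate of mu
effects = {j: mean(group) - grand_mean for j, group in scores.items()}  # estimates of alpha_j
residuals = {j: [y - mean(group) for y in group] for j, group in scores.items()}  # epsilon_ij

# Rebuild every score as mu + alpha_j + epsilon_ij
rebuilt = {
    j: [grand_mean + effects[j] + e for e in residuals[j]]
    for j in scores
}
print(grand_mean, effects)
print(rebuilt == scores)
```

Here the grand mean is 7, the estimated group effects are -3, 0, and 3 (they sum to zero), and adding the three components back together recovers every original score.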
The F-statistic • For all ANOVA designs, the F-statistic is comprised of two parts: • Numerator: a component known as the “mean square between groups” • Denominator: “mean square within groups” • We form a ratio of these two variance terms to compute the F statistic: $F = \frac{MS_{BG}}{MS_{WG}} = \frac{SS_{BG} / \mathrm{d.f.}_{BG}}{SS_{WG} / \mathrm{d.f.}_{WG}}$
The F-statistic (cont.) • The trick is to correctly estimate the variance components forming the numerator and denominator for different combinations of fixed and random effects. • The correct formulas to use are based on the statistical theory underlying ANOVA and are derived using what are termed “expected mean square” formulas.
Components of the F-Statistic • We define the numerator and denominator of the F-test in terms of their expected values (the values that one should obtain in the population if Ho were true). • The expected mean squares for the two types of effects in the CR-p design are given by:
_____________________________________________
          Model I (Fixed effect)                                   Model II (Random effect)
MSBG      $\sigma_\varepsilon^2 + n\sum_j \alpha_j^2 / (p - 1)$    $\sigma_\varepsilon^2 + n(1 - p/P)\sigma_\alpha^2$
MSWG      $\sigma_\varepsilon^2$                                   $\sigma_\varepsilon^2$
_____________________________________________
Forming the F-statistics for the two possible design models:
$F(\text{Model I}) = \frac{\sigma_\varepsilon^2 + n\sum_j \alpha_j^2 / (p - 1)}{\sigma_\varepsilon^2}$
$F(\text{Model II}) = \frac{\sigma_\varepsilon^2 + n(1 - p/P)\sigma_\alpha^2}{\sigma_\varepsilon^2}$
Calculating the Variance Components
$SS_{BG} = n \sum_{j=1}^{J} (\bar{X}_{.j} - \bar{X}_{..})^2$
$SS_{WG} = \sum_{j=1}^{J} \sum_{i=1}^{I} (X_{ij} - \bar{X}_{.j})^2$ or $\sum_{j=1}^{J} \hat{s}_j^2 (n - 1)$
MSBG = SSBG/d.f.BG
MSWG = SSWG/d.f.WG
Variance Components (cont.) • If we add the SSBG and SSWG terms together, we have the Total Sum of Squares or SSTOT: $SS_{TOT} = SS_{BG} + SS_{WG} = \sum_{j} \sum_{i} (Y_{ij} - \bar{Y}_{..})^2$ • The ANOVA process represents a “decomposition” of the total variation in a DV into “components” of variation attributable to the factors included in the model and a residual or error component. • In the CR-p design, SSBG is the variation in the DV that can be “explained” by the IV and SSWG is the error, residual, or unexplained variation left over after accounting for SSBG.
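The decomposition can be verified directly from raw scores. The following is a minimal sketch with hypothetical data (three groups of three), computing each sum of squares from its definitional formula and confirming that the between and within pieces add up to the total.

```python
from statistics import mean

# Hypothetical raw scores: J = 3 groups, n = 3 scores per group
groups = [[3.0, 5.0, 4.0], [6.0, 8.0, 7.0], [9.0, 11.0, 10.0]]
n = len(groups[0])

all_scores = [y for g in groups for y in g]
grand_mean = mean(all_scores)

# Between groups: n times the squared deviations of group means from the grand mean
ss_bg = n * sum((mean(g) - grand_mean) ** 2 for g in groups)
# Within groups: squared deviations of each score from its own group mean
ss_wg = sum((y - mean(g)) ** 2 for g in groups for y in g)
# Total: squared deviations of each score from the grand mean
ss_tot = sum((y - grand_mean) ** 2 for y in all_scores)

df_bg = len(groups) - 1
df_wg = len(groups) * (n - 1)
f_obs = (ss_bg / df_bg) / (ss_wg / df_wg)

print(ss_bg, ss_wg, ss_tot, f_obs)  # 54.0 6.0 60.0 27.0
```

For these scores SSBG = 54 and SSWG = 6, which sum to SSTOT = 60, so 90% of the total variation is attributable to group membership.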
Assumptions • ANOVA makes the same assumptions as the t-test. • F assumptions: • The dependent variable is from a population that is normally distributed. • The sample observations are random samples from the population. • The numerator and denominator of the F-test are estimates of the same population variance. • The numerator and denominator are independent.
Assumptions (cont.) • Model Assumptions: • The model equation (design model) reflects all sources of variation affecting a DV and each score is the sum of several components. • The experiment contains all treatment levels of interest • The error effect is independent of all other errors and is normally distributed within each treatment population with a mean equal to 0 and variance equal to σ2 (homogeneity of variances assumption)
Assumptions (cont.) • Note that the assumptions of ANOVA are frequently violated to at least some extent with real-world data. • Generally the violations have little effect on significance or power if: • The data are derived through random sampling • The sample size is not small • Departures from normality are not large • n’s within each cell are equal.
Violations of Assumptions • The F-test is robust to violations of the homogeneity of variances assumption. When the violation is extreme, however, the result is an incorrect Type I error rate (It will become increasingly inflated as the violation becomes more severe). • As we noted with the t-test, if the group sizes are equal, the impact of heterogeneity is of less concern.
Violations of Assumptions • The test will be liberal (actual Type I error rate above the nominal level) when the largest cell sizes are associated with the smallest variances, and conservative when the largest cell sizes are associated with the largest variances. • In most cases, problems arise when the cell or group sizes are very different.
Normalizing Transformations • When the data are more severely skewed, kurtotic, or both, or the homogeneity of variances assumption has been violated, it is sometimes necessary to apply a normalizing transformation to the DV to allow the analysis to be run.
Transformations • Normalizing transformations help to accomplish several things: • Homogeneity of error variances • Normality of errors • Additivity of effects, i.e., effects that do not interact (as is desirable, for example, in a repeated measures ANOVA design). By transforming the scale of measurement, additivity can sometimes be achieved. • We will talk about additivity more in the context of repeated measures ANOVA.
Types of Normalizing Transformations • Square Root • Y'=√Y • Use when treatment level means and variances are proportional (moderate positive skew)
Types of Normalizing Transformations (Cont.) • Log 10 • Y'=log10(Y+1) • Use when Tx means and SDs are proportional (more extreme positive skew)
Types of Normalizing Transformations (Cont.) • Angular or Inverse Sine • Y'=2*arcsin(√Y) • Use when means and variances are proportional and the underlying distribution is binomial. • Often used when the DV represents proportions.
Types of Normalizing Transformations (Cont.) • Inverse or Reciprocal • Y'=1/Y • Use when squares of Tx means are proportional to SDs (severe positive skew or L-shaped distribution)
Negatively Skewed Distributions • When the distribution is negatively skewed, one must first "reflect" the distribution and then apply the correct transformation. To reflect a set of scores, simply subtract each value from the highest value plus one unit. • E.g., if an item is on a 1-5 scale and we wish to reflect it (change it to a 5-1 scale), we would simply subtract each value from 6: Reflected(y) = 6-y
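The four transformations and the reflection step translate directly into code. This is a minimal sketch using only the standard library; the function names are illustrative, and the `highest` argument is an assumption added so the scale maximum can be supplied when the largest observed score is below it.

```python
import math

def square_root(y):
    """Y' = sqrt(Y), for moderate positive skew."""
    return math.sqrt(y)

def log10_plus_one(y):
    """Y' = log10(Y + 1), for more extreme positive skew."""
    return math.log10(y + 1)

def angular(proportion):
    """Y' = 2 * arcsin(sqrt(Y)), for proportions between 0 and 1."""
    return 2 * math.asin(math.sqrt(proportion))

def reciprocal(y):
    """Y' = 1 / Y, for severe positive skew (Y must be nonzero)."""
    return 1 / y

def reflect(scores, highest=None):
    """Subtract each score from (highest value + 1) to reverse the skew."""
    top = (max(scores) if highest is None else highest) + 1
    return [top - y for y in scores]

item = [5, 5, 4, 5, 3, 1]        # hypothetical negatively skewed 1-5 item
reflected = reflect(item, highest=5)
transformed = [square_root(y) for y in reflected]
print(reflected)                 # [1, 1, 2, 1, 3, 5]
```

After reflection the scores are positively skewed, so any of the positive-skew transformations can be applied; remember that the direction of the scale is reversed when interpreting results.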
Selecting a Transformation • If you are unsure which transformation will work best, you can apply all possible transformations to the highest and lowest scores in each treatment level. • Calculate the range of values for each treatment level by subtracting the smallest score from the largest. • Form a ratio of the largest and smallest ranges for each transformation across Tx levels. • The transformation associated with the smallest ratio wins. • (See Kirk Experimental Design for an example)
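The range-ratio procedure above can be sketched as follows. The scores are hypothetical, only three candidate transformations are included, and the absolute value is an added detail (the reciprocal reverses the order of the scores); see Kirk for the authoritative description.

```python
import math

# Candidate transformations (names are illustrative labels)
TRANSFORMS = {
    "square root": math.sqrt,
    "log10(Y+1)": lambda y: math.log10(y + 1),
    "reciprocal": lambda y: 1 / y,   # requires nonzero scores
}

def range_ratio(groups, transform):
    """Largest transformed range divided by smallest, across treatment levels."""
    ranges = [abs(transform(max(g)) - transform(min(g))) for g in groups]
    return max(ranges) / min(ranges)

# Hypothetical positively skewed scores for three treatment levels
groups = [[1.0, 2.0, 4.0], [4.0, 9.0, 16.0], [16.0, 30.0, 64.0]]

ratios = {name: range_ratio(groups, f) for name, f in TRANSFORMS.items()}
best = min(ratios, key=ratios.get)

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
print("smallest ratio:", best)
```

For these scores the square root gives a ratio of 4, the reciprocal 16, and the log about 1.46, so the log transformation would be chosen.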
Additivity • If additivity is of interest, use a test for nonadditivity such as that developed by Tukey (1949) and covered in many statistics texts. • Then select a transformation that reduces nonadditivity to an acceptable level.
Example of a Completely Randomized Between Groups Design (CR-5) • One independent variable with five levels • The independent variable represents different types of stranger awareness training and the dependent variable the latency of the children to protest verbally about a stranger’s actions (measured in seconds).
Example (cont.)
            Experimenter   Mother    Mother     Role    Control
            verbal         verbal    natural    play
__________________________________________________________
Mean        279            284       286        308     330
SD          50             53        51         56      58
n           10             10        10         10      10
__________________________________________________________
Y.. = 297.4
• Can eyeball the SDs relative to the sample sizes to see if the homogeneity assumption has been violated.
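CR-5 Example (cont.) • Ho: μ1 = μ2 = μ3 = μ4 = μ5 • H1: not all of the μj are equal (at least two means differ) • Use ~F(4,45) • Where dfnum = p-1 and dfden = p(n-1) • Set up decision rules (from F-table): • Fcrit(.05,4,45) = 2.61 (the tabled value for 40 denominator d.f., the nearest smaller row in most tables; the exact value for 45 d.f. is about 2.58) • If Fobs > Fcrit then reject Ho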
Formulas • Calculate statistic and apply decision rules:
$SS_{WG} = \sum_{j=1}^{J} \hat{s}_j^2 (n - 1) = [(50)^2 + (53)^2 + (51)^2 + (56)^2 + (58)^2] \times 9 = 129{,}690$
$SS_{BG} = n \sum_{j=1}^{J} (\bar{X}_{.j} - \bar{X}_{..})^2 = (10)[(279 - 297.4)^2 + (284 - 297.4)^2 + (286 - 297.4)^2 + (308 - 297.4)^2 + (330 - 297.4)^2] = (10)(1823.2) = 18{,}232$
$SS_{TOT} = SS_{BG} + SS_{WG} = 18{,}232 + 129{,}690 = 147{,}922$
Formulas (cont.)
d.f.BG = p-1 = 5-1 = 4
d.f.WG = p(n-1) = 5(10-1) = 45
MSBG = SSBG/d.f.BG = 18,232/4 = 4558
MSWG = SSWG/d.f.WG = 129,690/45 = 2882
Formulas (cont.)
$F = \frac{MS_{BG}}{MS_{WG}} = \frac{4558}{2882} = 1.58$
Fcrit(.05,4,45) = 2.61
Since F-obs < F-crit, do not reject Ho. Conclude that there is no evidence of an effect due to treatment.
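The CR-5 calculation can be reproduced from the summary statistics alone. This is a sketch that takes the group means, SDs, and n from the example table; the variable names are illustrative, and the critical value 2.61 is the one quoted in the slides rather than one computed here.

```python
# Summary statistics from the CR-5 example (five training conditions)
means = [279, 284, 286, 308, 330]
sds = [50, 53, 51, 56, 58]
n = 10                      # subjects per group (equal n)
p = len(means)              # number of treatment levels

# With equal n, the grand mean is the simple average of the group means
grand_mean = sum(means) / p

ss_bg = n * sum((m - grand_mean) ** 2 for m in means)
ss_wg = sum(s ** 2 * (n - 1) for s in sds)
ss_tot = ss_bg + ss_wg

df_bg = p - 1
df_wg = p * (n - 1)
ms_bg = ss_bg / df_bg
ms_wg = ss_wg / df_wg       # with equal n, this is the average of the group variances
f_obs = ms_bg / ms_wg

f_crit = 2.61               # tabled critical value quoted in the slides
print(f"SSBG = {ss_bg:.0f}, SSWG = {ss_wg}, SSTOT = {ss_tot:.0f}")
print(f"F({df_bg},{df_wg}) = {f_obs:.2f}, reject Ho: {f_obs > f_crit}")
```

The observed F of about 1.58 is below the critical value, so the script reports that Ho is not rejected.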
What if F was significant? • If you found that there was a significant difference between means using the F-statistic, you still do not know which of the groups were significantly different from the others. • This is because the F-test is a simultaneous or omnibus test of the difference between all possible combinations of group means. • In the case of the two-sample t-test, this was easy to resolve by looking at the group means. In the case of three or more groups, this is not as easy to determine. • For this reason, it is necessary to introduce what are known as post-hoc tests as a follow-up to the F-test.