Analysis of Variance

Harry R. Erwin, PhD
School of Computing and Technology
University of Sunderland

Resources
• Crawley, M.J. (2005) Statistics: An Introduction Using R. Wiley.
• Gonick, L. and Smith, W. (1993) The Cartoon Guide to Statistics. HarperResource (for fun).

When is Anova Used?
• All explanatory variables are categorical: unquantified and unordered.
• The explanatory variables are called 'factors'; each has two or more levels.
• If there is one factor with two levels, use Student's t.
• If there is one factor with three or more levels, use one-way Anova.
• If there are two factors, use two-way Anova.
• For three factors, use three-way Anova, and so on…
• If every combination of factors is present, you have a factorial design, allowing you to study interactions between variables (and order no longer matters!).
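The link between the two-level case and one-way Anova can be sketched in R. The data below are stand-in values (an assumption, not read from any course file): two gardens with ten ozone readings each. With only two levels, Student's t with a pooled variance and the Anova F test should agree, since F equals t squared.

```r
# Stand-in data (an assumption for illustration): two gardens, ten readings each.
ozone  <- c(3, 4, 4, 3, 2, 3, 1, 3, 5, 2,
            5, 5, 6, 7, 4, 4, 3, 5, 6, 5)
garden <- factor(rep(c("A", "B"), each = 10))

# One factor with two levels: Student's t with a pooled variance.
t_result <- t.test(ozone ~ garden, var.equal = TRUE)

# The same comparison expressed as a one-way Anova.
anova_table <- summary(aov(ozone ~ garden))[[1]]

t_stat <- unname(t_result$statistic)
f_stat <- anova_table[["F value"]][1]
p_t    <- t_result$p.value
p_f    <- anova_table[["Pr(>F)"]][1]

# With two levels the tests agree: F is t squared and the p-values match.
print(c(t_squared = t_stat^2, F = f_stat))
print(c(p_t = p_t, p_F = p_f))
```

With three or more levels there is no single t statistic to compare against, which is why the slide switches to one-way Anova at that point.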

The Basic Idea of Anova
• You compare means by comparing variances. (Picture)
• Compute the overall variance of the data: s² = SSY / (kn - 1), where SSY is the total sum of squares about the overall mean, k is the number of treatments, and n is the number of replicates per treatment.
• If the treatment means are significantly different, the sum of squares computed from the individual treatment means will be smaller than the sum of squares computed from the overall mean.
• SSE = error sum of squares, computed from the individual treatment means; df = k(n - 1).
• SSA = SSY - SSE, the treatment sum of squares; df = k - 1.
• Finally, use an F test to determine whether SSA is significant: F = (SSA / (k - 1)) / (SSE / (k(n - 1))).
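The arithmetic above can be worked through by hand in R. This is a minimal sketch using stand-in data (an assumption: two treatments, k = 2, with n = 10 replicates each); the variable names are illustrative only.

```r
# Stand-in data (an assumption for illustration): k = 2 gardens, n = 10 readings each.
ozone  <- c(3, 4, 4, 3, 2, 3, 1, 3, 5, 2,
            5, 5, 6, 7, 4, 4, 3, 5, 6, 5)
garden <- factor(rep(c("A", "B"), each = 10))

k <- nlevels(garden)          # number of treatments
n <- length(ozone) / k        # replicates per treatment

# Total sum of squares: deviations from the overall mean.
SSY <- sum((ozone - mean(ozone))^2)

# Error sum of squares: deviations from each observation's own treatment mean.
SSE <- sum((ozone - ave(ozone, garden))^2)

# Treatment sum of squares, by subtraction.
SSA <- SSY - SSE

# F ratio: treatment mean square over error mean square.
F_ratio <- (SSA / (k - 1)) / (SSE / (k * (n - 1)))
p_value <- pf(F_ratio, k - 1, k * (n - 1), lower.tail = FALSE)

print(c(SSY = SSY, SSE = SSE, SSA = SSA, F = F_ratio, p = p_value))
```

For these values SSY = 44, SSE = 24 and SSA = 20, so F = 20 / (24/18) = 15 on 1 and 18 degrees of freedom.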

Anova Tools
• model <- aov(y ~ A)
• summary(model)
  • Tells you whether the SSA is significant.
• plot(model)
  • Checks for constancy of variance and normality of errors.
• Demonstration (pages 155-161)

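Demonstration
oneway <- read.table("oneway.txt", header=TRUE)
attach(oneway)
names(oneway)
[1] "ozone" "garden"
# plot the 20 readings in order, with the overall mean as a horizontal line
plot(1:20, ozone, ylim=c(0,8), ylab="y", xlab="order")
abline(mean(ozone), 0)
# draw each reading's deviation from the overall mean
for(i in 1:20) lines(c(i,i), c(mean(ozone), ozone[i]))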

Variance from Mean

Separating the two gardens

Analysis
summary(aov(ozone~garden))
            Df  Sum Sq Mean Sq F value   Pr(>F)
garden       1 20.0000 20.0000      15 0.001115 **
Residuals   18 24.0000  1.3333                      <- residual mean square
plot(aov(ozone~garden))   <- similar to lm plots, less informative

Investigating Factor Levels
• summary.aov() allows you to do hypothesis testing.
• Usually the effects of the factor levels are more interesting than the hypothesis test itself.
• For those, use summary.lm() from the regression lecture:
  • summary.lm(aov(ozone~garden))
• Discuss (pages 164-166 of text)
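A short sketch of what summary.lm() reports for an aov model, using stand-in data (an assumption, not read from oneway.txt). Under R's default treatment contrasts, the intercept is the mean of the first factor level and each remaining row is a difference from that level.

```r
# Stand-in data (an assumption for illustration): two gardens, ten readings each.
ozone  <- c(3, 4, 4, 3, 2, 3, 1, 3, 5, 2,
            5, 5, 6, 7, 4, 4, 3, 5, 6, 5)
garden <- factor(rep(c("A", "B"), each = 10))

model <- aov(ozone ~ garden)

# Hypothesis test: is there any difference between the gardens?
print(summary(model))

# Effect sizes: coefficient table with estimates and standard errors.
effects_table <- coef(summary.lm(model))
print(effects_table)

# The intercept is the mean of garden A; "gardenB" is the B minus A difference.
print(tapply(ozone, garden, mean))
```

The standard error on the intercept row is that of a single mean, while the standard error on the gardenB row is that of a difference between two means, so the second is larger.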

Plotting ANOVA
• Box and whisker plots
  • Show the nature of the variation within each treatment.
  • Show skew.
• Bar plots with error bars
  • Preferred by many journals
  • Demo (pages 168-169)
  • Show when means are significantly different.
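Both kinds of plot can be sketched with base R graphics. The data are stand-in values (an assumption), and the error bars here are one standard error of a mean, computed from the pooled residual variance of the Anova; other bar lengths (for example confidence intervals) are equally possible choices.

```r
# Stand-in data (an assumption for illustration): two gardens, ten readings each.
ozone  <- c(3, 4, 4, 3, 2, 3, 1, 3, 5, 2,
            5, 5, 6, 7, 4, 4, 3, 5, 6, 5)
garden <- factor(rep(c("A", "B"), each = 10))

# Box and whisker plot: spread and skew within each treatment.
boxplot(ozone ~ garden, xlab = "garden", ylab = "ozone")

# Treatment means and a pooled standard error from the Anova residuals.
model <- aov(ozone ~ garden)
means <- tapply(ozone, garden, mean)
s2    <- deviance(model) / df.residual(model)   # residual mean square
se    <- sqrt(s2 / as.vector(table(garden)))    # standard error of each mean

# Bar plot; barplot() returns the x positions of the bar midpoints.
mids <- barplot(means, ylim = c(0, max(means + se) * 1.2),
                xlab = "garden", ylab = "mean ozone")

# Error bars drawn as double-headed arrows with flat (90 degree) heads.
arrows(mids, means - se, mids, means + se, angle = 90, code = 3, length = 0.1)
```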

Factorial Experiments
• All combinations of factors present. Highly desirable.
• Allow us to investigate interactions.
• summary()
• summary.lm()
• Demo of simplification.
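A minimal sketch of a factorial analysis and one simplification step. Everything here is hypothetical: the factors A and B, their level names, and the simulated response (generated with a main effect of A only) are assumptions made for illustration.

```r
# Hypothetical 2 x 3 factorial design with 4 replicates per combination.
set.seed(1)
design <- expand.grid(rep = 1:4,
                      A = c("a1", "a2"),
                      B = c("b1", "b2", "b3"))
design$A <- factor(design$A)
design$B <- factor(design$B)

# Simulated response (an assumption): a main effect of A, no interaction.
design$y <- 10 + 2 * (design$A == "a2") + rnorm(nrow(design))

# Full model: both main effects plus their interaction.
full <- aov(y ~ A * B, data = design)
full_table <- summary(full)[[1]]
print(full_table)

# Simplification: drop the interaction and compare the two models.
reduced    <- update(full, . ~ . - A:B)
comparison <- anova(reduced, full)
print(comparison)
```

If the comparison shows no significant loss of explanatory power, the simpler model without the interaction is retained, and the same step can be repeated for the remaining terms.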

Pseudoreplication
• aov and lme can handle complicated error structures.
• Avoid the pseudoreplication that arises in two kinds of design:
  • Nested sampling
  • Split-plot analysis
• You can average away spatial pseudoreplication and conduct individual ANOVAs for each time.
  • Has some weaknesses (page 180)
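One way aov can be told about a nested error structure is through an Error() term. The sketch below uses simulated data (an assumption): six hypothetical subjects, three per treatment, each measured five times. The naive model treats all 30 readings as independent; the Error(subject) model tests the treatment against the variation between subjects.

```r
# Simulated nested data (an assumption): 6 subjects, 5 readings per subject.
set.seed(42)
subject   <- factor(rep(1:6, each = 5))
treatment <- factor(rep(c("control", "treated"), each = 15))

# Each subject has its own random offset shared by all of its readings.
subject_effect <- rnorm(6, sd = 1)[as.integer(subject)]
y <- 10 + 2 * (treatment == "treated") + subject_effect + rnorm(30, sd = 0.5)

# Pseudoreplicated analysis: 28 residual df, as if there were 30 independent units.
naive <- aov(y ~ treatment)
print(summary(naive))

# Nested analysis: treatment is tested in the between-subject stratum (4 residual df).
nested         <- aov(y ~ treatment + Error(subject))
nested_summary <- summary(nested)
print(nested_summary)
```

The treatment line now appears in the "Error: subject" stratum with 1 and 4 degrees of freedom, reflecting that only six independent units were assigned to treatments.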

Random Effects and Nested Designs
• Mixed effects models: both fixed (affecting the mean) and random (affecting the variance) effects in the explanatory variables.
• Affected by grouping.
• Page 179 for categorisation.

Longitudinal Data
• Repeated measurements from an individual
• Common in medical studies
• Be critical! Longitudinal data can separate age effects from cohort effects.
• Response is a measurement series.

Derived Variable Analysis
• Summarise the statistics to average away the pseudoreplication and analyse the statistics.
• Weak if explanatory variables change over time.
• Watch for variation in:
  • random effects (VCA)
  • serial correlation
  • measurement error
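Derived variable analysis can be sketched by reducing each individual's measurement series to one number and analysing those numbers. The data are simulated (an assumption): six hypothetical subjects measured at five times, with the per-subject mean used as the derived variable. Other summaries, such as a per-subject slope, could be used in the same way.

```r
# Simulated longitudinal data (an assumption): 6 subjects measured at 5 times.
set.seed(7)
d <- data.frame(
  subject   = factor(rep(1:6, each = 5)),
  treatment = factor(rep(c("control", "treated"), each = 15)),
  time      = rep(1:5, times = 6)
)
subject_effect <- rnorm(6, sd = 1)[as.integer(d$subject)]
d$y <- 10 + 2 * (d$treatment == "treated") + subject_effect + rnorm(30, sd = 0.5)

# Derived variable: one mean per subject, which averages away the repeats.
subject_means <- aggregate(y ~ subject + treatment, data = d, FUN = mean)
print(subject_means)

# Ordinary one-way Anova on the six derived values.
derived <- aov(y ~ treatment, data = subject_means)
print(summary(derived))
```

The analysis now has one row per subject and 4 residual degrees of freedom, so the test no longer counts repeated readings as independent replicates.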
