Analysis of Variance (ANOVA)

W&W, Chapter 10

The Results
Many other factors, such as GPA, may help determine the salary level. The dean decides to collect new data, selecting one student at random from each major at each of the following average grades.

New data
Average grade   Accounting      Marketing       Finance         Block mean M(b)
A+              41              45              51              M(b1) = 45.67
A               36              38              45              M(b2) = 39.67
B+              27              33              31              M(b3) = 30.33
B               32              29              35              M(b4) = 32
C+              26              31              32              M(b5) = 29.67
C               23              25              27              M(b6) = 25
Sample mean     M(t)1 = 30.83   M(t)2 = 33.5    M(t)3 = 36.83   Grand mean = 33.72
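The means in the table can be reproduced with a short script. This is only a sketch: the variable names (`salaries`, `block_means`, `sample_means`) are ours, not W&W's, and the layout assumes rows are GPA blocks and columns are majors.

```python
# Salary data from the table: rows are GPA blocks, columns are majors.
majors = ["Accounting", "Marketing", "Finance"]
grades = ["A+", "A", "B+", "B", "C+", "C"]
salaries = [
    [41, 45, 51],
    [36, 38, 45],
    [27, 33, 31],  # B+ row: (27 + 33 + 31) / 3 = 30.33
    [32, 29, 35],
    [26, 31, 32],
    [23, 25, 27],
]

def mean(values):
    """Arithmetic mean of any iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

block_means = [mean(row) for row in salaries]          # M(b)i, one per GPA level
sample_means = [mean(col) for col in zip(*salaries)]   # M(t)j, one per major
grand_mean = mean(x for row in salaries for x in row)  # mean of all 18 salaries

for grade, m in zip(grades, block_means):
    print(f"block {grade:<2} mean = {m:.2f}")
for major, m in zip(majors, sample_means):
    print(f"{major} mean = {m:.2f}")
print(f"grand mean = {grand_mean:.2f}")
```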

Randomized Block Design
Now the data in the 3 samples are not independent; they are matched by GPA level. Just as before, matched samples are superior to unmatched samples because they provide more information. In this case, we have added a factor (GPA) that may account for some of the variation previously counted in SSE.

Two way ANOVA
Now SS(total) = SST + SSB + SSE, where SSB is the variability among blocks and a block is a matched group of observations, one from each of the populations. We can calculate a two-way ANOVA to test our null hypothesis.

The Hypotheses
$H_0$: $\mu_1 = \mu_2 = \mu_3$
$H_A$: the means are not all equal (at least two of $\mu_1$, $\mu_2$, $\mu_3$ differ)
We are testing the same hypothesis as in the completely randomized design.

Calculating SST
$SST = b \sum_j (M(t)_j - \bar{\bar{X}})^2$
where b = the number of blocks, $M(t)_j$ = the mean for each sample, and $\bar{\bar{X}}$ = the grand mean

Calculating SST
$SST = (6)(30.83 - 33.72)^2 + (6)(33.5 - 33.72)^2 + (6)(36.83 - 33.72)^2 = 108.4$
This captures the variation across our samples (majors).
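As a check on the arithmetic, SST can be computed directly from the raw salaries. The function name `treatment_sum_of_squares` is hypothetical, and the table layout (rows = blocks, columns = samples) is an assumption of this sketch.

```python
# Rows are GPA blocks (A+ ... C); columns are Accounting, Marketing, Finance.
salaries = [
    [41, 45, 51],
    [36, 38, 45],
    [27, 33, 31],
    [32, 29, 35],
    [26, 31, 32],
    [23, 25, 27],
]

def treatment_sum_of_squares(table):
    """SST = b * sum over samples of (sample mean - grand mean)^2."""
    b = len(table)                                   # number of blocks
    cells = [x for row in table for x in row]
    grand_mean = sum(cells) / len(cells)
    sample_means = [sum(col) / b for col in zip(*table)]
    return b * sum((m - grand_mean) ** 2 for m in sample_means)

sst = treatment_sum_of_squares(salaries)
print(f"SST = {sst:.1f}")
```

Working from unrounded means gives a value that rounds to the 108.4 shown above.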

Calculating SSB SSB = k(M(b)i -)2 where k = the number of samples M(b)i = the mean for each block  = grand mean

Calculating SSB
$SSB = (3)(45.67 - 33.72)^2 + (3)(39.67 - 33.72)^2 + \dots + (3)(25 - 33.72)^2 = 854.9$
This captures the variation across our blocks (GPA levels).
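The same check works for SSB, this time averaging across each row (block) instead of each column. Again a sketch with a hypothetical function name, `block_sum_of_squares`.

```python
# Rows are GPA blocks (A+ ... C); columns are Accounting, Marketing, Finance.
salaries = [
    [41, 45, 51],
    [36, 38, 45],
    [27, 33, 31],
    [32, 29, 35],
    [26, 31, 32],
    [23, 25, 27],
]

def block_sum_of_squares(table):
    """SSB = k * sum over blocks of (block mean - grand mean)^2."""
    k = len(table[0])                                # number of samples
    cells = [x for row in table for x in row]
    grand_mean = sum(cells) / len(cells)
    block_means = [sum(row) / k for row in table]
    return k * sum((m - grand_mean) ** 2 for m in block_means)

ssb = block_sum_of_squares(salaries)
print(f"SSB = {ssb:.1f}")
```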

Calculating SS (total)
$SS = \sum_i \sum_j (X_{ij} - \bar{\bar{X}})^2$
$SS = (41 - 33.72)^2 + (36 - 33.72)^2 + \dots + (27 - 33.72)^2 = 1015.61$
We know that SS = SST + SSB + SSE, so $SSE = SS - SST - SSB$.

Calculating SSE
$SSE = 1015.61 - 108.44 - 854.94 \approx 52.2$
Now we can compare our results across the two designs we have discussed:
• Completely randomized design
• Randomized block design
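SSE can be obtained two ways: by subtraction, as above, or directly as the sum of squared residuals left after removing the block and sample effects. The sketch below computes both and shows that they agree. The residual form $X_{ij} - M(b)_i - M(t)_j + \bar{\bar{X}}$ is the standard one for an additive two-way layout with one observation per cell; it is not spelled out in the slides.

```python
# Rows are GPA blocks (A+ ... C); columns are Accounting, Marketing, Finance.
salaries = [
    [41, 45, 51],
    [36, 38, 45],
    [27, 33, 31],
    [32, 29, 35],
    [26, 31, 32],
    [23, 25, 27],
]

b = len(salaries)      # number of blocks (GPA levels)
k = len(salaries[0])   # number of samples (majors)
cells = [x for row in salaries for x in row]
grand_mean = sum(cells) / len(cells)
block_means = [sum(row) / k for row in salaries]
sample_means = [sum(col) / b for col in zip(*salaries)]

ss_total = sum((x - grand_mean) ** 2 for x in cells)
sst = b * sum((m - grand_mean) ** 2 for m in sample_means)
ssb = k * sum((m - grand_mean) ** 2 for m in block_means)

# Route 1: subtraction, SSE = SS - SST - SSB.
sse_by_subtraction = ss_total - sst - ssb

# Route 2: sum of squared residuals from the additive fit.
sse_from_residuals = sum(
    (salaries[i][j] - block_means[i] - sample_means[j] + grand_mean) ** 2
    for i in range(b)
    for j in range(k)
)

print(f"SS  = {ss_total:.2f}")
print(f"SSE = {sse_by_subtraction:.2f} (subtraction)")
print(f"SSE = {sse_from_residuals:.2f} (residuals)")
```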

Comparison of the Designs
Sum of squares   Completely Randomized Design   Randomized Block Design
                 (One way ANOVA)                (Two way)
SST              193                            108.4
SSB              ----                           854.9
SSE              819.5                          52.2
SS               1012.5                         1015.61
We can see that we have dramatically decreased the error (SSE) by accounting for GPA. In other words, we have removed from the error term the variability caused by differences among the blocks.

Summary Table
Source of variation   df          Sum of squares         Mean squares
Treatment             k-1         SST                    MST = SST/(k-1)
Block                 b-1         SSB                    MSB = SSB/(b-1)
Error                 n-k-b+1     SSE                    MSE = SSE/(n-k-b+1)
Total                 n-1         SS = SST+SSB+SSE
We can calculate an F-statistic to test differences among samples or blocks.
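The summary table can be filled in mechanically once the three sums of squares are known. The helper below, `anova_summary`, is a hypothetical name for a minimal sketch; it assumes one observation per block-by-sample cell, so that n = k × b and the error df, n-k-b+1, equals (k-1)(b-1).

```python
def anova_summary(sst, ssb, sse, k, b):
    """Return (source, df, sum of squares, mean square) rows for a randomized block design."""
    n = k * b  # assumption: exactly one observation per block-by-sample cell
    rows = [
        ("Treatment", k - 1, sst),
        ("Block", b - 1, ssb),
        ("Error", n - k - b + 1, sse),
    ]
    table = [(name, df, ss, ss / df) for name, df, ss in rows]
    # The Total row has no mean square.
    table.append(("Total", n - 1, sst + ssb + sse, None))
    return table

# Sums of squares as reported on the slides (rounded).
table = anova_summary(sst=108.4, ssb=854.9, sse=52.2, k=3, b=6)
for name, df, ss, ms in table:
    ms_text = "" if ms is None else f"{ms:.2f}"
    print(f"{name:<10}{df:>4}{ss:>10.2f}{ms_text:>10}")
```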

Calculating F (differences among samples) for two way ANOVA
$F = \dfrac{MST}{MSE} = \dfrac{SST/(k-1)}{SSE/(n-k-b+1)}$
$F = \dfrac{108.4/(3-1)}{52.2/(18-3-6+1)} = \dfrac{54.2}{5.22} \approx 10.4$
Critical $F_{\alpha,\,k-1,\,n-k-b+1} = F_{.05,\,2,\,10} = 4.1$

Decision
Because our calculated F (10.4) exceeds our critical F (4.1), we reject the null hypothesis that the means across the samples are equal. We conclude that there is a difference in the mean salary levels across the 3 business majors.

Testing Block Differences
We could also test whether the blocks differ from each other, that is, whether students with higher GPAs earn more money.
$F = \dfrac{MSB}{MSE} = \dfrac{SSB/(b-1)}{SSE/(n-k-b+1)}$
$F = \dfrac{854.9/(6-1)}{52.2/(18-3-6+1)} = \dfrac{170.982}{5.22} \approx 32.76$

Testing Block Differences
Critical $F_{\alpha,\,b-1,\,n-k-b+1} = F_{.05,\,5,\,10} = 3.33$
We can also reject the null hypothesis of no difference among blocks because our calculated F (32.76) exceeds our critical F (3.33).
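Both F tests follow the same recipe: divide the relevant mean square by MSE and compare with the tabled critical value. The sketch below uses the rounded sums of squares from the slides and takes the critical values (4.10 and 3.33) as given from an F table rather than computing them. The function name `f_test` is ours.

```python
def f_test(ms_effect, ms_error, critical_f):
    """Return the F statistic and whether it exceeds the critical value."""
    f_stat = ms_effect / ms_error
    return f_stat, f_stat > critical_f

k, b = 3, 6            # samples (majors) and blocks (GPA levels)
n = k * b              # one observation per cell
sst, ssb, sse = 108.4, 854.9, 52.2   # rounded values from the slides

mse = sse / (n - k - b + 1)

# Critical values are table lookups at alpha = .05, not computed here.
f_samples, reject_samples = f_test(sst / (k - 1), mse, critical_f=4.10)
f_blocks, reject_blocks = f_test(ssb / (b - 1), mse, critical_f=3.33)

print(f"samples: F = {f_samples:.2f}, reject H0: {reject_samples}")
print(f"blocks:  F = {f_blocks:.2f}, reject H0: {reject_blocks}")
```

If SciPy is installed, `scipy.stats.f.ppf(0.95, dfn, dfd)` should reproduce the tabled critical values; that dependency is left out here to keep the sketch self-contained.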

Mean Square Error (MSE)
It is interesting to note that MSE is similar to the pooled variance $s_p^2$, which we calculated earlier for a confidence interval for the difference between two means. In the completely randomized design:
$MSE = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \dots + (n_k - 1)s_k^2}{n - k}$
Thus MSE is an unbiased estimate of $\sigma^2$. W&W show that you can substitute MSE for $s_p^2$ in the calculation of a confidence interval for $(\mu_1 - \mu_2)$.
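The link between MSE and the pooled variance can be checked numerically. The sketch below treats the new salary data as if it came from a completely randomized design (ignoring the GPA blocks), so its numbers are illustrative only and will not match the one-way column of the comparison table, which was based on the earlier data. It computes the pooled variance from the three sample variances and compares it with (SS - SST)/(n - k).

```python
from statistics import variance

# The three samples (majors), ignoring the GPA blocking.
samples = {
    "Accounting": [41, 36, 27, 32, 26, 23],
    "Marketing": [45, 38, 33, 29, 31, 25],
    "Finance": [51, 45, 31, 35, 32, 27],
}
groups = list(samples.values())
n = sum(len(g) for g in groups)   # total observations
k = len(groups)                   # number of samples

# Pooled variance: weighted average of the sample variances.
pooled = sum((len(g) - 1) * variance(g) for g in groups) / (n - k)

# One-way MSE via the sums of squares: (SS - SST) / (n - k).
cells = [x for g in groups for x in g]
grand_mean = sum(cells) / n
ss_total = sum((x - grand_mean) ** 2 for x in cells)
sst = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
mse_one_way = (ss_total - sst) / (n - k)

print(f"pooled variance = {pooled:.3f}")
print(f"one-way MSE     = {mse_one_way:.3f}")
```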

Some Assumptions for ANOVA
• The population random variables must be normally distributed (there are many alternative nonparametric tests if this is violated).
• Population variances must be equal.
• We assume an additive model, where the effects of the two factors are added together (a multiplicative model may be needed if, for example, students with a particular GPA have unusually high salaries).
• We have no missing data.
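The additive assumption can be written out as a model equation. The symbols below ($\mu$, $\beta_i$, $\tau_j$, $\varepsilon_{ij}$) are conventional notation assumed here for illustration, not notation taken from the slides:

```latex
X_{ij} = \mu + \beta_i + \tau_j + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim N(0, \sigma^2) \ \text{independently}
```

Here $\beta_i$ is the effect of block (GPA level) $i$ and $\tau_j$ is the effect of sample (major) $j$. The model has no term that depends on $i$ and $j$ jointly, which is what "additive" means: the salary difference between two majors is assumed to be the same at every GPA level.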
