Chapter 15 The Analysis of Variance
A Problem • A study was done on the survival time of patients with advanced cancer of the stomach, bronchus, colon, ovary or breast when treated with ascorbate¹. In this study, the authors wanted to determine whether survival times differ based on the affected organ.
¹ Cameron, E. and Pauling, L. (1978) Supplemental ascorbate in the supportive treatment of cancer: re-evaluation of prolongation of survival time in terminal human cancer. Proceedings of the National Academy of Sciences, USA, 75, 4538-4542.
A Problem • A comparative dotplot of the survival times is shown below.
A Problem • The hypotheses used to answer the question of interest are • H0: μ_stomach = μ_bronchus = μ_colon = μ_ovary = μ_breast • Ha: At least two of the μ's are different • The question is similar to ones encountered in Chapter 11, where we looked at tests for the difference between the means of two populations or treatments. Here we are interested in comparing more than two.
Single-factor Analysis of Variance (ANOVA) • A single-factor analysis of variance (ANOVA) problem involves a comparison of k population or treatment means μ1, μ2, … , μk. The objective is to test the hypotheses: • H0: μ1 = μ2 = μ3 = … = μk • Ha: At least two of the μ's are different
Single-factor Analysis of Variance (ANOVA) • The analysis is based on k independently selected samples, one from each population or for each treatment. • In the case of populations, a random sample from each population is selected independently of that from any other population. • When comparing treatments, the experimental units (subjects or objects) that receive any particular treatment are chosen at random from those available for the experiment. • A comparison of treatments based on independently selected experimental units is often referred to as a completely randomized design.
Single-Factor Analysis of Variance (ANOVA) Notice that in the comparative dotplot on the left, the differences among the sample means are large relative to the variability within the samples, while in the comparative dotplot on the right, the differences among the sample means relative to the within-sample variability are not so clear cut. ANOVA techniques will allow us to determine whether such differences are significant.
Assumptions for ANOVA • For each of the k populations or treatments, the response distribution is normal. • σ1 = σ2 = … = σk (the k normal distributions have identical standard deviations). • The observations in the sample from any particular one of the k populations or treatments are independent of one another. • When comparing population means, k random samples are selected independently of one another. When comparing treatment means, treatments are assigned at random to subjects or objects.
Definitions The error df comes from adding the df's associated with each of the sample variances: (n1 - 1) + (n2 - 1) + … + (nk - 1) = (n1 + n2 + … + nk) - k = N - k, where N = n1 + n2 + … + nk is the total number of observations.
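The formulas that go with this df calculation are not reproduced on the slide above. As a hedged reminder, the usual textbook definitions are sketched below. The notation is an assumption: \bar{x}_i and s_i^2 denote the mean and variance of the i-th sample, and \bar{\bar{x}} denotes the grand mean of all N observations.

```latex
\begin{aligned}
\text{SSTr} &= \sum_{i=1}^{k} n_i\,(\bar{x}_i - \bar{\bar{x}})^2, &
\text{MSTr} &= \frac{\text{SSTr}}{k-1},\\[4pt]
\text{SSE}  &= \sum_{i=1}^{k} (n_i - 1)\,s_i^2, &
\text{MSE}  &= \frac{\text{SSE}}{N-k}.
\end{aligned}
```

The divisor N - k in MSE is exactly the error df obtained above, and k - 1 is the treatment df.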
Example Three filling machines are used by a bottler to fill 12 oz cans of soda. In an attempt to determine if the three machines are filling the cans to the same (mean) level, independent samples of cans filled by each were selected and the amounts of soda in the cans measured. The samples are given below.
Machine 1: 12.033 11.985 12.009 12.009 12.033 12.025 12.054 12.050
Machine 2: 12.031 11.985 11.998 11.992 11.985 12.027 11.987
Machine 3: 12.034 12.021 12.038 12.058 12.001 12.020 12.029 12.011 12.021
Comments • Both MSTr and MSE are quantities that are calculated from sample data. As such, both MSTr and MSE are statistics and have sampling distributions. • More specifically, when H0 is true (μ1 = μ2 = μ3 = … = μk), μ_MSTr = μ_MSE. • However, when H0 is false, μ_MSTr > μ_MSE, and the greater the differences among the μ's, the larger μ_MSTr will be relative to μ_MSE.
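One way to see why these comments hold is to write out the expected values of the two mean squares. The expressions below are the standard results under the ANOVA assumptions, not something stated on the slides. The symbol \bar{\mu} (the sample-size-weighted average of the population means) is notation introduced here for illustration.

```latex
\begin{aligned}
\mu_{\text{MSE}}  &= \sigma^2,\\
\mu_{\text{MSTr}} &= \sigma^2 + \frac{1}{k-1}\sum_{i=1}^{k} n_i\,(\mu_i - \bar{\mu})^2,
\qquad \bar{\mu} = \frac{1}{N}\sum_{i=1}^{k} n_i\,\mu_i .
\end{aligned}
```

When all the μ's are equal the sum is zero and the two expected values coincide. Any difference among the μ's makes the sum positive, so μ_MSTr exceeds μ_MSE.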
The Single-Factor ANOVA F Test The test statistic is F = MSTr/MSE. When H0 is true and the ANOVA assumptions are reasonable, F has an F distribution with df1 = k - 1 and df2 = N - k. Values of F more contradictory to H0 than what was calculated are values even farther out in the upper tail, so the P-value is the area captured in the upper tail of the corresponding F curve.
Example Consider the earlier example involving the three filling machines.
Machine 1: 12.033 11.985 12.009 12.009 12.033 12.025 12.054 12.050
Machine 2: 12.031 11.985 11.998 11.992 11.985 12.027 11.987
Machine 3: 12.034 12.021 12.038 12.058 12.001 12.020 12.029 12.011 12.021
Example Looking at the comparative dotplot, it seems reasonable to assume that the distributions have the same σ's. We shall look at the normality assumption on the next slide.*
* When the sample sizes are large, we can make judgments about both the equality of the standard deviations and the normality of the underlying populations with a comparative boxplot.
Example (continued) Looking at normal plots for the samples, it certainly appears reasonable to assume that the samples from Machines 1 and 3 are samples from normal distributions. Unfortunately, the normal plot for the sample from Machine 2 does not look like that of a sample from a normal population. So as to have a computational example, we shall continue and finish the test, treating the result with a "grain of salt."
Example • Conclusion: Since P-value > α = 0.01, we fail to reject H0. The data do not provide convincing evidence that the mean fills of the three machines differ; the observed differences among the sample means are not statistically significant at the 0.01 level.
Total Sum of Squares The relationship between the three sums of squares is SSTo = SSTr + SSE which is often called the fundamental identity for single-factor ANOVA. Informally this relation is expressed as Total variation = Explained variation + Unexplained variation
Single-factor ANOVA Table The following is a fairly standard way of presenting the important calculations from a single-factor ANOVA. The output from most statistical packages will contain an additional column giving the P-value.
Single-factor ANOVA Table The ANOVA table supplied by Minitab
One-way ANOVA: Fills versus Machine
Analysis of Variance for Fills
Source    DF        SS        MS     F      P
Machine    2  0.003016  0.001508  3.84  0.038
Error     21  0.008256  0.000393
Total     23  0.011271
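The entries of this table can be reproduced from the raw fill data. The following is a minimal sketch using only the Python standard library. The function name `one_way_anova` and the dictionary keys are illustrative choices, not part of the slides or of Minitab.

```python
# Single-factor ANOVA computed directly from the three samples of fills.
machines = {
    "Machine 1": [12.033, 11.985, 12.009, 12.009, 12.033, 12.025, 12.054, 12.050],
    "Machine 2": [12.031, 11.985, 11.998, 11.992, 11.985, 12.027, 11.987],
    "Machine 3": [12.034, 12.021, 12.038, 12.058, 12.001, 12.020, 12.029,
                  12.011, 12.021],
}


def one_way_anova(samples):
    """Return the sums of squares, df's, mean squares and F for k samples."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    means = [sum(s) / len(s) for s in samples]

    # Treatment sum of squares: spread of the sample means about the grand mean.
    sstr = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Error sum of squares: spread of each observation about its own sample mean.
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    # Total sum of squares: spread of every observation about the grand mean.
    ssto = sum((x - grand_mean) ** 2 for s in samples for x in s)

    df_tr, df_err = k - 1, n_total - k
    mstr, mse = sstr / df_tr, sse / df_err
    return {"df_tr": df_tr, "df_err": df_err, "SSTr": sstr, "SSE": sse,
            "SSTo": ssto, "MSTr": mstr, "MSE": mse, "F": mstr / mse}


table = one_way_anova(list(machines.values()))
for name, value in table.items():
    print(f"{name:>6}: {value:.6f}")
```

The computed SSTo should agree with SSTr + SSE up to rounding, which is the fundamental identity in action. If SciPy is available, `scipy.stats.f_oneway` applied to the same three lists returns the F statistic together with its P-value.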
Another Example A food company that sells iced tea produces 4 different flavors of sweetened iced tea (lemon, raspberry, peach, green tea). A dietician working for the company needed to determine if the current formulations gave the same mean sodium levels for the four flavors. To that end, 15 bottles of each flavor were randomly (and independently) obtained and the sodium content in milligrams (mg) per 12 ounce serving was measured. The sample data are given on the next slide. Use the data to perform an appropriate hypothesis test at the 0.05 level of significance.
Another Example
Flavor 1: 35.0 35.6 34.1 39.6 35.6 32.3 36.6 34.5 35.2 33.8 36.7 37.2 34.0 33.8 35.8
Flavor 2: 37.3 37.4 38.3 34.9 39.0 36.5 36.9 37.6 34.9 40.4 37.5 33.5 38.2 34.6 34.5
Flavor 3: 35.2 33.4 34.5 38.1 36.2 35.4 38.5 31.5 36.7 35.6 36.7 39.3 36.8 31.5 33.2
Flavor 4: 35.4 35.7 31.4 34.5 34.1 31.2 37.5 37.3 31.7 33.2 33.8 35.8
Another Example Looking at the following comparative boxplot, it seems reasonable to assume that the distributions have the same σ's and that the samples come from normal distributions (i.e., it is reasonable to assume that the distributions of sodium content per 12 ounce serving are normal for each of the four flavors).
Another Example • P-value: F = 4.55 with df_numerator = 3 and df_denominator = 56. Using df = 60 (the closest entry to 56 in the table) we find 0.001 < P-value < 0.01.
Another Example • Conclusion: Since P-value < α = 0.05, we reject H0. We can conclude that the mean sodium content is different for at least two of the flavors. We need to learn how to interpret the results and will spend some time on developing techniques to describe the differences among the μ's.
Multiple Comparisons • A multiple comparison procedure is a method for identifying differences among the μ's once the hypothesis of overall equality (H0) has been rejected. • The technique we will present is based on computing confidence intervals for the difference of means for each pair. • Specifically, if k populations or treatments are studied, we would create k(k-1)/2 differences (i.e., with 3 treatments one would generate confidence intervals for μ1 - μ2, μ1 - μ3 and μ2 - μ3). Notice that it is only necessary to look at a confidence interval for μ1 - μ2 to see if μ1 and μ2 differ: if the interval does not contain 0, the two means are judged to differ.
Example (continued) Notice that the confidence interval for μ2 - μ4 does not contain 0, so we can infer that the mean sodium content for flavors 2 and 4 differ.
Example (continued) Notice that the confidence interval for μ2 - μ4 does not contain 0, so we can infer that the mean sodium content for flavors 2 and 4 differ. We also illustrate the differences with the following listing of the sample means in increasing order, with lines underneath those blocks of means that are indistinguishable.
Flavor 4   Flavor 1   Flavor 3   Flavor 2
 33.893     35.320     35.507     36.767
------------------------------
           ------------------------------
Minitab Output for Example
One-way ANOVA: Sodium versus Flavor
Analysis of Variance for Sodium
Source    DF      SS     MS     F      P
Flavor     3   62.29  20.76  4.55  0.006
Error     56  255.74   4.57
Total     59  318.02
                              Individual 95% CIs For Mean
                              Based on Pooled StDev
Level      N    Mean   StDev  --+---------+---------+---------+----
Flavor 1  15  35.320   1.764            (------*-------)
Flavor 2  15  36.767   1.929                      (------*------)
Flavor 3  15  35.507   2.361             (-------*------)
Flavor 4  15  33.893   2.421   (------*------)
                              --+---------+---------+---------+----
Pooled StDev = 2.137          33.0      34.5      36.0      37.5
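Because the output lists the size, mean and standard deviation of each sample, the ANOVA table can be rebuilt from those summaries alone, without the raw data. The sketch below assumes the rounded values printed by Minitab, so it matches the table only up to rounding. The function name `anova_from_summary` is illustrative.

```python
from math import sqrt

# (sample size, sample mean, sample standard deviation), copied from the output.
summary = {
    "Flavor 1": (15, 35.320, 1.764),
    "Flavor 2": (15, 36.767, 1.929),
    "Flavor 3": (15, 35.507, 2.361),
    "Flavor 4": (15, 33.893, 2.421),
}


def anova_from_summary(groups):
    """Single-factor ANOVA quantities from (n, mean, sd) triples."""
    k = len(groups)
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / n_total

    sstr = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    sse = sum((n - 1) * sd ** 2 for n, _, sd in groups)  # pools the variances

    mstr = sstr / (k - 1)
    mse = sse / (n_total - k)
    return {"SSTr": sstr, "SSE": sse, "MSTr": mstr, "MSE": mse,
            "F": mstr / mse, "pooled_sd": sqrt(mse)}


tea = anova_from_summary(list(summary.values()))
for name, value in tea.items():
    print(f"{name:>9}: {value:.3f}")
```

The square root of MSE is the "Pooled StDev" reported at the bottom of the output.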
Minitab Output for Example
Tukey's pairwise comparisons
Family error rate = 0.0500
Individual error rate = 0.0106
Critical value = 3.74
Intervals for (column level mean) - (row level mean)
            Flavor 1   Flavor 2   Flavor 3
Flavor 2     -3.510
              0.617
Flavor 3     -2.250     -0.804
              1.877      3.324
Flavor 4     -0.637      0.810     -0.450
              3.490      4.937      3.677
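These intervals can be checked by hand. The sketch below assumes the usual Tukey-Kramer form, (difference of sample means) ± q·sqrt((MSE/2)(1/ni + 1/nj)), with the critical value q = 3.74 and the pooled standard deviation 2.137 taken from the Minitab output rather than computed. The function name `tukey_kramer` is illustrative.

```python
from itertools import combinations
from math import sqrt

means = {"Flavor 1": 35.320, "Flavor 2": 36.767,
         "Flavor 3": 35.507, "Flavor 4": 33.893}
sizes = {name: 15 for name in means}
pooled_sd = 2.137  # "Pooled StDev" from the Minitab output
q_crit = 3.74      # "Critical value" from the Minitab output


def tukey_kramer(means, sizes, mse, q):
    """Simultaneous intervals for every pairwise difference of means."""
    intervals = {}
    for a, b in combinations(means, 2):  # k(k-1)/2 pairs
        half_width = q * sqrt(mse / 2 * (1 / sizes[a] + 1 / sizes[b]))
        diff = means[a] - means[b]
        intervals[(a, b)] = (diff - half_width, diff + half_width)
    return intervals


intervals = tukey_kramer(means, sizes, pooled_sd ** 2, q_crit)

# A pair is judged to differ when its interval excludes 0.
differing = [pair for pair, (lo, hi) in intervals.items() if lo > 0 or hi < 0]

for (a, b), (lo, hi) in intervals.items():
    print(f"{a} - {b}: ({lo:7.3f}, {hi:7.3f})")
print("Pairs that differ:", differing)
```

Each pair is ordered as (lower-numbered flavor) minus (higher-numbered flavor), which corresponds to Minitab's (column level mean) - (row level mean).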
Simultaneous Confidence Level The Tukey-Kramer intervals are constructed so as to control the simultaneous confidence level. For example, at the 95% level, if the procedure is used repeatedly on many different data sets, then in the long run only about 5% of the time would at least one of the intervals fail to include the value it is estimating. We then speak of a family error rate of 5%, which is the maximum probability that one or more of the confidence intervals for the differences of means fails to contain the true difference.
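A rough calculation suggests why the family error rate needs separate control. The sketch below assumes, purely for illustration, that the pairwise intervals are independent. They are not, since they share samples, so the number is a ballpark figure rather than the exact family error rate.

```python
# Illustration only: treats the pairwise intervals as if they were independent.
k = 4                           # number of treatments (the four tea flavors)
n_pairs = k * (k - 1) // 2      # number of pairwise differences
individual_level = 0.95         # confidence level of each separate interval

# Chance that at least one of the intervals misses its target.
p_at_least_one_miss = 1 - individual_level ** n_pairs

print(f"{n_pairs} intervals, P(at least one miss) = {p_at_least_one_miss:.3f}")
```

Under this simplification, six separate 95% intervals would miss at least once roughly a quarter of the time. This is why the Tukey-Kramer procedure uses a much smaller individual error rate to hold the family rate at 5%.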
Randomized Block Experiment • Suppose that experimental units (individuals or objects to which the treatments are applied) are first separated into groups consisting of k units in such a way that the units within each group are as similar as possible. Within any particular group, the treatments are then randomly allocated so that each unit in a group receives a different treatment. The groups are often called blocks and the experimental design is referred to as a randomized block design.
Example • When choosing a variety of melon to plant, one thing that a farmer might be interested in is the length of time (in days) for the variety to bear harvestable fruit. Since the growing conditions (soil, temperature, humidity) also affect this, a farmer might experiment with three hybrid melons (denoted hybrid A, hybrid B and hybrid C) by taking each of the four fields that he wants to use for growing melons, subdividing each field into 3 subplots (1, 2 and 3), and then planting each hybrid in one subplot of each field. The blocks are the fields and the treatments are the hybrids. The question of interest would be "Are the mean times to bear harvestable fruit the same for all three hybrids?"
Assumptions and Hypotheses • The single observation made on any particular treatment in a given block is assumed to be selected from a normal distribution. The variance of this distribution is σ², the same for each block-treatment combination. However, the mean value may depend separately both on the treatment applied and on the block. The hypotheses of interest are as follows:
H0: The mean value does not depend on which treatment is applied
Ha: The mean value does depend on which treatment is applied
Summary of the Randomized Block F Test Sums of squares and their associated df's are as follows.
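The summary table itself is not reproduced above. As a hedged sketch, the code below uses the standard randomized block decomposition SSTo = SSTr + SSBl + SSE, with df's k - 1, l - 1 and (k - 1)(l - 1) for k treatments and l blocks. The days-to-harvest values are hypothetical numbers made up for the melon setting, and the function name `randomized_block_anova` is illustrative.

```python
# HYPOTHETICAL data: days to harvest for each hybrid (rows) in each of the
# four fields (columns). These numbers are invented for illustration only.
days_to_harvest = {
    "Hybrid A": [85, 88, 82, 90],
    "Hybrid B": [87, 91, 85, 93],
    "Hybrid C": [92, 95, 88, 97],
}


def randomized_block_anova(table):
    """table[i][j] is the single observation for treatment i in block j."""
    k, l = len(table), len(table[0])
    grand_mean = sum(sum(row) for row in table) / (k * l)
    trt_means = [sum(row) / l for row in table]
    blk_means = [sum(table[i][j] for i in range(k)) / k for j in range(l)]

    sstr = l * sum((m - grand_mean) ** 2 for m in trt_means)   # treatments
    ssbl = k * sum((m - grand_mean) ** 2 for m in blk_means)   # blocks
    ssto = sum((x - grand_mean) ** 2 for row in table for x in row)
    sse = ssto - sstr - ssbl                                   # by subtraction

    df_tr, df_bl, df_err = k - 1, l - 1, (k - 1) * (l - 1)
    f_stat = (sstr / df_tr) / (sse / df_err)
    return {"df_tr": df_tr, "df_bl": df_bl, "df_err": df_err,
            "SSTr": sstr, "SSBl": ssbl, "SSE": sse, "SSTo": ssto, "F": f_stat}


block = randomized_block_anova(list(days_to_harvest.values()))
for name, value in block.items():
    print(f"{name:>6}: {value:.4f}")
```

The F statistic for treatments is compared with an F distribution on k - 1 and (k - 1)(l - 1) df. Removing the block sum of squares from the error term is what lets the design account for field-to-field differences.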