Chapter 15 Analysis of Variance
The article “Could Mean Platelet Volume be a Predictive Marker for Acute Myocardial Infarction?” (Medical Science Monitor, 2005) described an experiment in which four groups of patients seeking treatment for chest pain were compared with respect to mean platelet volume (MPV). The purpose of the study was to determine whether mean MPV differed among the four groups, in particular for the heart attack group. If so, then MPV could be used as an indicator of heart attack risk.
When two or more populations or treatments are being compared, the characteristic that distinguishes the populations or treatments from one another is called the factor. In this experiment, the factor is the clinical diagnosis. The four groups were (1) noncardiac chest pain, (2) stable angina pectoris, (3) unstable angina pectoris, and (4) myocardial infarction (heart attack).
To compare the means, the researchers must use a procedure called single-factor analysis of variance, or ANOVA. They need to compare the means of the four groups to determine whether $\mu_1 = \mu_2 = \mu_3 = \mu_4$ or whether at least two of the means differ.
Whether the null hypothesis (of equal means) should be rejected depends on how substantially the samples from the different populations or treatments differ from one another. Consider the following example.
[Figure: Graph A and Graph B, each showing three samples with the mean of each sample marked]
In Graph A, notice that the three samples seem to have very different means and very little variability within each sample. This would lead us to doubt the claim that $\mu_1 = \mu_2 = \mu_3$.
In Graph B, notice that the three samples have the same means as in Graph A. However, because of the large amount of variability within each sample and the fact that the samples overlap, it is plausible that the samples come from populations with equal means.
The phrase “analysis of variance” comes from the idea of analyzing variability in the data to see how much can be attributed to differences in the $\mu$'s and how much is due to variability within the individual populations.
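ANOVA Notation
k = the number of populations or treatments being compared
$n_i$ = the size of sample i, and $\bar{x}_i$ = the mean of sample i (i = 1, 2, …, k)
The total number of observations: $N = n_1 + n_2 + \cdots + n_k$
Grand total: $T$ = the sum of all $N$ observations $= n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k$
Grand mean: $\bar{\bar{x}} = T / N$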
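ANOVA Notation Continued . . .
A measure of differences among the sample means is the treatment sum of squares, denoted by SSTr and given by
$$\mathrm{SSTr} = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + \cdots + n_k(\bar{x}_k - \bar{\bar{x}})^2$$
where $\bar{\bar{x}}$ is the grand mean.
A measure of variation within the k samples, called the error sum of squares and denoted by SSE, is
$$\mathrm{SSE} = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2$$
where $s_i^2$ is the variance of sample i.
Each sum of squares has an associated df: treatment df = $k - 1$ and error df = $N - k$. The number of error degrees of freedom comes from adding the degrees of freedom associated with each of the sample variances: $(n_1 - 1) + (n_2 - 1) + \cdots + (n_k - 1) = N - k$.
A mean square is a sum of squares divided by its df: $\mathrm{MSTr} = \mathrm{SSTr}/(k - 1)$ and $\mathrm{MSE} = \mathrm{SSE}/(N - k)$.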
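The sums of squares and mean squares defined above can be computed directly from per-sample summary statistics. The following is a minimal sketch; the function name and the numbers are hypothetical illustrations, not values from the MPV study.

```python
def anova_sums_of_squares(ns, means, sds):
    """Return SSTr, SSE, their df, and the mean squares from per-sample summaries.

    ns, means, sds: sample sizes, sample means, and sample standard deviations,
    one entry per population or treatment.
    """
    k = len(ns)
    N = sum(ns)
    grand_total = sum(n * m for n, m in zip(ns, means))  # T = sum of all observations
    grand_mean = grand_total / N
    sstr = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    sse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    df_tr, df_err = k - 1, N - k
    return {
        "SSTr": sstr, "SSE": sse,
        "df_tr": df_tr, "df_err": df_err,
        "MSTr": sstr / df_tr, "MSE": sse / df_err,
    }

# Hypothetical summary statistics for k = 3 samples of size 5 (illustration only)
result = anova_sums_of_squares(ns=[5, 5, 5], means=[10.0, 12.0, 14.0], sds=[1.0, 2.0, 1.0])
print(result)
```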
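The Single Factor ANOVA F Test
Null hypothesis: $H_0$: $\mu_1 = \mu_2 = \cdots = \mu_k$
Alternative hypothesis: $H_a$: at least two of the $\mu$'s are different
Test statistic: $F = \mathrm{MSTr}/\mathrm{MSE}$, with $df_1 = k - 1$ and $df_2 = N - k$
P-value: the area under the appropriate F curve to the right of the calculated F value
When $H_0$ is true, $\mu_{\mathrm{MSTr}} = \mu_{\mathrm{MSE}}$. When $H_0$ is false, $\mu_{\mathrm{MSTr}} > \mu_{\mathrm{MSE}}$, so large values of F are evidence against $H_0$.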
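To illustrate the test statistic and the right-tail P-value, here is a sketch that uses only the Python standard library. In practice the P-value would come from statistical software or an F table. The function names are hypothetical, and the tail area is approximated by Simpson's rule applied to the incomplete beta integral, assuming df2 is at least 2. The inputs are made-up mean squares, not results from the MPV study.

```python
import math

def f_upper_tail(f, df1, df2, steps=2000):
    """Approximate P(F > f) for an F(df1, df2) distribution.

    Uses the identity P(F > f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1*f),
    where I_x is the regularized incomplete beta function, integrated numerically
    with Simpson's rule. Assumes df2 >= 2 so the integrand is bounded at t = 0,
    and that steps is even.
    """
    if f <= 0:
        return 1.0
    a, b = df2 / 2, df1 / 2
    x = df2 / (df2 + df1 * f)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def density(t):
        if t == 0.0:
            return math.exp(-log_beta) if a == 1 else 0.0
        return math.exp((a - 1) * math.log(t) + (b - 1) * math.log1p(-t) - log_beta)

    h = x / steps
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3

def anova_f_test(mstr, mse, k, N):
    """Return (F, P-value) for the single-factor ANOVA F test."""
    f = mstr / mse
    return f, f_upper_tail(f, k - 1, N - k)

# Hypothetical mean squares for k = 3 treatments and N = 15 observations
f_stat, p_value = anova_f_test(mstr=20.0, mse=2.0, k=3, N=15)
print(f_stat, p_value)
```

H0 would be rejected at significance level α whenever the returned P-value is smaller than α.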
The Single Factor ANOVA F Test Continued . . .
Assumptions:
• Each of the k population or treatment response distributions is normal.
• The k normal distributions have identical standard deviations ($\sigma_1 = \sigma_2 = \cdots = \sigma_k$).
• The observations in the sample from any particular one of the k populations or treatments are independent of one another.
• When comparing population means, the k random samples are selected independently of each other. When comparing treatment means, treatments are assigned at random to subjects or objects.
Checking normality: If the sample sizes are large, individual boxplots or normal probability plots for each sample can be used. If the sample sizes are small, a combined normal probability plot should be used: first find the deviations of the observations from their respective sample means, then combine the deviations from all samples to create the normal probability plot.
Checking equal standard deviations: While there is a formal procedure to check for equal standard deviations, its use is not recommended because of its sensitivity to any departure from normality. The ANOVA F test can safely be used if the largest sample standard deviation is not more than twice the smallest sample standard deviation.
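The two informal checks described above are easy to express in code. This is a small sketch with hypothetical function names and made-up data. It only prepares the combined deviations, and the normal probability plot itself would be drawn with whatever plotting tool is at hand.

```python
def sd_rule_ok(sds):
    """Rule of thumb from the text: largest sample sd is not more than twice the smallest."""
    return max(sds) <= 2 * min(sds)

def pooled_deviations(samples):
    """Deviations of each observation from its own sample mean, combined and sorted.

    The sorted deviations are the values that would go into a combined
    normal probability plot when individual samples are small.
    """
    devs = []
    for sample in samples:
        mean = sum(sample) / len(sample)
        devs.extend(x - mean for x in sample)
    return sorted(devs)

# Hypothetical data (illustration only)
print(sd_rule_ok([0.7, 0.9, 1.1]))
print(pooled_deviations([[1, 2, 3], [10, 20]]))
```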
Heart Attack Risk Continued . . .
Here are the summary statistics for the four groups:
[Table: summary statistics for the four groups]
State the hypotheses.
$H_0$: $\mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: at least two of the $\mu$'s are different
Verify assumptions.
The subjects were randomly selected from groups of individuals who had been diagnosed with the four conditions.
The four boxplots are approximately symmetrical with no outliers, so the assumption of normality is plausible.
To verify the equality of the standard deviations, notice that the largest sample standard deviation (group 4) is less than twice the smallest sample standard deviation (group 1).
Heart Attack Risk Continued . . .
Using the summary statistics for the four groups, calculate the sums of squares SSTr and SSE, and then calculate the F test statistic, $F = \mathrm{MSTr}/\mathrm{MSE}$.
Heart Attack Risk Continued . . .
$H_0$: $\mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_a$: at least two of the $\mu$'s are different
Test statistic: $F = \mathrm{MSTr}/\mathrm{MSE}$, with $df_1 = 3$ and $df_2 = 136$
P-value < .001
$\alpha = .05$
Since the P-value < $\alpha$, we reject $H_0$. There is convincing evidence that mean MPV is not the same for all four patient populations.
Summarizing an ANOVA
ANOVA calculations are often summarized in a tabular format called an ANOVA table. To understand such a table, we need one more sum of squares.
Total sum of squares, denoted by SSTo, is given by
$$\mathrm{SSTo} = \sum_{\text{all } N \text{ observations}} (x - \bar{\bar{x}})^2$$
with df = $N - 1$, where $\bar{\bar{x}}$ is the grand mean.
The relationship between the three sums of squares is
$$\mathrm{SSTo} = \mathrm{SSTr} + \mathrm{SSE}$$
This is the fundamental identity for single-factor ANOVA.
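The fundamental identity can be checked numerically on raw data. The sketch below uses a hypothetical function name and made-up observations, and computes each sum of squares straight from its definition.

```python
def sums_of_squares(samples):
    """Return (SSTo, SSTr, SSE) computed from raw observations."""
    all_obs = [x for s in samples for x in s]
    grand_mean = sum(all_obs) / len(all_obs)
    # Total variation: every observation against the grand mean
    ssto = sum((x - grand_mean) ** 2 for x in all_obs)
    # Between-sample variation: each sample mean against the grand mean
    sstr = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples)
    # Within-sample variation: each observation against its own sample mean
    sse = sum((x - sum(s) / len(s)) ** 2 for s in samples for x in s)
    return ssto, sstr, sse

# Hypothetical data with unequal sample sizes (illustration only)
samples = [[9.0, 10.0, 11.0], [11.0, 12.0, 13.0, 12.0], [14.0, 15.0, 13.0]]
ssto, sstr, sse = sums_of_squares(samples)
print(ssto, sstr, sse)
```

Because of the identity, only two of the three sums of squares ever need to be computed directly; the third follows by addition or subtraction.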
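The General Format for a Single-Factor ANOVA Table
Source of Variation   df      Sum of Squares   Mean Square           F
Treatments            k - 1   SSTr             MSTr = SSTr/(k - 1)   F = MSTr/MSE
Error                 N - k   SSE              MSE = SSE/(N - k)
Total                 N - 1   SSTo
When the analysis is done by statistical software, the P-value is reported in the table alongside the F statistic.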
Heart Attack Risk Continued . . .
This is the ANOVA table for this data set.
[Table: ANOVA table for the MPV data]
Now we know that at least two of the means are different, but which two? To answer the question posed in this study, we need to know whether the mean MPV for the heart attack group is the one that differs.
Tukey-Kramer (T-K) Multiple Comparison Procedure
What do we do now that we know that at least two of the population or treatment means are different? How can we tell which of the means are different? We need to use a multiple comparison procedure, which is a method of identifying differences between $\mu$'s.
• The T-K procedure is based on calculating confidence intervals for the difference between each possible pair of $\mu$'s.
• If the interval contains the value zero, then there is no significant difference between the means involved.
• If, however, the interval does NOT contain the value zero, then the two means are significantly different.
Tukey-Kramer (T-K) Multiple Comparison Procedure
When there are k populations or treatments being compared, the number of confidence intervals necessary is $k(k - 1)/2$.
For $\mu_i - \mu_j$:
$$(\bar{x}_i - \bar{x}_j) \pm q\sqrt{\frac{\mathrm{MSE}}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$
where q is the relevant Studentized range critical value.
If the sample sizes are the same ($n_1 = n_2 = \cdots = n_k = n$), we can use
$$(\bar{x}_i - \bar{x}_j) \pm q\sqrt{\frac{\mathrm{MSE}}{n}}$$
The two means are judged to differ significantly if the interval does not contain 0.
T-K intervals are based on probability distributions called Studentized range distributions.
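A short sketch of the T-K computation follows. The function name and all inputs are hypothetical. In particular, the critical value q is supplied by the caller and would have to be looked up in a Studentized range table for the chosen confidence level, k, and error df; the value used here is only a placeholder.

```python
import math
from itertools import combinations

def tukey_kramer_intervals(means, ns, mse, q):
    """Return T-K intervals for every pair (i, j) with i < j, numbered from 1.

    means, ns: sample means and sample sizes, one entry per population or treatment.
    mse: the error mean square from the ANOVA.
    q: Studentized range critical value, looked up by the caller.
    """
    intervals = {}
    for i, j in combinations(range(len(means)), 2):
        diff = means[i] - means[j]
        margin = q * math.sqrt((mse / 2) * (1 / ns[i] + 1 / ns[j]))
        intervals[(i + 1, j + 1)] = (diff - margin, diff + margin)
    return intervals

# Hypothetical inputs: k = 3 gives k(k - 1)/2 = 3 intervals
intervals = tukey_kramer_intervals(
    means=[10.0, 12.0, 14.0], ns=[5, 5, 5], mse=2.0, q=3.77  # q is a placeholder value
)
for pair, (low, high) in intervals.items():
    verdict = "no significant difference" if low <= 0 <= high else "significantly different"
    print(pair, round(low, 3), round(high, 3), verdict)
```

With equal sample sizes the margin reduces to q times the square root of MSE/n, so every interval has the same width.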
Heart Attack Risk Revisited . . .
How many confidence intervals will we need to compute?
Number of confidence intervals to compute: $k(k - 1)/2 = 4(3)/2 = 6$
Sample sizes are the same in each treatment, so the simpler equal-sample-size interval can be used. The Studentized range critical value q is the one for 95% confidence when k = 4 and df = 120 (the closest df in the table to 136).
The interval for $\mu_1 - \mu_2$ contains 0, so there is not a significant difference in mean MPV between patients with noncardiac chest pain and patients with stable angina.
Heart Attack Risk Revisited . . .
The remaining confidence intervals are calculated in the same manner. They are . . .
The only interval that does not contain 0 is the one for the difference in mean MPV between patients with noncardiac chest pain and patients with heart attacks.
Summarizing the Results of the Tukey-Kramer Procedure
1. List the sample means in increasing order, identifying the corresponding population just above each $\bar{x}$.
2. Use the T-K intervals to determine the group of means that do not differ significantly from the first in the list. Draw a horizontal line extending from the smallest mean to the last mean in the group identified.
Population:    3            2            1            4            5
Sample mean:   $\bar{x}_3$  $\bar{x}_2$  $\bar{x}_1$  $\bar{x}_4$  $\bar{x}_5$
For example, if the sample means for populations 3, 2, and 1 are not significantly different, then draw a line under them.
Summarizing the Results of the Tukey-Kramer Procedure
3. Use the T-K intervals to determine the group of means that are not significantly different from the second smallest in the list. If this entire group of means is not already underscored, draw a horizontal line extending from the second smallest mean to the last mean in the new group.
4. Continue considering the means in the order listed, adding new lines as needed.
Population:    3            2            1            4            5
Sample mean:   $\bar{x}_3$  $\bar{x}_2$  $\bar{x}_1$  $\bar{x}_4$  $\bar{x}_5$
For example, if the sample mean for population 2 is not significantly different from those for populations 1 and 4, but is different from that for population 5, then draw a line under 2, 1, and 4.
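The underscoring steps above can be sketched as a small routine. Everything here is a hypothetical illustration: the function name, the sample means, and the `differs` rule, which stands in for "the T-K interval for this pair does not contain 0". The sketch also assumes that the means not significantly different from a given mean form a contiguous run in the sorted list, which holds when all sample sizes are equal.

```python
def underscore_groups(means, differs):
    """Return (ordered labels, list of underscored groups).

    means: dict mapping population label -> sample mean.
    differs(a, b): True if populations a and b are significantly different.
    """
    order = sorted(means, key=means.get)  # step 1: increasing order of sample mean
    lines = []                            # (start, end) index pairs of drawn lines
    for start, label in enumerate(order):
        end = start
        # Extend right while the next mean is not significantly different from this one
        while end + 1 < len(order) and not differs(label, order[end + 1]):
            end += 1
        already_covered = any(s <= start and end <= e for s, e in lines)
        if end > start and not already_covered:
            lines.append((start, end))
    return order, [order[s:e + 1] for s, e in lines]

# Hypothetical sample means, and a stand-in rule for significance:
# two means "differ" when they are more than 2.5 apart.
sample_means = {1: 3.0, 2: 2.0, 3: 1.0, 4: 4.0, 5: 7.0}
order, groups = underscore_groups(
    sample_means, lambda a, b: abs(sample_means[a] - sample_means[b]) > 2.5
)
print(order)
print(groups)
```

A population that appears in no group, such as population 5 in this made-up example, differs significantly from every other population.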
Heart Attack Risk Revisited . . .
Let’s summarize these T-K intervals.
Should mean MPV be used as a predictor of heart attacks?
Based on these data, we have evidence that the mean MPV is not the same for the noncardiac chest pain group and the heart attack group. But since the difference in means is small compared to the variability among the individuals in each group, it would still be difficult to distinguish the two groups based on an individual MPV value. And we don’t have evidence that the mean is different for the heart attack group and the two angina groups. So, MPV is probably not useful as a predictor of heart attack.