Introduction to the Analysis of Variance (ANOVA)
Single-Factor Independent-Measures Design • A diagram of a single-factor, independent-measures research design is shown in the figure. • Notice that a separate sample is taken for each of the three treatment conditions. • Also notice that the three samples have different scores and different means.
The goal of ANOVA is to help the researcher decide between the following two interpretations: • 1. There really are no differences between the populations (or treatments). The observed differences between samples are simply due to chance (sampling error). • 2. The differences between the sample means represent real differences between the populations. That is, the populations (or treatments) really do have different means, and the sample data accurately reflect these differences. • These two interpretations correspond to the two hypotheses (null hypothesis and alternative hypothesis) that are part of the general hypothesis testing procedure.
An Introductory Example… • Suppose that a business consultant is examining employee performance under three temperature conditions: 50°, 70°, and 90°. • Three samples of subjects are selected, one sample for each treatment condition. • We want to decide between two hypotheses: • the null hypothesis ($H_0$), which says that temperature has no effect, and • the alternative hypothesis ($H_1$), which states that temperature does affect productivity.
In symbols, the null hypothesis states • $H_0: \mu_1 = \mu_2 = \mu_3$ • For the alternative hypothesis, we may state that • $H_1$: At least one population mean is different from the others. • Other alternatives might be • $H_1: \mu_1 \neq \mu_2 \neq \mu_3$ • $H_1: \mu_1 = \mu_3$, but $\mu_2$ is different
The Test Statistic for the ANOVA • For the t statistic, we computed a ratio with the following structure: • $t = \dfrac{\text{obtained difference between sample means}}{\text{difference expected by chance (error)}}$ • For analysis of variance, the test statistic is called an F-ratio and has the following structure: • $F = \dfrac{\text{variance (differences) between sample means}}{\text{variance (differences) expected by chance (error)}}$
The reason for this change is that ANOVA is used in situations where there are more than two sample means, so a single sample mean difference cannot describe how much all of the means differ from one another. • The solution to this problem is to use variance to define and measure the size of the differences among the sample means.
Consider the following two sets of sample means: • Set 1: 20, 30, 35 • Set 2: 28, 30, 31 • If you compute the variance for the three numbers in each set, the variance you obtain for set 1 is $s^2 = 58.33$ and the variance for set 2 is $s^2 = 2.33$. • The larger variance for set 1 reflects the larger differences among its means.
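The two variances quoted above can be checked with a few lines of Python. This is only a sketch: it assumes the slide uses the sample variance (dividing by n − 1), which is what reproduces the quoted values, and the variable names are illustrative.

```python
from statistics import variance  # sample variance: divides by n - 1

# The two sets of sample means from the slide.
set_1 = [20, 30, 35]
set_2 = [28, 30, 31]

var_1 = variance(set_1)
var_2 = variance(set_2)

# Widely spread means give a large variance; tightly clustered means give a small one.
print(f"Set 1: s^2 = {var_1:.2f}")
print(f"Set 2: s^2 = {var_2:.2f}")
```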
The Logic of the Analysis of Variance • The first step is to determine the total variability for the entire set of data. • To compute the total variability, we will combine all the scores from all the separate samples to obtain one general measure of variability for the complete experiment. • Once we have measured the total variability, we can begin to break it apart into separate components.
Between-Treatments Variance. Much of the variability in the scores is due to general differences between treatment conditions.
Within-Treatment Variance. In addition to the general differences between treatment conditions, there is variability within each sample.
Heart of the ANOVA • So, analyzing the total variability into these two components (between- & within-treatments) is the heart of analysis of variance!
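The partitioning described above can be sketched numerically. The slides do not say how variability is measured at this point, so the example assumes the usual choice of sums of squared deviations (SS). The scores are hypothetical, invented for illustration, and are not the data from the figure.

```python
# Hypothetical scores for three treatment conditions (assumed, not from the slides).
samples = {
    "50 degrees": [0, 1, 3, 1, 0],
    "70 degrees": [4, 3, 6, 3, 4],
    "90 degrees": [1, 2, 2, 0, 0],
}

def mean(scores):
    return sum(scores) / len(scores)

def sum_of_squares(scores, center):
    """Sum of squared deviations of the scores from a given center."""
    return sum((x - center) ** 2 for x in scores)

# Total variability: pool every score and measure spread around the grand mean.
all_scores = [x for group in samples.values() for x in group]
grand_mean = mean(all_scores)
ss_total = sum_of_squares(all_scores, grand_mean)

# Within-treatments variability: spread of each sample around its own mean.
ss_within = sum(sum_of_squares(g, mean(g)) for g in samples.values())

# Between-treatments variability: spread of the sample means around the grand mean,
# weighted by the number of scores in each sample.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in samples.values())

print(f"SS total   = {ss_total}")
print(f"SS between = {ss_between}")
print(f"SS within  = {ss_within}")
```

For any data set, the between-treatments and within-treatments components add up to the total, which is what makes the partition possible.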
So… The Purpose of the ANOVA is… • Beyond simply measuring the differences between treatments, the overall goal of ANOVA is to evaluate those differences. • Specifically, the purpose of the analysis is to distinguish between two alternative explanations: • 1. The differences between treatments have been caused by treatment effects. • 2. The differences between treatments are simply due to chance.
Thus, there are always two possible explanations for the difference (or variance) that exists between treatments: • Treatment Effect. The differences are caused by the treatments. For the data in the table, the scores in sample 1 were obtained in a 50° room and the scores in sample 2 were obtained in a 70° room. It is possible that the difference between samples is caused by the different temperatures. • Chance. The differences are simply due to chance. If there is no treatment effect at all, you would still expect some differences between samples.
Between-Treatments Variability • Treatment Effect. It is possible that the different treatments have caused the samples to be different. • Individual Differences. Subjects enter the study with different backgrounds, abilities, and attitudes; that is, they’re unique. • Experimental Error. Whenever you make a measurement, there’s a chance of error.
Within-Treatments Variability • Individual Differences. The scores are obtained from different individuals, which could explain why they are variable. • Experimental Error. There always is a chance that the differences are caused by experimental error.
The F-Ratio: The Test Statistic for ANOVA • $F = \dfrac{\text{variance between treatments}}{\text{variance within treatments}}$ • $F = \dfrac{\text{treatment effect} + \text{differences due to chance}}{\text{differences due to chance}}$ • $F = \dfrac{\text{treatment effect} + \text{Ind. Diff.} + \text{Exp. Error}}{\text{Ind. Diff.} + \text{Exp. Error}}$
If $H_0$ is true, there is no treatment effect. In this case, the numerator and denominator of the F-ratio are measuring the same variance, so the ratio is expected to be near 1: • $F = \dfrac{0 + \text{Ind. Diff.} + \text{Exp. Error}}{\text{Ind. Diff.} + \text{Exp. Error}}$
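If $H_0$ is false, then a treatment effect does exist, the numerator is expected to be larger than the denominator, and the F-ratio becomes: • $F = \dfrac{\text{treatment effect} + \text{Ind. Diff.} + \text{Exp. Error}}{\text{Ind. Diff.} + \text{Exp. Error}}$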
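As a rough illustration of the F-ratio, the sketch below computes it for a small made-up data set. Several things here are assumptions not stated on the slides: each variance is computed as a sum of squares divided by its degrees of freedom (k − 1 between treatments, N − k within treatments), which is the standard one-way ANOVA computation; the function name `f_ratio` is illustrative; and the scores are hypothetical.

```python
def mean(scores):
    return sum(scores) / len(scores)

def f_ratio(groups):
    """F = variance between treatments / variance within treatments.

    Assumes the standard one-way, independent-measures computation:
    each variance is a sum of squares divided by its degrees of freedom.
    """
    k = len(groups)                          # number of treatment conditions
    n_total = sum(len(g) for g in groups)    # total number of scores
    grand_mean = mean([x for g in groups for x in g])

    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)

    ms_between = ss_between / (k - 1)        # variance between treatments
    ms_within = ss_within / (n_total - k)    # variance within treatments
    return ms_between / ms_within

# Hypothetical scores for three treatment conditions (assumed, not from the slides).
groups = [
    [0, 1, 3, 1, 0],
    [4, 3, 6, 3, 4],
    [1, 2, 2, 0, 0],
]

f_value = f_ratio(groups)
print(f"F = {f_value:.2f}")
```

An F-ratio well above 1 suggests that the numerator contains something beyond chance differences. Deciding whether it is large enough to reject the null hypothesis requires comparing it with a critical value from the F distribution, which this sketch does not do.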
An Example… • A real estate developer is considering investing in a shopping mall on the outskirts of Atlanta, Georgia. Three parcels of land are being evaluated. Of particular interest is the income in the area surrounding the proposed mall. On the next slide (and handout) is a set of sample results. Do the three areas have generally the same incomes or do they differ?