Chapter 15 Analysis of Variance
Analysis of Variance… • Analysis of variance is a technique that allows us to compare two or more populations of interval data. • Analysis of variance is: • an extremely powerful and widely used procedure. • a procedure which determines whether differences exist between population means. • a procedure which works by analyzing sample variance.
One-Way Analysis of Variance… • Independent samples are drawn from k populations: • Note: These populations are referred to as treatments. • It is not a requirement that n1 = n2 = … = nk.
One-Way Analysis of Variance… • New Terminology: • x is the response variable, and its values are responses. • $x_{ij}$ refers to the ith observation in the jth sample. • E.g. $x_{35}$ is the third observation of the fifth sample. • The grand mean, $\bar{\bar{x}}$, is the mean of all the observations, i.e.: $\bar{\bar{x}} = \frac{1}{n}\sum_{j=1}^{k}\sum_{i=1}^{n_j} x_{ij}$ • ($n = n_1 + n_2 + \dots + n_k$)
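The indexing convention and the grand mean can be sketched in a few lines of Python. The data values and the helper names `observation` and `grand_mean` are hypothetical, chosen only for illustration; the samples are deliberately of unequal size to show that the grand mean pools every observation rather than averaging the sample means.

```python
# Hypothetical data: k = 3 samples of unequal size (illustrative values only).
samples = [
    [10.0, 12.0, 14.0],          # sample 1, n1 = 3
    [20.0, 22.0],                # sample 2, n2 = 2
    [30.0, 31.0, 32.0, 33.0],    # sample 3, n3 = 4
]


def observation(samples, i, j):
    """Return x_ij, the i-th observation in the j-th sample (1-based indices)."""
    return samples[j - 1][i - 1]


def grand_mean(samples):
    """Return the mean of all observations pooled across the k samples."""
    n = sum(len(s) for s in samples)        # n = n1 + n2 + ... + nk
    total = sum(sum(s) for s in samples)    # sum of every x_ij
    return total / n


x_23 = observation(samples, 2, 3)           # second observation of the third sample
x_bar_bar = grand_mean(samples)
# With unequal sample sizes this generally differs from the grand mean.
mean_of_sample_means = sum(sum(s) / len(s) for s in samples) / len(samples)

print(x_23)
print(x_bar_bar)
print(mean_of_sample_means)
```

When the sample sizes are all equal, the two averages coincide; otherwise larger samples carry more weight in the grand mean.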
One Way Analysis of Variance… • More New Terminology: • The unit that we measure is the experimental unit. • Population classification criterion is called a factor. • Each population is a factor level.
Example 15.1… • An apple juice company has a new product featuring… • more convenience, • similar or better quality, and • lower price • when compared with existing juice products. • Which factor should an advertising campaign focus on? • Before going national, test markets are set up in three cities, each with its own campaign, and data are recorded… • Do differences in sales exist between the test markets?
Example 15.1… Terminology • x is the response variable, and its values are responses. • weekly sales is the response variable; • the actual sales figures are the responses in this example. • $x_{ij}$ refers to the ith observation in the jth sample. • E.g. $x_{42}$ is the fourth week’s sales in city #2: 717 pkgs. • $x_{20,3}$ is the last week of sales for city #3: 532 pkgs. (The comma in the subscript is added for clarity.)
Example 15.1… Terminology • The unit that we measure is the experimental unit. • Weeks in the three cities when we recorded sales. • Population classification criterion is called a factor. • The advertising strategy is the factor we’re interested in. This is the only factor under consideration (hence the term “one way” analysis of variance). • Each population is a factor level. • In this example, there are three factor levels: convenience, quality, and price.
Example 15.1… IDENTIFY • The null hypothesis in this case is: • H0: $\mu_1 = \mu_2 = \mu_3$ • i.e. there are no differences between population means. • Our alternative hypothesis becomes: • H1: at least two means differ • OK. Now we need some test statistics…
Test Statistics… • Since $\mu_1 = \mu_2 = \mu_3$ is of interest to us, a statistic that measures the proximity of the sample means to each other would also be of interest. • Such a statistic exists, and is called the between-treatments variation. It is denoted SST, short for “sum of squares for treatments”. It is calculated as: $SST = \sum_{j=1}^{k} n_j(\bar{x}_j - \bar{\bar{x}})^2$, where the sum runs across the k treatments, $\bar{x}_j$ is the mean of the jth sample, and $\bar{\bar{x}}$ is the grand mean. • A large SST indicates large variation between sample means, which supports H1.
Test Statistics… • SST gave us the between-treatments variation. A second statistic, SSE (Sum of Squares for Error), measures the within-treatments variation. • SSE is given by: $SSE = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_j)^2$ or: $SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \dots + (n_k - 1)s_k^2$ • In the second formulation, it is easier to see that it provides a measure of the amount of variation we can expect from the random variable we’ve observed.
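A minimal sketch of the between-treatments and within-treatments sums of squares, using only the Python standard library. The data values and function names are hypothetical. SSE is computed two ways, directly from squared deviations and from the sample variances, to show that the two formulations agree.

```python
from statistics import mean, variance

# Hypothetical data: k = 3 treatments with unequal sample sizes.
samples = [
    [4.0, 6.0, 8.0],
    [7.0, 9.0, 11.0, 13.0],
    [1.0, 3.0, 5.0],
]


def sum_of_squares_treatments(samples):
    """SST: squared distance of each sample mean from the grand mean, weighted by n_j."""
    all_obs = [x for s in samples for x in s]
    gm = mean(all_obs)
    return sum(len(s) * (mean(s) - gm) ** 2 for s in samples)


def sum_of_squares_error(samples):
    """SSE: squared distance of each observation from its own sample mean."""
    return sum((x - mean(s)) ** 2 for s in samples for x in s)


def sum_of_squares_error_from_variances(samples):
    """SSE written as (n_1 - 1) s_1^2 + ... + (n_k - 1) s_k^2."""
    return sum((len(s) - 1) * variance(s) for s in samples)


sst = sum_of_squares_treatments(samples)
sse = sum_of_squares_error(samples)
sse_alt = sum_of_squares_error_from_variances(samples)

print(sst, sse, sse_alt)
```

Here `statistics.variance` is the sample variance (divisor n − 1), which is why multiplying it by n_j − 1 recovers the sum of squared deviations for each sample.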
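Example 15.1… COMPUTE • Since: $SST = \sum_{j=1}^{k} n_j(\bar{x}_j - \bar{\bar{x}})^2$ • If it were the case that: $\bar{x}_1 = \bar{x}_2 = \bar{x}_3$ • then every sample mean would equal the grand mean, so SST = 0 and our null hypothesis, H0: $\mu_1 = \mu_2 = \mu_3$ • would be supported. • More generally, a “small value” of SST supports the null hypothesis. The question is, how small is “small enough”?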
Example 15.1… COMPUTE • The following sample statistics and grand mean were computed… • Hence, the between-treatments variation, sum of squares for treatments, is: • Is SST = 57,512.23 “large enough” to indicate the population means differ?
Example 15.1… COMPUTE • We calculate the sample variances as: • and from these, calculate the within-treatments variation (sum of squares for error) as: • We still need a couple more quantities in order to relate SST and SSE together in a meaningful way…
Mean Squares… • The mean square for treatments (MST) is given by: $MST = \frac{SST}{k-1}$ • The mean square for errors (MSE) is given by: $MSE = \frac{SSE}{n-k}$ • And the test statistic: $F = \frac{MST}{MSE}$ • is F-distributed with k–1 and n–k degrees of freedom. • Aha! We must be close…
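The chain from sums of squares to mean squares to the F-statistic can be sketched as follows. The function name `one_way_anova` and the data are hypothetical, and the code covers only the arithmetic; it does not look up a critical value or p-value.

```python
from statistics import mean

# Hypothetical data: k = 3 treatments, n = 10 observations in total.
samples = [
    [4.0, 6.0, 8.0],
    [7.0, 9.0, 11.0, 13.0],
    [1.0, 3.0, 5.0],
]


def one_way_anova(samples):
    """Return SST, SSE, MST, MSE, F and the degrees of freedom for a one-way layout."""
    k = len(samples)                               # number of treatments
    n = sum(len(s) for s in samples)               # total number of observations
    gm = sum(sum(s) for s in samples) / n          # grand mean
    sst = sum(len(s) * (mean(s) - gm) ** 2 for s in samples)
    sse = sum((x - mean(s)) ** 2 for s in samples for x in s)
    mst = sst / (k - 1)                            # mean square for treatments
    mse = sse / (n - k)                            # mean square for error
    return {
        "SST": sst, "SSE": sse,
        "MST": mst, "MSE": mse,
        "F": mst / mse,
        "df": (k - 1, n - k),
    }


result = one_way_anova(samples)
print(result)
```

If SciPy happens to be installed, `scipy.stats.f_oneway(*samples)` should report the same F-statistic together with its p-value.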
Example 15.1… COMPUTE • We can calculate the mean square for treatments and mean square for error quantities as: • Giving us our F-statistic of: • Does F = 3.23 fall into a rejection region or not? How does it compare to a critical value of F? • Note these required conditions: • 1. The populations tested are normally distributed. • 2. The variances of all the populations tested are equal.
Example 15.1… INTERPRET • The purpose of calculating the F-statistic is to determine whether the value of SST is large enough to reject the null hypothesis: if SST is large, F will be large. • Hence our rejection region is: $F > F_{\alpha,\,k-1,\,n-k}$ • With α = .05, k = 3 cities, and n = 60 weekly observations (20 per city), our value for FCritical is: $F_{.05,\,2,\,57} \approx 3.15$
Example 15.1… INTERPRET • Since F = 3.23 is greater than FCritical = 3.15, we reject the null hypothesis (H0: $\mu_1 = \mu_2 = \mu_3$) in favor of the alternative hypothesis (H1: at least two population means differ). • That is: there is enough evidence to infer that the mean weekly sales differ between the three cities. • Stated another way: we are quite confident that the strategy used to advertise the product will produce different sales figures.
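The decision rule can be written out with the figures quoted on the slides for Example 15.1 (SST = 57,512.23, F = 3.23, FCritical = 3.15, and k = 3 test markets). This is only a sketch of the comparison; the variable names are hypothetical, and the critical value is taken from the slides rather than computed.

```python
# Figures quoted in Example 15.1.
k = 3                 # three test markets (treatments)
sst = 57512.23        # between-treatments variation
f_stat = 3.23         # reported F-statistic
f_critical = 3.15     # reported critical value

# Mean square for treatments follows directly from SST and k.
mst = sst / (k - 1)

# Reject H0 when the F-statistic falls in the right-tail rejection region.
reject_h0 = f_stat > f_critical

print(f"MST = {mst:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Only large values of F count against H0, which is why the rejection region is one-sided.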
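ANOVA Table… • The results of analysis of variance are usually reported in an ANOVA table: • Treatments: degrees of freedom k–1, sum of squares SST, mean square MST = SST/(k–1), F-stat = MST/MSE • Error: degrees of freedom n–k, sum of squares SSE, mean square MSE = SSE/(n–k) • Total: degrees of freedom n–1, sum of squares SS(Total)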
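Example 15.1… COMPUTE • Using Excel: • Tools, Data Analysis…, Anova: Single Factor • We produce the following output… • In the output, the “Between Groups” sum of squares is SST and the “Within Groups” sum of squares is SSE; compare the p-value with 0.05.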
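Identifying Factors… • Factors that Identify the One-Way Analysis of Variance: • Problem objective: compare two or more populations • Data type: interval • Experimental design: independent samples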
Analysis of Variance Experimental Designs • Experimental design is one of the factors that determines which technique we use. • In the previous example we compared three populations on the basis of one factor – advertising strategy. • One-way analysis of variance is only one of many different experimental designs of the analysis of variance.
Analysis of Variance Experimental Designs • A multifactor experiment is one where there are two or more factors that define the treatments. • For example, if instead of just varying the advertising strategy for our new apple juice product we also varied the advertising medium (e.g. television or newspaper), then we have a two-factor analysis of variance situation. • The first factor, advertising strategy, still has three levels (convenience, quality, and price) while the second factor, advertising medium, has two levels (TV or print).
Independent Samples and Blocks • Similar to the ‘matched pairs experiment’, a randomized block design experiment reduces the variation within the samples, making it easier to detect differences between populations. • The term block refers to a matched group of observations from each population. • We can also perform a blocked experiment by using the same subject for each treatment in a “repeated measures” experiment.
Independent Samples and Blocks • The randomized block experiment is also called the two-way analysis of variance, not to be confused with the two-factor analysis of variance. • To illustrate where we’re headed: we’ll cover the randomized block design first.
Randomized Block Analysis of Variance • The purpose of designing a randomized block experiment is to reduce the within-treatments variation to more easily detect differences between the treatment means. • In this design, we partition the total variation into three sources of variation: • SS(Total) = SST + SSB + SSE • where SSB, the sum of squares for blocks, measures the variation between the blocks.
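Randomized Blocks… • In addition to k treatments, we introduce notation for b blocks in our experimental design: • $\bar{x}[T]_1$ is the mean of the observations of the 1st treatment • $\bar{x}[T]_2$ is the mean of the observations of the 2nd treatment • and so on up to $\bar{x}[T]_k$; likewise $\bar{x}[B]_i$ is the mean of the observations in the ith block (i = 1, 2, …, b).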
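Sum of Squares: Randomized Block… • Squaring the ‘distance’ from the grand mean leads to the following set of formulae, where $\bar{x}[T]_j$ is the mean of treatment j and $\bar{x}[B]_i$ is the mean of block i: • $SS(Total) = \sum_{j=1}^{k}\sum_{i=1}^{b}(x_{ij} - \bar{\bar{x}})^2$ • $SST = \sum_{j=1}^{k} b\,(\bar{x}[T]_j - \bar{\bar{x}})^2$ • $SSB = \sum_{i=1}^{b} k\,(\bar{x}[B]_i - \bar{\bar{x}})^2$ • $SSE = \sum_{j=1}^{k}\sum_{i=1}^{b}(x_{ij} - \bar{x}[T]_j - \bar{x}[B]_i + \bar{\bar{x}})^2$ • Test statistic for treatments: $F = MST/MSE$ • Test statistic for blocks: $F = MSB/MSE$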
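A small sketch of the randomized block partition SS(Total) = SST + SSB + SSE. The data are hypothetical (b = 4 blocks as rows, k = 3 treatments as columns), and the degrees of freedom used for the mean squares (k − 1, b − 1, and n − k − b + 1) are the standard ones for this design rather than something stated in the text above.

```python
from statistics import mean

# Hypothetical data: rows are blocks, columns are treatments.
data = [
    [10.0, 12.0, 14.0],
    [8.0, 11.0, 11.0],
    [13.0, 15.0, 20.0],
    [9.0, 10.0, 11.0],
]

b = len(data)          # number of blocks
k = len(data[0])       # number of treatments
n = b * k              # one observation per block/treatment combination

gm = mean([x for row in data for x in row])                      # grand mean
treat_means = [mean([data[i][j] for i in range(b)]) for j in range(k)]
block_means = [mean(row) for row in data]

ss_total = sum((x - gm) ** 2 for row in data for x in row)
sst = b * sum((m - gm) ** 2 for m in treat_means)                # treatments
ssb = k * sum((m - gm) ** 2 for m in block_means)                # blocks
sse = sum(
    (data[i][j] - treat_means[j] - block_means[i] + gm) ** 2
    for i in range(b)
    for j in range(k)
)

mst = sst / (k - 1)
msb = ssb / (b - 1)
mse = sse / (n - k - b + 1)

f_treatments = mst / mse
f_blocks = msb / mse

print(ss_total, sst, ssb, sse)
print(f_treatments, f_blocks)
```

Ignoring the blocks and running a one-way analysis on the same numbers would fold SSB into the error term, which is exactly the within-treatments variation that blocking is meant to remove.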
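ANOVA Table… • We can summarize this new information in an analysis of variance (ANOVA) table for the randomized block analysis of variance as follows: • Treatments: degrees of freedom k–1, sum of squares SST, mean square MST = SST/(k–1), F = MST/MSE • Blocks: degrees of freedom b–1, sum of squares SSB, mean square MSB = SSB/(b–1), F = MSB/MSE • Error: degrees of freedom n–k–b+1, sum of squares SSE, mean square MSE = SSE/(n–k–b+1) • Total: degrees of freedom n–1, sum of squares SS(Total)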
Example 15.2… IDENTIFY • Are there differences in the effectiveness of four new cholesterol drugs? 25 groups of men were matched according to age & weight, and the results were recorded. • The hypotheses to test in this case are: • H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4$ • H1: At least two means differ
Example 15.2… IDENTIFY • Each of the four drugs can be considered a treatment. • Each group of men can be treated as a block, because its members are matched by age and weight. • By setting up the experiment this way, we eliminate the variability in cholesterol reduction related to different combinations of age and weight. This helps detect differences in the mean cholesterol reduction attributed to the different drugs.
Example 15.2… The Data • [Data table: one row per block (group of men), one column per treatment (drug).] • There are b = 25 blocks and k = 4 treatments in this example.
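Example 15.2… COMPUTE • We obtain the output from Excel… • Tools > Data Analysis > Anova: Two Factor Without Replication (a.k.a. Randomized Block) • In the output, Rows are the blocks (b–1 degrees of freedom, mean square MSB) and Columns are the treatments (k–1 degrees of freedom, mean square MST).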
Example 15.2… INTERPRET • The F-statistic to determine whether differences exist between the four drugs (treatments; columns) is 4.12, which is greater than FCritical ≈ 2.73 (α = .05 with 3 and 72 degrees of freedom). Its p-value is .0094. • Thus we reject H0 in favor of the research hypothesis: at least two means differ (i.e. there are differences between the treatments).
Example 15.2… INTERPRET • The other F-statistic, 10.11 (p-value ≈ 0; also greater than its FCritical = 1.67), indicates that there are differences between the groups of men (blocks; rows). That is: age & weight have an impact, but our experimental design accounts for that.
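Identifying Factors… • Factors that Identify the Randomized Block Analysis of Variance: • Problem objective: compare two or more populations • Data type: interval • Experimental design: blocked samples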
Two-Factor Analysis of Variance… • The original set-up for Example 15.1 examined one factor, namely the effects of the marketing strategy on sales. • Emphasis on convenience, • Emphasis on quality, or • Emphasis on price. • Suppose we introduce a second factor, that being the effects of the selected media on sales, that is: • Advertise on television, or • Advertise in newspapers. • To which factor(s) or the interaction of factors can we attribute any differences in mean sales of apple juice?
More Terminology… • A complete factorial experiment is an experiment in which the data for all possible combinations of the levels of the factors are gathered. This is also known as a two-way classification. • The two factors are usually labeled A & B, with the number of levels of each factor denoted by a & b respectively. • The number of observations for each combination is the number of replicates, denoted by r. For our purposes, the number of replicates will be the same for each treatment; that is, the design is balanced.
Example 15.3… The Data • Factor “A”: Strategy • Factor “B”: Medium • There are a = 3 levels of factor A and b = 2 levels of factor B, yielding 3 x 2 = 6 treatments (combinations), each with r = 10 observations…
Possible Outcomes… Fig. 15.5 • This figure illustrates the case where there are differences between levels of A, but no difference between the levels of B and no interaction between A & B. • [Figure: mean response plotted against levels 1, 2, 3 of factor A, with a single line for levels 1 and 2 of factor B.]
Possible Outcomes… Fig. 15.6 • This figure illustrates the case where there are differences between levels of B, but no differences between the levels of A and no interaction between A & B. • [Figure: mean response plotted against levels 1, 2, 3 of factor A, with one line for level 1 and one for level 2 of factor B.]
Possible Outcomes… Fig. 15.4 • This figure illustrates the case where there are differences between levels of A, and there are differences between the levels of B, but no interaction between A & B (i.e. the factors affect sales independently). • [Figure: mean response plotted against levels 1, 2, 3 of factor A, with one line for level 1 and one for level 2 of factor B.]
Possible Outcomes… Fig. 15.7 • This figure shows the levels of A & B interacting. • [Figure: mean response plotted against levels 1, 2, 3 of factor A, with one line for level 1 and one for level 2 of factor B.]
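The four outcome patterns can be told apart from a table of cell means: with no interaction, the gap between the levels of B is the same at every level of A, so the plotted lines are parallel. The sketch below encodes that check. The function name and the cell means are hypothetical, and this describes the pattern in the means only; it is not the F-test for interaction.

```python
def has_interaction(cell_means, tol=1e-9):
    """Return True if the effect of B changes across the levels of A.

    cell_means[i][j] is the mean response at level i of factor A and
    level j of factor B.
    """
    # Profile of each A level: every cell mean relative to that row's first B level.
    profiles = [[m - row[0] for m in row] for row in cell_means]
    reference = profiles[0]
    return any(
        abs(p - q) > tol
        for profile in profiles[1:]
        for p, q in zip(profile, reference)
    )


# Hypothetical cell means (rows: levels 1-3 of A; columns: levels 1-2 of B).
a_only = [[10.0, 10.0], [20.0, 20.0], [30.0, 30.0]]    # A differs, B does not
b_only = [[10.0, 15.0], [10.0, 15.0], [10.0, 15.0]]    # B differs, A does not
a_and_b = [[10.0, 15.0], [20.0, 25.0], [30.0, 35.0]]   # both differ, parallel lines
interacting = [[10.0, 15.0], [20.0, 20.0], [30.0, 22.0]]

for name, table in [("A only", a_only), ("B only", b_only),
                    ("A and B", a_and_b), ("interaction", interacting)]:
    print(name, has_interaction(table))
```

With sample data the lines are never exactly parallel, which is why a formal test based on MS(AB) and MSE is needed.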
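ANOVA Table… Table 15.8 • Factor A: degrees of freedom a–1, sum of squares SS(A), mean square MS(A) = SS(A)/(a–1), F = MS(A)/MSE • Factor B: degrees of freedom b–1, sum of squares SS(B), mean square MS(B) = SS(B)/(b–1), F = MS(B)/MSE • Interaction: degrees of freedom (a–1)(b–1), sum of squares SS(AB), mean square MS(AB) = SS(AB)/[(a–1)(b–1)], F = MS(AB)/MSE • Error: degrees of freedom n–ab, sum of squares SSE, mean square MSE = SSE/(n–ab) • Total: degrees of freedom n–1, sum of squares SS(Total)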
Two Factor ANOVA… • Test for the differences between the Levels of Factor A… • H0: The means of the a levels of Factor A are equal • H1: At least two means differ • Test statistic: F = MS(A) / MSE • Example 15.3: Are there differences in the mean sales caused by different marketing strategies? • H0: $\mu_{\text{convenience}} = \mu_{\text{quality}} = \mu_{\text{price}}$ • H1: At least two means differ
Two Factor ANOVA… • Test for the differences between the Levels of Factor B… • H0: The means of the b levels of Factor B are equal • H1: At least two means differ • Test statistic: F = MS(B) / MSE • Example 15.3: Are there differences in the mean sales caused by different advertising media? • H0: $\mu_{\text{television}} = \mu_{\text{newspaper}}$ • H1: The two means differ
Two Factor ANOVA… • Test for interaction between Factors A and B… • H0: Factors A and B do not interact to affect the mean responses. • H1: Factors A and B do interact to affect the mean responses. • Test statistic: F = MS(AB) / MSE • Example 15.3: Are there differences in the mean sales caused by interaction between marketing strategy and advertising medium? • H0: Marketing strategy and advertising medium do not interact to affect mean sales. • H1: Marketing strategy and advertising medium do interact to affect mean sales.
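The three F-statistics can be sketched for a small balanced design. The data are hypothetical (a = 3, b = 2, r = 2), the function name is an assumption, and the sums-of-squares formulas and degrees of freedom are the standard ones for a balanced two-factor layout rather than formulas quoted in the text above.

```python
from statistics import mean

# Hypothetical balanced data: data[i][j] holds the r observations for
# level i of factor A and level j of factor B.
data = [
    [[4.0, 6.0], [8.0, 10.0]],
    [[5.0, 7.0], [9.0, 11.0]],
    [[10.0, 12.0], [6.0, 8.0]],
]


def two_factor_anova(data):
    """Return sums of squares and F-statistics for a balanced two-factor design."""
    a, b, r = len(data), len(data[0]), len(data[0][0])
    n = a * b * r
    cell = [[mean(data[i][j]) for j in range(b)] for i in range(a)]
    # Averaging cell means is valid here because every cell has r observations.
    a_means = [mean(cell[i]) for i in range(a)]
    b_means = [mean([cell[i][j] for i in range(a)]) for j in range(b)]
    gm = mean(a_means)

    ss_a = r * b * sum((m - gm) ** 2 for m in a_means)
    ss_b = r * a * sum((m - gm) ** 2 for m in b_means)
    ss_ab = r * sum(
        (cell[i][j] - a_means[i] - b_means[j] + gm) ** 2
        for i in range(a) for j in range(b)
    )
    sse = sum(
        (x - cell[i][j]) ** 2
        for i in range(a) for j in range(b) for x in data[i][j]
    )

    mse = sse / (n - a * b)
    return {
        "SS(A)": ss_a, "SS(B)": ss_b, "SS(AB)": ss_ab, "SSE": sse,
        "F_A": (ss_a / (a - 1)) / mse,
        "F_B": (ss_b / (b - 1)) / mse,
        "F_AB": (ss_ab / ((a - 1) * (b - 1))) / mse,
    }


result = two_factor_anova(data)
print(result)
```

All three tests share the same denominator, MSE, so each F-statistic measures its source of variation against the same within-treatment noise.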
Example 15.3… COMPUTE • Using the data, we use Tools > Data Analysis… > Anova: Two-Factor With Replication and get an ANOVA table with four sources of variation: • Factor B (Media) • Factor A (Mktg Strategy) • Interaction of A & B • Error