1. Analysis of Variance Chapter 12
2. Introduction Analysis of variance compares two or more populations of interval data.
Specifically, we are interested in determining whether differences exist between the population means.
The procedure works by analyzing the sample variance.
3. 12.1 One Way Analysis of Variance The analysis of variance is a procedure that tests whether differences exist between two or more population means.
To do this, the technique analyzes the sample variances.
4. One Way Analysis of Variance: Example A magazine publisher wants to compare three different styles of covers for a magazine that will be offered for sale at supermarket checkout lines. She assigns 60 stores at random to the three styles of covers and records the number of magazines that are sold in a one-week period.
5. One Way Analysis of Variance: Example How do five bookstores in the same city differ in the demographics of their customers? A market researcher asks 50 customers of each store to respond to a questionnaire. One variable of interest is the customer’s age.
6. Idea Behind ANOVA
8. Idea behind ANOVA: recall the two-sample t-statistic Difference between 2 means, pooled variances, sample sizes both equal to n
Numerator of t²: measures variation between the groups in terms of the difference between their sample means
Denominator: measures variation within groups by the pooled estimator of the common variance.
If the within-group variation is small, the same variation between groups produces a larger statistic and a more significant result.
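The slide's formula did not survive in this transcript. A sketch of the statistic it describes, assuming the usual symbols (sample means \bar{x}_1 and \bar{x}_2, pooled standard deviation s_p, common sample size n), would be:

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\tfrac{1}{n} + \tfrac{1}{n}}}
\qquad\Longrightarrow\qquad
t^2 = \frac{\tfrac{n}{2}\,(\bar{x}_1 - \bar{x}_2)^2}{s_p^2}
```

Written this way, the numerator carries the between-group variation and the denominator carries the within-group variation, which is the ratio ANOVA generalizes to more than two groups.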
9. Example 12.1
An apple juice manufacturer is planning to develop a new product, a liquid concentrate.
The marketing manager has to decide how to market the new product.
Three strategies are considered
Emphasize convenience of using the product.
Emphasize the quality of the product.
Emphasize the product’s low price.
10. Example 12.1 - continued
An experiment was conducted as follows:
In three cities an advertisement campaign was launched.
In each city only one of the three characteristics (convenience, quality, and price) was emphasized.
The weekly sales were recorded for twenty weeks following the beginning of the campaigns.
11. One Way Analysis of Variance
12. Solution
The data are interval.
The problem objective is to compare sales in three cities.
We hypothesize that the three population means are equal.
13. Defining the Hypotheses
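The hypotheses themselves are missing from this transcript. Based on the statements elsewhere in the deck (the three population means are equal under the null; at least some means differ under the alternative), they can plausibly be written as follows, where \mu_1, \mu_2, \mu_3 are assumed labels for the mean weekly sales under the convenience, quality, and price strategies:

```latex
H_0:\ \mu_1 = \mu_2 = \mu_3
\qquad\qquad
H_1:\ \text{at least two of the means differ}
```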
14. Notation
15. Terminology In the context of this problem…
Response variable – weekly sales
Responses – actual sale values
Experimental unit – weeks in the three cities when we record sales figures.
Factor – the criterion by which we classify the populations (the treatments). In this problem the factor is the marketing strategy.
Factor levels – the population (treatment) names. In this problem the factor levels are the 3 marketing strategies: 1) convenience, 2) quality, 3) price
16. Two types of variability are employed when testing for the equality of the population means
17. The rationale behind the test statistic – I If the null hypothesis is true, we would expect all the sample means to be close to one another (and as a result, close to the grand mean).
If the alternative hypothesis is true, at least some of the sample means would differ.
Thus, we measure variability between sample means.
18. Variability between sample means
19. Sum of squares for treatment groups (SSG)
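The defining formula is not preserved here. A standard way to write it, assuming k treatment groups, group sizes n_j, sample means \bar{x}_j, and grand mean \bar{\bar{x}} (all notation assumed rather than taken from the slide), is:

```latex
\mathrm{SSG} = \sum_{j=1}^{k} n_j \left(\bar{x}_j - \bar{\bar{x}}\right)^2
```

Each term weights the squared distance of a sample mean from the grand mean by the number of observations behind that mean.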
20. Sum of squares for treatment groups (SSG) Solution – continued: Calculate SSG
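The calculation on this slide is not in the transcript. The minimal sketch below recomputes SSG from figures quoted elsewhere in the deck: the three sample means and twenty weeks of sales per city. The variable names are illustrative, and the grand mean is taken as the plain average of the group means, which is valid only because the group sizes are equal.

```python
# Sample means quoted in the Bonferroni example: convenience, quality, price.
sample_means = [577.55, 653.0, 608.65]
n_per_group = 20  # twenty weeks of sales recorded in each city

# With equal group sizes the grand mean is the simple average of the group means.
grand_mean = sum(sample_means) / len(sample_means)

# SSG: squared distance of each sample mean from the grand mean,
# weighted by the number of observations in that group.
ssg = sum(n_per_group * (m - grand_mean) ** 2 for m in sample_means)

print(f"grand mean = {grand_mean:.3f}")
print(f"SSG = {ssg:,.2f}")
```

Under these inputs the result agrees with the SSG = 57,512.23 used in the slides that follow.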
21. Sum of squares for treatment groups (SSG) Is SSG = 57,512.23 large enough to reject H0 in favor of H1? See next.
22. The rationale behind the test statistic – II Large variability within the samples weakens the “ability” of the sample means to represent their corresponding population means.
Therefore, even though sample means may markedly differ from one another, SSG must be judged relative to the “within samples variability”.
23. Within samples variability The variability within samples is measured by adding all the squared distances between observations and their sample means.
This sum is called the Sum of Squares for Error (SSE).
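In symbols, the verbal definition above can be sketched as follows, assuming x_{ij} denotes observation i in group j, \bar{x}_j the group's sample mean, and s_j^2 its sample variance (notation assumed, not taken from the slide):

```latex
\mathrm{SSE} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(x_{ij} - \bar{x}_j\right)^2
             = \sum_{j=1}^{k} (n_j - 1)\, s_j^2
```

The second form is convenient when only the sample variances of the groups are available.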
24. Sum of squares for errors (SSE) Solution – continued: Calculate SSE
25. Sum of squares for errors (SSE) Is SSG = 57,512.23 large enough relative to SSE = 506,983.50 to reject the null hypothesis that all the means are equal?
26. The mean sum of squares
27. Calculation of the test statistic
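The formulas for the mean squares and the test statistic are missing from this transcript. A minimal sketch of the usual calculation is given below, using the SSG and SSE values stated in the slides. The names MSG and MSE follow the deck's SSG/SSE naming (MSE also appears later in the deck); the degrees of freedom k − 1 and n − k are the standard one-way ANOVA choices.

```python
ssg = 57512.23   # sum of squares for treatment groups (from the slides)
sse = 506983.50  # sum of squares for error (from the slides)
k = 3            # number of treatments (marketing strategies)
n = 60           # total observations: 3 cities x 20 weeks

# Each mean square is a sum of squares divided by its degrees of freedom.
msg = ssg / (k - 1)   # between-groups mean square
mse = sse / (n - k)   # within-groups (error) mean square

# The F statistic compares between-group to within-group variability.
f_stat = msg / mse

print(f"MSG = {msg:.2f}, MSE = {mse:.3f}, F = {f_stat:.2f}")
```

The MSE obtained this way matches the 8894.447 quoted in the Bonferroni example, and F rounds to the 3.23 used in the p-value slide.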
28. The F test rejection region
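The rejection region itself is not preserved. Assuming the standard one-way ANOVA rule with significance level \alpha, k treatments, and n total observations (and writing MSG for the between-groups mean square, a name assumed here), it can be sketched as:

```latex
\text{Reject } H_0 \text{ if}\quad
F = \frac{\mathrm{MSG}}{\mathrm{MSE}} > F_{\alpha,\,k-1,\,n-k},
\qquad \text{here } F_{.05,\,2,\,57}
```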
29. The F test
30. The F test p-value Use Excel to find the p-value
fx, Statistical: FDIST(3.23,2,57) = .0467
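For readers without Excel, the same tail probability can be approximated with the short sketch below. It relies on a closed form that holds only when the numerator degrees of freedom equal 2, as they do here (k − 1 = 2); it is not a general F-distribution routine, and the function names are illustrative.

```python
def f_upper_tail_df1_2(f_value: float, df2: int) -> float:
    """P(F > f_value) for an F distribution with (2, df2) degrees of freedom.

    Valid only for numerator df = 2, where the tail has the closed form
    (1 + 2f/df2) ** (-df2/2).
    """
    return (1 + 2 * f_value / df2) ** (-df2 / 2)


def f_critical_df1_2(alpha: float, df2: int) -> float:
    """Critical value f with P(F > f) = alpha, by inverting the formula above."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


f_stat = 3.23   # test statistic from the slides
df2 = 60 - 3    # n - k

p_value = f_upper_tail_df1_2(f_stat, df2)
f_crit = f_critical_df1_2(0.05, df2)

print(f"p-value = {p_value:.4f}")
print(f"critical value at alpha = .05: {f_crit:.2f}")
print("reject H0" if f_stat > f_crit else "do not reject H0")
```

The p-value comes out at roughly .047, in line with the Excel figure, and falls just below .05, so the null hypothesis of equal means is rejected at the 5% level.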
31. Excel single factor ANOVA
32. Multiple Comparisons When the null hypothesis is rejected, it may be desirable to find which means differ, and in what rank order.
Two statistical inference procedures designed to do this are presented:
“regular” confidence interval calculations
Bonferroni adjustment
33. Multiple Comparisons Two means are considered different if the confidence interval for the difference between the corresponding population means does not contain 0. In this case the larger sample mean is believed to be associated with a larger population mean.
How do we calculate the confidence intervals?
34. “Regular” Method This method builds on the equal variances confidence interval for the difference between two means.
The CI is improved by using MSE rather than s_p² (we use ALL the data to estimate the common variance instead of only the data from 2 samples).
35. Experiment-wise Type I error rate (the effective Type I error) The preceding “regular” method may result in an increased probability of committing a Type I error.
The experiment-wise Type I error rate is the probability of committing at least one Type I error when each comparison is made at significance level α. It is calculated by:
experiment-wise Type I error rate = 1 − (1 − α)^g, where g is the number of pairwise comparisons (i.e. g = kC2 = k(k−1)/2).
For example, if α = .05 and k = 4 (so g = 6), then
experiment-wise Type I error rate = 1 − .735 = .265
The Bonferroni adjustment determines the required Type I error probability per pairwise comparison (α*) to secure a pre-determined overall α.
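The arithmetic on this slide can be checked with a short sketch. The function names are illustrative, and the error-rate formula is the one stated above, which treats the g comparisons as independent.

```python
from math import comb


def experimentwise_error_rate(alpha: float, k: int) -> float:
    """Probability of at least one Type I error across all pairwise comparisons."""
    g = comb(k, 2)  # number of pairwise comparisons, k(k-1)/2
    return 1 - (1 - alpha) ** g


def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-comparison significance level that keeps the overall rate near alpha."""
    return alpha / comb(k, 2)


print(f"{experimentwise_error_rate(0.05, 4):.3f}")  # slide example: alpha = .05, k = 4
print(f"{bonferroni_alpha(0.05, 3):.4f}")           # three marketing strategies
```

With four populations, six comparisons at α = .05 inflate the overall error rate to about .265. With the three marketing strategies, the Bonferroni per-comparison level works out to about .0167.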
36. Bonferroni Adjustment The procedure:
Compute the number of pairwise comparisons g = k(k−1)/2, where k is the number of populations.
Set α* = α/g, where α is the desired probability of making at least one Type I error (the experiment-wise Type I error rate).
Calculate the CI for μi – μj at the per-comparison level α*.
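The interval in the last step is not shown in this transcript. Consistent with the worked example that follows (which uses t_{\alpha^*/2,\,n-k} and s = \sqrt{\mathrm{MSE}}), it can be sketched as below; the sample-size symbols n_i and n_j are assumed notation.

```latex
(\bar{x}_i - \bar{x}_j) \;\pm\; t_{\alpha^*/2,\; n-k}\; s\,\sqrt{\frac{1}{n_i} + \frac{1}{n_j}},
\qquad s = \sqrt{\mathrm{MSE}}, \quad \alpha^* = \frac{\alpha}{g}
```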
37. Bonferroni Method: Example - continued
Rank the effectiveness of the marketing strategies (based on mean weekly sales).
Use the Bonferroni adjustment method.
Solution
The sample mean sales were 577.55, 653.0, 608.65.
We calculate g = k(k−1)/2 to be 3(2)/2 = 3.
We set α* = .05/3 = .0167, thus t_{.0167/2, 60−3} = 2.467 (Excel).
Note that s = √8894.447 = 94.31
38. Bonferroni Method: The Three Confidence Intervals
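The three intervals are not preserved in this transcript. The sketch below recomputes them from the figures given in the example: the sample means, MSE = 8894.447, twenty observations per group, and the critical value 2.467 quoted from Excel. The critical value is taken from the slides rather than derived, and the dictionary keys are illustrative labels.

```python
from itertools import combinations
from math import sqrt

means = {"convenience": 577.55, "quality": 653.0, "price": 608.65}
n_per_group = 20   # twenty weeks of sales per city
mse = 8894.447     # error mean square from the slides
t_crit = 2.467     # t for alpha*/2 with 57 df, as quoted from Excel in the slides

# Half-width shared by all three intervals (equal group sizes).
margin = t_crit * sqrt(mse) * sqrt(1 / n_per_group + 1 / n_per_group)

intervals = {}
for a, b in combinations(means, 2):
    diff = means[a] - means[b]
    intervals[(a, b)] = (diff - margin, diff + margin)

for (a, b), (low, high) in intervals.items():
    verdict = "differ" if low > 0 or high < 0 else "no significant difference"
    print(f"{a} - {b}: ({low:.2f}, {high:.2f}) -> {verdict}")
```

Under these inputs only the convenience versus quality interval excludes 0; the other two pairs cannot be distinguished at the Bonferroni-adjusted level.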
39. Bonferroni Method: Conclusions Resulting from Confidence Intervals Do we have evidence to distinguish two means?
Group 1 Convenience: sample mean 577.55
Group 2 Quality: sample mean 653
Group 3 Price: sample mean 608.65
List the group numbers in increasing order of their sample means; connecting overhead lines mean no significant difference