Be humble in our attribute, be loving and varying in our attitude, that is the way to live in heaven. Applied Statistics Using SAS and SPSS. Topic: One Way ANOVA By Prof Kelly Fan, Cal State Univ, East Bay. Statistical Tools vs. Variable Types. Example: Battery Lifetime.
Applied Statistics Using SAS and SPSS Topic: One Way ANOVA By Prof Kelly Fan, Cal State Univ, East Bay
Example: Battery Lifetime • 8 brands of battery are studied. We would like to find out whether or not the brand of a battery affects its lifetime and, if so, which brands' batteries last longer than the others. • Data collection: For each brand, 3 batteries are tested for their lifetime. • What is the Y variable? The X variable?
Data: Y = LIFETIME (HOURS), 3 replications per level of BRAND

BRAND      1     2     3     4     5     6     7     8
         1.8   4.2   8.6   7.0   4.2   4.2   7.8   9.0
         5.0   5.4   4.6   5.0   7.8   4.2   7.0   7.4
         1.0   4.2   4.2   9.0   6.6   5.4   9.8   5.8
Mean     2.6   4.6   5.8   7.0   6.2   4.6   8.2   7.4

Grand mean = 5.8
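Statistical Model (Brand is, of course, represented as “categorical”) The data are laid out with one column per “LEVEL” OF BRAND (1, 2, ..., C) and one row per replicate (1, 2, ..., n). $Y_{ij} = \mu_i + \varepsilon_{ij}$, $i = 1, \dots, C$, $j = 1, \dots, n$, where $Y_{ij}$ is the $j$-th observation at level $i$, $\mu_i$ is the mean of level $i$, and $\varepsilon_{ij}$ is the random error.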
Hypotheses Setup
H0: Level of X has no impact on Y
H1: Level of X does have impact on Y
H0: $\mu_1 = \mu_2 = \dots = \mu_8$
H1: not all $\mu_i$ are equal
ONE WAY ANOVA: Analysis of Variance for life

Source    DF       SS     MS      F      P
brand      7    69.12   9.87   3.38  0.021
Error     16    46.72   2.92
Total     23   115.84

MS Error = 2.92 is the estimate of the common variance $\sigma^2$.
S = 1.709   R-Sq = 59.67%   R-Sq(adj) = 42.02%
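The sums of squares, mean squares, and F ratio in the table can be recomputed directly from the battery data. The following is a minimal sketch in plain Python, not the SAS or SPSS procedure used in the course; the names `lifetimes` and `one_way_anova` are illustrative, and the P-value is omitted because it requires the F distribution from a statistics library.

```python
from math import sqrt

# Battery lifetimes (hours) for each brand, copied from the data slide.
lifetimes = {
    1: [1.8, 5.0, 1.0],
    2: [4.2, 5.4, 4.2],
    3: [8.6, 4.6, 4.2],
    4: [7.0, 5.0, 9.0],
    5: [4.2, 7.8, 6.6],
    6: [4.2, 4.2, 5.4],
    7: [7.8, 7.0, 9.8],
    8: [9.0, 7.4, 5.8],
}


def one_way_anova(groups):
    """Return the one-way ANOVA quantities for {level: [observations]}."""
    all_obs = [y for ys in groups.values() for y in ys]
    grand_mean = sum(all_obs) / len(all_obs)
    means = {g: sum(ys) / len(ys) for g, ys in groups.items()}

    # Between-level and within-level (error) sums of squares.
    ss_between = sum(len(ys) * (means[g] - grand_mean) ** 2
                     for g, ys in groups.items())
    ss_within = sum((y - means[g]) ** 2
                    for g, ys in groups.items() for y in ys)

    df_between = len(groups) - 1
    df_within = len(all_obs) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within  # estimate of the common variance

    return {
        "df": (df_between, df_within),
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f": ms_between / ms_within,
        "s": sqrt(ms_within),
        "r_sq": ss_between / (ss_between + ss_within),
    }


anova = one_way_anova(lifetimes)
for name, value in anova.items():
    print(name, value)
```

Rounded to two decimals, the printed values agree with the SS, MS, F, S, and R-Sq entries of the ANOVA table.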
Review • Fitted value = Predicted value • Residual = Observed value – fitted value
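In a one-way ANOVA the fitted value for every observation is the sample mean of its own group, so residuals are deviations from the brand mean. A small sketch with the battery data (the variable names are illustrative):

```python
# Battery lifetimes (hours) for each brand, copied from the data slide.
lifetimes = {
    1: [1.8, 5.0, 1.0],
    2: [4.2, 5.4, 4.2],
    3: [8.6, 4.6, 4.2],
    4: [7.0, 5.0, 9.0],
    5: [4.2, 7.8, 6.6],
    6: [4.2, 4.2, 5.4],
    7: [7.8, 7.0, 9.8],
    8: [9.0, 7.4, 5.8],
}

# Fitted (predicted) value for a brand = sample mean of that brand.
fitted = {brand: sum(ys) / len(ys) for brand, ys in lifetimes.items()}

# Residual = observed value - fitted value.
residuals = {brand: [y - fitted[brand] for y in ys]
             for brand, ys in lifetimes.items()}

# The squared residuals add up to the Error sum of squares.
ss_error = sum(r ** 2 for rs in residuals.values() for r in rs)

for brand in lifetimes:
    print(brand, round(fitted[brand], 2), [round(r, 2) for r in residuals[brand]])
print("SS Error:", round(ss_error, 2))
```

These residuals are the quantities plotted in the normality and residual plots used for diagnosis.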
Diagnosis: Normality • The points on the normality plot must more or less follow a straight line to claim “normally distributed”. • There are statistical tests to verify it scientifically. • The ANOVA method we learn here is not sensitive to the normality assumption. That is, a mild departure from the normal distribution will not change our conclusions much. Normality plot: normal scores vs. residuals
Diagnosis: Equal Variances • The points on the residual plot must be more or less within a horizontal band to claim “constant variance”. • There are statistical tests to verify it scientifically. • The ANOVA method we learn here is not sensitive to the constant-variance assumption. That is, slightly different variances within groups will not change our conclusions much. Residual plot: fitted values vs. residuals
Multiple Comparison Procedures Once we reject H0: $\mu_1 = \mu_2 = \dots = \mu_C$ in favor of H1: not all $\mu$’s are equal, we don’t yet know the way in which they’re not all equal, but simply that they’re not all the same. If there are 4 columns, are all 4 $\mu$’s different? Are 3 the same and one different? If so, which one? etc.
These “more detailed” inquiries into the process are called MULTIPLE COMPARISON PROCEDURES. Errors (Type I): We set up “$\alpha$” as the significance level for a hypothesis test. Suppose we test 3 independent hypotheses, each at $\alpha = .05$; each test has type I error (rejecting H0 when it’s true) of .05. However, P(at least one type I error in the 3 tests) = 1 − P(accept all 3, given all 3 true) = $1 - (.95)^3 \approx .14$.
In other words, the probability is .14 that at least one type I error is made. For 5 tests, the probability is .23. Question: Should we choose $\alpha = .05$ for each test, and suffer (for 5 tests) a .23 OVERALL error rate (or “a” or $a_{\text{experimentwise}}$)? OR should we choose/control the overall error rate, “a”, to be .05, and find the individual-test $\alpha$ from $1 - (1-\alpha)^5 = .05$ (which gives us $\alpha \approx .010$)?
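The arithmetic behind these error rates is easy to check. The sketch below assumes the tests are independent, as in the calculation above; the function names are illustrative.

```python
def familywise_error(alpha, k):
    """P(at least one type I error) in k independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** k


def per_test_alpha(overall, k):
    """Individual-test level that gives the stated overall rate for k independent tests."""
    return 1 - (1 - overall) ** (1 / k)


print(familywise_error(0.05, 3))  # about .14
print(familywise_error(0.05, 5))  # about .23
print(per_test_alpha(0.05, 5))    # about .0102
```

Solving $1 - (1-\alpha)^5 = .05$ exactly gives $\alpha = 1 - .95^{1/5}$, slightly above the simpler value $.05/5 = .01$.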
The formula $1 - (1-\alpha)^5 = .05$ would be valid only if the tests are independent; often they’re not. [e.g., $\mu_1 = \mu_2$, $\mu_2 = \mu_3$, $\mu_1 = \mu_3$: IF $\mu_1 = \mu_2$ is accepted & $\mu_2 = \mu_3$ is rejected, isn’t it more likely that $\mu_1 = \mu_3$ is rejected?]
When the tests are not independent, it’s usually very difficult to arrive at the correct $\alpha$ for an individual test so that a specified value results for the overall error rate.
Categories of multiple comparison tests:
• “Planned”/“a priori” comparisons (stated in advance, usually a linear combination of the column means equal to zero)
• “Post hoc”/“a posteriori” comparisons (decided after a look at the data: which comparisons “look interesting”)
• “Post hoc” multiple comparisons (every column mean compared with each other column mean)
There are many multiple comparison procedures. We’ll cover only a few. • Post hoc multiple comparisons • Pairwise comparisons: Do a series of pairwise tests; Duncan and SNK tests • (Optional) Comparisons to control: Dunnett tests
Example: Broker Study A financial firm would like to determine whether the brokers it uses to execute trades differ in their ability to provide a stock purchase for the firm at a low buying price per share. To measure cost, an index, Y, is used: Y = 1000(A − P)/A, where P = per-share price paid for the stock and A = average of the high price and low price per share for the day. “The higher Y is, the better the trade is.”
Five brokers were in the study and six trades (R = 6) were randomly assigned to each broker.

Col: broker    1    2    3    4    5
              12    7    8   21   24
               3   17    1   10   13
               5   13    7   15   14
              -1   11    4   12   18
              12    7    3   20   14
               5   17    7    6   19
Mean           6   12    5   14   17
SPSS Output Analyze>>General Linear Model>>Univariate…
Conclusion : 3, 1 2 4 5 ??? Conclusion : 3, 1 2, 4, 5
Brokers 1 and 3 are not significantly different from each other, but both are significantly different from the other 3 brokers. Brokers 2 and 4 are not significantly different, and brokers 4 and 5 are not significantly different, but broker 2 is significantly different from (smaller than) broker 5. Conclusion: 3, 1 2 4 5
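The ordering used in the conclusion comes from sorting the broker means, and the pairwise procedures work on the differences between those ordered means. The sketch below only prepares these ingredients from the broker data; it does not carry out the SNK or Duncan tests, whose range-based critical values come from tables or from software such as SPSS. Variable names are illustrative.

```python
from itertools import combinations

# Index Y for the six trades assigned to each broker, from the data slide.
trades = {
    1: [12, 3, 5, -1, 12, 5],
    2: [7, 17, 13, 11, 7, 17],
    3: [8, 1, 7, 4, 3, 7],
    4: [21, 10, 15, 12, 20, 6],
    5: [24, 13, 14, 18, 14, 19],
}

means = {broker: sum(ys) / len(ys) for broker, ys in trades.items()}

# Brokers listed from smallest to largest mean.
ordered = sorted(means, key=means.get)

# Difference in means for every pair (larger mean minus smaller mean).
diffs = {(a, b): means[b] - means[a] for a, b in combinations(ordered, 2)}

print("ordered brokers:", ordered)
for pair, d in diffs.items():
    print(pair, d)
```

The smallest gaps (brokers 3 and 1, 2 and 4, 4 and 5) are the pairs reported as not significantly different.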
Comparisons to Control: Dunnett’s test. Designed specifically for (and incorporating the interdependencies of) comparing several “treatments” to a “control.” Example (R = 6):

Col      1 (CONTROL)    2    3    4    5
Mean     6             12    5   14   17
In our example, column 1 is the CONTROL and the column means are 6, 12, 5, 14, 17.
- Cols 4 and 5 differ from the control [1].
- Cols 2 and 3 are not significantly different from the control.
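Each comparison to the control is based on the difference between a column mean and the control mean, scaled by its standard error from the ANOVA error mean square. The sketch below computes those statistics for the broker data, taking broker 1 as the control as in the example. It stops short of the test itself: the Dunnett critical value, which accounts for the dependence among the comparisons, has to be taken from a Dunnett table or from software. Variable names are illustrative.

```python
from math import sqrt

# Index Y for the six trades assigned to each broker, from the data slide.
trades = {
    1: [12, 3, 5, -1, 12, 5],
    2: [7, 17, 13, 11, 7, 17],
    3: [8, 1, 7, 4, 3, 7],
    4: [21, 10, 15, 12, 20, 6],
    5: [24, 13, 14, 18, 14, 19],
}
control = 1  # broker 1 plays the role of the control, as in the slide

means = {b: sum(ys) / len(ys) for b, ys in trades.items()}

# Error mean square (MSE) from the one-way ANOVA.
ss_error = sum((y - means[b]) ** 2 for b, ys in trades.items() for y in ys)
df_error = sum(len(ys) for ys in trades.values()) - len(trades)
mse = ss_error / df_error

# Statistic for each treatment vs. the control:
# (treatment mean - control mean) / sqrt(MSE * (1/n_treatment + 1/n_control))
t_stats = {}
for b, ys in trades.items():
    if b == control:
        continue
    se = sqrt(mse * (1 / len(ys) + 1 / len(trades[control])))
    t_stats[b] = (means[b] - means[control]) / se

print("MSE:", mse, "error df:", df_error)
for b, t in t_stats.items():
    print("broker", b, "vs control:", round(t, 3))
```

The statistics for brokers 4 and 5 are the largest, in line with the conclusion that only those two differ from the control.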
Exercise: Sales Data
Exercise. • Find the ANOVA table. • Perform SNK tests at a = 5% to group treatments. • Perform Duncan tests at a = 5% to group treatments. • Which treatment would you use?
Post Hoc and A Priori Comparisons • F test for a linear combination of column means (contrast) • Scheffé test: tests all linear combinations at once. Very conservative; not to be used when only a few comparisons are of interest.