Topic 25: Inference for Two-Way ANOVA
Outline • Two-way ANOVA • Data, models, parameter estimates • ANOVA table, EMS • Analytical strategies • Regression approach
Data • Response written Yijk where • i denotes the level of the factor A • j denotes the level of the factor B • k denotes the kth observation in cell (i,j) • i = 1, . . . , a levels of factor A • j = 1, . . . , b levels of factor B • k = 1, . . . , n observations in cell (i,j)
Cell means model • Yijk = μij + εijk • where μij is the theoretical mean or expected value of all observations in cell (i,j) • the εijk are iid N(0, σ²) • This means the Yijk are independent with Yijk ~ N(μij, σ²)
Factor effects model • μij = μ + αi + βj + (αβ)ij • Consider μ to be the overall mean • αi is the main effect of A • βj is the main effect of B • (αβ)ij is the interaction between A and B
Constraints for this interpretation • α. = Σi αi = 0 (df = a-1) • β. = Σj βj = 0 (df = b-1) • (αβ).j = Σi (αβ)ij = 0 for all j • (αβ)i. = Σj (αβ)ij = 0 for all i • Together, the last two sets of constraints leave df = (a-1)(b-1) for the interaction
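As a rough illustration of these constraints, the Python sketch below computes μ, αi, βj and (αβ)ij from a table of cell means. The cell means are the ones reported for the bread example later in these slides; the function name `factor_effects` is made up for this sketch and is not part of any SAS or KNNL code.

```python
# Cell means from the bread example later in these slides:
# rows = shelf height (levels 1..3), columns = display width (levels 1..2).
cell_means = [[45.0, 43.0],
              [65.0, 69.0],
              [40.0, 44.0]]

def factor_effects(means):
    """Return (mu, alpha, beta, alphabeta) under the sum-to-zero constraints."""
    a, b = len(means), len(means[0])
    mu = sum(sum(row) for row in means) / (a * b)
    row_means = [sum(row) / b for row in means]
    col_means = [sum(means[i][j] for i in range(a)) / a for j in range(b)]
    alpha = [m - mu for m in row_means]          # main effects of A
    beta = [m - mu for m in col_means]           # main effects of B
    alphabeta = [[means[i][j] - mu - alpha[i] - beta[j] for j in range(b)]
                 for i in range(a)]              # interaction effects
    return mu, alpha, beta, alphabeta

mu, alpha, beta, alphabeta = factor_effects(cell_means)
print(mu, alpha, beta, alphabeta)
```

Each set of effects sums to zero over its index, which is exactly what the constraints on this slide require.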
SAS GLM Constraints • αa = 0 (1 constraint) • βb = 0 (1 constraint) • (αβ)aj = 0 for all j (b constraints) • (αβ)ib = 0 for all i (a constraints) • The total is 1+1+a+b-1 = a+b+1, because the constraint (αβ)ab = 0 is counted twice in the last two bullets above
Parameters and constraints • The cell means model has ab parameters for the means • The factor effects model has (1+a+b+ab) parameters • An intercept (1) • Main effect of A (a) • Main effect of B (b) • Interaction of A and B (ab)
Factor effects model • There are 1+a+b+ab parameters • There are 1+a+b constraints • There are ab unconstrained parameters (or sets of parameters), the same number of parameters for the means in the cell means model • While certain parameters depend on choice of constraints, others do not
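The bookkeeping on this slide can be checked with a few lines of Python. This is only a counting sketch; `count_parameters` is a hypothetical helper name.

```python
def count_parameters(a, b):
    """Count factor-effects parameters, constraints, and free parameters."""
    params = 1 + a + b + a * b          # intercept, A, B, AB
    constraints = 1 + 1 + a + b - 1     # one for A, one for B, a + b - 1 for AB
    free = params - constraints         # should equal the ab cell means
    return params, constraints, free

print(count_parameters(3, 2))  # bread example: a = 3 heights, b = 2 widths
```

For the bread example (a = 3, b = 2) this gives 12 parameters, 6 constraints, and 6 free parameters, one per cell.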
KNNL Example • KNNL p 833 • Y is the number of cases of bread sold • A is the height of the shelf display, a=3 levels: bottom, middle, top • B is the width of the shelf display, b=2: regular, wide • n=2 stores for each of the 3x2 treatment combinations
Proc GLM with solution
proc glm data=a1;
  class height width;
  model sales=height width height*width / solution;
  means height*width;
run;
Solution output
Parameter   Estimate
Intercept     44.0 B
height 1      -1.0 B
height 2      25.0 B
height 3       0.0 B
width 1       -4.0 B
width 2        0.0 B
Solution output (continued)
Parameter          Estimate
height*width 1 1     6.0 B
height*width 1 2     0.0 B
height*width 2 1     0.0 B
height*width 2 2     0.0 B
height*width 3 1     0.0 B
height*width 3 2     0.0 B
Means
Based on the estimates from the previous two slides
height  width  Mean
1       1      45 = 44 - 1 - 4 + 6
1       2      43 = 44 - 1 + 0 + 0
2       1      65 = 44 + 25 - 4 + 0
2       2      69 = 44 + 25 + 0 + 0
3       1      40 = 44 + 0 - 4 + 0
3       2      44 = 44 + 0 + 0 + 0
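The same sums can be reproduced mechanically. The Python sketch below copies the estimates from the solution output and adds intercept, height, width, and interaction terms for each cell; the dictionary names are just for illustration.

```python
# Estimates copied from the PROC GLM solution output (SAS sets last levels to 0).
intercept = 44.0
height = {1: -1.0, 2: 25.0, 3: 0.0}
width = {1: -4.0, 2: 0.0}
height_width = {(1, 1): 6.0, (1, 2): 0.0, (2, 1): 0.0,
                (2, 2): 0.0, (3, 1): 0.0, (3, 2): 0.0}

# Estimated cell mean = intercept + height effect + width effect + interaction.
cell_means = {(i, j): intercept + height[i] + width[j] + height_width[(i, j)]
              for i in height for j in width}

for (i, j), mean in sorted(cell_means.items()):
    print(f"height={i} width={j} mean={mean:.0f}")
```

Under the SAS constraints the intercept is the mean of the last cell (height 3, width 2), and every other estimate is a contrast relative to it.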
Check normality: alternative way to form a QQ plot
proc glm data=a1;
  class height width;
  model sales=height width height*width;
  output out=a2 r=resid;
proc rank data=a2 out=a3 normal=blom;
  var resid;
  ranks zresid;
Normal Quantile plot
proc sort data=a3;
  by zresid;
symbol1 v=circle i=sm70;
proc gplot data=a3;
  plot resid*zresid / frame;
run;
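To make the `normal=blom` step less of a black box, here is a small Python sketch of Blom normal scores, Φ⁻¹((r - 3/8)/(n + 1/4)) with r the rank of each residual. The residual values in the example are made up for illustration, and giving tied values their average rank is an assumption about the default tie handling rather than something stated in these slides.

```python
from statistics import NormalDist

def blom_scores(values):
    """Blom normal scores: inverse normal CDF of (rank - 3/8) / (n + 1/4)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        # Extend the group while the next sorted value is tied with this one.
        while end + 1 < n and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1   # ASSUMPTION: ties share the average rank
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.375) / (n + 0.25)) for r in ranks]

# Hypothetical residuals, not the bread data.
example_resid = [0.5, -1.2, 2.0, -0.3]
scores = blom_scores(example_resid)
print([round(z, 3) for z in scores])
```

Plotting the residuals against these scores gives the normal quantile plot; points close to a straight line are consistent with normal errors.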
The plot • [Figure: normal quantile plot of the residuals, resid versus zresid] • Note that dfE is only 6
ANOVA Table
Source  df          SS    MS    F
A       a-1         SSA   MSA   MSA/MSE
B       b-1         SSB   MSB   MSB/MSE
AB      (a-1)(b-1)  SSAB  MSAB  MSAB/MSE
Error   ab(n-1)     SSE   MSE
Total   abn-1       SSTO
Expected Mean Squares • E(MSE) = σ² • E(MSA) = σ² + nb(Σi αi²)/(a-1) • E(MSB) = σ² + na(Σj βj²)/(b-1) • E(MSAB) = σ² + n(Σi Σj (αβ)ij²)/((a-1)(b-1)) • Here, αi, βj, and (αβ)ij are defined with the usual factor effects constraints
An analytical strategy • Run the model with main effects and the two-way interaction • Plot the data, the means, and look at the normal quantile plot and residual plots • If assumptions seem reasonable, check the significance of test for the interaction
AB interaction not significant • If the AB interaction is not statistically significant • Possibly rerun the analysis without the interaction (see pooling, KNNL §19.10) • Pooling carries a risk of Type II errors • For a significant main effect with more than two levels, use the means statement with the Tukey multiple comparison procedure
GLM Output
Source  DF  SS    MS     F      Pr > F
Model    5  1580  316    30.58  0.0003
Error    6    62  10.33
Total   11  1642
Note that there are 6 cells in this design, so the model has ab - 1 = 5 df.
Output ANOVA Type I or Type III
Source  DF  SS    MS   F      Pr > F
height   2  1544  772  74.71  <.0001
width    1    12   12   1.16  0.3226
h*w      2    24   12   1.16  0.3747
Note: Type I and Type III analyses are the same because the cell size n is constant
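Because the design is balanced, the sums of squares for height, width, and the interaction can be rebuilt from the six cell means alone. The Python sketch below does that, taking the cell means and SSE = 62 from the slides (SSE cannot be recomputed without the raw data). It is a check on the arithmetic, not a replacement for PROC GLM.

```python
# Cell means from the Means slide: rows = height 1..3, columns = width 1..2.
cell_means = [[45.0, 43.0],
              [65.0, 69.0],
              [40.0, 44.0]]
n = 2        # observations per cell
sse = 62.0   # error SS copied from the GLM output

a, b = len(cell_means), len(cell_means[0])
grand = sum(map(sum, cell_means)) / (a * b)
row_means = [sum(row) / b for row in cell_means]
col_means = [sum(cell_means[i][j] for i in range(a)) / a for j in range(b)]

# Balanced-design formulas for the three model sums of squares.
ssa = n * b * sum((m - grand) ** 2 for m in row_means)
ssb = n * a * sum((m - grand) ** 2 for m in col_means)
ssab = n * sum((cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
               for i in range(a) for j in range(b))

df = {"height": a - 1, "width": b - 1, "h*w": (a - 1) * (b - 1),
      "error": a * b * (n - 1)}
ss = {"height": ssa, "width": ssb, "h*w": ssab, "error": sse}
ms = {k: ss[k] / df[k] for k in ss}
f_stats = {k: ms[k] / ms["error"] for k in ("height", "width", "h*w")}

for k in ("height", "width", "h*w"):
    print(f"{k:7s} df={df[k]} SS={ss[k]:.0f} MS={ms[k]:.0f} F={f_stats[k]:.2f}")
```

The three sums of squares add up to the Model SS of 1580, and the F statistics match the table to two decimals.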
Rerun without interaction
proc glm data=a1;
  class height width;
  model sales=height width;
  means height / tukey lines;
run;
ANOVA output
Source  DF  MS   F      Pr > F
height   2  772  71.81  <.0001
width    1   12   1.12  0.3216
MS(height) and MS(width) have not changed. The MSE, the F* values, and the P-values have changed because of pooling.
Comparison of MSEs
Model                Error df  SSE  MSE
With interaction         6      62  10.33
Without interaction      8      86  10.75
Little change in MSE here; pooling is usually considered only when the error df are small
Pooling SS • Data = Model + Residual • When we remove a term from the 'model', we put this variation and the associated df into 'residual' • This is called pooling • A benefit is that we have more df for error and a simpler model • Potential Type II errors • Beneficial only in small experiments
Pooling SSE and SSAB • For model with interaction • SSAB=24, dfAB=2 • SSE=62, dfE=6 • MSE=10.33 • For the model with main effects only • SSE=62+24=86, dfE=6+2=8 • MSE=10.75
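The pooling arithmetic, and its effect on the main-effect F statistics, can be written out in a few lines of Python. All inputs are copied from the slides; `pool` is a hypothetical helper name.

```python
def pool(sse, df_error, ss_term, df_term):
    """Move a dropped term's SS and df into the error line."""
    return sse + ss_term, df_error + df_term

# Full model: SSE = 62 on 6 df; interaction: SSAB = 24 on 2 df.
sse_pooled, df_pooled = pool(62.0, 6, 24.0, 2)
mse_pooled = sse_pooled / df_pooled

# Main-effect mean squares are unchanged by pooling in this balanced design.
ms_height, ms_width = 772.0, 12.0
f_height = ms_height / mse_pooled
f_width = ms_width / mse_pooled

print(sse_pooled, df_pooled, mse_pooled)
print(round(f_height, 2), round(f_width, 2))
```

The pooled MSE of 10.75 on 8 df reproduces the F values 71.81 and 1.12 reported for the model without interaction.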
Tukey Output
Grouping  Mean    N  height
A         67.000  4  2
B         44.000  4  1
B
B         42.000  4  3
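The grouping letters can be sanity-checked by comparing each pairwise difference with the Tukey minimum significant difference, q·sqrt(MSE/n). In the Python sketch below, the means, n = 4, and MSE = 10.75 come from the slides; the critical value q ≈ 4.04 (5% level, 3 means, 8 error df) is an assumption read from a standard studentized range table, not from the SAS output.

```python
from itertools import combinations
from math import sqrt

means = {2: 67.0, 1: 44.0, 3: 42.0}   # height level -> mean, from the Tukey output
n_per_level = 4
mse = 10.75                            # pooled MSE from the main-effects model (8 df)
q_crit = 4.04   # ASSUMPTION: approximate 5% studentized range value, 3 means, 8 df

# Minimum significant difference for equal sample sizes.
hsd = q_crit * sqrt(mse / n_per_level)

significant = {}
for (lev1, m1), (lev2, m2) in combinations(means.items(), 2):
    significant[(lev1, lev2)] = abs(m1 - m2) > hsd
    print(f"height {lev1} vs {lev2}: diff={abs(m1 - m2):.1f}, "
          f"significant={significant[(lev1, lev2)]}")
```

With a minimum significant difference of about 6.6, height 2 differs from both other levels while heights 1 and 3 do not differ, which agrees with the A/B grouping above.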
Regression Approach • Similar to what we did for one-way • Use a-1 variables for A • Use b-1 variables for B • Multiply each of the a-1 variables for A times each of the b-1 for B to get (a-1)(b-1) for AB • You can use the test statement in Proc reg to perform F tests
Create Variables
data a4; set a1;
  X1 = (height eq 1) - (height eq 3);
  X2 = (height eq 2) - (height eq 3);
  X3 = (width eq 1) - (width eq 2);
  X13 = X1*X3;
  X23 = X2*X3;
Run Proc Reg
proc reg data=a4;
  model sales= X1 X2 X3 X13 X23 / ss1;
  height: test X1, X2;
  width: test X3;
  interaction: test X13, X23;
run;
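To see what this coding buys, the Python sketch below mirrors the data step and checks that, with these variables, the regression coefficients are the sum-to-zero factor effects: intercept = μ, X1 and X2 coefficients = α1 and α2, X3 coefficient = β1, and X13, X23 coefficients = (αβ)11 and (αβ)21. The cell means are taken from the earlier slides; the function and variable names are illustrative only. Since the model is saturated, it is enough to confirm that these coefficients reproduce every cell mean.

```python
# Cell means from the slides, keyed by (height, width).
cell_means = {(1, 1): 45.0, (1, 2): 43.0, (2, 1): 65.0,
              (2, 2): 69.0, (3, 1): 40.0, (3, 2): 44.0}

def coded_row(height, width):
    """Mirror the SAS data step: [1, X1, X2, X3, X13, X23] for one cell."""
    x1 = (height == 1) - (height == 3)
    x2 = (height == 2) - (height == 3)
    x3 = (width == 1) - (width == 2)
    return [1, x1, x2, x3, x1 * x3, x2 * x3]

# Sum-to-zero factor effects computed directly from the cell means.
mu = sum(cell_means.values()) / 6
alpha = {i: (cell_means[(i, 1)] + cell_means[(i, 2)]) / 2 - mu for i in (1, 2, 3)}
beta = {j: sum(cell_means[(i, j)] for i in (1, 2, 3)) / 3 - mu for j in (1, 2)}
ab = {(i, j): cell_means[(i, j)] - mu - alpha[i] - beta[j] for (i, j) in cell_means}

# Candidate regression coefficients in the order of the coded columns.
coef = [mu, alpha[1], alpha[2], beta[1], ab[(1, 1)], ab[(2, 1)]]
fitted = {cell: sum(c * x for c, x in zip(coef, coded_row(*cell)))
          for cell in cell_means}

print(coef)
print(fitted == cell_means)
```

The effects for the last levels (α3, β2, and the remaining interactions) never appear as coefficients; they are implied by the -1 entries in the coding.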
SAS Output Same basic ANOVA table
SS Results • SS(Height) = SS(X1) + SS(X2|X1): 1544 = 8 + 1536 • SS(Width) = SS(X3|X1,X2): 12 = 12 • SS(Height*Width) = SS(X13|X1,X2,X3) + SS(X23|X1,X2,X3,X13): 24 = 18 + 6
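These sequential sums of squares can be checked by hand because the design is balanced. In the Python sketch below, the cell means and the totals 1544 and 24 are copied from the slides, and helper names such as `ss_alone` are invented for the sketch. It relies on two facts: a coded column that sums to zero has single-predictor regression SS equal to (Σxy)²/Σx², and a column orthogonal to everything entered before it has the same SS whether or not those earlier columns are in the model.

```python
# Cell means from the slides, keyed by (height, width); n = 2 stores per cell.
cell_means = {(1, 1): 45.0, (1, 2): 43.0, (2, 1): 65.0,
              (2, 2): 69.0, (3, 1): 40.0, (3, 2): 44.0}
n = 2

def coded(height, width):
    """Coded variables for one cell, as in the SAS data step."""
    x1 = (height == 1) - (height == 3)
    x2 = (height == 2) - (height == 3)
    x3 = (width == 1) - (width == 2)
    return {"X1": x1, "X2": x2, "X3": x3, "X13": x1 * x3, "X23": x2 * x3}

rows = {cell: coded(*cell) for cell in cell_means}

def dot(u, v):
    """Inner product of two coded columns over all observations."""
    return n * sum(rows[c][u] * rows[c][v] for c in rows)

def ss_alone(name):
    """Regression SS with `name` as the only predictor (column sums to zero)."""
    sxy = n * sum(rows[c][name] * cell_means[c] for c in rows)
    return sxy ** 2 / dot(name, name)

ss_x1 = ss_alone("X1")     # SS(X1)
ss_x3 = ss_alone("X3")     # equals SS(X3|X1,X2): X3 is orthogonal to X1 and X2
ss_x13 = ss_alone("X13")   # equals SS(X13|X1,X2,X3) by the same argument

ss_height, ss_hw = 1544.0, 24.0          # totals copied from the ANOVA table
ss_x2_given_x1 = ss_height - ss_x1       # remaining piece of SS(Height)
ss_x23_given_rest = ss_hw - ss_x13       # remaining piece of SS(Height*Width)

print(ss_x1, ss_x2_given_x1, ss_x3, ss_x13, ss_x23_given_rest)
```

Note that X1 and X2 are not orthogonal to each other, which is why SS(X2|X1) is obtained here by subtraction rather than from X2 alone.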
Last slide • Finish reading KNNL Chapter 19 • Topic25.sas contains the SAS commands for these slides • We will now focus more on the strategies needed to handle a two- or more factor ANOVA