Chapter 6 - Interactions of Categorical Predictors

Contents
6.0 Introduction
6.1. Analysis with two categorical variables
6.2. Simple effects
6.3. Simple comparisons
6.4. Interactions
6.5. More Interaction contrasts
6.6. Computing adjusted means
6.0 Introduction
This chapter uses the elemmath2 data that you have seen in the prior chapters.

libname lib "c:\SASREG";
data datamath2;
  set lib.datamath2;
run;

Outcome: math00, math performance in the year 2000.
Factors: mealcat, the category of the percentage of families receiving free meals; collcat, the category of the percentage of parents with a college education or above.
Categories: low, middle and high for both factors.
Aim: to see how these two categorical variables are related to school math performance, including the interaction between them.
SAS program

libname lib 'c:\sasreg';
data datamath2;
  set lib.datamath2;
run;
ods html;
proc tabulate data=datamath2;
  class collcat mealcat;
  var math00;
  table mealcat='mealcat',
        mean=' '*math00='MATH Index for 2000'*collcat='collcat'*F=10.2 / RTS=13.;
run;
ods html close;
6.1. Analysis with two categorical variables
One traditional way to analyze these data is a 3 by 3 factorial analysis of variance using proc glm, as shown below. The results show a main effect of collcat (F=4.50, p=0.0117), a main effect of mealcat (F=509.04, p<.0001) and a collcat by mealcat interaction (F=6.63, p<.0001). We also use the lsmeans and output statements to save the predicted mean for each group, ready for graphing the cell means.

proc glm data = datamath2;
  class collcat mealcat;
  model math00 = collcat | mealcat / ss3;
  lsmeans collcat*mealcat;
  output out = pred p = pred;
run;
quit;
SAS Output
The GLM Procedure

Class Level Information
Class      Levels   Values
collcat         3   1 2 3
mealcat         3   1 2 3
Number of observations   400

Dependent Variable: math00   math 2000
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               8      6243714.810    780464.351    166.76   <.0001
Error             391      1829957.187      4680.197
Corrected Total   399      8073671.998

R-Square   Coeff Var   Root MSE   math00 Mean
0.773343    10.56356   68.41197      647.6225

Source             DF   Type III SS   Mean Square   F Value   Pr > F
collcat             2     42140.566     21070.283      4.50   0.0117
mealcat             2   4764843.563   2382421.781    509.04   <.0001
collcat*mealcat     4    124167.809     31041.952      6.63   <.0001
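As a quick sanity check on how the table hangs together, each F value is the effect's mean square divided by the error mean square, and R-square is the model sum of squares over the corrected total. The sketch below uses Python only as a calculator on the numbers printed above; the variable names are illustrative and not part of the SAS program.

```python
# Values copied from the GLM output above (Type III table and ANOVA table).
mse = 4680.197  # error mean square
mean_squares = {
    "collcat": 21070.283,
    "mealcat": 2382421.781,
    "collcat*mealcat": 31041.952,
}

# F = effect mean square / error mean square
f_values = {name: ms / mse for name, ms in mean_squares.items()}

# R-square = model sum of squares / corrected total sum of squares
r_square = 6243714.810 / 8073671.998

for name, f in f_values.items():
    print(f"{name:16s} F = {f:.2f}")
print(f"R-square = {r_square:.6f}")
```

The printed values should agree with the F Value and R-Square columns of the output up to rounding.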
Least Squares Means
collcat   mealcat   math00 LSMEAN
1         1            816.914286
1         2            589.350000
1         3            493.918919
2         1            825.651163
2         2            636.604651
2         3            508.833333
3         1            782.150943
3         2            655.637681
3         3            541.733333
proc sort data = pred;
  by mealcat;
run;
symbol1 v=circle   i=join ci=blue  h=2;
symbol2 v=triangle i=join ci=red   h=2;
symbol3 v=square   i=join ci=black h=2;
proc gplot data = pred;
  plot pred*mealcat=collcat;
run;
quit;
Using Proc Reg (i)
Code both collcat and mealcat and create the interaction terms as products of the coded variables. The first test statement tests the main effect of collcat, the second the main effect of mealcat, and the last the overall interaction.

data reg1;
  set datamath2;
  s2 = -1/3; s3 = -1/3;
  if collcat = 2 then s2 = 2/3;
  else if collcat = 3 then s3 = 2/3;
  m2 = -1/3; m3 = -1/3;
  if mealcat = 2 then m2 = 2/3;
  else if mealcat = 3 then m3 = 2/3;
  sm22 = s2*m2; sm23 = s2*m3;
  sm32 = s3*m2; sm33 = s3*m3;
run;
proc reg data = reg1;
  model math00 = s2 s3 m2 m3 sm22 sm23 sm32 sm33;
  Collcat: test s2=s3=0;
  Mealcat: test m2=m3=0;
  Interaction: test sm22=sm23=sm32=sm33=0;
  output out = pred2 p = pred;
run;
quit;
SAS OUTPUT
Dependent Variable: math00   math 2000

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               8          6243715        780464    166.76   <.0001
Error             391          1829957    4680.19741
Corrected Total   399          8073672

Root MSE          68.41197   R-Square   0.7733
Dependent Mean   647.62250   Adj R-Sq   0.7687
Coeff Var         10.56356

Parameter Estimates
Variable    Label       DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   Intercept    1            650.08826          3.87189    167.90     <.0001
s2                       1             23.63531          9.10533      2.60     0.0098
s3                       1             26.44625          9.99513      2.65     0.0085
m2                       1           -181.04135          9.07713    -19.94     <.0001
m3                       1           -293.41027          9.44946    -31.05     <.0001
sm22                     1             38.51777         24.19532      1.59     0.1122
sm23                     1              6.17754         20.08262      0.31     0.7585
sm32                     1            101.05102         22.88808      4.42     <.0001
sm33                     1             82.57776         24.43941      3.38     0.0008
Test Collcat Results for Dependent Variable MATH00
Source        DF   Mean Square   F Value   Pr > F
Numerator      2         21070      4.50   0.0117
Denominator  391    4680.19741

Test Mealcat Results for Dependent Variable MATH00
Source        DF   Mean Square   F Value   Pr > F
Numerator      2       2382422    509.04   <.0001
Denominator  391    4680.19741

Test Interaction Results for Dependent Variable MATH00
Source        DF   Mean Square   F Value   Pr > F
Numerator      4         31042      6.63   <.0001
Denominator  391    4680.19741
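Test results under 0/1 dummy coding (the program in Using Proc Reg (ii)). Here the Collcat test is the effect of collcat at mealcat = 1 (the same F = 5.44 appears as the mealcat = 1 slice in section 6.2.1), and the Mealcat test is the effect of mealcat at collcat = 1, so both differ from the main-effect tests above. The interaction test is unchanged.

The REG Procedure   Model: MODEL1

Test Collcat Results for Dependent Variable math00
Source        DF   Mean Square   F Value   Pr > F
Numerator      2         25455      5.44   0.0047
Denominator  391    4680.19741

Test Mealcat Results for Dependent Variable math00
Source        DF   Mean Square   F Value   Pr > F
Numerator      2       1240049    264.96   <.0001
Denominator  391    4680.19741

Test Interaction Results for Dependent Variable math00
Source        DF   Mean Square   F Value   Pr > F
Numerator      4         31042      6.63   <.0001
Denominator  391    4680.19741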
Using Proc Reg (ii)
Here collcat and mealcat are coded as 0/1 dummy variables, with collcat = 1 and mealcat = 1 as the reference levels.

data reg2;
  set datamath2;
  s2 = (collcat=2); s3 = (collcat=3);
  m2 = (mealcat=2); m3 = (mealcat=3);
  sm22 = s2*m2; sm23 = s2*m3;
  sm32 = s3*m2; sm33 = s3*m3;
run;
proc reg data = reg2;
  model math00 = s2 s3 m2 m3 sm22 sm23 sm32 sm33;
  Collcat: test s2=s3=0;
  Mealcat: test m2=m3=0;
  Interaction: test sm22=sm23=sm32=sm33=0;
  output out = pred2 p = pred;
run;
quit;
SAS OUTPUT
Dependent Variable: math00

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               8          6243715        780464    166.76   <.0001
Error             391          1829957    4680.19741
Corrected Total   399          8073672

Root MSE          68.41197   R-Square   0.7733
Dependent Mean   647.62250   Adj R-Sq   0.7687
Coeff Var         10.56356

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1            816.91429         11.56373     70.64     <.0001
s2           1              8.73688         15.57439      0.56     0.5751
s3           1            -34.76334         14.90052     -2.33     0.0202
m2           1           -227.56429         19.17628    -11.87     <.0001
m3           1           -322.99537         14.03445    -23.01     <.0001
sm22         1             38.51777         24.19532      1.59     0.1122
sm23         1              6.17754         20.08262      0.31     0.7585
sm32         1            101.05102         22.88808      4.42     <.0001
sm33         1             82.57776         24.43941      3.38     0.0008
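Observations
• What differs between the two programs?
• Note the intercepts: under coding (i) the intercept (650.09) is the unweighted average of the 9 cell means; under coding (ii) the intercept (816.91) is the mean of the cell with mealcat=1 and collcat=1.
• All interaction estimates are the same.
• The s2, s3, m2 and m3 estimates differ because the reference differs: under coding (i) each is averaged over the levels of the other factor, under coding (ii) each applies at the reference level of the other factor.
• There are other ways to code the categories.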
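Because the model is saturated (9 cells, 9 parameters), the coefficients under both codings can be recovered from the cell means in the LSMEANS output. The sketch below is only an arithmetic check in Python, not part of the SAS analysis; the helper names (`cell`, `row_mean`, `interaction`) are made up for illustration.

```python
# Cell means keyed by (collcat, mealcat), copied from the LSMEANS output.
cell = {
    (1, 1): 816.914286, (1, 2): 589.350000, (1, 3): 493.918919,
    (2, 1): 825.651163, (2, 2): 636.604651, (2, 3): 508.833333,
    (3, 1): 782.150943, (3, 2): 655.637681, (3, 3): 541.733333,
}
levels = (1, 2, 3)


def row_mean(c):
    """Unweighted mean of one collcat level across the three mealcat levels."""
    return sum(cell[(c, m)] for m in levels) / 3


# Coding (i): -1/3, 2/3. Intercept is the unweighted mean of the 9 cells;
# s2 and s3 compare collcat 2 and 3 with collcat 1, averaged over mealcat.
effect_intercept = sum(cell.values()) / 9
effect_s2 = row_mean(2) - row_mean(1)
effect_s3 = row_mean(3) - row_mean(1)

# Coding (ii): 0/1 dummies. Intercept is the reference cell (1, 1);
# s2 and s3 compare collcat 2 and 3 with collcat 1 at mealcat = 1 only.
dummy_intercept = cell[(1, 1)]
dummy_s2 = cell[(2, 1)] - cell[(1, 1)]
dummy_s3 = cell[(3, 1)] - cell[(1, 1)]


def interaction(c, m):
    """Difference of differences relative to the reference cell (1, 1)."""
    return (cell[(c, m)] - cell[(c, 1)]) - (cell[(1, m)] - cell[(1, 1)])


print(f"(i)  intercept {effect_intercept:.5f}  s2 {effect_s2:.5f}  s3 {effect_s3:.5f}")
print(f"(ii) intercept {dummy_intercept:.5f}  s2 {dummy_s2:.5f}  s3 {dummy_s3:.5f}")
for c, m in [(2, 2), (2, 3), (3, 2), (3, 3)]:
    print(f"sm{c}{m} = {interaction(c, m):.5f}")
```

The interaction terms are differences of differences, which is why they come out the same under both codings, while the intercept and the lower-order terms depend on the reference.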
Observations
• The graph of the cell means shows the interaction between collcat and mealcat. The 3 levels of collcat appear as 3 different lines, and the 3 levels of mealcat as the 3 values on the x axis.
• In general, if the three lines are parallel there is no interaction. If they are not parallel, and in particular if they cross, there is an interaction.
• Because the lines cross, the effect of collcat differs depending on the level of mealcat.
• When mealcat is low, schools where collcat is 3 have the lowest math00 scores, whereas among schools that are medium or high on mealcat, schools with collcat of 3 have the highest math00 scores.
6.2. Simple effects
6.2.1 Analyzing simple effects using PROC GLM
This analysis looks at the simple effects of collcat at the different levels of mealcat using proc glm. The lsmeans statement with the option slice = mealcat tests the effect of collcat at each level of mealcat.

proc glm data = datamath2;
  class collcat mealcat;
  model math00 = mealcat|collcat;
  lsmeans mealcat*collcat / slice = mealcat;
run;
quit;
Results by lsmeans/slice
The GLM Procedure
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               8      6243714.810    780464.351    166.76   <.0001
Error             391      1829957.187      4680.197
Corrected Total   399      8073671.998

R-Square   Coeff Var   Root MSE   MATH00 Mean
0.773343    10.56356   68.41197      647.6225

Source             DF   Type III SS   Mean Square   F Value   Pr > F
MEALCAT             2   4764843.563   2382421.781    509.04   <.0001
COLLCAT             2     42140.566     21070.283      4.50   0.0117
COLLCAT*MEALCAT     4    124167.809     31041.952      6.63   <.0001

COLLCAT   MEALCAT   MATH00 LSMEAN
1         1            816.914286
1         2            589.350000
1         3            493.918919
2         1            825.651163
2         2            636.604651
2         3            508.833333
3         1            782.150943
3         2            655.637681
3         3            541.733333

COLLCAT*MEALCAT Effect Sliced by MEALCAT for MATH00
MEALCAT   DF   Sum of Squares   Mean Square   F Value   Pr > F
1          2            50909         25455      5.44   0.0047
2          2            68629         34314      7.33   0.0007
3          2            29979         14990      3.20   0.0417
Observations
• The effect of collcat is significant at every level of mealcat (income related): p = 0.0047, 0.0007 and 0.0417.
• Problem: the slice tests do not show the direction of the effect.
• We can run the analysis separately by mealcat.
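One way to see the direction before refitting anything is to difference the LS means within each level of mealcat. The sketch below does this in Python with collcat = 3 as the comparison level (an arbitrary choice made here for illustration); `cell` and `simple` are illustrative names.

```python
# Cell means keyed by (collcat, mealcat), copied from the LSMEANS output.
cell = {
    (1, 1): 816.914286, (1, 2): 589.350000, (1, 3): 493.918919,
    (2, 1): 825.651163, (2, 2): 636.604651, (2, 3): 508.833333,
    (3, 1): 782.150943, (3, 2): 655.637681, (3, 3): 541.733333,
}

# For each mealcat level, the difference of collcat 1 and 2 from collcat 3.
simple = {
    m: {c: cell[(c, m)] - cell[(3, m)] for c in (1, 2)}
    for m in (1, 2, 3)
}

for m, diffs in simple.items():
    print(f"mealcat={m}: collcat 1 - 3 = {diffs[1]:9.4f}, "
          f"collcat 2 - 3 = {diffs[2]:9.4f}")
```

The differences are positive at mealcat = 1 and negative at mealcat = 2 and 3, which is the reversal the slice tests cannot show. Standard errors and p-values still require a fitted model.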
6.2.2 Analyzing simple effects using PROC GLM with a BY statement
This analysis looks at the simple effects of collcat within each level of mealcat using proc glm. The data must first be sorted by mealcat.

proc sort data = datamath2;
  by mealcat;
run;
proc glm data = datamath2;
  by mealcat;
  class collcat;
  model math00 = collcat / solution;
run;
quit;
--------------------------- Percentage free meals in 3 categories=1 ---------------------------
Parameter          Estimate   Standard Error   t Value   Pr > |t|
Intercept     782.1509434 B       8.66790415     90.24     <.0001
collcat 1      34.7633423 B      13.74426164      2.53     0.0126
collcat 2      43.5002194 B      12.95136339      3.36     0.0010
collcat 3       0.0000000 B                .         .          .

--------------------------- Percentage free meals in 3 categories=2 ---------------------------
Parameter          Estimate   Standard Error   t Value   Pr > |t|
Intercept     655.6376812 B       9.56955140     68.51     <.0001
collcat 1     -66.2876812 B      20.18699080     -3.28     0.0013
collcat 2     -19.0330300 B      15.44423364     -1.23     0.2201
collcat 3       0.0000000 B                .         .          .

--------------------------- Percentage free meals in 3 categories=3 ---------------------------
Parameter          Estimate   Standard Error   t Value   Pr > |t|
Intercept     541.7333333 B      15.85282299     34.17     <.0001
collcat 1     -47.8144144 B      17.38544270     -2.75     0.0068
collcat 2     -32.9000000 B      18.16169033     -1.81     0.0723
collcat 3       0.0000000 B                .         .          .
Collcat treated as a continuous predictor, by level of mealcat

--------------------------- Percentage free meals in 3 categories=1 ---------------------------
Parameter        Estimate   Standard Error   t Value   Pr > |t|
Intercept     846.7258122      15.82436044     53.51     <.0001
collcat       -19.1860050       6.92522301     -2.77     0.0064

--------------------------- Percentage free meals in 3 categories=2 ---------------------------
Parameter        Estimate   Standard Error   t Value   Pr > |t|
Intercept     568.3453514      23.42563148     24.26     <.0001
collcat        29.9629828       9.43915244      3.17     0.0019

--------------------------- Percentage free meals in 3 categories=3 ---------------------------
Parameter        Estimate   Standard Error   t Value   Pr > |t|
Intercept     471.4485767      13.14921361     35.85     <.0001
collcat        20.9839302       7.68577874      2.73     0.0072
6.3 Simple Comparisons
• We can use the contrast and estimate statements to compare the conditional effect of collcat within different levels of mealcat.
• If a factor has more than two levels, we may also consider whether its effect is linear or non-linear.
Comparison of collcat group 1 vs 2+ within a given mealcat

proc glm data = datamath2;
  class collcat mealcat;
  model math00 = collcat mealcat collcat*mealcat / ss3;
  estimate 'collcat 1 vs 2+ overall mealcat'
           collcat 1 -.5 -.5;
  estimate 'collcat 1 vs 2+ within mealcat = 1'
           collcat 1 -.5 -.5
           collcat*mealcat 1 0 0  -.5 0 0  -.5 0 0;
  estimate 'collcat 1 vs 2+ within mealcat = 2'
           collcat 1 -.5 -.5
           collcat*mealcat 0 1 0  0 -.5 0  0 -.5 0;
run;
quit;
SAS OUTPUT
The GLM Procedure
Dependent Variable: math00
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               8      6243714.810    780464.351    166.76   <.0001
Error             391      1829957.187      4680.197
Corrected Total   399      8073671.998

R-Square   Coeff Var   Root MSE   math00 Mean
0.773343    10.56356   68.41197      647.6225

Source             DF   Type III SS   Mean Square   F Value   Pr > F
collcat             2     42140.566     21070.283      4.50   0.0117
mealcat             2   4764843.563   2382421.781    509.04   <.0001
collcat*mealcat     4    124167.809     31041.952      6.63   <.0001

Parameter                                Estimate   Standard Error   t Value   Pr > |t|
collcat 1 vs 2+ overall mealcat       -25.0407827        8.3453884     -3.00     0.0029
collcat 1 vs 2+ within mealcat = 1     13.0132326       13.5279998      0.96     0.3367
collcat 1 vs 2+ within mealcat = 2    -56.7711662       16.6786557     -3.40     0.0007
Observations
• Overall, schools at college level 1 had lower math scores than the groups with more highly educated parents (estimate -25.04, p = 0.0029).
• In the low free-meal category (mealcat = 1, higher-income families) this does not hold: the difference reverses sign (13.01) and is not significant (p = 0.3367).
• In the middle free-meal category (mealcat = 2), schools with more highly educated parents had higher math scores than their peers (estimate -56.77, p = 0.0007).
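The three estimates are linear combinations of the cell means, so they can be checked by hand against the LSMEANS output. The following Python sketch is only such a check; `within` and `overall` are illustrative names, and the "overall" comparison is taken as the unweighted average over the three mealcat levels, which is what reproduces the reported value.

```python
# Cell means keyed by (collcat, mealcat), copied from the LSMEANS output.
cell = {
    (1, 1): 816.914286, (1, 2): 589.350000, (1, 3): 493.918919,
    (2, 1): 825.651163, (2, 2): 636.604651, (2, 3): 508.833333,
    (3, 1): 782.150943, (3, 2): 655.637681, (3, 3): 541.733333,
}


def within(m):
    """collcat 1 minus the average of collcat 2 and 3, at one mealcat level."""
    return cell[(1, m)] - 0.5 * (cell[(2, m)] + cell[(3, m)])


# Averaging the within-level comparisons with equal weights gives the
# "overall mealcat" estimate.
overall = sum(within(m) for m in (1, 2, 3)) / 3

print(f"overall          : {overall:.4f}")
for m in (1, 2, 3):
    print(f"within mealcat={m}: {within(m):.4f}")
```

The same function also gives the comparison within mealcat = 3, which the SAS program above does not request.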
6.4 Interaction Interaction is a kind of action that occurs as two or more objects have an effect upon one another. The idea of a two-way effect is essential in the concept of interaction, as opposed to a one-way causal effect. A closely related term is interconnectivity, which deals with the interactions of interactions within systems: combinations of many simple interactions can lead to surprising emergent phenomena. Interaction has different tailored meanings in various sciences. All systems are related and interdependent. Every action has a consequence.
Observation
• Overall, there is an interaction.
• Among white and black, there is an interaction.
• Among white and Asian, there is no interaction.
How can we use Proc GLM to study the interaction? Using the example in this chapter, we would like to see whether the effect of collcat on math score is homogeneous across the levels of mealcat.
Contrast in GLM

proc glm data = datamath2;
  class collcat mealcat;
  model math00 = collcat mealcat collcat*mealcat;
  contrast 'Diff Btw meal1 and meal 2 by Coll1 vs Colle2&3'
           collcat*mealcat 1 -1 0  -.5 .5 0  -.5 .5 0;
  contrast 'Diff Btw meal2 and meal 3 by Coll1 vs Colle2&3'
           collcat*mealcat 0 1 -1  0 -.5 .5  0 -.5 .5;
  contrast 'Diff Btw meal1 and meal 2 and meal 3 by Coll1 vs Colle2&3'
           collcat*mealcat 1 -1 0  -.5 .5 0  -.5 .5 0,
           collcat*mealcat 0 1 -1  0 -.5 .5  0 -.5 .5;
  contrast 'Diff among meals 1, 2, an 3 for coll 2 and 3'
           collcat*mealcat 0 0 0  1 -1 0  -1 1 0,
           collcat*mealcat 0 0 0  0 1 -1  0 -1 1;
  contrast 'Diff among College Lvevel, for meal 1 vs Meal 2&3 Comb'
           collcat*mealcat 1 -0.5 -0.5  -1 0.5 0.5  0 0 0,
           collcat*mealcat 1 -0.5 -0.5  0 0 0  -1 0.5 0.5;
run;
quit;
SAS Output
Dependent Variable: math00
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               8      6243714.810    780464.351    166.76   <.0001
Error             391      1829957.187      4680.197
Corrected Total   399      8073671.998

R-Square   Coeff Var   Root MSE   math00 Mean
0.773343    10.56356   68.41197      647.6225

Source             DF     Type I SS   Mean Square   F Value   Pr > F
collcat             2    612289.838    306144.919     65.41   <.0001
mealcat             2   5507257.164   2753628.582    588.36   <.0001
collcat*mealcat     4    124167.809     31041.952      6.63   <.0001

Source             DF   Type III SS   Mean Square   F Value   Pr > F
collcat             2     42140.566     21070.283      4.50   0.0117
mealcat             2   4764843.563   2382421.781    509.04   <.0001
collcat*mealcat     4    124167.809     31041.952      6.63   <.0001
Contrasts
Contrast                                                     DF   Contrast SS   Mean Square   F Value   Pr > F
Diff Btw meal1 and meal 2 by Coll1 vs Colle2&3                1    49420.4056    49420.4056     10.56   0.0013
Diff Btw meal2 and meal 3 by Coll1 vs Colle2&3                1     6807.2615     6807.2615      1.45   0.2285
Diff Btw meal1 and meal 2 and meal 3 by Coll1 vs Colle2&3     2    54141.4096    27070.7048      5.78   0.0033
Diff among meals 1, 2, an 3 for coll 2 and 3                  2    66511.6013    33255.8007      7.11   0.0009
Diff among College Lvevel, for meal 1 vs Meal 2&3 Comb        2   113539.2619    56769.6309     12.13   <.0001
6.5 More Interactions

proc glm data = datamath2;
  class collcat mealcat;
  model math00 = collcat mealcat collcat*mealcat;
  contrast 'collcat 2v3 with mealcat 1v2'
           collcat*mealcat 0 0 0  1 -1 0  -1 1 0;
  contrast 'collcat 2v3 with mealcat 2v3'
           collcat*mealcat 0 0 0  0 1 -1  0 -1 1;
run;
quit;
Contrast Test
<output omitted>
Contrast                        DF   Contrast SS   Mean Square   F Value   Pr > F
collcat 2v3 with mealcat 1v2     1   48958.23687   48958.23687     10.46   0.0013
collcat 2v3 with mealcat 2v3     1    1535.28987    1535.28987      0.33   0.5671
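To make the two contrasts in section 6.5 concrete, the sketch below applies the same coefficient vectors to the cell means from the LSMEANS output in section 6.1 (ordered collcat-major, as in the contrast statements) and recomputes each F value as contrast mean square over the error mean square 4680.197. It is a Python arithmetic check with illustrative names, not a replacement for the SAS tests.

```python
# Cell means in the order (1,1) (1,2) (1,3) (2,1) ... (3,3), i.e. collcat
# varies slowest, matching the coefficient order in the contrast statements.
means = [
    816.914286, 589.350000, 493.918919,
    825.651163, 636.604651, 508.833333,
    782.150943, 655.637681, 541.733333,
]
mse = 4680.197  # error mean square from the model without emer

contrasts = {
    "collcat 2v3 with mealcat 1v2": ([0, 0, 0, 1, -1, 0, -1, 1, 0], 48958.23687),
    "collcat 2v3 with mealcat 2v3": ([0, 0, 0, 0, 1, -1, 0, -1, 1], 1535.28987),
}

results = {}
for label, (coefs, contrast_ss) in contrasts.items():
    # Contrast estimate: a difference of differences among four cell means.
    estimate = sum(c * m for c, m in zip(coefs, means))
    # Each contrast has 1 df, so its mean square equals its sum of squares.
    f_value = contrast_ss / mse
    results[label] = (estimate, f_value)
    print(f"{label}: estimate = {estimate:.4f}, F = {f_value:.2f}")
```

The first estimate says the collcat 2 vs 3 difference changes by roughly 62.5 points between mealcat 1 and 2, while the second shows a much smaller change between mealcat 2 and 3, in line with the two p-values.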
6.6 Computing Adjusted Means

proc glm data = datamath2;
  class collcat mealcat;
  model math00 = collcat mealcat collcat*mealcat emer / ss3;
  lsmeans collcat*mealcat;
run;
quit;

Note: least squares means do not weight the categories by their sample sizes; each cell is treated as if the sample were balanced across the categories. With the covariate emer in the model, the cell means are adjusted for emer (by default, evaluated at its overall mean).
SAS output
The GLM Procedure
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               9      6402428.265    711380.918    166.01   <.0001
Error             390      1671243.733      4285.240
Corrected Total   399      8073671.998

R-Square   Coeff Var   Root MSE   math00 Mean
0.793001    10.10801   65.46175      647.6225

Source             DF   Type III SS   Mean Square   F Value   Pr > F
collcat             2     34730.090     17365.045      4.05   0.0181
mealcat             2   3017331.845   1508665.923    352.06   <.0001
collcat*mealcat     4     96789.116     24197.279      5.65   0.0002
emer                1    158713.455    158713.455     37.04   <.0001

collcat   mealcat   math00 LSMEAN
1         1            797.560428
1         2            596.972811
1         3            509.872241
2         1            812.550248
2         2            636.404940
2         3            523.884659
3         1            767.935241
3         2            652.976146
3         3            550.461628