Lab 4: Intercept, Variable Centered & Interaction
Henian Chen, M.D., Ph.D.
Applied Epidemiologic Analysis - P8400, Fall 2002
Description of the Life Satisfaction Data

Variable  Category     %     N   Mean  Range
LIFESAT                    751     74  10 ~ 100
Age                        751     22  17 ~ 28
Sex                        751
          Female=0  50.2   377
          Male=1    49.8   374
Income                     751
          Low=0     57.0   428
          High=1    43.0   323
Intercept
The Y intercept is the estimated value of Y when all Xi = 0.

Interaction
The circumstance in which the impact of one variable on Y is conditional on (varies across) the values of another predictor.
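To make the definition of interaction concrete, consider a model with two predictors and their product. X1 and X2 here are generic placeholders, not variables from the lab data. Regrouping the terms shows that the slope of X1 is itself a linear function of X2:

```latex
\begin{aligned}
Y &= \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 \\
  &= (\alpha + \beta_2 X_2) + (\beta_1 + \beta_3 X_2)\, X_1 \\
\frac{\partial Y}{\partial X_1} &= \beta_1 + \beta_3 X_2
\end{aligned}
```

If β3 = 0, the slope of X1 is β1 at every value of X2 (no interaction); otherwise it changes by β3 for each one-unit increase in X2. Setting X2 = 0 recovers β1, so β1 is the slope of X1 only where X2 = 0.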
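Centering
Centering means subtracting the sample mean of a variable X from each subject's score on X:

x = X - Mx

Center X if X does not have a meaningful zero. A centered variable x has mean zero, so the regression of Y on x at x = 0 (the intercept) becomes meaningful: it is the estimated Y at the mean of X (with any other predictors at 0).

Y = α + β1 age
Y = α + β1 age + β2 X + β3 age*X   (X is a second predictor)

In the interaction model:
- β1: regression of Y on age at X = 0
- β2: regression of Y on X at age = 0

Because no subject is age 0, β2 is not meaningful with raw age; with centered age, β2 is the regression of Y on X at the mean age.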
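A minimal sketch, in Python rather than SAS, of the claim that centering moves the intercept but leaves the slope unchanged. The ages and scores are made-up toy values, not the lab data, and `ols_simple` is a hypothetical helper that fits simple least squares with the textbook formulas.

```python
from statistics import mean

def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Toy data (hypothetical, for illustration only)
age = [17, 19, 21, 22, 24, 26, 28]
lifesat = [78, 75, 77, 73, 74, 70, 72]

# Raw age: the intercept is the prediction at age 0, far outside the data
a_raw, b_raw = ols_simple(age, lifesat)

# Mean-centered age: x = X - Mx, so x = 0 is the average subject
m_age = mean(age)
agec = [a - m_age for a in age]
a_cen, b_cen = ols_simple(agec, lifesat)

print(f"raw:      intercept={a_raw:.3f}, slope={b_raw:.3f}")
print(f"centered: intercept={a_cen:.3f}, slope={b_cen:.3f}")
print(f"mean of Y = {mean(lifesat):.3f}")
```

Both fits report the same slope; in a simple regression on a mean-centered predictor the intercept is exactly the mean of Y, and the raw intercept equals the centered one minus slope × Mx.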
SAS Program

```sas
/* Read the tab-delimited data; variable names come from the first row */
proc import datafile='a:life-satisfaction751.txt' out=lifesat dbms=tab replace;
  getnames=yes;
run;

data lifesat1;
  set lifesat;
  agec  = age - 22;        /* age centered at the (rounded) sample mean, 22 */
  age17 = age - 17;        /* 0 at the youngest observed age */
  age28 = age - 28;        /* 0 at the oldest observed age */
  sex_inco = sex*income;   /* sex-by-income interaction term */
run;

proc reg data=lifesat1;
  model lifesat= ;                   /* model 1: intercept only */
  model lifesat=sex;                 /* model 2 */
  model lifesat=income;              /* model 3 */
  model lifesat=age;                 /* model 4 */
  model lifesat=age17;               /* model 5 */
  model lifesat=age28;               /* model 6 */
  model lifesat=agec;                /* model 7 */
  model lifesat=sex income;          /* model 8 */
  model lifesat=sex income sex_inco; /* model 9: with interaction */
run;
```
Model 2

```sas
proc reg data=lifesat;
  model lifesat=sex;
run;
```

Dependent Variable: LIFESAT

Parameter Estimates
                    Parameter   Standard
Variable      DF     Estimate      Error   t Value   Pr > |t|
Intercept      1     74.98674    0.77556     96.69     <.0001
SEX            1     -1.94529    1.09901     -1.77     0.0771

74.98674 is the average life satisfaction score for females (SEX = 0).
-1.94529 is the difference in average life satisfaction score between males and females: males score 1.94529 points lower than females.
Males' average score = 74.98674 - 1.94529 = 73.04145.
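A small Python sketch of the arithmetic behind this slide, using only the printed Model 2 estimates. With a 0/1 dummy, each group's fitted mean is the intercept plus the coefficient times the dummy value, and each t value is the estimate divided by its standard error. The helper `fitted_mean` is a hypothetical name for illustration.

```python
# Estimates and standard errors copied from the Model 2 output
intercept, b_sex = 74.98674, -1.94529
se_intercept, se_sex = 0.77556, 1.09901

def fitted_mean(sex):
    """Fitted life satisfaction score for SEX coded 0 (female) or 1 (male)."""
    return intercept + b_sex * sex

female = fitted_mean(0)   # equals the intercept
male = fitted_mean(1)     # intercept + SEX coefficient

# Each t value in the table is the estimate divided by its standard error
t_intercept = intercept / se_intercept
t_sex = b_sex / se_sex

print(f"female mean   = {female:.5f}")  # 74.98674
print(f"male mean     = {male:.5f}")    # 73.04145
print(f"male - female = {male - female:.5f}")
print(f"t values: {t_intercept:.2f}, {t_sex:.2f}")
```

The printed t values round to the 96.69 and -1.77 shown in the SAS table.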
Model 4

```sas
proc reg data=lifesat;
  model lifesat=age;
run;
```

Dependent Variable: LIFESAT

Parameter Estimates
                    Parameter   Standard
Variable      DF     Estimate      Error   t Value   Pr > |t|
Intercept      1     80.63050    4.49283     17.95     <.0001
AGE            1     -0.29990    0.20223     -1.48     0.1385

80.63050 is the estimated average life satisfaction score for subjects at age 0. This intercept is not meaningful for us because no subject in our data is age 0 (observed ages range from 17 to 28).
-0.29990 means that, on average, each additional year of age between 17 and 28 is associated with a 0.29990-point decrease in life satisfaction score.
Model 5

```sas
proc reg data=lifesat1;   /* age17 is created in data set lifesat1 */
  model lifesat=age17;    /* age17 = age - 17 */
run;
```

Dependent Variable: LIFESAT

Parameter Estimates
                    Parameter   Standard
Variable      DF     Estimate      Error   t Value   Pr > |t|
Intercept      1     75.53220    1.15972     65.13     <.0001
AGE17          1     -0.29990    0.20223     -1.48     0.1385

75.53220 is the estimated average life satisfaction score for subjects at age 17. The regression coefficient is the same as in Model 4 (-0.29990, with the same standard error and p-value); only the intercept has changed.
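Models 5 to 7 are reparameterizations of Model 4. Writing age = (age - c) + c gives Y = (α + β1·c) + β1·(age - c), so the slope stays β1 and the new intercept is α + β1·c, the fitted score at age c. The Python sketch below uses only the Model 4 estimates; `shifted_intercept` is a hypothetical helper, and the values for c = 22 and c = 28 are what this algebra implies (up to rounding of the printed estimates), not SAS output.

```python
# Estimates copied from the Model 4 output (raw age)
alpha, beta_age = 80.63050, -0.29990

def shifted_intercept(c):
    """Intercept after replacing age by (age - c).

    Y = alpha + beta*age = (alpha + beta*c) + beta*(age - c),
    so the slope is unchanged and the new intercept is alpha + beta*c.
    """
    return alpha + beta_age * c

for label, c in [("age17 (Model 5)", 17), ("agec (Model 7)", 22), ("age28 (Model 6)", 28)]:
    print(f"{label:16s} c={c}: intercept = {shifted_intercept(c):.4f}, slope = {beta_age}")
```

The first line reproduces the Model 5 intercept of 75.53220; the slope printed on every line is the same -0.2999.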
Two-way Interaction

Lifesat = α + β1 sex + β2 income + β3 sex*income

- Female (0), low income (0): Life Satisfaction score = α + β1*0 + β2*0 + β3*0*0 = α
- Female (0), high income (1): Life Satisfaction score = α + β1*0 + β2*1 + β3*0*1 = α + β2
- Male (1), low income (0): Life Satisfaction score = α + β1*1 + β2*0 + β3*1*0 = α + β1
- Male (1), high income (1): Life Satisfaction score = α + β1*1 + β2*1 + β3*1*1 = α + β1 + β2 + β3
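The four cell expressions can be inverted: α is the female/low-income cell, β1 and β2 are differences from that cell, and β3 is a difference of differences (the income effect among males minus the income effect among females). This Python sketch recovers the coefficients from four cell means and rebuilds the cells. The cell means are hypothetical numbers chosen for illustration, not results from the lab data, and both helper names are hypothetical.

```python
import math

def coefs_from_cells(m):
    """m maps (sex, income) -> mean score, with the slide's 0/1 coding."""
    alpha = m[(0, 0)]
    beta1 = m[(1, 0)] - m[(0, 0)]              # male - female, at low income
    beta2 = m[(0, 1)] - m[(0, 0)]              # high - low income, among females
    beta3 = (m[(1, 1)] - m[(1, 0)]) - beta2    # income effect: males minus females
    return alpha, beta1, beta2, beta3

def cell_mean(alpha, beta1, beta2, beta3, sex, income):
    """Lifesat = alpha + beta1*sex + beta2*income + beta3*sex*income."""
    return alpha + beta1 * sex + beta2 * income + beta3 * sex * income

# Hypothetical cell means (illustration only)
cells = {(0, 0): 73.0, (0, 1): 77.0, (1, 0): 72.0, (1, 1): 74.0}

alpha, beta1, beta2, beta3 = coefs_from_cells(cells)
print(alpha, beta1, beta2, beta3)  # 73.0 -1.0 4.0 -2.0

# The model reproduces every cell mean exactly (four cells, four coefficients)
for (sex, income), m in cells.items():
    assert math.isclose(cell_mean(alpha, beta1, beta2, beta3, sex, income), m)
```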
Three-way Interaction

Three independent variables: A, B, C

Y = α + β1 A + β2 B + β3 C + β4 AB + β5 AC + β6 BC + β7 ABC

All lower-order terms must be included in the regression model for the β7 coefficient to represent the effect of the three-way interaction on Y.

To test a d-way interaction, the model must include:
- all main effect variables
- all two-way interactions
- all three-way interactions
- ...
- all (d-1)-way interactions

even if some of them are not significant.
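The rule generalizes to any d: the model for a d-way interaction contains every non-empty subset of the d variables, 2^d - 1 terms in all (7 for A, B, C, matching β1 to β7). A short Python sketch that lists them with `itertools.combinations`; `hierarchical_terms` is a hypothetical helper, and the strings are term labels, not SAS syntax.

```python
from itertools import combinations

def hierarchical_terms(variables):
    """All terms needed to test the highest-order interaction of `variables`:
    every main effect plus every interaction of order 2..d (written A*B)."""
    terms = []
    for order in range(1, len(variables) + 1):
        for combo in combinations(variables, order):
            terms.append("*".join(combo))
    return terms

terms = hierarchical_terms(["A", "B", "C"])
print(terms)       # ['A', 'B', 'C', 'A*B', 'A*C', 'B*C', 'A*B*C']
print(len(terms))  # 7
```

Dropping any lower-order label from this list would change what the coefficient on the highest-order term means.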