
Interactions


kevork

Presentation Transcript


  1. Interactions
     • Interaction: does the relationship between two variables depend on a third variable?
     • Does the relationship of age to BP depend on gender?
     • Does a certain BP-lowering drug work as well in blacks as in non-blacks?
     • Does the relationship between education and income differ by region of the country?
     Sometimes called “effect modification”.

  2. Model for FEV Example
     Y = b0 + b1X1 + b2X2
     X1 = smoking status (1 = smoker, 0 = non-smoker)
     X2 = age
     Smokers: FEV = b0 + b1 + b2 age
     Non-smokers: FEV = b0 + b2 age
     FEV (smokers) - FEV (non-smokers) = b1
     Assumes the slope of age is the same for smokers and non-smokers.

  3. [Figure: FEV (vertical axis) versus AGE (horizontal axis), with one line for non-smokers and one for smokers, annotated with b1 and b2.]

  4. Modeling Interaction for FEV Example
     Y = b0 + b1X1 + b2X2 + b3X3
     X1 = smoking status (1 = smoker, 0 = non-smoker)
     X2 = age
     X3 = age × smoking status
     Smokers: FEV = b0 + b1 + (b2 + b3) age
     Non-smokers: FEV = b0 + b2 age
     FEV (smokers) - FEV (non-smokers) = b1 + b3 age
     Ho: b3 = 0

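  5. Note: A difference in slopes implies that the smoker/non-smoker difference depends on age (and vice versa).
     [Figure: FEV versus AGE; the non-smokers line has slope b2, the smokers line has slope b2 + b3, and the gap between the lines is b1 + b3 age.]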

  6. DATA fev;
       INFILE DATALINES;
       INPUT age smk fev;
       agesmk = age*smk;  /* interaction term: equals age for smokers, 0 for non-smokers */
     DATALINES;
     28 1 4.0
     30 1 3.9
     30 1 3.7
     31 1 3.6

  7. PROC REG;
       MODEL fev = age;
       PLOT fev*age;
       WHERE smk=0;
       TITLE 'Non-smokers';
     RUN;
     PROC REG;
       MODEL fev = age;
       PLOT fev*age;
       WHERE smk=1;
       TITLE 'Smokers';
     RUN;

  8. SMOKERS
                        Parameter    Standard
     Variable     DF     Estimate       Error    t Value    Pr > |t|
     Intercept     1      5.50002     0.36163      15.21      <.0001
     age           1     -0.05508     0.00885      -6.22      <.0001

     NON-SMOKERS
                        Parameter    Standard
     Variable     DF     Estimate       Error    t Value    Pr > |t|
     Intercept     1      5.24764     0.38050      13.79      <.0001
     age           1     -0.03911     0.00887      -4.41      0.0007

     B1 for smokers = -0.05508
     B1 for non-smokers = -0.03911
     Is the difference between these slopes statistically significant?

  9. PROC REG;
       MODEL fev = age smk agesmk;
     RUN;

                        Parameter    Standard
     Variable     DF     Estimate       Error    t Value    Pr > |t|
     Intercept     1      5.24764     0.37846      13.87      <.0001
     age           1     -0.03911     0.00882      -4.43      0.0002
     smk           1      0.25238     0.52482       0.48      0.6346
     agesmk        1     -0.01597     0.01253      -1.27      0.2138

     Interpretation:
     B(agesmk) = -0.01597 is the difference in slopes between smokers and non-smokers
     B(age) = -0.03911 is the slope for non-smokers (smk=0)

     SMOKERS
     Intercept     1      5.50002     0.36163      15.21      <.0001
     age           1     -0.05508     0.00885      -6.22      <.0001

     NON-SMOKERS
     Intercept     1      5.24764     0.38050      13.79      <.0001
     age           1     -0.03911     0.00887      -4.41      0.0007
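The link between the interaction model and the two separate fits can be checked with simple arithmetic. The sketch below is in Python purely for illustration (the slides themselves use SAS). It plugs in the estimates printed above and recovers the smokers-only intercept and slope. The variable names and the helper `smoker_minus_nonsmoker` are hypothetical.

```python
# Estimates copied from the PROC REG output for MODEL fev = age smk agesmk.
b_intercept = 5.24764   # intercept for non-smokers (smk = 0)
b_age = -0.03911        # age slope for non-smokers
b_smk = 0.25238         # shift in intercept for smokers
b_agesmk = -0.01597     # shift in age slope for smokers (interaction term)

# Setting smk = 1 collapses the model to the smokers-only regression line.
smoker_intercept = b_intercept + b_smk
smoker_slope = b_age + b_agesmk


def smoker_minus_nonsmoker(age):
    """Model-based FEV difference (smokers minus non-smokers) at a given age."""
    return b_smk + b_agesmk * age


print(round(smoker_intercept, 5))  # 5.50002, the smokers-only intercept
print(round(smoker_slope, 5))      # -0.05508, the smokers-only slope
for age in (30, 50):
    print(age, round(smoker_minus_nonsmoker(age), 3))
```

The point estimates put the gap at b1 + b3 age, which widens with age. The interaction estimate has p = 0.2138, so these data do not give clear evidence that the slopes differ.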

  10. Polynomial Regression: Adding Quadratic Term
      Y = b0 + b1X + b2X²
      • Can be used if a linear relationship does not hold
      • Example: alcohol intake and mortality
      • Example: cholesterol and mortality
      • Add a quadratic (squared) term
      • Can test the hypothesis that the quadratic term is needed
      • Ho: b2 = 0
      • Ha: b2 ≠ 0

  11. Linear Regression Does not Fit Well

  12. Adding Quadratic Term
      PLOT mvo2kg*ffbw predicted.*ffbw / OVERLAY;

  13. PROC REG DATA = physfit;
        MODEL mvo2kg = ffbw;

      Analysis of Variance
                                 Sum of          Mean
      Source             DF     Squares        Square    F Value    Pr > F
      Model               1       22211         22211       3.33    0.0724
      Error              69      460225    6669.93228
      Corrected Total    70      482436

      Root MSE           81.66965    R-Square    0.0460
      Dependent Mean    455.26761    Adj R-Sq    0.0322
      Coeff Var          17.93882

      Variable     DF     Estimate          SE    t Value    Pr > |t|
      Intercept     1    382.51711    41.02856       9.32      <.0001
      ffbw          1      0.17710     0.09705       1.82      0.0724

  14. PROC REG DATA = physfit;
        MODEL mvo2kg = ffbw;
        MODEL mvo2kg = ffbw ffbw2;

      Analysis of Variance
                                 Sum of          Mean
      Source             DF     Squares        Square    F Value    Pr > F
      Model               2      113179         56589      10.42    0.0001
      Error              68      369257    5430.25411
      Corrected Total    70      482436

      Root MSE           73.69026    R-Square    0.2346
      Dependent Mean    455.26761    Adj R-Sq    0.2121
      Coeff Var          16.18614

      Parameter Estimates
                         Parameter      Standard
      Variable     DF     Estimate         Error    t Value    Pr > |t|
      Intercept     1    980.95393     150.82611       6.50      <.0001
      ffbw          1     -2.68220       0.70406      -3.81      0.0003
      ffbw2         1      0.00322    0.00078761       4.09      0.0001

      ffbw2 = ffbw * ffbw (computed in the DATA step)
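The linear and quadratic ANOVA tables can be tied together with an extra-sum-of-squares F test for the quadratic term. The sketch below, in Python purely for illustration, uses only numbers printed on the slides. With a single added term, the F statistic should match the square of the t value for ffbw2, up to rounding.

```python
import math

# Values copied from the two ANOVA tables (linear fit, then quadratic fit).
sse_linear = 460225.0        # Error SS, MODEL mvo2kg = ffbw
sse_quadratic = 369257.0     # Error SS, MODEL mvo2kg = ffbw ffbw2
mse_quadratic = 5430.25411   # Error mean square of the quadratic fit (68 df)
ss_total = 482436.0          # Corrected Total SS

# Extra-sum-of-squares F for adding one term (1 numerator df).
f_quadratic = (sse_linear - sse_quadratic) / mse_quadratic
t_equivalent = math.sqrt(f_quadratic)

# R-square of the quadratic fit, recomputed from the sums of squares.
r_square = 1.0 - sse_quadratic / ss_total

print(round(f_quadratic, 2), round(t_equivalent, 2), round(r_square, 4))
```

The F statistic comes out near 16.75, and its square root is about 4.09, matching the t value reported for ffbw2. The test of Ho: b2 = 0 can therefore be read straight off the parameter table.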

  15. Model Selection
      • When many predictors are measured, how do you decide which to include in your model?
      • Depends on the reason for fitting the model
      • Prediction? Examining specific effects?
      • Statistical criteria exist, but they should not be used in place of scientific criteria
      • Best used in an exploratory context

  16. Statistical principles to use
      • Forward, backward, and stepwise selection
      • Compare p-values of terms; add/remove based on α = 0.05 or 0.10
      • R² methods
      • Look for models with the highest R²
      • Other methods exist
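One caveat when looking for the highest R²: R² cannot decrease when a term is added, so larger models always look at least as good. The Adj R-Sq value that PROC REG prints adjusts for the number of terms. The sketch below, in Python for illustration only, applies the standard adjusted R² formula to the R-Square values from the linear and quadratic mvo2kg fits. The function name is hypothetical.

```python
def adjusted_r_square(r_square, n_obs, n_predictors):
    """Adjusted R-square for a model with an intercept and n_predictors terms."""
    return 1.0 - (1.0 - r_square) * (n_obs - 1) / (n_obs - n_predictors - 1)


n_obs = 71  # Corrected Total df of 70, plus 1

adj_linear = adjusted_r_square(0.0460, n_obs, 1)     # MODEL mvo2kg = ffbw
adj_quadratic = adjusted_r_square(0.2346, n_obs, 2)  # MODEL mvo2kg = ffbw ffbw2

print(round(adj_linear, 4), round(adj_quadratic, 4))
```

Both results agree with the Adj R-Sq values in the printed output (0.0322 and 0.2121).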

  17. Possible Uses for Statistical Criteria
      • Outcome: measure of teenage drinking
      • Many possible predictors
      • Questionnaire on relationships, friends, family, church support, etc.
      • Outcome: echocardiographically determined hypertrophy of the heart
      • Many possible ECG predictors
      • Computer measurements from the ECG

  18. Backward selection procedure
      Removes the worst variable, then the second worst, etc.
      PROC REG DATA = physfit;
        MODEL mvo2kg = male age hgt wgt ffbw rhr / selection=backward;
      RUN;

      Final model:
                     Parameter     Standard
      Variable        Estimate        Error    Type II SS    F Value    Pr > F
      Intercept      574.86126     56.50900        167151     103.49    <.0001
      male            88.90825     12.02381         88312      54.68    <.0001
      age             -6.85862      3.80692    5242.56660       3.25    0.0762
      wgt             -6.00865      1.02203         55827      34.56    <.0001
      ffbw             0.75073      0.12729         56184      34.79    <.0001
      rhr             -0.79442      0.41916    5801.82822       3.59    0.0625
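The idea of "remove the worst variable, refit, repeat" can be sketched in a few lines. The Python below is a simplified illustration, not SAS's implementation. It assumes a fixed partial-F cutoff (`f_to_stay`, a hypothetical parameter) in place of the p-value rule, and it runs on a small synthetic data set invented for this example, in which `x3` has no effect on `y`.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def sse(columns, y):
    """Residual SS from regressing y on an intercept plus the given columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    beta = solve(xtx, xty)  # least-squares coefficients via the normal equations
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def backward_select(data, y, f_to_stay=4.0):
    """Drop the weakest predictor until every partial F is at least f_to_stay."""
    kept = list(data)
    while kept:
        full = sse([data[name] for name in kept], y)
        mse = full / (len(y) - len(kept) - 1)
        partial_f = {
            name: (sse([data[v] for v in kept if v != name], y) - full) / mse
            for name in kept
        }
        worst = min(partial_f, key=partial_f.get)
        if partial_f[worst] >= f_to_stay:
            break  # every remaining term clears the cutoff
        kept.remove(worst)
    return kept


# Synthetic 2x2x2 design (hypothetical data): y depends on x1 and x2 only.
data = {
    "x1": [-1, -1, -1, -1, 1, 1, 1, 1],
    "x2": [-1, -1, 1, 1, -1, -1, 1, 1],
    "x3": [-1, 1, -1, 1, -1, 1, -1, 1],
}
noise = [0.5 * a * b * c for a, b, c in zip(data["x1"], data["x2"], data["x3"])]
y = [10 + 2 * a + 3 * b + e for a, b, e in zip(data["x1"], data["x2"], noise)]

print(backward_select(data, y))  # ['x1', 'x2']
```

In this constructed example the partial F for `x3` is essentially zero, so it is dropped first. `x1` and `x2` then clear the cutoff easily and the procedure stops.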

  19. Forward selection procedure
      Starts with the best single variable, then adds the next best, etc.
      PROC REG DATA = physfit;
        MODEL mvo2kg = male age hgt wgt ffbw rhr / selection=forward;
      RUN;
      In this example, it ends up including all terms except height
      • Exactly the same model as the one picked by backward selection

  20. “MAXR” method
      Selects several models based on maximal R²
      PROC REG DATA = physfit;
        MODEL mvo2kg = male age hgt wgt ffbw rhr / selection=maxr;
      RUN;
      • Will give the “best” models with 1, 2, 3, ... terms
      • You choose the best overall among the “best”

  21. Final models by MAXR method

  22. Two general principles to use
      • Parsimony - less is more
      • Common sense
      • Don’t use social security number to predict height!
      • Cautionary note
      • Models with several selected variables usually predict new data less well than their fit statistics suggest.
