
Lecture 16: Interactions






  1. Lecture 16: Interactions March 17, 2014

  2. Question How relevant do you think regression will be to your career? • Not relevant at all • Might be relevant but I don’t see it • Probably relevant • Definitely relevant • I have no idea

  3. Administrative • Exam 2: Two weeks from today • Multiple regression + categorical variables + diagnostics • Similar format to Exam 1: open hardcopy notes and book. • Problem Set 6 due tomorrow at noon • Problem Set 7 posted tonight and due next Wednesday. • Wednesday: • Quiz: multiple regression + categorical variables • Posted readings about multiple regression diagnostics • Read them before Wednesday's class

  4. Last time Categorical variables • Multiple regression with an indicator (dummy) and a variable for years of experience: • What is the intercept? • What does the coefficient on Female mean?
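(The slide's regression equation is not reproduced in the transcript; a plausible reconstruction, assuming Salary is the response and Female is the indicator: Estimated Salary = β0 + β1 * YearsExperience + β2 * Female.)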

  5. Categorical Variables What if your category has more than 2 possible options? • Split it up into multiple dummy variables. • By hand (IF functions in Excel) or with StatTools When you split a categorical variable into multiple indicator variables there are a couple of things to always remember: • You can NOT include all possible indicator variables in the regression equation. Why? • There will then be perfect collinearity. Therefore you must exclude one group (or dummy variable). • Because you're excluding one of the possible dummy variables, all of your coefficients will be relative to that excluded group. • Which one to exclude is up to you (unless I tell you), but in practice it's about interpretation – i.e., choosing the comparison group.
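A minimal sketch of this dummy-coding step in Python (pandas), using a made-up three-level Region variable; drop_first=True excludes one group, which then becomes the comparison group:

```python
import pandas as pd

# Hypothetical data: a categorical predictor with three levels.
df = pd.DataFrame({
    "Salary": [62, 58, 71, 66, 54, 69],
    "Region": ["North", "South", "West", "North", "South", "West"],
})

# drop_first=True leaves out one level (here 'North'), so the remaining
# dummies are not perfectly collinear with the intercept; coefficients on
# Region_South and Region_West are then relative to the excluded group.
dummies = pd.get_dummies(df["Region"], prefix="Region", drop_first=True)
df = pd.concat([df, dummies], axis=1)
print(df)
```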

  6. Example • Remember that including a dummy allows the intercept to vary by group: How do we let the slope vary by group?

  7. Varying Slopes by Group: Interactions between explanatory variables Let's return to the example of estimating managerial earnings by years of experience, gender, and whether they are senior (5+ years of experience): http://www.qatar.cmu.edu/~gasper/Regression/managers.xls • If we want all the slopes of our explanatory variables to vary by group, for instance gender, then we could do a subset analysis • Estimate the model twice: • once on the subset of data where Male = 1 • once on the subset of data where Male = 0 • Alternatively, if we just want one slope to vary, we "interact" the dummy for Male with that explanatory variable (e.g., years of experience) • In Excel, create a new variable = Years * Male • Estimate the model including ALL THREE variables: Male, YearsExperience, Years*Male.
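A minimal sketch of the same steps in Python (statsmodels), with made-up numbers rather than the actual managers.xls data; the interaction column is created by hand, just like the Excel step above:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical manager data (NOT the managers.xls values):
# Salary in $000s, years of experience, and Male (1 = male, 0 = female).
df = pd.DataFrame({
    "Salary": [131, 136, 140, 152, 134, 148, 139, 158],
    "Years":  [1, 3, 4, 7, 2, 6, 3, 9],
    "Male":   [0, 0, 1, 1, 0, 1, 0, 1],
})

# Create the interaction variable by hand, mirroring "= Years * Male" in Excel.
df["YearsXMale"] = df["Years"] * df["Male"]

# Estimate the model with all three variables: Years, Male, and the interaction.
model = smf.ols("Salary ~ Years + Male + YearsXMale", data=df).fit()
print(model.params)  # intercept, slope on Years, intercept shift for Male, slope difference
```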

  8. Question Estimate the model with an interaction between Years and Male. What is the estimated difference between the salaries of new employees for women and men? • 130.99 • 4.61 • -0.42 • 3.49 • I don’t know

  9. Interactions In Excel you’ll need to create the variable yourself:

  10. Interactions Warning: Estimating a model with interactions is easy… but interpreting the estimated intercept and slopes can get tricky. • Not hard, but caution is urged: think. • The equation for the group coded as 0 forms the baseline for comparison. • The slope of the dummy variable is the difference between the intercepts (as before). The slope of the interaction is the difference between the estimated slopes.

  11. Interpreting Interactions We’re estimating: Estimated Salary = β0 + β1 * Years + β2 * Male + β3 * (Years * Male) • The estimated salary for a woman with 7 years of experience? • = β0 + β1 * (7) + β2 * (0) + β3 * (7 * 0) • = β0 + β1 * (7) • The estimated salary for a man with 6 years of experience? • = (β0 + β2) + (β1 + β3) * (6) • The slope of the dummy variable is the difference between the intercepts (as before). The slope of the interaction is the difference between the estimated slopes.

  12. Varying slope / varying intercept • Plotting both:

  13. Question What is the VIF of the Group X Years interaction term in the previous model? • 5.33 • 13.07 • 1.64 • 5.68 • I have no idea what’s going on.

  14. Collinearity with Interactions • Adding an interaction, by definition, adds collinearity to the model. • The high VIF of 13.07 for the interaction term is an indication. • Remember, collinearity isn’t always bad. Just know that it’s there. • Should we keep the interaction term in the model? • I’d say don’t keep it. • Since the interaction term isn’t significant and we don’t have a strong theoretical reason to include it, remove it from the model.
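A minimal sketch of the VIF calculation in Python (statsmodels), reusing the made-up manager data from the earlier sketch; the exact VIFs depend on the data, so these will not reproduce the 13.07 from the slide:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical predictors (same made-up data as the earlier interaction sketch).
df = pd.DataFrame({
    "Years": [1, 3, 4, 7, 2, 6, 3, 9],
    "Male":  [0, 0, 1, 1, 0, 1, 0, 1],
})
df["YearsXMale"] = df["Years"] * df["Male"]

# Design matrix with an intercept, matching the regression specification.
X = sm.add_constant(df[["Years", "Male", "YearsXMale"]])

# VIF for each predictor (column 0 is the constant, so skip it).
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, round(variance_inflation_factor(X.values, i), 2))
```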

  15. Interactions • When should you include interactions? • …when you think you should. • If you’re exploring the data, interacting main effects is often a good idea. • In general: • If the interaction is statistically significant, retain it in the model, along with both constituent terms, regardless of their level of significance. • If the interaction is not significant, remove it and re-estimate. Interaction terms can be enlightening but also more complicated to interpret, and, all else equal, simpler is better. • You aren’t limited to interacting 2 terms. • You can interact 3, 4, … k different terms, but it gets very confusing very quickly. • We won’t do it in this class.

  16. Common Mistakes Some common errors: • Omitting variables that are part of the interaction • If you want just the intercept to vary, include just the dummy • If you want both the intercept and slope to vary by group, include both constituent terms and the interaction term • If you want (oddly) the slope to vary but force the intercept to be the same, include only one constituent term and the interaction • Failing to note the conditional nature of the coefficients. • Partial slopes of the constituent terms are conditional on the other constituent term being 0
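A sketch of the three specifications above in statsmodels formula syntax, with the hypothetical column names carried over from the earlier sketches:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data, as in the earlier sketches.
df = pd.DataFrame({
    "Salary": [131, 136, 140, 152, 134, 148, 139, 158],
    "Years":  [1, 3, 4, 7, 2, 6, 3, 9],
    "Male":   [0, 0, 1, 1, 0, 1, 0, 1],
})

# Intercept varies by group, common slope: include just the dummy.
m_intercept = smf.ols("Salary ~ Years + Male", data=df).fit()

# Intercept and slope both vary by group: both constituent terms plus the interaction.
m_both = smf.ols("Salary ~ Years + Male + Years:Male", data=df).fit()

# (Oddly) slope varies but the intercept is forced to be common: drop the dummy.
m_slope_only = smf.ols("Salary ~ Years + Years:Male", data=df).fit()
```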

  17. Interacting Continuous Variables • So far we’ve only interacted a dummy predictor with a continuous predictor. • You can certainly interact two continuous (or two dummy) predictors. • I don’t believe this is covered in the text, but it can be quite useful. • It’s no harder; it’s exactly the same. • Conceptually: the effect of one independent variable on the dependent variable changes gradually as you change another independent variable. • However… it’s easier to make mistakes when interpreting results.

  18. Interacting Continuous Variables House Price example: Estimated Price = β0 + β1 * Size + β2 * BRooms + β3 * (Size * BRooms) • How do I interpret the partial slopes? • Same as before, but β1 and β2 are a little funky… • β1 = the change in price from increasing Size by 1, when BRooms = 0 • β2 = the change in price from increasing BRooms by 1, when Size = 0

  19. Interacting Continuous Variables House Price example: • Another way to think about it: What’s the slope on Size? = β1 + β3 * BRooms • Slope on BRooms? = β2 + β3 * Size
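A minimal sketch in Python (statsmodels) with made-up house-price numbers (Price in $000s, Size in square feet); after fitting, the slope on Size is evaluated at a few bedroom counts using the formula above:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical house-price data (not from the course example).
df = pd.DataFrame({
    "Price":  [210, 250, 340, 300, 415, 380, 275, 330],
    "Size":   [1400, 1600, 2200, 1900, 2800, 2500, 1700, 2100],
    "BRooms": [2, 3, 4, 3, 5, 4, 3, 4],
})

# Size:BRooms is the product term; both main effects are included separately.
model = smf.ols("Price ~ Size + BRooms + Size:BRooms", data=df).fit()
b = model.params

# The slope on Size depends on BRooms: b1 + b3 * BRooms.
for brooms in (2, 3, 4):
    slope = b["Size"] + b["Size:BRooms"] * brooms
    print(f"Slope on Size at {brooms} bedrooms: {slope:.3f}")
```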

  20. Standardizing Variables • Given the slightly funky nature of interpreting constituent terms of an interaction, it’s often a good idea to re-center or standardize the variables: • Mean centering: construct a new variable that subtracts off the mean of that variable (observation – mean). • This centers the variable around 0. • Now the interpretation of the partial slope of the constituent term is the effect at the average level of that predictor. • Z-score standardizing: mean center and divide by the standard deviation. • Why? Now the partial slope is the effect of increasing the variable by 1 standard deviation rather than by 1 unit, which makes comparisons between predictors easier.
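A minimal sketch of both transformations in Python (pandas), on a hypothetical Size variable:

```python
import pandas as pd

# Hypothetical predictor to re-center / standardize.
df = pd.DataFrame({"Size": [1400, 1600, 2200, 1900, 2800, 2500]})

# Mean centering: observation - mean, so the new variable is centered at 0.
df["Size_centered"] = df["Size"] - df["Size"].mean()

# Z-score standardizing: mean center, then divide by the standard deviation,
# so a one-unit change in the new variable means one standard deviation.
df["Size_z"] = df["Size_centered"] / df["Size"].std()

print(df)
```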
