Lecture 16: Interactions March 17, 2014
Question How relevant do you think regression will be to your career? • Not relevant at all • Might be relevant but I don’t see it • Probably relevant • Definitely relevant • I have no idea
Administrative • Exam 2: Two weeks from today • Multiple regression + categorical variables + diagnostics • Similar format to Exam 1: open hardcopy notes and book. • Problem Set 6 due tomorrow at noon • Problem Set 7 posted tonight and due next Wednesday. • Wednesday: • Quiz: multiple regression + categorical variables • Readings posted on multiple regression diagnostics • Read them before Wednesday’s class
Last time Categorical variables • Multiple regression with an indicator (dummy) variable and a variable for years of experience: • What is the intercept? • What does the coefficient on Female mean?
Categorical Variables What if your category has more than 2 possible options? • Split it up into multiple dummy variables. • By hand (IF functions in Excel) or StatTools When you split a categorical variable into multiple indicator variables there are a couple of things to always remember: • You can NOT include all possible indicator variables in the regression equation. Why? • There would be perfect collinearity. Therefore you must exclude one group (i.e., one dummy variable) • Because you’re excluding one of the possible dummy variables, all of your coefficients will be relative to that excluded group. • Which one to exclude is up to you (unless I tell you), but in practice it’s about interpretation – i.e., choosing the comparison group.
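A minimal sketch of the split-into-dummies step, using a made-up 3-level category (the variable names here are illustrative, not from the course data):

```python
# Hypothetical 3-level category: department of each employee.
depts = ["Sales", "Tech", "Admin", "Tech", "Sales"]

# Choose "Admin" as the excluded baseline group; create one 0/1 indicator
# per REMAINING level. Including all three would be perfectly collinear,
# because the three columns would always sum to 1 (duplicating the intercept).
is_sales = [1 if d == "Sales" else 0 for d in depts]
is_tech = [1 if d == "Tech" else 0 for d in depts]

print(is_sales)  # [1, 0, 0, 0, 1]
print(is_tech)   # [0, 1, 0, 1, 0]
```

Coefficients on `is_sales` and `is_tech` would then be interpreted relative to the excluded "Admin" group.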
Example • Remember that including a dummy allows the intercept to vary by group: How do we let the slope vary by group?
Varying Slopes by Group:Interactions between explanatory variables Let’s return to the example of estimating managerial earnings by years of experience, gender, and if they are senior (5+ years experience): http://www.qatar.cmu.edu/~gasper/Regression/managers.xls • If we want all the slopes of our explanatory variables to vary by group, for instance Gender, then we could do a subset analysis • Estimate the model twice: • once the subset of data where Male=1 • once the subset of data where Male=0 • Alternatively if we just want one slope to vary, we “interact” the dummy for Male with that explanatory variable (e.g., years of experience) • In Excel create a new variable = Years * Male • Estimate the model including ALL THREE variables: Male, YearsExperience, Years*Male.
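The interact-and-estimate recipe above can be sketched in a few lines. This uses simulated data, not the actual managers.xls data, and fits the model by least squares:

```python
import numpy as np

# Made-up data standing in for the managers example.
rng = np.random.default_rng(0)
n = 200
years = rng.uniform(0, 10, n)
male = rng.integers(0, 2, n).astype(float)
# Simulated truth: intercept AND slope differ by gender, plus noise.
salary = 100 + 4 * years + 5 * male + 1.5 * male * years + rng.normal(0, 2, n)

# The interaction variable is just the elementwise product Years * Male.
interaction = years * male

# Design matrix with ALL THREE variables: Male, Years, and Years*Male.
X = np.column_stack([np.ones(n), years, male, interaction])
b0, b1, b2, b3 = np.linalg.lstsq(X, salary, rcond=None)[0]
print(b0, b1, b2, b3)  # estimates near 100, 4, 5, 1.5
```

The key point is that the interaction column is computed from the other two, exactly like creating `= Years * Male` in a new Excel column.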
Question Estimate the model with an interaction between Years and Male. What is the estimated difference between salaries of new employees for women and men? • 130.99 • 4.61 • -0.42 • 3.49 • I don’t know
Interactions In Excel you’ll need to create the variable yourself:
Interactions Warning: Estimating a model with interactions is easy…but interpreting the estimated intercept and slopes can get tricky. • Not hard, but caution is urged: think. • The equation for the group coded as 0 forms the baseline for comparison. • The slope of the dummy variable is the difference between the intercepts (as before). The slope of the interaction is the difference between the estimated slopes.
Interpreting Interactions We’re estimating: • The estimated salary for a woman with 7 years of experience? • = β0 + β1 * (7) + β2 * (0) + β3 * (7 * 0) • = β0 + β1 * (7) • The estimated salary for a man with 6 years of experience? • = (β0 + β2) + (β1 + β3) * (6) • The slope of the dummy variable is the difference between the intercepts (as before). The slope of the interaction is the difference between the estimated slopes.
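The two predictions above can be checked by plugging into the fitted equation. The coefficient values below are invented for illustration, not the actual estimates from the managers data:

```python
# Fitted equation: Salary = b0 + b1*Years + b2*Male + b3*Years*Male
# Hypothetical coefficient values, chosen only to make the arithmetic visible.
b0, b1, b2, b3 = 120.0, 4.0, 5.0, 1.5

def predict(years, male):
    """Predicted salary from the interaction model."""
    return b0 + b1 * years + b2 * male + b3 * years * male

# Woman (Male = 0) with 7 years: the dummy and interaction terms drop out.
print(predict(7, 0))  # 120 + 4*7 = 148.0
# Man (Male = 1) with 6 years: both the intercept and slope shift.
print(predict(6, 1))  # (120 + 5) + (4 + 1.5)*6 = 158.0
```

Note how for the Male = 1 group the effective intercept is (b0 + b2) and the effective slope on Years is (b1 + b3), matching the slide.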
Varying slope / varying intercept • Plotting both:
Question What is the VIF of the Group X Years interaction term in the previous model? • 5.33 • 13.07 • 1.64 • 5.68 • I have no idea what’s going on.
Collinearity with Interactions • Adding an interaction, by definition, adds collinearity to the model. • The high VIF of 13.07 for the interaction term is an indication. • Remember, collinearity isn’t always bad. Know that it’s there • Should we keep the interaction term in the model? • I’d say don’t keep it. • Since the interaction term isn’t significant and we don’t have a really strong theoretical reason to include it, remove it from the model.
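The VIF of any predictor, including an interaction term, comes from regressing it on the other predictors: VIF = 1 / (1 − R²). A sketch on made-up data (the 13.07 on the slide comes from the actual managers model, not from this simulation):

```python
import numpy as np

# Made-up predictors: an interaction is built FROM the other columns,
# so it is mechanically correlated with them.
rng = np.random.default_rng(1)
n = 200
years = rng.uniform(0, 10, n)
male = rng.integers(0, 2, n).astype(float)
inter = years * male

def vif(target, others):
    """VIF = 1 / (1 - R^2) from regressing target on the other predictors."""
    X = np.column_stack([np.ones(len(target))] + others)
    fitted = X @ np.linalg.lstsq(X, target, rcond=None)[0]
    ss_res = np.sum((target - fitted) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    return 1 / (1 - r2)

# The product term is well predicted by its constituents, so its VIF
# is typically well above 1.
print(vif(inter, [years, male]))
```

This is why adding an interaction "by definition" adds collinearity: the new column is a deterministic function of columns already in the model.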
Interactions • When should you include interactions? • …when you think you should. • If you’re exploring the data, interacting main effects is often a good idea. • In general: • If the interaction is statistically significant, retain it in the model along with both constituent terms, regardless of their level of significance. • If the interaction is not significant, remove it and re-estimate. Interaction terms can be enlightening, but they are also more complicated to interpret; all else equal, simpler is better. • You aren’t limited to interacting 2 terms. • You can interact 3, 4, … k different terms, but it gets very confusing very quickly. • We won’t do it in this class.
Common Mistakes Some common errors: • Omitting variables that are part of the interaction • If you want just the intercept to vary, include just the dummy • If you want both the intercept and slope to vary by group, include both constituent terms and the interaction term • If you want (oddly) the slope to vary but force the intercept to be the same, include only one constituent term and the interaction • Failing to note the conditional nature of the coefficients. • Partial slopes of the constituent terms are conditional on the other constituent term being 0
Interacting Continuous Variables • So far we’ve only interacted a Dummy predictor with a continuous predictor. • You can certainly interact two continuous (or two dummy) predictors. • I don’t believe this is covered in the text but can be quite useful • It’s no harder; it’s exactly the same. • Conceptually: the effect of one independent variable on the dependent variable changes gradually as you change another independent variable • However…it’s easier to make mistakes when interpreting results
Interacting Continuous Variables House Price example: • How do I interpret the partial slopes? • Same as before, but β1 and β2 are a little funky… • β1 = the change in Price from increasing Size by 1, when BRooms = 0 • β2 = the change in Price from increasing BRooms by 1, when Size = 0
Interacting Continuous Variables House Price example: • Another way to think about it: What’s the slope on Size? = β1 + β3 * BRooms Slope on BRooms? = β2 + β3 * Size
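The conditional-slope idea can be made concrete with a tiny helper. The coefficient values here are invented for illustration, not estimates from any house-price data:

```python
# Slope on Size in the model Price = b0 + b1*Size + b2*BRooms + b3*Size*BRooms
# depends on BRooms: slope = b1 + b3*BRooms. Hypothetical coefficients:
b1, b3 = 50.0, 10.0  # b1 alone = price per unit of Size when BRooms = 0

def slope_on_size(brooms):
    """Marginal effect of Size on Price at a given number of bedrooms."""
    return b1 + b3 * brooms

print(slope_on_size(0))  # 50.0 (b1 by itself: a zero-bedroom house, rarely meaningful)
print(slope_on_size(3))  # 80.0 (each extra bedroom steepens the Size slope by b3)
```

This is why β1 on its own is "funky": it is the Size slope only at BRooms = 0, a point that may be far outside the data.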
Standardizing Variables • Given the slightly funky nature of interpreting constituent terms of an interaction, it’s often a good idea to re-center or standardize the variables: • Mean centering: construct a new variable that subtracts off the mean of that variable (observation – mean). • Centers the variable around 0 • Now the interpretation of the partial slope of the constituent term is the effect for the average level of that predictor • Z-score standardizing: mean center and divide by the standard deviation. • Why? Now the partial slope is the effect of increasing the variable by 1 standard deviation rather than by 1 unit. Makes comparisons between predictors easier.
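Both transformations are one-liners. A sketch on a toy list of house sizes (any numeric observations work the same way):

```python
import statistics

# Toy data: five house sizes. Mean-centering shifts the variable so that
# 0 means "average-sized"; z-scoring also rescales to standard-deviation units.
size = [1200, 1500, 1800, 2100, 2400]

mean = sum(size) / len(size)       # 1800.0
sd = statistics.stdev(size)        # sample standard deviation

centered = [x - mean for x in size]        # observation - mean
zscored = [(x - mean) / sd for x in size]  # (observation - mean) / sd

print(centered)  # [-600.0, -300.0, 0.0, 300.0, 600.0]
print(round(sum(zscored) / len(zscored), 10))  # 0.0 (centered at zero)
```

After centering, the constituent term's partial slope in an interaction model is the effect at the average of the other predictor, rather than at the often-meaningless value 0.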