Stat 112: Lecture 20 Notes

Stat 112: Lecture 20 Notes. Chapter 7.2: Interaction Variables. Chapter 8: Model Building. I will e-mail Homework 6 by Friday. It will be due on Friday, Dec. 1 st (the Friday after Thanksgiving). Interaction.

  1. Stat 112: Lecture 20 Notes • Chapter 7.2: Interaction Variables. • Chapter 8: Model Building. • I will e-mail Homework 6 by Friday. It will be due on Friday, Dec. 1st (the Friday after Thanksgiving)

  2. Interaction • Interaction is a three-variable concept. One of these is the response variable (Y) and the other two are explanatory variables (X1 and X2). • There is an interaction between X1 and X2 if the impact of an increase in X2 on Y depends on the level of X1. • To incorporate interaction in a multiple regression model with two continuous variables, we add the explanatory variable X1*X2. There is evidence of an interaction if the coefficient on X1*X2 is significant (t-test has p-value < .05).
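A minimal sketch of the idea on this slide, written in Python rather than JMP. The data are synthetic and noise-free, and the coefficients (1, 2, 3, 0.5) and helper names (`solve`, `fit_ols`, `slope_in_x2`) are hypothetical choices for illustration, not values from the course. It fits a model with the product term X1*X2 and shows that the effect of X2 on mean Y then depends on the level of X1.

```python
# Sketch: interaction between two continuous explanatory variables.
# All data and coefficients here are made up for illustration.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least squares via the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

# Synthetic response generated as Y = 1 + 2*X1 + 3*X2 + 0.5*X1*X2 (no noise)
points = [(x1, x2) for x1 in range(6) for x2 in range(6)]
y = [1 + 2 * x1 + 3 * x2 + 0.5 * x1 * x2 for x1, x2 in points]

# Design matrix columns: intercept, X1, X2, and the product X1*X2
design = [[1.0, x1, x2, x1 * x2] for x1, x2 in points]
b0, b1, b2, b12 = fit_ols(design, y)

def slope_in_x2(x1):
    """Change in mean Y per unit increase in X2, at a given level of X1."""
    return b2 + b12 * x1

# The X2 slope differs between X1 = 0 and X1 = 4 because b12 is nonzero.
print(round(slope_in_x2(0), 3), round(slope_in_x2(4), 3))
```

With a zero coefficient on X1*X2 the two printed slopes would coincide, which is the no-interaction case. The sketch does not compute the t-test; in practice that comes from the regression output.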

  3. Accidents Example • The number of car accidents on a stretch of highway seems to be related to the number of vehicles that travel over it and the speed at which they are traveling. • A city alderman has decided to ask the county sheriff for statistics covering the last few years, with the intention of examining these data statistically in order to introduce new speed laws that will reduce traffic accidents. • accidents.JMP contains data for different time periods on the number of cars passing along the stretch of road, the average speed of the cars and the number of accidents during the time period.

  4. Toy Factory Manager Data

  5. Model without Interaction The lines in the plot show the mean time for run given run size for each of the three managers. The lines are parallel because the model assumes no interaction.

  6. Interaction Model involving Categorical Variables in JMP • To add interactions involving categorical variables in JMP, follow the same procedure as with two continuous variables. Run Fit Model in JMP, add the usual explanatory variables first, then highlight one of the variables in the interaction in the Construct Model Effects box and highlight the other variable in the interaction in the Columns box and then click Cross in the Construct Model Effects box.

  7. Interaction Model

  8. Interaction Model • Interaction between run size and Manager: The effect on mean run time of increasing run size by one is different for different managers. • Effect Test for Interaction: • The Manager*Run Size effect test tests the null hypothesis that there is no interaction (the effect on mean run time of increasing run size is the same for all managers) vs. the alternative hypothesis that there is an interaction between run size and manager. p-value = 0.0333. Evidence that there is an interaction.

  9. The runs supervised by Manager a appear abnormally time consuming. Manager b has higher initial fixed setup costs than Manager c (186.565 > 149.706) but has lower per unit production time (0.136 < 0.259).
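The comparison of Managers b and c can be made concrete with a short Python sketch. It treats the four numbers quoted on the slide as the intercept and slope of each manager's fitted line and computes the run size at which the two lines cross. The function name `mean_run_time` and the run sizes 100 and 400 are illustrative choices, not part of the lecture.

```python
# Fitted lines for managers b and c, using the estimates quoted on the slide.
intercept_b, slope_b = 186.565, 0.136
intercept_c, slope_c = 149.706, 0.259

def mean_run_time(intercept, slope, run_size):
    """Estimated mean run time for a run of the given size."""
    return intercept + slope * run_size

# The lines cross where b's extra setup time is offset by its lower per-unit time.
break_even = (intercept_b - intercept_c) / (slope_c - slope_b)
print(f"break-even run size: {break_even:.1f}")

# Compare the two managers on either side of the crossing point.
for run_size in (100, 400):
    time_b = mean_run_time(intercept_b, slope_b, run_size)
    time_c = mean_run_time(intercept_c, slope_c, run_size)
    print(run_size, round(time_b, 2), round(time_c, 2))
```

Under these estimates the crossing falls just below a run size of 300: Manager c has the lower estimated mean time for smaller runs and Manager b for larger ones. That reversal is what an interaction between run size and manager looks like.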

  10. Interaction Profile Plot Lower left hand plot shows mean time for run vs. run size for the three managers a, b and c.

  11. Interactions Involving Categorical Variables: General Approach • First fit model with an interaction between categorical explanatory variable and continuous explanatory variable. Use effect test on interaction to see if there is evidence of an interaction. • If there is evidence of an interaction (p-value <0.05 for effect test), use interaction model. • If there is not strong evidence of an interaction (p-value >0.05 for effect test), use model without interactions. The model without interactions is easier to interpret but should only be used if there is not strong evidence for interactions.

  12. Example: A Sex Discrimination Lawsuit • Did a bank discriminatorily pay higher starting salaries to men than to women? Harris Trust and Savings Bank was sued by a group of female employees who accused the bank of paying lower starting salaries to women. The data in harrisbank.JMP are the starting salaries for all 32 male and all 61 female skilled, entry-level clerical employees hired by the bank between 1969 and 1977, as well as the education levels and sex of the employees.

  13. No evidence of an interaction between Sex and Education. Fit model without interactions.

  14. Discrimination Case Regression Results • Strong evidence that there is a difference in the mean starting salaries of women and men of the same education level. • Estimated difference: Men have 345.904 + 345.904 = $691.81 higher mean starting salaries than women of the same education level. • 95% confidence interval for mean difference = (2*$214.55, 2*$477.25) = ($429.10, $954.50). • Bank’s defense: Omitted variable bias. Variables such as Seniority, Age, Experience also need to be controlled for.
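The doubling on this slide can be checked with a few lines of Python. The sketch assumes, as the slide's arithmetic suggests, that the reported Sex coefficient (345.904) and its interval endpoints (214.55, 477.25) come from a +1/-1 coding of Sex, so that the male-female difference is twice the coefficient. That coding is an inference from the slide, not something stated in it.

```python
# Coefficient on Sex and its 95% confidence interval, as quoted on the slide.
# Assumption: Sex is coded +1 / -1, so men and women differ by 2 * coefficient.
coef = 345.904
ci_low, ci_high = 214.55, 477.25

# Estimated difference in mean starting salary at a fixed education level.
diff = 2 * coef

# Doubling a quantity doubles both endpoints of its confidence interval.
ci_diff = (2 * ci_low, 2 * ci_high)

print(f"difference: ${diff:.2f}")
print(f"95% CI: (${ci_diff[0]:.2f}, ${ci_diff[1]:.2f})")
```

As a consistency check, the midpoint of the doubled interval should be close to the doubled coefficient, since the interval is centered on the estimate.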

  15. Model Building • When we have many potential explanatory variables, how should we decide which to use? • Suppose our goal is to estimate the causal effect of a variable, controlling for all lurking variables (e.g., effect of pollution on mortality). Then it is best to include all possible lurking variables. Omitting variables, even if they do not appear significant, leads to potential bias. • Suppose our goal is to understand the association of certain variable(s) with a response, holding certain other variables fixed. We should think carefully about what variables we want to hold fixed. • Suppose our goal is to predict the response based on explanatory variables and we are not particularly interested in interpreting the coefficients on individual variables. Then it is a good idea to use only those explanatory variables which are of value for predicting the response. Using too many variables costs too many degrees of freedom and will hurt our out-of-sample predictions.

  16. Model Building for Prediction • The handout “Model Building for Prediction” discusses how to select a subset of the explanatory variables to use in the model when our goal is to make the best predictions, and we are not concerned with interpreting the coefficients of variables.
