Lab 4

rwebster

Presentation Transcript


  1. Lab 4 Multiple Linear Regression

  2. Meaning • An extension of simple linear regression • It models the mean of a response variable as a linear function of several explanatory variables
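
Written out, the model described on this slide takes the form below. The notation is an assumption added for illustration (k explanatory variables X_1, ..., X_k and unknown coefficients β_0, ..., β_k); the slides themselves do not fix any symbols.

```latex
\mu(Y \mid X_1, \dots, X_k) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k
```

Simple linear regression is the special case k = 1.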

  3. Ways of analysis • Matrix of scatterplots • Matrix of correlations • Regression: fit the model (variable selection); interpret the model, t-test & F-test in regression; prediction; diagnostics (linearity, constant variance, normality, independence, outliers).

  4. The response and the independent variables • The response: iq • The independent variables: • MILK: 0=no breast milk, 1=yes • FEM: 0=male kid, 1=female • WEEKS: weeks on ventilation • SOCIAL: mum’s social class • 1,2,3,4 with 1 being the highest • RANK: birth order of the kid • EDUC: mum’s education level • 1,2,3,4,5 with 5 being the highest

  5. Matrix of scatterplots

  6. Correlation among iq, weeks, social, educ, rank

  7. Matrix of correlations
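
Outside the lab's menu-driven software, a correlation matrix can be sketched in a few lines of Python. This is only an illustration: the data are synthetic stand-ins generated on the spot (not the lab's data set), and the variable names merely echo the ones on the slides.

```python
import numpy as np

# Synthetic stand-ins for three of the lab's variables (NOT the real data).
rng = np.random.default_rng(0)
n = 200
weeks = rng.exponential(scale=3.0, size=n)
educ = rng.integers(1, 6, size=n).astype(float)   # levels 1..5
iq = 95 + 3.0 * educ - 1.0 * weeks + rng.normal(0, 5, size=n)

names = ["iq", "weeks", "educ"]
data = np.column_stack([iq, weeks, educ])

# rowvar=False: each column is a variable, each row an observation.
corr = np.corrcoef(data, rowvar=False)

for name, row in zip(names, corr):
    print(f"{name:>6}", " ".join(f"{r:6.2f}" for r in row))
```

Each off-diagonal entry is the Pearson correlation of one pair of variables, which is the numeric counterpart of one panel in the matrix of scatterplots.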

  8. Regression-fit the model • Procedure • Analyze → Regression → Linear • Methods of determining independent variables

  9. Methods (details in instruction 4 P18) • Enter: The model is obtained with all specified variables. This is the default method. • Stepwise • Remove • Backward: The variables are removed from the model one by one if they meet the criterion for removal (a maximum significance level or a minimum F value). • Forward:
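
The Backward method can be sketched as a loop. The code below is a minimal illustration, not the software's own implementation: it uses the "minimum F value" form of the removal criterion, the threshold `f_remove=2.71` is an assumed default, the helper names are hypothetical, and the data are synthetic.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit; X must already contain the intercept column."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t_stats = beta / np.sqrt(np.diag(cov))
    return beta, t_stats

def backward_eliminate(X, y, names, f_remove=2.71):
    """Drop one variable at a time while the weakest has F below f_remove.

    f_remove is an assumed threshold, not a value taken from the slides.
    """
    keep = list(range(X.shape[1]))
    while keep:
        design = np.column_stack([np.ones(len(y)), X[:, keep]])
        _, t_stats = fit_ols(design, y)
        f_vals = t_stats[1:] ** 2          # partial F of one variable = t^2
        worst = int(np.argmin(f_vals))
        if f_vals[worst] >= f_remove:      # every remaining variable passes
            break
        del keep[worst]
    return [names[j] for j in keep]

# Synthetic data: x1 and x2 matter, x3 is pure noise.
rng = np.random.default_rng(2)
n = 100
X = rng.normal(size=(n, 3))
y = 5.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(size=n)

selected = backward_eliminate(X, y, ["x1", "x2", "x3"])
print(selected)
```

A noise variable such as `x3` is usually dropped but can survive by chance, which is one reason automatic selection should be read with care.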

  10. Regression-interpret model • Interpretation of the output 1. variables entered/removed 2. model summary (R, R^2) 3. ANOVA test (F-test)

  11. Note on F-test • Tests the overall significance of the model • Its null distribution: F-distribution • The same idea extends to the extra-sum-of-squares F-test, which compares a full model with a reduced model
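
Both F-tests compare the residual sum of squares of a reduced model with that of a full model. The sketch below computes the statistic and its degrees of freedom only; the function names and the synthetic data are assumptions for illustration, and the p-value would come from comparing the statistic with the F-distribution.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def extra_ss_f(X_full, X_reduced, y):
    """F statistic comparing a reduced model nested in a full model."""
    n, p_full = X_full.shape
    p_reduced = X_reduced.shape[1]
    rss_full, rss_reduced = rss(X_full, y), rss(X_reduced, y)
    df_num = p_full - p_reduced       # number of coefficients being tested
    df_den = n - p_full               # residual df of the full model
    f_stat = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den

# Synthetic data (not the lab's data): x3 has no real effect.
rng = np.random.default_rng(1)
n = 120
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

ones = np.ones(n)
X_full = np.column_stack([ones, x1, x2, x3])

# Overall F-test: the reduced model is the intercept alone.
f_overall, df1, df2 = extra_ss_f(X_full, ones.reshape(-1, 1), y)

# Extra-sum-of-squares F-test for x3: the reduced model drops x3.
f_x3, df1_x3, df2_x3 = extra_ss_f(X_full, np.column_stack([ones, x1, x2]), y)

print(f"overall: F={f_overall:.1f} on ({df1}, {df2}) df")
print(f"x3 only: F={f_x3:.2f} on ({df1_x3}, {df2_x3}) df")
```

The overall test is the special case in which the reduced model keeps no explanatory variables at all.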

  12. 4. Coefficients (estimation, t-test, CI of coefficients) • t-test in i-th row • CI of coefficients

  13. Note on t-test and CI of coefficients • t-test • tests the significance of a single independent variable, given the other variables in the model • can be one-sided • its null distribution: t-distribution • 95% CI of a coefficient • a range of plausible values for the coefficient, at 95% confidence • i.e. the range for the change in the mean of Y per 1-unit increase in the corresponding X, holding the other variables fixed
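
In formulas, the two quantities on this slide can be written as follows. The symbols are assumptions introduced here: β̂_i is the estimated coefficient in the i-th row of the output, SE its standard error, n the number of observations and p the number of regression coefficients including the intercept.

```latex
t_i = \frac{\hat{\beta}_i}{\mathrm{SE}(\hat{\beta}_i)}
\quad \text{(compared with a } t \text{ distribution on } n - p \text{ degrees of freedom)},
\qquad
\hat{\beta}_i \pm t_{n-p}(0.975)\,\mathrm{SE}(\hat{\beta}_i)
```

The t statistic tests the null hypothesis β_i = 0, and the interval excludes 0 exactly when the two-sided test rejects at the 5% level.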

  14. Regression-prediction • Point estimation • Confidence interval of the mean (CI) • Prediction interval of one observation (PI) • e.g.
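
As a sketch of the three quantities on this slide, the code below computes a point estimate, a confidence interval for the mean and a prediction interval for one new observation. The function name, the synthetic data and the new point `x0` are hypothetical, and the critical value 1.985 is an approximate 97.5% t quantile for 97 degrees of freedom taken from a t table.

```python
import numpy as np

def predict_with_intervals(X, y, x0, t_crit):
    """Point estimate, CI for the mean and PI for one observation at x0."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    s = np.sqrt(resid @ resid / (n - p))      # residual standard deviation
    point = float(x0 @ beta)
    lev0 = float(x0 @ xtx_inv @ x0)
    se_mean = s * np.sqrt(lev0)               # uncertainty of the mean only
    se_pred = s * np.sqrt(1.0 + lev0)         # adds one observation's noise
    ci = (point - t_crit * se_mean, point + t_crit * se_mean)
    pi = (point - t_crit * se_pred, point + t_crit * se_pred)
    return point, ci, pi

# Synthetic data (not the lab's data).
rng = np.random.default_rng(5)
n = 100
x1, x2 = rng.normal(size=(2, n))
y = 10.0 + 1.5 * x1 - 2.0 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

x0 = np.array([1.0, 0.5, -1.0])               # intercept term, x1, x2
point, ci, pi = predict_with_intervals(X, y, x0, t_crit=1.985)
print(f"point={point:.2f}  CI=({ci[0]:.2f}, {ci[1]:.2f})  PI=({pi[0]:.2f}, {pi[1]:.2f})")
```

The PI is always wider than the CI because it also has to cover the scatter of a single observation around the mean.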

  15. Multiple Regression-Diagnostics • Obtain plots to test the validity of the assumptions • Linearity: residuals vs predicted value (Y) / explanatory variable (X) • Constant variance: residuals vs predicted value (Y) / explanatory variable (X) • Normality: QQ plot of residuals • Independence: residuals vs the time order of the observations • Outliers and influential observations:
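
The coordinates behind these plots are easy to compute by hand. The sketch below returns the values one would plot (fitted values, residuals and normal QQ pairs) and leaves the plotting itself to whatever tool is at hand; the function name, the plotting positions (i - 0.5)/n and the synthetic data are assumptions.

```python
import numpy as np
from statistics import NormalDist

def diagnostic_coordinates(X, y):
    """Values to plot: residuals vs fitted, and a normal QQ plot."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    n = len(y)
    # Assumed plotting positions (i - 0.5) / n for the theoretical quantiles.
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([NormalDist().inv_cdf(float(q)) for q in probs])
    return fitted, resid, theoretical, np.sort(resid)

# Synthetic data that satisfies the assumptions (not the lab's data).
rng = np.random.default_rng(6)
n = 100
x = rng.uniform(0, 10, size=n)
y = 3.0 + 0.8 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

fitted, resid, theoretical, ordered_resid = diagnostic_coordinates(X, y)

# Residuals vs fitted: look for curvature (linearity) or a funnel (variance).
# QQ plot: ordered_resid against theoretical should fall near a straight line.
qq_corr = np.corrcoef(theoretical, ordered_resid)[0, 1]
print(f"QQ correlation: {qq_corr:.3f}")
```

For the independence check, the same residuals would be plotted against the order in which the observations were collected.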

  16. What is an influential observation? • An observation is influential if removing it markedly changes the estimated coefficients of the regression model. • An outlier may be an influential observation.

  17. To identify outliers and/or influential observations • Studentized residuals: a case may be considered an outlier if the absolute value of its studentized residual exceeds 2. • Leverage values: a leverage larger than 2p/n (p = number of regression coefficients, n = number of observations) implies the observation has a high potential for influence. • Cook’s distances: if Cook’s distance is close to or larger than 1, the case may be considered influential.
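
The three rules of thumb above can be checked numerically. This is a minimal sketch under stated assumptions: it uses internally studentized residuals (statistical packages may also offer a deleted version), uncentered leverages from the hat matrix, synthetic data, and one deliberately planted unusual case at index 0.

```python
import numpy as np

def influence_measures(X, y):
    """Studentized residuals, leverages and Cook's distances for an OLS fit."""
    n, p = X.shape
    hat = X @ np.linalg.inv(X.T @ X) @ X.T    # hat matrix
    leverage = np.diag(hat)
    resid = y - hat @ y
    s = np.sqrt(resid @ resid / (n - p))
    studentized = resid / (s * np.sqrt(1.0 - leverage))
    cooks = studentized**2 * leverage / (p * (1.0 - leverage))
    return studentized, leverage, cooks

# Synthetic data with one planted case (index 0) far from the rest.
rng = np.random.default_rng(4)
n = 50
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
x[0] = 4.0                                    # unusual X value
y[0] = 1.0 + 2.0 * 4.0 + 15.0                 # and far off the line
X = np.column_stack([np.ones(n), x])

studentized, leverage, cooks = influence_measures(X, y)
p = X.shape[1]                                # coefficients incl. intercept

outliers = np.flatnonzero(np.abs(studentized) > 2)
high_leverage = np.flatnonzero(leverage > 2 * p / n)
influential = np.flatnonzero(cooks >= 1)
print("outliers:", outliers)
print("high leverage:", high_leverage)
print("influential:", influential)
```

A case can be flagged by one rule and not the others: a large residual at a typical X value is an outlier with little influence, while a high-leverage case that sits on the line changes the fit very little.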

  18. Miscellaneous • Multicollinearity • as a rule of thumb, suspect it if the correlation between two independent variables is close to or higher than 0.85 • Remember to use Ln(WEEKS) from Question 5
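
The 0.85 rule of thumb can be applied mechanically to every pair of explanatory variables. In the sketch below the function name, the reading of "close to or higher than" as `>=` and the synthetic columns are all assumptions for illustration.

```python
import numpy as np

def flag_collinear_pairs(X, names, threshold=0.85):
    """List variable pairs whose absolute correlation reaches the threshold."""
    corr = np.corrcoef(X, rowvar=False)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                flagged.append((names[i], names[j], round(float(corr[i, j]), 3)))
    return flagged

# Synthetic columns: b is nearly a copy of a, c is unrelated.
rng = np.random.default_rng(3)
a = rng.normal(size=100)
b = a + rng.normal(scale=0.1, size=100)
c = rng.normal(size=100)

pairs = flag_collinear_pairs(np.column_stack([a, b, c]), ["a", "b", "c"])
print(pairs)
```

Pairwise correlations only catch collinearity between two variables at a time; a variable that is nearly a combination of several others would not be flagged this way.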

  19. Miscellaneous • Understand the meaning of the 95% CI of coefficients • Identify the “full model” and the “reduced model” when doing an extra-sum-of-squares F-test
