Understand the matrix correlation, model fitting, t-tests, f-tests, prediction, and diagnostics in multiple linear regression. Learn to interpret results for varied independent variables like IQ, breast milk, gender, ventilation duration, social class, birth order, and education level. Explore methods for selecting key variables and evaluating model significance and coefficients. Diagnose linearity, variance, normality, independence, and outliers. Master outlier and influential point identification using residuals, leverage, Cook's distance, and studentized residuals. Identify multicollinearity and leverage Ln(WEEKS) for optimal analysis. Follow detailed steps for analysis and interpretation based on a complete understanding of 95% confidence intervals and coefficient significance.
Lab 4 Multiple Linear Regression
Meaning • An extension of simple linear regression • It models the mean of a response variable as a linear function of several explanatory variables
Ways of analysis • Matrix of scatterplots • Matrix of correlations • Regression: fit the model (variable selection); interpret the model, t-test & f-test in regression; prediction; diagnostics (linearity, constant variance, normality, independence, outliers)
The independent variable, the response • The response: iq • The independent variables: • MILK: 0=no breast milk, 1=yes • FEM: 0=male kid, 1=female • WEEKS: weeks in ventilation • SOCIAL: mum’s social class • 1,2,3,4 with 1 being the highest • RANK: birth order of the kid • EDUC: mum’s education level • 1,2,3,4,5 with 5 being the highest
Regression-fit the model • Procedure • Analyze → Regression → Linear • Methods of determining independent variables
Methods (details in instruction 4 P18) • Enter: The model is obtained with all specified variables. This is the default method. • Stepwise • Remove • Backward: The variables are removed from the model one by one if they meet the criterion for removal (a maximum significance level or a minimum F value). • Forward:
Regression-interpret model • Interpretation of the output 1. variables entered/removed 2. model summaries (R, R^2) 3. ANOVA test (f-test)
Note on f-test • Tests the overall significance of the model • its null distribution: f-distribution • The same idea extends to the extra-sum-of-squares f-test, which compares a full model with a reduced model
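The f-test note above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the data are synthetic (not the lab data set), the helper names `residual_ss` and `extra_ss_f` are made up for this sketch, and the variable names merely echo the lab's MILK, FEM and Ln(WEEKS). The overall f-test is the special case where the reduced model contains only the intercept.

```python
import numpy as np

def residual_ss(X, y):
    """Residual sum of squares and residual df for a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), X.shape[0] - X.shape[1]

def extra_ss_f(X_full, X_reduced, y):
    """F statistic for a reduced model nested inside a full model."""
    ss_full, df_full = residual_ss(X_full, y)
    ss_red, df_red = residual_ss(X_reduced, y)
    extra_df = df_red - df_full  # number of coefficients being tested
    f_stat = ((ss_red - ss_full) / extra_df) / (ss_full / df_full)
    return f_stat, extra_df, df_full

# Synthetic data, for illustration only (hypothetical values).
rng = np.random.default_rng(0)
n = 60
idx = np.arange(n)
milk = (idx % 2).astype(float)        # 0/1 dummy
fem = ((idx // 2) % 2).astype(float)  # 0/1 dummy
ln_weeks = np.log(rng.uniform(1, 12, n))
iq = 100 + 6 * milk - 4 * ln_weeks + rng.normal(0, 10, n)

ones = np.ones(n)
X_full = np.column_stack([ones, milk, fem, ln_weeks])
X_null = np.column_stack([ones])                    # intercept only
X_no_fem = np.column_stack([ones, milk, ln_weeks])  # full model without FEM

# Overall test: full model vs. intercept-only model.
overall_f, df1, df2 = extra_ss_f(X_full, X_null, iq)
# Extra-sum-of-squares test: does FEM add anything beyond MILK and Ln(WEEKS)?
fem_f, fem_df1, fem_df2 = extra_ss_f(X_full, X_no_fem, iq)

print(f"overall F({df1}, {df2}) = {overall_f:.2f}")
print(f"extra-SS F({fem_df1}, {fem_df2}) for FEM = {fem_f:.2f}")
```

Each F statistic is then compared with the f-distribution on the printed degrees of freedom. When only one coefficient is dropped, the extra-sum-of-squares F equals the square of that coefficient's t statistic in the full model.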
4. Coefficients (estimation, t-test, CI of coefficients) • t-test in i-th row • CI of coefficients
Note on t-test and CI of coefficients • t-test • tests the significance of a single independent variable, given the other variables in the model • can be one-sided • its null distribution: t-distribution • 95% CI of coefficients • a range of plausible values for the coefficient, at 95% confidence • i.e. the plausible range for the change in mean Y per 1-unit increase in the corresponding X, with the other variables held fixed
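As a rough sketch of how the coefficient table is built, the following NumPy code computes the estimates, standard errors, t statistics and 95% CIs by hand. The data are synthetic and the variable names are borrowed from the lab for readability only. The sample size is chosen so that the residual degrees of freedom are 20, which lets the critical value 2.086 be read from a standard t table instead of requiring a statistics library.

```python
import numpy as np

# Synthetic data, for illustration only (hypothetical values).
rng = np.random.default_rng(1)
n = 24
idx = np.arange(n)
milk = (idx % 2).astype(float)
fem = ((idx // 2) % 2).astype(float)
ln_weeks = np.log(rng.uniform(1, 12, n))
iq = 100 + 8 * milk - 5 * ln_weeks + rng.normal(0, 8, n)

X = np.column_stack([np.ones(n), milk, fem, ln_weeks])
names = ["(Constant)", "MILK", "FEM", "LN_WEEKS"]
p = X.shape[1]  # number of coefficients, intercept included
df = n - p      # residual degrees of freedom (20 here)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ iq             # least-squares estimates
resid = iq - X @ beta
s2 = resid @ resid / df               # estimate of the error variance
se = np.sqrt(s2 * np.diag(XtX_inv))   # standard errors of the coefficients
t_stats = beta / se                   # tests H0: coefficient = 0

# 97.5th percentile of the t-distribution with 20 df (from a t table).
T_CRIT_20 = 2.086
ci_low = beta - T_CRIT_20 * se
ci_high = beta + T_CRIT_20 * se

for name, b, t, lo, hi in zip(names, beta, t_stats, ci_low, ci_high):
    print(f"{name:<11} b = {b:8.3f}  t = {t:7.3f}  95% CI = ({lo:8.3f}, {hi:8.3f})")
```

A coefficient is significant in the two-sided 5% t-test exactly when its 95% CI excludes zero, so the two columns always agree.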
Regression-prediction • Point estimation • Confidence interval of the mean (CI) • Prediction interval of one observation (PI) • e.g.
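The difference between the confidence interval of the mean and the prediction interval can be illustrated with a minimal NumPy sketch. Everything here is assumed for the example: the data are synthetic, the new case `x0` is hypothetical, and the t critical value 2.086 (20 residual degrees of freedom) is taken from a standard t table.

```python
import numpy as np

# Synthetic data, for illustration only (hypothetical values).
rng = np.random.default_rng(2)
n = 24
idx = np.arange(n)
milk = (idx % 2).astype(float)
fem = ((idx // 2) % 2).astype(float)
ln_weeks = np.log(rng.uniform(1, 12, n))
iq = 100 + 8 * milk - 5 * ln_weeks + rng.normal(0, 8, n)

X = np.column_stack([np.ones(n), milk, fem, ln_weeks])
df = n - X.shape[1]  # 20 residual degrees of freedom

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ iq
resid = iq - X @ beta
s = np.sqrt(resid @ resid / df)  # residual standard deviation

# Hypothetical new case: breast-fed (MILK=1), male (FEM=0), 4 weeks in ventilation.
x0 = np.array([1.0, 1.0, 0.0, np.log(4.0)])
y_hat = float(x0 @ beta)              # point estimate
h0 = float(x0 @ XtX_inv @ x0)

se_mean = s * np.sqrt(h0)             # uncertainty in the estimated mean
se_pred = s * np.sqrt(1.0 + h0)       # adds the scatter of a single observation

T_CRIT_20 = 2.086                     # 97.5th percentile of t with 20 df
ci = (y_hat - T_CRIT_20 * se_mean, y_hat + T_CRIT_20 * se_mean)
pi = (y_hat - T_CRIT_20 * se_pred, y_hat + T_CRIT_20 * se_pred)

print(f"point estimate: {y_hat:.2f}")
print(f"95% CI of the mean:     ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI of one new case: ({pi[0]:.2f}, {pi[1]:.2f})")
```

The PI is always wider than the CI at the same point, because it covers a single new observation rather than the mean response.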
Multiple Regression-Diagnostics • Obtain plots to test the validity of the assumptions • Linearity: Residuals vs predicted value (Y) / explanatory variable (X) • Constant variance: Residuals vs predicted value (Y) / explanatory variable (X) • Normality: QQ plot of residuals • Independence: residuals versus the time order of the observations • Outliers and influential observations:
What is an influential observation? • An observation is influential if removing it markedly changes the estimated coefficients of the regression model. • An outlier may be an influential observation.
To identify outliers and/or influential observations • Studentized Residuals A case may be considered an outlier if the absolute value of its studentized residual exceeds 2. • Leverage Values A leverage larger than 2p/n implies that the observation has a high potential for influence. • Cook’s Distances If Cook’s distance is close to or larger than 1, the case may be considered influential.
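The three rules of thumb above can be tried out with the following NumPy sketch. It rests on several assumptions: the data are synthetic with one unusual case planted on purpose, `influence_measures` is a made-up helper name, p is taken to be the number of coefficients including the intercept, and the residuals are internally studentized (software may report other variants).

```python
import numpy as np

def influence_measures(X, y):
    """Studentized residuals, leverages and Cook's distances for an OLS fit."""
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
    leverage = np.diag(H)
    resid = y - H @ y
    s2 = resid @ resid / (n - p)
    student = resid / np.sqrt(s2 * (1.0 - leverage))       # internally studentized
    cooks = student**2 * leverage / (p * (1.0 - leverage))
    return student, leverage, cooks

# Synthetic data, for illustration only (hypothetical values).
rng = np.random.default_rng(3)
n = 30
x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)
y = 2 + 1.5 * x1 - x2 + rng.normal(0, 0.5, n)

# Plant one unusual case: extreme x1 and a response far from the trend.
x1[0] = 6.0
y[0] = -10.0

X = np.column_stack([np.ones(n), x1, x2])
p = X.shape[1]
student, leverage, cooks = influence_measures(X, y)

flags = {
    "outlier (|studentized residual| > 2)": np.flatnonzero(np.abs(student) > 2),
    "high leverage (> 2p/n)": np.flatnonzero(leverage > 2 * p / n),
    "influential (Cook's distance >= 1)": np.flatnonzero(cooks >= 1),
}
for rule, cases in flags.items():
    print(f"{rule}: cases {cases.tolist()}")
```

Flagged cases are candidates for a closer look, not automatic deletions; refitting the model without a flagged case shows how much the coefficients actually change.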
Miscellanies • Multicollinearity • as a rule of thumb, it exists if the correlation between two independent variables is close to or higher than 0.85 in absolute value • Remember to use Ln(WEEKS) from Question 5
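The 0.85 rule of thumb can be checked directly on the matrix of correlations. The sketch below is an assumption-laden illustration: the three predictors are synthetic, with `x2` deliberately built as a noisy copy of `x1`, and `high_correlation_pairs` is a hypothetical helper name.

```python
import numpy as np

def high_correlation_pairs(data, names, threshold=0.85):
    """List predictor pairs whose absolute correlation reaches the threshold."""
    corr = np.corrcoef(data, rowvar=False)  # columns are variables
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Synthetic predictors, for illustration only (hypothetical values).
rng = np.random.default_rng(4)
n = 50
x1 = rng.normal(0, 1, n)
x2 = x1 + 0.2 * rng.normal(0, 1, n)  # nearly a copy of x1
x3 = rng.normal(0, 1, n)             # unrelated to the others

data = np.column_stack([x1, x2, x3])
pairs = high_correlation_pairs(data, ["x1", "x2", "x3"])
for a, b, r in pairs:
    print(f"{a} and {b}: r = {r:.3f}")
```

Pairwise correlations only catch collinearity between two variables at a time, so a clean result here does not rule out a near-linear relationship among three or more predictors.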
Miscellanies • Understanding meaning of 95% CI of coefficients • Identify “full model” and “reduced model” when doing extra-sum-of-squares f-test