
Multiple Linear Regression


fineen

Presentation Transcript


  1. Multiple Linear Regression: Linear regression with two or more predictor variables

  2. Introduction Often in linear regression, you want to investigate the relationship between more than one predictor variable and some outcome. In this case, your model will contain more than one independent variable. It is also often important to investigate a possible interaction between two or more independent variables.
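As a sketch, the model described here can be written out for two predictors and their interaction. The symbols below ($x_1$, $x_2$, the $\beta$ coefficients, and the error term $\varepsilon$) are notation assumed for illustration; only $\beta_1$ and $\beta_2$ appear later in the slides:

```latex
\[
Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon
\]
```

Under this form the slope for $x_1$ is $\beta_1 + \beta_3 x_2$, so it depends on the value of $x_2$. When $\beta_3 = 0$ there is no interaction and each predictor has a single constant slope.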

  3. Consider the following situation: The file air.txt contains a subsample of data from a study of the effect of air pollution on lung function. The variables measured were age, gender, height, weight, forced vital capacity (FVC), and forced expiratory volume in 1 second (FEV1). FVC is the total volume of air, in liters, that an individual can expel regardless of how long it takes. FEV1 is the volume of air expelled during the first second after an individual has been told to breathe in deeply and then expel as much air as possible. (Dunn and Clark (1987), Applied Statistics: Analysis of Variance and Regression, p. 354.)

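  4. Input the file air.txt into SAS with the following code (adjusting the location of the file as necessary): DATA air; INFILE 'C:\air.txt' DLM = ' ' FIRSTOBS = 2; INPUT sex age height weight fvc fev1; height_age = height*age; RUN; FIRSTOBS = 2 tells SAS to start reading at the second line of the file (presumably skipping a header row). The assignment height_age = height*age creates a new variable, the product of height and age, which represents the interaction between height and age.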
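Before modeling, it can help to confirm that the file was read as intended. The following is a minimal sketch, assuming the data set air was created by the DATA step above; the choice of five observations and of these particular summary statistics is arbitrary:

```sas
/* Print the first five observations to check that each column
   landed in the intended variable */
PROC PRINT DATA = air (OBS = 5);
RUN;

/* Summary statistics as a quick range check on each variable,
   including the derived interaction term */
PROC MEANS DATA = air N MEAN STD MIN MAX;
  VAR age height weight fvc fev1 height_age;
RUN;
```

If N differs across variables, or a minimum or maximum looks implausible, the delimiter or the variable order on the INPUT statement is the first thing to check.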

  5. Exploring the Data We are interested in what factors may predict FVC. To explore this before analyzing the data, create two plots, one of FVC vs. height and the other of FVC vs. age: PROC GPLOT DATA = air; PLOT fvc * height; PLOT fvc * age; RUN;

  6. Plot of FVC * Height

  7. Plot of FVC * Age
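The plots can be supplemented with numbers. As a sketch (this step is not part of the original slides), PROC CORR reports the pairwise Pearson correlations among the outcome and the two candidate predictors:

```sas
/* Pairwise Pearson correlations, with p-values, among FVC,
   height, and age */
PROC CORR DATA = air;
  VAR fvc height age;
RUN;
```

The correlation between height and age is also worth a glance, since strongly correlated predictors make the individual coefficients harder to interpret.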

  8. A linear relationship between FVC and height appears justified, although it is unclear whether a linear relationship exists between FVC and age. Create a multiple linear regression model using both height and age to predict FVC: PROC REG DATA = air; MODEL fvc = height age; RUN; QUIT;

  9. Multiple Regression Output

  10. Interpreting Output • The multiple regression equation is: Yhat = -6.67 + 0.18(height) - 0.03(age). Each slope is the estimated change in FVC for a one-unit increase in that predictor with the other predictor held fixed. • The R-Square value is interpreted the same way as in simple linear regression: 67% of the variance in FVC is explained by height and age in the model. • Because the model includes more than one predictor variable, you may want to use the adjusted R² (Adj R-Sq) value instead of R-Square when interpreting the amount of variance explained by the independent variables, since R-Square never decreases when a predictor is added.
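For reference, the usual definition of adjusted R² is sketched below. Here $n$ (the number of observations) and $p$ (the number of predictors, excluding the intercept) are notation introduced for this note; the transcript does not state the sample size, so no numbers are plugged in:

```latex
\[
R^2_{\text{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
\]
```

For this model $p = 2$. The factor $(n-1)/(n-p-1)$ grows as predictors are added, so adjusted R² rises only when a new predictor improves the fit by more than would be expected by chance.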

  11. Overall F-test To test whether all of the variables taken together significantly predict the outcome variable (FVC), use the overall F-test. The test statistic (F* = 36.96) is found under F Value. The associated p-value (<0.001) is found under Pr > F. H0: β1 = β2 = 0 vs. Ha: at least one of β1, β2 ≠ 0. Because the p-value is less than 0.05, we reject the null hypothesis and conclude that, taken together, height and age are significantly related to FVC.
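As a sketch of where F* comes from, it is the ratio of the two mean squares in the Analysis of Variance table (the Model and Error rows in the SAS output). The abbreviations $SSR$, $SSE$, $MSR$, $MSE$, $n$, and $p$ are notation assumed here, not labels from the slides:

```latex
\[
F^* = \frac{MSR}{MSE} = \frac{SSR / p}{SSE / (n - p - 1)}
\]
```

Under H0 this statistic follows an F distribution with $p$ and $n - p - 1$ degrees of freedom, where $p = 2$ for the model with height and age.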

  12. Testing Significance of One Variable To test the significance of an individual variable in predicting FVC, use the test statistic (t Value) and associated p-value (Pr > |t|) for that particular variable. For example, the test of whether height is significantly related to FVC [H0: β1 = 0 vs. Ha: β1 ≠ 0] has t* = 8.15, p < 0.0001. Reject the null hypothesis and conclude that height is significantly related to FVC after adjusting for age.
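The t Value in the output is the estimated coefficient divided by its standard error (the Parameter Estimate and Standard Error columns). As a sketch, with $b_j$ denoting the estimate of $\beta_j$ (notation assumed here):

```latex
\[
t^* = \frac{b_j}{SE(b_j)}, \qquad df = n - p - 1
\]
```

Here $n$ is the number of observations and $p$ the number of predictors, so each t-test in the two-predictor model uses $n - 3$ degrees of freedom.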

  13. Testing for an Interaction Because we have more than one predictor variable, it is important to consider whether they interact in some way. To test whether the interaction between height and age is significant, create another model in SAS that contains the main effects of height and age as well as the interaction term you created: PROC REG DATA = air; MODEL fvc = height age height_age; RUN; QUIT;
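An alternative, not used in the original slides, is to let SAS build the product term. PROC REG requires the interaction to exist as a variable in the data set, whereas PROC GLM accepts crossed effects directly on the MODEL statement. A minimal sketch:

```sas
/* height*age is formed by PROC GLM itself, so the height_age
   variable from the DATA step is not needed here */
PROC GLM DATA = air;
  MODEL fvc = height age height*age;
RUN;
QUIT;
```

The test for height*age in this output should agree with the t-test for height_age from PROC REG, since both fit the same model.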

  14. Output with Interaction Term

  15. Is the interaction significant? Notice that the p-value for the interaction is 0.39, which is greater than 0.05. Therefore, the interaction between age and height is not statistically significant, and we do not need to include it in the model. Additionally, notice that the R-Square is 0.679, indicating that 68% of the variability in FVC is explained by height, age, and height_age. This is barely larger than the R-Square of about 0.67 from the model with just height and age. Since R-Square cannot decrease when a term is added, such a small gain is another indication that the interaction term is not necessary. The final model only needs to include height and age as predictors of FVC.
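The same question can be posed explicitly with the TEST statement in PROC REG. This is a sketch rather than a step from the slides, and the label Interaction is a name chosen here for readability:

```sas
PROC REG DATA = air;
  MODEL fvc = height age height_age;
  /* F-test of H0: the coefficient of height_age equals 0 */
  Interaction: TEST height_age = 0;
RUN;
QUIT;
```

For a single coefficient this F-test is equivalent to the t-test already in the output (F equals t squared), so the p-value should match. The TEST statement becomes more useful when several terms are tested jointly.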

  16. Conclusions Multiple Linear Regression in SAS is very similar to Simple Linear Regression. The major difference is that more variables are added to the MODEL statement, and interaction terms need to be considered. Use the same MODEL statement options as before: clb, cli, and clm for confidence and prediction intervals, r for a residual analysis to identify outliers, and influence for influential points.
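As a sketch, the options named above go after a slash on the MODEL statement of the final model. Requesting all five at once is only for illustration, and the output becomes long:

```sas
PROC REG DATA = air;
  /* CLB       confidence limits for the parameter estimates
     CLI       prediction limits for an individual predicted value
     CLM       confidence limits for the mean predicted value
     R         residual analysis, useful for spotting outliers
     INFLUENCE influence diagnostics for each observation */
  MODEL fvc = height age / CLB CLI CLM R INFLUENCE;
RUN;
QUIT;
```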

  17. Linear Regression is used with continuous outcome variables. If the outcome variable of interest is categorical, logistic regression analysis is used. The next tutorial is an introduction to logistic regression.
