
Stata Workshop #2: Linear Regression

Chiu-Hsieh (Paul) Hsu, Associate Professor, College of Public Health, pchhsu@email.arizona.edu


Presentation Transcript


  1. Stata Workshop #2: Linear Regression • Chiu-Hsieh (Paul) Hsu, Associate Professor, College of Public Health • pchhsu@email.arizona.edu

  2. Outline • Review of linear regression • Model fitting • Variable selection • Model selection • Model interpretation

  3. Linear Regression • Expression • Y = β₀ + β₁x₁ + β₂x₂ + ε • Linear relationship between y and each x • Holding x₂ fixed, a one-unit increase in x₁ changes y by β₁ units on average. • Assumptions • ε (error term) ~ N(0, σ²), independent and identically distributed • The assumptions need to be checked after fitting • R-squared (coefficient of determination) is the proportion of the variation in Y explained by all the Xs.
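For reference, the model and the coefficient of determination on this slide can be written out as below. This is standard textbook notation for n observations indexed by i; the fitted value ŷᵢ and the sample mean ȳ are symbols introduced here for the formula and do not appear on the slide.

```latex
\begin{align*}
y_i &= \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i,
\qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2), \quad i = 1, \dots, n \\
R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\end{align*}
```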

  4. Data Set • Lead exposure data • Effects of lead exposure on neurological and psychological function in children • Neurological endpoint • maxfwt: maximum finger-wrist tapping score • Independent variables: Group (exposed to lead or not), age, sex, area

  5. Data Management • Drop observations with a missing outcome, coded as maxfwt = 99 • Stata command: drop if maxfwt==99 • Generate dummy variables for area • Stata command: xi i.area • This creates two dummy variables, _Iarea_2 and _Iarea_3, with area 1 as the reference group
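The data management steps above can be sketched as a short do-file. It assumes the lead exposure data are already in memory, that the variables are named as on the slides, and that area takes the values 1, 2 and 3; the count and describe lines are added here only as checks.

```stata
* Sketch only: assumes the lead exposure data are already in memory
* and that area takes the values 1, 2 and 3.

* How many observations carry the missing-value code?
count if maxfwt == 99

* Remove them, as on the slide
drop if maxfwt == 99

* Create indicator (dummy) variables for area; the lowest value (area 1)
* is omitted and serves as the reference group
xi i.area

* Confirm that _Iarea_2 and _Iarea_3 now exist
describe _Iarea_2 _Iarea_3
```

An alternative to deleting rows is `mvdecode maxfwt, mv(99)`, which recodes 99 to Stata's missing value; regress then leaves those observations out on its own.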

  6. Data Description • Group • Stata command: tab Group • Age by Group • Stata command: by Group, sort: sum ageyrs • Stata command: ttest ageyrs, by(Group) • Sex by Group (with Fisher's exact test) • Stata command: tab sex Group, exact • Area by Group (with Fisher's exact test) • Stata command: tab area Group, exact
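A compact variant of the descriptive commands above, assuming the variables Group, ageyrs and sex exist as named on the slides. The choice of statistics and the column percentages are illustrative additions, not part of the original workshop.

```stata
* Sketch only: assumes Group, ageyrs and sex are named as on the slides.

* n, mean and standard deviation of age within each exposure group
tabstat ageyrs, by(Group) statistics(n mean sd)

* Cross-tabulation with column percentages plus Fisher's exact test
tabulate sex Group, column exact
```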

  7. Estimation of the regression line • Stata command • reg maxfwt Group sex ageyrs _Iarea_2 _Iarea_3
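In Stata 11 and later the same model can be fitted without creating the dummies first, using factor-variable notation. The sketch below assumes area is a numeric variable whose lowest level is the intended reference group; the joint test of the area terms is an addition for illustration.

```stata
* Sketch only: requires Stata 11 or later for factor-variable notation.
* i.area builds the area indicators on the fly, with the lowest level as reference
regress maxfwt Group sex ageyrs i.area

* Joint test that all area coefficients are zero
testparm i.area
```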

  8. Variable Selection • Stepwise • Can add and remove variables • Need to specify both entry p-value (pe) and removal p-value (pr) • Forward • Begin from the simplest model and only add “important” variables • Only need to specify pe • Backward • Begin with full model and only remove “not important” variables • Only need to specify pr

  9. Variable Selection (cont’d) • Keep the main interest variable, Group • Stepwise command • sw, pe(0.1) pr(0.2) lock: reg maxfwt Group sex ageyrs (_Iarea_2 _Iarea_3) • Forward command • sw, pe(0.1) lock: reg maxfwt Group sex ageyrs (_Iarea_2 _Iarea_3) • Backward command • sw, pr(0.2) lock: reg maxfwt Group sex ageyrs (_Iarea_2 _Iarea_3)
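The documented names in recent Stata releases are `stepwise` for the prefix and `lockterm1` for the option that keeps the first term in the model; the slides use the shorter `sw` and `lock`. A sketch with the documented spellings, assuming _Iarea_2 and _Iarea_3 were created with xi i.area:

```stata
* Sketch only: assumes _Iarea_2 and _Iarea_3 were created with xi i.area.
* Group is listed first so that lockterm1 keeps it in every candidate model;
* the parentheses make the two area indicators enter or leave together.

* Backward stepwise: start from the full model, remove at p >= 0.2, re-enter at p < 0.1
stepwise, pe(0.1) pr(0.2) lockterm1: regress maxfwt Group sex ageyrs (_Iarea_2 _Iarea_3)

* Forward stepwise: same thresholds, but start from the model containing only Group
stepwise, pe(0.1) pr(0.2) forward lockterm1: regress maxfwt Group sex ageyrs (_Iarea_2 _Iarea_3)
```

When both thresholds are given, pe() must be smaller than pr(), as it is here.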

  10. Model Selection • R² vs. adjusted R² • R² never decreases when a covariate is added to the model, so it is not a good criterion for selecting a model. • Adjusted R² penalizes the inclusion of covariates that add little, so it is commonly used to compare models.
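The penalty can be seen in the usual formula for adjusted R². Here n is the number of observations and p the number of covariates excluding the intercept; both symbols are introduced for this formula and are not on the slide.

```latex
\[
R^2_{\text{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
\]
```

Adding a covariate raises p, so adjusted R² goes up only if the gain in R² outweighs the loss of one residual degree of freedom.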

  11. Model 1 vs. Model 2 • [Regression output for Model 1 and Model 2; not captured in the transcript]

  12. Prediction • Stata command • predict yhat, xb • Predicted value ŷ (the linear prediction xb) from the regression model • predict seyhat, stdp • Standard error of the predicted mean (average) value • predict sey, stdf • Standard error of the forecast for an individual value
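The two standard errors lead to two different intervals. The sketch below assumes regress was the last estimation command and that yhat, seyhat and sey were created by the predict commands above; the interval variable names and the 95% level are illustrative choices.

```stata
* Sketch only: assumes regress was the last estimation command and that
* yhat, seyhat and sey were created by the predict commands on the slide.

* Critical value of the t distribution on the residual degrees of freedom
scalar tcrit = invttail(e(df_r), 0.025)

* 95% confidence interval for the mean response (uses stdp)
generate ci_lo = yhat - tcrit * seyhat
generate ci_hi = yhat + tcrit * seyhat

* 95% prediction interval for an individual child (uses stdf, so it is wider)
generate pi_lo = yhat - tcrit * sey
generate pi_hi = yhat + tcrit * sey
```

The forecast standard error is always the larger of the two because its square equals the squared stdp plus the squared root mean squared error of the regression.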

  13. Residual Plots • Stata command • predict studentresid, rstudent • Generate studentized residuals • scatter studentresid yhat, yline(0) • Generate the residual plot • The rvfplot command can be used too, but it plots the raw (unstandardized) residuals.
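Beyond the plot, observations with large studentized residuals can be counted and listed. The sketch assumes studentresid was created as above and that regress was the last estimation command; the cutoff of 2 is a common rule of thumb, not something the slides prescribe.

```stata
* Sketch only: assumes studentresid was created by predict ..., rstudent.
* The cutoff of 2 is a rule of thumb chosen for illustration.

* Count observations with a large studentized residual
count if abs(studentresid) > 2 & !missing(studentresid)

* List them with the outcome and main exposure for inspection
list maxfwt Group studentresid if abs(studentresid) > 2 & !missing(studentresid)

* Built-in residual-versus-fitted plot (raw residuals), for comparison
rvfplot, yline(0)
```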

  14. Normality Assumption • Stata command • qnorm studentresid • Generate a normal Q-Q plot for the studentized residuals • swilk studentresid • Perform the Shapiro-Wilk test • Reference: http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter1/statareg1.htm
