Stata 3, Regression
Hein Stigum
Presentation, data and programs at: http://folk.uio.no/heins/
H.S.
Agenda
• Linear regression
• GLM
• Logistic regression
• Binary regression
• (Conditional logistic)
Linear regression: birth weight by gestational age
Regression idea
Model and assumptions
• Model
• Assumptions
  • Independent errors
  • Linear effects
  • Constant error variance
Association measure: RD
Model:
Start with:
Hence:
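Written out, assuming a model with a 0/1 exposure x_1 and one covariate x_2 (symbols chosen here for illustration), the derivation runs:

```latex
\begin{aligned}
\text{Model:}\quad & E(y \mid x_1, x_2) = b_0 + b_1 x_1 + b_2 x_2 \\
\text{Start with:}\quad & E(y \mid x_1 = 1, x_2) - E(y \mid x_1 = 0, x_2) \\
 &= (b_0 + b_1 + b_2 x_2) - (b_0 + b_2 x_2) \\
\text{Hence:}\quad & \mathrm{RD} = b_1
\end{aligned}
```

So b_1 is the difference in mean outcome between exposed and unexposed, the same at every value of x_2.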
Purpose of regression
• Estimation: estimate the association between outcome and exposure, adjusted for other covariates
• Prediction: use an estimated model to predict the outcome from the covariates in a new dataset
Adjusting for confounders
• Do not adjust: the cofactor is a collider, or the cofactor is in the causal path
• May or may not adjust: the cofactor has missing values, or the cofactor has measurement error
Workflow
• Scatterplots
• Bivariate analysis
• Regression
  • Model fitting
    • Cofactors in/out
    • Interactions
  • Test of assumptions
    • Independent errors
    • Linear effects
    • Constant error variance
  • Influence (robustness)
Scatterplot
Syntax
• Estimation
  regress y x1 x2          linear regression
  xi: regress y x1 i.c1    categorical c1
• Post estimation
  predict yf, xb           predicted values
• Manage models
  estimates store m1       save model
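A minimal sketch of how these commands fit together, run on simulated data. The data-generating values and the seed are arbitrary assumptions, not the presentation's birth weight data; only the variable names weight, gest, sex and id follow the slides.

```stata
* Simulated stand-in for the birth weight data (values are arbitrary)
clear
set seed 2024
set obs 300
generate id     = _n
generate sex    = 1 + (runiform() < 0.5)                 // 1 = boy, 2 = girl
generate gest   = round(rnormal(280, 12))                // gestational age, days
generate weight = -2000 + 19*gest - 120*(sex==2) + rnormal(0, 400)   // grams

* Scatterplot with a fitted straight line
twoway (scatter weight gest) (lfit weight gest)

* Estimation: linear regression
regress weight gest sex

* Post estimation: predicted values
predict yf, xb

* Manage models: store the results under the name m1
estimates store m1
```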
Model 2: add confounders
Estimation of association: m1 = m2 (same exposure estimate)
Prediction: m2 is best
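A sketch of the m1 = m2 comparison on simulated data. Here sex is assumed to affect birth weight but to be unrelated to gestational age, so the gest coefficient should be similar in both models while Model 2 fits better.

```stata
* Simulated data: sex affects weight but is independent of gest (assumption)
clear
set seed 2024
set obs 300
generate sex    = 1 + (runiform() < 0.5)
generate gest   = round(rnormal(280, 12))
generate weight = -2000 + 19*gest - 120*(sex==2) + rnormal(0, 400)

regress weight gest            // Model 1: outcome and exposure
estimates store m1
regress weight gest sex        // Model 2: add the cofactor sex
estimates store m2

* Compare coefficients (estimation) and fit statistics such as AIC (prediction)
estimates table m1 m2, b se
estimates stats m1 m2
```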
“Dummies”
Assume educ is coded 1, 2, 3 for low, medium and high education. Choose low education as the reference and make dummies for the two other categories:
  generate medium = (educ==2) if educ<.
  generate high   = (educ==3) if educ<.
The qualifier if educ<. keeps observations with missing educ missing instead of coding them 0.
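A sketch showing that the hand-made dummies give the same estimates as Stata's factor-variable notation i.educ (available from Stata 11; the slides use the older xi: prefix). The data are simulated with arbitrary values, including a few missing educ values.

```stata
* Simulated data: educ coded 1 = low, 2 = medium, 3 = high (assumption)
clear
set seed 2024
set obs 300
generate educ   = 1 + floor(3*runiform())
generate weight = 3400 + 50*(educ==2) + 100*(educ==3) + rnormal(0, 400)
replace  educ   = . in 1/10            // a few missing values

* Dummies with low education as reference; missing educ stays missing
generate medium = (educ==2) if educ<.
generate high   = (educ==3) if educ<.
regress weight medium high
estimates store dummies

* The same model with factor variables (level 1 is the default base)
regress weight i.educ
estimates store factor

estimates table dummies factor
```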
Interaction
Model:
Start with:
Hence:
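Written out for a linear model, assuming a 0/1 exposure x_1, a cofactor x_2 and their product term (symbols chosen here for illustration):

```latex
\begin{aligned}
\text{Model:}\quad & E(y \mid x_1, x_2) = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2 \\
\text{Start with:}\quad & E(y \mid x_1 = 1, x_2) - E(y \mid x_1 = 0, x_2) \\
 &= (b_0 + b_1 + b_2 x_2 + b_3 x_2) - (b_0 + b_2 x_2) \\
\text{Hence:}\quad & \mathrm{RD}(x_2) = b_1 + b_3 x_2
\end{aligned}
```

The effect of x_1 is b_1 when x_2 = 0 and b_1 + b_3 when x_2 = 1, so b_3 measures the interaction on the additive scale.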
Test of assumptions
• Predict y and residuals
  predict y, xb
  predict res, resid
• Plot residuals vs predicted y: independent? linear? constant variance?
  twoway (scatter res y) (qfitci res y)
Violations of assumptions
• Dependent residuals: mixed models, xtmixed
• Non-linear effects: add a squared term
  gen gest2=gest^2
  regress weight gest gest2 sex
• Non-constant variance: robust standard errors
  regress weight gest sex, robust
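These remedies can be paired with formal checks. A sketch on simulated data (arbitrary values) using built-in postestimation tools that the slide does not mention: estat hettest (a Breusch-Pagan test of constant variance), rvfplot, and test for the squared term.

```stata
clear
set seed 2024
set obs 300
generate sex    = 1 + (runiform() < 0.5)
generate gest   = round(rnormal(280, 12))
generate weight = -2000 + 19*gest - 120*(sex==2) + rnormal(0, 400)

regress weight gest sex
estat hettest                   // H0: constant error variance
rvfplot, yline(0)               // residuals versus fitted values

* Non-linear effect: is the squared term needed?
generate gest2 = gest^2
regress weight gest gest2 sex
test gest2

* Non-constant variance: robust standard errors
regress weight gest sex, robust
```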
Measures of influence
• Measure the change in:
  • Outcome (y)
  • Deviance
  • Coefficients (beta): delta-beta, Cook’s distance
• Method: remove observation 1 and see the change, remove observation 2 and see the change, and so on
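Instead of refitting once per removed observation, these leave-one-out measures can be computed directly after regress. A sketch on simulated data; the 4/N cutoff for Cook's distance is a common rule of thumb, not something taken from the slides.

```stata
clear
set seed 2024
set obs 300
generate id     = _n
generate sex    = 1 + (runiform() < 0.5)
generate gest   = round(rnormal(280, 12))
generate weight = -2000 + 19*gest - 120*(sex==2) + rnormal(0, 400)

regress weight gest sex
predict cook, cooksd              // Cook's distance, one value per observation
predict dfb_gest, dfbeta(gest)    // scaled change in the gest coefficient
list id cook dfb_gest if cook > 4/e(N)
```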
Points with high influence (leverage versus squared normalized residual)
  lvr2plot, mlabel(id)
Added variable plot: gestational age
  avplot gest, mlabel(id)
Removing outlier
Influence
Final model
Give meaning to the constant term:
  sum gest                    /* find smallest value */
  generate gest2=gest-204     /* smallest gest=204 */
  generate sex2=sex-1         /* boys=0, girls=1 */
  regress weight gest2 sex2   /* final model */
  estimates store m4
The constant is then the predicted birth weight of a boy at the smallest gestational age (204 days).
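A sketch of the same idea on simulated data (arbitrary values). Here r(min) from summarize replaces the hard-coded 204, and lincom gives the predicted weight for girls at the same gestational age.

```stata
clear
set seed 2024
set obs 300
generate sex    = 1 + (runiform() < 0.5)           // 1 = boy, 2 = girl
generate gest   = round(rnormal(280, 12))
generate weight = -2000 + 19*gest - 120*(sex==2) + rnormal(0, 400)

summarize gest
generate gest2 = gest - r(min)     // days above the smallest gestational age
generate sex2  = sex - 1           // boys = 0, girls = 1
regress weight gest2 sex2
estimates store m4

lincom _cons                       // predicted weight, boy at smallest gest
lincom _cons + sex2                // predicted weight, girl at smallest gest
```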
Logistic regression: being bullied
Model and assumptions
• Model
• Assumptions
  • Independent residuals
  • Linear effects
Association measure, odds ratio
Model:
Start with:
Hence:
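Written out, assuming p = P(y = 1 | x_1, x_2) with a 0/1 exposure x_1 and one covariate x_2 (symbols chosen here for illustration):

```latex
\begin{aligned}
\text{Model:}\quad & \ln\frac{p}{1-p} = b_0 + b_1 x_1 + b_2 x_2
  \quad\Longleftrightarrow\quad \mathrm{odds} = e^{b_0 + b_1 x_1 + b_2 x_2} \\
\text{Start with:}\quad & \mathrm{OR} = \frac{\mathrm{odds}(x_1 = 1, x_2)}{\mathrm{odds}(x_1 = 0, x_2)}
  = \frac{e^{b_0 + b_1 + b_2 x_2}}{e^{b_0 + b_2 x_2}} \\
\text{Hence:}\quad & \mathrm{OR} = e^{b_1}
\end{aligned}
```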
Syntax
• Estimation
  logistic y x1 x2          logistic regression
  xi: logistic y x1 i.c1    categorical c1
• Post estimation
  predict yf, pr            predicted probability
• Manage models
  estimates store m1        save model
  est table m1, eform       show ORs
Workflow
• Bivariate analysis
• Regression
  • Model fitting
    • Cofactors in/out
    • Interactions
  • Test of assumptions
    • Independent errors
    • Linear effects
  • Influence (robustness)
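Bivariate
Generate dummies (if country<. keeps missing country missing instead of coding it 0):
  gen Island  = (country==2) if country<.
  gen Norway  = (country==3) if country<.
  gen Finland = (country==4) if country<.
  gen Denmark = (country==5) if country<.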
Model 1: outcome and exposure
Alternative commands:
  xi: logistic bullied i.country                       use xi: and i.var for categorical variables
  xi: logistic bullied i.country, coef                 coefficients instead of ORs
  xi: logistic bullied i.country if sex!=. & age!=.    only observations with sex and age non-missing
Model 2: add confounders
Estimate associations: m1 = m2
Predict: m2 best
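A sketch of the Model 1 versus Model 2 comparison on simulated bullying data (all values, including the five-country coding and the 50 missing ages, are arbitrary assumptions). Model 1 is restricted to the observations Model 2 can use, so the two sets of ORs are based on the same sample; i.country without xi: needs Stata 11 or later.

```stata
* Simulated stand-in for the bullying data (values are arbitrary)
clear
set seed 2024
set obs 2000
generate country = 1 + floor(5*runiform())       // 5 countries, 1 = reference
generate sex     = 1 + (runiform() < 0.5)
generate age     = 11 + floor(5*runiform())      // ages 11 to 15
generate bullied = runiform() < invlogit(-1.5 + 0.3*(country==3) - 0.3*(sex==2) - 0.1*(age-13))
replace  age     = . in 1/50                     // some missing age

* Model 1 on the same observations as Model 2
logistic bullied i.country if sex<. & age<.
estimates store m1
logistic bullied i.country sex age               // Model 2: add confounders
estimates store m2
estimates table m1 m2, eform
```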
Interaction
Model:
Start with:
Hence:
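Written out for the logistic model, assuming a 0/1 exposure x_1, a 0/1 cofactor x_2 and their product term (symbols chosen here for illustration):

```latex
\begin{aligned}
\text{Model:}\quad & \ln\frac{p}{1-p} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2 \\
\text{Start with:}\quad & \mathrm{OR}(x_2) = \frac{e^{b_0 + b_1 + b_2 x_2 + b_3 x_2}}{e^{b_0 + b_2 x_2}} \\
\text{Hence:}\quad & \mathrm{OR}(x_2 = 0) = e^{b_1}, \qquad
  \mathrm{OR}(x_2 = 1) = e^{b_1 + b_3}, \qquad
  \frac{\mathrm{OR}(x_2 = 1)}{\mathrm{OR}(x_2 = 0)} = e^{b_3}
\end{aligned}
```

Here the interaction is measured on the multiplicative scale, as a ratio of odds ratios.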
Model 3: interaction
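A sketch of fitting and testing a country-by-sex interaction on simulated data (arbitrary values), using factor-variable notation (##) and a likelihood-ratio test against Model 2. The choice of sex as the interacting cofactor is an assumption for illustration.

```stata
clear
set seed 2024
set obs 2000
generate country = 1 + floor(5*runiform())
generate sex     = 1 + (runiform() < 0.5)
generate age     = 11 + floor(5*runiform())
generate bullied = runiform() < invlogit(-1.5 + 0.3*(country==3) - 0.3*(sex==2) - 0.1*(age-13))

logistic bullied i.country sex age           // Model 2
estimates store m2
logistic bullied i.country##i.sex age        // Model 3: country-by-sex interaction
estimates store m3
lrtest m2 m3                                 // joint test of the 4 interaction terms
```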
Test of assumptions
• Linear effects (of age)
  findit lincheck                                  search for and install lincheck (user-written)
  lincheck xi: logistic bullied age i.country sex
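If lincheck is not available, one built-in alternative (a swapped-in technique, not the slide's method) is to compare a model with linear age against one with age as categories, using a likelihood-ratio test. A sketch on simulated data where age takes the five values 11 to 15 (an assumption):

```stata
clear
set seed 2024
set obs 2000
generate country = 1 + floor(5*runiform())
generate sex     = 1 + (runiform() < 0.5)
generate age     = 11 + floor(5*runiform())
generate bullied = runiform() < invlogit(-1.5 + 0.3*(country==3) - 0.3*(sex==2) - 0.1*(age-13))

logistic bullied i.country sex age       // linear effect of age
estimates store linear
logistic bullied i.country sex i.age     // one parameter per age group
estimates store categ
lrtest linear categ                      // a small p-value suggests non-linearity
```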
Points with high influence
  estimates restore m2    restore best model
  predict p, pr           probability (mu in our notation)
  predict db, dbeta       delta-beta (one summary value, not one per coefficient)
  scatter db p            delta-beta plot
Removing 2 observations
Conclusion: robust results
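A sketch of the removal step on simulated data (arbitrary values): rank the observations by delta-beta, refit without the two largest, and compare the ORs. Ties in delta-beta (observations sharing a covariate pattern) are broken arbitrarily by the unique option.

```stata
clear
set seed 2024
set obs 2000
generate country = 1 + floor(5*runiform())
generate sex     = 1 + (runiform() < 0.5)
generate age     = 11 + floor(5*runiform())
generate bullied = runiform() < invlogit(-1.5 + 0.3*(country==3) - 0.3*(sex==2) - 0.1*(age-13))

logistic bullied i.country sex age
estimates store m2
predict db, dbeta
egen dbrank = rank(-db), unique                   // 1 = largest delta-beta
logistic bullied i.country sex age if dbrank > 2  // drop the 2 most influential
estimates store m2b
estimates table m2 m2b, eform
```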
Generalized Linear Models: being bullied
Designs and measures
Generalized Linear Models, GLM
• Linear regression
• Logistic regression
• Poisson regression
GLM: distribution and link
• Distribution family
  • Given by the data
  • Influences p-values and CIs
• Link function
  • May be chosen
  • Determines the shape (= inverse link, link⁻¹)
  • Determines the scale
  • Determines the association measure
Distribution and link examples
Note: not for traditional case-control data, where only the odds ratio can be estimated
Link: identity, linear model, additive scale
Being bullied, 3 models
  glm bullied Island Norway Finland Denmark sex age, family(binomial) link(logit)
  glm bullied Island Norway Finland Denmark sex age, family(binomial) link(log)
  glm bullied Island Norway Finland Denmark sex age, family(binomial) link(identity)
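A sketch of the three models on simulated data (arbitrary values), with the dummies built as on the bivariate slide. The eform option reports exponentiated coefficients: ORs for the logit link and RRs for the log link; the identity-link coefficients are already RDs. The stored-estimate names m_or, m_rr and m_rd are hypothetical.

```stata
clear
set seed 2024
set obs 2000
generate country = 1 + floor(5*runiform())
generate sex     = 1 + (runiform() < 0.5)
generate age     = 11 + floor(5*runiform())
generate bullied = runiform() < invlogit(-1.5 + 0.3*(country==3) - 0.3*(sex==2) - 0.1*(age-13))
generate Island  = (country==2) if country<.
generate Norway  = (country==3) if country<.
generate Finland = (country==4) if country<.
generate Denmark = (country==5) if country<.

glm bullied Island Norway Finland Denmark sex age, family(binomial) link(logit) eform
estimates store m_or                              // odds ratios
glm bullied Island Norway Finland Denmark sex age, family(binomial) link(log) eform
estimates store m_rr                              // risk ratios
glm bullied Island Norway Finland Denmark sex age, family(binomial) link(identity)
estimates store m_rd                              // risk differences
```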
Convergence problems
• If glm does not converge, use:
  poisson y x1 x2, irr robust    RR
  regress y x1 x2, robust        RD
Association measure, RR
Model:
Start with:
Hence:
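Written out for the binomial model with log link, assuming p = P(y = 1 | x_1, x_2) with a 0/1 exposure x_1 and one covariate x_2 (symbols chosen here for illustration):

```latex
\begin{aligned}
\text{Model:}\quad & \ln p = b_0 + b_1 x_1 + b_2 x_2
  \quad\Longleftrightarrow\quad p = e^{b_0 + b_1 x_1 + b_2 x_2} \\
\text{Start with:}\quad & \mathrm{RR} = \frac{p(x_1 = 1, x_2)}{p(x_1 = 0, x_2)}
  = \frac{e^{b_0 + b_1 + b_2 x_2}}{e^{b_0 + b_2 x_2}} \\
\text{Hence:}\quad & \mathrm{RR} = e^{b_1}
\end{aligned}
```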
Association measure: RD
Model:
Start with:
Hence:
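Written out for the binomial model with identity link, where the model is on the probability scale (same illustrative symbols: p = P(y = 1 | x_1, x_2), 0/1 exposure x_1, covariate x_2):

```latex
\begin{aligned}
\text{Model:}\quad & p = b_0 + b_1 x_1 + b_2 x_2 \\
\text{Start with:}\quad & \mathrm{RD} = p(x_1 = 1, x_2) - p(x_1 = 0, x_2)
  = (b_0 + b_1 + b_2 x_2) - (b_0 + b_2 x_2) \\
\text{Hence:}\quad & \mathrm{RD} = b_1
\end{aligned}
```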
The importance of scale
Additive scale (RD), absolute increase:
  Females: 30 - 20 = 10
  Males: 20 - 10 = 10
  Conclusion: same increase for males and females
Multiplicative scale (RR), relative increase:
  Females: 30/20 = 1.5
  Males: 20/10 = 2.0
  Conclusion: larger increase for males
Regression with simple error structure
• regress      linear regression (also heteroskedastic errors)
• nl           non-linear least squares
• GLM
  • logistic   logistic regression
  • poisson    Poisson regression
  • binreg     binary outcome, OR, RR, or RD effect measures
• Conditional logistic
  • clogit     for matched case-control data
• Multiple outcome categories
  • mlogit     multinomial logit (not ordered)
  • ologit     ordered logit
Regression with complex error structure
• xtmixed      linear mixed models
• xtlogit      random-effects logistic regression
Syntax
• Estimation
  regress y x1 x2                   linear regression
  logistic y x1 x2                  logistic regression
  xi: regress y x1 i.x2             categorical x2
• Manage results
  estimates store m1                store results
  estimates table m1 m2             table of results
  estimates stats m1 m2             statistics of results (AIC, BIC)
• Post estimation
  predict yf, xb                    linear prediction
  predict res, resid                residuals
  lincom _b[_cons] + 2*_b[x3]       linear combination b0 + 2*b3 (x3 must be in the model)
• Help
  help logistic postestimation