Assumption checking in “normal” multiple regression with Stata
Assumptions in regression analysis
• No multi-collinearity
• All relevant predictor variables included
• Homoscedasticity: all residuals are from a distribution with the same variance
• Linearity: the “true” model should be linear
• Independent errors: having information about the value of a residual should not give you information about the value of other residuals
• Errors are distributed normally
FIRST THE ONE THAT LEADS TO NOTHING NEW IN STATA (NOTE: SLIDE TAKEN LITERALLY FROM MMBR)
Independent errors: having information about the value of a residual should not give you information about the value of other residuals
Detect: ask yourself whether it is likely that knowledge about one residual would tell you something about the value of another residual. Typical cases:
- repeated measures
- clustered observations (people within firms / pupils within schools)
Consequences: as for heteroscedasticity. Usually, your confidence intervals are estimated too small (think about why that is!).
Cure: use multi-level analyses
In Stata
Example: the Stata “auto.dta” data set
sysuse auto
corr (correlation)
vif (variance inflation factors)
ovtest (omitted variable test)
hettest (heteroscedasticity test)
predict e, resid
swilk (test for normality)
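The commands above are post-estimation commands, so they only make sense after a regression has been run. A minimal sketch of the whole sequence follows. The model itself (price on mpg, weight and foreign) is an assumption chosen for illustration and is not taken from the slides. Current Stata versions document vif, ovtest and hettest under the names estat vif, estat ovtest and estat hettest, which is the spelling used here.

```stata
* Load the example data set that ships with Stata
sysuse auto, clear

* Illustrative model (an assumption, not the model from the slides)
regress price mpg weight foreign

* Multi-collinearity: correlations among the predictors, then VIFs
corr mpg weight foreign
estat vif

* Misspecification: omitted variable (Ramsey RESET) test
estat ovtest

* Heteroscedasticity test
estat hettest

* Normality of the errors: store the residuals, then test them
predict e, resid
swilk e
```

Each test prints its own statistic and p-value, so the output can be read top to bottom in the same order as the list of assumptions.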
Finding the commands
• “help regress”
• “regress postestimation”
and you will find most of them (and more) there
Multi-collinearity
A strong correlation between two or more of your predictor variables
You don’t want it, because:
• It is more difficult to get higher R’s
• The importance of predictors can be difficult to establish (b-hats tend to go to zero)
• The estimates for b-hats are unstable under slightly different regression attempts (“bouncing beta’s”)
Detect:
• Look at the correlation matrix of the predictor variables
• Calculate VIF factors while running the regression
Cure: delete variables so that the multi-collinearity disappears, for instance by combining them into a single variable
Stata: calculating the correlation matrix (“corr”) and VIF statistics (“vif”)
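As a sketch of what these two commands report, the example below prints the correlation matrix and the VIFs, and then recomputes one VIF by hand. The VIF of a predictor is 1/(1 - R²), where R² comes from regressing that predictor on all the other predictors. The model and the scalar name vif_weight are assumptions made for illustration.

```stata
sysuse auto, clear

* Correlation matrix of the predictor variables only
corr mpg weight foreign

* VIFs are reported after the regression of interest
regress price mpg weight foreign
estat vif

* The same number by hand for one predictor (weight):
* regress it on the other predictors and use 1/(1 - R-squared)
quietly regress weight mpg foreign
scalar vif_weight = 1/(1 - e(r2))
display "VIF for weight (by hand) = " vif_weight
```

The hand-computed value should match the line for weight in the estat vif table, which shows that a high VIF simply means the predictor is well explained by the other predictors.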
Misspecification tests (replaces: all relevant predictor variables included)
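A minimal sketch of the omitted variable test, assuming the same illustrative model as elsewhere (price on mpg, weight and foreign, which is not taken from the slides). The command runs the Ramsey RESET test, which checks whether powers of the fitted values add explanatory power; the scalar name p_reset is hypothetical.

```stata
sysuse auto, clear

* Illustrative model (an assumption)
regress price mpg weight foreign

* Ramsey RESET test; H0: the model has no omitted variables
estat ovtest

* The p-value is left behind in r(p) for later use
scalar p_reset = r(p)
display "RESET p-value = " p_reset
```

A small p-value suggests misspecification, for instance a missing nonlinear term, but it does not say which variable is missing.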
Homoscedasticity: all residuals are from a distribution with the same variance
Consequences: heteroscedasticity does not necessarily lead to biases in your estimated coefficients (b-hat), but it does lead to biases in the estimate of the width of the confidence interval, and the estimation procedure itself is not efficient.
Testing for heteroscedasticity in Stata
• Your residuals should have the same variance for all (fitted) values of Y: hettest
• Your residuals should have the same variance for all values of X: hettest, rhs
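The two variants can be run back to back, as in the sketch below. The model is an assumption chosen for illustration, and the scalar names p_fitted and p_rhs are hypothetical. In current Stata versions hettest is documented as estat hettest.

```stata
sysuse auto, clear

* Illustrative model (an assumption)
regress price mpg weight foreign

* Variance tested against the fitted values of the dependent variable
estat hettest
scalar p_fitted = r(p)

* Variance tested against the right-hand-side variables
estat hettest, rhs
scalar p_rhs = r(p)

display "p (fitted values) = " p_fitted
display "p (rhs variables) = " p_rhs
```

In both cases the null hypothesis is constant variance, so a small p-value points to heteroscedasticity.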
Errors distributed normally
Errors are distributed normally (just the errors, not the variables themselves!)
Detect: look at the residual plots, test for normality
Consequences: rule of thumb: if n>600, no problem. Otherwise confidence intervals are wrong.
Cure: try to fit a better model, or use more difficult ways of modeling instead (ask an expert).
Errors distributed normally
First calculate the errors:
predict e, resid
Then test for normality:
swilk e
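Put together with a regression, the two steps look roughly like this. The model is an assumption for illustration; the variable name e follows the slide.

```stata
sysuse auto, clear

* Illustrative model (an assumption)
regress price mpg weight foreign

* Step 1: store the residuals in a new variable e
predict e, resid

* Step 2: Shapiro-Wilk test; H0: e is normally distributed
swilk e
display "Shapiro-Wilk p-value = " r(p)
```

For the visual check mentioned above, a normal quantile plot of the residuals (qnorm e) is one option.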