ASSUMPTION CHECKING
• In regression analysis with Stata
• In multi-level analysis with Stata (not much extra)
• In logistic regression analysis with Stata
NOTE: THIS WILL BE EASIER IN STATA THAN IT WAS IN SPSS
Assumption checking in “normal” multiple regression with Stata
Assumptions in regression analysis
• No multi-collinearity
• All relevant predictor variables included
• Homoscedasticity: all residuals are from a distribution with the same variance
• Linearity: the “true” model should be linear
• Independent errors: having information about the value of a residual should not give you information about the value of other residuals
• Errors are distributed normally
FIRST THE ONE THAT LEADS TO NOTHING NEW IN STATA (NOTE: SLIDE TAKEN LITERALLY FROM MMBR)
Independent errors: having information about the value of a residual should not give you information about the value of other residuals
Detect: ask yourself whether it is likely that knowledge about one residual would tell you something about the value of another residual.
Typical cases:
- repeated measures
- clustered observations (people within firms / pupils within schools)
Consequences: as for heteroscedasticity. Usually, your confidence intervals are estimated too small (think about why that is!).
Cure: use multi-level analyses
In Stata
Example: the Stata “auto.dta” data set
sysuse auto
corr (correlation)
vif (variance inflation factors)
ovtest (omitted variable test)
hettest (heteroscedasticity test)
predict e, resid
swilk e (test for normality)
Finding the commands
• “help regress”
• “regress postestimation”
and you will find most of them (and more) there
Multi-collinearity
A strong correlation between two or more of your predictor variables
You don’t want it, because:
• It is more difficult to get higher R’s
• The importance of predictors can be difficult to establish (b-hats tend to go to zero)
• The estimates for b-hats are unstable under slightly different regression attempts (“bouncing betas”)
Detect:
• Look at the correlation matrix of predictor variables
• Calculate VIF factors while running the regression
Cure: Delete variables so that multi-collinearity disappears, for instance by combining them into a single variable
Stata: calculating the correlation matrix (“corr”) and VIF statistics (“vif”)
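A minimal sketch of the two checks on the auto data. The model itself (price on mpg, weight and length) is an assumption made for illustration and is not taken from the slides. `estat vif` is the current spelling of the `vif` command and has to be run after `regress`.

```stata
* Load the example data shipped with Stata
sysuse auto, clear

* Correlation matrix of the predictors only (illustrative choice of variables)
corr mpg weight length

* Fit the regression first: VIFs are computed from the fitted model
regress price mpg weight length

* Variance inflation factor for each predictor
estat vif
```

A commonly quoted rule of thumb treats a VIF above roughly 10 as a warning sign, but that threshold is a convention rather than a formal test.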
Misspecification tests (replaces: all relevant predictor variables included)
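As a sketch, the omitted variable test can be run like this on the auto data. The regression (price on mpg, weight and length) is an illustrative assumption. `estat ovtest` is the current form of `ovtest` (Ramsey's RESET test), and `linktest` is added here as a second, related specification check that the slides do not mention.

```stata
sysuse auto, clear

* Illustrative model, not taken from the slides
regress price mpg weight length

* Ramsey RESET test; the null hypothesis is that the model has no omitted variables
estat ovtest

* Specification link test: a significant _hatsq term points to misspecification
linktest
```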
Homoscedasticity: all residuals are from a distribution with the same variance
Consequences: Heteroscedasticity does not necessarily lead to biases in your estimated coefficients (b-hat), but it does lead to biases in the estimate of the width of the confidence interval, and the estimation procedure itself is not efficient.
Testing for heteroscedasticity in Stata
• Your residuals should have the same variance for all values of Y: hettest
• Your residuals should have the same variance for all values of X: hettest, rhs
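The two variants could be run as follows. This is a sketch on the auto data with an illustrative model (price on mpg, weight and length, an assumption of this example). `estat hettest` is the current form of `hettest`, and the residual-versus-fitted plot is an addition for a visual check.

```stata
sysuse auto, clear

* Illustrative model, not taken from the slides
regress price mpg weight length

* Breusch-Pagan / Cook-Weisberg test against the fitted values of Y
* (null hypothesis: constant variance)
estat hettest

* Same test against the right-hand-side variables (the X's)
estat hettest, rhs

* Visual check: the spread of the residuals should not change with the fitted values
rvfplot, yline(0)
```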
Errors distributed normally
Errors are distributed normally (just the errors, not the variables themselves!)
Detect: look at the residual plots, test for normality
Consequences: rule of thumb: if n > 600, no problem. Otherwise confidence intervals are wrong.
Cure: try to fit a better model, or use more difficult ways of modeling instead (ask an expert).
Errors distributed normally
First calculate the errors: predict e, resid
Then test for normality: swilk e
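Put together, the residual check might look like the sketch below. The regression is an illustrative assumption, and the two plots are additions that cover the "look at the residual plots" step.

```stata
sysuse auto, clear

* Illustrative model, not taken from the slides
regress price mpg weight length

* Store the residuals in a new variable e
predict e, resid

* Shapiro-Wilk test; the null hypothesis is that e is normally distributed
swilk e

* Visual checks: quantile plot against the normal, and a density with a normal overlay
qnorm e
kdensity e, normal
```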
Assumption checking in multi-level multiple regression with Stata
In multi-level
• Test all that you would test for multiple regression. Poor man’s test: do this using multiple regression! (e.g. “hettest”)
Add:
• xttest0 (see last week)
Add (extra): Test visually whether the normality assumption holds, but do this for the random
Note: extra material (= not on the exam, bonus points if you know how to use it)
tab school, gen(sch_)
reg y sch_2-sch_28
gen coefs = .
forvalues i = 2/28 {
    replace coefs = _b[sch_`i'] if _n == `i'
}
swilk coefs
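For comparison, here is a hedged sketch of the same idea using Stata's own multi-level commands. The course data are not available here, so the example assumes the `nlswork` data set from the Stata website (persons observed in several years, so `idcode` plays the role of `school`). The model and the variable names `u0` and `first` are illustrative only.

```stata
* Example panel data from the Stata website (needs an internet connection)
webuse nlswork, clear
xtset idcode

* Random-intercept model, then the Breusch-Pagan LM test for random effects
xtreg ln_wage age tenure, re
xttest0

* Same kind of model with mixed, which can predict the random intercepts
mixed ln_wage age tenure || idcode:
predict u0, reffects

* Keep one predicted intercept per group and inspect it for normality
egen byte first = tag(idcode)
qnorm u0 if first
```

The predicted intercepts are shrunken estimates, so the plot is a rough visual check rather than a formal test.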
Assumption checking in logistic regression with Stata
Assumptions
• Y is 0/1
• Ratio of cases to variables should be “reasonable”
• No cases where you have complete separation (Stata will remove these cases automatically)
• Linearity in the logit (comparable to “the true model should be linear” in multiple regression)
• Independence of errors (as in multiple regression)
Further things to do:
• Check goodness of fit and prediction for different groups (as done in the do-file you have)
• Check the correlation matrix for strong correlations between predictors (corr)
• Check for outliers using regress and diag (but don’t tell anyone I suggested this)
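A minimal sketch of some of these checks on the auto data. The model (foreign on mpg and weight) is an assumption for illustration and is not the do-file from the course. `linktest` is used here as a rough check on linearity in the logit.

```stata
sysuse auto, clear

* Illustrative logistic regression with a 0/1 outcome
logit foreign mpg weight

* Hosmer-Lemeshow goodness-of-fit test with 10 groups
estat gof, group(10)

* Classification table: how well the model predicts the 0/1 outcome
estat classification

* Specification check: a significant _hatsq suggests the logit is not linear in the predictors
linktest

* Strong correlations between predictors
corr mpg weight
```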