Sample Selection Example

Presentation Transcript


  1. Sample Selection Example Bill Evans

  2. Draw 10,000 obs at random • educ uniform over [0,16] • age uniform over [18,64] • wearnl=4.49 + 0.08*educ + 0.012*age + ε • Generate missing data for wearnl

  3. z drawn from standard normal, N(0,1) • d*=-1.5+0.15*educ+0.01*age+0.15*z+v • wearnl missing if d*≤0 • wearnl reported if d*>0 • wearnl_all=wearnl before any values are set to missing (no missing obs.)

  4. ε_i and v_i are assumed to be bivariate normal • E(ε_i) = E(v_i) = 0 • Var(ε_i) = σ² • Var(v_i) = 1 • Corr(ε_i, v_i) = ρ • Cov(ε_i, v_i) = ρσ • In this case, ρ = 0.25 and σ = 0.46
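
The data-generating process on slides 2-4 can be sketched in a few lines of Python. This is an illustration only, not the author's code (the deck itself uses Stata); the function name `simulate`, the dictionary keys, the seed, and the way the correlated errors are built (mixing v with an independent helper draw u) are all assumptions made for the example.

```python
import random

def simulate(n=10_000, rho=0.25, sigma=0.46, seed=12345):
    """Draw one simulated sample following slides 2-4 (the seed is arbitrary)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        educ = rng.uniform(0, 16)
        age = rng.uniform(18, 64)
        z = rng.gauss(0, 1)   # variable that appears only in the selection equation
        v = rng.gauss(0, 1)   # selection error, Var(v) = 1
        u = rng.gauss(0, 1)   # independent helper draw (an assumption of this sketch)
        # Mixing v and u gives Var(eps) = sigma^2 and Corr(eps, v) = rho.
        eps = sigma * (rho * v + (1 - rho ** 2) ** 0.5 * u)
        wearnl_all = 4.49 + 0.08 * educ + 0.012 * age + eps
        d_star = -1.5 + 0.15 * educ + 0.01 * age + 0.15 * z + v
        # wearnl is reported only when the latent index d* is positive.
        wearnl = wearnl_all if d_star > 0 else None
        rows.append(dict(educ=educ, age=age, z=z, eps=eps, v=v,
                         wearnl_all=wearnl_all, wearnl=wearnl))
    return rows

rows = simulate()
reported = [r for r in rows if r["wearnl"] is not None]
print(f"{len(reported)} of {len(rows)} observations have wearnl reported")
```

Writing ε as σ(ρv + sqrt(1-ρ²)u) is one standard way to obtain a pair with the stated variance and correlation; any other bivariate normal generator would serve equally well.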

  5. Y_i = β_0 + β_1*educ_i + β_2*age_i + ε_i • E[Y_i | SSR] = β_0 + β_1*educ_i + β_2*age_i + E[ε_i | SSR], where SSR is the sample selection rule (d*_i > 0) • E[ε_i | SSR] = E[ε_i | v_i > -w_iγ] = ρσ φ(w_iγ)/Φ(w_iγ)
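
The last equality on slide 5 follows from two standard facts about the bivariate normal, written out here as a short sketch. The residual term u_i is notation introduced for this derivation and does not appear on the slides.

```latex
\[
\begin{aligned}
\varepsilon_i &= \rho\sigma\, v_i + u_i,
  \qquad u_i \text{ independent of } v_i,\; E[u_i] = 0
  \quad (\text{since } \operatorname{Cov}(\varepsilon_i, v_i) = \rho\sigma,\ \operatorname{Var}(v_i) = 1) \\
E[\varepsilon_i \mid v_i > -w_i\gamma] &= \rho\sigma\, E[v_i \mid v_i > -w_i\gamma] \\
E[v_i \mid v_i > -w_i\gamma]
  &= \frac{\phi(-w_i\gamma)}{1 - \Phi(-w_i\gamma)}
   = \frac{\phi(w_i\gamma)}{\Phi(w_i\gamma)}
\end{aligned}
\]
```

The final step uses the symmetry of the standard normal: φ(-a) = φ(a) and 1 - Φ(-a) = Φ(a).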

  6. λ_i = φ(w_iγ)/Φ(w_iγ) • w_iγ = γ_0 + γ_1*educ + γ_2*age + γ_3*z • γ_1 and γ_2 (the coefficients on educ and age) are both constructed to be positive, and λ_i falls as w_iγ rises • so cov(educ, λ_i) < 0 and • cov(age, λ_i) < 0

  7. The omitted variable λ_i is negatively correlated with the observed regressors (educ and age), and it enters the wage equation with a positive coefficient, ρσ • Therefore, the coefficients on educ and age in the selected sample will be too low
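
The sign claims on slides 6 and 7 can be checked numerically. The sketch below computes the inverse Mills ratio with Python's `statistics.NormalDist` and the sample covariances of educ and age with λ among selected observations. The helper names `inverse_mills` and `covariance` and the seed are assumptions of this example, and the true selection coefficients are plugged in directly rather than estimated.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def inverse_mills(index):
    """lambda = phi(index) / Phi(index) for the standard normal."""
    return STD_NORMAL.pdf(index) / STD_NORMAL.cdf(index)

def covariance(xs, ys):
    """Sample covariance with the usual n - 1 denominator."""
    mx, my = fmean(xs), fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

rng = random.Random(2024)  # arbitrary seed
educ, age, lam = [], [], []
for _ in range(10_000):
    e, a = rng.uniform(0, 16), rng.uniform(18, 64)
    z, v = rng.gauss(0, 1), rng.gauss(0, 1)
    index = -1.5 + 0.15 * e + 0.01 * a + 0.15 * z  # w_i * gamma, true coefficients
    if index + v > 0:                              # keep the selected sample only
        educ.append(e)
        age.append(a)
        lam.append(inverse_mills(index))

cov_educ = covariance(educ, lam)
cov_age = covariance(age, lam)
print("cov(educ, lambda) =", round(cov_educ, 4))
print("cov(age,  lambda) =", round(cov_age, 4))
```

Because λ is a decreasing function of the index and both educ and age raise the index, both covariances should come out negative.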

  8. Number of non-missing observations

  9. OLS on all data (no missing obs) • Generated by the equation wearnl=4.49 + 0.08*educ + 0.012*age + ε

  10. OLS on reported data • Smaller MSE • Notice that the estimates for educ and age are now smaller
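
The comparison on slides 9 and 10 can be reproduced in outline as follows. This is a Python sketch under stated assumptions, not the Stata run behind the slides: the `ols` helper (normal equations solved by Gauss-Jordan elimination), the seed, and the construction of the correlated errors are all choices made for the example, so the numbers will not match the deck's output.

```python
import random

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting, then eliminate this column from every other row.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

rng = random.Random(7)  # arbitrary seed
rho, sigma = 0.25, 0.46
X_all, y_all, X_rep, y_rep = [], [], [], []
for _ in range(10_000):
    educ, age = rng.uniform(0, 16), rng.uniform(18, 64)
    z, v, u = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    eps = sigma * (rho * v + (1 - rho ** 2) ** 0.5 * u)  # Corr(eps, v) = rho
    wearnl = 4.49 + 0.08 * educ + 0.012 * age + eps
    row = [1.0, educ, age]
    X_all.append(row)
    y_all.append(wearnl)
    if -1.5 + 0.15 * educ + 0.01 * age + 0.15 * z + v > 0:  # d* > 0: reported
        X_rep.append(row)
        y_rep.append(wearnl)

b_all = ols(X_all, y_all)  # [constant, educ, age] on all data
b_rep = ols(X_rep, y_rep)  # same regression on reported data only
print("all data:     ", [round(b, 4) for b in b_all])
print("reported data:", [round(b, 4) for b in b_rep])
```

The educ coefficient should fall noticeably in the reported-data regression. The expected shift in the age coefficient is much smaller, since age enters the selection equation with a coefficient of only 0.01, so in a single draw of this size it can be hard to distinguish from sampling noise.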

  11. Probit: why is data non-missing? • Generated by the equation d*=-1.5+0.15*educ+0.01*age+0.15*z+v

  12. Syntax for Heckman model in Stata • . heckman wearnl educ age, select(educ age z); • wearnl educ age: equation of interest • select(educ age z): variables in selection equation

  13. Notice β’s have increased over OLS w/ missing data • Cannot reject null Rho=0 • Sigma right on • Rho is a little off
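
To see why adding λ moves the coefficients back up, the sketch below runs a simplified two-step correction in Python. It is not what Stata's `heckman` command does and it is not the author's code: it plugs in the true selection coefficients from slide 3 instead of estimating a probit, it uses a second-stage OLS instead of maximum likelihood, and it draws 200,000 observations instead of 10,000 so the pattern stands out from sampling noise. The `ols` helper and the seed are likewise assumptions of the example.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

rng = random.Random(99)  # arbitrary seed
rho, sigma = 0.25, 0.46
X_naive, X_corr, y = [], [], []
for _ in range(200_000):  # larger than the slides' 10,000 (assumption of this sketch)
    educ, age = rng.uniform(0, 16), rng.uniform(18, 64)
    z, v, u = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    eps = sigma * (rho * v + (1 - rho ** 2) ** 0.5 * u)
    index = -1.5 + 0.15 * educ + 0.01 * age + 0.15 * z  # true w_i * gamma
    if index + v > 0:  # selected (reported) observations only
        lam = STD_NORMAL.pdf(index) / STD_NORMAL.cdf(index)
        X_naive.append([1.0, educ, age])
        X_corr.append([1.0, educ, age, lam])
        y.append(4.49 + 0.08 * educ + 0.012 * age + eps)

b_naive = ols(X_naive, y)  # [constant, educ, age]
b_corr = ols(X_corr, y)    # [constant, educ, age, lambda]
print("OLS on reported data:", [round(b, 4) for b in b_naive])
print("with lambda added:   ", [round(b, 4) for b in b_corr])
```

In the second regression the coefficient on λ estimates ρσ (0.25 × 0.46 = 0.115 in this design). Because λ is strongly correlated with educ, that coefficient is estimated imprecisely in samples of 10,000, which is consistent with the slide's remark that rho is a little off.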

  14. Comparison of Estimates

  15. Comparison of Estimates [% difference from OLS w/ all data]

  16. * run heckman sample selection correction; • . * but use functional form to identify the model; • . heckman wearnl educ age, select(educ age);

  17. Nowhere close on rho

  18. Comparison of Estimates [% difference from OLS w/ all data]
