Estimation of constant-CV regression models

Alan H. Feiveson
NASA – Johnson Space Center, Houston, TX
SNASUG 2008, Chicago, IL
Variance Models with Simple Linear Regression

Ordinary regression:  yi = b0 + b1xi + ei,  V(ei) = s^2
Constant-CV model:    yi = b0 + b1xi + ei,  V(ei) = s^2(b0 + b1xi)^2
Mixed-model form:     y = Xb + Zu
yi = b0 + b1xi + ei,  V(ei) = s^2(b0 + b1xi)^2

Example: b0 = 1.0, b1 = 0.5, s = 0.10

. clear
. set obs 100
. gen x = 10*uniform()
. gen mu = 1 + .5*x
. gen y = mu + .10*mu*invnorm(uniform())

Problem: Estimate b0, b1, and s.
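As a cross-check on this setup, the same data-generating process can be sketched in pure standard-library Python (an illustration, not part of the original Stata session). "Constant CV" means the relative error (y − mu)/mu has standard deviation s no matter what x is:

```python
import random
import statistics

random.seed(12345)
b0, b1, s = 1.0, 0.5, 0.10

# Mirror the Stata draw: gen y = mu + .10*mu*invnorm(uniform())
rel_err = []
for _ in range(100_000):
    x = 10 * random.random()
    mu = b0 + b1 * x                       # mu_i = b0 + b1*x_i
    y = mu + s * mu * random.gauss(0, 1)   # constant-CV error
    rel_err.append((y - mu) / mu)          # = s*e_i, free of x

sd_rel = statistics.stdev(rel_err)
print(round(sd_rel, 3))  # close to s = 0.10
```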
Variance Stabilization

yi = b0 + b1xi + ei,  V(ei) = s^2(b0 + b1xi)^2

Taking logs stabilizes the variance: to first order, V(log yi) ≈ s^2, constant in x. But E(log yi) = g(b0, b1, xi), a nonlinear function of x and the parameters, so a straight-line fit of log y on x is not enough.
Approximate g(b0 ,b1, xi) by polynomial in x, then do OLS regression of log y on x: . gen z = log(y) . gen x2 = x*x . reg z x x2 . predict z_hat b0 = 1.0, b1= 0.5, s = 0.10 Source | SS df MS Number of obs = 100 -------------+------------------------------ F( 2, 97) = 630.65 Model | 15.9635043 2 7.98175216 Prob > F = 0.0000 Residual | 1.22767564 97 .01265645 R-squared = 0.9286 -------------+------------------------------ Adj R-squared = 0.9271 Total | 17.19118 99 .173648282 Root MSE = .1125 ------------------------------------------------------------------------------ z | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- x | .2707448 .018873 14.35 0.000 .2332872 .3082025 x2 | -.0110946 .0017622 -6.30 0.000 -.0145921 -.0075972 _cons | .1783796 .0441957 4.04 0.000 .0906634 .2660958 ------------------------------------------------------------------------------
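Why the log transform stabilizes the variance: log yi = log mu_i + log(1 + s·e_i) ≈ log mu_i + s·e_i for small s, so the deviations of log y from g have standard deviation ≈ s at every x. A quick pure-Python check of that delta-method claim (an illustration with its own simulated data; the Root MSE of .1125 above also includes lack of fit of the quadratic approximation to g):

```python
import random
import math
import statistics

random.seed(2008)
b0, b1, s = 1.0, 0.5, 0.10

# log y - log mu = log(1 + s*e) ~ s*e to first order
dev = []
for _ in range(50_000):
    x = 10 * random.random()
    mu = b0 + b1 * x
    y = mu * (1 + s * random.gauss(0, 1))
    dev.append(math.log(y) - math.log(mu))

sd_log = statistics.stdev(dev)
print(round(sd_log, 3))  # close to s = 0.10, regardless of x
```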
Alternative: Iteratively re-weighted regression

. reg y x
. predict muh
. reg y x [w=1/muh^2]
. local rmse = e(rmse)
. gen wt = 1/muh^2
. summ wt
. local wbar = r(mean)
. local sigh = sqrt(`wbar')*`rmse'
ITERATION 0 . reg y x Source | SS df MS Number of obs = 100 -------------+------------------------------ F( 1, 98) = 832.95 Model | 172.307709 1 172.307709 Prob > F = 0.0000 Residual | 20.272763 98 .206864928 R-squared = 0.8947 -------------+------------------------------ Adj R-squared = 0.8937 Total | 192.580472 99 1.94525729 Root MSE = .45482 ------------------------------------------------------------------------------ y | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- x | .518296 .0179585 28.86 0.000 .482658 .5539339 _cons | .9530661 .1014154 9.40 0.000 .7518106 1.154322 ------------------------------------------------------------------------------ . gen wt = 1/(_b[_cons] + _b[x]*x)^2
ITERATION 1 . reg y x [w=wt] (analytic weights assumed) (sum of wgt is 1.3046e+01) Source | SS df MS Number of obs = 100 -------------+------------------------------ F( 1, 98) = 1278.10 Model | 123.561274 1 123.561274 Prob > F = 0.0000 Residual | 9.47421712 98 .096675685 R-squared = 0.9288 -------------+------------------------------ Adj R-squared = 0.9281 Total | 133.035492 99 1.34379284 Root MSE = .31093 ------------------------------------------------------------------------------ y | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- x | .510555 .014281 35.75 0.000 .4822147 .5388952 _cons | .9877915 .0533903 18.50 0.000 .8818402 1.093743 ------------------------------------------------------------------------------ . replace wt = 1/(_b[_cons] + _b[x]*x)^2 (100 real changes made)
ITERATION 2 . reg y x [w=wt] (analytic weights assumed) (sum of wgt is 1.2842e+01) Source | SS df MS Number of obs = 100 -------------+------------------------------ F( 1, 98) = 1271.77 Model | 124.885941 1 124.885941 Prob > F = 0.0000 Residual | 9.62343467 98 .098198313 R-squared = 0.9285 -------------+------------------------------ Adj R-squared = 0.9277 Total | 134.509376 99 1.35868057 Root MSE = .31337 ------------------------------------------------------------------------------ y | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- x | .510746 .0143219 35.66 0.000 .4823247 .5391674 _cons | .9870233 .0540361 18.27 0.000 .8797904 1.094256 ------------------------------------------------------------------------------ . replace wt = 1/(_b[_cons] + _b[x]*x)^2 (100 real changes made)
ITERATION 3 . noi reg y x [w=wt] (analytic weights assumed) (sum of wgt is 1.2846e+01) Source | SS df MS Number of obs = 100 -------------+------------------------------ F( 1, 98) = 1271.92 Model | 124.855967 1 124.855967 Prob > F = 0.0000 Residual | 9.6200215 98 .098163485 R-squared = 0.9285 -------------+------------------------------ Adj R-squared = 0.9277 Total | 134.475988 99 1.35834331 Root MSE = .31331 ------------------------------------------------------------------------------ y | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- x | .5107417 .0143209 35.66 0.000 .4823223 .5391612 _cons | .9870406 .0540213 18.27 0.000 .8798371 1.094244 ------------------------------------------------------------------------------ . summ wt Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- wt | 100 .1284623 .1209472 .0269917 .5841107 . local wbar=r(mean) . noi di e(rmse)*sqrt(`wbar') .11229563 b0 = 1.0, b1= 0.5, s = 0.10
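The iteration above can be sketched end-to-end in pure standard-library Python (an illustration with its own simulated data, not a reproduction of the run shown). Weighted least squares with one covariate has a closed form, and the sigma estimate mirrors e(rmse)*sqrt(wbar), which with weights 1/muhat^2 works out to sqrt(sum of (e_i/muhat_i)^2 / (n−2)):

```python
import random
import math

random.seed(42)
b0_true, b1_true, s_true = 1.0, 0.5, 0.10

# Simulate constant-CV data as on the earlier slide (larger n here)
n = 5000
xs, ys = [], []
for _ in range(n):
    x = 10 * random.random()
    mu = b0_true + b1_true * x
    xs.append(x)
    ys.append(mu + s_true * mu * random.gauss(0, 1))

def wls(xs, ys, ws):
    """Closed-form weighted least squares for y = a + b*x."""
    sw = sum(ws)
    swx = sum(w * x for w, x in zip(ws, xs))
    swy = sum(w * y for w, y in zip(ws, ys))
    swxx = sum(w * x * x for w, x in zip(ws, xs))
    swxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    b = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    a = (swy - b * swx) / sw
    return a, b

# Iteration 0: OLS (unit weights); then re-weight by 1/muhat^2 and repeat
a, b = wls(xs, ys, [1.0] * n)
for _ in range(3):
    ws = [1.0 / (a + b * x) ** 2 for x in xs]
    a, b = wls(xs, ys, ws)

# sigma-hat = sqrt( sum_i (e_i/muhat_i)^2 / (n-2) ), cf. e(rmse)*sqrt(wbar)
sigma = math.sqrt(sum(((y - a - b * x) / (a + b * x)) ** 2
                      for x, y in zip(xs, ys)) / (n - 2))
print(round(a, 2), round(b, 2), round(sigma, 3))
```

As on the slides, the weights stabilize after two or three passes.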
Can we do this using -xtmixed-?

yi = b0 + b1xi + s(b0 + b1xi)ui
   = b0 + b1xi + c0u0i + c1xiu1i,   where u0i = u1i and c1/c0 = b1/b0

. xtmixed y x ||???: x

• How do we get -xtmixed- to estimate a non-constant residual variance?
• Degenerate dependency of random effects (u0i = u1i).
• Coefficients of random intercept and slope (c0 and c1) need to be constrained.
Can we do this using -xtmixed-?

How do we get -xtmixed- to estimate a non-constant residual variance?

Ex. 1: Ignore dependency of u's and constraints on c's:

yi = b0 + b1xi + c0u0i + c1xiu1i

set obs 1000
gen x = 5*uniform()
gen mu = 3 + 1.4*x
gen u0 = invnorm(uniform())
gen u1 = invnorm(uniform())
gen y = mu + 0.05*u0 + 0.50*x*u1
gen ord = _n
xtmixed y x ||ord: x,noc
. xtmixed y x ||ord: x,noc nolog Mixed-effects REML regression Number of obs = 1000 Group variable: ord Number of groups = 1000 Obs per group: min = 1 avg = 1.0 max = 1 Wald chi2(1) = 5745.72 Log restricted-likelihood = -1444.8061 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ y | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- x | 1.404595 .0185301 75.80 0.000 1.368276 1.440913 _cons | 3.0182 .0116741 258.54 0.000 2.99532 3.041081 ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval] -----------------------------+------------------------------------------------ ord: Identity | sd(x) | .5058616 .0118386 .4831824 .5296053 -----------------------------+------------------------------------------------ sd(Residual) | .0495162 .0143015 .0281125 .0872159 ------------------------------------------------------------------------------ LR test vs. linear regression: chibar2(01) = 718.85 Prob >= chibar2 = 0.0000 b0 = 3.0, b1= 1.4, c0 = 0.05, c1 = 0.50
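Why Ex. 1 works: with one observation per group, a random slope is statistically indistinguishable from a heteroskedastic error, so Var(y | x) = c0^2 + c1^2·x^2, and sd(Residual) and sd(x) in the output play the roles of c0 and c1. A pure-Python check of that variance identity (illustration only; not the original code):

```python
import random
import statistics

random.seed(7)
b0, b1, c0, c1 = 3.0, 1.4, 0.05, 0.50

def sd_y_at(x, ndraw=100_000):
    """Empirical sd of y = b0 + b1*x + c0*u0 + c1*x*u1 at a fixed x."""
    draws = [b0 + b1 * x
             + c0 * random.gauss(0, 1)
             + c1 * x * random.gauss(0, 1)
             for _ in range(ndraw)]
    return statistics.stdev(draws)

# Var(y|x) = c0^2 + c1^2*x^2, so at x=4: sqrt(.0025 + .25*16) ~ 2.00
sd4 = sd_y_at(4.0)
print(round(sd4, 2))
```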
Can we do this using -xtmixed-?

How do we get -xtmixed- to estimate a non-constant residual variance?

Ex. 2: No random intercept, covariate known:

yi = b0 + b1xi + c1ziu1i

set obs 1000
gen x = 5*uniform()
gen z = 3 + 1.4*x
gen u1 = invnorm(uniform())
gen y = 3 + 1.4*x + 0.50*z*u1
gen ord = _n
xtmixed y x ||ord: z,noc
Can we do this using -xtmixed-?

How do we get -xtmixed- to estimate a non-constant residual variance?

Ex. 2: No random intercept, covariate known:  yi = b0 + b1xi + c1ziu1i

. xtmixed y x ||ord: z,noc

Garbage!

Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log restricted-likelihood = -2506.6255
  numerical derivatives are approximate
  flat or discontinuous region encountered
Iteration 1: log restricted-likelihood = -2503.1832
  numerical derivatives are approximate
Can we do this using -xtmixed-?

How do we get -xtmixed- to estimate a non-constant residual variance?

Ex. 2: No random intercept, covariate known:  yi = b0 + b1xi + c1ziu1i

Fix: replicate each observation and add a tiny artificial residual:

expand 3
sort ord
gen yf = y + .001*invnorm(uniform())
xtmixed yf x ||ord: z,noc nolog
. xtmixed yf x ||ord: z,noc nolog Mixed-effects REML regression Number of obs = 3000 Group variable: ord Number of groups = 1000 Obs per group: min = 3 avg = 3.0 max = 3 Wald chi2(1) = 484.13 Log restricted-likelihood = 7952.0717 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ yf | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- x | 1.393598 .063337 22.00 0.000 1.26946 1.517736 _cons | 3.071291 .1252851 24.51 0.000 2.825737 3.316846 ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval] -----------------------------+------------------------------------------------ ord: Identity | sd(z) | .4814862 .0107771 .46082 .5030792 -----------------------------+------------------------------------------------ sd(Residual) | .0009896 .0000156 .0009594 .0010207 ------------------------------------------------------------------------------ LR test vs. linear regression: chibar2(01) = 31334.04 Prob >= chibar2 = 0.0000 b0 = 3.0, b1= 1.4, c1 = 0.50
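The structure -xtmixed- exploits in Ex. 2 is that (y − b0 − b1x)/z = c1·u1, so with z known the standardized residuals reveal c1 = 0.50 directly; that is what sd(z) ≈ .48 above estimates. A pure-Python illustration of that identity (its own simulation, not the original run):

```python
import random
import statistics

random.seed(3)
b0, b1, c1 = 3.0, 1.4, 0.50

# Mirror: gen z = 3 + 1.4*x ; gen y = 3 + 1.4*x + 0.50*z*u1
std_resid = []
for _ in range(20_000):
    x = 5 * random.random()
    z = b0 + b1 * x
    y = b0 + b1 * x + c1 * z * random.gauss(0, 1)
    std_resid.append((y - b0 - b1 * x) / z)   # = c1 * u1

sd_sr = statistics.stdev(std_resid)
print(round(sd_sr, 2))  # close to c1 = 0.50
```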
Can we do this using -xtmixed-?

Ex. 3: No random intercept (unknown covariate)

yi = b0 + b1xi + s(b0 + b1xi)ui
   = b0 + b1xi + c0u0i + c1xiu1i,   where u0i = u1i and c1/c0 = b1/b0

• Degenerate dependency of random effects (u0i = u1i).
• Coefficients of random intercept and slope (c0 and c1) need to be constrained.

Recast the model with one error term and pretend zi = b0 + b1xi is known. Then iterate.
Can we do this using -xtmixed-?

yi = b0 + b1xi + c1ziu1i   [zi = b0 + b1xi is unknown]

1. Expand and introduce artificial "residual" error term.
2. Estimate zi by OLS or other "easy" method.
3. Fit model pretending prediction zhi is actual zi.
4. Iterate.

. expand 3
. gen yf = y + .001*invnorm(uniform())
. reg y x
. predict zh
. xtmixed yf x ||ord: zh,noc nolog
. drop zh
. predict zh
. xtmixed yf x ||ord: zh,noc Mixed-effects REML regression Number of obs = 3000 Group variable: ord Number of groups = 1000 Obs per group: min = 3 avg = 3.0 max = 3 Wald chi2(1) = 484.01 Log restricted-likelihood = 7952.4049 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ yf | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- x | 1.39339 .0633354 22.00 0.000 1.269255 1.517525 _cons | 3.071699 .1261407 24.35 0.000 2.824467 3.31893 ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval] -----------------------------+------------------------------------------------ ord: Identity | sd(zh) | .4764546 .0106645 .4560044 .4978219 -----------------------------+------------------------------------------------ sd(Residual) | .0009896 .0000156 .0009594 .0010207 ------------------------------------------------------------------------------ LR test vs. linear regression: chibar2(01) = 31334.71 Prob >= chibar2 = 0.0000 b0 = 3.0, b1= 1.4, c1 = 0.50
2-level model

yij = b0 + b1xij + c1(b0 + b1xij)u1i + c2[b0 + b1xij + c1(b0 + b1xij)u1i]u2ij

E(yij | xij)      = b0 + b1xij                          ["z"]
E(yij | xij, u1i) = b0 + b1xij + c1(b0 + b1xij)u1i      ["zi"]

Simulation program (nasug_2008_sim1):

args NS nr be0 be1 c1 c2
drop _all
set obs `NS'
gen id = _n
gen u1 = invnorm(uniform())
expand `nr'
sort id
gen u2 = invnorm(uniform())
gen x = 10*uniform()
gen z = `be0' + `be1'*x
gen zi = z + `c1'*z*u1
gen y = zi + `c2'*zi*u2
// [gen y = zi + `c2'*zi*e]
gen obs = _n
expand 3
sort obs
gen yf = y + .001*invnorm(uniform())

xtmixed y x ||id: x,noc nolog
predict zh0
predict uh1i_0, reffects level(id)
gen zhi_0 = zh0 + uh1i_0

xtmixed yf x ||id: zh0,noc ||obs: zhi_0,noc nolog
predict zh1
predict uh1i_1, reffects level(id)
gen zhi_1 = zh1 + uh1i_1

xtmixed yf x ||id: zh1,noc ||obs: zhi_1,noc nolog
predict zh2
predict uh1i_2, reffects level(id)
gen zhi_2 = zh2 + uh1i_2

noi xtmixed yf x ||id: zh2,noc ||obs: zhi_2,noc nolog
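The nested constant-CV structure compounds multiplicatively: y = z(1 + c1·u1)(1 + c2·u2), so the total relative error c1·u1 + c2·u2 + c1·c2·u1·u2 has variance c1^2 + c2^2 + c1^2·c2^2 (≈ .0426 for c1 = .2, c2 = .05, i.e. overall CV ≈ .206). A pure-Python check of that decomposition (illustration only; it draws u1 per row rather than reproducing the nesting):

```python
import random
import statistics

random.seed(11)
be0, be1, c1, c2 = 1.0, 1.0, 0.2, 0.05

# y = z*(1 + c1*u1)*(1 + c2*u2), so (y - z)/z = c1*u1 + c2*u2 + c1*c2*u1*u2
rel = []
for _ in range(100_000):
    x = 10 * random.random()
    z = be0 + be1 * x
    u1 = random.gauss(0, 1)   # group-level draw (one per row in this sketch)
    u2 = random.gauss(0, 1)   # observation-level draw
    y = z * (1 + c1 * u1) * (1 + c2 * u2)
    rel.append((y - z) / z)

sd_rel = statistics.stdev(rel)
print(round(sd_rel, 3))  # theory: sqrt(c1^2 + c2^2 + c1^2*c2^2) ~ 0.206
```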
. run nasug_2008_sim1 20 5 1.0 1.0 .2 .05 6 1

Iteration history (sd estimates at each pass):
.21211063 .05062864
.21216237 .05076224
.21213417 .05075685
.21213363 .05075686
.21213354 .05075685
.21213353 .05075685

Mixed-effects REML regression                   Number of obs      =       300

-----------------------------------------------------------
                |   No. of       Observations per Group
 Group Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
             id |       20         15       15.0         15
            obs |      100          3        3.0          3
-----------------------------------------------------------
. run nasug_2008_sim1 20 5 1.0 1.0 .2 .05 6 1 Wald chi2(1) = 421.17 Log restricted-likelihood = 1038.0836 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ yf | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- x | 1.087757 .0530034 20.52 0.000 .9838727 1.191642 _cons | 1.039358 .0535761 19.40 0.000 .9343512 1.144366 ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval] -----------------------------+------------------------------------------------ id: Identity | sd(zh) | .2121335 .0348399 .1537487 .2926895 -----------------------------+------------------------------------------------ obs: Identity | sd(zhi) | .0507568 .0040389 .0434271 .0593237 -----------------------------+------------------------------------------------ sd(Residual) | .0009429 .0000471 .0008549 .00104 ------------------------------------------------------------------------------ LR test vs. linear regression: chi2(2) = 2866.48 Prob > chi2 = 0.0000
Bayesian Estimation (WinBUGS)

[Posterior plots for b1, b0, c2, c1]
WinBUGS

node  mean    sd       2.5%    median  97.5%    start  sample
be1   1.077   0.04988  0.9847  1.076   1.169    10001  10000
be0   1.030   0.05257  0.9317  1.030   1.131    10001  10000
c1    0.217   0.03455  0.1618  0.2127  0.2958   10001  10000
c2    0.064   0.00465  0.0556  0.0637  0.07353  10001  10000

Stata (xtmixed)

          yf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.087757   .0530034    20.52   0.000     .9838727    1.191642
       _cons |   1.039358   .0535761    19.40   0.000     .9343512    1.144366
------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                      sd(xb) |   .2121335   .0348399      .1537487    .2926895
-----------------------------+------------------------------------------------
obs: Identity                |
                    sd(muhi) |   .0507568   .0040389      .0434271    .0593237
-----------------------------+------------------------------------------------