Longitudinal Data, Fall 2006
Chapter 6: More on Marginal (GEE) Models
Instructors: Alan Hubbard, Nick Jewell
HIV+ (CD4 Count) Data: some simple analyses using only 2 observations per person
• Purpose: illustrate how both the choice of working correlation matrix and robust vs. naive inference affect estimates and inference.
• Consider two scenarios:
  • a baseline (time-independent) covariate,
  • a time-dependent covariate.
Association of Baseline Covariate (Age) on CD4 count
• Binary age (Xij) = 0 (<40) or 1 (>40)
• Fit simple linear model: E[Yij | Xij] = β0 + β1 Xij, where Yij is the CD4 count
• Compare results of Models A-D
Association of Baseline Covariate (Age) on CD4 count
• Model A
. xtgee cd4 binage, i(id) cor(ind)
------------------------------------------------------------------------------
         cd4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      binage |    24.2404   14.17075     1.71   0.087    -3.533768    52.01457
       _cons |    225.902   9.867247    22.89   0.000     206.5625    245.2414
------------------------------------------------------------------------------
• Model B
. xtgee cd4 binage, i(id) cor(ind) robust
                               (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
         cd4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      binage |    24.2404   19.26181     1.26   0.208    -13.51206    61.99286
       _cons |    225.902   12.62139    17.90   0.000     201.1645    250.6394
------------------------------------------------------------------------------
Association of Baseline Covariate (Age) on CD4 count
• Model C
. xtgee cd4 binage, i(id) cor(exc)
------------------------------------------------------------------------------
         cd4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      binage |    24.2404   19.16452     1.26   0.206    -13.32137    61.80217
       _cons |    225.902   13.34446    16.93   0.000     199.7473    252.0566
------------------------------------------------------------------------------
• Model D
. xtgee cd4 binage, i(id) cor(exc) robust
                               (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
         cd4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      binage |    24.2404   19.26181     1.26   0.208    -13.51206    61.99286
       _cons |    225.902   12.62139    17.90   0.000     201.1645    250.6394
------------------------------------------------------------------------------
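The z statistic, p-value and confidence interval in each table follow directly from the coefficient and its standard error, so changing only the standard error (naive vs. robust) changes the inference but not the estimate. A minimal sketch of that arithmetic in Python, using the binage numbers from Models A and B; the function name `wald_summary` is a hypothetical helper, and the normal approximation is assumed because the tables report z statistics.

```python
import math

def wald_summary(coef, se, z_crit=1.959964):
    """Return the z statistic, two-sided normal p-value and 95% Wald interval."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail area under N(0, 1)
    return z, p, (coef - z_crit * se, coef + z_crit * se)

# binage coefficient and standard errors copied from Models A and B
coef = 24.2404
naive = wald_summary(coef, 14.17075)   # Model A: independence, naive SE
robust = wald_summary(coef, 19.26181)  # Model B: independence, robust SE

for label, (z, p, (lo, hi)) in (("naive", naive), ("robust", robust)):
    print(f"{label:>6}: z = {z:.2f}, p = {p:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

The same point estimate gives p = 0.087 with the naive standard error and p = 0.208 with the robust one, as in the Stata output.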
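Summary of Results of Association of Baseline Covariate (Age) on CD4 count
Model   Working correlation   Std. errors   Coef. (binage)   Std. Err.   P>|z|
A       independent           naive                24.2404    14.17075   0.087
B       independent           robust               24.2404    19.26181   0.208
C       exchangeable          naive                24.2404    19.16452   0.206
D       exchangeable          robust               24.2404    19.26181   0.208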
Association of Time-Varying Covariate (Viral Load) on CD4 count
• Binary VL: Xij = 0 (<2000) or 1 (>2000). All subjects included have one low and one high VL.
• Fit simple linear model: E[Yij | Xij] = β0 + β1 Xij, where Yij is the CD4 count
• Compare results of Models A-D
Association of Time-Varying Covariate (VL) on CD4 count
• Model A
. xtgee cd4 medvl, i(id) cor(ind)
------------------------------------------------------------------------------
         cd4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       medvl |   -98.3494   30.29324    -3.25   0.001    -157.7231   -38.97574
       _cons |   377.3735   21.42055    17.62   0.000       335.39     419.357
------------------------------------------------------------------------------
• Model B
. xtgee cd4 medvl, i(id) cor(ind) robust
                               (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
         cd4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       medvl |   -98.3494   16.51035    -5.96   0.000    -130.7091   -65.98971
       _cons |   377.3735   22.92943    16.46   0.000     332.4326    422.3143
------------------------------------------------------------------------------
Association of Time-Varying Covariate (VL) on CD4 count
• Model C
. xtgee cd4 medvl, i(id) cor(exc)
------------------------------------------------------------------------------
         cd4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       medvl |   -98.3494   16.41059    -5.99   0.000    -130.5136   -66.18523
       _cons |   377.3735   21.42055    17.62   0.000       335.39     419.357
------------------------------------------------------------------------------
• Model D
. xtgee cd4 medvl, i(id) cor(exc) robust
                               (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
         cd4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       medvl |   -98.3494   16.51035    -5.96   0.000    -130.7091   -65.98971
       _cons |   377.3735   22.92943    16.46   0.000     332.4326    422.3143
------------------------------------------------------------------------------
Association of Time-Varying Covariate (VL) on CD4 count
• Paired t-test
. keep id cd4 medvl etime
. sort cd4 medvl
. reshape wide cd4 etime, i(id) j(medvl)
. ttest cd40 = cd41
Paired t test
------------------------------------------------------------------------------
Variable |     Obs        Mean   Std. Err.   Std. Dev.    [95% Conf. Interval]
---------+--------------------------------------------------------------------
    cd40 |      83    377.3735    22.92943     208.897    331.7596    422.9874
    cd41 |      83    279.0241    20.07767    182.9163    239.0832     318.965
---------+--------------------------------------------------------------------
    diff |      83     98.3494    16.51035    150.4164    65.50505    131.1937
------------------------------------------------------------------------------
Ho: mean(cd40 - cd41) = mean(diff) = 0
 Ha: mean(diff) < 0        Ha: mean(diff) != 0        Ha: mean(diff) > 0
     t =   5.9568              t =   5.9568               t =   5.9568
 P < t =   1.0000        P > |t| =   0.0000           P > t =   0.0000
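The paired t-test standard error is the standard deviation of the within-person differences divided by the square root of the number of pairs. A small Python check of that arithmetic, using only the numbers printed in the ttest output; the variable names are illustrative.

```python
import math

n = 83                # subjects, each with one low-VL and one high-VL CD4 count
mean_diff = 98.3494   # mean of cd40 - cd41
sd_diff = 150.4164    # standard deviation of the within-person differences

se_diff = sd_diff / math.sqrt(n)  # standard error of the mean difference
t_stat = mean_diff / se_diff      # paired t statistic on n - 1 degrees of freedom

print(f"SE = {se_diff:.4f}, t = {t_stat:.4f}")
```

The resulting standard error agrees with the semi-robust standard error for medvl in Models B and D (16.51035), which is why those models and the paired t-test give the same inference here.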
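Summary of Results of Association of Time-Varying Covariate (VL) on CD4 count
Model           Working correlation   Std. errors   Coef. (medvl)   Std. Err.   P-value
A               independent           naive              -98.3494    30.29324   0.001
B               independent           robust             -98.3494    16.51035   0.000
C               exchangeable          naive              -98.3494    16.41059   0.000
D               exchangeable          robust             -98.3494    16.51035   0.000
Paired t-test   (cd40 - cd41)                             98.3494    16.51035   0.0000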
Multiple and varying observations per person: CD4 (Y) vs. continuous (log) Viral Load (X)
• Model:
• β2 represents the expected change in Y given a change in Xij relative to the baseline value (Xi1): the longitudinal effect.
• β1 + β2 represents the expected difference in average Y across two sub-populations that differ by their baseline values, Xi1: the cross-sectional effect.
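The fits that follow use two covariates named logvlbase and logvlchange. A minimal Python sketch of how such covariates could be built from long-format data, assuming (this is a reading of the variable names, not something the slides state) that logvlbase is the subject's first log viral load Xi1 and logvlchange is Xij - Xi1. The records and the helper `add_base_and_change` are hypothetical.

```python
from collections import defaultdict

# Hypothetical long-format records: (subject id, visit time, log viral load)
records = [
    ("s1", 0, 4.0), ("s1", 1, 3.5), ("s1", 2, 3.0),
    ("s2", 0, 2.0), ("s2", 1, 2.5),
]

def add_base_and_change(rows):
    """Return (id, time, baseline X_i1, change X_ij - X_i1) for every record."""
    by_subject = defaultdict(list)
    for sid, time, logvl in rows:
        by_subject[sid].append((time, logvl))
    out = []
    for sid, visits in by_subject.items():
        visits.sort()          # assumption: the earliest visit defines the baseline
        base = visits[0][1]
        for time, logvl in visits:
            out.append((sid, time, base, logvl - base))
    return out

result = add_base_and_change(records)
for row in result:
    print(row)
```

Every row of a subject carries the same baseline value, so that column varies only between subjects, while the change column is zero at baseline and varies within subjects.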
Association of Time-Varying Covariate (VL) on CD4 count: multiple observations per person
• Model A
. xtgee cd4 logvlbase logvlchange, i(id) cor(ind)
GEE population-averaged model                   Number of obs      =      7053
Group variable:                          id     Number of groups   =       406
Link:                              identity     Obs per group: min =         1
Family:                            Gaussian                    avg =      17.4
Correlation:                    independent                    max =        58
------------------------------------------------------------------------------
         cd4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   logvlbase |  -83.74371   2.960401   -28.29   0.000    -89.54599   -77.94143
 logvlchange |    -99.194   2.453052   -40.44   0.000    -104.0019    -94.3861
       _cons |   618.9555   11.61598    53.28   0.000     596.1886    641.7224
------------------------------------------------------------------------------
Association of Time-Varying Covariate (VL) on CD4 count: multiple observations per person
• Model B
. xtgee cd4 logvlbase logvlchange, i(id) cor(ind) robust
GEE population-averaged model                   Number of obs      =      7053
Group variable:                          id     Number of groups   =       406
Link:                              identity     Obs per group: min =         1
Family:                            Gaussian                    avg =      17.4
Correlation:                    independent                    max =        58
                                                Wald chi2(2)       =    225.39
                               (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
         cd4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   logvlbase |  -83.74371   8.296962   -10.09   0.000    -100.0055   -67.48196
 logvlchange |    -99.194   6.831102   -14.52   0.000    -112.5827   -85.80528
       _cons |   618.9555   35.19853    17.58   0.000     549.9677    687.9434
------------------------------------------------------------------------------
Association of Time-Varying Covariate (VL) on CD4 count: multiple observations per person
• Model C
. xtgee cd4 logvlbase logvlchange, i(id) cor(exc)
------------------------------------------------------------------------------
         cd4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   logvlbase |  -52.75548   7.402832    -7.13   0.000    -67.26477    -38.2462
 logvlchange |   -54.7488   2.172512   -25.20   0.000    -59.00684   -50.49075
       _cons |   509.1174   31.23263    16.30   0.000     447.9026    570.3322
------------------------------------------------------------------------------
• Model D
. xtgee cd4 logvlbase logvlchange, i(id) cor(exc) robust
                               (standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
             |             Semi-robust
         cd4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   logvlbase |  -52.75548    7.72342    -6.83   0.000    -67.89311   -37.61786
 logvlchange |   -54.7488   3.158417   -17.33   0.000    -60.93918   -48.55841
       _cons |   509.1174   32.95307    15.45   0.000     444.5305    573.7042
------------------------------------------------------------------------------
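Summary of Results of Association of Time-Varying Covariate (VL) on CD4 count: multiple observations per person
Model   Working correlation   Std. errors   logvlbase Coef. (Std. Err.)   logvlchange Coef. (Std. Err.)
A       independent           naive         -83.74371 (2.960401)          -99.194  (2.453052)
B       independent           robust        -83.74371 (8.296962)          -99.194  (6.831102)
C       exchangeable          naive         -52.75548 (7.402832)          -54.7488 (2.172512)
D       exchangeable          robust        -52.75548 (7.72342)           -54.7488 (3.158417)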
Revisiting Water Intervention Trial
• Subjects are randomized to either a device that filters out pathogens or a similar-looking placebo device.
• Subjects record daily whether or not they have a gastro-intestinal (GI) episode (yes/no).
• Purpose is to determine the amount of GI illness attributable to drinking water.
What the data look like
          id    date   hcgi   group
  1.   A7283   14780      .       6
  2.   A7283   14781      0       6
  3.   A7283   14782      0       6
  4.   A7283   14783      0       6
  5.   A7283   14784      0       6
  6.   A7283   14785      0       6
  7.   A7283   14786      0       6
 17.   A7283   14796      0       6
225.   C1632   14738      .       7
226.   C1632   14739      .       7
227.   C1632   14740      .       7
228.   C1632   14741      0       7
229.   C1632   14742      0       7
230.   C1632   14743      0       7
231.   C1632   14744      1       7
232.   C1632   14745      0       7
233.   C1632   14746      0       7
234.   C1632   14747      0       7
235.   C1632   14748      0       7
237.   C1632   14750      0       7
238.   C1632   14751      1       7
Originally, reduced data to count
• Sum up the number of episodes to make an overall count.
• In the water trial example, calculate the number of GI episodes per person.
• In notation, if Yij is the jth measurement on the ith person and Yij = 0 (no) or 1 (yes), then make a new variable Yi = Σj Yij, the sum over that person's ni days.
• Note that we will ultimately allow for a different number of time intervals (ni) among subjects.
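The reduction from daily records to one count per person can be sketched in a few lines of Python. The rows below are an illustrative subset in the layout of the daily listing, and the helper name `collapse_to_counts` is hypothetical. Treating days with a missing report as contributing no time at risk is an assumption; the slides do not say how days at risk were computed.

```python
# Illustrative daily records: (id, date, hcgi); None marks a missing report
daily = [
    ("A7283", 14780, None), ("A7283", 14781, 0), ("A7283", 14782, 0),
    ("C1632", 14741, 0), ("C1632", 14744, 1), ("C1632", 14751, 1),
]

def collapse_to_counts(rows):
    """Return {id: (Y_i, n_i)}: episode total and days with a non-missing report."""
    totals = {}
    for sid, _date, hcgi in rows:
        episodes, days = totals.get(sid, (0, 0))
        if hcgi is not None:   # assumption: missing days add no time at risk
            episodes, days = episodes + hcgi, days + 1
        totals[sid] = (episodes, days)
    return totals

counts = collapse_to_counts(daily)
print(counts)
```

Each subject ends up with one row holding the episode count Yi and the number of observed days ni, which is the shape needed for a count regression with an exposure term.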
After reduction
          id   hcgi   daysatrisk   group
  1.   A7283      0          111       6
  2.   C1632      3           89       7
  3.   C2412      3            7       7
  4.   C2515      5           29       7
  5.   C2771      1          104       6
  6.   C4722      0          112       6
  7.   D1959      2           79       7
  8.   D3531      0          111       6
  9.   E1000      2           11       6
 10.   E8776      0          112       6
 11.   F4246      0          110       7
 12.   G3700      0          112       7
 13.   G4393      1          103       6
 14.   H1438      0          112       6
 15.   H1961      3           85       7
 16.   H6003      1          106       7
 17.   H6995      0          112       7
Originally, we fit a negative binomial regression
• Consider the same covariate (X = 0, 1: placebo, active) and the same underlying model:
• We can fit this same model (for the mean) using MLE and the negative binomial distribution.
Example 2: Negative Binomial Regression in Stata
. nbreg hcgieps group2, exposure(hcgiyrs)
Negative binomial regression                    Number of obs   =         45
                                                LR chi2(1)      =       0.01
                                                Prob > chi2     =     0.9133
Log likelihood = -90.86402                      Pseudo R2       =     0.0001
------------------------------------------------------------------------------
     hcgieps |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      group2 |  -.0784505   .7213801    -0.11   0.913     -1.49233    1.335429
       _cons |   3.009467   .5530023     5.44   0.000     1.925603    4.093332
     hcgiyrs | (exposure)
-------------+----------------------------------------------------------------
     lnalpha |   1.333043   .2862476                       .772008    1.894078
-------------+----------------------------------------------------------------
       alpha |   3.792566   1.085613                      2.164107    6.646416
------------------------------------------------------------------------------
Likelihood ratio test of alpha=0: chibar2(01) = 96.20   Prob>=chibar2 = 0.000
(Slide labels: group2 is b1, _cons is b0, lnalpha is -ln(n), alpha is 1/n.)
. lincom group2, irr
 ( 1)  [hcgieps]group2 = 0.0
------------------------------------------------------------------------------
     hcgieps |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   .9245478   .6669505    -0.11   0.913     .2248482    3.801625
------------------------------------------------------------------------------
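The incidence rate ratio and alpha in this output are exponentiated versions of the group2 coefficient and lnalpha, and the IRR interval is the exponentiated Wald interval from the log scale. A short Python check using only numbers from the nbreg table; the delta-method line for the IRR standard error is a standard approximation and is labeled as such.

```python
import math

b1, se_b1 = -0.0784505, 0.7213801  # group2 coefficient and SE from nbreg
lnalpha = 1.333043                  # log of the dispersion parameter
z_crit = 1.959964                   # 97.5th percentile of N(0, 1)

irr = math.exp(b1)                  # incidence rate ratio
irr_ci = (math.exp(b1 - z_crit * se_b1), math.exp(b1 + z_crit * se_b1))
irr_se = irr * se_b1                # delta-method approximation to the IRR SE
alpha = math.exp(lnalpha)           # dispersion parameter on its natural scale

print(f"IRR = {irr:.4f}, 95% CI = ({irr_ci[0]:.4f}, {irr_ci[1]:.4f})")
print(f"SE(IRR) = {irr_se:.4f}, alpha = {alpha:.4f}")
```

Because the interval is built on the log scale and then exponentiated, it is not symmetric around the IRR.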
Re-do keeping data in original form using GEE
• In this case, for every subject, at every day at risk, there is a binary variable (yes = 1 or no = 0).
• Thus, the data are a natural candidate for logistic regression.
• Again, we have repeated measures on the individual, so we want to account for residual correlation.
• Use GEE!
Association of Water Tx and HCGI
• Binary Tx (Xij) = 0 (yes) or 1 (no)
• Fit simple logistic model: logit P(Yij = 1 | Xij) = β0 + β1 Xij
• Compare results of Models A-D
GEE: Water Trial
• Model A
. logit hcgi group2
------------------------------------------------------------------------------
        hcgi |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      group2 |   .0265609   .0896848     0.30   0.767    -.1492181    .2023399
       _cons |  -1.980865   .0670658   -29.54   0.000    -2.112312   -1.849419
------------------------------------------------------------------------------
• Model B
. xtgee hcgi group2, family(binomial) i(id2) corr(ind) robust
GEE population-averaged model                   Number of obs      =      4682
Group variable:                         id2     Number of groups   =        45
Link:                                 logit     Obs per group: min =         7
Family:                            binomial                    avg =     104.0
Correlation:                    independent                    max =       112
                              (standard errors adjusted for clustering on id2)
------------------------------------------------------------------------------
             |             Semi-robust
        hcgi |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      group2 |   .0265609   .6328783     0.04   0.967    -1.213858     1.26698
       _cons |  -1.980865   .4661594    -4.25   0.000    -2.894521    -1.06721
------------------------------------------------------------------------------
GEE: Water Trial
• Model C
. xtgee hcgi group2, family(binomial) i(id2) corr(exc)
------------------------------------------------------------------------------
        hcgi |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      group2 |  -.0568574   .6096202    -0.09   0.926    -1.251691    1.137976
       _cons |  -1.928638   .4417612    -4.37   0.000    -2.794474   -1.062802
------------------------------------------------------------------------------
• Model D
. xtgee hcgi group2, family(binomial) i(id2) corr(exc) robust
                              (standard errors adjusted for clustering on id2)
------------------------------------------------------------------------------
             |             Semi-robust
        hcgi |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      group2 |  -.0568574   .5994284    -0.09   0.924    -1.231716    1.118001
       _cons |  -1.928638   .4211176    -4.58   0.000    -2.754013   -1.103263
------------------------------------------------------------------------------
GEE: Water Trial
• Model D
. lincom group2, or
 ( 1)  group2 = 0.0
------------------------------------------------------------------------------
        hcgi | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   .9447288   .5662973    -0.09   0.924     .2917916    3.058733
------------------------------------------------------------------------------
. xtcorr
Estimated within-id2 correlation matrix R:
          c1      c2      c3      c4      c5      c6      c7      c8      c9
r1    1.0000
r2    0.4443  1.0000
r3    0.4443  0.4443  1.0000
r4    0.4443  0.4443  0.4443  1.0000
r5    0.4443  0.4443  0.4443  0.4443  1.0000
r6    0.4443  0.4443  0.4443  0.4443  0.4443  1.0000
r7    0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  1.0000
r8    0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  1.0000
r9    0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  1.0000
r10   0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443
r11   0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443
r12   0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443
r13   0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443
r14   0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443
r15   0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443
r16   0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443
r17   0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443
r18   0.4443  0.4443  0.4443  0.4443  0.4443  0.4443  0.4443
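The xtcorr output shows what the exchangeable working correlation means: every pair of days within a subject shares one common correlation, here 0.4443. A minimal Python sketch that builds such a matrix for a small number of days; the function name `exchangeable` and the size 4 are illustrative choices.

```python
def exchangeable(n, rho):
    """n x n working correlation matrix: 1 on the diagonal, rho everywhere else."""
    return [[1.0 if r == c else rho for c in range(n)] for r in range(n)]

R = exchangeable(4, 0.4443)  # rho taken from the xtcorr output

for row in R:
    print("  ".join(f"{v:.4f}" for v in row))
```

A single parameter describes the whole matrix, regardless of how many days each subject contributes or how far apart two days are.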
One little twist: randomization failed
• Below is a 2x2 table of HCGI symptoms at baseline (before the study starts) by randomized treatment assignment.
           |       hcgibase
    group2 |         0          1 |     Total
-----------+----------------------+----------
    active |         8         13 |        21
           |     38.10      61.90 |    100.00
-----------+----------------------+----------
   placebo |        17          7 |        24
           |     70.83      29.17 |    100.00
-----------+----------------------+----------
     Total |        25         20 |        45
           |     55.56      44.44 |    100.00
          Pearson chi2(1) =   4.8616   Pr = 0.027
• Try adjusting for baseline HCGI.
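The Pearson chi-square statistic for the baseline imbalance can be reproduced from the four cell counts. A short Python sketch using only the standard library; `pearson_chi2` is a hypothetical helper, and the p-value line relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal.

```python
import math

# Rows: active, placebo; columns: hcgibase = 0, 1 (counts from the table)
table = [[8, 13], [17, 7]]

def pearson_chi2(tab):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_tot = [sum(r) for r in tab]
    col_tot = [sum(c) for c in zip(*tab)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(tab):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    return stat

chi2 = pearson_chi2(table)
p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df

print(f"chi2(1) = {chi2:.4f}, p = {p_value:.3f}")
```

The statistic matches the 4.8616 reported by Stata, with a p-value of about 0.027.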
Water Trial: adjusting for baseline
. xtgee hcgi group2 hcgibase, family(binomial) i(id2) corr(exc) robust
                              (standard errors adjusted for clustering on id2)
------------------------------------------------------------------------------
             |             Semi-robust
        hcgi |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      group2 |   .6922867    .630424     1.10   0.272    -.5433217    1.927895
    hcgibase |   2.454423   .6137046     4.00   0.000     1.251584    3.657262
       _cons |  -3.902329   .6923958    -5.64   0.000      -5.2594   -2.545258
------------------------------------------------------------------------------
. lincom group2, or
 ( 1)  group2 = 0.0
------------------------------------------------------------------------------
        hcgi | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    1.99828   1.259764     1.10   0.272     .5808157    6.875024
------------------------------------------------------------------------------
. lincom hcgibase, or
 ( 1)  hcgibase = 0.0
------------------------------------------------------------------------------
        hcgi | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   11.63972   7.143349     4.00   0.000     3.495877    38.75509
------------------------------------------------------------------------------
What about estimates of risk difference that account for correlation?
• Typically, with a simple 2x2 table of cross-sectional data, we can get the risk difference using standard analyses.
• However, with longitudinal/correlated data, we must adjust inference for residual correlation.
• This can be done with GEE using family(binomial) and link(id), the identity link.
Linear risk model with binomial errors
. xtgee hcgi group2 hcgibase, family(binomial) i(id2) corr(exc) robust link(id)
                              (standard errors adjusted for clustering on id2)
------------------------------------------------------------------------------
             |             Semi-robust
        hcgi |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      group2 |   .0169322   .0314532     0.54   0.590     -.044715    .0785794
    hcgibase |   .2048344   .0647286     3.16   0.002     .0779688    .3317001
       _cons |   .0235263   .0262296     0.90   0.370    -.0278828    .0749355
------------------------------------------------------------------------------
(Slide labels: group2 is a2, hcgibase is a1, _cons is a0.)
. xtgee hcgi group2 hcgibase, family(binomial) i(id2) corr(exc) robust
                              (standard errors adjusted for clustering on id2)
------------------------------------------------------------------------------
             |             Semi-robust
        hcgi |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      group2 |   .6922867    .630424     1.10   0.272    -.5433217    1.927895
    hcgibase |   2.454423   .6137046     4.00   0.000     1.251584    3.657262
       _cons |  -3.902329   .6923958    -5.64   0.000      -5.2594   -2.545258
------------------------------------------------------------------------------
(Slide labels: group2 is b2, hcgibase is b1, _cons is b0.)
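The two fits describe the group2 effect on different scales. Under the identity link the coefficient is itself a risk difference, the same in both baseline strata; under the logit link the odds ratio is constant and the implied risk difference depends on hcgibase. A minimal Python sketch of that contrast, plugging in the point estimates from the two tables; the helper `expit` and the dictionary `risk_diff` are illustrative names, and no standard errors are computed.

```python
import math

def expit(x):
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Coefficients copied from the two fits (slide labels a0..a2 and b0..b2)
a0, a1, a2 = 0.0235263, 0.2048344, 0.0169322  # identity link
b0, b1, b2 = -3.902329, 2.454423, 0.6922867   # logit link

risk_diff = {}
for hcgibase in (0, 1):
    # Identity link: fitted risk is linear, so the group2 contrast is always a2
    linear_rd = (a0 + a1 * hcgibase + a2) - (a0 + a1 * hcgibase)
    # Logit link: fitted risks come from expit, so the contrast depends on hcgibase
    logit_rd = expit(b0 + b1 * hcgibase + b2) - expit(b0 + b1 * hcgibase)
    risk_diff[hcgibase] = (linear_rd, logit_rd)
    print(f"hcgibase={hcgibase}: identity RD = {linear_rd:.4f}, logit RD = {logit_rd:.4f}")

print(f"odds ratio for group2 = {math.exp(b2):.5f}")
```

The exponentiated b2 reproduces the odds ratio of 1.99828 reported by lincom for the baseline-adjusted logistic fit.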