Modeling Multiple Source Risk Factor Data and Health Outcomes in Twins
Andy Bogart, MS
Jack Goldberg, PhD
Multiple Informant Data: Military Service in Vietnam
Command:
    regress ptsd sr, robust

Output:
    Linear regression                                      Number of obs =   10796
                                                           F(  1, 10794) =  639.43
                                                           Prob > F      =  0.0000
                                                           R-squared     =  0.0599
                                                           Root MSE      =  .34613

    ------------------------------------------------------------------------------
                 |               Robust
            ptsd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              sr |   .1793066   .0070909    25.29   0.000     .1654071     .193206
           _cons |   3.130085   .0039722   788.00   0.000     3.122299    3.137871
    ------------------------------------------------------------------------------

Estimates so far (coef., std. err.):
    Self Report      sr | .1793066   .0070909
Command:
    regress ptsd mr, robust

Output:
    Linear regression                                      Number of obs =   10712
                                                           F(  1, 10710) =  440.68
                                                           Prob > F      =  0.0000
                                                           R-squared     =  0.0423
                                                           Root MSE      =  .34992

    ------------------------------------------------------------------------------
                 |               Robust
            ptsd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              mr |    .152672   .0072727    20.99   0.000      .138416    .1669279
           _cons |   3.144166   .0040245   781.26   0.000     3.136277    3.152054
    ------------------------------------------------------------------------------

Estimates so far (coef., std. err.):
    Self Report      sr | .1793066   .0070909
    Military Record  mr | .152672    .0072727
Model 1: The General Multiple Source Model
Labeled parts of the slide's equation (equation image not recovered): expected outcome; intercept; source indicators; source by exposure interaction terms
• Generates the same estimates as the k marginal source-specific models
• Allows testing for a difference in sources
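The equation itself did not survive extraction. A hedged reconstruction for the two-source case, pieced together from the slide's labels and from the variables s1, z1 and z2 used in the Stata commands, might look as follows. The symbols are assumptions chosen for illustration, not the slide's own notation.

```latex
% Assumed notation (not from the slide):
%   Y_i     outcome (ptsd) for person i
%   X_{ik}  exposure (service in Vietnam) for person i as reported by source k
%   S_{k}   1 if the stacked data row belongs to source k, else 0
%   k = 1: self report (sr),  k = 2: military record (mr)
\[
E[Y_i \mid X_{i1}, X_{i2}, S_1, S_2]
  = \alpha + \gamma\, S_1 + \beta_1\, S_1 X_{i1} + \beta_2\, S_2 X_{i2}
\]
% Correspondence with the Stata variables: S_1 = s1, S_1 X_{i1} = z1, S_2 X_{i2} = z2.
% Rows from source 1:  E[Y_i] = (\alpha + \gamma) + \beta_1 X_{i1}
% Rows from source 2:  E[Y_i] = \alpha + \beta_2 X_{i2}
```

Read this way, each source keeps its own intercept and slope, which is why the stacked fit can match the separate source-specific regressions while still allowing a test of \beta_1 = \beta_2.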
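Command:
    expand 2                                 // two rows per person, one per source
    sort id                                  // expand appends the copies, so re-sort before using by id:
    generate service = 0
    by id: replace service = sr if _n==1     // row 1 carries the self report
    by id: replace service = mr if _n==2     // row 2 carries the military record
    generate s1 = 0
    generate s2 = 0
    by id: replace s1 = 1 if _n==1           // source indicator: self report
    by id: replace s2 = 1 if _n==2           // source indicator: military record
    generate z1 = service * s1               // source by exposure interaction terms
    generate z2 = service * s2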
Command:
    xtgee ptsd s1 z1 z2, i(pin) corr(ind) family(gau) robust

Output:
    Iteration 1: tolerance = 7.894e-14

    GEE population-averaged model                   Number of obs      =     21508
    Group variable:                        pin      Number of groups   =     10809
    Link:                             identity      Obs per group: min =         1
    Family:                           Gaussian                     avg =       2.0
    Correlation:                   independent                     max =         2
                                                    Wald chi2(3)       =    640.25
    Scale parameter:                  .1210952      Prob > chi2        =    0.0000

    Pearson chi2(21508):               2604.52      Deviance           =   2604.52
    Dispersion (Pearson):             .1210952      Dispersion         =  .1210952

                                        (Std. Err. adjusted for clustering on pin)
    ------------------------------------------------------------------------------
                 |             Semi-robust
            ptsd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              s1 |  -.0140807   .0016444    -8.56   0.000    -.0173037   -.0108576
              z1 |   .1793066   .0070906    25.29   0.000     .1654093    .1932038
              z2 |    .152672   .0072724    20.99   0.000     .1384183    .1669256
           _cons |   3.144166   .0040243   781.30   0.000     3.136278    3.152053
    ------------------------------------------------------------------------------

Estimates so far (coef., std. err.):
    Self Report      sr | .1793066   .0070909
    Military Record  mr | .152672    .0072727
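As a quick arithmetic check on the claim that the stacked model reproduces the marginal fits, the printed coefficients can be compared directly. This is only a reading of the reported output; \alpha_{sr} and \alpha_{mr} are shorthand introduced here for the intercepts of the two separate regressions.

```latex
% Shorthand (assumed here): \alpha_{sr}, \alpha_{mr} = _cons from "regress ptsd sr" and "regress ptsd mr"
\begin{align*}
\texttt{\_cons} &= 3.144166 = \alpha_{mr} \\
\texttt{\_cons} + \texttt{s1} &= 3.144166 - 0.0140807 = 3.1300853 \approx \alpha_{sr} = 3.130085 \\
\texttt{z1} &= 0.1793066 \quad \text{(slope on sr in the marginal fit)} \\
\texttt{z2} &= 0.152672 \quad \text{(slope on mr in the marginal fit)}
\end{align*}
```

The point estimates agree to the printed precision, and the standard errors differ only in the last printed digit (for example .0070906 versus .0070909 for the self-report slope).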
But wait . . . these guys are twins! Data within twin pairs might be correlated . . .
Command:
    svyset id [pweight = sampweight], strata(pairid)

Output:
          pweight: sampweight
              VCE: linearized
         Strata 1: pairid
             SU 1: id
            FPC 1: <zero>
Command:
    svy: regress ptsd s1 z1 z2

Output:
    Survey: Linear regression

    Number of strata   =      6172                  Number of obs      =     24557
    Number of PSUs     =     12344                  Population size    =     21508
                                                    Design df          =      6172
                                                    F(   3,   6170)    =    230.51
                                                    Prob > F           =    0.0000
                                                    R-squared          =    0.0511

    ------------------------------------------------------------------------------
                 |             Linearized
        logptsd2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              s1 |  -.0140807    .001642    -8.58   0.000    -.0172995   -.0108619
              z1 |   .1793066    .006818    26.30   0.000     .1659408    .1926723
              z2 |    .152672   .0069024    22.12   0.000     .1391409     .166203
           _cons |   3.144166   .0035541   884.66   0.000     3.137198    3.151133
    ------------------------------------------------------------------------------
    Note: 35 strata omitted because they contain no population members

Estimates so far (coef., std. err.):
    Self Report      sr | .1793066   .0070909
    Military Record  mr | .152672    .0072727
Estimates so far (coef., std. err.):
    Self Report      sr | .1793066   .006818
    Military Record  mr | .152672    .0069024

Command:
    test z1 = z2

Output:
    . test z1 = z2

     ( 1)  z1 - z2 = 0

               chi2(  1) =   44.89
             Prob > chi2 =    0.0000

    . test z1 = z2

    Adjusted Wald test

     ( 1)  z1 - z2 = 0

               chi2(  1) =   45.66
             Prob > chi2 =    0.0000

Moral of the story: The two sources contain different information. We should not combine them. Or, should we??
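The test above compares two estimates obtained from the same people, so their sampling errors are correlated. A back-of-envelope sketch of the 1-df Wald statistic follows, using the reported coefficients and survey standard errors and treating the adjusted statistic of 45.66 as a simple squared ratio. That treatment is an approximation made here for illustration, not something the slides state; \hat\beta_1 and \hat\beta_2 denote the coefficients on z1 and z2.

```latex
\begin{align*}
W &= \frac{(\hat\beta_1 - \hat\beta_2)^2}{\widehat{\mathrm{Var}}(\hat\beta_1 - \hat\beta_2)},
\qquad
\widehat{\mathrm{Var}}(\hat\beta_1 - \hat\beta_2)
  = \widehat{\mathrm{Var}}(\hat\beta_1) + \widehat{\mathrm{Var}}(\hat\beta_2)
    - 2\,\widehat{\mathrm{Cov}}(\hat\beta_1, \hat\beta_2) \\
\hat\beta_1 - \hat\beta_2 &= 0.1793066 - 0.152672 = 0.0266346 \\
\widehat{\mathrm{SE}}(\hat\beta_1 - \hat\beta_2)
  &\approx \frac{0.0266346}{\sqrt{45.66}} \approx 0.0039
  \quad \text{(implied by the reported statistic)} \\
\sqrt{0.006818^2 + 0.0069024^2} &\approx 0.0097
  \quad \text{(what independence of the two estimates would give)}
\end{align*}
```

The implied standard error of the difference (about 0.0039) is well below the 0.0097 expected under independence, which suggests a sizable positive covariance between the two estimates. Separate regressions do not supply that covariance; fitting both sources in one stacked model does.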
Model 2: Multiple Source Model of Within- and Between-pair Exposure Effects
Labeled parts of the slide's equation (equation image not recovered): intercept; source indicators; source by between-pair effect interaction terms; source by within-pair effect interaction terms
• Same estimates as k separate marginal within & between models
• Allows testing for a difference in reports of within effects & between effects
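The slide's equation was lost in extraction. A hedged two-source reconstruction, based on the labels above and on the regressors s1, z1diff, z1bar, z2diff and z2bar that appear in the Stata commands, is sketched here. All symbols are assumptions introduced for illustration.

```latex
% Assumed notation (not from the slide):
%   Y_{ij}        outcome for twin j in pair i
%   X_{ijk}       exposure for twin j in pair i as reported by source k
%   \bar X_{ik}   pair mean of X_{ijk} over the twins in pair i
%   S_k           1 if the stacked row belongs to source k, else 0
\[
E[Y_{ij}]
  = \alpha + \gamma\, S_1
  + \sum_{k=1}^{2} \beta^{B}_{k}\, S_k\, \bar X_{ik}
  + \sum_{k=1}^{2} \beta^{W}_{k}\, S_k\, \bigl(X_{ijk} - \bar X_{ik}\bigr)
\]
% Correspondence with the Stata variables:
%   S_k \bar X_{ik} = zkbar (between-pair),  S_k (X_{ijk} - \bar X_{ik}) = zkdiff (within-pair)
```

Under this reading, testing \beta^{W}_1 = \beta^{W}_2 and \beta^{B}_1 = \beta^{B}_2 separately asks whether the sources disagree about the within-pair effect, the between-pair effect, or both.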
Command:
    bysort pairid: egen z1bar = mean(z1) if s1==1   // pair mean of the self-reported exposure
    replace z1bar = 0 if s1==0                      // zero on military-record rows
    generate z1diff = z1 - z1bar                    // twin's deviation from the pair mean

    // Repeat the procedure to make z2bar and z2diff
    bysort pairid: egen z2bar = mean(z2) if s2==1   // pair mean of the military-record exposure
    replace z2bar = 0 if s2==0                      // zero on self-report rows
    generate z2diff = z2 - z2bar
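To make the between/within split concrete, here is a small worked example for one source. The two twin pairs are hypothetical, and the exposure is assumed to be coded 0/1 (served in Vietnam or not), which the slides do not state explicitly.

```latex
% Hypothetical pairs; X_j is the reported service indicator (assumed 0/1) of twin j
\begin{align*}
\text{Discordant pair: } & X_1 = 1,\; X_2 = 0
  \;\Rightarrow\; \bar{X} = 0.5,\quad X_1 - \bar{X} = +0.5,\quad X_2 - \bar{X} = -0.5 \\
\text{Concordant pair: } & X_1 = 1,\; X_2 = 1
  \;\Rightarrow\; \bar{X} = 1,\quad X_1 - \bar{X} = X_2 - \bar{X} = 0
\end{align*}
```

In this coding the deviation term is zero for every concordant pair, so only exposure-discordant pairs carry information about the within-pair effect, while the pair mean carries the between-pair contrast.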
Command:
    svy: regress ptsd s1 z1diff z1bar z2diff z2bar

Output:
    Survey: Linear regression

    Number of strata   =      6172                  Number of obs      =     24557
    Number of PSUs     =     12344                  Population size    =     21508
                                                    Design df          =      6172
                                                    F(   5,   6168)    =    154.41
                                                    Prob > F           =    0.0000
                                                    R-squared          =    0.0512

    ------------------------------------------------------------------------------
                 |             Linearized
            ptsd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              s1 |  -.0182144   .0016726   -10.89   0.000    -.0214933   -.0149355
          z1diff |   .1669005   .0134838    12.38   0.000     .1404675    .1933335
           z1bar |   .1857651   .0074393    24.97   0.000     .1711816    .2003487
          z2diff |   .1618065   .0138901    11.65   0.000      .134577     .189036
           z2bar |   .1482027   .0074941    19.78   0.000     .1335116    .1628937
           _cons |   3.145802   .0037693   834.58   0.000     3.138413    3.153191
    ------------------------------------------------------------------------------
    Note: 35 strata omitted because they contain no population members
Command:
    test z1diff = z2diff
    test z1bar = z2bar

Output:
    Adjusted Wald test

     ( 1)  z1diff - z2diff = 0

           F(  1,  6172) =    0.36
                Prob > F =    0.5509

    Adjusted Wald test

     ( 1)  z1bar - z2bar = 0

           F(  1,  6172) =   83.66
                Prob > F =    0.0000

Within-pair estimates don't differ much. Between-pair estimates do!!
Moral of the story:
• Combine the within-pair info.
• Keep between-pair info. separate
Model 3: Multiple Source Model with a Combined Within-pair Effect
Labeled parts of the slide's equation (equation image not recovered): intercept; source indicators; source by between-pair effect interaction terms; combined-source within-pair effect
• Assumes within-pair effect to be common to all k sources
• Often yields a more precise estimate of the within-pair effect
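With the equation image missing, one plausible two-source form, inferred from the labels above and from the later command that sums the two within-pair deviation variables into a single regressor, is sketched below. The notation is assumed for illustration only.

```latex
% Assumed notation (not from the slide):
%   Y_{ij}        outcome for twin j in pair i
%   X_{ijk}       exposure for twin j in pair i as reported by source k
%   \bar X_{ik}   pair mean of X_{ijk};  S_k = 1 on rows from source k, else 0
\[
E[Y_{ij}]
  = \alpha + \gamma\, S_1
  + \sum_{k=1}^{2} \beta^{B}_{k}\, S_k\, \bar X_{ik}
  + \beta^{W} \sum_{k=1}^{2} S_k\, \bigl(X_{ijk} - \bar X_{ik}\bigr)
\]
% The single coefficient \beta^{W} multiplies the sum of the source-specific deviations,
% which corresponds to a regressor built as z1diff + z2diff.
```

Compared with a model that gives each source its own within-pair slope, this form imposes one shared within-pair slope and keeps the between-pair slopes source-specific.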
Command:
    generate wservice = z1diff + z2diff
Command:
    svy: regress ptsd s1 wservice z1bar z2bar

Output:
    Survey: Linear regression

    Number of strata   =      6172                  Number of obs      =     24557
    Number of PSUs     =     12344                  Population size    =     21508
                                                    Design df          =      6172
                                                    F(   4,   6169)    =    192.48
                                                    Prob > F           =    0.0000
                                                    R-squared          =    0.0512

    ------------------------------------------------------------------------------
                 |             Linearized
        logptsd2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              s1 |  -.0182138   .0016722   -10.89   0.000    -.0214919   -.0149358
        wservice |   .1644434   .0129988    12.65   0.000     .1389611    .1899256
           z1bar |   .1857654   .0074392    24.97   0.000     .1711819    .2003489
           z2bar |   .1482022   .0074941    19.78   0.000     .1335111    .1628933
           _cons |   3.145802   .0037693   834.59   0.000     3.138412    3.153191
    ------------------------------------------------------------------------------
    Note: 35 strata omitted because they contain no population members
Conclusions from VET Registry analysis
• Sources differed in Model 1, so we did not combine them overall
• Within-pair estimates in Model 2 did not differ much by source, so Model 3 combined within-pair estimates
• Within-pair estimate:
    Combined Record   0.16 (0.14, 0.19)
• 7 to 14% gain in efficiency over individual sources
• Between-pair estimates in Model 2 differed significantly
• Model 3 estimates separate between-pair effects for each source
• Source-specific between-pair estimates:
    Self Report       0.19 (0.17, 0.20)
    Military Record   0.15 (0.13, 0.16)
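The slides do not say how the 7 to 14% efficiency gain was computed. One reading that reproduces the range, assuming relative efficiency is the squared ratio of a source-specific within-pair standard error (Model 2) to the combined within-pair standard error (Model 3), is:

```latex
% Assumed definition: relative efficiency = (SE source-specific / SE combined)^2
% Standard errors taken from the reported output: z1diff .0134838, z2diff .0138901, wservice .0129988
\begin{align*}
\text{Self report (z1diff): } &
  \left(\frac{0.0134838}{0.0129988}\right)^{2} \approx 1.076
  \quad (\text{about } 7.6\% \text{ gain}) \\
\text{Military record (z2diff): } &
  \left(\frac{0.0138901}{0.0129988}\right)^{2} \approx 1.142
  \quad (\text{about } 14.2\% \text{ gain})
\end{align*}
```

Both figures fall within the quoted range, so the squared standard-error ratio is at least consistent with what the authors report, though other definitions of efficiency could have been used.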
Future Directions
• Accommodate covariate adjustment
• Compare pooled estimators to “AND” and “OR” type derived exposure variables
• Address zygosity within regression models
Acknowledgements & References
Jack Goldberg at UW
Margaret Pepe at UW
• Pepe MS, Whitaker RC, Seidel K. Estimating and comparing univariate associations with application to the prediction of adult obesity. Statistics in Medicine 1999; 18:163-173.
Nicholas Horton at Harvard
• Horton NJ, Fitzmaurice GM. Regression analysis of multiple source and multiple informant data from complex survey samples. Statistics in Medicine 2004; 23:2911-2933.