Modeling Multiple Source Risk Factor Data and Health Outcomes in Twins

Andy Bogart, MS; Jack Goldberg, PhD


Presentation Transcript


  1. Modeling Multiple Source Risk Factor Data and Health Outcomes in Twins. Andy Bogart, MS; Jack Goldberg, PhD

  2. Multiple Informant Data: Military Service in Vietnam

  3. Command: regress ptsd sr, robust

      Linear regression                               Number of obs =   10796
                                                      F(  1, 10794) =  639.43
                                                      Prob > F      =  0.0000
                                                      R-squared     =  0.0599
                                                      Root MSE      =  .34613

      ------------------------------------------------------------------------------
                   |               Robust
              ptsd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
                sr |   .1793066   .0070909    25.29   0.000     .1654071     .193206
             _cons |   3.130085   .0039722   788.00   0.000     3.122299    3.137871
      ------------------------------------------------------------------------------

      Summary (Coef., Std. Err.):
        Self Report      sr | .1793066   .0070909

  4. Command: regress ptsd mr, robust

      Linear regression                               Number of obs =   10712
                                                      F(  1, 10710) =  440.68
                                                      Prob > F      =  0.0000
                                                      R-squared     =  0.0423
                                                      Root MSE      =  .34992

      ------------------------------------------------------------------------------
                   |               Robust
              ptsd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
                mr |    .152672   .0072727    20.99   0.000      .138416    .1669279
             _cons |   3.144166   .0040245   781.26   0.000     3.136277    3.152054
      ------------------------------------------------------------------------------

      Summary (Coef., Std. Err.):
        Self Report      sr | .1793066   .0070909
        Military Record  mr | .152672    .0072727

  5. Model 1: The General Multiple Source Model
      Terms labelled in the slide's equation: expected outcome, intercept, source indicators, source-by-exposure interaction terms
      • Allows testing for a difference in sources
      • Generates the same estimates as the k marginal source-specific models
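The equation itself did not survive in the transcript. One plausible reconstruction, pieced together from the slide's term labels and the `xtgee ptsd s1 z1 z2` command on slide 16, is sketched below. The symbols (\beta_0, \gamma_m, \beta_m, s_m, X_j) are assumptions chosen for illustration, not the authors' notation.

```latex
% Stacked record for one person, reported by source j (j = 1, ..., k).
% s_m = 1 if m = j and 0 otherwise; X_j is the exposure as reported by source j.
\[
E[Y \mid X_j] \;=\; \beta_0 \;+\; \sum_{m=1}^{k-1} \gamma_m\, s_m \;+\; \sum_{m=1}^{k} \beta_m\, s_m\, X_j
\]
% With k = 2 sources (self report, military record) this matches the fitted command:
\[
E[\mathrm{ptsd}] \;=\; \beta_0 + \gamma_1\,\mathrm{s1} + \beta_1\,\mathrm{z1} + \beta_2\,\mathrm{z2},
\qquad \mathrm{z1} = \mathrm{service}\times\mathrm{s1},\quad \mathrm{z2} = \mathrm{service}\times\mathrm{s2}
\]
```

Under this form, source 1 has intercept \beta_0 + \gamma_1 and slope \beta_1, while source 2 (the reference source) has intercept \beta_0 and slope \beta_2, which is why the stacked model can reproduce the separate source-specific regressions.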

  6. Multiple Informant Data

  7. Command: expand 2

  8. Command: expand 2

  9. Command: generate service=0

  10. Command: bysort id: replace service = sr if _n==1

  11. Command: bysort id: replace service = mr if _n==2

  12. Command

  13. Command:
        generate s1 = 0
        generate s2 = 0

  14. Command:
        bysort id: replace s1 = 1 if _n==1
        bysort id: replace s2 = 1 if _n==2

  15. Command:
        generate z1 = service * s1
        generate z2 = service * s2
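The data steps on slides 7 to 15 turn one row per person into two rows per person, one for each source. The following is a minimal Python sketch of that same reshaping, not the authors' code; the toy records and the helper name `stack_sources` are hypothetical and exist only to show which values end up in `service`, `s1`, `s2`, `z1`, and `z2`.

```python
# Toy wide-format records (made-up values, for illustration only).
wide = [
    {"id": 1, "ptsd": 3.4, "sr": 1, "mr": 1},
    {"id": 2, "ptsd": 3.1, "sr": 1, "mr": 0},
    {"id": 3, "ptsd": 3.0, "sr": 0, "mr": 0},
]


def stack_sources(rows):
    """Return two long-format rows per person: row 1 = self report, row 2 = military record."""
    stacked = []
    for row in rows:
        for n, source in enumerate(("sr", "mr"), start=1):
            s1 = 1 if n == 1 else 0          # indicator: this row carries the self report
            s2 = 1 if n == 2 else 0          # indicator: this row carries the military record
            service = row[source]            # exposure as reported by this row's source
            stacked.append({
                "id": row["id"],
                "ptsd": row["ptsd"],         # the outcome is copied to both rows
                "service": service,
                "s1": s1,
                "s2": s2,
                "z1": service * s1,          # source-by-exposure interaction, source 1
                "z2": service * s2,          # source-by-exposure interaction, source 2
            })
    return stacked


stacked = stack_sources(wide)
for r in stacked:
    print(r)
```

Person 2 is the interesting case: the self report says service in Vietnam and the military record does not, so the two stacked rows carry different `service` values for the same outcome.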

  16. Command: xtgee ptsd s1 z1 z2, i(pin) corr(ind) family(gau) robust

      Iteration 1: tolerance = 7.894e-14

      GEE population-averaged model                   Number of obs      =     21508
      Group variable:                        pin      Number of groups   =     10809
      Link:                             identity      Obs per group: min =         1
      Family:                           Gaussian                     avg =       2.0
      Correlation:                   independent                     max =         2
                                                      Wald chi2(3)       =    640.25
      Scale parameter:                  .1210952      Prob > chi2        =    0.0000

      Pearson chi2(21508):               2604.52      Deviance           =   2604.52
      Dispersion (Pearson):             .1210952      Dispersion         =  .1210952

                                          (Std. Err. adjusted for clustering on pin)
      ------------------------------------------------------------------------------
                   |             Semi-robust
              ptsd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
                s1 |  -.0140807   .0016444    -8.56   0.000    -.0173037   -.0108576
                z1 |   .1793066   .0070906    25.29   0.000     .1654093    .1932038
                z2 |    .152672   .0072724    20.99   0.000     .1384183    .1669256
             _cons |   3.144166   .0040243   781.30   0.000     3.136278    3.152053
      ------------------------------------------------------------------------------

      Summary (Coef., Std. Err.):
        Self Report      sr | .1793066   .0070909
        Military Record  mr | .152672    .0072727
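As a worked check, using only numbers already reported on slides 3, 4, and 16, the stacked model reproduces the two separate regressions. The slopes `z1` and `z2` equal the `sr` and `mr` coefficients, and the intercepts line up as follows (the symbols \hat\beta_0 and \hat\gamma_1 are illustrative labels for `_cons` and `s1`):

```latex
\[
\underbrace{\hat\beta_0}_{\texttt{\_cons}} = 3.144166
\quad\text{(intercept of the military-record regression, slide 4)}
\]
\[
\hat\beta_0 + \underbrace{\hat\gamma_1}_{\texttt{s1}} = 3.144166 - 0.0140807 = 3.1300853 \approx 3.130085
\quad\text{(intercept of the self-report regression, slide 3)}
\]
```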

  17. But wait . . . these guys are twins! Data within twin pairs might be correlated . . .

  18. Command: svyset id [pweight = sampweight], strata(pairid)

            pweight: sampweight
                VCE: linearized
           Strata 1: pairid
               SU 1: id
              FPC 1: <zero>

  19. Command: svy: regress ptsd s1 z1 z2

      Survey: Linear regression

      Number of strata   =      6172                  Number of obs      =     24557
      Number of PSUs     =     12344                  Population size    =     21508
                                                      Design df          =      6172
                                                      F(   3,   6170)    =    230.51
                                                      Prob > F           =    0.0000
                                                      R-squared          =    0.0511

      ------------------------------------------------------------------------------
                   |             Linearized
          logptsd2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
                s1 |  -.0140807    .001642    -8.58   0.000    -.0172995   -.0108619
                z1 |   .1793066    .006818    26.30   0.000     .1659408    .1926723
                z2 |    .152672   .0069024    22.12   0.000     .1391409     .166203
             _cons |   3.144166   .0035541   884.66   0.000     3.137198    3.151133
      ------------------------------------------------------------------------------
      Note: 35 strata omitted because they contain no population members

      Summary (Coef., Std. Err.):
        Self Report      sr | .1793066   .0070909
        Military Record  mr | .152672    .0072727

  20. Command: test z1 = z2

      Summary (Coef., Std. Err.):
        Self Report      sr | .1793066   .006818
        Military Record  mr | .152672    .0069024

      . test z1 = z2

       ( 1)  z1 - z2 = 0

                 chi2(  1) =   44.89
               Prob > chi2 =    0.0000

      . test z1 = z2

      Adjusted Wald test

       ( 1)  z1 - z2 = 0

                 chi2(  1) =   45.66
               Prob > chi2 =    0.0000

      Moral of the story: The two sources contain different information. We should not combine them. Or, should we??
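For reference, the Wald statistic for a single restriction of this kind has the standard textbook form below, where b_1 and b_2 stand for the estimated `z1` and `z2` coefficients (notation assumed here for illustration):

```latex
\[
W \;=\; \frac{(b_1 - b_2)^2}{\widehat{\mathrm{Var}}(b_1) + \widehat{\mathrm{Var}}(b_2) - 2\,\widehat{\mathrm{Cov}}(b_1, b_2)}
\;\sim\; \chi^2_1 \quad \text{under } H_0 : \beta_1 = \beta_2
\]
```

The covariance term is not shown in the transcript's coefficient tables, so the reported chi-squared values cannot be reproduced from the printed standard errors alone. It matters here because both sources describe the same veterans, so the two estimates are not independent.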

  21. Model 2: Multiple Source Model of Within- and Between-pair Exposure Effects
      Terms labelled in the slide's equation: intercept, source indicators, source by between-pair effect interaction terms, source by within-pair effect interaction terms
      • Allows testing for a difference in reports of within effects & between effects
      • Same estimates as k separate marginal within & between models
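The Model 2 equation is also missing from the transcript. A hedged reconstruction, based on the term labels and on the later command `svy: regress ptsd s1 z1diff z1bar z2diff z2bar`, might read as follows. The superscripts W and B and the symbol \bar{X}_{pj} (the mean of source j's exposure report within twin pair p) are notation assumed for this sketch.

```latex
% Stacked record for a twin in pair p, reported by source j; s_m = 1 if m = j, else 0.
\[
E[Y \mid X_j] \;=\; \beta_0 \;+\; \sum_{m=1}^{k-1} \gamma_m\, s_m
\;+\; \sum_{m=1}^{k} \beta^{W}_m\, s_m \left( X_j - \bar{X}_{pj} \right)
\;+\; \sum_{m=1}^{k} \beta^{B}_m\, s_m\, \bar{X}_{pj}
\]
% k = 2: z1diff, z2diff carry the within-pair terms; z1bar, z2bar carry the between-pair terms.
\[
E[\mathrm{ptsd}] \;=\; \beta_0 + \gamma_1\,\mathrm{s1}
+ \beta^{W}_1\,\mathrm{z1diff} + \beta^{B}_1\,\mathrm{z1bar}
+ \beta^{W}_2\,\mathrm{z2diff} + \beta^{B}_2\,\mathrm{z2bar}
\]
```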

  22. Command

  23. Command:
        bysort pairid: egen z1bar = mean(z1) if s1==1

  24. Command:
        bysort pairid: egen z1bar = mean(z1) if s1==1
        bysort pairid: replace z1bar=0 if s1==0

  25. Command:
        bysort pairid: egen z1bar = mean(z1) if s1==1
        bysort pairid: replace z1bar=0 if s1==0

  26. Command:
        bysort pairid: egen z1bar = mean(z1) if s1==1
        bysort pairid: replace z1bar=0 if s1==0

  27. Command:
        bysort pairid: egen z1bar = mean(z1) if s1==1
        bysort pairid: replace z1bar=0 if s1==0
        generate z1diff = z1 - z1bar

  28. Command: (repeat that procedure to make z2bar and z2diff)
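To make the within/between split concrete, here is a small Python sketch of the same decomposition for source 1. It is an illustration under assumed toy data, not the authors' code; the function name `add_between_within` is hypothetical. The pair mean is taken over the `s1 == 1` rows only and set to 0 on the other source's rows, mirroring the `egen`/`replace` steps above.

```python
from collections import defaultdict

# Toy stacked rows (made-up values). Pair 1 is discordant on the self report,
# pair 2 is concordant; the last row belongs to the other source (s1 == 0).
rows = [
    {"pairid": 1, "s1": 1, "z1": 1},
    {"pairid": 1, "s1": 1, "z1": 0},
    {"pairid": 2, "s1": 1, "z1": 1},
    {"pairid": 2, "s1": 1, "z1": 1},
    {"pairid": 1, "s1": 0, "z1": 0},
]


def add_between_within(rows):
    """Add z1bar (pair mean over s1 == 1 rows) and z1diff (deviation from it)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        if r["s1"] == 1:
            totals[r["pairid"]] += r["z1"]
            counts[r["pairid"]] += 1
    for r in rows:
        if r["s1"] == 1:
            r["z1bar"] = totals[r["pairid"]] / counts[r["pairid"]]  # between-pair part
        else:
            r["z1bar"] = 0.0                                        # other source's rows
        r["z1diff"] = r["z1"] - r["z1bar"]                          # within-pair part
    return rows


add_between_within(rows)
for r in rows:
    print(r)
```

Only discordant pairs contribute nonzero `z1diff` values; in a concordant pair both twins sit exactly at the pair mean, so all of their information goes into `z1bar`.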

  29. Command: svy: regress ptsd s1 z1diff z1bar z2diff z2bar

      Survey: Linear regression

      Number of strata   =      6172                  Number of obs      =     24557
      Number of PSUs     =     12344                  Population size    =     21508
                                                      Design df          =      6172
                                                      F(   5,   6168)    =    154.41
                                                      Prob > F           =    0.0000
                                                      R-squared          =    0.0512

      ------------------------------------------------------------------------------
                   |             Linearized
              ptsd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
                s1 |  -.0182144   .0016726   -10.89   0.000    -.0214933   -.0149355
            z1diff |   .1669005   .0134838    12.38   0.000     .1404675    .1933335
             z1bar |   .1857651   .0074393    24.97   0.000     .1711816    .2003487
            z2diff |   .1618065   .0138901    11.65   0.000      .134577     .189036
             z2bar |   .1482027   .0074941    19.78   0.000     .1335116    .1628937
             _cons |   3.145802   .0037693   834.58   0.000     3.138413    3.153191
      ------------------------------------------------------------------------------
      Note: 35 strata omitted because they contain no population members

  30. Command: svy: regress ptsd s1 z1diff z1bar z2diff z2bar (same output as slide 29)

  31. Command: svy: regress ptsd s1 z1diff z1bar z2diff z2bar (same output as slide 29)

  32. Command: test z1diff = z2diff

      Adjusted Wald test

       ( 1)  z1diff - z2diff = 0

             F(  1,  6172) =    0.36
                  Prob > F =    0.5509

  33. Command:
        test z1diff = z2diff
        test z1bar = z2bar

      Adjusted Wald test

       ( 1)  z1diff - z2diff = 0

             F(  1,  6172) =    0.36
                  Prob > F =    0.5509

      Adjusted Wald test

       ( 1)  z1bar - z2bar = 0

             F(  1,  6172) =   83.66
                  Prob > F =    0.0000

      Within-pair estimates don’t differ much. Between-pair estimates do!!
      Moral of the story:
      • Combine the within-pair info.
      • Keep between-pair info. separate

  34. Model 3: Multiple Source Model with a Combined Within-pair Effect
      Terms labelled in the slide's equation: intercept, source indicators, source by between-pair effect interaction terms, combined-source within-pair effect
      • Assumes the within-pair effect to be common to all k sources
      • Often yields a more precise estimate of the within-pair effect
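As with the earlier models, the equation is not in the transcript. A sketch consistent with the term labels and with the command `svy: regress ptsd s1 wservice z1bar z2bar` is given below; the symbols \beta^{W}, \beta^{B}_m, and \bar{X}_{pj} (pair mean of source j's report) are assumed notation.

```latex
% Same stacked layout as Model 2, but with a single within-pair slope shared by all sources.
\[
E[Y \mid X_j] \;=\; \beta_0 \;+\; \sum_{m=1}^{k-1} \gamma_m\, s_m
\;+\; \beta^{W} \sum_{m=1}^{k} s_m \left( X_j - \bar{X}_{pj} \right)
\;+\; \sum_{m=1}^{k} \beta^{B}_m\, s_m\, \bar{X}_{pj}
\]
% k = 2, with wservice = z1diff + z2diff:
\[
E[\mathrm{ptsd}] \;=\; \beta_0 + \gamma_1\,\mathrm{s1} + \beta^{W}\,\mathrm{wservice}
+ \beta^{B}_1\,\mathrm{z1bar} + \beta^{B}_2\,\mathrm{z2bar}
\]
```

Summing `z1diff` and `z2diff` works as a combined regressor because, on any stacked row, only the term for that row's own source can be nonzero (assuming `z2bar` and `z2diff` are built the same way as `z1bar` and `z1diff`).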

  35. Command

  36. Command: generate wservice = z1diff + z2diff

  37. Command: svy: regress ptsd s1 wservice z1bar z2bar

      Survey: Linear regression

      Number of strata   =      6172                  Number of obs      =     24557
      Number of PSUs     =     12344                  Population size    =     21508
                                                      Design df          =      6172
                                                      F(   4,   6169)    =    192.48
                                                      Prob > F           =    0.0000
                                                      R-squared          =    0.0512

      ------------------------------------------------------------------------------
                   |             Linearized
          logptsd2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
                s1 |  -.0182138   .0016722   -10.89   0.000    -.0214919   -.0149358
          wservice |   .1644434   .0129988    12.65   0.000     .1389611    .1899256
             z1bar |   .1857654   .0074392    24.97   0.000     .1711819    .2003489
             z2bar |   .1482022   .0074941    19.78   0.000     .1335111    .1628933
             _cons |   3.145802   .0037693   834.59   0.000     3.138412    3.153191
      ------------------------------------------------------------------------------
      Note: 35 strata omitted because they contain no population members

  38. Command: svy: regress ptsd s1 wservice z1bar z2bar (same output as slide 37)

  39. Conclusions from VET Registry analysis
      • Sources differed in Model 1, so we did not combine them overall
      • Within-pair estimates in Model 2 did not differ much by source, so . . .
      • Model 3 combined within-pair estimates
      • Within-pair estimate: Combined Record 0.16 (0.14, 0.19)
      • 7 to 14% gain in efficiency over individual sources
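The slide does not say how the efficiency gain was computed. One common definition is the ratio of estimator variances, and under that assumption the standard errors reported on slides 29 and 37 give figures in the stated range. The sketch below is only that check; the function name `efficiency_gain` is hypothetical.

```python
# Standard errors copied from the transcript's regression output.
SE_Z1DIFF = 0.0134838    # within-pair effect, self report (Model 2, slide 29)
SE_Z2DIFF = 0.0138901    # within-pair effect, military record (Model 2, slide 29)
SE_WSERVICE = 0.0129988  # combined within-pair effect (Model 3, slide 37)


def efficiency_gain(se_single, se_combined):
    """Relative efficiency gain, assumed here to be the variance ratio minus 1."""
    return (se_single / se_combined) ** 2 - 1.0


gain_vs_self_report = efficiency_gain(SE_Z1DIFF, SE_WSERVICE)
gain_vs_military_record = efficiency_gain(SE_Z2DIFF, SE_WSERVICE)

print(f"vs self report:     {gain_vs_self_report:.1%}")      # about 7.6%
print(f"vs military record: {gain_vs_military_record:.1%}")  # about 14.2%
```

If the authors used a different definition (for example a ratio of standard errors rather than variances), the numbers would differ, so this should be read as a consistency check rather than a reproduction of their calculation.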

  40. Conclusions from VET Registry analysis
      • Between-pair estimates in Model 2 differed significantly
      • Model 3 estimates separate between-pair effects for each source
      • Source-specific between-pair estimates:
          Self Report      0.19 (0.17, 0.20)
          Military Record  0.15 (0.13, 0.16)

  41. Future Directions
      • Accommodate covariate adjustment
      • Compare pooled estimators to “AND” and “OR” type derived exposure variables
      • Address zygosity within regression models

  42. Acknowledgements & References
      Jack Goldberg at UW
      Margaret Pepe at UW
      • Pepe MS, Whitaker RC, Seidel K. Estimating and comparing univariate associations with application to the prediction of adult obesity. Statistics in Medicine 1999; 18: 163-173.
      Nicholas Horton at Harvard
      • Horton NJ, Fitzmaurice GM. Regression analysis of multiple source and multiple informant data from complex survey samples. Statistics in Medicine 2004; 23: 2911-2933.

  43. Thank you for listening
