
Nicky Best, Department of Epidemiology and Public Health, Imperial College, London. n.best@imperial.ac.uk

Bayesian methods for integrated bias modelling and analysis of multiple data sources. BIAS Project. Talk outline: common biases in observational data; graphical models; case study: combining multiple data sources to study effects of water disinfection by-products on risk of low birth weight. www.bias


Presentation Transcript


    2. BIAS Project: talk outline
       - Common biases in observational data
       - Graphical models
       - Case study: combining multiple data sources to study effects of water disinfection by-products on risk of low birth weight

    3. Biases in observational data
       - Random errors (sampling variation)
       - Missing data
       - Unmeasured confounders
       - Selection biases
       - Measurement errors
       Multiple data sources are often necessary to identify the biases and inform about different aspects of the research question

    4. Simple example of graphical model

    5. Simple example of graphical model: C = genotype of child. Once the couple have a child and become parents, their genotypes become associated through the child (e.g. paternity testing)
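
The point of slide 5 can be checked by exact enumeration on a toy version of the graph. This is a minimal sketch, not the model from the talk: the parent labels A and B, the binary "carries a variant" coding, and both probabilities are assumptions chosen for illustration.

```python
from itertools import product

P_CARRIER = 0.3   # hypothetical carrier frequency in each parent
P_TRANSMIT = 0.5  # hypothetical chance that a carrier passes the variant on


def joint(a, b, c):
    """P(A=a, B=b, C=c) for the graph A -> C <- B (A, B parents, C child)."""
    p_a = P_CARRIER if a else 1 - P_CARRIER
    p_b = P_CARRIER if b else 1 - P_CARRIER
    # The child carries the variant if at least one parent transmits it.
    p_c1 = 1 - (1 - P_TRANSMIT * a) * (1 - P_TRANSMIT * b)
    return p_a * p_b * (p_c1 if c else 1 - p_c1)


def prob_a_given(**evidence):
    """P(A=1 | evidence), summing the joint over all consistent states."""
    num = den = 0.0
    for a, b, c in product((0, 1), repeat=3):
        state = {"a": a, "b": b, "c": c}
        if any(state[k] != v for k, v in evidence.items()):
            continue
        p = joint(a, b, c)
        den += p
        num += p * a
    return num / den


if __name__ == "__main__":
    print("P(A=1)            =", prob_a_given())
    print("P(A=1 | B=1)      =", prob_a_given(b=1))
    print("P(A=1 | C=1)      =", prob_a_given(c=1))
    print("P(A=1 | C=1, B=1) =", prob_a_given(c=1, b=1))
    print("P(A=1 | C=1, B=0) =", prob_a_given(c=1, b=0))
```

Without the child, learning B tells us nothing about A. Once C is observed, B becomes informative about A: if the child carries the variant and one parent does not, the other parent must.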

    6. Building complex models: conditional independence provides the mathematical basis for expressing a large system as a fusion of smaller components

    7. Building complex models: conditional independence provides the mathematical basis for expressing a large system as a fusion of smaller components
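
In symbols, the "fusion of smaller components" is the standard factorisation of a joint distribution over a directed acyclic graph. The notation below is a sketch: pa(v) denotes the parents of node v, and the labels A and B for the two parents' genotypes in the genotype example are assumed (the slides name only C).

```latex
% Generic factorisation: each variable depends only on its parents in the graph
\[
  p(x_1, \dots, x_n) \;=\; \prod_{v=1}^{n} p\bigl(x_v \mid x_{\mathrm{pa}(v)}\bigr)
\]

% Genotype example: A, B = parents' genotypes (assumed labels), C = child's genotype
\[
  p(A, B, C) \;=\; p(A)\, p(B)\, p(C \mid A, B)
\]
```

Each factor involves only a node and its parents, so each can be specified, and checked, separately.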

    8. Building complex models
       - Key idea: understand a complex system through a global model built from small pieces
         - comprehensible
         - each with only a few variables
         - modular
       - Present context: each ‘piece’ could represent a separate data source

    10. Low birth-weight and chlorine byproducts
        - Does exposure to chlorine byproducts (i.e. total trihalomethanes (THMs)) during pregnancy increase the risk of a low birth-weight baby?
        - Combine datasets with different strengths:
          - Survey data (Millennium Cohort Study): small, great individual detail
          - Administrative data (national births register): large, but little individual detail
        - Single underlying model assumed to govern both datasets: elaborate as appropriate to handle biases

    11. Low birth-weight
        - Important determinant of future health → population health indicator
        - Low birth-weight needs to be stratified by gestational age:
          - Full-term: low birth-weight babies born at >= 37 weeks
          - Pre-term: low birth-weight babies born at < 37 weeks
        - Established risk factors:
          - Mother's tobacco smoking status during pregnancy
          - Mother's ethnicity (South Asian), maternal age, weight, height, number of previous births
          - Baby's sex
        - Role of environmental risk factors, such as THMs, is less clear (inconclusive): some recent studies suggest a link, but others do not
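
The stratification on slide 11 can be written as a small classification rule. This is a sketch under stated assumptions: the 37-week boundary comes from the slide, but the 2500 g cut-off is the conventional definition of low birth weight and is not stated in the talk; the function name and the labels are hypothetical.

```python
LBW_THRESHOLD_G = 2500  # assumed cut-off (conventional definition); not given in the slides
TERM_WEEKS = 37         # from slide 11: full-term means born at >= 37 weeks


def classify_birth(weight_g, gestation_weeks):
    """Return the outcome category for one birth.

    gestation_weeks may be None to represent a record with no
    gestational age, in which case a low birth-weight baby cannot
    be assigned to the full-term or pre-term stratum.
    """
    if weight_g >= LBW_THRESHOLD_G:
        return "normal"
    if gestation_weeks is None:
        return "LBW, gestation unknown"
    if gestation_weeks >= TERM_WEEKS:
        return "full-term LBW"
    return "pre-term LBW"


if __name__ == "__main__":
    for weight, weeks in [(3200, 40), (2300, 39), (1800, 33), (2300, None)]:
        print(weight, weeks, "->", classify_birth(weight, weeks))
```

The `None` branch makes explicit that, without gestational age, a low birth-weight record has an outcome that is only partially observed.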

    12. Data sources (1): Millennium Cohort Study (MCS)
        - About 11,695 births in England between Sep 2000 and Aug 2001
        - About 1,333 singleton births when restricted to the United Utilities (UU) water company area; UU is located in the northwest of England
        - Postcode made available to us under strict security arrangements
        - Match individuals with exposure to chlorine byproducts estimated in a separate model (Whitaker et al., 2005)
        - Birth weight, baby's gestational age and a reasonably complete set of confounder data available
        - Allows a reasonable analysis, but issues remain:
          - Low power to detect small effects → could be improved by incorporating other data
          - Potential selection bias…

    13. Data sources (2): National birth register (NBR)
        - Every birth in the population recorded
        - Individual data with postcode (→ THM exposure) and birth weight available to us under strict security
        - We study subjects from wards covered by the UU water company that are present in both the MCS and NBR samples: 7,945 singleton births between Sep 2000 and Aug 2001
        - Larger dataset, no selection bias
        - …but no confounder information, especially ethnicity and smoking
        - No record of gestational age

    14. Data sources (3): Aggregate data
        - Ethnic composition of the population: 2001 census, for census output areas (~500 individuals)
        - Tobacco expenditure: consumer surveys (CACI, who produce ACORN consumer classification data), for census output areas
        - …linked by postcode to Millennium Cohort and national register data

    16. Models for formally analysing combined data
        Want an estimate of the association between low birth-weight (full-term and pre-term) and THM exposure, using all data, accounting for:
        - Selection bias in MCS: adjust models for predictors of selection
        - Missing confounders in register: Bayesian graphical model…
        - Missing outcomes in register data (no gestational age information to stratify the birth weight): Bayesian graphical model…
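
One way to read the components listed on slide 16 is as a shared outcome regression plus an imputation model for the confounders missing from the register. The equations below are an illustrative sketch only, not the model used in the talk: all symbols are assumed notation, and the outcome is simplified to binary (low birth-weight or not), whereas the talk distinguishes full-term and pre-term low birth-weight. The block assumes the `amsmath` package.

```latex
% Assumed notation:
%   y_i    low birth-weight indicator for birth i
%   T_i    THM exposure
%   c_i    individual confounders (e.g. ethnicity, smoking), observed in the survey only
%   s_i    predictors of selection into the survey
%   z_a(i) aggregate data (census, consumer survey) for the area containing birth i
\begin{align*}
  y_i &\sim \mathrm{Bernoulli}(p_i), \\
  \operatorname{logit}(p_i) &= \beta_0 + \beta_T\, T_i
      + \boldsymbol{\beta}_C^{\top} \mathbf{c}_i
      + \boldsymbol{\beta}_S^{\top} \mathbf{s}_i, \\
  \mathbf{c}_i &\sim f\bigl(\mathbf{c}_i \mid T_i,\ \mathbf{z}_{a(i)};\ \boldsymbol{\gamma}\bigr)
      \quad \text{for register births, where } \mathbf{c}_i \text{ is unobserved.}
\end{align*}
```

Under this reading the coefficients are shared between the two datasets, so the register contributes sample size while the survey and the aggregate data inform the imputation of the missing confounders.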

    17. Graphical model representation

    18. Graphical model representation

    19. Missing confounder imputation model

    20. Combining models

    28. Real data analysis – United Utilities water company

    29. Real data analysis – United Utilities water company

    31. Results for the real data analysis (full-term low birth-weight vs. normal)

    32. Conclusion
        - Evidence for association between THM exposure and full-term low birth-weight (but not pre-term LBW)
        - Combining the datasets can:
          - increase the statistical power of the survey data
          - alleviate bias due to unmeasured confounding in the administrative data
        - Benefits of combining data via graphical model will depend on the amount of information and strength of association provided by each sub-model
        - Must allow for the selection mechanism of the survey when combining data, and check compatibility of data sources

    33. Thanks: Jassy Molitor, Sylvia Richardson, Chris Jackson

    36. Two-level vs. one-level imputation
