BIAS Project. Bayesian methods for integrated bias modelling and analysis of multiple data sources. www.bias
2. BIAS Project Talk Outline
Common biases in observational data
Graphical models
Case Study: combining multiple data sources to study effects of water disinfection by-products on risk of low birth weight
3. Biases in observational data Random errors (sampling variation)
Missing data
Unmeasured confounders
Selection biases
Measurement errors
Multiple data sources often necessary to identify the biases and inform about different aspects of the research question
4. Simple example of graphical model
5. Simple example of graphical model C = genotype of child
Once the couple have a child and become parents, their genotypes become associated through the child – e.g. paternity testing
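The effect described above can be checked with a small simulation. This is a minimal sketch under assumptions not on the slides: a single locus with two alleles, allele frequency 0.5, random mating, and hypothetical helper names. The parents' genotypes are uncorrelated in the population as a whole, but become negatively correlated once we condition on the child's genotype (here, a heterozygous child).

```python
import random

def draw_genotype(rng):
    """Number of 'A' alleles (0, 1 or 2) carried by one parent; allele frequency 0.5 (assumed)."""
    return rng.randint(0, 1) + rng.randint(0, 1)

def transmit(genotype, rng):
    """A parent passes on one of their two alleles at random; returns 1 if it is 'A'."""
    return 1 if rng.random() < genotype / 2 else 0

def correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs) / n
    var_y = sum((y - mean_y) ** 2 for _, y in pairs) / n
    return cov / (var_x * var_y) ** 0.5

def simulate(n_families=50_000, seed=1):
    rng = random.Random(seed)
    all_pairs, heterozygous_child = [], []
    for _ in range(n_families):
        mother, father = draw_genotype(rng), draw_genotype(rng)  # independent draws
        child = transmit(mother, rng) + transmit(father, rng)
        all_pairs.append((mother, father))
        if child == 1:  # condition on the child carrying exactly one 'A'
            heterozygous_child.append((mother, father))
    return correlation(all_pairs), correlation(heterozygous_child)

marginal_corr, conditional_corr = simulate()
print(f"parents, all families:          {marginal_corr:+.3f}")
print(f"parents, given heterozygous C:  {conditional_corr:+.3f}")
```

Under these assumptions the first correlation should be close to zero and the second clearly negative: if the child carries one copy of each allele and the mother carries two 'A' alleles, the father must carry the other allele.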
6. Building complex models Conditional independence provides the mathematical basis for expressing a large system as a fusion of smaller components
8. Building complex models Key idea
understand complex system
through global model
built from small pieces
comprehensible
each with only a few variables
modular
Present context: each ‘piece’ could represent separate data source
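One way to write the "small pieces" idea down is the standard factorisation of a directed graphical model, in which each variable depends only on its parents in the graph. The notation below (the variables v_i, their parent sets pa(v_i), and A and B for the two parents' genotypes in the earlier example) is introduced here for illustration and is not taken from the slides.

```latex
p(v_1, \dots, v_n) = \prod_{i=1}^{n} p\bigl(v_i \mid \mathrm{pa}(v_i)\bigr),
\qquad \text{e.g.} \qquad
p(A, B, C) = p(A)\, p(B)\, p(C \mid A, B).
```

Each factor involves only a few variables, so a factor (or group of factors) can be tied to a separate data source and the pieces combined through the variables they share.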
10. Low birth-weight and chlorine byproducts Does exposure to chlorine byproducts (i.e. total trihalomethanes, THMs) during pregnancy increase the risk of a low birth-weight baby?
Combine datasets with different strengths:
Survey data (Millennium Cohort Study)
Small, great individual detail.
Administrative data (national births register)
Large, but little individual detail.
Single underlying model assumed to govern both datasets: elaborate as appropriate to handle biases
11. Low birth-weight Important determinant of future health → population health indicator
Low birth-weight needs to be stratified by gestational age
Full-term: low birth-weight babies born >= 37 weeks
Pre-term: low birth-weight babies born < 37 weeks
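The stratification above can be sketched as a small classifier. The 37-week cut-off comes from the slide; the 2500 g threshold for low birth-weight is the conventional definition and is an assumption here, since the slides do not state it. The function name and labels are hypothetical.

```python
from typing import Optional

LOW_BIRTH_WEIGHT_GRAMS = 2500  # assumed conventional cut-off, not stated on the slides
FULL_TERM_WEEKS = 37           # from the slide: full-term means >= 37 weeks

def classify_birth(weight_grams: float, gestation_weeks: Optional[float]) -> str:
    """Assign a birth to one of the outcome strata used in the talk."""
    if weight_grams >= LOW_BIRTH_WEIGHT_GRAMS:
        return "normal"
    if gestation_weeks is None:
        # Without gestational age the low birth-weight stratum cannot be determined.
        return "low birth-weight, term unknown"
    if gestation_weeks >= FULL_TERM_WEEKS:
        return "full-term low birth-weight"
    return "pre-term low birth-weight"

print(classify_birth(2400, 39))
print(classify_birth(2100, 34))
print(classify_birth(2300, None))
```

The `None` branch shows why a data source without gestational age can only contribute a collapsed low birth-weight outcome unless the missing stratum is modelled.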
Established risk factors:
Mothers’ tobacco smoking status during pregnancy.
Mothers’ ethnicity (South Asian), maternal age, weight, height, number of previous births.
Babies’ sex
Role of environmental risk factors, such as THMs, less clear (inconclusive).
Some recent studies suggest a link, but others do not.
12. Data sources (1): Millennium Cohort Study About 11,695 births in England between Sep 2000 and Aug 2001
About 1,333 singleton births when restricted to the United Utilities (UU) water company
UU is located in the north-west of England.
Postcode made available to us under strict security arrangements
Match individuals with exposure to chlorine byproducts estimated in separate model (Whitaker et al, 2005)
Birth weight, baby’s gestational age and a reasonably complete set of confounder data available
Allows a reasonable analysis, but issues remain:
Low power to detect a small effect → could be improved by incorporating other data.
Potential selection bias…
13. Data sources (2): National birth register (NBR) Every birth in the population recorded.
Individual data with postcode (→ THM exposure) and birth weight available to us under strict security.
We study subjects from wards which were covered by the UU water company and which are present in both MCS and NBR samples: 7945 singleton births between Sep 2000 and Aug 2001.
Larger dataset, no selection bias
…but no confounder information, especially ethnicity and smoking.
No record of gestational age.
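To see why the missing confounders matter, here is a minimal simulated sketch. All numbers are invented for illustration and are not estimates from the MCS or the register: smoking prevalence, the association between smoking and high THM exposure, and the zero true THM effect are assumptions. The crude odds ratio ignores smoking; the Mantel-Haenszel odds ratio stratifies on it.

```python
import math
import random

def expit(x):
    return 1 / (1 + math.exp(-x))

def simulate_births(n=100_000, seed=7):
    """Simulated births: (smoker, high_thm, low_bw). THM has NO true effect here."""
    rng = random.Random(seed)
    births = []
    for _ in range(n):
        smoker = rng.random() < 0.3                          # assumed prevalence
        high_thm = rng.random() < (0.6 if smoker else 0.3)   # assumed association
        low_bw = rng.random() < expit(-3.0 + (1.2 if smoker else 0.0))
        births.append((smoker, high_thm, low_bw))
    return births

def two_by_two(births):
    """Counts: a exposed cases, b exposed non-cases, c unexposed cases, d unexposed non-cases."""
    a = sum(1 for _, e, y in births if e and y)
    b = sum(1 for _, e, y in births if e and not y)
    c = sum(1 for _, e, y in births if not e and y)
    d = sum(1 for _, e, y in births if not e and not y)
    return a, b, c, d

def crude_odds_ratio(births):
    a, b, c, d = two_by_two(births)
    return (a * d) / (b * c)

def mantel_haenszel_odds_ratio(births):
    """Odds ratio adjusted for smoking by stratification."""
    numerator = denominator = 0.0
    for stratum in (True, False):
        a, b, c, d = two_by_two([r for r in births if r[0] == stratum])
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

births = simulate_births()
crude = crude_odds_ratio(births)
adjusted = mantel_haenszel_odds_ratio(births)
print(f"crude OR (smoking ignored):  {crude:.2f}")
print(f"adjusted OR (Mantel-Haenszel): {adjusted:.2f}")
```

In this set-up the crude odds ratio is well above 1 even though THM has no effect, while the adjusted one is close to 1. A register without smoking data can only deliver the crude quantity unless the confounder is imputed from another source.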
14. Data sources (3): Aggregate data Ethnic composition of the population
2001 census
for census output areas (~500 individuals)
Tobacco expenditure
consumer surveys (CACI, who produce ACORN consumer classification data)
for census output areas.
…linked by postcode to Millennium Cohort and national register data.
16. Models for formally analysing combined data Want estimate of the association between low birth-weight (full-term and pre-term) and THM exposure, using all data, accounting for:
Selection bias in MCS
Adjust models for predictors of selection
Missing confounders in register
Bayesian graphical model…
Missing outcomes in register data: no gestational age information to stratify the birth weight
Bayesian graphical model…
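The slides do not give the equations behind these components. As a hedged sketch only, the three pieces could be written along the following lines; every symbol (y_i, c_i, the coefficients β, the imputation parameters θ, and the area-level data z_{a(i)}) is an assumption made for illustration, not the authors' specification.

```latex
% Outcome model, shared by survey (MCS) and register (NBR) births i:
% k = 1 normal, k = 2 full-term LBW, k = 3 pre-term LBW.
% c_i holds the confounders and, for the survey, predictors of selection.
\log \frac{\Pr(y_i = k)}{\Pr(y_i = 1)}
  = \beta_{0k} + \beta_{Tk}\,\mathrm{THM}_i + \boldsymbol{\beta}_{Ck}^{\top} \mathbf{c}_i ,
  \qquad k = 2, 3

% Missing confounders in the register: c_i is unobserved and treated as unknown,
% informed by area-level data z_{a(i)} for the area a(i) containing birth i.
\mathbf{c}_i \sim f\bigl(\,\cdot \mid z_{a(i)}, \boldsymbol{\theta}\bigr)

% Missing gestational age in the register: only the collapsed outcome is observed.
\Pr(\mathrm{LBW}_i = 1) = \Pr(y_i = 2) + \Pr(y_i = 3)
```

Because the coefficients β are shared between the two datasets, the register adds power to the survey estimate, while the survey supplies the individual-level confounder information the register lacks.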
17. Graphical model representation
18. Graphical model representation
19. Missing confounder imputation model
20. Combining models
28. Real data analysis – United Utilities water company
29. Real data analysis – United Utilities water company
31. Results for the real data analysis (full-term low birth-weight vs. normal)
32. Conclusion Evidence for an association between THM exposure and full-term low birth-weight (but not pre-term low birth-weight)
Combining the datasets can
increase statistical power of the survey data
alleviate bias due to unmeasured confounding in the administrative data
Benefits of combining data via graphical model will depend on amount of information and strength of association provided by each sub-model
Must allow for selection mechanism of survey when combining data, and check compatibility of data sources
33. THANKS Jassy Molitor
Sylvia Richardson
Chris Jackson
36. Two-level vs. one-level imputation