Graphical models for combining multiple sources of information in observational studies
ii) Case study of socioeconomic risk factors for cardiovascular disease
Christopher Jackson, with Nicky Best and Sylvia Richardson
Department of Epidemiology and Public Health, Imperial College, London
chris.jackson@imperial.ac.uk
ESRC BIAS project: http://www.bias-project.org.uk

Cardiovascular hospitalisation study
Question
• What are the socio-economic predictors of hospitalisation for heart and circulatory disease for individuals?
• Is there evidence of contextual (area-level) effects?
Design
Data synthesis using
• 4 sources of data (London only)
• Area-level administrative data, combined with individual survey data
• Exposures and outcomes at both aggregate and individual level
Method
• Hierarchical related regressions

Hierarchical related regression
Infer individual-level relationships using both individual and aggregate data
Individual-level model
• Logistic regression for individual-level binary outcome
• Includes individual or area-level predictors
• Use this to
  • model the individual-level data
  • construct the correct model for the aggregate data
Model for aggregate data
• The analogous regression of mean outcome on mean covariates usually gives ecologically biased estimates of individual-level relative risks.
• Instead, integrate the individual-level model over the within-area joint distribution of covariates, giving the average group-level risk.
Combined model
• Individual and aggregate data are assumed to be generated by the same baseline and relative risk parameters.
• Estimate these parameters using both datasets simultaneously.

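The difference between the two aggregate models can be seen in a minimal sketch. It assumes a single binary covariate and illustrative parameter values; the function names and numbers are hypothetical and do not come from the study. Averaging the individual-level logistic risk over the exposed and unexposed groups gives the area risk, whereas plugging the area mean of the covariate into the logistic model gives a different answer.

```python
import math

def expit(z):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def area_risk(mu, beta, phi):
    """Average risk in an area where a proportion phi is exposed.

    Averages the individual-level logistic model over the within-area
    distribution of one binary covariate (an assumed, simplified setup).
    """
    return (1.0 - phi) * expit(mu) + phi * expit(mu + beta)

def naive_ecological_risk(mu, beta, phi):
    """Plugs the area mean of the covariate into the individual-level model."""
    return expit(mu + beta * phi)

# Illustrative values only: baseline log odds and log odds ratio
mu, beta = -3.0, 1.5
for phi in (0.1, 0.3, 0.5):
    print(phi, area_risk(mu, beta, phi), naive_ecological_risk(mu, beta, phi))
```

The two functions agree only when everyone in the area has the same exposure (phi equal to 0 or 1); in between, the gap is one source of the ecological bias described above.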
Data sources
AGGREGATE
Hospital Episode Statistics
• number of people admitted for CVD in area, by age/sex (1998)
Census small area statistics
• age-sex population of area
• proportion non-white
• proportion social class IV/V
• proportion no car access
(for 33 districts or 759 wards in London, 1991)
INDIVIDUAL
Health Survey for England
• Self-reported admission to hospital for CVD (1998 only)
• individual age and sex
• individual ethnicity
• individual social class
• individual car access
(for 1527 individuals in London)
Baseline and relative risk of CVD admission for an individual

AGGREGATE
Hospital Episode Statistics
• number of CVD admissions in area in 1998, by age group/sex
Census small area statistics
• marginal proportions non-white, social class IV/V,…
Census Samples of Anonymised Records (2%)
• full within-area cross-classification of individuals by age/sex/ethnicity/social class/car ownership; required for the correct aggregate model
INDIVIDUAL
Health Survey for England
• Self-reported admission to hospital for CVD (1998 only)
• Self-reported long-term CVD (1997, 1998, 1999, 2000, 2001)
• Regression imputation of hospital admission, which is missing in years other than 1998
• individual age and sex
• individual ethnicity
• individual social class
• individual car access
(for 4463 individuals in London)
Baseline and relative risk of CVD admission for an individual

Basic illustration of combining individual and aggregate data
[Graphical model diagram. Recoverable node labels:]
DATA
• Aggregate census data (areas i): exposure xi, e.g. proportion low social class; disease yi, area admissions count
• Individual survey data (areas i, individuals j): exposure xij, individual social class; disease yij, CVD admission
UNKNOWNS
• μi: area baseline risk
• β: relative risk for individuals

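The structure of this diagram can be written as a single log-likelihood in which the area counts and the survey responses share the same parameters. The sketch below is an illustration under assumptions, not the authors' implementation: it assumes one binary exposure, a binomial model for the area counts, and hypothetical names (`combined_loglik`, `mu`, `beta`). The slides do not state how the model was fitted, so no estimation routine is shown.

```python
import math

def expit(z):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def combined_loglik(mu, beta, aggregate, individual):
    """Log-likelihood sharing area baselines mu[i] and coefficient beta.

    aggregate:  list of (area i, cases y_i, population n_i, exposed proportion x_i)
    individual: list of (area i, outcome y_ij in {0, 1}, exposure x_ij in {0, 1})
    """
    ll = 0.0
    for i, y, n, x in aggregate:
        # Area risk: individual-level model averaged over exposure in area i
        p = (1.0 - x) * expit(mu[i]) + x * expit(mu[i] + beta)
        # Binomial log-likelihood, dropping the constant binomial coefficient
        ll += y * math.log(p) + (n - y) * math.log(1.0 - p)
    for i, y, x in individual:
        # Bernoulli log-likelihood for each survey respondent
        p = expit(mu[i] + beta * x)
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll

# Made-up data for two areas, for illustration only
aggregate = [(0, 30, 1000, 0.2), (1, 55, 1200, 0.4)]
individual = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 0, 0)]
print(combined_loglik([-3.5, -3.2], 0.8, aggregate, individual))
```

Because both data sources contribute terms that depend on the same `mu` and `beta`, maximising this function (or using it within a Bayesian model) draws on both datasets at once.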
More complex models for disease, more confounders, need another data source
[Graphical model diagram. Recoverable node labels:]
DATA
• Census Samples of Anonymised Records (areas i, individuals l): xil
• Aggregate census data (areas i): yik, xir, xis, xik
• Cross-classification of individuals: xirsk
• Individual survey data (areas i, individuals j): exposures xij; disease yij, CVD admission
Indices: social class r, employment status s, age/sex strata k
• μik: area/stratum baseline risk
• β: relative risk for exposures

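A small sketch shows why the cross-classification is needed once there is more than one covariate. It assumes two binary covariates and illustrative coefficients; all names and numbers are hypothetical. Two areas with identical marginal proportions but different joint distributions have different average risks under the same logistic model, so the marginal census proportions alone do not determine the aggregate model.

```python
import math

def expit(z):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def area_risk(mu, b_r, b_s, joint):
    """Average risk over a within-area cross-classification.

    joint maps each cell (r, s), with r and s in {0, 1}, to the proportion
    of the area's population in that cell; the proportions sum to 1.
    """
    return sum(p * expit(mu + b_r * r + b_s * s) for (r, s), p in joint.items())

def marginals(joint):
    """Marginal proportions with r = 1 and with s = 1."""
    p_r = sum(p for (r, s), p in joint.items() if r == 1)
    p_s = sum(p for (r, s), p in joint.items() if s == 1)
    return p_r, p_s

# Two made-up areas with the same marginals (0.5, 0.5) but different joints
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
clustered = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}

for joint in (independent, clustered):
    print(marginals(joint), area_risk(-3.0, 1.0, 1.0, joint))
```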
Imputing missing outcomes in individual data
[Graphical model diagram. Recoverable node labels:]
DATA
• Census Samples of Anonymised Records (areas i, individuals l): xil
• Aggregate census data (areas i): yik, xir, xis, xik
• Cross-classification of individuals: xirsk
• Survey data (1998) (areas i, individuals j): CVD admissions yij
• Survey data (1997-2001) (areas i, individuals j): self-reported CVD yij*; exposures xij; CVD admissions yij, including imputed values
Indices: social class r, employment status s, age/sex strata k
• μik: area/stratum baseline risk
• β: relative risk for exposures

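The imputation step might look like the following minimal sketch. The slides do not list the covariates of the imputation model, so this sketch assumes, hypothetically, that self-reported long-term CVD is the only predictor of admission; the function names and records are made up. With a single binary predictor, a logistic regression reproduces the within-group admission proportions, so no optimiser is needed here.

```python
def fit_admission_given_cvd(records):
    """Estimate P(admission = 1 | long-term CVD status) from complete records.

    records: list of (longterm_cvd, admission) pairs, both coded 0 or 1,
    taken from respondents for whom admission was observed.
    """
    probs = {}
    for status in (0, 1):
        outcomes = [adm for cvd, adm in records if cvd == status]
        probs[status] = sum(outcomes) / len(outcomes)
    return probs

def impute(probs, longterm_cvd_values):
    """Predicted admission probability for each record lacking an admission value."""
    return [probs[cvd] for cvd in longterm_cvd_values]

# Made-up complete records (admission observed), for illustration only
complete = [(1, 1), (1, 0), (1, 0), (1, 1), (0, 0), (0, 0), (0, 0), (0, 1)]
probs = fit_admission_given_cvd(complete)

# Made-up respondents from other survey years: only long-term CVD is known
imputed = impute(probs, [1, 0, 0])
print(probs, imputed)
```

In a full analysis the imputed values would carry uncertainty, for example by treating the missing admissions as unknown quantities within the joint model rather than plugging in fixed predictions.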
Are aggregate and individual data compatible?
[Figure. Axis labels: Health Survey England aggregated over districts; Census covariates or Hospital Episode Statistics data]

Estimated coefficients (with 95% CI) for multiple regression model of the risk of hospitalisation
[Figure. Labels: Individual data only; Aggregate data only; Models combining individual and aggregated data]

Individual and area-level predictors
• Area-level covariates in underlying model for hospitalisation risk (Carstairs deprivation index)
  • No significant influence of Carstairs, after accounting for individual-level factors
• Random effects models
  • Random area-level baseline risk quantifies remaining variability between areas.
  • After adjusting for covariates, variance is partitioned into individual-level and area-level components
  • With wards as the areas, 4% of residual variance is attributable to unobserved area-level factors (2% with districts)
• Little evidence of contextual effects

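For readers unfamiliar with this kind of variance partition, the sketch below shows one common convention for a logistic model with a random area-level intercept. This is an assumption: the slides do not say how the partition was computed. Under the latent-variable formulation, the individual-level variance is fixed at π²/3 and the area-level share is the random-intercept variance divided by the total. The function names are hypothetical.

```python
import math

# Individual-level variance on the latent logistic scale (assumed convention)
LOGISTIC_VARIANCE = math.pi ** 2 / 3

def area_share(area_variance):
    """Proportion of residual variance attributable to the area level."""
    return area_variance / (area_variance + LOGISTIC_VARIANCE)

def area_variance_for_share(share):
    """Random-intercept variance that would give the stated area-level share."""
    return share * LOGISTIC_VARIANCE / (1.0 - share)

# Under this convention, the reported shares would correspond to these variances
for share in (0.04, 0.02):
    print(share, area_variance_for_share(share))
```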
Further work on combining observational data
• Different but related information in each dataset
  • self-reported disease, versus hospital admission records.
• Conflicts between information in different datasets
  • self-completed and interviewed responses to surveys?
• Important variables missing in one dataset
  • available in health survey but not administrative aggregates.
  • e.g. biological risk factors for cardiovascular disease.

Publications
• Available from http://www.bias-project.org.uk
• C. Jackson, N. Best, S. Richardson. Hierarchical related regression for combining aggregate and survey data in studies of socio-economic disease risk factors. Submitted to Journal of the Royal Statistical Society, Series A.
• C. Jackson, N. Best, S. Richardson. Improving ecological inference using individual-level data. Statistics in Medicine, in press. http://www3.interscience.wiley.com/cgi-bin/abstract/112099931/ABSTRACT