TWO-STAGE CASE-CONTROL STUDIES USING EXPOSURE ESTIMATES FROM A GEOGRAPHICAL INFORMATION SYSTEM • Jonas Björk (1) & Ulf Strömberg (2) • (1) Competence Center for Clinical Research, (2) Occupational and Environmental Medicine, Lund University Hospital
OUTLINE OF TALK • Previous project: What have we done? (Jonas Björk) • Ongoing project: What shall we do? (Ulf Strömberg)
Two-stage procedure for case-control studies • 1st stage: Complete data obtained from registries: disease status, general characteristics, group affiliation (e.g. occupation or residential area), group-level exposure XG • 2nd stage: Individual exposure data for a subset of the 1st stage sample
Exposure database → group-level exposure • JEM (Job Exposure Matrix): occupational group → proportion exposed • GIS: residential group (area) → average concentration of an air pollutant
JEM - proportion exposed • Most data typically in groups with low XG
Linear Relation between Proportion Exposed and Relative Risk • No confounding between/within groups Example: RR (exposed vs. unexposed) = 2.0
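The linearity can be checked with a little arithmetic. If a fraction XG of a group is exposed and there is no confounding between or within groups, the group's risk relative to a fully unexposed group is the weighted average (1 - XG) * 1 + XG * RR = 1 + (RR - 1) * XG. A minimal sketch in Python, using the slide's example RR = 2.0 (the function name and the chosen XG values are illustrative assumptions):

```python
def group_relative_risk(proportion_exposed: float, rr_exposed: float) -> float:
    """Risk of a group relative to a fully unexposed group.

    Assumes no confounding between or within groups, so the group risk is
    the exposure-weighted average of the unexposed risk (1) and the exposed
    risk (rr_exposed).
    """
    if not 0.0 <= proportion_exposed <= 1.0:
        raise ValueError("proportion_exposed must lie in [0, 1]")
    return 1.0 + (rr_exposed - 1.0) * proportion_exposed


# Illustrative proportions exposed (hypothetical values), with RR = 2.0
for x_g in (0.0, 0.1, 0.25, 0.5, 1.0):
    print(f"XG = {x_g:.2f} -> group RR = {group_relative_risk(x_g, 2.0):.2f}")
```

The group-level relative risk rises in a straight line from 1 (nobody exposed) to RR (everybody exposed), which is the relation the linear OR model builds on.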
Linear OR model: OR(XG) = 1 + β XG • XG = exposure proportion • OR for exposed vs. unexposed = OR(1) = 1 + β • [Figure: OR plotted against XG on the range 0 to 1, rising linearly from 1 at XG = 0 to OR(1) at XG = 1; most data typically in groups with low XG]
Confounding between groups • General confounders (e.g. gender and age) can normally be adjusted for • Assuming no confounding within groups and no effect modification in any stratum variable s1, ..., sk: OR(XG; s1, s2, ..., sk) = (1 + β XG) exp(Σ γk sk), where the sum runs over the stratum variables
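As a small illustration of this model, the sketch below evaluates the adjusted OR for one group and one stratum. The function name, the argument layout, and all numerical values are assumptions made for the example, not taken from the talk.

```python
import math
from typing import Sequence


def adjusted_odds_ratio(
    x_g: float,
    beta: float,
    gammas: Sequence[float],
    strata: Sequence[float],
) -> float:
    """OR(XG; s1..sk) = (1 + beta * XG) * exp(sum_k gamma_k * s_k)."""
    if len(gammas) != len(strata):
        raise ValueError("gammas and strata must have the same length")
    linear_exposure_part = 1.0 + beta * x_g
    confounder_part = math.exp(sum(g * s for g, s in zip(gammas, strata)))
    return linear_exposure_part * confounder_part


# Hypothetical values: beta = 2 (OR exposed vs. unexposed = 3), half of the
# group exposed, and one stratum indicator whose own OR is 1.5.
print(adjusted_odds_ratio(0.5, 2.0, [math.log(1.5)], [1]))
```

The exposure effect stays linear in XG while the confounders act multiplicatively, so the exposure OR is the same in every stratum (no effect modification).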
Combining 1st and 2nd stage data • Assumption: 2nd stage data missing at random (MAR) conditional on disease status and 1st stage group affiliation • For subjects with missing 2nd stage data: use 1st stage data to calculate the expected number of exposed/unexposed • Expectation-maximization (EM) algorithm
EM algorithm (Wacholder & Weinberg 1994) 1. Select a starting value, e.g. OR = 1 2. E-step: Among the non-participants, calculate the expected number of exposed/unexposed cases and controls in each group 3. M-step: Maximize the likelihood for the observed + expected cell frequencies using the chosen risk model for individual-level data (not necessarily linear); this gives a new OR estimate 4. Repeat steps 2 and 3 until convergence
E-step in our situation (Strömberg & Björk, submitted) • ÔR = current OR estimate • Complete the data in each group G: • m0 controls with missing 2nd stage data → expected number of exposed = m0 * XG • m1 cases with missing 2nd stage data → expected number of exposed = m1 * XG * ÔR / [1 + (ÔR - 1) * XG]
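The E-step formulas can be combined with a simple M-step to sketch the whole iteration. The code below is a minimal illustration, not the authors' implementation. It assumes a dichotomous exposure, no confounder adjustment, and a single common OR, in which case the M-step reduces to the cross-product ratio of the completed 2x2 table pooled over groups. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class GroupData:
    x_g: float                 # 1st stage proportion exposed in the group
    cases_exposed: float       # 2nd stage participants, observed counts
    cases_unexposed: float
    controls_exposed: float
    controls_unexposed: float
    cases_missing: float       # m1: cases without 2nd stage data
    controls_missing: float    # m0: controls without 2nd stage data


def e_step(group: GroupData, or_hat: float) -> Tuple[float, float]:
    """Expected number of exposed among missing cases and missing controls."""
    expected_controls = group.controls_missing * group.x_g
    expected_cases = (
        group.cases_missing * group.x_g * or_hat
        / (1.0 + (or_hat - 1.0) * group.x_g)
    )
    return expected_cases, expected_controls


def em_odds_ratio(
    groups: Iterable[GroupData],
    or_start: float = 1.0,
    tol: float = 1e-8,
    max_iter: int = 1000,
) -> float:
    """Iterate E- and M-steps until the OR estimate stabilises."""
    groups = list(groups)
    or_hat = or_start
    for _ in range(max_iter):
        a = b = c = d = 0.0  # completed table: cases exp/unexp, controls exp/unexp
        for g in groups:
            exp_cases, exp_controls = e_step(g, or_hat)
            a += g.cases_exposed + exp_cases
            b += g.cases_unexposed + (g.cases_missing - exp_cases)
            c += g.controls_exposed + exp_controls
            d += g.controls_unexposed + (g.controls_missing - exp_controls)
        # M-step under the assumed unadjusted model (requires b * c > 0)
        or_new = (a * d) / (b * c)
        if abs(or_new - or_hat) < tol:
            return or_new
        or_hat = or_new
    raise RuntimeError("EM did not converge within max_iter iterations")
```

With no missing 2nd stage data the function simply returns the crude OR; with missing data, each group's XG pulls the completed table towards the 1st stage exposure information. A confounder-adjusted or non-dichotomous risk model would need a proper likelihood maximization in the M-step instead of the closed-form ratio used here.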
Simulated case-control studies • 400 cases, 1200 controls in the 1st stage • 2nd stage participation: 75% of the cases, 25% of the controls • Selective participation of 2nd stage controls: Corr(Participation, XG) = 0, > 0, < 0 • 1000 replications in each scenario • True OR = 3
Simulations - Results SD = Empirical standard deviation of the ln(OR) estimates Coverage = Coverage of 95% confidence intervals
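The two performance measures defined above could be computed from a set of replications roughly as follows. This is a sketch under stated assumptions: the slides do not say how the confidence intervals were constructed, so Wald intervals ln(OR) ± 1.96 * SE are assumed here, and the function name and inputs are hypothetical.

```python
import math
from statistics import stdev
from typing import Sequence, Tuple


def summarize_replications(
    ln_or_estimates: Sequence[float],
    standard_errors: Sequence[float],
    true_or: float,
) -> Tuple[float, float]:
    """Return (empirical SD of the ln(OR) estimates, coverage of 95% CIs)."""
    if len(ln_or_estimates) != len(standard_errors):
        raise ValueError("need one standard error per estimate")
    true_ln_or = math.log(true_or)
    covered = 0
    for estimate, se in zip(ln_or_estimates, standard_errors):
        lower = estimate - 1.96 * se  # assumed Wald interval on the ln scale
        upper = estimate + 1.96 * se
        if lower <= true_ln_or <= upper:
            covered += 1
    empirical_sd = stdev(ln_or_estimates)  # sample SD across replications
    coverage = covered / len(ln_or_estimates)
    return empirical_sd, coverage
```

In the setting of the talk, the inputs would be the 1000 ln(OR) estimates and standard errors from one scenario, with `true_or=3`.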
Simulations - Conclusions • Combining 1st and 2nd stage data using the EM method can: 1. Improve precision 2. Remove bias from selective participation • Method is sensitive to errors in the (1st stage) external exposure data!
Simulations – Conclusions II • EM method is sensitive to: 1. Violations of the MAR assumption (conditional on disease status and 1st stage group affiliation) 2. Errors in the (1st stage) external exposure data
Ongoing methodological research project • Focus on exposure estimates from a GIS
Two-stage exposure assessment procedure • 1st stage: XG represents mean exposure levels rather than proportion exposed (e.g. groups with XG = 4.8, XG = 10.1, XG = 20.1, ...) • 2nd stage: xi is a continuous, rather than a dichotomous, exposure variable for individuals within each group
Assume a linear relation between xi and disease odds (cf. radon exposure and lung cancer [Weinberg et al., 1996]). [Figure: disease odds plotted against xi] • For the "only 1st stage" subjects: no bias expected from using their XG values (Berkson errors), provided MAR in each group, independent of disease status. • Open questions: EM method? Exposure variation in each group?
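One way to see why substituting the group mean is harmless under a linear odds model: averaging a linear function over the individuals in a group gives the same value as evaluating it at the group mean. The sketch below only illustrates this averaging argument. The baseline odds, slope, and individual exposures are hypothetical values chosen so that the group mean equals 4.8.

```python
from statistics import mean


def disease_odds(x: float, baseline_odds: float, beta: float) -> float:
    """Linear odds model: odds(x) = baseline_odds * (1 + beta * x)."""
    return baseline_odds * (1.0 + beta * x)


# Hypothetical individual exposures within one group (mean 4.8)
individual_exposures = [2.0, 5.0, 7.4]
group_mean = mean(individual_exposures)

mean_of_individual_odds = mean(
    [disease_odds(x, 0.01, 0.05) for x in individual_exposures]
)
odds_at_group_mean = disease_odds(group_mean, 0.01, 0.05)

print(mean_of_individual_odds, odds_at_group_mean)  # equal up to rounding
```

The equality would not hold for a non-linear model such as a log-linear one, where the within-group exposure variation also matters.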
Two-stage exposure assessment procedure – related work • Multilevel studies with applications to a study of air pollution [Navidi et al., 1994]: pooling exposure effect estimates based on individual-level and group-level models, respectively
Collecting data on confounders or effect modifiers at 2nd stage • 1st stage: XG = mean exposure levels (e.g. groups with XG = 4.8, XG = 10.1, XG = 20.1, ...) • 2nd stage: ci is a covariate, e.g. smoking history, for individuals within each group
Data on confounders or effect modifiers at 2nd stage – estimation of exposure effect • Confounder adjustment based on logistic regression: pseudo-likelihood approach [Cain & Breslow, 1988] • More general approach: EM method [Wacholder & Weinberg, 1994]
Design stage ("stage 0") • 1st stage: How many geographical areas (groups: Group 1, Group 2, Group 3, ...)? How many subjects in each group? • 2nd stage: What fractions of the 1st stage cases and controls?
Design stage – related work • Two-stage exposure assessment: power depends more strongly on the number of groups than on the number of subjects per group [Navidi et al., 1994]
References I • Björk & Strömberg. Int J Epidemiol 2002;31:154-60. • Strömberg & Björk. “Incorporating group-level exposure information in case-control studies with missing data on dichotomous exposures”. Submitted.
References II • Cain & Breslow. Am J Epidemiol 1988;128:1198-1206. • Navidi et al. Environ Health Perspect 1994;102(Suppl 8):25-32. • Wacholder & Weinberg. Biometrics 1994;50:350-7. • Weinberg et al. Epidemiology 1996;7:190-7.