Borgan and Henderson: Event History Methodology, Lancaster, September 2006
Session 8.1: Cohort sampling for the Cox model
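Relative risk regression models
Hazard rate for individual i:
$$\alpha_i(t) = \alpha_0(t)\, r(\beta, x_i(t))$$
where $\alpha_0(t)$ is the baseline hazard and $r(\beta, x_i(t))$ is the relative risk (hazard ratio).
The relative risk for individual i depends on covariates $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$, possibly time-dependent.
• Cox: $r(\beta, x_i) = \exp(\beta^\top x_i)$
• Excess relative risk: $r(\beta, x_i) = \prod_{j=1}^{p} (1 + \beta_j x_{ij})$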
Cohort data with delayed entry
[Figure: follow-up lines for the individuals at risk plotted against study time; arrows are censored observations]
Estimate regression coefficients by maximizing Cox's partial likelihood
The partial likelihood is a product over all failure times (event times).
The contribution for the individual $i_j$ failing at $t_j$ is
$$\frac{r(\beta, x_{i_j}(t_j))}{\sum_{k \in R_j} r(\beta, x_k(t_j))}$$
where $r(\beta, x)$ is the relative risk, $R_j$ is the risk set at $t_j$, and $n(t)$ is the number at risk at $t$.
  • Need information on covariates for all individuals at risk
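A minimal sketch of the full-cohort partial likelihood, assuming the Cox form of the relative risk, no tied failure times, and a small made-up data set (the names `entry`, `exit_`, `event`, and `covs` are illustrative, not from the slides). Each failure contributes the log of its own relative risk minus the log of the sum over its risk set.

```python
import math

def relative_risk(beta, x):
    """Cox relative risk exp(beta'x) for one covariate vector x."""
    return math.exp(sum(b * xi for b, xi in zip(beta, x)))

def log_partial_likelihood(beta, entry, exit_, event, covs):
    """Cox log partial likelihood with delayed entry (assumes no ties)."""
    total = 0.0
    for j, (t_j, failed) in enumerate(zip(exit_, event)):
        if not failed:
            continue
        # Risk set R_j: entered before t_j and still under observation at t_j
        risk_set = [k for k in range(len(exit_)) if entry[k] < t_j <= exit_[k]]
        denom = sum(relative_risk(beta, covs[k]) for k in risk_set)
        total += math.log(relative_risk(beta, covs[j])) - math.log(denom)
    return total

# Made-up toy cohort: entry time, exit time, failure indicator, one covariate
entry = [0.0, 0.0, 1.0, 2.0]
exit_ = [3.0, 5.0, 4.0, 6.0]
event = [1, 0, 1, 0]
covs = [[1.0], [0.0], [1.0], [0.0]]

ll0 = log_partial_likelihood([0.0], entry, exit_, event, covs)
```

At `beta = 0` every relative risk is 1, so each failure contributes minus the log of the number at risk; the denominator is where covariates for everyone at risk are needed.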
Cohort sampling designs
• Expensive to collect and check (!) covariate information for all individuals in large cohorts
• Also not necessary when there are few events, since the cases carry most of the statistical information
• Useful to have cohort sampling designs where one only needs to collect covariate information for cases and controls
  • Nested case-control
  • Case-cohort
Classical nested case-control design
At each failure time $t_j$, select at random $m - 1$ controls among the $n(t_j) - 1$ non-failures at risk.
[Figure: illustration for m = 2; legend: case, control]
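The control selection can be sketched as simple random sampling without replacement. The function name and the toy risk set below are hypothetical, chosen only for illustration.

```python
import random

def sample_classical_ncc(case, risk_set, m, rng):
    """Return the sampled risk set: the case plus m - 1 controls drawn
    without replacement from the non-failures at risk."""
    non_failures = [k for k in risk_set if k != case]
    controls = rng.sample(non_failures, m - 1)
    return [case] + controls

rng = random.Random(2006)          # fixed seed so the sketch is reproducible
risk_set = list(range(10))         # ten individuals at risk (made-up labels)
sampled = sample_classical_ncc(case=3, risk_set=risk_set, m=2, rng=rng)
```

With `m = 2` this is 1:1 matching on time at risk: each case gets a single control.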
Counter-matched nested case-control design
The statistical information in a sampled risk set (a case and its controls) depends on the variation of the covariate values within the set.
We may obtain "large" variation of an exposure of interest by counter-matching on
(i) a surrogate measure for the exposure
(ii) exposure, when correcting for a confounder
Classify each individual at risk into one of L strata based on information available for everyone. Select the controls by stratified random sampling.
Want a specified number $m_l$ from each stratum l in a sampled risk set (a case and its controls).
Select $m_l$ controls among those at risk in stratum l, except for the case's stratum s, where only $m_s - 1$ controls are selected.
[Figure: illustration for L = 2 and $m_1 = m_2 = 1$]
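A minimal sketch of counter-matched control selection, assuming strata are stored in a dictionary keyed by individual (the names `stratum` and `m` and the toy labels are hypothetical). The case fills one slot in its own stratum, so one control fewer is drawn there.

```python
import random

def sample_counter_matched(case, risk_set, stratum, m, rng):
    """stratum: dict individual -> stratum label; m: dict stratum -> m_l.
    Returns the case plus stratified random controls."""
    sampled = [case]
    for l, m_l in m.items():
        candidates = [k for k in risk_set if stratum[k] == l and k != case]
        # The case already occupies one slot in its own stratum
        n_controls = m_l - 1 if stratum[case] == l else m_l
        sampled += rng.sample(candidates, n_controls)
    return sampled

rng = random.Random(8)
risk_set = list(range(8))
stratum = {k: 0 if k < 5 else 1 for k in risk_set}   # made-up two strata
m = {0: 1, 1: 1}                                     # L = 2, m_1 = m_2 = 1
sampled = sample_counter_matched(6, risk_set, stratum, m, rng)
```

With one individual per stratum, the control always comes from the stratum the case is not in, which is what produces the contrast in the surrogate.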
The sampled risk set consists of the case $i_j$ and the $m - 1$ controls.
A sampling design is described by its sampling distribution.
Classical nested case-control design: if individual i fails at time t, the probability of selecting r as the sampled risk set is
$$\pi_t(r \mid i) = \binom{n(t) - 1}{m - 1}^{-1}$$
(we assume that r is a subset of the risk set, that r is of size m, and that i is in r).
Counter-matched nested case-control design:
Denote by $n_l(t)$ the number at risk in stratum l at time t.
  • If individual i in stratum s(i) fails at time t, the probability of selecting r as the sampled risk set is
$$\pi_t(r \mid i) = \left[ \prod_{l=1}^{L} \binom{n_l(t)}{m_l} \cdot \frac{m_{s(i)}}{n_{s(i)}(t)} \right]^{-1}$$
(under suitable assumptions on the set r).
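Both sampling probabilities are one over the number of equally likely control selections. The sketch below counts those selections directly with `math.comb`; the function names and the toy counts are assumptions made for illustration.

```python
from math import comb

def prob_classical(n_at_risk, m):
    """m - 1 controls chosen from the n(t) - 1 non-failures at risk."""
    return 1 / comb(n_at_risk - 1, m - 1)

def prob_counter_matched(n, m, s_case):
    """n, m: dict stratum -> n_l(t) and m_l; s_case: stratum of the case."""
    ways = 1
    for l in n:
        if l == s_case:
            ways *= comb(n[l] - 1, m[l] - 1)   # one slot taken by the case
        else:
            ways *= comb(n[l], m[l])
    return 1 / ways

n = {0: 5, 1: 3}      # made-up numbers at risk per stratum
m = {0: 1, 1: 1}
p_cm = prob_counter_matched(n, m, s_case=1)

# Same value from the product form: (n_s / m_s) / prod_l C(n_l, m_l)
p_closed = (n[1] / m[1]) / (comb(5, 1) * comb(3, 1))
```

The agreement of `p_cm` and `p_closed` rests on the identity C(n-1, m-1) = C(n, m) * m / n, which is why the probability depends on the case only through the ratio of n to m in its stratum.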
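Partial likelihood
Introduce the counting process $N_{(i,r)}(t)$ counting the number of times in $[0, t]$ that individual i fails and the sampled risk set equals r.
Corresponding intensity process $\lambda_{(i,r)}(t)$. This takes the form:
$$\lambda_{(i,r)}(t) = \pi_t(r \mid i)\, Y_i(t)\, \alpha_i(t)$$
where $\pi_t(r \mid i)$ is the sampling probability, $Y_i(t)$ is the at-risk indicator, and $\alpha_i(t) = \alpha_0(t)\, r(\beta, x_i(t))$ is the hazard rate.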
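Introduce the aggregated processes:
$$N_r(t) = \sum_{i \in r} N_{(i,r)}(t), \qquad \lambda_r(t) = \sum_{i \in r} \lambda_{(i,r)}(t)$$
where $\lambda_{(i,r)}(t)$ is the intensity of $N_{(i,r)}(t)$.
Probability that individual i fails, given that a failure occurs at t and that the sampled risk set is r:
$$\frac{\lambda_{(i,r)}(t)}{\lambda_r(t)} = \frac{r(\beta, x_i(t))\, \pi_t(r \mid i)}{\sum_{k \in r} r(\beta, x_k(t))\, \pi_t(r \mid k)}$$
where $\pi_t(r \mid k)$ is the probability of sampling r when k fails at t.
The partial likelihood is a product of such factors over all failures and sampled risk set occurrences (after cancelling common factors).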
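Contribution to the partial likelihood from a sampled risk set $\tilde{R}_j$ (the case $i_j$ failing at $t_j$ and its controls):
• Classical nested case-control:
$$\frac{r(\beta, x_{i_j}(t_j))}{\sum_{k \in \tilde{R}_j} r(\beta, x_k(t_j))}$$
• Counter-matched case-control:
$$\frac{r(\beta, x_{i_j}(t_j))\, w_{i_j}(t_j)}{\sum_{k \in \tilde{R}_j} r(\beta, x_k(t_j))\, w_k(t_j)}, \qquad w_k(t) = \frac{n_{s(k)}(t)}{m_{s(k)}}$$
The regression parameters may be estimated with software for relative risk regression (Cox, etc.) that allows for "offsets".
By counting process arguments similar to those for the full cohort, one may show that the usual large-sample likelihood methods apply.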
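To illustrate why "offsets" are enough, here is a minimal sketch assuming the Cox form of the relative risk: the log of each weight is added to the linear predictor, which is what an offset term does. The function name, the individuals' labels, and the counts are hypothetical.

```python
import math

def log_contribution(beta, case, sampled, covs, weight):
    """log of r_i w_i / sum_{k in sampled} r_k w_k with r = exp(beta'x).
    The log-weights enter the linear predictor as offsets."""
    def linear_predictor(k):
        xb = sum(b * x for b, x in zip(beta, covs[k]))
        return xb + math.log(weight[k])
    denom = sum(math.exp(linear_predictor(k)) for k in sampled)
    return linear_predictor(case) - math.log(denom)

# Made-up sampled risk set: case 6 (stratum with n = 3, m = 1)
# and control 2 (stratum with n = 5, m = 1)
covs = {6: [1.0], 2: [0.0]}
cm_weight = {6: 3 / 1, 2: 5 / 1}        # counter-matching weights n_l / m_l
unit_weight = {6: 1.0, 2: 1.0}          # classical design: weights cancel

lc_cm = log_contribution([0.0], 6, [6, 2], covs, cm_weight)
lc_classical = log_contribution([0.0], 6, [6, 2], covs, unit_weight)
```

With equal weights the expression reduces to the classical nested case-control contribution, so the same routine covers both designs.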
Uranium miners cohort
• 3347 uranium miners from the Colorado Plateau included in the study cohort 1950-60
• Followed up until the end of 1982
• 258 lung cancer deaths
• Interested in the effect of radon and smoking exposure on the risk of lung cancer death
• Have exposure information for the full cohort. Will use cohort sampling for illustration
Fit excess relative risk model:
$$r(\beta, x_i) = (1 + \beta_1 x_{i1})(1 + \beta_2 x_{i2})$$
$x_{i1}$ = cumulative radon (100 WLMs)
$x_{i2}$ = cumulative smoking (1000 packs)
Counter-match on radon exposure quartiles.
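Counter-matching on quartiles needs each individual classified into one of four strata. A minimal sketch, assuming the cut points are the sample quartiles of the exposure; the numbers below are invented and are not the miners' data.

```python
import bisect
import statistics

def quartile_strata(exposure):
    """Map each exposure value to a stratum 0..3 using the sample quartiles."""
    cuts = statistics.quantiles(exposure, n=4)   # three cut points
    return [bisect.bisect_right(cuts, x) for x in exposure]

# Hypothetical cumulative radon exposures (made-up values)
radon = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
strata = quartile_strata(radon)
```

The resulting labels can be used as the L = 4 strata when selecting controls by stratified random sampling.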
Classical case-cohort design
Select at random a subcohort C consisting of a fraction p of the full cohort.
[Figure: illustration for p = 0.50; legend: subcohort]
Use a pseudo likelihood for estimation
Contribution to pseudo likelihood for a case:
Software for relative risk regression (Cox, etc.) may be "tricked" into doing the estimation.
Likelihood methods do not apply: standard errors from statistical software need to be corrected, and likelihood ratio tests cannot be used.
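A minimal sketch of one standard version of the case-cohort pseudo likelihood, assuming the form in which the denominator for each case sums over the subcohort members at risk at that failure time. Whether the slides use this version or another (for instance one that adds a non-subcohort case to the denominator) is an assumption; the data are made up, with a scalar covariate and no delayed entry.

```python
import math

def log_pseudo_likelihood(beta, exit_, event, covs, subcohort):
    """Sum over cases of log r_case - log sum_{k in subcohort at risk} r_k,
    with Cox relative risk r = exp(beta * x)."""
    total = 0.0
    for j, (t_j, failed) in enumerate(zip(exit_, event)):
        if not failed:
            continue
        # Subcohort members still under observation at the failure time
        at_risk = [k for k in subcohort if exit_[k] >= t_j]
        denom = sum(math.exp(beta * covs[k]) for k in at_risk)
        total += beta * covs[j] - math.log(denom)
    return total

# Made-up cohort of six; the subcohort is a fraction p = 0.50
exit_ = [2.0, 5.0, 3.0, 6.0, 4.0, 7.0]
event = [1, 0, 1, 0, 0, 0]
covs = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
subcohort = [1, 3, 4]

lp0 = log_pseudo_likelihood(0.0, exit_, event, covs, subcohort)
```

The same subcohort serves as the comparison group at every failure time, so the factors are not independent in the way partial likelihood factors are; this is why the naive standard errors need correcting.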
Stratified case-cohort design
Select the subcohort by stratified random sampling of a fraction $p_s$ from stratum s.
[Figure: illustration for S = 2 and $p_1 = p_2 = 0.50$]
Contribution to pseudo likelihood for a case:
Weights: for i in stratum s
Alternative versions of the pseudo likelihood are available.
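As a sketch only, one common weighted version takes inverse sampling fractions as weights, so that each sampled subcohort member stands in for $1/p_s$ individuals from its stratum. The symbol $\tilde{S}_j$ (subcohort members at risk at $t_j$) and the specific choice of weights are assumptions made here for illustration, and this is only one of the alternative versions the slide mentions.

```latex
\[
  \frac{r(\beta, x_{i_j}(t_j))}
       {\sum_{k \in \tilde{S}_j} w_k \, r(\beta, x_k(t_j))},
  \qquad
  w_k = \frac{1}{p_s} \quad \text{for } k \text{ in stratum } s .
\]
```

With $p_1 = p_2 = 0.50$ all weights equal 2 and cancel from the maximization, which gives back the unweighted form with the same denominator set.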
Simulation with one normal covariate:
• Stratify into two strata according to a binary surrogate that is available for everyone
• 10% surrogate-positive individuals
• Covariate is $N(0, 1)$ for surrogate-negative individuals and $N(\mu, \sigma^2)$ for surrogate-positive individuals
• Baseline hazard and censoring adjusted to get 10% failures and 20% censoring before the "closure of the study"
Simulation repeated 1000 times with
• 1000 individuals in the cohort
• 100 individuals in the subcohort for case-cohort
• 100 controls (on average) for nested case-control
Efficiencies in % relative to full cohort:
Design                                 μ=2, σ=1   μ=4, σ=1   μ=4, σ=2
Classical nested case-control             40         32         27
Classical case-cohort                     39         30         19
Counter-matched nested case-control       46         76         75
Stratified case-cohort                    51         71         72
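The covariate part of this setup can be sketched as below. Only the surrogate and covariate are generated; the failure and censoring mechanisms are not specified in enough detail to reproduce, so they are left out. The function name and the seed are arbitrary choices.

```python
import random

def simulate_covariates(n, mu, sigma, frac_positive, rng):
    """Return (surrogate, covariate) pairs: N(0, 1) for surrogate-negative
    and N(mu, sigma^2) for surrogate-positive individuals."""
    rows = []
    for _ in range(n):
        surrogate = 1 if rng.random() < frac_positive else 0
        x = rng.gauss(mu, sigma) if surrogate else rng.gauss(0.0, 1.0)
        rows.append((surrogate, x))
    return rows

rng = random.Random(1)   # arbitrary seed
cohort = simulate_covariates(1000, mu=2.0, sigma=1.0, frac_positive=0.10, rng=rng)
n_positive = sum(s for s, _ in cohort)
```

A larger `mu` separates the two strata more, which is the setting where the table shows the stratified designs gaining most over the classical ones.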
Nested case-control or case-cohort?
• Statistical inference:
  • Nested case-control (NCC): usual likelihood methods apply, and standard software may be used for the analysis
  • Case-cohort (CC): likelihood methods are not valid, but statistical software may be "tricked" into doing the analysis
  • Statistical efficiency is about the same for the two designs
• Missing covariates:
  • NCC: in a 1:1 design a sampled risk set is lost if covariate information is missing for the control
  • CC: missing covariates in the subcohort are less serious
• Logistics for prospective studies:
  • NCC: control sampling has to wait until cases occur
  • CC: subcohort can be selected at the outset
• Time scale for analysis:
  • NCC: must be decided before sampling of controls
  • CC: need not be decided before sampling of subcohort