Using martingale residuals to assess goodness of fit for sampled risk set data
Ørnulf Borgan, Department of Mathematics, University of Oslo
Based on joint work with Bryan Langholz
Outline:
• Example: Uranium miners cohort
• Cohort model, data and martingale residuals
• Risk set sampling
• Martingale residuals and goodness-of-fit tests for sampled risk set data
• Concluding remarks
Uranium miners cohort (e.g. Langholz & Goldstein, 1996):
• 3347 uranium miners from the Colorado Plateau included in the study cohort 1950-60
• Followed up until the end of 1982
• 258 lung cancer deaths
• Interested in the effect of radon and smoking exposure on the risk of lung cancer death
• Have exposure information for the full cohort. Will sample from the risk sets for illustration
Relative risk regression models
Hazard rate for individual i = relative risk for individual i × baseline hazard
Relative risk for individual i depends on covariates x_i1, x_i2, …, x_ip (possibly time-dependent)
Two relative risk functions are considered: Cox and excess relative risk
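In standard notation the model on this slide can be written out as below. The symbols α_i, α_0, c and β_0 are assumptions, not taken from the slide, and the product form of the excess relative risk function is one common parametrization that may differ from the one the authors used.

```latex
% Hazard rate = relative risk x baseline hazard (beta_0 is the true parameter value)
\alpha_i(t) = c\bigl(\beta_0, x_i(t)\bigr)\,\alpha_0(t)

% Cox (log-linear) relative risk
c\bigl(\beta, x_i(t)\bigr) = \exp\bigl\{\beta^{\top} x_i(t)\bigr\}

% Excess relative risk (assumed product form)
c\bigl(\beta, x_i(t)\bigr) = \prod_{j=1}^{p} \bigl\{1 + \beta_j\, x_{ij}(t)\bigr\}
```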
Cohort data
[Figure: individuals at risk plotted against study time; arrows are censored observations]
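t_1 < t_2 < t_3 < … are the times of failures
i_j is the individual failing at t_j (the "case")
Counting process for individual i: counts the observed failures of individual i up to time t
Intensity process λ_i(t) is given by the at-risk indicator times the hazard rate
Cumulative intensity processes: the intensity processes integrated over time
Martingales: counting process minus cumulative intensity process
Martingale residual processes: counting process minus estimated cumulative intensity process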
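The cohort quantities listed here can be written out as follows. This is a sketch in standard counting-process notation; the symbols N_i, Y_i, α_i, Λ_i and M_i are assumed rather than taken from the slide, and the hat on Λ_i means that unknown quantities are replaced by estimates.

```latex
% Counting process: observed failures of individual i in [0, t]
N_i(t) = \sum_{j \ge 1} I\{t_j \le t,\; i_j = i\}

% Intensity process: at-risk indicator times hazard rate
\lambda_i(t) = Y_i(t)\,\alpha_i(t)

% Cumulative intensity process and martingale
\Lambda_i(t) = \int_0^t \lambda_i(u)\,du, \qquad M_i(t) = N_i(t) - \Lambda_i(t)

% Martingale residual process
\hat{M}_i(t) = N_i(t) - \hat{\Lambda}_i(t)
```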
Martingale residual processes may be used to assess goodness of fit:
• Plot individual martingale residuals versus covariates (Therneau, Grambsch & Fleming, 1990)
• Plot grouped martingale residual processes versus time (Aalen, 1993; Grønnesby & Borgan, 1996)
The latter may be extended to sampled risk set data
Risk set sampling
• Cohort studies need information on covariates for all individuals at risk
• Expensive to collect and check (!) this information for all individuals in large cohorts
• For risk set sampling designs one only needs to collect covariate information for the cases and a few controls sampled at the failure times
Select m − 1 controls among the n(t) − 1 non-failures at risk if a case occurs at time t, i.e. match on study time
[Figure: illustration for m = 2, with cases and controls marked]
A sampled risk set consists of the case i_j and its controls
A sampling design for the controls is described by its sampling distribution
A number of sampling designs are available
The classical nested case-control design: if individual i fails at time t, the probability of selecting the set r as the sampled risk set is the same for every admissible r, namely one over the number of ways of choosing m − 1 controls from the n(t) − 1 non-failures at risk (we assume that r is a subset of the risk set, that r is of size m and that i is in r)
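The classical design can be sketched in a few lines of Python. This is a minimal illustration assuming simple random sampling without replacement of the controls; the function names and the toy risk set are hypothetical and not taken from the slides.

```python
import math
import random


def sample_risk_set(case, at_risk, m, rng=random):
    """Return the case together with m - 1 controls drawn without replacement
    from the other individuals at risk (hypothetical helper)."""
    candidates = [i for i in at_risk if i != case]
    controls = rng.sample(candidates, m - 1)
    return sorted([case] + controls)


def selection_probability(n_at_risk, m):
    """Probability of any particular admissible set r: one over the number of
    ways of choosing m - 1 controls from the n(t) - 1 non-failures at risk."""
    return 1.0 / math.comb(n_at_risk - 1, m - 1)


# Toy risk set: five individuals at risk, individual 3 fails, m = 2.
rng = random.Random(0)
at_risk = [1, 2, 3, 4, 5]
sampled = sample_risk_set(case=3, at_risk=at_risk, m=2, rng=rng)
prob = selection_probability(len(at_risk), m=2)
```

With five individuals at risk and one control per case there are four admissible sets, so each has probability 1/4.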
Inference on the regression coefficients can be based on the partial likelihood
The partial likelihood enjoys the usual likelihood properties (Borgan, Goldstein & Langholz, 1995)
For the classical nested case-control design, the sampling probabilities cancel and the partial likelihood simplifies
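The partial likelihood for sampled risk set data is usually written as below. The notation is assumed, not taken from the slide: c is the relative risk function, the sampled risk set at t_j is written R̃_j, and π_t(r | i) is the probability of selecting r when individual i fails at t.

```latex
% General partial likelihood for sampled risk set data
L(\beta) = \prod_{t_j}
  \frac{c\bigl(\beta, x_{i_j}(t_j)\bigr)\,\pi_{t_j}(\tilde{R}_j \mid i_j)}
       {\sum_{l \in \tilde{R}_j} c\bigl(\beta, x_l(t_j)\bigr)\,\pi_{t_j}(\tilde{R}_j \mid l)}

% Classical nested case-control: \pi_{t_j}(\tilde{R}_j \mid l) is the same
% for every l in \tilde{R}_j, so it cancels
L(\beta) = \prod_{t_j}
  \frac{c\bigl(\beta, x_{i_j}(t_j)\bigr)}
       {\sum_{l \in \tilde{R}_j} c\bigl(\beta, x_l(t_j)\bigr)}
```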
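Martingale residuals and goodness-of-fit tests for sampled risk set data
Introduce the counting processes that count, for each individual i and each set r, the failures of individual i for which r is the sampled risk set
Intensity processes take the form: at-risk indicator × hazard rate × sampling probability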
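Corresponding martingales: counting process minus cumulative intensity process
Martingale residual processes: counting process minus estimated cumulative intensity process
These are of little practical use on their own, but they may be aggregated over groups of individuals to produce useful plots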
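For sampled risk set data the processes are indexed by pairs (i, r) of an individual and a sampled risk set. They can be sketched as follows; the symbols N, Y_i, α_i, π_t, R̃_j and M are assumed notation, not taken from the slides, and the hat denotes plug-in estimation.

```latex
% Counting process: failures of individual i with sampled risk set r
N_{(i,r)}(t) = \sum_{j \ge 1} I\{t_j \le t,\; (i_j, \tilde{R}_j) = (i, r)\}

% Intensity: at-risk indicator x hazard rate x sampling probability
\lambda_{(i,r)}(t) = Y_i(t)\,\alpha_i(t)\,\pi_t(r \mid i)

% Martingale and martingale residual process
M_{(i,r)}(t) = N_{(i,r)}(t) - \int_0^t \lambda_{(i,r)}(u)\,du, \qquad
\hat{M}_{(i,r)}(t) = N_{(i,r)}(t) - \hat{\Lambda}_{(i,r)}(t)
```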
For group g, the aggregated martingale residual process may be interpreted as the "observed − expected" number of failures in group g
Simplifies for classical nested case-control
Asymptotic distribution may be derived using counting process methods
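The "observed − expected" interpretation can be illustrated with a short Python sketch for the classical nested case-control design. It assumes that the contribution at each failure time is the indicator that the case belongs to the group, minus the group's share of the estimated relative risks within the sampled risk set (sampling weights cancel in this design). The data layout, the function name and the toy numbers are hypothetical.

```python
def grouped_martingale_residuals(events, groups):
    """Cumulative 'observed - expected' failures per group over failure times.

    Each event is a dict with keys:
      time        - failure time
      case        - id of the failing individual
      sampled_set - ids in the sampled risk set (case included)
      rel_risk    - id -> estimated relative risk at the failure time
      group       - id -> group label at the failure time
    """
    cumulative = {g: 0.0 for g in groups}
    paths = {g: [] for g in groups}
    for ev in sorted(events, key=lambda e: e["time"]):
        total = sum(ev["rel_risk"][l] for l in ev["sampled_set"])
        for g in groups:
            # Observed: 1 if the case belongs to group g at this time.
            observed = 1.0 if ev["group"][ev["case"]] == g else 0.0
            # Expected: group g's share of relative risk in the sampled set.
            in_group = sum(ev["rel_risk"][l] for l in ev["sampled_set"]
                           if ev["group"][l] == g)
            cumulative[g] += observed - in_group / total
            paths[g].append((ev["time"], cumulative[g]))
    return paths


# Toy data: two failure times, one control per case.
events = [
    {"time": 1.0, "case": "a", "sampled_set": ["a", "b"],
     "rel_risk": {"a": 2.0, "b": 1.0}, "group": {"a": "high", "b": "low"}},
    {"time": 2.0, "case": "c", "sampled_set": ["c", "d"],
     "rel_risk": {"c": 1.0, "d": 1.0}, "group": {"c": "low", "d": "low"}},
]
paths = grouped_martingale_residuals(events, groups=["high", "low"])
```

At every failure time the contributions sum to zero across groups, so a plot of the paths shows which groups have more or fewer failures than the fitted model predicts.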
Illustration: uranium miners cohort
Fit excess relative risk model:
x_i1 = cumulative radon (100 WLMs)
x_i2 = cumulative smoking (1000 packs)
For classical nested case-control with three controls per case:
Aggregate martingale residual processes in three groups according to cumulative radon exposure
Groups:
I: < 500 WLMs
II: 500-1500 WLMs
III: > 1500 WLMs
There are indications of an interaction between cumulative radon exposure and age
Observed and expected numbers of failures in the groups for ages below and above 60 years:
Chi-squared statistic with 2(3 − 1) = 4 df takes the value 10.5 (P-value 3.2%)
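The degrees of freedom and the tail probability can be checked with a few lines of Python. This sketch uses the closed-form upper tail of the chi-squared distribution for even degrees of freedom; the function name is hypothetical, and the input is the rounded statistic 10.5 from the slide.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """Upper tail probability of a chi-squared variable with even df:
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!"""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = 1.0   # k = 0 term of the sum
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


n_groups = 3          # radon exposure groups I, II, III
n_age_intervals = 2   # below and above 60 years
df = n_age_intervals * (n_groups - 1)
p_value = chi2_upper_tail_even_df(10.5, df)
```

The rounded statistic gives a tail probability of about 0.033, in line with the reported 3.2%, which was presumably computed from the unrounded value.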
Concluding remarks
The counting process formulation of nested case-control studies:
• Introduces a time aspect that is usually disregarded for sampled risk set data
• Gives a similar model formulation as for cohort data and thereby opens up for similar methodological developments as for cohort studies
• Grouped martingale residual processes are one example of this. They make it possible to check for time-dependent effects and other deviations from the model
Questions and further developments of grouped martingale residual plots and related goodness-of-fit methods
• How should the grouping be performed?
• How do specific deviations from the model turn up in the plots?
• Kolmogorov-Smirnov and Cramér-von Mises type tests? (Durbin's approximation, Lin et al.'s simulation trick)