300 likes | 501 Views
Nonparametric Estimation with Recurrent Event Data. Edsel A. Pena Department of Statistics University of South Carolina E-mail: pena@stat.sc.edu. Talk at MUSC, 11/12/01. Based on joint works with R. Strawderman (Cornell) and M. Hollander (Florida State).
E N D
Nonparametric Estimation with Recurrent Event Data Edsel A. Pena Department of Statistics University of South Carolina E-mail: pena@stat.sc.edu Talk at MUSC, 11/12/01 Based on joint works with R. Strawderman (Cornell) and M. Hollander (Florida State) Research supported by NIH and NSF Grants
A Real Recurrent Event Data(Source: Aalen and Husebye (‘91), Statistics in Medicine) Variable:Migrating motor complex (MMC) periods, in minutes, for 19 individuals in a gastroenterology study concerning small bowel motility during fasting state.
t-S3=4 T2=59 T1=284 T3=186 S2=343 t=533 S3=529 S1=284 Calendar Scale Pictorial Representation of Data for a Subject • Consider unit/subject #3. • K = 3 • Period (Gap) Times, Tj: 284, 59, 186 • Censored Time, t-SK: 4 • Calendar Times, Sj: 284, 343, 529 • Limit of Observation Period: t = 533 0
Features of Data Set • Random observation period per subject (administrative constraints). • Length of period: t • Event of interest is recurrent. A subject may have more than one event during observation period. • # of events (K) informative about MMC period distribution (F). • Last MMC period right-censored by a variable informative about F. • Calendar times: S1, S2, …, SK. • Right-censoring variable: t-SK.
Assumptions and Problem • Aalen and Husebye: “Consecutive MMC periods for each individual appear (to be) approximate renewal processes.” • Translation: The inter-event times Tij’s are assumed stochastically independent. • Problem: Under this IID assumption, and taking into account the informativeness of K and the right-censoring mechanism, to estimate the inter-event distribution, F.
General Form of Data Accrual Calendar Times of Event Occurrences Si0=0 and Sij = Ti1 + Ti2 + … + Tij Number of Events in Observation Period Ki = max{j: Sij<ti} Upper limit of observation periods, t’s, could be fixed, or assumed to be IID with unknown distribution G.
Observables Theoretical Problem To obtain a nonparametric estimator of the gap-time or inter-event time distribution, F; and to determine its properties. Such estimator could be used for other purposes.
Relevance and Applicability • Recurrent phenomena occur in public health/medical settings. • Outbreak of a disease. • Hospitalization of a patient. • Tumor occurrence. • Epileptic seizures. • In other settings. • Non-life insurance claims. • When stock index (e.g., Dow Jones) decreases by at least 6% in one day. • Labor strikes.
Existing Methods and Their Limitations • Consider only the first, possibly right-censored, observation per unit and use the product-limit estimator (PLE). • Loss of information • Inefficient • Ignore the right-censored last observation, and use empirical distribution function (EDF). • Leads to bias (“biased sampling”) • Estimator actually inconsistent
Review: Prior Results Single-Event Complete Data • T1, T2, …, Tn IID F(t) = P(T < t) • Empirical Survivor Function (EDF) • Asymptotics of EDF where W1 is a zero-mean Gaussian process with covariance function
In Hazards View • Hazard rate function • Cumulative hazard function • Equivalences • Another representation of the variance
Single-Event Right-Censored Data • Failure times: T1, T2, …, Tn IID F • Censoring times: C1, C2, …, Cn IID G • Right-censored data (Z1, d1), (Z2, d2), …, (Zn, dn) Zi = min(Ti, Ci) di = I{Ti< Ci} • Product-limit or Kaplan-Meier Estimator
PLE/KME Properties • Asymptotics of PLE where W2 is a zero-mean Gaussian process with covariance function • If G(w) = 0 for all w, so no censoring,
Relevant Processes for Recurrent Event Setting • Calendar-Time Processes for ith unit Then, is a vector of square-integrable zero-mean martingales.
Difficulty: arises because interest is on l(.) or L(.), but these appear in the compensator process as is the length since last event at calendar time v t GapTime s 284 343 529 533 Calendar Time • Needed:Calendar-Gaptime Space For Unit 3 in MMC Data 0
Processes in Calendar-Gaptime Space • Ni(s,t) = # of events in calendar time [0,s] for the ith unit with gaptimes at most t • Yi(s,t) = number of events in [0,s] for the ith unit with gaptimes at least t.
MMC Unit #3 t GapTime t=100 s 284 343 529 533 s=400 Calendar Time K3(s=400) = 2 N3(s=400,t=100) = 1 Y3(s=400,t=100) = 1
Estimators of L and F for Recurrent Event Setting By “change-of-variable” formula, RHS is a sq-int. zero-mean martingale, so Estimator of L(t)
Estimator of F • Since a generalized product-limit estimator(GPLE). • GPLE extends the EDF for complete data and the PLE or KME for single-event right-censored data to recurrent setting.
Theorem: ; Limiting Distribution
EDF: • PLE: • GPLE (recurrent event): For large s, Comparison of Variance Functions • If in stationary state, R(t) = t/mF, so mG(w) is the mean residual life of t, given t> w.
Wang-Chang Estimator (JASA, ‘99) • Can handle correlated inter-event times. Comparison with GPLE may therefore be unfair to WC estimator!
Frailty Model • U1, U2, …, Un are IID unobservedGamma(a, a) random variables, called frailties. • Given Ui = u, (Ti1, Ti2, Ti3, …) are independent inter-event times with • Marginal survivor function of Tij:
Estimator Under Frailty Model • Frailty parameter, a, determines dependence among inter-event times. • Small (Large) a: Strong (Weak) dependence. • EM algorithm needed to obtain the estimator (FRMLE). Unobserved frailties viewed as missing.GPLE needed in EM algorithm. • Under frailty model: GPLE not consistent when frailty parameter is finite.
Monte Carlo Comparisons • Under gamma frailty model. • F = EXP(q): q = 6 • G = EXP(h): h = 1 • n = 50 • # of Replications = 1000 • Frailty parameter a took values {Infty (IID), 6, 2} • Computer programs: S-Plus and Fortran routines.
Simulated Comparison of the Three Estimators for Varying Frailty Parameter Black=GPLE; Blue=WCPLE; Red=FRMLE IID
Effect of the Frailty Parameter (a) for Each of the Three Estimators (Black=Infty; Blue=6; Red=2)
The Three Estimates of MMC Period Survivor Function IID assumption seems acceptable. Estimate of a is 10.2.