350 likes | 563 Views
Quick Overview to Survival Analysis. Jinheum Kim (jinhkim@suwon.ac.kr) Department of Applied Statistics University of Suwon 2007. 6. 2. Outline. Survival data Censoring & Truncation Survivor function & Hazard function Kaplan-Meier estimator Log-rank test
E N D
Quick Overview to Survival Analysis Jinheum Kim (jinhkim@suwon.ac.kr) Department of Applied Statistics University of Suwon 2007. 6. 2
Outline • Survival data • Censoring & Truncation • Survivor function & Hazard function • Kaplan-Meier estimator • Log-rank test • Cox proportional hazards model • Illustration with an example
What is survival analysis? • Outcome variable: Time until an event occurs • Time origin (eg, birth date, occurrence of entry into a study or diagnosis of a disease) • Time (eg, years, months, weeks or days) • Event (eg, death, disease incidence, relapse from remission…)
Censoring • Censoring: Don’t know survival time exactly • Why censoring may occur? • No event before the study ends • Lost to follow-up • Withdrawn from the study
A hypothetical example Weeks A Study end B C Withdrawn Study end D E Lost F
Another types of censoring • Left censoring: , observed to fail prior to Eg, Time to first use of marijuana • Q: When did you first use marijuana? • A: exact age, “I never used it.” or “I have used it but can not recall just when the first was.” • Double censoring • Interval censoring: Eg, Time to cosmetic deterioration of breast cancer patients
Censoring vs. Truncation • When occurs? Only those individuals whose event time lies within a certain observational window are observed • In contrast to censoring where there is at least partial information on each subject • Left truncation • When • Eg, Life lengths of elderly residents of a retirement community • Right truncation • When • Eg, Waiting time from infection at transfusion to clinical onset of AIDS (sampled on June 30, 1986)
Illustration • Data on 137 bone marrow transplant patients • Risk factors: patient and donor age, sex, and CMV status, waiting time from diagnosis to transplantation, FAB, MTX • Three groups: AML low risk(54), AML high risk(45), ALL(38) • Survival times • : time(in days) to death or end of study • : disease-free survival time(time to relapse, death or end of study) • : time to acute GvHD • : time to chronic GvHD • : time to return of platelets to normal levels
Simplified recovery process from BMT acute GvHD platelet recovery T R A N S P L A N T Relapse Death acute GvHD platelet recovery
Survivor function • (definition) Probability that a person survives longer than • (properties) § Non-increasing § § Eventually nobody would survive
Hazard function • (definition) instantaneous potential per unit time for the event to occur , given that the individual has survived up to • (properties) § Non-negative § No upper bound
vs. • Focus on not failing vs. failing The higher is, the smaller is • Directly describe survival vs. insight about conditional failure rates • (relational formula) or
Three goals of survival analysis • Estimate survivor and/or hazard functions • Compare survivor and/or hazard functions • Assess the relationship of explanatory variables to survival time
Kaplan-Meier estimator • (Distinct) observed survival times: , conventionally • : # of individuals fail at • : # of individuals censored in : # of individuals at risk just prior to • multiplying (1-observed proportion of failures) at each survival times
Remarks on • Never reduce to zero if Not defined for >(largest time recorded) • (estimated asymptotic variance) : Greenwood’s formula • Pointwise 95% confidence interval for : linear and symmetrical, but possibly lies out of (0,1) and low coverage rate with very small samples • Life table estimator: used for the survival data grouped into convenient intervals • Nelsen-Aalen estimator for cumulative hazard function :
Illustration (revisited) • Survival time=time to relapse, death or end of study, i.e, disease-free time • Estimated disease-free survival curves: AML low risk>ALL>AML high risk • Estimated cumulative hazard rates
Comparison of survivor functions • Test whether or not the survivor functions for two groups are equivalent • : (distinct) survival times by pooling all the sample from two groups • : (observed ) # of failures at in group, • : # of individuals at risk just prior to in group
Comparison of survivor functions • Idea: Based on • Log-rank statistic § under § If reject a test for equality of the survivor functions at level
Remarks for log-rank test • Choice of weight function: , specially for log-rank test • Extension to three or more groups • Stratification on a set of covariates • Trend test for ordered alternatives : plugging in any set of scores
Illustration (revisited) • Test that the disease-free survival curves of three groups are same over 2,204 • Three types of test statistic Log-rank: 13.8037 (p-value=0.0010) with Gehan: 16.2407 (p-value=0.0003) with Taron-Ware: 15.6529 (p-value=0.0004) with highly significant!
Cox proportional hazards model • Why regression models need? To predict covariates(or explanatory variables, risk factors) for time to event • Data: • Cox model : hazard rate at for an individual with risk vector • A sort of semiparametric model parametrically for the covariate effect + nonparametrically for baseline hazard function • Why PH is called? RR(or HR)= is constant against
Illustration (revisited) • Background: to adjust the comparisons of the three risk groups because this was not a randomized clinical trial • Fixed risk factors • =1 if AML low-risk, =1 if AML high-risk • =waiting time • =FAB • =MTX • =1 if donor: male; =1 if patient: male; =1 if donor & patient: male • =1 if donor: CMV positive; =1 if patient: CMV positive; =1 if donor & patient: CMV positive • =donor age-28; =patient age-28;
ANOVA table for final model (fixed only) 1 0.837 0.279 9.03 0.003 1 0.004 0.018 0.05 0.831 1 0.007 0.020 0.12 0.728 1 0.003 0.001 11.01 0.001 Degrees of Freedom b SE(b) Wald Chi Square p-Value 1 -1.091 0.354 9.48 0.002 1 -0.404 0.363 1.24 0.265
Other regression models • Additive hazards model: • Accelerated failure time model: • Focus on direct relationship between and time to event • Effect of covariates is multiplicative on rather then on hazard function • Parametric, but providing a good fit if correctly chosen
Refinements of Cox model • Stratification § When the PH assumption is violated for some covariate § • Time-dependent covariates §Eg, BP, cholesterol, size of the tumor … §
Illustration (revisited) • Time-dependent covariates • Whether or not aGvHD occurs at time NS! • Whether or not cGvHD occurs at time NS! • Whether or not the platelets recovered at time Significant! • Final risk factors • Fixed-time effects: Disease group, FAB, Age • Time-dependent effect: Platelet recovery • Time-dependent interactions: Disease group Platelet recovery, Age Platelet recovery, FAB Platelet recovery
Three regressions with a time-dependent covariate Degrees of Freedom b SE(b) Wald Chi Square p-Value 1 -0.5516 0.2880 3.6690 0.0554 1 0.4338 0.2722 2.5400 0.1110 1 0.3184 0.2851 1.2470 0.2642 1 -0.6225 0.2962 4.4163 0.0356 1 0.3657 0.2685 1.8548 0.1732 1 -0.1948 0.2876 0.4588 0.4982 1 -0.4962 0.2892 2.9435 0.0862 1 0.3813 0.2676 2.0306 0.1542 1 -1.1297 0.3280 11.8657 0.0006
ANOVA table for final model (fixed+time-variant) 1 -3.0374 0.9257 10.765 0.0010 1 -1.8675 1.2908 2.093 0.1479 1 2.4535 1.1609 4.467 0.0346 1 0.1933 0.0588 10.821 0.0010 1 -0.1470 0.0480 9.383 0.0022 1 0.0001 0.0023 0.003 0.9561 Degrees of Freedom b SE(b) Wald Chi Square p-Value AML low risk 1 1.3073 0.8186 2.550 0.1103 AML high risk 1 1.1071 1.2242 0.818 0.3658 AML with FAB Grade 4 or 5 1 -1.2348 1.1139 1.229 0.2676 Patient age -28 1 -0.1538 0.0545 7.948 0.0048 Donor age -28 1 0.1166 0.0434 7.229 0.0072 1 0.0026 0.0020 1.786 0.1814 Platelet Recovery 1 -0.3062 0.6936 0.195 0.6589
What did we overview so far? • How to define survival data • Censoring vs. truncation • Survivor function vs. hazard function and their relation • How to estimate the survival function: KM estimator • How to compare survival functions: Log-rank test • How to estimate risk factors: Cox proportional hazards model with fixed and/or time-dependent covariates • Illustrations with BMT data