1 / 35

Jinheum Kim (jinhkim@suwon.ac.kr) Department of Applied Statistics University of Suwon

Quick Overview to Survival Analysis. Jinheum Kim (jinhkim@suwon.ac.kr) Department of Applied Statistics University of Suwon 2007. 6. 2. Outline. Survival data Censoring & Truncation Survivor function & Hazard function Kaplan-Meier estimator Log-rank test

nolen
Download Presentation

Jinheum Kim (jinhkim@suwon.ac.kr) Department of Applied Statistics University of Suwon

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Quick Overview to Survival Analysis Jinheum Kim (jinhkim@suwon.ac.kr) Department of Applied Statistics University of Suwon 2007. 6. 2

  2. Outline • Survival data • Censoring & Truncation • Survivor function & Hazard function • Kaplan-Meier estimator • Log-rank test • Cox proportional hazards model • Illustration with an example

  3. What is survival analysis? • Outcome variable: Time until an event occurs • Time origin (eg, birth date, occurrence of entry into a study or diagnosis of a disease) • Time (eg, years, months, weeks or days) • Event (eg, death, disease incidence, relapse from remission…)

  4. Censoring • Censoring: Don’t know survival time exactly • Why censoring may occur? • No event before the study ends • Lost to follow-up • Withdrawn from the study

  5. A hypothetical example Weeks A Study end B C Withdrawn Study end D E Lost F

  6. Another types of censoring • Left censoring: , observed to fail prior to Eg, Time to first use of marijuana • Q: When did you first use marijuana? • A: exact age, “I never used it.” or “I have used it but can not recall just when the first was.” • Double censoring • Interval censoring: Eg, Time to cosmetic deterioration of breast cancer patients

  7. Censoring vs. Truncation • When occurs? Only those individuals whose event time lies within a certain observational window are observed • In contrast to censoring where there is at least partial information on each subject • Left truncation • When • Eg, Life lengths of elderly residents of a retirement community • Right truncation • When • Eg, Waiting time from infection at transfusion to clinical onset of AIDS (sampled on June 30, 1986)

  8. Illustration • Data on 137 bone marrow transplant patients • Risk factors: patient and donor age, sex, and CMV status, waiting time from diagnosis to transplantation, FAB, MTX • Three groups: AML low risk(54), AML high risk(45), ALL(38) • Survival times • : time(in days) to death or end of study • : disease-free survival time(time to relapse, death or end of study) • : time to acute GvHD • : time to chronic GvHD • : time to return of platelets to normal levels

  9. Simplified recovery process from BMT acute GvHD platelet recovery T R A N S P L A N T Relapse Death acute GvHD platelet recovery

  10. Survivor function • (definition) Probability that a person survives longer than • (properties) § Non-increasing § § Eventually nobody would survive

  11. Hazard function • (definition) instantaneous potential per unit time for the event to occur , given that the individual has survived up to • (properties) § Non-negative § No upper bound

  12. vs. • Focus on not failing vs. failing The higher is, the smaller is • Directly describe survival vs. insight about conditional failure rates • (relational formula) or

  13. Three goals of survival analysis • Estimate survivor and/or hazard functions • Compare survivor and/or hazard functions • Assess the relationship of explanatory variables to survival time

  14. Kaplan-Meier estimator • (Distinct) observed survival times: , conventionally • : # of individuals fail at • : # of individuals censored in : # of individuals at risk just prior to • multiplying (1-observed proportion of failures) at each survival times

  15. Remarks on • Never reduce to zero if Not defined for >(largest time recorded) • (estimated asymptotic variance) : Greenwood’s formula • Pointwise 95% confidence interval for : linear and symmetrical, but possibly lies out of (0,1) and low coverage rate with very small samples • Life table estimator: used for the survival data grouped into convenient intervals • Nelsen-Aalen estimator for cumulative hazard function :

  16. Illustration (revisited) • Survival time=time to relapse, death or end of study, i.e, disease-free time • Estimated disease-free survival curves: AML low risk>ALL>AML high risk • Estimated cumulative hazard rates

  17. Survival curves for three disease groups

  18. CI of survival for ALL group

  19. CI of survival for high-risk AML group

  20. CI of survival for low-risk AML group

  21. Comparison of survivor functions • Test whether or not the survivor functions for two groups are equivalent • : (distinct) survival times by pooling all the sample from two groups • : (observed ) # of failures at in group, • : # of individuals at risk just prior to in group

  22. Comparison of survivor functions • Idea: Based on • Log-rank statistic § under § If reject a test for equality of the survivor functions at level

  23. Remarks for log-rank test • Choice of weight function: , specially for log-rank test • Extension to three or more groups • Stratification on a set of covariates • Trend test for ordered alternatives : plugging in any set of scores

  24. Illustration (revisited) • Test that the disease-free survival curves of three groups are same over 2,204 • Three types of test statistic Log-rank: 13.8037 (p-value=0.0010) with Gehan: 16.2407 (p-value=0.0003) with Taron-Ware: 15.6529 (p-value=0.0004) with highly significant!

  25. Cox proportional hazards model • Why regression models need? To predict covariates(or explanatory variables, risk factors) for time to event • Data: • Cox model : hazard rate at for an individual with risk vector • A sort of semiparametric model parametrically for the covariate effect + nonparametrically for baseline hazard function • Why PH is called? RR(or HR)= is constant against

  26. Illustration (revisited) • Background: to adjust the comparisons of the three risk groups because this was not a randomized clinical trial • Fixed risk factors • =1 if AML low-risk, =1 if AML high-risk • =waiting time • =FAB • =MTX • =1 if donor: male; =1 if patient: male; =1 if donor & patient: male • =1 if donor: CMV positive; =1 if patient: CMV positive; =1 if donor & patient: CMV positive • =donor age-28; =patient age-28;

  27. ANOVA table for final model (fixed only) 1 0.837 0.279 9.03 0.003 1 0.004 0.018 0.05 0.831 1 0.007 0.020 0.12 0.728 1 0.003 0.001 11.01 0.001 Degrees of Freedom b SE(b) Wald Chi Square p-Value 1 -1.091 0.354 9.48 0.002 1 -0.404 0.363 1.24 0.265

  28. Other regression models • Additive hazards model: • Accelerated failure time model: • Focus on direct relationship between and time to event • Effect of covariates is multiplicative on rather then on hazard function • Parametric, but providing a good fit if correctly chosen

  29. Refinements of Cox model • Stratification § When the PH assumption is violated for some covariate § • Time-dependent covariates §Eg, BP, cholesterol, size of the tumor … §

  30. Illustration (revisited) • Time-dependent covariates • Whether or not aGvHD occurs at time NS! • Whether or not cGvHD occurs at time NS! • Whether or not the platelets recovered at time Significant! • Final risk factors • Fixed-time effects: Disease group, FAB, Age • Time-dependent effect: Platelet recovery • Time-dependent interactions: Disease group Platelet recovery, Age Platelet recovery, FAB Platelet recovery

  31. Three regressions with a time-dependent covariate Degrees of Freedom b SE(b) Wald Chi Square p-Value 1 -0.5516 0.2880 3.6690 0.0554 1 0.4338 0.2722 2.5400 0.1110 1 0.3184 0.2851 1.2470 0.2642 1 -0.6225 0.2962 4.4163 0.0356 1 0.3657 0.2685 1.8548 0.1732 1 -0.1948 0.2876 0.4588 0.4982 1 -0.4962 0.2892 2.9435 0.0862 1 0.3813 0.2676 2.0306 0.1542 1 -1.1297 0.3280 11.8657 0.0006

  32. ANOVA table for final model (fixed+time-variant) 1 -3.0374 0.9257 10.765 0.0010 1 -1.8675 1.2908 2.093 0.1479 1 2.4535 1.1609 4.467 0.0346 1 0.1933 0.0588 10.821 0.0010 1 -0.1470 0.0480 9.383 0.0022 1 0.0001 0.0023 0.003 0.9561 Degrees of Freedom b SE(b) Wald Chi Square p-Value AML low risk 1 1.3073 0.8186 2.550 0.1103 AML high risk 1 1.1071 1.2242 0.818 0.3658 AML with FAB Grade 4 or 5 1 -1.2348 1.1139 1.229 0.2676 Patient age -28 1 -0.1538 0.0545 7.948 0.0048 Donor age -28 1 0.1166 0.0434 7.229 0.0072 1 0.0026 0.0020 1.786 0.1814 Platelet Recovery 1 -0.3062 0.6936 0.195 0.6589

  33. Tests of PH assumption

  34. What did we overview so far? • How to define survival data • Censoring vs. truncation • Survivor function vs. hazard function and their relation • How to estimate the survival function: KM estimator • How to compare survival functions: Log-rank test • How to estimate risk factors: Cox proportional hazards model with fixed and/or time-dependent covariates • Illustrations with BMT data

  35. THANK YOU!

More Related