INTRODUCTION TO CLINICAL RESEARCH Survival Analysis – Getting Started Karen Bandeen-Roche, Ph.D. July 20, 2010
Acknowledgements • Scott Zeger • Marie Diener-West • ICTR Leadership / Team JHU Intro to Clinical Research
Introduction to Survival Analysis • Thinking about times to events; contending with “censoring” • Counting process view of times to events • Hazard and survival functions • Kaplan-Meier estimate of the survival function • Future topics: log-rank test; Cox proportional hazards model
“Survival Analysis” • Approach and methods for analyzing times to events • Events not necessarily deaths (“survival” is historical term) • Need special methods to deal with “censoring”
Typical Clinical Study with Time to Event Outcome [Figure: patient timelines plotted against calendar time (0 to 10), marking study start, end of enrollment, end of study, events, and losses to follow-up]
Switching from Calendar to Follow-up Time [Figure: the five patients realigned on a follow-up time axis (0 to 10); observed times are >3, 5, >8, 1, >6]
The Problem with Standard Analyses of Times to Events • Mean: (1 + 3 + 5 + 6 + 8)/5 = 4.6 - right? • Median: 5 – right? • Histogram
Censoring • “>3” is not 3; it may be 33 • Mean is not 4.6; it may be (1 + 33 + 5 + 6 + 8)/5 = 10.6, or any value greater than 4.6 • “>3” is a right-censored value: we only know the value exceeds 3 • “>x” is often written “x+”
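The arithmetic on this slide can be checked with a short sketch. The helper `mean_time` and the index-based substitution are illustrative assumptions, not part of the lecture; the point is only that the naive mean is a lower bound once a censored time is replaced by any larger value.

```python
# Five-patient example: (observed time in months, censored?)
observed = [(1, False), (3, True), (5, False), (6, True), (8, True)]

def mean_time(data, true_times=None):
    """Mean time to event, optionally substituting hypothetical true times.

    true_times maps a patient's index to an assumed true event time, which
    must be larger than the censoring time recorded for that patient.
    """
    true_times = true_times or {}
    total = 0.0
    for index, (time, censored) in enumerate(data):
        if index in true_times:
            if not censored or true_times[index] <= time:
                raise ValueError("only censored times can be replaced, and only by larger values")
            time = true_times[index]
        total += time
    return total / len(data)

naive_mean = mean_time(observed)             # treats ">3" as 3, ">6" as 6, ">8" as 8
what_if_mean = mean_time(observed, {1: 33})  # suppose ">3" was really 33
print(naive_mean, what_if_mean)
```

Any other admissible substitution for the censored times gives a mean above the naive value, which is why the naive mean cannot be trusted.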
Censoring • Uncensored data: the event has occurred, and its occurrence is observed • Censored data: the event has yet to occur; the person is event-free at the current follow-up time • Why follow-up stops: a competing event that is not an endpoint (death, if not part of the endpoint; a clinical event that requires treatment; etc.), or our ability to observe ends before the event happens
Contending with Censored Data • Standard statistical methods do not work for censored data • We need to think of times to events as a natural history in time, not just a single number • Issue: if no events are reported in the interval from last follow-up to “now”, we need to choose between “No news is good news?” and “No news is no news?”
One Option: Overall Event Rate • Example: 2 events in 23 person-months = 1 event per 11.5 person-months = 1.04 events per person-year = 104 events per 100 person-years • Gives an average event rate over the follow-up period; the actual event rate may vary over time • For a finer time resolution, repeat the calculation within small intervals
Switching from Calendar to Follow-up Time [Figure: five patients on a follow-up time axis (0 to 10) with observed times >3, 5, >8, 1, >6] • 3 + 5 + 8 + 1 + 6 = 23 person-months of observation; 2 actual events
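The overall event rate can be reproduced from the five follow-up times. This is a minimal sketch; the variable names are assumptions made for illustration.

```python
# (follow-up time in months, event observed?) for the five patients
follow_up = [(3, False), (5, True), (8, False), (1, True), (6, False)]

# Everyone contributes observation time, censored or not.
person_months = sum(time for time, _ in follow_up)
# Only observed events are counted.
events = sum(1 for _, had_event in follow_up if had_event)

rate_per_person_month = events / person_months
rate_per_person_year = rate_per_person_month * 12
rate_per_100_person_years = rate_per_person_year * 100

print(person_months, events)
print(round(rate_per_person_year, 2), round(rate_per_100_person_years))
```

Censored patients add to the denominator (person-time) but not to the numerator (events), which is how this summary avoids the problem with the naive mean.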
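Second Option: Natural History, “One Day at a Time” [Figure: each patient’s follow-up coded interval by interval, 0 = no event in the interval, 1 = event; rows: >3: 0 0 0; 5: 0 0 0 0 1; >8: 0 0 0 0 0 0 0 0; 1: 1; >6: 0 0 0 0 0 0; horizontal axis: follow-up time, 0 to 10]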
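The interval-by-interval coding can be sketched as follows. The function name `one_interval_at_a_time` is hypothetical, and the sketch assumes follow-up is recorded in whole intervals of at least one unit.

```python
def one_interval_at_a_time(time, had_event):
    """Return one 0/1 indicator per unit interval of follow-up.

    Every interval survived event-free is coded 0; the final interval is
    coded 1 only when the event was observed in it.
    """
    indicators = [0] * time
    if had_event:
        indicators[-1] = 1
    return indicators

# (follow-up time, event observed?) in the order shown in the figure
patients = [(3, False), (5, True), (8, False), (1, True), (6, False)]
history = [one_interval_at_a_time(time, had_event) for time, had_event in patients]

for (time, had_event), row in zip(patients, history):
    label = str(time) if had_event else ">" + str(time)
    print(label, row)

total_intervals = sum(len(row) for row in history)
total_events = sum(sum(row) for row in history)
```

Counting rows and ones recovers the 23 person-months and 2 events used for the overall event rate, but the expanded form also allows rates to be computed separately for each interval.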
Survival Function • “Survival function”, S(t), is defined to be the probability a person survives beyond time t • S(0) = 1.0 • $S(t+1) \le S(t)$
Hazard Function • Hazard at time t, h(t), is the probability per unit time of having the event in a small interval around time t, for a person still at risk at t • Force of mortality • h(t) ~ Pr{event in (t, t+dt) | at risk at t}/dt • Need not be between 0 and 1 because it is per unit time • h(t) ~ {S(t) - S(t+dt)}/{S(t) dt}
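The last approximation on this slide can be tried numerically. The exponential survival curve below is a hypothetical choice (the slides do not specify one); it is convenient because its hazard is known to be constant and equal to the rate.

```python
import math

def approximate_hazard(survival, t, dt=1e-6):
    """Approximate h(t) as {S(t) - S(t + dt)} / {S(t) * dt}."""
    return (survival(t) - survival(t + dt)) / (survival(t) * dt)

def exponential_survival(rate):
    """Hypothetical survival curve S(t) = exp(-rate * t)."""
    return lambda t: math.exp(-rate * t)

# With a rate of 2 events per unit time, the hazard is about 2 at every t.
hazard_early = approximate_hazard(exponential_survival(2.0), t=0.5)
hazard_late = approximate_hazard(exponential_survival(2.0), t=1.5)
print(hazard_early, hazard_late)
```

Both values come out close to 2, which illustrates the point that a hazard is a rate per unit time and need not lie between 0 and 1.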
Hazard Function • Basic idea: live your life one interval (day, month, or year) at a time • Example: S(3) = Pr(survive for 3 months) = Pr(survive 1st month & 2nd & 3rd) = Pr(survive 1st month) × Pr(survive 2nd month | survive 1st month) × Pr(survive 3rd month | survive 2nd month)
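One way to connect this factorization to the hazard is sketched below. It assumes a discrete, per-interval hazard $h(j)$, read as the probability of the event in month $j$ given survival through month $j-1$; that notation is an assumption made here and does not appear on the slide.

```latex
\begin{align*}
S(3) &= \Pr(\text{survive month 1})
        \times \Pr(\text{survive month 2} \mid \text{survive month 1})
        \times \Pr(\text{survive month 3} \mid \text{survive month 2}) \\
     &= \{1 - h(1)\}\,\{1 - h(2)\}\,\{1 - h(3)\}
\end{align*}
```

Under this reading, each factor is one minus the hazard for that interval, so the survival function is built up from the hazards one interval at a time.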
Estimating the Survival Function: Kaplan-Meier Method • Pr(survive past 5) = Pr(survive past 5 | survive past 4) × Pr(survive past 4) [= Pr(survive past 5 and survive past 4)]
Notes on Estimating Survival Function • Estimate only changes in intervals where an event occurs • Censored observations contribute to denominators, but never to numerators • Intervals are arbitrary; want narrow ones • Kaplan-Meier estimate results from using infinitesimal interval widths
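These notes can be summarized in the usual product-limit form of the estimator. The symbols $t_j$, $d_j$, and $n_j$ are notation introduced here for illustration: $t_j$ are the distinct event times, $d_j$ is the number of events at $t_j$, and $n_j$ is the number still at risk just before $t_j$.

```latex
\[
\hat{S}(t) \;=\; \prod_{j:\, t_j \le t} \left( 1 - \frac{d_j}{n_j} \right)
\]
```

A censored observation is counted in $n_j$ for every event time up to its censoring time but never in any $d_j$, and a factor differs from 1 only at times where an event occurs.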
Acute Myelogenous Leukemia Example • Data: 5, 5, 8, 8, 12, 16+, 23, 27, 30+, 33, 43, 45
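The Kaplan-Meier estimate for these data can be computed with a short sketch. The function `kaplan_meier` is a hypothetical helper written for this example, not a library call; it follows the common convention that a time censored at t still counts as at risk for events at t.

```python
def kaplan_meier(times, events):
    """Return [(event time, estimated S(t))] for each distinct event time.

    times  -- follow-up times
    events -- True if the event was observed, False if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # The estimate changes only at times where an event occurs.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Events and censored observations both leave the risk set.
        at_risk -= leaving
    return curve

# AML example; a "+" on the slide marks a censored time (16+ and 30+).
times = [5, 5, 8, 8, 12, 16, 23, 27, 30, 33, 43, 45]
events = [True, True, True, True, True, False, True, True, False, True, True, True]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

The two censored times never trigger a drop in the curve; they only shrink the number at risk for later event times.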
Comparing Survival Functions • Suppose we want to test the hypothesis that two survival curves, S1(t) and S2(t) are the same • Common approach is the “log-rank” test • It is effective when we can assume the hazard rates in the two groups are roughly proportional over time
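A minimal sketch of the two-group log-rank statistic is given below. The function name and the toy data are hypothetical and chosen only so the result can be checked by hand; they are not the data analyzed in the lecture.

```python
def logrank_statistic(times1, events1, times2, events2):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    At each distinct event time, compare the events observed in group 1
    with the number expected if both groups shared one survival curve.
    """
    pooled = [(t, e, 1) for t, e in zip(times1, events1)]
    pooled += [(t, e, 2) for t, e in zip(times2, events2)]
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({ti for ti, e, _ in pooled if e}):
        n1 = sum(1 for ti, _, g in pooled if ti >= t and g == 1)  # at risk, group 1
        n2 = sum(1 for ti, _, g in pooled if ti >= t and g == 2)  # at risk, group 2
        d1 = sum(1 for ti, e, g in pooled if ti == t and e and g == 1)
        d = sum(1 for ti, e, _ in pooled if ti == t and e)
        n = n1 + n2
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance

# Hypothetical toy data: every time is an observed event.
statistic = logrank_statistic([1, 2], [True, True], [3, 4], [True, True])
print(round(statistic, 3))
```

The sketch raises a division error when the variance is zero (for example, when one group is empty), so real analyses would normally rely on an established statistics package.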
Log-rank test: “Drug trial” data • Log-rank statistic: 1.72 • p-value: 0.19 • Conclusion: we lack strong support for a drug effect on survival
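The reported p-value can be checked under the assumption that the statistic is referred to a chi-square distribution with 1 degree of freedom, as is usual for a two-group comparison; the slide does not state the degrees of freedom. The helper name is hypothetical.

```python
import math

def chi_square_1df_p_value(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Uses P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    """
    return math.erfc(math.sqrt(statistic / 2))

p_value = chi_square_1df_p_value(1.72)
print(round(p_value, 2))
```

Under that assumption the result rounds to the 0.19 shown on the slide.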
Comparing Survival Functions • Regression analysis with the “Cox” model: more to come
Regression Analysis for Times to Events • Cox proportional hazards model • Hazard of an event is the product of two terms • Baseline hazard, h(t), that depends on time, t • Relative risk, rr(x) that depends on predictor variables, x, but not time • Each person’s hazard varies over time in the same way, but can be higher or lower depending on their predictor variables x
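Cox Proportional Hazards Model • $\lambda(t,x)$ = hazard for people at risk with predictor values $x = (x_1, x_2, \ldots, x_p)$ • $\lambda(t,x) = \lambda_0(t)\,\exp(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)$, where $\lambda_0(t)$ is the baseline hazard and the exponential term is the relative risk • $\ln[\lambda(t,x)] = \ln[\lambda_0(t)] + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$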
Cox Proportional Hazards Model • Relative hazard (hazard ratio) interpretation of the $\beta$’s • $e^{\beta_1}$ = relative risk for a one-unit difference in $x_1$ with the same values for $x_2, \ldots, x_p$ (at any fixed time t)
Cox Proportional Hazards Model • Proportional hazards over time [Figure: hazard plotted against time t, with one curve for x1 = 1 and one for x1 = 0]
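The proportional hazards property can be illustrated numerically. The baseline hazard and the coefficients below are hypothetical values chosen only for illustration, and `cox_hazard` is a helper written for this sketch rather than a library function.

```python
import math

def cox_hazard(t, x, baseline_hazard, betas):
    """Hazard at time t: baseline hazard times exp(beta_1*x_1 + ... + beta_p*x_p)."""
    linear_predictor = sum(b * xi for b, xi in zip(betas, x))
    return baseline_hazard(t) * math.exp(linear_predictor)

def baseline(t):
    """Hypothetical baseline hazard that rises with time."""
    return 0.02 + 0.01 * t

betas = [math.log(2.0), 0.5]  # hypothetical coefficients; exp(beta_1) = 2

# Compare x1 = 1 with x1 = 0, holding x2 fixed at 3, at several times.
ratios = [
    cox_hazard(t, [1, 3], baseline, betas) / cox_hazard(t, [0, 3], baseline, betas)
    for t in (1, 5, 10)
]
print(ratios)
```

Each hazard changes with time, yet the ratio stays at exp(beta_1) = 2 at every time point, which is the proportional hazards assumption.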
Main Points Once Again • Time-to-event data can be censored because not every person has the event during the study • Think of time to event as a natural history that is 0 before the event and switches to 1 when the event occurs; the analysis counts the events • Survival function, S(t), is the probability that a person’s event occurs after time t
Main Points Once Again • Kaplan-Meier estimator of the survival function is a product of interval-specific survival probabilities • Hazard function, h(t), is the risk per unit time of having the event for a person who is at risk (has not previously had the event) • Log-rank tests evaluate differences in survival among population subgroups • Cox model is used for regression analysis of survival data