EPI 5344: Survival Analysis in Epidemiology. Actuarial and Kaplan-Meier methods. February 25, 2014. Dr. N. Birkett, Department of Epidemiology & Community Medicine, University of Ottawa
Analyzing Survival Data (1) • Three methods to analyze survival data: • Parametric methods • Non-parametric methods • Semi-parametric methods
Analyzing Survival Data (2) • Parametric methods • Assume one of the functions discussed earlier. • Usually assume the probability distribution • Estimating the hazard function is key • Estimate the parameters directly • Use Maximum Likelihood Estimation methods • Have the greatest statistical power, provided that the model is correct • Will be discussed in a later session
Analyzing Survival Data (3) • Non-Parametric methods • Make no assumption about the survival curve, distribution function, etc. • Common approach used in epidemiology and medicine. • Actuarial method (life-table/Cutler-Ederer) • Treat time as ‘intervals’ • Doesn’t need exact time of the event • Used for 100+ years by demographers • Kaplan-Meier (product-limit) method • Most frequent approach for RCTs • Requires knowing the actual time of event
Analyzing Survival Data (4) • Semi-Parametric methods (don’t ‘estimate’ S(t)) • Assume that there is a parametric relationship between different treatment/exposure groups • E.g. males have twice the hazard of females • BUT, let the hazard function be unspecified (non-parametric) • Can be any form, including cure models • Cox modeling is the most common method used • Proportional Hazard assumption is commonly used but not essential • More later in course • We’ll start with the actuarial method
Actuarial Method: Key Concept • Divide the follow-up period into smaller time units • Often use 1 year intervals • Can be: days, months, decades, etc. • Intervals don’t have to be the same size but usually are • Compute survival in each interval • Combine these into an overall estimate of S(t)
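Consider a simple example: • Follow 1000 people for 3 years and count the number who die in each year • What is the Cumulative Incidence over 3 years? • Standard Epi formula: Cumulative Incidence = (number of deaths over the 3 years) / (number at risk at the start)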
Another view: How can you still be alive after 3 years? • Don’t die in year 1 and • Don’t die in year 2 and • Don’t die in year 3
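[Figure: tree diagram running from Year 0 to Year 3. In Years 1, 2 and 3 a subject either dies (DEAD) with probability p1, p2, p3 or survives with probability 1-p1, 1-p2, 1-p3.]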
Our simple example: • Apply the formula: same answer! • Why? There are no losses/censoring.
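The equivalence between the two approaches can be checked numerically. The sketch below uses hypothetical yearly death counts (the slide's own table did not survive), and the variable names are illustrative only. It compares the standard cumulative incidence with one minus the product of the yearly conditional survival probabilities.

```python
# Hypothetical counts (NOT from the slides): deaths per year among
# 1000 people, with no losses to follow-up.
n_start = 1000
deaths_per_year = [100, 90, 81]

# Standard cumulative incidence: total deaths / people at the start.
ci_standard = sum(deaths_per_year) / n_start

# Conditional approach: multiply the yearly survival probabilities.
at_risk = n_start
survival = 1.0
for d in deaths_per_year:
    p_die = d / at_risk      # conditional probability of dying this year
    survival *= 1 - p_die    # must survive each year in turn
    at_risk -= d             # only survivors enter the next year

ci_conditional = 1 - survival
print(ci_standard, ci_conditional)
```

With no censoring, each year's denominator is exactly the previous year's survivors, so the product telescopes and the two values agree up to floating-point rounding.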
[Table: conditional probabilities and cumulative probabilities for each year]
Actuarial Method (1) • Consider the first interval of time: • 10,000 people ‘at risk’ at start of interval • 1,500 die • 5,000 are ‘lost’ before end of interval • Is the probability of death 1,500/10,000? NO
Actuarial Method (2) • ‘Lost’ people are only at risk of ‘dying’ until they are lost. • When are they lost? • We don’t know. Losses could follow any pattern:
Actuarial Method (3) • The Actuarial ASSUMPTION • ‘Lost’ subjects are ‘at risk’ for one-half of the interval • Equivalently, only one-half of the lost subjects are counted as ‘at risk’ for the whole interval. • For 1990, this implies:
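A minimal sketch of the half-interval adjustment, assuming that 1990 is the first interval described earlier (10,000 at risk, 1,500 deaths, 5,000 lost). The function name `actuarial_death_prob` is hypothetical and not taken from the slides.

```python
def actuarial_death_prob(n_at_risk, n_deaths, n_lost):
    """Conditional probability of death in one interval under the
    actuarial assumption that lost subjects contribute half an interval."""
    effective_at_risk = n_at_risk - n_lost / 2
    return n_deaths / effective_at_risk

# Assumed to be the 1990 interval: 10,000 at risk, 1,500 die, 5,000 lost.
q_1990 = actuarial_death_prob(10_000, 1_500, 5_000)
print(q_1990)  # 1500 / 7500
```

Under these assumptions the effective number at risk is 7,500 rather than 10,000, so the estimated probability of death is 0.20 rather than 0.15.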
Actuarial Method (4) • This is identical to the standard formula for estimating Cumulative Incidence learned in Epi 1.
[Table: conditional probabilities and cumulative probabilities for each interval]
Actuarial Method (5) • Now, consider 1991 (Assume that you survive to the start of 1991) • The standard epidemiology formula gives:
Actuarial Method (6) • What is Prob(died by 1991)? • And so on for each later interval
Actuarial Method (7): ‘The Math’ • Compute these for each interval. • Gives columns A through G
Actuarial Method (7): ‘The Math’, part 2 • The cumulative probabilities • Compute these for each interval. • Gives columns H and I
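The per-interval and cumulative calculations can be sketched as a small life-table routine. The slides' column letters (A through I) were not recoverable, so the dictionary keys below are illustrative names rather than the original columns. The first interval uses the counts given earlier; the second and third intervals are hypothetical.

```python
def actuarial_life_table(n_start, intervals):
    """Build an actuarial life table.

    intervals: list of (deaths, lost) tuples, one per interval, in time order.
    """
    rows = []
    at_risk = n_start
    cum_surv = 1.0
    for deaths, lost in intervals:
        effective = at_risk - lost / 2   # actuarial half-interval adjustment
        q = deaths / effective           # conditional probability of death
        p = 1 - q                        # conditional probability of survival
        cum_surv *= p                    # cumulative survival S(t)
        rows.append({
            "at_risk": at_risk,
            "deaths": deaths,
            "lost": lost,
            "effective_at_risk": effective,
            "cond_death": q,
            "cond_surv": p,
            "cum_surv": cum_surv,
            "cum_death": 1 - cum_surv,
        })
        at_risk -= deaths + lost         # carried into the next interval
    return rows

# Interval 1 is from the slides; intervals 2 and 3 are hypothetical.
table = actuarial_life_table(10_000, [(1_500, 5_000), (600, 1_000), (190, 0)])
for row in table:
    print(row)
```

Each row's cumulative survival is the product of the conditional survival probabilities up to and including that interval, which mirrors the "don't die in year 1 and don't die in year 2" argument.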
Kaplan-Meier: Key Concept • Similar approach to the actuarial method EXCEPT: • Use the exact time of each outcome to define ‘intervals’ rather than a fixed-length interval • Compute a new survival value at every time where an outcome event occurs • Can ignore times at which only censoring occurs • Exclude censored people from the ‘at risk’ group from then on
Kaplan Meier: Risk set • At any time ‘t’ during follow-up, there will be a group of people still under active follow-up • Excludes • People with previous outcomes • People who have been censored prior to ‘t’ • These are the only people at risk of having an outcome at time ‘t’ • Called the RISK SET at time ‘t’
Kaplan-Meier: Formulae • Compute at each time when an event happens
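The slide's formulae did not survive extraction. The standard product-limit estimator is given here as a reference; the symbols are conventional notation and may differ from the lecturer's: t_i are the distinct event times, d_i is the number of events at t_i, and n_i is the size of the risk set just before t_i.

```latex
\hat{p}_i = 1 - \frac{d_i}{n_i},
\qquad
\hat{S}(t) = \prod_{i:\, t_i \le t} \hat{p}_i
           = \prod_{i:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
```

Censored subjects never contribute to d_i; they only reduce n_i at later event times.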
Data: 5, 12, 25, 26, 27, 28, 32+, 33+, 34+, 37, 39, 40+, 42+
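A minimal sketch of the product-limit calculation on the data above, assuming that a trailing ‘+’ marks a censored time (the usual convention, though the slide does not state it). The function name `kaplan_meier` is hypothetical.

```python
def kaplan_meier(times, events):
    """Return (time, n_at_risk, n_events, S(t)) at each distinct event time.

    events[i] is 1 if times[i] is an observed event, 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [e for (u, e) in data if u == t]  # everyone leaving at time t
        d = sum(tied)                            # events (not censorings) at t
        if d > 0:
            surv *= 1 - d / n_at_risk
            curve.append((t, n_at_risk, d, surv))
        n_at_risk -= len(tied)                   # events and censorings both leave
        i += len(tied)
    return curve

# Slide data; '+' is assumed to mean censored.
raw = ["5", "12", "25", "26", "27", "28", "32+", "33+", "34+",
       "37", "39", "40+", "42+"]
times = [int(s.rstrip("+")) for s in raw]
events = [0 if s.endswith("+") else 1 for s in raw]

curve = kaplan_meier(times, events)
for t, n, d, s in curve:
    print(f"t={t:>2}  at risk={n:>2}  events={d}  S(t)={s:.4f}")
```

Under that assumption, the three censored times between 28 and 37 shrink the risk set from 7 to 4 without changing S(t), and the curve ends at 7/26 (about 0.27) rather than 0 because the last two times are censored.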
Confidence Intervals for S_i(t) • Greenwood’s Formula
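The slide's formula was lost. In its standard form, Greenwood's formula estimates Var[S(t)] as S(t)^2 times the sum of d_i / (n_i (n_i - d_i)) over event times up to t. The sketch below applies it to risk sets derived from the example data (assuming ‘+’ marks censoring) and builds a plain symmetric 95% interval truncated to [0, 1]. The slides do not say which interval form was used; transformed intervals such as log-log are a common alternative. All names are illustrative.

```python
import math

# (time, n_at_risk, n_events) at each event time, derived from the example
# data under the assumption that '+' marks a censored observation.
risk_sets = [(5, 13, 1), (12, 12, 1), (25, 11, 1), (26, 10, 1),
             (27, 9, 1), (28, 8, 1), (37, 4, 1), (39, 3, 1)]

def greenwood(risk_sets, z=1.96):
    """Return (time, S, se, lower, upper) using Greenwood's variance."""
    surv = 1.0
    running_sum = 0.0
    out = []
    for t, n, d in risk_sets:
        surv *= 1 - d / n
        if n > d:  # term is undefined when everyone at risk has the event
            running_sum += d / (n * (n - d))
        se = surv * math.sqrt(running_sum)
        lower = max(0.0, surv - z * se)
        upper = min(1.0, surv + z * se)
        out.append((t, surv, se, lower, upper))
    return out

results = greenwood(risk_sets)
for t, s, se, lo, hi in results:
    print(f"t={t:>2}  S={s:.3f}  se={se:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```

Before any censoring occurs, Greenwood's variance reduces to the familiar binomial variance S(1 - S)/n, which is a useful sanity check on the first six rows.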
Median Survival (1) • Mean survival • Hard to estimate and has limited value • Due to right skewing of the survival distribution • Median survival is more useful: • The time by which 50% of the cohort will have had the outcome • S(t) = 0.50 • With no censoring, easy to get: • Use the usual sample median of the survival times.
Median Survival (2) • With censoring, you need to solve: • S(t) = 0.5 • You can do this directly in the KM plot • Can be computed as well • Is based on the rank order of the survival times, not on the actual times • Except at the median itself • 95% CI can be obtained • Complex formula/method • Tends to be very wide.
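Reading the median off a Kaplan-Meier curve can be sketched as follows, using the common convention that the median is the earliest event time at which S(t) falls to 0.5 or below. The step values are those of the example data under the assumption that ‘+’ marks censoring; `km_median` is a hypothetical helper name.

```python
# (event time, S(t)) steps for the example data, assuming '+' means censored.
km_steps = [(5, 12 / 13), (12, 11 / 13), (25, 10 / 13), (26, 9 / 13),
            (27, 8 / 13), (28, 7 / 13), (37, 21 / 52), (39, 14 / 52)]

def km_median(steps):
    """Earliest time with S(t) <= 0.5, or None if the curve never gets there."""
    for t, s in steps:
        if s <= 0.5:
            return t
    return None

median = km_median(km_steps)
print(median)
```

Under these assumptions S(28) is about 0.54 and S(37) is about 0.40, so the curve first crosses 0.5 at t = 37. If heavy censoring keeps the curve above 0.5 throughout, the median cannot be estimated and the helper returns `None`.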
K-M: A Couple Of Notes • If the last time corresponds to an ‘event’, then S(tlast) MUST BE 0. • This does NOT mean that everyone in the group dies. • If the last time corresponds to a ‘censoring’, then S(tlast) will be non-zero. • The estimated mean survival time will be biased (under-estimated)