
EPI 5344: Survival Analysis in Epidemiology Cox regression: Introduction March 18, 2014



Presentation Transcript


  1. EPI 5344: Survival Analysis in Epidemiology. Cox regression: Introduction. March 18, 2014. Dr. N. Birkett, Department of Epidemiology & Community Medicine, University of Ottawa

  2. Objectives • Review proportional hazards • Introduce Cox model and methods of estimation • Tied data

  3. Exponential model (1R) • Exponential model • Most common parametric model in epidemiology • Assumes a constant hazard, h(t) = λ • How did we create the likelihood function? • Subjects can have two types of ‘ends’ • Death • Censored • Each contributes to the likelihood function, but in different ways

  4. Exponential Model (2R) • Likelihood contribution of a death at time $t_i$: $f(t_i) = \lambda e^{-\lambda t_i}$ • Likelihood contribution if censored at time $t_i$: $S(t_i) = e^{-\lambda t_i}$ • Actual time of ‘failure’ is unknown. • Must survive until at least time $t_i$ • Multiply these across all deaths and all censored subjects to get the full likelihood

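  5. Exponential Model (3R) $L(\lambda) = \prod_{\text{deaths}} \lambda e^{-\lambda t_i} \prod_{\text{censored}} e^{-\lambda t_i} = \lambda^{N} e^{-\lambda \cdot PT}$ Where: N = # events; PT = person-time of follow-up ($\sum_i t_i$ over all subjects)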

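  6. Exponential Model (4R) • How do we find the MLE for λ? • Take logs: $\ln L = N \ln\lambda - \lambda \cdot PT$ • Set the derivative to zero: $N/\lambda - PT = 0$, so $\hat{\lambda} = N/PT$ (events per unit of person-time)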
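A minimal Python sketch of the exponential likelihood above, assuming a small hypothetical data set of follow-up times and death indicators (the function names `exp_loglik` and `exp_mle` are illustrative, not from the course). It adds up each subject's contribution (death: $\ln\lambda - \lambda t_i$; censored: $-\lambda t_i$) and checks that $\hat{\lambda} = N/PT$ sits at the maximum.

```python
import math

def exp_loglik(lam, times, died):
    """Exponential log-likelihood, summed subject by subject.

    A death at t contributes ln(lam * exp(-lam * t)) = ln(lam) - lam * t;
    a censored subject contributes ln(S(t)) = -lam * t.
    """
    ll = 0.0
    for t, d in zip(times, died):
        ll += (math.log(lam) if d else 0.0) - lam * t
    return ll

def exp_mle(times, died):
    """Closed-form MLE: number of events divided by total person-time."""
    return sum(died) / sum(times)

# Hypothetical data: follow-up time in years; 1 = death, 0 = censored
times = [2.0, 3.5, 1.0, 4.0, 5.0]
died = [1, 0, 1, 1, 0]

lam_hat = exp_mle(times, died)  # N = 3, PT = 15.5
n_events, person_time = sum(died), sum(times)

# The subject-by-subject sum equals the compact form N*ln(lam) - lam*PT
compact = n_events * math.log(lam_hat) - lam_hat * person_time
print(lam_hat, exp_loglik(lam_hat, times, died), compact)
```

Moving λ a little away from N/PT in either direction lowers the log-likelihood, which is what the assertions below check.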

  7. Exponential Model (5R) • What if we want to examine predictors of the outcome? • λ is allowed to vary by sex, age, cholesterol, etc. • Use the same approach, but now, instead of ‘λ’, we have the following in the likelihood function: $\lambda_i = \exp(\beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki})$
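A sketch of the regression version, assuming the log-linear form $\lambda_i = \exp(\beta_0 + \beta_1 x_i)$ with a single hypothetical 0/1 predictor. With only one binary predictor the model is saturated, so the MLE simply reproduces each group's own rate $N_g/PT_g$, and $e^{\hat\beta_1}$ is the rate ratio. All names and data here are hypothetical.

```python
import math

def exp_reg_loglik(b0, b1, times, died, x):
    """Exponential log-likelihood with subject-specific rate lambda_i = exp(b0 + b1 * x_i)."""
    ll = 0.0
    for t, d, xi in zip(times, died, x):
        log_lam = b0 + b1 * xi
        # death: ln(lam) - lam*t ; censored: -lam*t
        ll += d * log_lam - math.exp(log_lam) * t
    return ll

def binary_predictor_mle(times, died, x):
    """With one 0/1 predictor the MLE reproduces each group's rate N_g / PT_g."""
    n = {0: 0, 1: 0}
    pt = {0: 0.0, 1: 0.0}
    for t, d, xi in zip(times, died, x):
        n[xi] += d
        pt[xi] += t
    rate0 = n[0] / pt[0]
    rate1 = n[1] / pt[1]
    # b0 = ln(rate in unexposed), b1 = ln(rate ratio)
    return math.log(rate0), math.log(rate1 / rate0)

# Hypothetical data: x = 1 for exposed, 0 for unexposed
times = [2.0, 3.5, 1.0, 4.0, 5.0, 2.5]
died = [1, 0, 1, 1, 0, 1]
x = [0, 0, 0, 1, 1, 1]

b0, b1 = binary_predictor_mle(times, died, x)
print(b0, b1, math.exp(b1))  # exp(b1): rate ratio, exposed vs unexposed
```

With continuous predictors there is no closed form, and the same log-likelihood would be maximized numerically.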

  8. End of review

  9. Proportional hazard models (1) • Now, use this approach BUT do not pre-specify the form of h(t) • We start with proportional hazards • Hazard h(t) = instantaneous rate of decline in survival, conditional on having survived to that point in time: $h(t) = -S'(t)/S(t)$

  10. Hazard models (2) • Suppose we want to compare two treatment groups • If different survival is expected, the groups have different hazards • How can we summarize this? Use the hazard ratio $HR(t) = h_2(t)/h_1(t)$ • In general, HR(t) will be different at different follow-up times

  11. [Figure: hazard curves h2(t) and h1(t) whose ratio changes over follow-up] This is hard to describe and interpret • Effect of the treatment varies with length of follow-up

  12. [Figure: hazard curves h2(t) and h1(t) that cross] HR could switch from below to above 1.0

  13. Hazard models (3) • SUPPOSE that HR(t) were constant at all follow-up times. • Effect of the treatment is the same at all times • This is the PROPORTIONAL HAZARDS (PH) model • This does not require that h(t) be constant; it can vary in an unconstrained manner.

  14. [Figure: hazard curves h2(t) and h1(t) under proportional hazards]

  15. [Figure: hazard curves h2(t) and h1(t) with the constant HR marked between them]

  16. Cox models (1) • For most of the rest of this course, we will assume a proportional hazards model: $h_1(t) = h_0(t) \times HR$ • h0(t) is the ‘baseline’ or reference hazard. • Contains all of the time variability of the hazard. • HR is assumed to remain constant over all follow-up time.
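A small numerical illustration of the PH assumption, using a hypothetical rising baseline hazard (Weibull-type, $h_0(t) = 1.5\sqrt{t}$, so $H_0(t) = t^{1.5}$) and a hypothetical HR of 2. The baseline hazard changes over time, but $h_1(t)/h_0(t)$ stays at HR; the sketch also checks the standard PH consequence $S_1(t) = S_0(t)^{HR}$. The baseline shape is an assumption chosen only for illustration.

```python
import math

HR = 2.0  # hypothetical constant hazard ratio

def h0(t):
    """Hypothetical baseline hazard (Weibull shape 1.5, scale 1): rises with time."""
    return 1.5 * math.sqrt(t)

def H0(t):
    """Cumulative baseline hazard, the integral of h0 from 0 to t."""
    return t ** 1.5

def h1(t):
    """Proportional-hazards comparison group: h1(t) = h0(t) * HR."""
    return h0(t) * HR

def survival(t, hr=1.0):
    """S(t) = exp(-hr * H0(t)); hr = 1 gives the baseline survival S0(t)."""
    return math.exp(-hr * H0(t))

for t in [0.25, 1.0, 2.0, 4.0]:
    # h0 changes over time, but the ratio h1/h0 stays fixed at HR
    print(f"t={t}: h0={h0(t):.3f}  h1={h1(t):.3f}  ratio={h1(t) / h0(t):.3f}  "
          f"S1={survival(t, HR):.4f}  S0**HR={survival(t) ** HR:.4f}")
```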

  17. Cox models (2) • HR can still be affected by predictor variables • Race • Exposure (low/mid/high) • Sex • Caloric intake • For now, we will assume that these are • measured at baseline (time ‘0’) • and remain fixed during follow-up

  18. Cox models (3) • In general, we would have: $HR = g(x_1, x_2, \dots, x_k)$ • The most common model assumes that ln(HR) is a linear function of the predictors: $\ln(HR) = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$. This is similar to the model for logistic regression and linear regression. • NOTE: there is no intercept! • This is ‘subsumed’ into the baseline hazard term h0(t)

  19. Cox models (4) • HR model can be written: $HR = \exp(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k)$ • How does this fit into our ‘hazard’ model? Our base model is: $h(t) = h_0(t) \times HR$

  20. Cox models (5) • This implies: $h(t) = h_0(t)\exp(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k)$ • But, so what? How do we estimate the Betas? • As with the exponential model, it appears we need to know the shape of h0(t)
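A sketch of the model $h(t) = h_0(t)\exp(\beta_1 x_1 + \dots + \beta_k x_k)$ with two hypothetical predictors (sex coded 0/1 and age in years) and made-up coefficients. Whatever the baseline hazard, the ratio of two subjects' hazards is the same at every t and equals $\exp(\sum_i \beta_i (x_{a,i} - x_{b,i}))$. The names `hazard`, `hazard_ratio` and `baseline`, and all numbers, are hypothetical.

```python
import math

def hazard(t, x, beta, h0):
    """Cox model hazard: h(t) = h0(t) * exp(beta_1*x_1 + ... + beta_k*x_k); no intercept."""
    return h0(t) * math.exp(sum(b * xi for b, xi in zip(beta, x)))

def hazard_ratio(x_a, x_b, beta):
    """HR comparing covariate profile x_a with x_b; the baseline h0(t) cancels."""
    return math.exp(sum(b * (xa - xb) for b, xa, xb in zip(beta, x_a, x_b)))

def baseline(t):
    """Arbitrary hypothetical baseline hazard that increases with time."""
    return 0.01 * (1.0 + t)

# Hypothetical coefficients for (sex, age in years)
beta = [0.405, 0.03]
x_a = [1, 60]  # e.g. sex = 1, age 60
x_b = [0, 50]  # e.g. sex = 0, age 50

for t in [1.0, 5.0, 10.0]:
    # same ratio at every follow-up time
    print(t, hazard(t, x_a, beta, baseline) / hazard(t, x_b, beta, baseline))
print(hazard_ratio(x_a, x_b, beta))  # exp(0.405 + 0.03 * 10) = exp(0.705)
```

An intercept inside the exponent would multiply both hazards by the same constant and cancel from every ratio, which is why it is absorbed into h0(t).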

  21. Cox models (6) • COX (1972) SHOWED THAT THIS IS WRONG! • Can estimate the Betas without needing to model h0(t) • Semi-parametric model • Based on: • Risk sets • Partial likelihoods • We will skip a lot of math • Use an intuitive approach • Method relates to the approach used with the exponential model

  22. Cox models (7) • Start by trying to build a likelihood for the data based on the whole model (with the baseline hazard included) • Concentrate on the times when events happened • Similar to the Kaplan-Meier method • S(t) only changes when an event happens • can ignore losses between events • The action happens within the risk set at each event time. • The theory assumes that only one event happens at any point in time • This is not the ‘real world’ • In theory, time is continuous, so no two events happen at the same time • We’ll deal with ‘ties’ later on

  23. Cox models (8) • Consider the risk set at time $t_i$ when an event happens • Each subject in the risk set has a probability of being the one having the event • Higher hazard → higher probability • ‘Likelihood’ contribution from person j in the risk set is: P(person j has the event at $t_i$ | one event occurs in the risk set at $t_i$)

  24. Cox models (9) • Using the definition of conditional probability, this is: $\frac{P(\text{person } j \text{ has the event at } t_i)}{P(\text{one event occurs in the risk set at } t_i)}$ • How do we get the numerator and denominator? • The hazard is a measure of how likely an event is to occur for a person • Higher hazard → an event is more likely

  25. Cox models (10) • So, we can get: $L_i = \frac{h_j(t_i)}{\sum_{m \in R(t_i)} h_m(t_i)}$, where $R(t_i)$ is the risk set at $t_i$

  26. Cox models (11) Now, because the hazards are proportional, we have: $h_m(t) = h_0(t)\exp(\beta_1 x_{1m} + \dots + \beta_k x_{km})$ for every subject m

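  27. Cox models (12) • The likelihood contribution from this event (risk set) can be written: $L_i = \frac{h_0(t_i)\exp(\beta_1 x_{1j} + \dots + \beta_k x_{kj})}{\sum_{m \in R(t_i)} h_0(t_i)\exp(\beta_1 x_{1m} + \dots + \beta_k x_{km})}$ • $h_0(t_i)$ appears in every term, so it cancels out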

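  28. Cox models (13) • The final likelihood contribution from this risk set is: $L_i = \frac{\exp(\beta_1 x_{1j} + \dots + \beta_k x_{kj})}{\sum_{m \in R(t_i)} \exp(\beta_1 x_{1m} + \dots + \beta_k x_{km})}$ • This does not depend on h0(t)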
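A quick numerical check of the cancellation, assuming one predictor and a hypothetical risk set. The contribution computed with the full hazards $h_0(t_i)e^{\beta x}$ is identical for any value of $h_0(t_i)$, and equals the version with $h_0$ removed. Function names and numbers are hypothetical.

```python
import math

def contribution_with_h0(h0_ti, beta, x_event, x_risk_set):
    """Risk-set contribution written with the full hazards h0(ti) * exp(beta * x)."""
    numerator = h0_ti * math.exp(beta * x_event)
    denominator = sum(h0_ti * math.exp(beta * xm) for xm in x_risk_set)
    return numerator / denominator

def contribution_without_h0(beta, x_event, x_risk_set):
    """Same contribution after cancelling h0(ti) from numerator and denominator."""
    return math.exp(beta * x_event) / sum(math.exp(beta * xm) for xm in x_risk_set)

# Hypothetical risk set at ti: covariate values of everyone still at risk,
# including the subject who has the event (x_event = 1.0)
beta = 0.8
x_risk_set = [1.0, 0.0, 2.0, 0.5]
x_event = 1.0

for h0_ti in [0.001, 1.0, 250.0]:
    print(h0_ti, contribution_with_h0(h0_ti, beta, x_event, x_risk_set))
print(contribution_without_h0(beta, x_event, x_risk_set))
```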

  29. Cox models (14) • Now, multiply the contributions from all risk sets (one defined at each event time) • This produces a partial likelihood • Estimate the Betas using MLE. • We can ignore censored times since we are not estimating the actual hazard (censored subjects still appear in the risk sets until they are censored) • The Betas depend only on the ranking (order) of events, not on the actual event times • This implies that Cox does not give the same estimates as person-time epidemiology analyses • Standard Cox models do not estimate survival, just relative effects (hazard ratios)
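A sketch of the partial likelihood for one predictor, assuming no tied event times and hypothetical data. It illustrates the rank-only point: replacing every time by its cube (an order-preserving change) leaves every risk set, and therefore the partial likelihood, unchanged. `cox_log_pl` is a hypothetical name.

```python
import math

def cox_log_pl(beta, times, events, x):
    """Log partial likelihood for one predictor, assuming no tied event times.

    Each event contributes beta*x_j - ln(sum of exp(beta*x_m) over its risk set),
    where the risk set is everyone whose time is >= the event time.
    """
    ll = 0.0
    for j, (tj, dj) in enumerate(zip(times, events)):
        if dj == 0:
            continue  # censored subjects add no factor of their own
        risk_set = [m for m, tm in enumerate(times) if tm >= tj]
        ll += beta * x[j] - math.log(sum(math.exp(beta * x[m]) for m in risk_set))
    return ll

# Hypothetical data
times = [4.0, 5.0, 2.0, 6.0, 3.0]
events = [1, 0, 1, 1, 0]
x = [1.0, 0.0, 1.0, 0.0, 0.0]

# An order-preserving change of the time scale keeps the ranks, and hence the PL
stretched = [t ** 3 for t in times]
print(cox_log_pl(0.5, times, events, x), cox_log_pl(0.5, stretched, events, x))
```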

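  30. Let’s consider a simple example. • Three events → three risk sets to consider [Figure: follow-up lines for subjects 1 to 5, ending in D, C, D, D, C respectively (D = death, C = censored); event times t1, t2, t3 marked on the time axis]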

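  31. For subject ‘m’, the hazard function is: $h_m(t) = h_0(t)\exp(\beta_1 x_{1m} + \dots + \beta_k x_{km})$ • 1st event: risk set: 1/2/3/4/5; subject with event: 3 • Likelihood contribution: $\frac{h_3(t_1)}{h_1(t_1) + h_2(t_1) + h_3(t_1) + h_4(t_1) + h_5(t_1)}$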

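  32. But, we have: $h_m(t_1) = h_0(t_1) \cdot HR_m$, with $HR_m = \exp(\beta_1 x_{1m} + \dots + \beta_k x_{km})$ • So, the likelihood contribution from risk set #1 is: $\frac{HR_3}{HR_1 + HR_2 + HR_3 + HR_4 + HR_5}$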

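  33. Extending this to the other risk sets ($HR_m$ = hazard ratio for subject m): • 2nd event: risk set: 1/2/4; subject with event: 1 • Likelihood contribution: $\frac{HR_1}{HR_1 + HR_2 + HR_4}$ • 3rd event: risk set: 4; subject with event: 4 • Likelihood contribution: $\frac{HR_4}{HR_4} = 1$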

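  34. Overall partial likelihood is: $PL = \frac{HR_3}{HR_1 + HR_2 + HR_3 + HR_4 + HR_5} \times \frac{HR_1}{HR_1 + HR_2 + HR_4} \times 1$ ($HR_m$ = hazard ratio for subject m) • This can easily be extended to very large data sets. • Writing out the entire partial likelihood function would be ‘crazy’ • But, this is what our computer has to do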
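The five-subject example can be reproduced in a few lines. This sketch assumes hypothetical follow-up times chosen to match the figure (subject 3 dies first, subject 5 is censored, subject 1 dies, subject 2 is censored, subject 4 dies last) and hypothetical hazard ratios $HR_m$. It builds the risk sets at each event time, which should match the ones listed in the slides, and multiplies the three contributions.

```python
# Hypothetical times consistent with the example: subject 3 dies at t1 = 2,
# subject 5 is censored at 3, subject 1 dies at t2 = 4, subject 2 is censored at 5,
# subject 4 dies at t3 = 6.
times = {1: 4.0, 2: 5.0, 3: 2.0, 4: 6.0, 5: 3.0}
died = {1: True, 2: False, 3: True, 4: True, 5: False}

def risk_sets(times, died):
    """For each event, in time order, return (subject with event, subjects still at risk)."""
    out = []
    for s in sorted((s for s in times if died[s]), key=lambda s: times[s]):
        at_risk = sorted(m for m in times if times[m] >= times[s])
        out.append((s, at_risk))
    return out

def partial_likelihood(hr, times, died):
    """Product over risk sets of HR(event subject) / sum of HR over the risk set."""
    pl = 1.0
    for s, at_risk in risk_sets(times, died):
        pl *= hr[s] / sum(hr[m] for m in at_risk)
    return pl

for s, at_risk in risk_sets(times, died):
    print(f"event: subject {s}, risk set: {at_risk}")

# Hypothetical hazard ratios HR_m for the five subjects
hr = {1: 2.0, 2: 1.0, 3: 1.5, 4: 0.5, 5: 1.0}
print(partial_likelihood(hr, times, died))
```

The same loop works unchanged for thousands of subjects, which is essentially what the software does.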

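  35. Suppose that we are using the Cox model. Let’s also limit to one predictor. Then, we have: $HR_m = \exp(\beta x_m)$ • Partial likelihood form is now: $PL(\beta) = \frac{e^{\beta x_3}}{e^{\beta x_1} + e^{\beta x_2} + e^{\beta x_3} + e^{\beta x_4} + e^{\beta x_5}} \times \frac{e^{\beta x_1}}{e^{\beta x_1} + e^{\beta x_2} + e^{\beta x_4}} \times 1$ • We will see this layout again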
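To estimate β, the partial likelihood is maximized numerically. A common choice is Newton-Raphson; this sketch assumes one predictor, no tied times and hypothetical data. At each event, the score adds $x_j$ minus the risk-set mean of x weighted by $e^{\beta x_m}$, and the information adds the corresponding weighted variance. The function names are hypothetical, and real software adds safeguards (step halving, convergence checks) omitted here.

```python
import math

def score_info(beta, times, events, x):
    """Score U(beta) and information I(beta) of the log partial likelihood (one predictor, no ties)."""
    u, info = 0.0, 0.0
    for j, (tj, dj) in enumerate(zip(times, events)):
        if not dj:
            continue
        # (x_m, weight exp(beta * x_m)) for everyone in the risk set at tj
        w = [(x[m], math.exp(beta * x[m])) for m, tm in enumerate(times) if tm >= tj]
        total = sum(wm for _, wm in w)
        mean_x = sum(xm * wm for xm, wm in w) / total
        mean_x2 = sum(xm * xm * wm for xm, wm in w) / total
        u += x[j] - mean_x              # observed minus weighted mean
        info += mean_x2 - mean_x ** 2   # weighted variance of x in the risk set
    return u, info

def fit_cox_one_predictor(times, events, x, tol=1e-10, max_iter=50):
    """Maximize the partial likelihood with Newton-Raphson, starting at beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        u, info = score_info(beta, times, events, x)
        step = u / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical data (no tied times)
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
events = [1, 1, 0, 1, 1, 0]
x = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]

beta_hat = fit_cox_one_predictor(times, events, x)
print(beta_hat, math.exp(beta_hat))  # estimated log-HR and HR
```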

  36. ‘Ties’ (1) • The above assumed that only one event happened at any given time • True ‘in theory’ because time is a continuous variable. • Not true in reality because time is measured ‘coarsely’. • For example • Only get measurement data every year • Time of event measured to the day, not hour/min/second • More than one event at the same time is called a ‘tied’ event. • How do we modify the method to handle tied event times?

  37. ‘Ties’ (2) • Two main approaches to ‘ties’ • Exact method • Often implemented using an approximation. • Discrete models • Change the basic theory underlying the model • Assume that event times are discrete points • Relate to logistic regression • Useful when events can only occur at fixed points • e.g., graduation from high school

  38. ‘Ties’ (3) • Exact method • Suppose we have two events (s1 & s2) which occur at the same time due to imprecise measurement of the event time. • IF we had been able to measure the event time with enough precision, we would know if s1 occurred first or second • Birth of twins • We don’t know, so we assume that the two possibilities are equally likely.

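  39. ‘Ties’ (4) • Suppose s1 occurred before s2. • Likelihood contribution would be: $\frac{HR_{s1}}{\sum_{m \in R} HR_m} \times \frac{HR_{s2}}{\sum_{m \in R} HR_m - HR_{s1}}$ • Suppose s2 occurred before s1. • Likelihood contribution would be: $\frac{HR_{s2}}{\sum_{m \in R} HR_m} \times \frac{HR_{s1}}{\sum_{m \in R} HR_m - HR_{s2}}$ (R = risk set at the tied time, including s1 and s2)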

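  40. ‘Ties’ (5) • Don’t know the order; each is equally likely. • Overall likelihood contribution is the sum over both orders: $\frac{HR_{s1}}{\sum_{m \in R} HR_m} \cdot \frac{HR_{s2}}{\sum_{m \in R} HR_m - HR_{s1}} + \frac{HR_{s2}}{\sum_{m \in R} HR_m} \cdot \frac{HR_{s1}}{\sum_{m \in R} HR_m - HR_{s2}}$ (R = risk set at the tied time)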
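A direct transcription of the two-event case, assuming hypothetical hazard ratios for the two tied subjects (s1, s2) and for three other subjects still at risk. Each ordering removes the first failure from the risk set before the second factor; the contribution is the sum of the two ordered products. `exact_two_tied` is a hypothetical name.

```python
def exact_two_tied(hr_s1, hr_s2, hr_others):
    """Exact contribution of two tied events: sum over the two possible orderings."""
    total = hr_s1 + hr_s2 + sum(hr_others)  # sum of HR over the whole risk set
    s1_first = (hr_s1 / total) * (hr_s2 / (total - hr_s1))
    s2_first = (hr_s2 / total) * (hr_s1 / (total - hr_s2))
    return s1_first + s2_first

# Hypothetical hazard ratios: two tied subjects plus three others still at risk
print(exact_two_tied(2.0, 1.0, [1.0, 0.5, 1.5]))  # 1/12 + 1/15 = 0.15
```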

  41. ‘Ties’ (6) • A bit messy but not too bad. • However, consider the recidivism data. • 5 arrests occurred in week 8 • We don’t know which order they occurred in • 120 potential orders (=5!) • Each order contributes a likelihood product with 5 terms • Need to add up 120 of these products to give ONE contribution. • Can rapidly get even worse!
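The general exact calculation enumerates every ordering of the tied subjects, which is why the cost grows as d!. This sketch uses `itertools.permutations` and assumes a hypothetical risk set of 5 tied subjects plus 20 others (made-up hazard ratios, not the recidivism data). It confirms that 5 ties already require 120 products of 5 terms each.

```python
import itertools
import math

def exact_tied_contribution(hr_tied, hr_others):
    """Exact contribution of d tied events: sum over all d! orderings.

    Within each ordering the subjects fail one at a time; after each failure
    that subject leaves the risk set before the next factor is computed.
    """
    total = sum(hr_tied) + sum(hr_others)
    contribution, n_orders = 0.0, 0
    for order in itertools.permutations(hr_tied):
        remaining = total
        product = 1.0
        for hr in order:
            product *= hr / remaining
            remaining -= hr
        contribution += product
        n_orders += 1
    return contribution, n_orders

# Hypothetical risk set: 5 tied events plus 20 other subjects still at risk
hr_tied = [1.2, 0.8, 1.0, 1.5, 0.9]
hr_others = [1.0] * 20
value, n_orders = exact_tied_contribution(hr_tied, hr_others)
print(n_orders, math.factorial(len(hr_tied)), value)
```

With 10 ties the same loop would visit 3,628,800 orderings, which illustrates how quickly it "gets even worse".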

  42. ‘Ties’ (7) • Computationally demanding • Not that big a task for modern computers • Two approximate methods have been developed • Breslow • Efron • Both are ‘OK’ as long as the number of ties isn’t too big • Efron is more accurate. • With modern computers, using the exact approach is likely fine.
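The two approximations avoid enumerating orderings. As they are usually stated, Breslow uses the full risk-set total in the denominator for every tied event, while Efron subtracts, for the j-th tied event, the fraction j/d of the tied subjects' combined HR. This sketch uses the same hypothetical risk set as above; with a single event both reduce to the ordinary Cox contribution. Function names are hypothetical, and `math.prod` needs Python 3.8 or later.

```python
import math

def breslow(hr_tied, hr_others):
    """Breslow: every tied event uses the full risk-set total in its denominator."""
    total = sum(hr_tied) + sum(hr_others)
    return math.prod(hr_tied) / total ** len(hr_tied)

def efron(hr_tied, hr_others):
    """Efron: the j-th denominator removes the fraction j/d of the tied subjects' total."""
    d = len(hr_tied)
    total = sum(hr_tied) + sum(hr_others)
    tied_total = sum(hr_tied)
    denom = 1.0
    for j in range(d):
        denom *= total - (j / d) * tied_total
    return math.prod(hr_tied) / denom

# Hypothetical risk set: 5 tied events plus 20 other subjects still at risk
hr_tied = [1.2, 0.8, 1.0, 1.5, 0.9]
hr_others = [1.0] * 20
print(breslow(hr_tied, hr_others), efron(hr_tied, hr_others))
```

Because Breslow never removes the tied subjects from its denominators, its denominators are never smaller than Efron's; the gap widens as the number of ties grows relative to the risk set.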

  43. ‘Ties’ (8): Summary SAS default method is Breslow
