Dr. N. Birkett, Department of Epidemiology & Community Medicine, University of Ottawa

EPI 5344: Survival Analysis in Epidemiology • Survival curve comparison (non-regression methods) • March 4, 2014

Comparing survival (1) • A common RCT question: • Did the treatment make a difference to the rate of outcome development? • A more general question: • Which treatment, exposure group, etc. has the best outcome? • lowest mortality, lowest incidence, best recovery • Can be addressed through: • Regression methods • Cox models (later) • Non-regression methods • Log-rank test • Mantel-Haenszel • Wilcoxon/Gehan

Data for the myelomatosis data set (Allison). Does treatment affect survival?

dur    status  treat  renal
8      1       1      1
180    1       2      0
632    1       2      0
852    0       1      0
52     1       1      1
2240   0       2      0
220    1       1      0
63     1       1      1
195    1       2      0
76     1       2      0
70     1       2      0
8      1       1      0
13     1       2      1
1990   0       2      0
1976   0       1      0
18     1       2      1
700    1       2      0
1296   0       1      0
1460   0       1      0
210    1       2      0
63     1       1      1
1328   0       1      0
1296   1       2      0
365    0       1      0
23     1       2      1

Rank order the data within treatment groups

Treatment = 1
dur    status  treat  renal
8      1       1      1
8      1       1      0
52     1       1      1
63     1       1      1
63     1       1      1
220    1       1      0
365    0       1      0
852    0       1      0
1296   0       1      0
1328   0       1      0
1460   0       1      0
1976   0       1      0

Treatment = 2
dur    status  treat  renal
13     1       2      1
18     1       2      1
23     1       2      1
70     1       2      0
76     1       2      0
180    1       2      0
195    1       2      0
210    1       2      0
632    1       2      0
700    1       2      0
1296   1       2      0
1990   0       2      0
2240   0       2      0

[Figure: Effect of new treatment. Curves for New Rx and Old Rx.]

[Figure: Effect of pre-existing renal disease. Curves for No renal disease and Renal disease.]

Comparing Survival (2) • How to tell if one group has better survival? • One approach is to compare survival at one point in time • One year survival • Five year survival • This is the approach used with Cumulative Incidence Ratios (RR’s).

[Figure: curves for the two groups, with the difference Δ marked at 5 years] Compare the cumulative incidence (1 − S(t)) at 5 years using a t-test, etc.

This approach is limited: • For both of these situations, the five-year survival is the same for the two groups being compared. • BUT, the overall pattern of survival in the study on the left is clearly different between the two groups while for the study on the right, it is not.

Comparing Survival (3) • Compare curves at each point • Combine across all events • Can limit comparison to the times when an event happens (the event times ti)

Comparing survival (4) • Risk Set • All people under study at time of event • Only include people at risk of having an event [Figure: Risk sets #1, #2 and #3, with subjects labelled D and C]
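The risk set idea can be sketched in a few lines of Python. The snippet uses the Treatment = 1 durations from the myelomatosis data and assumes the usual convention that a subject whose follow-up ends at time t (event or censoring) still counts as at risk at t. The function name `risk_set` is illustrative only.

```python
# Treatment = 1 rows from the myelomatosis data: (dur, status)
# status 1 = event, 0 = censored
treat1 = [(8, 1), (8, 1), (52, 1), (63, 1), (63, 1), (220, 1),
          (365, 0), (852, 0), (1296, 0), (1328, 0), (1460, 0), (1976, 0)]


def risk_set(rows, t):
    """Subjects still under observation at time t (assumed convention: dur >= t)."""
    return [(dur, status) for dur, status in rows if dur >= t]


# Distinct event times, in increasing order
event_times = sorted({dur for dur, status in treat1 if status == 1})

for t in event_times:
    at_risk = risk_set(treat1, t)
    events = sum(1 for dur, status in at_risk if dur == t and status == 1)
    print(f"t={t:4d}  at risk={len(at_risk):2d}  events={events}")
```

Censored subjects drop out of later risk sets without ever counting as an event, which is how the method uses their partial follow-up.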

Comparing Survival (5) • Nonparametric approaches • Log-rank • Mantel-Haenszel • Wilcoxon/Gehan • Other weighted methods (a wide variety exist) • Closely related but not the same • ‘Log rank’ is often presented as the Mantel-Haenszel (M-H) method without explanation • They differ in their assumptions. • We will use the M-H approach

Comparing Survival (6) • General approach • Suppose the 2 groups have the same survival • whenever an event happens, everyone in the risk set has the same probability of being the person having event • Combine all observations into one file • Rank order them on the time-to-event • At each event time, compute a statistic to compare the expected number of events in group 1 (or 2) to the observed number • Combine the results at each time point into a summary statistic • Compare the statistic to an appropriate reference standard.
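The general approach above can be sketched in Python on the myelomatosis data. This is a minimal illustration, not a validated implementation: it assumes the hypergeometric (Mantel-Haenszel) variance at each event time, treats subjects censored at time t as still at risk at t, and uses a chi-square reference with 1 degree of freedom. The function name `log_rank` is hypothetical.

```python
import math

# Myelomatosis data as (dur, status, treat); status 1 = event, 0 = censored
data = [(8, 1, 1), (180, 1, 2), (632, 1, 2), (852, 0, 1), (52, 1, 1),
        (2240, 0, 2), (220, 1, 1), (63, 1, 1), (195, 1, 2), (76, 1, 2),
        (70, 1, 2), (8, 1, 1), (13, 1, 2), (1990, 0, 2), (1976, 0, 1),
        (18, 1, 2), (700, 1, 2), (1296, 0, 1), (1460, 0, 1), (210, 1, 2),
        (63, 1, 1), (1328, 0, 1), (1296, 1, 2), (365, 0, 1), (23, 1, 2)]


def log_rank(rows, group=1):
    """Return (sum of O - E, sum of V, chi-square, p-value) for one group."""
    event_times = sorted({d for d, s, g in rows if s == 1})
    o_minus_e = 0.0
    var = 0.0
    for t in event_times:
        at_risk = [(d, s, g) for d, s, g in rows if d >= t]
        r_all = len(at_risk)
        r_grp = sum(1 for d, s, g in at_risk if g == group)
        d_all = sum(1 for d, s, g in at_risk if d == t and s == 1)
        d_grp = sum(1 for d, s, g in at_risk
                    if d == t and s == 1 and g == group)
        # Expected events in the chosen group if survival is the same
        o_minus_e += d_grp - d_all * r_grp / r_all
        if r_all > 1:  # variance is undefined for a risk set of size 1
            var += (r_grp * (r_all - r_grp) * d_all * (r_all - d_all)
                    / (r_all ** 2 * (r_all - 1)))
    chi2 = o_minus_e ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail, chi-square 1 df
    return o_minus_e, var, chi2, p_value


diff, var, chi2, p_value = log_rank(data, group=1)
print(f"O - E = {diff:.4f}, V = {var:.4f}, chi2 = {chi2:.4f}, p = {p_value:.4f}")
```

On these 25 rows the sketch gives O − E of about −2.34 for treatment 1 and a chi-square of about 1.31 (p about 0.25), so the data do not show a clear treatment difference.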

Comparing Survival (7) • Example from Cantor • We present the merged and sorted data in the table on the next slide.

di = # events in group ‘i’; Ri = # members of risk set at ‘ti’

Comparing Survival (8) • Consider first event time (t=2). • In the risk set at t=2, we have: • 5 subjects in group 1 • 6 subjects in group 2 • We can represent this data as a 2x2 table.

Comparing Survival (8) • What are the ‘E’ and ‘V’ columns? • Ei,t = expected # of events in group ‘i’ at time ‘t’ • Vt = Approximate variance of ‘E’ at time ‘t’

Comparing Survival (9)

Comparing Survival (10) • More generally, suppose we have: • dt1 = # events at time ‘t’ in group 1 • dt2 = # events at time ‘t’ in group 2 • dt+ = # events at time ‘t’ (dt1+dt2) • Rt1 = # in risk set at time ‘t’ in group 1 • Rt2 = # in risk set at time ‘t’ in group 2 • Rt+ = # in risk set at time ‘t’ (Rt1+Rt2) • Then, we have the expected # of events in group 1 is: $E_{1t} = \dfrac{d_{t+}\,R_{t1}}{R_{t+}}$

Comparing Survival (11) • dt1 = # events at time ‘t’ in group 1 • dt2 = # events at time ‘t’ in group 2 • dt+ = # events at time ‘t’ (dt1+dt2) • Rt1 = # in risk set at time ‘t’ in group 1 • Rt2 = # in risk set at time ‘t’ in group 2 • Rt+ = # in risk set at time ‘t’ (Rt1+Rt2) • And, the ‘V’s are given by this formula: $V_t = \dfrac{R_{t1}\,R_{t2}\,d_{t+}\,(R_{t+} - d_{t+})}{R_{t+}^{2}\,(R_{t+} - 1)}$
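As a worked check of ‘E’ and ‘V’ for a single event time, the sketch below assumes the usual hypergeometric forms, E = dt+ × Rt1 / Rt+ and V = Rt1 × Rt2 × dt+ × (Rt+ − dt+) / (Rt+² × (Rt+ − 1)). The inputs are the first event time of the pooled myelomatosis data (t = 8), where 12 subjects are at risk in group 1 and 13 in group 2, with 2 events in group 1 and none in group 2. The function name is illustrative.

```python
def expected_and_variance(d1, d2, r1, r2):
    """E (group 1) and V for one 2x2 table, assuming hypergeometric forms."""
    d_all = d1 + d2          # dt+ : total events at this time
    r_all = r1 + r2          # Rt+ : total size of the risk set
    e1 = d_all * r1 / r_all
    v = r1 * r2 * d_all * (r_all - d_all) / (r_all ** 2 * (r_all - 1))
    return e1, v


# Pooled myelomatosis data at t = 8
e1, v = expected_and_variance(d1=2, d2=0, r1=12, r2=13)
print(f"E = {e1:.4f}, V = {v:.4f}")
```

Here group 1 has 2 observed events against 0.96 expected, so it contributes O − E = 1.04 at this event time.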

[2x2 tables: at time ‘2’, at time ‘3’, at time ‘5’ and, in general, at time ‘t’]

Comparing Survival (12) • Compute O1t – E1t for each event time ‘t’, where O1t is the observed # of events in group 1 • Add up the differences across all events to get: $\sum_t (O_{1t} - E_{1t})$ • This measures how far group ‘1’ differs from what would be expected if survival were the same in the two groups. • If you had chosen group ‘2’ instead of group ‘1’, the sum of the differences would have been the same size, with the opposite sign.

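Comparing Survival (13) • Write this difference as: O+ – E+ • And, let $V_+ = \sum_t V_t$ • Then, we have: $\chi^2 = \dfrac{(O_+ - E_+)^2}{V_+}$, which is compared to a chi-square distribution with 1 degree of freedom. This is the log rank test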

Comparing Survival (14) • The test above essentially applies the Mantel-Haenszel test (covered in Epi 1) to tables created by stratifying the sample into groups based on the event times. • The test can be written as: $\chi^2 = \dfrac{\left[\sum_t (O_{1t} - E_{1t})\right]^2}{\sum_t V_t}$ (Log-rank or Mantel-Haenszel test)

Comparing Survival (15) • The test can be modified by assigning a weight wt to each time point. • Might be based on size of risk set at ‘t’ • Then, the test becomes: $\chi^2 = \dfrac{\left[\sum_t w_t\,(O_{1t} - E_{1t})\right]^2}{\sum_t w_t^{2}\,V_t}$

Comparing Survival (16) • Log-rank: • wt=1 • equally weights events at all points in time • Wilcoxon test • wt=Rt+ • Weight is the size of the Risk Set at time ‘t’ • Assigns more weight to early events than late events • large risk sets → more precise estimates • Other variants exist • These tests don’t give the same results.
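To see how the choice of weight changes the answer, the sketch below runs the same calculation on the myelomatosis data with wt = 1 (log-rank) and wt = Rt+ (Wilcoxon). It assumes the weighted statistic takes the form [Σ wt (O1t − E1t)]² / Σ wt² Vt with the hypergeometric Vt; the function name `weighted_test` and its `weight` argument are hypothetical.

```python
# Myelomatosis data as (dur, status, treat); status 1 = event, 0 = censored
data = [(8, 1, 1), (180, 1, 2), (632, 1, 2), (852, 0, 1), (52, 1, 1),
        (2240, 0, 2), (220, 1, 1), (63, 1, 1), (195, 1, 2), (76, 1, 2),
        (70, 1, 2), (8, 1, 1), (13, 1, 2), (1990, 0, 2), (1976, 0, 1),
        (18, 1, 2), (700, 1, 2), (1296, 0, 1), (1460, 0, 1), (210, 1, 2),
        (63, 1, 1), (1328, 0, 1), (1296, 1, 2), (365, 0, 1), (23, 1, 2)]


def weighted_test(rows, weight, group=1):
    """Return (weighted sum of O - E, weighted variance, chi-square)."""
    num = 0.0
    var = 0.0
    for t in sorted({d for d, s, g in rows if s == 1}):
        at_risk = [(d, s, g) for d, s, g in rows if d >= t]
        r_all = len(at_risk)
        r_grp = sum(1 for d, s, g in at_risk if g == group)
        d_all = sum(1 for d, s, g in at_risk if d == t and s == 1)
        d_grp = sum(1 for d, s, g in at_risk
                    if d == t and s == 1 and g == group)
        w = weight(r_all)  # weight may depend on the size of the risk set
        num += w * (d_grp - d_all * r_grp / r_all)
        if r_all > 1:
            var += w ** 2 * (r_grp * (r_all - r_grp) * d_all * (r_all - d_all)
                             / (r_all ** 2 * (r_all - 1)))
    return num, var, num ** 2 / var


lr_num, lr_var, lr_chi2 = weighted_test(data, weight=lambda r: 1.0)
wx_num, wx_var, wx_chi2 = weighted_test(data, weight=lambda r: float(r))
print(f"log-rank: chi2 = {lr_chi2:.4f}")
print(f"Wilcoxon: chi2 = {wx_chi2:.4f}")
```

On these data the log-rank chi-square is about 1.31 and the Wilcoxon chi-square about 0.25. The gap arises because the early events fall mainly in treatment 2 and receive the largest Wilcoxon weights.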

Comparing Survival (17) • Some Issues • More than 2 groups • Method can be extended • Continuous Predictors • Must categorize into groups • Multiple predictors • Cross-stratify the predictors • Limited # of variables which can be included

Comparing Survival (18) • Some Issues • Curves which cross • THERE IS NO RIGHT ANSWER!!! • Which is ‘better’ depends on the follow-up time • Relates to effect modification • How to weight early/late events • Many different approaches • Wilcoxon gives more weight to early events • Can give different answers, especially if p-values are close to 0.05
