Dr. N. Birkett, Department of Epidemiology & Community Medicine, University of Ottawa

EPI 5344: Survival Analysis in Epidemiology • Introduction to concepts and basic methods • February 25, 2014

Survival concepts (1) • Cohort studies • Follow a pre-defined group of people for a period of time which can be: • Same time for everyone • Different time for different people. • Determine which people achieve the specified outcome. • Outcomes could be many different things, such as: • Death • Any cause or cause-specific • Onset of new disease • Resumption of smoking in someone who had quit • Recidivism for drug use or criminal activity • Change in numerical measure such as blood pressure • Longitudinal data analysis

Survival concepts (2) • Cohort studies • Traditional approach to cohorts assumes everyone is followed for the same time • incidence proportion • logistic regression modeling • If follow-up time varies, what do you do with subjects who don’t make it to the end of the study? • Censoring • Cohort studies can provide more information than presence/absence of outcome. • Time when outcome occurred • Type of outcome (competing outcomes) • Can look at rate or speed of development of outcome • incidence rate • person-time

Survival concepts (3) • Time to event analysis • Survival Analysis (general term) • Life tables • Kaplan-Meier curves • Actuarial methods • Log-rank test • Cox modeling (proportional hazards) • Strong link to engineering • Failure time studies

Survival concepts (4) • Analysis of Cohort studies (from epidemiology) • Incidence proportion (cumulative incidence) • Select a point in time as the end of follow-up. • Compare groups using t-test, CIR (RR) • Issues include: • What point in time to use? • What if not all subjects remain under follow-up that long? • Ignores information from subjects who neither get the outcome nor reach the time point • What is the incidence proportion for the outcome ‘death’ if we set the follow-up time to 200 years? • Will always be 100%

Survival concepts (5) • Analysis of Cohort studies (from epidemiology) • Incidence rate (density) • Based on person time of follow-up • Can include information on drop-outs, etc. • Closely linked to survival analysis methods

Survival concepts (6) • Cumulative Incidence • The probability of becoming ill over a pre-defined period of time. • No units • Range 0-1 • Incidence density (rate) • The rate at which people get ill during person-time of follow-up • Units: 1/time or cases/Person-time • Range 0 to +∞ • Very closely related to hazard rate.
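The two measures above can be compared on a small example. This is a minimal sketch on invented data: the cohort, the 5-year follow-up period, and the function names are assumptions for illustration, and the cumulative incidence calculation is only valid here because every subject who stays well is followed for the full period.

```python
# Hypothetical cohort: (follow-up time in years, 1 = became ill, 0 = did not).
# Assumption: subjects who stay well are all followed for the full 5 years.
cohort = [(5.0, 0), (2.0, 1), (5.0, 0), (4.0, 1), (5.0, 0),
          (5.0, 0), (1.0, 1), (5.0, 0), (5.0, 0), (5.0, 0)]

def cumulative_incidence(records):
    """Proportion of subjects who became ill (no units, range 0 to 1)."""
    cases = sum(event for _, event in records)
    return cases / len(records)

def incidence_density(records):
    """Cases per unit of person-time (units 1/time, range 0 to +infinity)."""
    cases = sum(event for _, event in records)
    person_time = sum(time for time, _ in records)
    return cases / person_time

ci = cumulative_incidence(cohort)   # 3 cases among 10 people
id_rate = incidence_density(cohort) # 3 cases over 42 person-years
print(f"CI over 5 years = {ci:.2f}; ID = {id_rate:.4f} per person-year")
```

The cumulative incidence divides by people, while the incidence density divides by person-time, which is why only the second can absorb unequal follow-up.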

Measuring Time (1) • Need to consider: • Units to use to measure time • Normally, years/months/days • Time of events is usually measured as ‘calendar time’ • Other measures are possible (e.g. hours) • ‘scale’ to be used • time on study • age • calendar date • Time ‘0’ (‘origin of time’) • The point when time starts

Time Scale (1) • Time of events is usually measured as ‘calendar time’ • Can be represented by ‘time lines’ in a graph • Conceptual idea used in analyses

[Figure: patient time lines on the calendar-time scale, with end points labelled D and C]

Time Scale (2) • In survival analysis, focus is commonly on ‘study time’ • How long after a patient starts follow-up do their events occur? • Particularly common choice for RCTs • Need to define a ‘time 0’, the point when study time starts accumulating for each patient. • Most epidemiologists recommend using ‘age’ as the time scale for etiological studies • We’ll focus on time since a defining event, but remember this for the future.

Origin of Time (1) • Choice of time ‘0’ affects analysis • can produce very different regression coefficients and model fit; • Preferred origin is often unavailable • More than one origin may make sense • no clear criterion to choose which to use

Time ‘0’ (2) • No best time ‘0’ for all situations • Depends on study objectives and design • RCT of Rx • ‘0’ = date of randomization • Prognostic study • ‘0’ = date of disease onset • Inception cohort • Often use: date of disease diagnosis

Time ‘0’ (3) • ‘point source’ exposure • Date of event • Hiroshima atomic bomb • Dioxin spill, Seveso, Italy

Time ‘0’ (4) • Chronic exposure • date of study entry • Date of first exposure • Age (preferred origin/time scale) • Issues • There often is no first exposure (or no clear date of 1st exposure) • Recruitment long after 1st exposure • Immortal person time • Lack of info on early events. • ‘Attained age’ as time scale

Time ‘0’ (5) • Calendar time can be very important • studies of incidence/mortality trends • In survival analysis, focus is on ‘study time’ • How long after a patient starts follow-up do their events occur? • Need to change time lines to reflect new time scale

[Figure: Study course for patients in cohort; time lines with end points labelled D and C; years shown on the axis: 2001, 2003, 2013]

Time ‘0’ (5) • Can be interested in more than one ‘event’ and thus more than one ‘time to event’ • An Example • Patients treated for malignant melanoma • Treated with ‘A’ or ‘B’ • Expected to influence both time to relapse and survival

Time ‘0’ (6) • Some studies have more than one outcome event • Let’s use this to illustrate SAS code to compute time-to-event. • Four time points: • Date of surgery: Time ‘0’ • Relapse • Death • Last follow-up (if still alive without relapse.) • Event #1: earliest of relapse/death/end • Event #2: Earliest of death/end

Time ‘0’ • How do we compute the ‘time on study’ for each of these events? • Convert to days (weeks, months, years) from time ‘0’ for each person • SAS reads date data using ‘date format’ • Dates are stored as the number of days since Jan 1, 1960.

SAS code to create event variables

Data melanoma;
  set melanoma;
  /* dfsevent = 1 if the patient relapsed or died, 0 otherwise */
  dfsevent = 1 - (date_of_relapse = .)*(date_of_death = .);
  /* survevent = 1 if the patient died, 0 if alive at last follow-up */
  survevent = (date_of_death ne .);
  /* Survival time from surgery, in months (30.4 days per month) */
  if (survevent = 0) then survtime = (date_of_last - date_of_surg)/30.4;
  else survtime = (date_of_death - date_of_surg)/30.4;
  /* Disease-free time: to relapse if any, else to death, else to last follow-up */
  if (dfsevent = 0) then dfstime = (date_of_last - date_of_surg)/30.4;
  else if (date_of_relapse ne .) then dfstime = (date_of_relapse - date_of_surg)/30.4;
  else if (date_of_relapse = . and date_of_death ne .) then dfstime = (date_of_death - date_of_surg)/30.4;
  else dfstime = .E;  /* special missing value flags an unexpected case */
Run;
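For readers who do not use SAS, the same logic can be sketched in Python. This is an illustrative translation, not part of the course material: the function names, the use of `None` for a missing date, and the example patients are all assumptions.

```python
from datetime import date

DAYS_PER_MONTH = 30.4  # same conversion factor as the SAS code

def months_between(start, end):
    """Elapsed time from start to end, in months."""
    return (end - start).days / DAYS_PER_MONTH

def event_times(surg, relapse, death, last):
    """Return (dfsevent, dfstime, survevent, survtime).

    relapse and death are None when the date is missing (event not observed).
    """
    # Event #2: death, otherwise censored at last follow-up.
    survevent = 1 if death is not None else 0
    survtime = months_between(surg, death if survevent else last)

    # Event #1: relapse or death, otherwise censored at last follow-up.
    dfsevent = 1 if (relapse is not None or death is not None) else 0
    if dfsevent == 0:
        dfs_end = last
    elif relapse is not None:
        dfs_end = relapse
    else:
        dfs_end = death
    dfstime = months_between(surg, dfs_end)
    return dfsevent, dfstime, survevent, survtime

# Hypothetical patient 1: relapse after one year, death after two years.
relapsed = event_times(date(2001, 1, 1), date(2002, 1, 1),
                       date(2003, 1, 1), date(2003, 1, 1))
# Hypothetical patient 2: alive without relapse at last follow-up.
censored = event_times(date(2001, 1, 1), None, None, date(2001, 12, 31))
print(relapsed, censored)
```

As in the SAS version, the relapse date is used for the first event whenever it is present, on the assumption that a relapse cannot follow death.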

Survival curve (1) • What can we do with data which includes time-to-event? • Might be nice to see a picture of the number of people surviving from the start to the end of follow-up.

[Figure: Sample Data: Mortality, no losses. Slide annotation: not the right axis for a survival curve.]

Survival curve (2) • Previous graph has a problem • What if some people were lost to follow-up? • Plotting the number of people still alive would effectively say that the lost people had all died. • Instead • True survival curve plots the probability of surviving.
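One way to estimate the probability of surviving when some people are lost is the Kaplan-Meier (product-limit) method listed earlier among the time-to-event methods. The sketch below is an illustration on invented data, not the method used to draw the slides' figures; the function name and the five subjects are assumptions.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct death time.

    times  : follow-up time for each subject
    events : 1 if the subject died at that time, 0 if lost to follow-up
    """
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus losses at t
        if deaths > 0:
            # Multiply by the chance of surviving t among those still at risk.
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve

# Hypothetical data: five subjects; the one at time 2 was lost to follow-up.
times = [1, 2, 3, 4, 5]
events = [1, 0, 1, 1, 1]
km = kaplan_meier(times, events)

# Naive alternative: count the people still under observation just after time 3.
naive_at_3 = sum(1 for ti in times if ti > 3) / len(times)
print(km, naive_at_3)
```

Just after time 3 the naive count gives 2/5, which treats the lost subject as dead, while the product-limit estimate is higher because that subject only leaves the risk set.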

Survival Curves (1) • Primary outcome is ‘time to event’ • Also need to know ‘type of event’

Survival Curves (2) • Censored • People who do not have the targeted outcome (e.g. death) • For now, assume no censoring • How do we represent the ‘time’ data in a statistical method? • Histogram of death times - f(t) • Survival curve - S(t) • Hazard curve - h(t) • To know one is to know them all

Histogram of death time • Skewed to right • pdf or f(t) • CDF or F(t) • Area under ‘pdf’ from ‘0’ to ‘t’ • [Figure: pdf of death times, with the area F(t) marked up to time t]

Survival curves (3) • Plot % of group still alive (or % dead) • S(t) = survival curve = % still surviving at time ‘t’ = P(survive to time ‘t’) • Cumulative mortality = 1 - S(t) = F(t) = cumulative incidence

[Figure: survival curve S(t) plotted against t, with deaths shown as the complement, CI(t) = 1 - S(t)]

‘Rate’ of dying • Consider these 2 survival curves • Which has the better survival profile? • Both have S(3) = 0

Survival curves (4) • Most people would prefer to be in group ‘A’ rather than group ‘B’. • Death rate is lower in first two years. • People will live longer than in population ‘B’ • Concept is called: • Hazard: Survival analysis/stats • Force of mortality: Demography • Incidence rate/density: Epidemiology • DEFINITION • h(t) = rate of dying at time ‘t’ GIVEN that you have survived to time ‘t’ • Similar to asking the speed of your car given that you are two hours into a five-hour trip from Ottawa to Toronto • Slight detour and then back to main theme

Survival Curves (5) • Conditional probability • h(t0) = rate of failing at ‘t0’ conditional on surviving to t0 • Requires the ‘conditional survival curve’: S*(t) = S(t) / S(t0) for t ≥ t0 • Essentially, you are re-scaling S(t) so that S*(t0) = 1.0

[Figure: survival curve with S(t0) marked at time t0]

S*(t) = survival curve conditional on surviving to ‘t0’ • CI*(t) = failure/death/cumulative incidence at ‘t’ conditional on surviving to ‘t0’ • Hazard at t0 is defined as ‘the slope of CI*(t) at t0’ • Also called: hazard (instantaneous), force of mortality, incidence rate, incidence density • Range: 0 to ∞
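The slope definition can be written out to show how h(t), f(t), and S(t) determine one another. The derivation below is a sketch using standard identities, assuming the survival time is continuous so that S(t) is differentiable and f(t) = -S'(t); the rescaling S*(t) = S(t)/S(t0) is what makes S*(t0) = 1.

```latex
\begin{align*}
S^*(t)  &= \frac{S(t)}{S(t_0)}, \qquad t \ge t_0 \\
CI^*(t) &= 1 - S^*(t) \\
h(t_0)  &= \left.\frac{d}{dt}\, CI^*(t)\right|_{t=t_0}
         = \frac{-S'(t_0)}{S(t_0)}
         = \frac{f(t_0)}{S(t_0)} \\
h(t)    &= -\frac{d}{dt}\,\log S(t)
\quad\Longrightarrow\quad
S(t) = \exp\!\left(-\int_0^t h(u)\,du\right)
\end{align*}
```

The last line is why knowing any one of f(t), S(t), or h(t) is enough to recover the other two.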

Some relationships • H(t) = cumulative hazard = area under h(t) from ‘0’ to ‘t’ • If the rate of disease is small: CI(t) ≈ H(t) • If, in addition, h(t) is constant (= ID): CI(t) ≈ ID*t
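A quick numerical check shows when the approximation holds. This sketch assumes a constant hazard, so the cumulative hazard is H(t) = ID*t and the exact cumulative incidence is 1 - exp(-H(t)); the two rates and the 5-year follow-up are invented values.

```python
import math

def exact_ci(rate, t):
    """Exact CI(t) = 1 - exp(-H(t)), with H(t) = rate * t for a constant hazard."""
    return 1 - math.exp(-rate * t)

# Hypothetical incidence densities (per person-year) over 5 years of follow-up.
t = 5
low_rate, high_rate = 0.002, 0.2

exact_low, approx_low = exact_ci(low_rate, t), low_rate * t
exact_high, approx_high = exact_ci(high_rate, t), high_rate * t

print(f"rare:   exact {exact_low:.5f} vs ID*t {approx_low:.5f}")
print(f"common: exact {exact_high:.5f} vs ID*t {approx_high:.5f}")
```

For the rare outcome the two values agree to about four decimal places; for the common outcome ID*t reaches 1.0 while the exact cumulative incidence stays well below it.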

Some survival functions (1) • Exponential • h(t) = λ • S(t) = exp (- λt) • Underlies most of the ‘standard’ epidemiological formulae. • Assumes that the hazard is constant over time • Big assumption which is not usually true

Some survival functions (2) • Weibull • h(t) = λγ t^(γ-1) • S(t) = exp(-λ t^γ) • Allows fitting a broader range of hazard functions • Assumes hazard is monotonic • Always increasing (or decreasing)
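The effect of the shape parameter γ can be seen by evaluating the Weibull formulas directly. This is a minimal sketch: the function names and the parameter values are assumptions chosen for illustration, and γ = 1 reproduces the exponential model with constant hazard λ.

```python
import math

def weibull_hazard(t, lam, gamma):
    """h(t) = lam * gamma * t**(gamma - 1), for t > 0."""
    return lam * gamma * t ** (gamma - 1)

def weibull_survival(t, lam, gamma):
    """S(t) = exp(-lam * t**gamma)."""
    return math.exp(-lam * t ** gamma)

lam = 0.1                 # hypothetical scale parameter
times = [1.0, 2.0, 4.0]   # hypothetical time points

constant = [weibull_hazard(t, lam, 1.0) for t in times]    # gamma = 1: exponential
increasing = [weibull_hazard(t, lam, 2.0) for t in times]  # gamma > 1
decreasing = [weibull_hazard(t, lam, 0.5) for t in times]  # gamma < 1

print(constant, increasing, decreasing)
print(weibull_survival(2.0, lam, 1.0))  # equals exp(-lam * 2) when gamma = 1
```

The hazard moves in one direction only for any fixed γ, which is the monotonicity restriction noted above.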
