Survival Analysis in SAS

Survival Analysis in SAS UCLA IDRE Statistical Consulting Group Winter 2014

1. Introduction • Survival analysis models factors that influence the time to an event. • Or time to “failure” • Shouldn’t use linear regression • Non-normally distributed outcome • Censoring

1. proclifetest and procphreg • proclifetest • Non-parametric methods • procphreg • Cox proportional hazards regression • Still very popular survival analysis method

1.1. Sample dataset • Worcester Heart Attack Study • WHAS500 • Examines factors that affect time to death after heart attack • 500 subjects • Many observations are right-censored • Loss to follow up • Do not know when they died, but do know at least how long they lived

1.1. Sample dataset • Important variables • lenfol: length of followup, the outcome • fstat: the censoring variable, loss to followup=0, death=1 • age: age at hospitalization • bmi: body mass index • hr: initial heart rate • gender: males=0, females=1

2.1. Data preparation – Structure • Two ways SAS accepts survival data • One row per subject • proclifetest and procphreg both accept • One variable for time to event One censoring variable • Optional if everyone fails • Explanatory variables fixed

2.1. Data preparation – Structure • Multiple rows per subject • Only procphreg accepts • Follow up time partitioned into intervals • Start and stop variables define endpoints • One variable defines count of events in each interval • 0 or 1 in Cox model – same as censoring • Explanatory variables may vary within subject

2.2. Explore variables with procunivariate • procunivariate for individual variable summaries • Means and variances • Histograms • frequencies

2.2. Explore relationships with proccorr • Bivariate relationships proccorrdata = whas500 plots(maxpoints=none)=matrix(histogram) noprob; varlenfol gender age bmihr; run;

2.2. Explore relationships with proccorr

3. Nonparametric survival analysis • Nonparametric methods used to get an idea of overall survival • Descriptive and exploratory • Make no assumptions about shape of survival function • But do not estimate magnitude of covariate effects either

3.1. The Kaplan Meier estimate of the survival function • Kaplan-Meier survival function estimates probability of survival up to each event time in data from the beginning of follow up time • Unconditional survival probability • Basically calculated by multiplying proportion of those at risk who survive each interval up to interval of interest • S(3rd interval) = prop_surv(1st) x prop_surv(2nd) x prop_surv(3rd)

3.1.2. K-M estimates from proclifetest • Requires use of time statement • Specify event time variable at minimum • Censoring variable also if any censoring • Put values representing censoring in () • atrisk option add Number at Risk to output table proclifetestdata=whas500 atrisk outs=outwhas500; timelenfol*fstat(0); run;

3.1.2 K-M survival table

3.1.2 K-M survival table Time interval: 0 days to 1 day

3.1.2 K-M survival table 500 at beginning, 8 died in this interval

3.1.2 K-M survival table Survival probability from beginning to this time point

3.1.2 K-M survival table

3.1.2. K-M survival, later times Censored observations Notice Number at Risk vs Observed Events

3.1.2. K-M survival, later times Survival estimate changes just alittle, even though many leave study – Censoring doesn’t change survival

3.1.3 Graphing K-M estimate • proclifetest produces graph of overall survival by default • We add plot=survival(cb) to get confidence bands proclifetestdata=whas500 atrisk plots=survival(cb) outs=outwhas500; timelenfol*fstat(0); run;

3.1.3 Graphing K-M estimate Step function drops when someone dies Last few observations are censored, Tick marks are censored observations

3.2. Nelson-Aalen estimator of cumulative hazard function • Calculated as sum of proportion of those at risk who died in each interval up to time t • H(3rd interval) = prop_died(1st) + prop_died(2nd) + prop_died(3rd) • Interpreted as expected number of events from beginning to time t • Has a simple relationship with survival • S(t) = exp(-H(t))

3.2. Nelson-Aalen from proclifetetst • If you add “nelson” to proclifetest statement, SAS will add Nelson-Aalen cumulative hazard to survival table proclifetestdata=whas500 atrisknelson; timelenfol*fstat(0); run;

3.2. Nelson Aalen in survival table

3.3. Median and mean survival times from proclifetest • proclifetest will produce estimates of mean and median survival time by default • Because distribution is very skewed, median often better indicator of “middle”

3.4. Comparing survival functions using nonparametric methods • We can specify a covariate on the “strata” statement to compare survival between levels of the covariate • SAS will produce graph and tests of comparison proclifetestdata=whas500 atrisk plots=survival(atriskcb) outs=outwhas500; strata gender; timelenfol*fstat(0); run;

3.4. Gender appears to affect survival

3.4. Gender appears to affect survival • 3 chi-squared based tests of equality • 2 differ in how much they weight each interval • Log rank = all intervals equal • Wilcoxon = earlier times more weight

4. Estimating the hazard function • With nonparametrics, we model the survival (or cumulative hazard) function • With regression methods, we look at the hazard function, h(t) • Hazard function gives instantaneous rate of failure at time t • Hazard function should be strictly nonnegative • Another reason we shouldn’t use linear regression

4.1. The Cox proportional hazards model - parameterization • The Cox model is typically parameterized like so: • where • = baseline hazard function of reference group • = predictors in the model • = regression coefficients • always evaluates to positive, which ensures hazard rate is always positive (or 0) • Nice!!

4.2. Predictor effects are expressed as hazard ratios • We can express the change in the hazard rate as a hazard ratio (HR). • Imagine changes to

4.2. Predictor effects on the hazard rate are multiplicative • Predictor effects on the hazard rate are multiplicative (as in logistic regression), not additive (as in linear regression) • They are additive on the log hazard rate

4.2. The baseline hazard rate • Notice that the baseline hazard rate cancels out of the hazard ratio formula: • This means we need not make any assumptions about the baseline hazard rate • The fact that we can leave it unspecified is one of the great strengths of the Cox model • Since the baseline hazard is not parameterized, the Cox model is semiparametric

5.1. Fitting a Cox regression model in procphreg • Model statement works similarly to proclifetest (provided one row per subject structure) • A variable for failure time • A variable for censoring (optional if everyone fails) • Numbers in parentheses represent censoring procphregdata = whas500; class gender; modellenfol*fstat(0) = gender age; run;

5.1. phreg output: model fit • Model fit statistics used to compare between models • Lower is generally better • SBC also known as BIC

5.1. phreg output: test vs null model • Tests overall fit of model vs model with no predictors • Null model assumes everyone has same hazard rate • Prefer likelihood ratio in small samples

5.1. phreg output: test vs null model • Regression table of coefficients • Also includes exponentiated coefficients as hazard ratios

5.2. Graphs of survival function after Cox regression • When “plots=survival” is specified on procphreg statement, SAS will produce survival curve for “reference” group • At reference group for all categorical covariates • At mean of all continuous covariates procphregdata = whas500 plots=survival; class gender; modellenfol*fstat(0) = gender age; run;

5.2. Graphs of survival function after Cox regression Survival curve for males, age 69.85

5.2.1. Use the baseline statement to generate survival plots by group • To get comparison curves, need to use baseline statement in procphreg • A little tricky to use at first • First create dataset of covariate values for which we would like survival curves • Specify this dataset on “covariates=“ option on baseline statement • Use group= option to get separate graphs by group • Use rowid = option to get separate lines by group • Additionally need to specify “plots(overlay=group)” for separate graphs (with possibly separate lines per graph), or at least “plots(overlay)” to get separate lines

5.2.1. Use the baseline statement to generate survival plots by group datacovs; format gender gender.; input gender age; datalines; 0 69.845947 1 69.845947 ; run; procphregdata = whas500 plots(overlay)=(survival); class gender; modellenfol*fstat(0) = gender age; baselinecovariates=covsout=base / rowid=gender; run;

5.2.1. Use the baseline statement to generate survival plots by group Effect of gender not significant after accounting for age

5.3 Interactions in Cox model • Interactions allow the effect of one covariate to vary with the level of another • Perhaps effect of age differs by gender • Can model interactions between terms using “|” • Can also model quadratic (and higher order) effects this way • We suspect bmi has quadratic effect (because high and low ends tend to have worse health outcomes)

5.3 Interactions in Cox model procphregdata = whas500; class gender; modellenfol*fstat(0) = gender|agebmi|bmi hr; run;

5.3 Interactions in Cox model • Model fit statistics have improved with addition of interactions

5.3 Interactions in Cox model • Interactions are significant • Age effect depends on gender • Less “severe” for females (negative interaction term) • The effect of bmi depends on what the bmi score is • Begins negative (at bmi=0, an unrealistic value) and flattens as bmi rises

5.3 Interactions in Cox model

5.4. Use hazardratio statements and graphs to interpret effects • hazardratio statement is useful for interpreting interactions • Hazard ratios not printed in regression table for terms involved in interactions • Because hazard ratio changes with level of interacting covariate • Useful for interpreting effects of categorical and continuous predictors and their interactions

5.4. Use hazardratio statements and graphs to interpret effects • Syntax of hazardratio statement • After keyword hazardratio, you can apply an optional label to the hazard ratio • Then the variable whose levels we would like to compare in the hazard ratio • For interactions, after “/”, use option “at=“ to specify level of interacting covariate

Survival Analysis in SAS

Survival Analysis in SAS

Presentation Transcript

Survival Analysis

Survival Analysis

Survival Analysis

Survival analysis with time-varying covariates in SAS

Survival analysis

Survival Models in SAS

Survival Analysis

Survival analysis

EPI 5344: Survival Analysis in Epidemiology SAS code and output February 25, 2013

Survival analysis

Flexible Survival Modeling in SAS

Survival Analysis

Tracker SAS Analysis

SURVIVAL ANALYSIS

Survival Analysis

Survival Analysis

Survival analysis

Survival Analysis

Survival analysis