1.15k likes | 1.94k Views
Survival Analysis in SAS. UCLA IDRE Statistical Consulting Group Winter 2014. 1. Introduction. Survival analysis models factors that influence the time to an event . Or time to “failure” Shouldn’t use linear regression Non-normally distributed outcome Censoring.
E N D
Survival Analysis in SAS UCLA IDRE Statistical Consulting Group Winter 2014
1. Introduction • Survival analysis models factors that influence the time to an event. • Or time to “failure” • Shouldn’t use linear regression • Non-normally distributed outcome • Censoring
1. proclifetest and procphreg • proclifetest • Non-parametric methods • procphreg • Cox proportional hazards regression • Still very popular survival analysis method
1.1. Sample dataset • Worcester Heart Attack Study • WHAS500 • Examines factors that affect time to death after heart attack • 500 subjects • Many observations are right-censored • Loss to follow up • Do not know when they died, but do know at least how long they lived
1.1. Sample dataset • Important variables • lenfol: length of followup, the outcome • fstat: the censoring variable, loss to followup=0, death=1 • age: age at hospitalization • bmi: body mass index • hr: initial heart rate • gender: males=0, females=1
2.1. Data preparation – Structure • Two ways SAS accepts survival data • One row per subject • proclifetest and procphreg both accept • One variable for time to event One censoring variable • Optional if everyone fails • Explanatory variables fixed
2.1. Data preparation – Structure • Multiple rows per subject • Only procphreg accepts • Follow up time partitioned into intervals • Start and stop variables define endpoints • One variable defines count of events in each interval • 0 or 1 in Cox model – same as censoring • Explanatory variables may vary within subject
2.2. Explore variables with procunivariate • procunivariate for individual variable summaries • Means and variances • Histograms • frequencies
2.2. Explore relationships with proccorr • Bivariate relationships proccorrdata = whas500 plots(maxpoints=none)=matrix(histogram) noprob; varlenfol gender age bmihr; run;
3. Nonparametric survival analysis • Nonparametric methods used to get an idea of overall survival • Descriptive and exploratory • Make no assumptions about shape of survival function • But do not estimate magnitude of covariate effects either
3.1. The Kaplan Meier estimate of the survival function • Kaplan-Meier survival function estimates probability of survival up to each event time in data from the beginning of follow up time • Unconditional survival probability • Basically calculated by multiplying proportion of those at risk who survive each interval up to interval of interest • S(3rd interval) = prop_surv(1st) x prop_surv(2nd) x prop_surv(3rd)
3.1.2. K-M estimates from proclifetest • Requires use of time statement • Specify event time variable at minimum • Censoring variable also if any censoring • Put values representing censoring in () • atrisk option add Number at Risk to output table proclifetestdata=whas500 atrisk outs=outwhas500; timelenfol*fstat(0); run;
3.1.2 K-M survival table Time interval: 0 days to 1 day
3.1.2 K-M survival table 500 at beginning, 8 died in this interval
3.1.2 K-M survival table Survival probability from beginning to this time point
3.1.2. K-M survival, later times Censored observations Notice Number at Risk vs Observed Events
3.1.2. K-M survival, later times Survival estimate changes just alittle, even though many leave study – Censoring doesn’t change survival
3.1.3 Graphing K-M estimate • proclifetest produces graph of overall survival by default • We add plot=survival(cb) to get confidence bands proclifetestdata=whas500 atrisk plots=survival(cb) outs=outwhas500; timelenfol*fstat(0); run;
3.1.3 Graphing K-M estimate Step function drops when someone dies Last few observations are censored, Tick marks are censored observations
3.2. Nelson-Aalen estimator of cumulative hazard function • Calculated as sum of proportion of those at risk who died in each interval up to time t • H(3rd interval) = prop_died(1st) + prop_died(2nd) + prop_died(3rd) • Interpreted as expected number of events from beginning to time t • Has a simple relationship with survival • S(t) = exp(-H(t))
3.2. Nelson-Aalen from proclifetetst • If you add “nelson” to proclifetest statement, SAS will add Nelson-Aalen cumulative hazard to survival table proclifetestdata=whas500 atrisknelson; timelenfol*fstat(0); run;
3.3. Median and mean survival times from proclifetest • proclifetest will produce estimates of mean and median survival time by default • Because distribution is very skewed, median often better indicator of “middle”
3.4. Comparing survival functions using nonparametric methods • We can specify a covariate on the “strata” statement to compare survival between levels of the covariate • SAS will produce graph and tests of comparison proclifetestdata=whas500 atrisk plots=survival(atriskcb) outs=outwhas500; strata gender; timelenfol*fstat(0); run;
3.4. Gender appears to affect survival • 3 chi-squared based tests of equality • 2 differ in how much they weight each interval • Log rank = all intervals equal • Wilcoxon = earlier times more weight
4. Estimating the hazard function • With nonparametrics, we model the survival (or cumulative hazard) function • With regression methods, we look at the hazard function, h(t) • Hazard function gives instantaneous rate of failure at time t • Hazard function should be strictly nonnegative • Another reason we shouldn’t use linear regression
4.1. The Cox proportional hazards model - parameterization • The Cox model is typically parameterized like so: • where • = baseline hazard function of reference group • = predictors in the model • = regression coefficients • always evaluates to positive, which ensures hazard rate is always positive (or 0) • Nice!!
4.2. Predictor effects are expressed as hazard ratios • We can express the change in the hazard rate as a hazard ratio (HR). • Imagine changes to
4.2. Predictor effects on the hazard rate are multiplicative • Predictor effects on the hazard rate are multiplicative (as in logistic regression), not additive (as in linear regression) • They are additive on the log hazard rate
4.2. The baseline hazard rate • Notice that the baseline hazard rate cancels out of the hazard ratio formula: • This means we need not make any assumptions about the baseline hazard rate • The fact that we can leave it unspecified is one of the great strengths of the Cox model • Since the baseline hazard is not parameterized, the Cox model is semiparametric
5.1. Fitting a Cox regression model in procphreg • Model statement works similarly to proclifetest (provided one row per subject structure) • A variable for failure time • A variable for censoring (optional if everyone fails) • Numbers in parentheses represent censoring procphregdata = whas500; class gender; modellenfol*fstat(0) = gender age; run;
5.1. phreg output: model fit • Model fit statistics used to compare between models • Lower is generally better • SBC also known as BIC
5.1. phreg output: test vs null model • Tests overall fit of model vs model with no predictors • Null model assumes everyone has same hazard rate • Prefer likelihood ratio in small samples
5.1. phreg output: test vs null model • Regression table of coefficients • Also includes exponentiated coefficients as hazard ratios
5.2. Graphs of survival function after Cox regression • When “plots=survival” is specified on procphreg statement, SAS will produce survival curve for “reference” group • At reference group for all categorical covariates • At mean of all continuous covariates procphregdata = whas500 plots=survival; class gender; modellenfol*fstat(0) = gender age; run;
5.2. Graphs of survival function after Cox regression Survival curve for males, age 69.85
5.2.1. Use the baseline statement to generate survival plots by group • To get comparison curves, need to use baseline statement in procphreg • A little tricky to use at first • First create dataset of covariate values for which we would like survival curves • Specify this dataset on “covariates=“ option on baseline statement • Use group= option to get separate graphs by group • Use rowid = option to get separate lines by group • Additionally need to specify “plots(overlay=group)” for separate graphs (with possibly separate lines per graph), or at least “plots(overlay)” to get separate lines
5.2.1. Use the baseline statement to generate survival plots by group datacovs; format gender gender.; input gender age; datalines; 0 69.845947 1 69.845947 ; run; procphregdata = whas500 plots(overlay)=(survival); class gender; modellenfol*fstat(0) = gender age; baselinecovariates=covsout=base / rowid=gender; run;
5.2.1. Use the baseline statement to generate survival plots by group Effect of gender not significant after accounting for age
5.3 Interactions in Cox model • Interactions allow the effect of one covariate to vary with the level of another • Perhaps effect of age differs by gender • Can model interactions between terms using “|” • Can also model quadratic (and higher order) effects this way • We suspect bmi has quadratic effect (because high and low ends tend to have worse health outcomes)
5.3 Interactions in Cox model procphregdata = whas500; class gender; modellenfol*fstat(0) = gender|agebmi|bmi hr; run;
5.3 Interactions in Cox model • Model fit statistics have improved with addition of interactions
5.3 Interactions in Cox model • Interactions are significant • Age effect depends on gender • Less “severe” for females (negative interaction term) • The effect of bmi depends on what the bmi score is • Begins negative (at bmi=0, an unrealistic value) and flattens as bmi rises
5.4. Use hazardratio statements and graphs to interpret effects • hazardratio statement is useful for interpreting interactions • Hazard ratios not printed in regression table for terms involved in interactions • Because hazard ratio changes with level of interacting covariate • Useful for interpreting effects of categorical and continuous predictors and their interactions
5.4. Use hazardratio statements and graphs to interpret effects • Syntax of hazardratio statement • After keyword hazardratio, you can apply an optional label to the hazard ratio • Then the variable whose levels we would like to compare in the hazard ratio • For interactions, after “/”, use option “at=“ to specify level of interacting covariate