EPI 5344: Survival Analysis in Epidemiology
Testing the Proportional Hazard Assumption
April 1, 2014
Dr. N. Birkett, Department of Epidemiology & Community Medicine, University of Ottawa
Objectives
• Background
• Residuals in Cox regression
• The ‘STRATA’ statement in PROC PHREG
• Graphical approaches to PH testing
• Model-based approaches to PH testing
• The ‘ASSESS’ statement in PROC PHREG
Residuals for Cox (1)
• Residuals in linear regression measure how far each observed data point lies from the value predicted by the model:
  $e_i = y_i - \hat{y}_i$
• That doesn't work for Cox regression because we have no ‘y’ to predict (the outcome is a possibly censored event time)
• Two alternative types of residuals are used:
  • Individual
  • Covariate-wise
Residuals for Cox (2)
• Individual residuals
  • One residual for each subject
  • Three variants:
    • Cox-Snell
    • Deviance
    • Martingale: the difference between the observed and expected number of events for the subject over their follow-up
  • These are not widely used
  • We will return to them when we discuss the ASSESS statement
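For reference, the three individual residuals can be written in terms of each subject's event indicator δ_i and estimated cumulative hazard Ĥ_i = Ĥ_0(t_i)·exp(β̂'x_i) at their exit time (these are the standard definitions; the slide does not spell them out). Below is a minimal Python sketch, assuming Ĥ_i has already been obtained from a fitted model; the function name is hypothetical.

```python
import math

def individual_residuals(event, cum_hazard):
    """Cox-Snell, martingale and deviance residuals for ONE subject.

    event      : delta_i, 1 if the subject had the event, 0 if censored
    cum_hazard : H_i = H0_hat(t_i) * exp(beta_hat' x_i), the model's expected
                 number of events for this subject over their follow-up
                 (assumed to be computed beforehand from a fitted Cox model)
    """
    cox_snell = cum_hazard
    martingale = event - cum_hazard          # observed - expected events
    # deviance residual: a more symmetric transform of the martingale residual
    log_term = event * math.log(event - martingale) if event else 0.0
    inner = max(0.0, -2.0 * (martingale + log_term))   # guard tiny negative round-off
    deviance = math.copysign(math.sqrt(inner), martingale)
    return cox_snell, martingale, deviance

print(individual_residuals(0, 0.5))   # censored subject: (0.5, -0.5, -1.0)
```

Note that each subject contributes exactly one value of each residual, which is what distinguishes these from the covariate-wise residuals that follow.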
Residuals for Cox (3)
• Schoenfeld residuals
  • One residual for each covariate, at each time when at least one event happens, for each subject who has an event at that time point
  • Based on the expected value of each covariate at every time when an event happens
  • Computed at each time ‘t’ when an event happens
Residuals for Cox (4)
• Schoenfeld residuals
  • At each event time, every subject in the risk set has a probability of being the one who had the event
  • Use these probabilities to determine the expected value of each covariate at that point in time
  • The residual for this covariate, for each subject having an event at this time point, is the difference between the subject's covariate value and this expected value (observed minus expected)
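Residuals for Cox (5)
• Schoenfeld residuals (cont)
  • For now, assume only one event at each time point
  • At event time ‘t’, let subject j* be the one who had the event
  • Consider any subject ‘j’ who is still in the risk set R(t) at ‘t’. From earlier classes, the probability that j is the subject who has the event at ‘t’ is:
    $p_j(t) = \dfrac{\exp(\beta' x_j)}{\sum_{k \in R(t)} \exp(\beta' x_k)}$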
Residuals for Cox (6)
• Schoenfeld residuals (cont)
  • Compute the expected value for covariate ‘i’ at time ‘t’:
    $E[x_i(t)] = \sum_{j \in R(t)} x_{ij}\, p_j(t)$
  • The Schoenfeld residual for covariate ‘i’ at time ‘t’ is:
    $r_i(t) = x_{ij^*} - E[x_i(t)]$
Residuals for Cox (7)
• Note that there is one residual for:
  • each covariate
  • at each event time
  • for the subject having the event
• This differs from linear regression, which has one residual for each subject
• If there is more than one event at a time point, compute one residual for each subject who had the event
• Under the PH assumption, the residuals have expected value 0 and show no systematic trend over time
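The calculation described on the last few slides fits in one short Python function. This is a minimal sketch: the function name and data layout are hypothetical, and tied events are handled in the simplest (Breslow-type) way, comparing every subject who had an event at ‘t’ with the same risk-set expectation; other tie-handling methods, such as Efron's, modify the expected value.

```python
import math

def schoenfeld_at_time(risk_set, event_idx, beta):
    """Schoenfeld residuals at one event time t.

    risk_set  : list of covariate vectors x_j for every subject in R(t)
    event_idx : positions (in risk_set) of the subject(s) with an event at t
    beta      : regression coefficients (same length as each x_j)
    Returns {position: [residual for covariate 0, covariate 1, ...]}.
    """
    # relative risk exp(beta'x_j) for each subject still at risk
    risk = [math.exp(sum(b * x for b, x in zip(beta, xj))) for xj in risk_set]
    total = sum(risk)
    probs = [r / total for r in risk]             # p_j(t)
    n_cov = len(beta)
    # E[x_i(t)]: probability-weighted mean of covariate i over the risk set
    expected = [sum(p * xj[i] for p, xj in zip(probs, risk_set)) for i in range(n_cov)]
    # observed minus expected, one vector per subject who had the event
    return {j: [risk_set[j][i] - expected[i] for i in range(n_cov)] for j in event_idx}

# Hypothetical tie: two of four subjects have the event at the same time t
res = schoenfeld_at_time([[1, 0.2], [0, 0.5], [1, 0.9], [0, 0.1]], [0, 3], [0.5, -1.0])
```

Each tied subject gets its own residual vector, matching the rule above of one residual per subject with an event.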
A worked example
True model: ln(HR) = x1 + 2x2
At time ‘t’:
• 3 subjects remain in the risk set
• Their covariates are given in the table
• Subject #2 has the event

Subject   x1    x2
1         0.3   0.4
2         0.4   0.2   ← Event
3         0.5   0.1
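Probability of the event for each subject, p_j(t) = exp(x1 + 2x2) / (sum of exp(x1 + 2x2) over the risk set): 3.004/7.243 = 0.415, 2.226/7.243 = 0.307, 2.014/7.243 = 0.278
Expected value of x1 at ‘t’ is: 0.415×0.3 + 0.307×0.4 + 0.278×0.5 = 0.386
Schoenfeld residual for x1 at ‘t’ is: 0.4 − 0.386 = 0.014
Expected value of x2 at ‘t’ is: 0.415×0.4 + 0.307×0.2 + 0.278×0.1 = 0.255
Schoenfeld residual for x2 at ‘t’ is: 0.2 − 0.255 = −0.055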
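The worked example can be reproduced in a few lines of Python, which avoids the rounding that creeps into a hand calculation. The numbers are those of the example (β = (1, 2), three subjects at risk, subject #2 has the event); nothing else is assumed.

```python
import math

# Worked example: true model ln(HR) = x1 + 2*x2
beta = (1.0, 2.0)
# (x1, x2) for the three subjects in the risk set at time t
risk_set = [(0.3, 0.4), (0.4, 0.2), (0.5, 0.1)]
event_subject = 1   # subject #2 (0-based index 1) has the event

weights = [math.exp(beta[0] * x1 + beta[1] * x2) for x1, x2 in risk_set]
probs = [w / sum(weights) for w in weights]                  # p_j(t)

expected = [sum(p * x[i] for p, x in zip(probs, risk_set)) for i in range(2)]
residuals = [risk_set[event_subject][i] - expected[i] for i in range(2)]

print([round(p, 3) for p in probs])      # [0.415, 0.307, 0.278]
print([round(e, 3) for e in expected])   # [0.386, 0.255]
print([round(r, 3) for r in residuals])  # [0.014, -0.055]
```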
The ‘STRATA’ Statement (1)
• PROC LIFETEST has a STRATA statement
  • Used to define two (or more) groups for the log-rank test
  • Produces one S(t) curve for each level of the stratification variable
  • Plot log(−log(S(t))) vs. ‘t’ to check PH (more later)
• PROC PHREG also has a STRATA statement
  • Useful for ‘adjusting out’ variables which do not meet the PH assumption but which aren't of interest to us
The ‘STRATA’ Statement (2)
• Effectively, allows a different baseline hazard in each stratum, while the regression coefficients for the covariates in the MODEL statement are common to all strata
• Use the BASELINE statement to estimate S(t) in each stratum
• Plot log(−log(S(t))) vs. ‘t’ in each stratum to check PH
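In symbols (the standard stratified Cox model, written with the notation of the residual slides), a subject with covariates x in stratum s has:

$$h(t \mid x, s) = h_{0s}(t)\,\exp(\beta' x), \qquad S(t \mid x, s) = S_{0s}(t)^{\exp(\beta' x)}$$

Because the baseline hazards h_{0s}(t) are left completely unrestricted, nothing forces the strata to have proportional hazards, which is what makes the per-stratum log(−log(S(t))) curves informative.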
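Testing PH (1)
• Three graphical methods and one modeling approach
Graphical method #1
• Plot: log(−log(S(t))) vs. log(t)
  • Can also plot against just ‘t’
• Consider two groups which satisfy the PH assumption:
  $h_1(t) = \mathrm{HR} \cdot h_0(t) \;\Rightarrow\; H_1(t) = \mathrm{HR} \cdot H_0(t)$
But: $S(t) = \exp(-H(t))$, so $-\log S_1(t) = \mathrm{HR} \cdot (-\log S_0(t))$
Take another log of both sides:
  $\log(-\log S_1(t)) = \log(\mathrm{HR}) + \log(-\log S_0(t))$
So, plotting the log-log curves of the 2 groups should show curves which are parallel, separated by a constant vertical distance of log(HR). Can plot against ‘t’ or ‘ln(t)’.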
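A quick numerical check of this derivation in Python. The Weibull baseline curve and the hazard ratio of 2 are arbitrary illustrative choices, not taken from any data set: if S1(t) = S0(t)^HR, the vertical gap between the two log-log curves is log(HR) at every t.

```python
import math

def loglog(s):
    """log(-log(S)) transform used for PH checking (requires 0 < S < 1)."""
    return math.log(-math.log(s))

def weibull_survival(t, scale=10.0, shape=1.5):
    """Hypothetical baseline survival curve S0(t) (Weibull, for illustration)."""
    return math.exp(-((t / scale) ** shape))

hr = 2.0                      # hypothetical hazard ratio, group 1 vs group 0
times = [1, 5, 10, 20]
gaps = []
for t in times:
    s0 = weibull_survival(t)
    s1 = s0 ** hr             # under PH: S1(t) = S0(t) ** HR
    gaps.append(loglog(s1) - loglog(s0))

print([round(g, 4) for g in gaps])   # [0.6931, 0.6931, 0.6931, 0.6931], i.e. log(2)
```

Swapping in a non-PH pair of curves (for example, two Weibull curves with different shape parameters) would make the gaps change with t.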
Testing PH (2)
• How to generate the curves?
Method #1
• Use the KM method in PROC LIFETEST
  • Use the STRATA statement to generate a separate curve for each level of the predictor
  • Produces one set of log(−log(S(t))) values for each level of the predictor
• Limitations
  • Cannot adjust for other factors
  • Hard to use with continuous predictors
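ODS graphics on;
/* s = S(t); ls = -log(S(t)) vs t; lls = log(-log(S(t))) vs log(t) */
proc lifetest data=njb1 plots=(s,ls,lls);
   time week*arrest(0);   /* arrest=0 means censored */
   strata fin;            /* one curve per level of FIN */
run;
ODS graphics off;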
[Figures: the LIFETEST plots requested above (S(t), H(t) and log(−log(S(t)))) for the FIN groups ‘Aid’ and ‘No aid’]
ODS graphics on;
proc lifetest data=njb1 plots=(s,ls,lls);
   time week*arrest(0);
   strata age (20,25);    /* three age groups, cut at 20 and 25 */
run;
ODS graphics off;
[Figures: LIFETEST plots for the three age groups (<20, 20-25, >25), including H(t)]
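To make Method #1 concrete, here is a bare-bones Python sketch of the calculation behind the log(−log(S(t))) plot: a Kaplan-Meier estimate within each group, followed by the transform. It is not how PROC LIFETEST is implemented, the function names are hypothetical, and, as with LIFETEST strata, nothing here adjusts for other covariates.

```python
import math
from collections import defaultdict

def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (event time, S(t)) at each distinct event time.

    events are 1 (event) or 0 (censored); subjects censored at t count as at risk at t.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:   # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

def loglog_by_group(times, events, groups):
    """log(-log(S(t))) curve for each group (KM within each group, no adjustment)."""
    by_group = defaultdict(lambda: ([], []))
    for t, e, g in zip(times, events, groups):
        by_group[g][0].append(t)
        by_group[g][1].append(e)
    out = {}
    for g, (ts, es) in by_group.items():
        # the transform is undefined where S(t) = 1 or S(t) = 0
        out[g] = [(t, math.log(-math.log(s))) for t, s in kaplan_meier(ts, es) if 0 < s < 1]
    return out

# Hypothetical toy data: four subjects per group
curves = loglog_by_group([5, 8, 12, 20, 3, 6, 9, 15], [1, 1, 0, 1, 1, 1, 1, 0],
                         ["aid"] * 4 + ["no aid"] * 4)
```

Plotting each group's (t, value) pairs against t or log(t) gives the curves whose parallelism is being judged.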
Testing PH (3)
Method #2
• Use PROC PHREG with the STRATA and BASELINE statements
  • Using BASELINE and STRATA produces an estimate of S(t), log(S(t)), etc. within each stratum, adjusted for the other variables in the model
  • The variable being tested for PH goes into the STRATA statement and not into the MODEL statement
• Plot the curves and examine them
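PROC PHREG DATA=allison.recid;
   class fin;
   MODEL week*arrest(0) = age prio / TIES=EFRON;
   STRATA fin;   /* FIN is the variable being checked: in STRATA, not in MODEL */
   /* one set of S(t), log(S(t)) and log(-log(S(t))) values per stratum of FIN */
   BASELINE OUT=a SURVIVAL=s LOGSURV=logs LOGLOGS=loglogs;
RUN;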
Testing PH (4)
• Generates one S(t) curve for each stratum level
• Use these to plot log(−log(S(t))) for each group
• Could you do the same thing using the COVARIATES= option, as discussed earlier?
  • NO!!
  • It gives S(t) for each level of the variable of interest
  • BUT based on a common baseline hazard
  • This means that the log(−log(S(t))) curves MUST be parallel: the model itself imposes PH, so the plot cannot reveal a violation
Testing PH (5)
• No need to use the COVARIATES= option, even for categorical variables
  • The issue is whether the lines are parallel
  • As long as the covariate values are the same for both groups, it ‘works’
• ODS Graphics can produce the S(t) and H(t) plots
• ODS Graphics cannot produce the log(−log(S(t))) plots directly
  • Instead, I use SAS/GRAPH
proc sort data=a;
   by fin week;
run;

symbol1 interpol=j color=red width=6;     /* join the points; one line per FIN level */
symbol2 interpol=j color=green width=6;
axis1 order=(0 to 1 by .1);               /* vertical axis for S(t) */
axis2 logbase=10 logstyle=expand order=(1,10,100);   /* log-scaled time axis */

/* S(t) and log(-log(S(t))) against t */
proc gplot data=a;
   plot (s)*week=fin / vaxis=axis1;
   plot (loglogs)*week=fin;
run;

/* gives the plot of log(-log(S(t))) against log(t) */
proc gplot data=a;
   plot (loglogs)*week=fin / haxis=axis2;
run;
Testing PH (6)
Graphical method #2
• Plot the Schoenfeld residuals against time
  • Do this for each variable in the model
  • Fit a LOESS curve to each graph
  • The curve should be flat (parallel to the x-axis)
  • Departures suggest that the PH assumption is violated
• Can handle continuous predictors
• Problem: interpreting the graphs is ‘tricky’
  • Hard to distinguish random fluctuation from non-PH effects
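[Figure: Schoenfeld residual plots, simulated data with 2 variables; dichotomous variable, PH is OK; continuous variable, PH is OK (panels: Dichot. var, Cont. var)]
[Figure: Schoenfeld residual plots, simulated data with 2 variables; dichotomous variable, PH is OK; continuous variable, PH is NOT OK (panels: Dichot. var, Cont. var)]
[Figure: Schoenfeld residual plots, simulated data with 2 variables; dichotomous variable, PH is NOT OK; continuous variable, PH is NOT OK (panels: Dichot. var, Cont. var)]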
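Because judging a LOESS curve by eye is ‘tricky’, it can help to attach a single number to each residual plot. The Python sketch below swaps in a much cruder summary than LOESS: the least-squares slope of the Schoenfeld residuals on event time. A slope near 0 is consistent with a flat curve. This is an illustration, not a formal test; the formal version of the idea (relating scaled Schoenfeld residuals to a function of time) is the Grambsch-Therneau test. The function name is hypothetical.

```python
def residual_time_slope(event_times, residuals):
    """Least-squares slope of Schoenfeld residuals regressed on event time.

    A slope near 0 is consistent with a flat smoothed curve; a clear positive
    or negative slope suggests the covariate's effect changes over time.
    Crude stand-in for eyeballing a LOESS curve, NOT a formal test.
    """
    n = len(event_times)
    mean_t = sum(event_times) / n
    mean_r = sum(residuals) / n
    sxy = sum((t - mean_t) * (r - mean_r) for t, r in zip(event_times, residuals))
    sxx = sum((t - mean_t) ** 2 for t in event_times)
    return sxy / sxx

# Hypothetical residuals for one covariate at four event times
flat = residual_time_slope([1, 2, 3, 4], [0.1, -0.1, -0.1, 0.1])
trending = residual_time_slope([1, 2, 3, 4], [-0.3, -0.1, 0.1, 0.3])
```

A linear slope will miss curved patterns that a LOESS fit would show, which is one reason the graphical check is still worth doing.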
Testing PH (7)
Graphical method #3
• Uses the ASSESS statement in PROC PHREG (new in SAS)
• Easy to produce but tricky to understand
• Based on martingale residuals and a ‘counting process’ approach to survival models
• Advanced ideas but, in overview:
  • We observe N(t): the number of events a subject has had by time ‘t’
  • N(t) splits into two parts:
    • a ‘process’ part (based on the model hazard)
    • a random part (the martingale)
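In the usual counting-process notation (a standard result, not derived in these notes), with Y_i(u) = 1 while subject i is at risk and H_0 the baseline cumulative hazard, the decomposition is:

$$N_i(t) = \underbrace{\int_0^t Y_i(u)\,\exp(\beta' x_i)\,dH_0(u)}_{\text{process (model hazard)}} + \underbrace{M_i(t)}_{\text{random (martingale)}}$$

Replacing the first term by its estimate at the end of follow-up gives the martingale residual $\hat{M}_i = \delta_i - \hat{H}_0(t_i)\exp(\hat{\beta}' x_i)$: the observed minus the expected number of events for subject i.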
Testing PH (8)
Graphical method #3 (cont)
• Plot cumulative sums of the martingale residuals against time (ASSESS PH), one plot per covariate
• Generate 1,000 simulations of this process under a model which meets the PH assumption (RESAMPLE)
  • For each, compute the cumulative martingale residual process
• Plot the observed curve together with simulated curves
• A Kolmogorov-type supremum test gives a p-value for whether the observed curve is ‘consistent’ with PH
ODS GRAPHICS ON;
ODS rtf;
PROC PHREG DATA=allison.recid;
   MODEL week*arrest(0) = fin age race wexp mar paro prio / TIES=EFRON;
   /* PH check for each covariate; RESAMPLE adds the supremum-test p-values */
   ASSESS PH / RESAMPLE;
RUN;
ODS rtf close;
ODS GRAPHICS OFF;
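The logic of the supremum test can be sketched in a few lines of Python, assuming the observed and simulated processes have already been evaluated on a common grid of time points. This shows only the final comparison step: it is not SAS's implementation, which involves more (for example, how the processes are standardized and simulated), and the function name is hypothetical.

```python
def supremum_p_value(observed, simulated):
    """Kolmogorov-type supremum test p-value (resampling version).

    observed  : the observed process evaluated on a grid of time points
    simulated : list of simulated processes on the same grid, each generated
                under the PH model (1,000 of them in the approach above)
    Returns the proportion of simulated paths whose maximum absolute value
    is at least as large as that of the observed path.
    """
    obs_sup = max(abs(v) for v in observed)
    exceed = sum(1 for path in simulated if max(abs(v) for v in path) >= obs_sup)
    return exceed / len(simulated)
```

A small p-value means the observed curve wanders further from zero than almost all curves generated under PH.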
Testing PH (9)
Analytical method
• If you have non-PH, that means:
  • The HR varies over time
  • This should be detectable with a time-varying covariate
• What covariate to use?
  • Can develop specific models
  • For screening purposes, use either:
    • x*t
    • x*log(t)
Testing PH (10)
Analytical method (cont)
• Can use either ‘t’ or ‘log(t)’
  • log(t) is usually preferred
  • ‘time’ can get very large, which can produce numerical problems
  • log(t) tends to avoid these numerical problems
• Process:
  • Define the time-varying covariate
    • Defining the variable with programming statements in the PROC step is easier (the product must be re-evaluated at every event time; computing age*log(week) once per subject in a DATA step would not do this)
  • Place it in the model and run it
  • Look for statistical significance of the time-varying variable
ODS GRAPHICS ON;
ODS rtf;
PROC PHREG DATA=allison.recid;
   MODEL week*arrest(0) = fin age race wexp mar paro prio aget / TIES=EFRON;
   /* programming statement: re-evaluated at each event time, so aget is time-varying */
   aget = age*log(week);
RUN;
ODS rtf close;
ODS GRAPHICS OFF;
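Once aget is in the model, the age effect is no longer a single HR: the log HR for a 1-year increase in age becomes b_age + b_aget·log(t). The Python sketch below shows this; the coefficient values are hypothetical, NOT estimates from the recidivism data. If b_aget = 0 (PH holds) the HR is the same at every t, which is why a statistically significant aget coefficient signals non-PH.

```python
import math

def age_hr_at(t, b_age, b_aget):
    """HR per 1-year increase in age at follow-up time t (t > 0), for a model
    containing age and aget = age*log(t): log HR(t) = b_age + b_aget*log(t)."""
    return math.exp(b_age + b_aget * math.log(t))

# Hypothetical coefficients, for illustration only
b_age, b_aget = -0.10, 0.02
for week in (1, 10, 52):
    print(week, round(age_hr_at(week, b_age, b_aget), 3))
```

At t = 1, log(t) = 0, so b_age alone is the log HR at the first week; the aget coefficient describes how much the log HR shifts per unit of log(time).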