360 likes | 515 Views
Flexible Survival Modeling in SAS. Presented to Nova Scotia Sas Users Group meeting, Feb 22, 2013 By Ron Dewar – Dalhousie University and Cancer Care Nova Scotia. beyond LIFETEST and PHREG. Flexible survival modeling: today’s objectives. Introduce time-to-event analysis
E N D
Flexible Survival Modeling in SAS Presented to Nova Scotia Sas Users Group meeting, Feb 22, 2013 By Ron Dewar – Dalhousie University and Cancer Care Nova Scotia beyond LIFETEST and PHREG
Flexible survival modeling:today’s objectives • Introduce time-to-event analysis • Survival analysis in sas • Survival modeling - some recent developments • Availability of software (stata, R) • Progress in converting to sas macros: stata stpm2, predict, rcsgen • The road ahead
Time-to-Event • Duration from start to event for each subject • Status at time (experienced event or still at risk) • Non-informative censoring: censoring does not change risk of eventually experiencing event • Covariate patterns among subjects at risk (at the time of an event) • Other aspects: left truncation (delayed entry)attained age as time scalefixed or time-varying covariatesstrata informative censoring (competing risks)
Lifetest, Phreg • Lifetestplotted output step/smoothed (survival, hazard)life table estimatessignificance testing between strata • Phreg (cox regression)left truncationcovariates (numeric and categorical), strata,Wald tests, LR tests, AIC, BICplotted output(survival, hazard as step functions)regression diagnostics
Lifetest, Phreg (cont.) • Covariate time-dependency is common and (maybe) more interesting • Hazard Ratio (HR) is difficult to interpret at an individual level • Interest in other survival functions – hazard, hazard differences, survival differences, relative survival, cumulative hazard, crude probability (of event) • Parametric representation of survival functions and time-dependency relations (out of sample prediction)
Royston – Parmar survival model2002 Statistics in Medicine 21: 2175–2197 • Cumulative hazard scale – equiv. to hazard scale if no time-dependencies in covariates • Restricted cubic splines for cum. hazard • Implemented in Stata 11 as Stpm2 (model fitting) and Predict (post-estimation) • Similar (?) implementation in R • ‘cure’ models, left truncation, covariate time-dependency, strata, relative survival (excess hazard), net survival, other scales
Development plan, resources • Stata code and academic papers • Email support from module author • Replication of all features may not be feasible (limited programming resources, obscure stata features) • Basic functionality that replicates results • SAS code that can be used, understood, modified, enhanced by collaborators
Proportional Hazards Model Breast cancer survival with proportional hazards Odspdf file = “&loc.\doc\breast2004\cox regression.pdf”; procphregdata = _events_; class sstage (ref = first); model years*_death_(0) = sstage age1 age2 age3; title 'Breast cancer survival, 2004 – 2010, Nova Scotia’; title2 ‘followed to end of 2011’ title3 'Proportional Hazards model with stage, age'; run; Odspdf close;
PHREG output … Number of Observations Read 5475Number of Observations Used 5475 Percent Total Event Censored Censored 5475 830 4645 84.84…
PHREG output Parameter Standard Parameter DF Estimate Error Chi-Square Pr > ChiSq sstage II 1 0.77412 0.10221 57.3652 <.0001 sstage III 1 1.72777 0.10591 266.1493 <.0001 sstage IV 1 3.23040 0.10948 870.6004 <.0001 age1 1 0.18070 0.12381 2.1300 0.1444 age2 1 0.84798 0.11689 52.6287 <.0001 age3 1 1.58877 0.11833 180.2785 <.0001
Outline of analysis stepsR-P model • Create ‘standard’ dataset: dataset name, key variable names are fixed (%sas_stset) • Describe and fit model (%sas_stpm2) • Estimate functions of fitted model parameters (%predict) • plot predicted functions (eg, with Proc SGPlot)
Fit above model using stpm2() • 3 stage and age binary variables, hazard scale, 5 df for baseline • Stata command line Stpm2 st2 st3 st4 age1 age2 age3, scale(hazard) df(5) • Sas macro call %Sas_stmp2( st2 st3 st4 age1 age2 age3, scale=hazard, df = 5);
Stpm2: baseline knots • Log cumulative hazard is parameterised with restricted cubic spline functions: array z(*) rcs1 - rcsm < m spline variables> ; array k(*) < m knot values >; z(1) = y * log time to event; do j = 2 to m ; phi = (k(m) - k(j) )/(k(m) - k(1) ); z(j) = ( y > k(j) )*( y - k(j) )**3 - phi*( y> k(1) )*( y - k(1) )**3 - (1 - phi)*( y > k(m) )*( y - k(m) )**3; end;
How many knots to use?Where to put them? • Too few: unrealistic representation of hazard • Too many: over-parameterisation. Unrealistic lumps and bumps • AIC and BIC may be helpful. LR tests are not. models are not nested • Some subject matter knowledge can be helpful • Choice of position probably doesn’t matter too much • ‘standard’ positions : centile points of cumulative distribution of times of non-censored events
Programming in %sas_stpm2() • Describe model in macro call • Internal macro strings drive subsequent processing: • compute spline functions • 1st derivatives • orthogonalisation • Define linear predictor, log likelihood • derive initial values for optimisation • fit model (maximum likelihood with proc nlmixed) • save results for later processing
Key macro strings • _null_ data step to build macro strings • call symput(‘macro_var’, string) • Linear predictor: independent variables parameters to be estimated • Log likelihood: function to be maximised linear predictor 1st derivative of linear predictor censor indicator
Linear predictor, Likelihood %Sas_stmp2( st2 st3 st4 age1 age2 age3, scale=hazard, df = 5); Linear predictor: &xb. cons*_cons + st2*_st2 + st3*_st3 + st4*_st4 + age1*_age1 + age2*_age2 + age3*_age3 + rcs1*_rcs1 + rcs2*_rcs2 + rcs3*_rcs3 + rcs4*_rcs4 + rcs5*_rcs5 1st derivative: &dxb. rcs1*_drcs1 + rcs2*_drcs2 + rcs3*_drcs3 + rcs4*_drcs4 + rcs5*_drcs5 Log likelihood: _death_*((&dxb.) + &xb.) – exp(&xb.)
Linear predictor Covariates: xb = ifc (&int., 'cons*_cons + ', ' '); do _i_ = 1 to &n_cov.; var = scan("&covar.",_i_); xb= trim(xb)||''||trim(var)||'*_'||trim(var)||' + ' ; end; Splines: do _i_ = 1 to &df.; xb = trim(xb) ||' rcs'||put(_i_,1.)||'*' ||'_rcs'||put(_i_,1.)|| ifc(_i_< &df.," + ", " "); end;
Example: colon cancer * data must be sorted by unique ID; procsort data = example; by pid; run; * set up a standardised survival dataset; %sas_stset(example, censor(0), years , pid ) ; * fit model of interest; %sas_stpm2(st1 st2 st3 nsex, scale=hazard, df=3); * predicted hazard, survival functions for IIb cases; %predict(haz, hazard, at = st2:1 zero); %predict(surv, survival, at = st2:1 zero);
Example: time-varying covariate %sas_stpm2(st1 st2 st3 nsex, scale=hazard, df=3, tvc= st2 st3 , dftvc= st2:21 ); * hazard prediction; %predict(haz1, hazard, at = st2:1 zero); * Hazard ratio prediction; %predict(hr1, hratio, hrnum= st2:1 zero, hrdenom= st2:0 zero);
Example 2: time to initiation of chronic opiod use in new cancer patients • t0: date of new cancer diagnosis • t1: date of initiation of chronic opiod pain medication • Censor: death, end of study period (or 365 days) • Tier: cancer type grouped by 5-year survival probability • Age: 10-year age groups • Other covariates: year of diagnosis, urban/rural, sex%sas_stpm2(t2 t3 a1 a2 a3 a4 a5 a6 nsex nurb y, scale=hazard, df=3, tvc= a2 a6, dftvc = 2);
Example 2 %macro int(row =, col =, sel = ); … %predict(surv, survival, at = &sel. nsex:1 nurb:1 y:3 zero); … %mend; And then, each for of the 21 combinations of 3 tiers X 7 age groups: … %int( row = t2, col = ag2, sel = a1:1 t2:1); %int( row = t3, col = ag2, sel = a1:1 t3:1); …
The road ahead • Documentation (!no, really?) • Consistency checking • Confidence intervals for cumulative functions (survival, cumulative hazard) • Out of sample estimation is inefficient • Other survival scales (cumulative log odds, probit…) • Cure models • Stratified analysis • Competing risks framework
The future • Use of an optimsation routine that permits analytic 1st and 2nd derivatives (gradient & hessian) more efficient prediction out of sample prediction • Re-code string modification in %predict()