HSRP 734: Advanced Statistical Methods, July 17, 2008
Objectives • Describe the Cox proportional hazards model and use it to describe and compare survival experiences • Use SAS to implement it
From Stratification to Modeling • What have we done so far? • Estimated the survival function with the minimum of assumptions • Compared the survival function of various groups using nonparametric tests • Similar to a contingency table analysis, the above tests are somewhat limited to simple stratifications
From Stratification to Modeling • Goal: extend survival analysis to an approach that allows for multiple covariates of mixed forms (i.e., continuous, ordinal, and nominal categorical) • We have two options for our expansion: • Model the survival function or time • Model the hazard function (which takes values from 0 to ∞)
Cox Proportional Hazards Model • We will model the hazard function • In the Cox proportional hazards model, we have a regression-based approach to survival analysis.
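What Are Proportional Hazards? • Two hazard functions $h_1(t)$ and $h_2(t)$ are proportional if $h_1(t) / h_2(t) = C$, i.e., $h_1(t) = C\,h_2(t)$, for all $t$ • The constant C does not depend on time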
Cox Proportional Hazards Model • Cox assumed this proportionality and proposed the following model: $h(t; X) = h_0(t)\exp(b_1X_1 + b_2X_2 + \cdots + b_pX_p)$, where $h_0(t)$ is the baseline hazard, which involves t but not X, and $\exp(\cdot)$ is the exponential function, which involves the X's but not t (as long as the X's are time-independent).
Cox Proportional Hazards Model • Hazard rate = baseline hazard rate x positive term that depends on a “score” • Score = linear function of explanatory factors • Note: Baseline hazard rate is the same for everyone • “Score” may be negative
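To make the "baseline hazard × exp(score)" structure concrete, here is a minimal Python sketch. The baseline hazard values, coefficients, and covariate values are hypothetical numbers chosen for illustration, not estimates from any data set. The point is that the ratio of two people's hazards comes out the same at every time point, even though each hazard changes with t.

```python
import math

def cox_hazard(baseline, x, b):
    """Hazard h(t; X) = h0(t) * exp(b1*X1 + ... + bp*Xp) at a single time point.

    baseline: value of the baseline hazard h0(t) at that time (assumed known here)
    x, b: equal-length sequences of covariate values and coefficients
    """
    score = sum(bi * xi for bi, xi in zip(b, x))  # linear "score"; may be negative
    return baseline * math.exp(score)            # exp(score) > 0, so the hazard stays positive

# Hypothetical baseline hazard at three times, for illustration only
h0 = {1: 0.02, 5: 0.05, 10: 0.10}
b = [0.5, -0.1]          # hypothetical coefficients
person_a = [1, 40]       # differs from person_b only in X1
person_b = [0, 40]

for t, h in h0.items():
    ratio = cox_hazard(h, person_a, b) / cox_hazard(h, person_b, b)
    print(t, round(ratio, 4))  # same ratio, exp(0.5), at every t
```

The baseline hazard cancels in the ratio, which is why only the b's are needed to compare two covariate patterns.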
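Cox Proportional Hazards Model • The Cox proportional hazards (PH) model assumes one particular form out of many possible ones. • We could use any function $g(X) > 0$ such that $h(t; X) = h_0(t)\,g(X)$; the Cox PH model takes $g(X) = \exp(b_1X_1 + \cdots + b_pX_p)$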
Cox Proportional Hazards Model • In the Cox PH model, we do not include an intercept term. This is because any intercept term could be incorporated into the baseline hazard.
Cox Proportional Hazards Model • The regression model for the hazard function (instantaneous incidence rate) as a function of p explanatory (X) variables is specified as follows: • log hazard: $\log h(t; X) = \log h_0(t) + b_1X_1 + b_2X_2 + \cdots + b_pX_p$ • hazard: $h(t; X) = h_0(t)\exp(b_1X_1 + b_2X_2 + \cdots + b_pX_p)$
Cox Proportional Hazards Model • Interpretation of $h_0(t)$: the baseline hazard (incidence) rate as a function of time • The baseline is the hazard when all X's are zero, so continuous variables often must be centered to make $h_0(t)$ interpretable
Cox Proportional Hazards Model • Interpretation of $e^{b_1}$: since $h(t; X_1 + 1) / h(t; X_1) = e^{b_1}$ when the other X's are held fixed, • $e^{b_1}$ is the relative hazard associated with a 1-unit change in $X_1$ (i.e., $X_1 + 1$ vs. $X_1$), holding the other X's constant, independent of time; or, in relative risk terms, • $e^{b_1}$ is the relative risk for $X_1 + 1$ vs. $X_1$, holding the other X's constant, independent of time • The other b's have similar interpretations
Cox Proportional Hazards Model • Note: $\exp(b_1X_1 + \cdots + b_pX_p)$ "multiplies" the baseline hazard $h_0(t)$ by the same amount regardless of the time t. This is therefore a "proportional hazards" model: the effect of any (fixed) X is the same at any time during follow-up
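Cox Proportional Hazards Model • Applying the formula relating S(t) to the cumulative hazard, $S(t) = \exp(-H(t))$, to the proportional hazards model gives $S(t; X) = \exp\left(-H_0(t)\,e^{b_1X_1 + \cdots + b_pX_p}\right) = S_0(t)^{\exp(b_1X_1 + \cdots + b_pX_p)}$, where $H_0(t)$ is the baseline cumulative hazard and $S_0(t) = \exp(-H_0(t))$ is the baseline survival function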
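A quick numerical check of the survival identity, as a minimal Python sketch. The baseline cumulative hazard and the score value are hypothetical; the check confirms that raising $S_0(t)$ to the power $\exp(\text{score})$ matches computing $\exp(-H_0(t)\,e^{\text{score}})$ directly.

```python
import math

def survival_given_x(s0_t, score):
    """S(t; X) = S0(t) ** exp(score), with S0(t) the baseline survival at time t."""
    return s0_t ** math.exp(score)

H0 = 0.3                  # hypothetical baseline cumulative hazard H0(t) at some t
s0 = math.exp(-H0)        # baseline survival S0(t) = exp(-H0(t))
score = 0.7               # hypothetical value of b1*X1 + ... + bp*Xp

# Direct route: H(t; X) = H0(t) * exp(score), then S(t; X) = exp(-H(t; X))
direct = math.exp(-H0 * math.exp(score))
print(abs(survival_given_x(s0, score) - direct) < 1e-12)  # True
```

A positive score lowers the survival curve relative to baseline, and a negative score raises it, because $0 < S_0(t) \le 1$.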
Cox Proportional Hazards Model • b is the focus, whereas $h_0(t)$ is a nuisance function • David Cox (1972) showed how to estimate b without having to assume a model for $h_0(t)$ • "Semi-parametric": • $h_0(t)$ is the baseline hazard, the "non-parametric" part of the model • $b_1, b_2, \ldots, b_p$ are the regression coefficients, the "parametric" part of the model • Think of estimating $h_0(t)$ with a step function • Let the number of steps get large; the resulting "partial likelihood" for b depends on b, not on $h_0(t)$
Partial likelihood • The likelihood function used in Cox PH models is called a partial likelihood • We use only the part of the likelihood function that contains the b's • It depends only on the ranks (ordering) of the event times, not on the actual time values.
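Partial likelihood • Let the survival times (times to failure) be $t_1 < t_2 < \cdots < t_k$ • Let the corresponding "risk sets" be $R_1, R_2, \ldots, R_k$, where $R_j$ is the list of persons at risk just before $t_j$ • Then the "partial likelihood" for b is $L(b) = \prod_{j=1}^{k} \frac{\exp(bX_{(j)})}{\sum_{i \in R_j} \exp(bX_i)}$, where $X_{(j)}$ is the covariate value of the person who fails at $t_j$ and $bX$ stands for $b_1X_1 + \cdots + b_pX_p$ (assumes no ties in event times) • To estimate b, find the values of the b's that maximize L(b).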
Partial likelihood • Why does the partial likelihood make sense? • Choose b so that the person who failed at each time was the most likely to fail, relative to the others who might have failed!
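The partial likelihood can be sketched directly from its definition. The following is a minimal pure-Python illustration for a single covariate, assuming no tied event times (as on the slides); the toy data are made up. It maximizes log L(b) with Newton-Raphson, one standard way to do it, not necessarily what any particular software uses internally.

```python
import math

def risk_set(times, tj):
    """Indices of subjects still at risk just before time tj (follow-up time >= tj)."""
    return [i for i, ti in enumerate(times) if ti >= tj]

def partial_loglik(b, times, events, x):
    """log L(b) for one covariate; events[j] = 1 for a failure, 0 for censoring. Assumes no ties."""
    ll = 0.0
    for j, (tj, dj) in enumerate(zip(times, events)):
        if dj == 1:  # only failures contribute a factor
            denom = sum(math.exp(b * x[i]) for i in risk_set(times, tj))
            ll += b * x[j] - math.log(denom)
    return ll

def fit_b(times, events, x, iters=25):
    """Maximize log L(b) by Newton-Raphson using the score and the observed information."""
    b = 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        for j, (tj, dj) in enumerate(zip(times, events)):
            if dj == 1:
                risk = risk_set(times, tj)
                w = [math.exp(b * x[i]) for i in risk]
                s0 = sum(w)
                mean = sum(wi * x[i] for wi, i in zip(w, risk)) / s0          # weighted mean of X in R_j
                mean_sq = sum(wi * x[i] ** 2 for wi, i in zip(w, risk)) / s0
                score += x[j] - mean                # observed minus "expected" X of the failure
                info += mean_sq - mean ** 2         # weighted variance of X in R_j
        b += score / info
    return b

# Made-up toy data: six subjects, one censored (events = 0), binary covariate
times = [2, 3, 5, 7, 8, 11]
events = [1, 1, 0, 1, 1, 1]
x = [1, 0, 1, 0, 0, 1]

b_hat = fit_b(times, events, x)
print(round(b_hat, 3))

# Only the ordering of the times matters: rescaling them leaves log L(b) unchanged
print(math.isclose(partial_loglik(b_hat, times, events, x),
                   partial_loglik(b_hat, [10 * t for t in times], events, x)))  # True
```

The last check illustrates the point that the partial likelihood depends on the ranks of the event times, not their actual values.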
Some General Comments/Thoughts • Similar to logistic regression, a simple function of the coefficient b, namely $e^{b}$, has a particularly nice interpretation • $e^{b}$ can be interpreted as a relative risk (risk ratio; more precisely, a hazard ratio) for a one-unit change in the predictor
Some General Comments/Thoughts • Using the common methods of estimation (maximizing the partial likelihood), it can be shown that the estimated regression parameters have an asymptotically normal distribution with mean equal to the true b and finite variance
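Some General Comments/Thoughts • Two important implications of asymptotic normality: • We can use the likelihood ratio, score, and Wald tests to make inferences from our data • We can use the usual method to construct a 95% confidence interval: $\hat b \pm 1.96\,\widehat{SE}(\hat b)$, then exponentiate the limits to obtain a confidence interval for the hazard ratio $e^{b}$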
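The Wald-type interval is built on the log-hazard scale and then exponentiated. A minimal Python sketch follows; the estimate and standard error are hypothetical placeholders for values that would be read from model output.

```python
import math

def hazard_ratio_ci(b_hat, se, z=1.96):
    """Wald-type CI: form b_hat +/- z*SE on the log-hazard scale, then exponentiate.

    b_hat and se would come from the fitted model; z = 1.96 gives a 95% interval.
    """
    return math.exp(b_hat), math.exp(b_hat - z * se), math.exp(b_hat + z * se)

# Hypothetical estimate and standard error, for illustration only
hr, lower, upper = hazard_ratio_ci(b_hat=-0.5, se=0.2)
print(round(hr, 3), round(lower, 3), round(upper, 3))
```

Because the exponential is monotone, the interval is symmetric on the log scale but not around the hazard ratio itself.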
Confidence Intervals • Instead of comparing a 49-year-old to a 50-year-old (a one-unit difference in age), what if we want the hazard ratio and confidence interval comparing a 49-year-old to a 59-year-old?
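Under the same asymptotic normality, the usual approach is to scale both the estimate and its standard error by the size of the difference, here 10 years, since $\mathrm{Var}(10\hat b) = 100\,\mathrm{Var}(\hat b)$ implies $SE(10\hat b) = 10\,SE(\hat b)$. Writing $\hat b_{\text{AGE}}$ for the estimated age coefficient (notation assumed here):

```latex
\widehat{HR}_{59 \text{ vs. } 49} = \exp\left(10\,\hat b_{\text{AGE}}\right),
\qquad
95\%\ \text{CI}: \ \exp\left(10\,\hat b_{\text{AGE}} \pm 1.96 \times 10\,\widehat{SE}(\hat b_{\text{AGE}})\right)
```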
Some General Comments/Thoughts • The Cox PH model is a regression model, so we can use the usual model-building tools (e.g., stepwise methods, or checking the linearity of a predictor via higher-order terms)
Two Examples • AML — one covariate • UIS — more than one covariate
Example 1: Cox PH model for AML data • Semi-parametric model for the hazard (incidence) rate for the AML data: $h_i(t) = h_0(t)\exp(bX_i)$, where $h_i(t)$ is the hazard for person i at week t, $h_0(t)$ is the hazard if $X_i = 0$ (not-maintained group), and $e^{b}$ is the multiplicative effect of $X_i = 1$ (maintained group)
Example 1: Cox PH model for AML data (cont’d) • $e^{\hat b}$ = 0.444: relative rate of AML relapse, maintained vs. not maintained; 95% CI: (0.16, 1.23) • 1/0.444 = 2.25: relative rate of AML relapse, not maintained vs. maintained; 95% CI: (1/1.23, 1/0.16) = (0.81, 6.26)
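Reversing the comparison group amounts to taking reciprocals of the hazard ratio and of both confidence limits, which also swaps the lower and upper limits. A minimal Python sketch using the AML numbers from the slide (the helper name is hypothetical):

```python
def flip_reference(hr, lower, upper):
    """Hazard ratio and CI for the reverse comparison: reciprocals, with the limits swapped."""
    return 1 / hr, 1 / upper, 1 / lower

hr, lo, hi = flip_reference(0.444, 0.16, 1.23)
print(round(hr, 2), round(lo, 2), round(hi, 2))  # 2.25 0.81 6.25
```

With the rounded limit 0.16 the upper bound comes out as 6.25; the slide's 6.26 is consistent with an unrounded lower limit slightly below 0.16.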
Example 2: Cox PH model for UIS data • The variables from the UIS study are described in Table 1.3 of Hosmer, D.W. and Lemeshow, S. (1998) Applied Survival Analysis: Regression Modeling of Time to Event Data, John Wiley and Sons Inc., New York, NY • This data set is available at http://www-unix.oit.umass.edu/~statdata (select "datasets" and then "survival analysis")
Example 2: Cox PH model for UIS data (cont’d) • We use a Cox PH model to compare two treatment randomization assignments, controlling for several covariates • Compare the long treatment randomization assignment with the short treatment randomization assignment • Use time to drug relapse as the response variable • The time variable is the time from admission date to drug relapse, or to censoring due to the end of the study or loss to follow-up (the definition of the variable CENSOR in the data set is questionable; however, we still use it for demonstration) • Control for other risk factors in making the comparison
The Description of UIS data • Data are in the file uissurv.dat (n = 628)
• ID: Identification Code. Values: 1 - 628
• AGE: Age at Enrollment. Values: years
• BECKTOTA: Beck Depression Score. Values: 0.000 - 54.000
• HERCOC: Heroin/Cocaine Use During 3 Months Prior to Admission. Codes: 1 = Heroin & Cocaine, 2 = Heroin Only, 3 = Cocaine Only, 4 = Neither Heroin nor Cocaine
• IVHX: History of IV Drug Use. Codes: 1 = Never, 2 = Previous, 3 = Recent
The Description of UIS data (cont’d)
• NDRUGTX: Number of Prior Drug Treatments. Values: 0 - 40
• RACE: Subject's Race. Codes: 0 = White, 1 = Non-White
• TREAT: Treatment Randomization Assignment. Codes: 0 = Short, 1 = Long
• SITE: Treatment Site. Codes: 0 = A, 1 = B
• LOS: Length of Stay in Treatment (Admission Date to Exit Date). Values: days
• TIME: Time to Drug Relapse (Measured from Admission Date). Values: days
• CENSOR: Event for Treating Lost to Follow-Up as Returned to Drugs. Codes: 1 = Returned to Drugs or Lost to Follow-Up, 0 = Otherwise
Example 2: Cox PH model for UIS data (cont’d) • Model 1: $\log h(t) = \log h_0(t) + b_1\,\text{TREAT}$ • Model 2: $\log h(t) = \log h_0(t) + b_1\,\text{TREAT} + b_2\,\text{AGE} + b_3\,\text{RACE} + b_4\,\text{BECKTOTA} + b_5\,\text{HERCOC.1} + b_6\,\text{HERCOC.2} + b_7\,\text{HERCOC.3}$, where HERCOC.1 = 1 if HERCOC = 1 and 0 otherwise; HERCOC.2 = 1 if HERCOC = 2 and 0 otherwise; HERCOC.3 = 1 if HERCOC = 3 and 0 otherwise (so HERCOC = 4, neither heroin nor cocaine, is the reference category)
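The indicator coding for HERCOC can be sketched as follows. This is a minimal Python illustration; the helper name is hypothetical, and the HERCOC.k labels follow the slide.

```python
def hercoc_dummies(hercoc):
    """Return HERCOC.1 - HERCOC.3 indicators; HERCOC = 4 (neither) is the reference, coded all zeros."""
    if hercoc not in (1, 2, 3, 4):
        raise ValueError(f"unexpected HERCOC code: {hercoc}")
    return {f"HERCOC.{k}": int(hercoc == k) for k in (1, 2, 3)}

for code in (1, 2, 3, 4):
    print(code, hercoc_dummies(code))
```

Only three indicators are used for four categories. Because the Cox model has no intercept, a fourth indicator would sum with the other three to a constant 1, which is already absorbed into the baseline hazard, so its coefficient would not be estimable.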
Example 2: Cox PH model for UIS data (cont’d) • What is the relative risk of drug relapse for the long treatment group compared to the short treatment group, adjusting for age and other risk factors? • $e^{-0.2273}$ = 0.797: about a 20% reduction in the risk of drug relapse for patients in the long treatment randomization assignment compared with patients in the short treatment randomization assignment.
Example 2: Cox PH model for UIS data (cont’d) • What is the interpretation of each coefficient? • AGE: controlling for treatment assignment and other risk factors, the risk of drug relapse, as estimated from the Cox model, is multiplied by 0.98 for each additional year of age (about 2% lower per year) • RACE: controlling for treatment assignment and other risk factors, the risk of drug relapse for non-white patients is 0.78 times that for white patients (about 22% lower) • BECKTOTA: controlling for treatment assignment and other risk factors, the risk of drug relapse is multiplied by 1.01 per one-unit increase in Beck Depression score (about 1% higher)
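Statements such as "about 20% lower" come from converting a coefficient into a percent change in the hazard, $100\,(e^{c\,b} - 1)$ for a c-unit increase in X. A minimal Python sketch (the helper name is hypothetical) using the TREAT coefficient from the slide and the AGE coefficient (-0.0185) reported later in this example:

```python
import math

def percent_change_in_hazard(b, units=1.0):
    """Percent change in the hazard for a `units`-unit increase in X: 100 * (exp(units * b) - 1)."""
    return 100 * (math.exp(units * b) - 1)

print(round(percent_change_in_hazard(-0.2273), 1))            # TREAT, long vs. short: -20.3
print(round(percent_change_in_hazard(-0.0185), 1))            # AGE, per year: -1.8
print(round(percent_change_in_hazard(-0.0185, units=10), 1))  # AGE, per 10 years: -16.9
```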
Example 2: Cox PH model for UIS data (cont’d) • HERCOC.1: controlling for treatment assignment and other risk factors, the risk of drug relapse for patients who used both heroin and cocaine is 1.217 times that for patients who used neither heroin nor cocaine; however, this hazard ratio is not statistically different from 1 • HERCOC.2: your turn! • HERCOC.3: your turn!
Example 2: Cox PH model for UIS data (cont’d) • You must think about another way to handle the variable HERCOC, since none of its dummy variables is significant. • How would you do it? • I chose the covariates arbitrarily for this demonstration. To find the best model in earnest, you need to go through a model-selection process.
Example 2: Cox PH model for UIS data (cont’d) • What is the relative risk of drug relapse for (A) a 45-year-old patient assigned to short treatment vs. (B) a 75-year-old patient assigned to long treatment?
Example 2: Cox PH model for UIS data (cont’d) • Here "const" is $\log h_0(t)$ plus the terms for the other covariates, which are assumed equal for (A) and (B) • Log hazard for (A) = const + 0 × (-0.2273) + 45 × (-0.0185) = const - 0.8325 • Log hazard for (B) = const + 1 × (-0.2273) + 75 × (-0.0185) = const - 1.6148 • Difference in log hazards, (A) vs. (B): (const - 0.8325) - (const - 1.6148) = 0.7823 • Relative risk, (A) vs. (B): $e^{0.7823}$ = 2.19: higher risk for the younger patient assigned to short treatment than for the older patient assigned to long treatment.
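The same contrast can be computed by evaluating only the part of the linear predictor that differs between the two patients. A minimal Python sketch using the TREAT and AGE coefficients shown on the slide; the helper name is hypothetical, and the other covariates are assumed equal for (A) and (B), so they cancel along with $\log h_0(t)$.

```python
import math

# Coefficients used on the slide (Model 2)
B_TREAT = -0.2273
B_AGE = -0.0185

def varying_part(treat, age):
    """Terms of the log hazard that differ between the two patients; everything else cancels."""
    return treat * B_TREAT + age * B_AGE

lp_a = varying_part(treat=0, age=45)   # (A) short treatment, age 45
lp_b = varying_part(treat=1, age=75)   # (B) long treatment, age 75
rr = math.exp(lp_a - lp_b)             # relative risk, (A) vs. (B)
print(round(lp_a - lp_b, 4), round(rr, 2))  # 0.7823 2.19
```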
Example 2: Cox PH model for UIS data (cont’d) • How does the risk of a 70-year-old patient compare with that of a 60-year-old patient, assuming treatment and other risk factors are the same? • The estimated difference in log hazards for two patients whose ages differ by 10 years, holding other covariates fixed, is $10 \times \hat b_{\text{AGE}} = 10 \times (-0.0185) = -0.185$; RR = $e^{-0.185}$ = 0.83, so being 10 years older is associated with about a 17% lower risk of drug relapse • How would you determine whether age modifies the relative risk of drug relapse for long vs. short treatment assignment?