Proportional Hazard Regression: Cox Proportional Hazards Modeling (PROC PHREG)
Consider the following data: Drug addicts are enrolled in two different residential treatment programs that differ in length (treat = 0 is short, treat = 1 is long). The patients are assigned to two different sites (site = 0 is site A, site = 1 is site B). Herco indicates heroin and cocaine use in the past three months (1 = heroin and cocaine use, 2 = heroin or cocaine use, 3 = neither heroin nor cocaine use). Other variables recorded were age at time of enrollment, ndrugtx (number of previous drug treatments), time until return to drug use, and censor (1 = return to drug use, 0 = censored).
Reading a SAS Data Set into SAS
You will need to save the data set uis_small to your computer. It is a SAS data set, and it can be read into a SAS program using the following code (making the appropriate adjustment to the file location):
To make sure the data set was read in properly, print out the first 10 observations:

The SAS System

Obs  ID  age  ndrugtx  treat  site  time  censor  herco
  1   1   39        1      1     0   188       1      3
  2   2   33        8      1     0    26       1      3
  3   3   33        3      1     0   207       1      2
  4   4   32        1      0     0   144       1      3
  5   5   24        5      1     0   551       0      2
  6   6   30        1      1     0    32       1      1
  7   7   39       34      1     0   459       1      3
  8   8   27        2      1     0    22       1      3
  9   9   40        3      1     0   210       1      2
 10  10   36        7      1     0   184       1      2
First compare survival rates for the three categorical variables of treat, site and herco:
The Wilcoxon and Log-Rank Tests (output not shown) are statistically significant (p = 0.0021, p = 0.0091, respectively). Treatment affects risk of returning to drug use.
The Wilcoxon and Log-Rank Tests (output not shown) are not statistically significant (p = 0.0779, p = 0.1240, respectively). There is no evidence that site affects risk of returning to drug use.
The Wilcoxon and Log-Rank Tests (output not shown) are not statistically significant (p = 0.2919, p = 0.1473, respectively). There is no evidence that herco affects risk of returning to drug use, although the survival curves cross early on, which may reduce the ability of these tests to detect a difference.
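To make the log-rank comparison above more concrete, here is a minimal Python sketch of a two-group log-rank test. It is not the SAS code used for these slides, and the toy times and event flags are hypothetical, not taken from uis_small. At each event time it compares the observed number of events in one group with the number expected if both groups shared the same hazard, then refers the resulting statistic to a chi-square distribution with 1 degree of freedom.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi_square, p_value) on 1 df.

    events_* use the same coding as censor in the slides:
    1 = event observed, 0 = censored.
    """
    # Pool both groups; the third field marks the group (0 = A, 1 = B).
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]

    event_times = sorted({t for t, e, _ in data if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)                  # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)     # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e == 1)       # events at t, both groups
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group A event count at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("variance is zero; the groups cannot be compared")
    chi_square = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value

# Hypothetical toy data (days until relapse), for illustration only.
short_times, short_events = [5, 8, 12, 20], [1, 1, 1, 0]
long_times, long_events = [10, 15, 25, 30], [1, 0, 1, 0]

chi_square, p_value = logrank_test(short_times, short_events, long_times, long_events)
print(f"chi-square = {chi_square:.3f}, p = {p_value:.4f}")
```

The Wilcoxon test reported alongside it uses the same observed-minus-expected differences but weights early event times more heavily, which is why the two p-values on each slide differ.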
Now examine if ndrugtx and age affect the risk of returning to drug use. Because these are continuous variables, we will use proportional hazard regression (PROC PHREG):
Interpreting the Output
• The proportional hazards regression model for these data with ndrugtx as the predictor is: λ(t) = λ₀(t)exp(0.02937*ndrugtx)
• The relative risk of a 1-unit increase in the number of previous drug treatments is: λ₀(t)exp(0.02937*1) / λ₀(t)exp(0.02937*0) = exp(0.02937 - 0) = exp(0.02937) = 1.03
• With each additional prior drug treatment, the risk of relapsing increases by about 3% (1.03 - 1.00).
• Notice that the SAS output also gives you this relative risk under “Hazard Ratio.”
• This term is significant (p < 0.0001), which indicates that the number of prior drug treatments affects risk of relapse.
Interpreting the Output: Age
• The proportional hazards regression model for these data with age as the predictor is: λ(t) = λ₀(t)exp(-0.01286*age)
• The relative risk of a 1-year increase in age at enrollment is: λ₀(t)exp(-0.01286*1) / λ₀(t)exp(-0.01286*0) = exp(-0.01286 - 0) = exp(-0.01286) = 0.987
• With each additional year of age at enrollment, the risk of relapsing decreases by 1.3% (1.00 - 0.987).
• Notice that the SAS output also gives you this relative risk under “Hazard Ratio.”
• Age is not significantly related to risk, however (p = 0.735).
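The hazard ratio arithmetic for ndrugtx and age can be checked with a few lines of Python. This is only a sketch of the calculation, using the two coefficients quoted on the slides; the function name and its delta argument are illustrative choices, not part of the SAS output.

```python
import math

def hazard_ratio(beta, delta=1.0):
    """Hazard ratio for a delta-unit increase in a predictor with coefficient beta."""
    return math.exp(beta * delta)

# Coefficients quoted on the slides (single-predictor models).
BETA_NDRUGTX = 0.02937
BETA_AGE = -0.01286

for name, beta in [("ndrugtx", BETA_NDRUGTX), ("age", BETA_AGE)]:
    hr = hazard_ratio(beta)
    percent_change = (hr - 1.0) * 100
    print(f"{name}: hazard ratio = {hr:.3f} ({percent_change:+.1f}% per unit)")

# The same function handles larger increments, e.g. a 10-year age difference.
print(f"age, 10 years: hazard ratio = {hazard_ratio(BETA_AGE, delta=10):.3f}")
```

Because the coefficient sits inside the exponent, a multi-unit change multiplies hazard ratios rather than adding percentages: ten years of age corresponds to 0.987 raised to the tenth power, not to a 13% reduction.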
The Full Model
First consider the full model with all of the predictor variables. As part of the PHREG procedure, we will create 2 new variables: herco2 and herco3. In addition, we will conduct a test labeled “herco” to determine whether both of these variables together are significant.
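The two new variables are indicator (dummy) variables for the three-level herco variable. The Python sketch below shows the coding, assuming herco = 1 is the reference category (an assumption suggested by the names herco2 and herco3, not confirmed by the slides). The herco values are those of the first ten observations printed earlier.

```python
def dummy_code_herco(herco):
    """Return (herco2, herco3) indicators, treating herco = 1 as the reference level."""
    if herco not in (1, 2, 3):
        raise ValueError(f"unexpected herco value: {herco}")
    herco2 = 1 if herco == 2 else 0   # heroin or cocaine use
    herco3 = 1 if herco == 3 else 0   # neither heroin nor cocaine use
    return herco2, herco3

# herco values of the first ten observations in the printed listing.
herco_values = [3, 3, 2, 3, 2, 1, 3, 3, 2, 2]

for obs, herco in enumerate(herco_values, start=1):
    herco2, herco3 = dummy_code_herco(herco)
    print(f"obs {obs:2d}: herco = {herco}  herco2 = {herco2}  herco3 = {herco3}")
```

Testing herco2 and herco3 together asks whether herco as a whole improves the model, which is why a single joint test is used rather than two separate ones.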
The test of our two new variables, herco2 and herco3, is non-significant (p = 0.1130), so we will drop herco from our model and run the refitted model.
All of the terms in the model are significant, except for site, which is approaching significance. Because we know from previous research that site is important, we will leave it in our model. We will now check six different interactions in our model, to see if any significant ones exist: ndrugtx*age, ndrugtx*treat, ndrugtx*site, age*treat, age*site, treat*site
Adding ndrugtx*age to the model (notice you can create the interaction term within the PHREG procedure):
Final Model Selection
Not only was the age*site interaction significant, but once we included it in our model, the site term also became statistically significant. The final proportional hazard model is:
λ(t) = λ₀(t)exp(β1*age + β2*ndrugtx + β3*treat + β4*site + β5*age*site)
λ(t) = λ₀(t)exp(-0.034*age + 0.036*ndrugtx - 0.267*treat - 1.246*site + 0.034*age*site)
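With an interaction in the model, the site effect is no longer a single number. The Python sketch below shows how the hazard ratio for site B versus site A changes with age. It assumes that the interaction term in the final model is age*site, as the text of this slide states, and it uses the rounded coefficients shown above, so the results are approximate.

```python
import math

# Rounded coefficients from the final model on this slide.
BETA_SITE = -1.246
BETA_AGE_SITE = 0.034   # assumed to be the age*site interaction coefficient

def site_hazard_ratio(age):
    """Hazard ratio for site B (site = 1) versus site A (site = 0) at a given age."""
    return math.exp(BETA_SITE + BETA_AGE_SITE * age)

# Age at which the two sites have the same hazard (hazard ratio of exactly 1).
crossover_age = -BETA_SITE / BETA_AGE_SITE

for age in (25, 30, 35, 40, 45):
    print(f"age {age}: site B vs site A hazard ratio = {site_hazard_ratio(age):.2f}")
print(f"hazard ratio equals 1 at about age {crossover_age:.1f}")
```

Under these rounded estimates, site B is associated with a lower hazard of relapse for younger patients and a higher hazard for older patients, which is why the site main effect alone cannot be interpreted in this model.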
Testing Proportionality
The Cox proportional hazards regression we have just conducted assumes that the hazards are proportional, that is, that the hazard ratio associated with each predictor is constant over time. To test this assumption, we add time-dependent variables (each predictor multiplied by a function of time) and test whether they are significant. If they are not significant, there is no evidence that the relative risk changes over time, and we can treat the hazards in our model as proportional.
Creating and testing time-dependent variables (on the log scale):
The test we labeled “test_proportionality” is not significant (p = 0.7309), which means that none of our time-dependent variables are significant. We can assume proportionality over time.
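The logic of this test can be illustrated with a short Python sketch. If a predictor x enters the model both as x and as x*log(t), the hazard ratio for a one-unit increase in x becomes exp(β + γ*log(t)), which is constant over time only when γ = 0. The β below is the rounded treat coefficient from the final model; the γ values are hypothetical and chosen only to show the two cases.

```python
import math

def hazard_ratio_at(t, beta, gamma):
    """Hazard ratio at time t for a one-unit increase in x,
    when the model contains x (coefficient beta) and x*log(t) (coefficient gamma)."""
    return math.exp(beta + gamma * math.log(t))

BETA_TREAT = -0.267          # rounded treat coefficient from the final model
GAMMA_PROPORTIONAL = 0.0     # hypothetical: no time dependence
GAMMA_VIOLATION = 0.10       # hypothetical: effect drifts with time

for days in (30, 180, 365):
    constant = hazard_ratio_at(days, BETA_TREAT, GAMMA_PROPORTIONAL)
    drifting = hazard_ratio_at(days, BETA_TREAT, GAMMA_VIOLATION)
    print(f"day {days:3d}: HR with gamma = 0: {constant:.3f}, with gamma = 0.10: {drifting:.3f}")
```

A non-significant joint test of the time-dependent terms is consistent with every γ being zero, which is the situation in the first column.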
If we cannot assume proportionality…
If the assumption of proportionality were not met, we could stratify on the variable whose hazards are not proportional. For example, if we found that treat violated the assumption, we could stratify on that variable: