
Lecture 20



  1. Lecture 20 Comparing groups Cox PHM

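  2. Comparing two or more samples • ANOVA-type approach: test $H_0: h_1(t) = h_2(t) = \cdots = h_K(t)$ for all $t \le \tau$ against $H_A$: at least one $h_j(t)$ differs for some $t \le \tau$, where τ is the largest time for which all groups have at least one subject at risk • Data can be right-censored for the tests we will discuss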

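  3. Notation • $t_1 < t_2 < \cdots < t_D$ are the distinct death times in all samples being compared • At time $t_i$, let $d_{ij}$ be the number of deaths in group $j$ out of $Y_{ij}$ individuals at risk ($j = 1, \dots, K$) • Define $d_i = \sum_{j=1}^{K} d_{ij}$ and $Y_i = \sum_{j=1}^{K} Y_{ij}$, the total number of deaths and the total number at risk at time $t_i$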

  4. Log-Rank Test Rationale • Compare the estimated hazard rate of the jth population under the null and alternative hypotheses • If the null is true, the pooled estimate of h(t) at $t_i$, $d_i/Y_i$, should be an estimator for $h_j(t)$; without that restriction, $h_j(t_i)$ is estimated by $d_{ij}/Y_{ij}$

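  5. Applying the Test • For $j = 1, \dots, K$, compute $Z_j(\tau) = \sum_{i=1}^{D} W(t_i)\left[d_{ij} - Y_{ij}\,\frac{d_i}{Y_i}\right]$ (pooled totals $d_i$, $Y_i$ at $t_i$), with $W(t_i) = 1$ for the log-rank test, so $Z_j(\tau)$ is observed minus expected deaths in group $j$ • If all $Z_j(\tau)$'s are close to zero, then there is little evidence to reject the null • Variances and covariances: $\hat\sigma_{jg} = \sum_{i=1}^{D} W(t_i)^2\,\frac{Y_{ij}}{Y_i}\left(\mathbb{1}\{j = g\} - \frac{Y_{ig}}{Y_i}\right)\frac{Y_i - d_i}{Y_i - 1}\,d_i$ • Because $\sum_j Z_j(\tau) = 0$, drop one group: $\chi^2 = (Z_1(\tau), \dots, Z_{K-1}(\tau))\,\hat\Sigma^{-1}\,(Z_1(\tau), \dots, Z_{K-1}(\tau))^\top$ is approximately chi-square with $K-1$ degrees of freedom under $H_0$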
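A minimal sketch of the K-sample log-rank test in R, assuming the survival package is installed. `logrank_k()` is a hypothetical helper written for illustration (it is not part of the slides or the package). At each distinct death time it accumulates observed minus expected deaths, $d_{ij} - Y_{ij} d_i / Y_i$ (weight $W(t_i) = 1$), together with the hypergeometric covariance, and then compares the resulting statistic with `survdiff()` on the package's built-in `veteran` data (four cell types).

```r
library(survival)

# Hypothetical helper: K-sample log-rank test built from d_ij, Y_ij, d_i, Y_i
logrank_k <- function(time, status, group) {
  group <- factor(group)
  K <- nlevels(group)
  death_times <- sort(unique(time[status == 1]))
  Z <- numeric(K)              # Z_j(tau): observed minus expected deaths in group j
  V <- matrix(0, K, K)         # estimated covariance matrix of the Z_j
  for (ti in death_times) {
    Yij <- as.numeric(table(group[time >= ti]))                # at risk at t_i
    dij <- as.numeric(table(group[time == ti & status == 1]))  # deaths at t_i
    Yi <- sum(Yij)
    di <- sum(dij)
    Z <- Z + (dij - Yij * di / Yi)                             # weight W(t_i) = 1
    if (Yi > 1) {
      p <- Yij / Yi
      V <- V + di * (Yi - di) / (Yi - 1) * (diag(p) - outer(p, p))
    }
  }
  # The Z_j sum to zero, so V is singular: drop the last group before inverting
  k <- seq_len(K - 1)
  chisq <- drop(Z[k] %*% solve(V[k, k, drop = FALSE], Z[k]))
  list(Z = Z, chisq = chisq, df = K - 1,
       p.value = pchisq(chisq, df = K - 1, lower.tail = FALSE))
}

# Compare with survdiff() on the veteran data
mine <- logrank_k(veteran$time, veteran$status, veteran$celltype)
ref  <- survdiff(Surv(time, status) ~ celltype, data = veteran)
c(mine = mine$chisq, survdiff = ref$chisq)
```

Dropping a different group gives the same statistic, because the $Z_j$ sum to zero and the rows of the covariance matrix do too.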

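  6. Others? • LOTS! They differ in the weight $W(t_i)$ given to the observed-minus-expected deaths at each death time (log-rank: $W(t_i) = 1$) • Gehan test ($W(t_i) = Y_i$, which emphasizes early times) • Fleming-Harrington ($W(t_i) = \hat S(t_{i-1})^p\,[1 - \hat S(t_{i-1})]^q$, with $\hat S$ the pooled Kaplan-Meier estimate) • Not all are available in all software • Worth trying a few in each situation to compare inferences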
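A short sketch of trying more than one test in R, assuming the survival package and its built-in `lung` data (comparison by `sex`). `survdiff()` implements the G-rho family with weights $S(t)^\rho$, a Fleming-Harrington special case with $q = 0$: `rho = 0` gives the log-rank test and `rho = 1` the Peto & Peto modification of the Gehan-Wilcoxon test. The helper `p_value()` is a hypothetical convenience function, since the computation from the stored chi-square statistic is done by hand here.

```r
library(survival)

# Same two-group comparison with two members of the G-rho family
lr <- survdiff(Surv(time, status) ~ sex, data = lung, rho = 0)  # log-rank
pp <- survdiff(Surv(time, status) ~ sex, data = lung, rho = 1)  # Peto & Peto

# Hypothetical helper: p-value from the stored chi-square statistic,
# with df = number of groups - 1
p_value <- function(sd) pchisq(sd$chisq, df = length(sd$n) - 1, lower.tail = FALSE)
round(c(logrank = p_value(lr), peto_peto = p_value(pp)), 4)
```

The same helper works for the `test1`, `test2` and `test3` objects in the survdiff slides, since each is a `survdiff` object.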

  7. 2+ samples • Let’s look at a prostate cancer dataset • Prostate cancer clinical trial (TAX327) • 3 treatment groups (docetaxel every 3 weeks, weekly docetaxel, mitoxantrone every 3 weeks) • 5 PSA doubling time categories • Outcome: overall survival

  8. TAX327 results

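  9. R: survdiff

```r
library(survival)
#################################
# test for differences by trt grp
# st is the survival object Surv(survtime, died) (see slide 36)
plot(survfit(st ~ trt), mark.time = FALSE, col = c(1, 2, 3))
test1 <- survdiff(st ~ trt)                        # all 3 groups (2 df)
test2 <- survdiff(st ~ factor(trt, exclude = 3))   # groups 1 vs 2: level 3 becomes NA and is dropped
test3 <- survdiff(st[trt < 3] ~ trt[trt < 3])      # the same comparison by subsetting
```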

  10. PSADT categories

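  11. R: survdiff

```r
table(psadt)   # subjects per PSA doubling-time category (coded 0-4)
plot(survfit(st ~ psadt), mark.time = FALSE, col = 1:5, lwd = rep(2, 5))
legend(50, 1, as.character(0:4), lty = rep(1, 5), col = 1:5, lwd = rep(2, 5))
test1 <- survdiff(st ~ psadt)                                                # all 5 categories
test2 <- survdiff(st[psadt < 3 & psadt > 0] ~ psadt[psadt < 3 & psadt > 0])  # categories 1 and 2
test3 <- survdiff(st[psadt > 2] ~ psadt[psadt > 2])                          # categories 3 and 4
```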

  12. Caveat • Note that we are testing for an average difference in hazards over time (consider the log-rank test specifically) • What if the hazards ‘cross’? • There could be a significant difference before some time t and another significant difference after t: but what if the directions differ? Differences in opposite directions can cancel in the average

  13. What about all those differences in our prostate cancer KM curves? • Not much evidence of crossing • If the curves don't overlap (cross), the tests will be fairly consistent • Log-rank is most appropriate (most powerful) under ‘proportional hazards’

  14. Example • Klein & Moeschberger 1.4 • Kidney infection data • Two groups: • patients with percutaneous placement of catheters (N=76) • patients with surgical placement of catheters (N=43)

  15. Kaplan-Meier curves

  16. Log-rank

  17. Comparisons (p-values): 0.11, 0.96, 0.53, 0.24, 0.26, 0.002, 0.24, 0.002, 0.002, 0.004

  18. Why such large differences?

  19. Notice the differences! • Different tests lead to different inferences • Need to be sure that you are testing what you think you are testing • Check: • look at the hazards? • do they cross? • Problem: • estimating hazards is messy and imprecise • recall: $h(t) = \frac{d}{dt}H(t)$, so estimating the hazard means differentiating an estimated cumulative hazard, which is a step function
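A small sketch of why raw hazard estimates are messy, assuming the survival package and its built-in `veteran` data (times in days). It builds the Nelson-Aalen cumulative hazard $\hat H(t) = \sum_{t_i \le t} d_i / Y_i$ from the risk-set counts in a `survfit()` object, then takes a crude "derivative" by dividing each jump by the gap since the previous death time. This crude estimator is an illustrative choice, not a method from the slides; in practice hazards are usually smoothed before being compared.

```r
library(survival)

# Nelson-Aalen cumulative hazard from the survfit() risk-set counts
km  <- survfit(Surv(time, status) ~ 1, data = veteran)
ev  <- km$n.event > 0
t_i <- km$time[ev]                                # distinct death times
H   <- cumsum(km$n.event[ev] / km$n.risk[ev])     # H(t_i) = sum of d_i / Y_i

# Crude hazard: jump in H at t_i divided by the gap since the previous death time
h_crude <- diff(c(0, H)) / diff(c(0, t_i))

plot(t_i, h_crude, type = "h", xlab = "time (days)", ylab = "crude hazard")
lines(lowess(t_i, h_crude), col = 2)              # a rough smooth for comparison
```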

  20. Misconception • ‘Whether the survival curves cross tells you whether the log-rank test is appropriate’ • Not true: • whether the survival curves cross within the observed range depends on censoring and study length • what if they would cross, but the observed range of t isn't long enough? • Consider: • survival curves cross ⇒ hazards cross • hazards cross ⇏ survival curves cross (they may or may not) • Solution? • test in regions of t • before and after the crossing point, chosen by looking at the hazards • some tests allow for crossing hazards (Yang and Prentice 2005)
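A minimal sketch of testing in regions of t, assuming the survival package and its built-in `lung` data (comparison by `sex`). The split point `t_star = 365` days is a hypothetical value for illustration; the slide suggests choosing it by looking at the estimated hazards. Before `t_star`, everyone still alive is censored at `t_star`; after `t_star`, only subjects still at risk are kept and the clock is restarted, which leaves the risk sets after `t_star` unchanged.

```r
library(survival)

t_star <- 365                                           # hypothetical split point (days)
dat <- transform(lung, dead = as.integer(status == 2))  # lung codes death as 2

# Region 1, [0, t_star]: censor everyone still alive at t_star
early <- transform(dat,
                   time1 = pmin(time, t_star),
                   dead1 = as.integer(dead == 1 & time <= t_star))
test_early <- survdiff(Surv(time1, dead1) ~ sex, data = early)

# Region 2, (t_star, Inf): keep subjects at risk after t_star, restart the clock
late <- subset(dat, time > t_star)
late$time2 <- late$time - t_star
test_late <- survdiff(Surv(time2, dead) ~ sex, data = late)

# Direction of the difference in each region: observed - expected deaths by sex
rbind(early = test_early$obs - test_early$exp,
      late  = test_late$obs - test_late$exp)
```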

  21. Cox Proportional Hazards Model • Names • Cox regression • semi-parametric proportional hazards model • proportional hazards model • multiplicative hazards model • When? • 1972 (D. R. Cox) • Why? • allows adjustment for covariates (continuous or categorical) in a survival setting • allows prediction of survival based on a set of covariates • Analogous to linear and logistic regression in many ways

  22. Cox PHM Notation • Data on n individuals: • $T_j$: time on study for individual j • $\delta_j$: event indicator for individual j (1 = event, 0 = censored) • $Z_j$: vector of covariates for individual j • More complicated: $Z_j(t)$ • covariates can be time dependent • they may change with time/age

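  23. Basic Model • For a Cox model with just one covariate: $h(t \mid Z) = h_0(t)\exp(\beta Z)$ • With p covariates: $h(t \mid \mathbf{Z}) = h_0(t)\exp(\boldsymbol\beta^\top \mathbf{Z}) = h_0(t)\exp\left(\sum_{k=1}^{p} \beta_k Z_k\right)$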

  24. Comments on basic model • $h_0(t)$: • arbitrary baseline hazard rate (the hazard for an individual with all covariates equal to zero) • notice that it varies with t • β: • regression coefficient (vector) • interpretation: a log hazard ratio • Semi-parametric form • non-parametric baseline hazard • parametric form assumed only for the covariate effects

  25. Linear model formulation • Usual formulation: $\log\dfrac{h(t \mid \mathbf{Z})}{h_0(t)} = \beta_1 Z_1 + \cdots + \beta_p Z_p$ • Coding of covariates is similar to linear and logistic regression (and other generalized linear models)

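  26. Why “proportional”? • Hazard ratio for individuals with covariates $Z$ and $Z^*$: $\dfrac{h(t \mid Z)}{h(t \mid Z^*)} = \dfrac{h_0(t)e^{\beta Z}}{h_0(t)e^{\beta Z^*}} = \exp\{\beta(Z - Z^*)\}$ • Does not depend on t (i.e., it is constant over time), because $h_0(t)$ cancels • So the hazards are proportional (a constant multiplicative factor) • Also referred to (sometimes) as the relative risk.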

  27. Simple example • One covariate: z = 1 for new treatment, z = 0 for standard treatment • Hazard ratio = exp(β) • Interpretation: exp(β) is the ratio of the risk (hazard) of the event in the new treatment group to that in the standard treatment group: at any point in time, the risk of the event in the new treatment group is exp(β) times the risk in the standard treatment group
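A simulation sketch, assuming the survival package, showing that `coxph()` recovers exp(β) without modeling the baseline hazard. All numbers are hypothetical choices for illustration: true hazard ratio 0.5, a Weibull baseline $h_0(t) = \lambda k t^{k-1}$ with $\lambda = 0.01$, $k = 1.5$ (so the hazard increases with t), and uniform censoring. Under proportional hazards $H(t \mid z) = \lambda t^k e^{\beta z}$, so event times can be drawn as $T = (E / (\lambda e^{\beta z}))^{1/k}$ with $E \sim \text{Exp}(1)$.

```r
library(survival)

set.seed(1)
n    <- 2000
z    <- rbinom(n, 1, 0.5)          # 1 = new treatment, 0 = standard
beta <- log(0.5)                   # hypothetical true hazard ratio exp(beta) = 0.5

# Weibull baseline: H(t | z) = lambda * t^k * exp(beta * z), inverted at an Exp(1) draw
lambda <- 0.01
k      <- 1.5
event_time <- (rexp(n) / (lambda * exp(beta * z)))^(1 / k)
cens_time  <- runif(n, 0, 60)      # independent censoring
time   <- pmin(event_time, cens_time)
status <- as.integer(event_time <= cens_time)

fit <- coxph(Surv(time, status) ~ z)
exp(coef(fit))                     # estimated hazard ratio, close to 0.5
```

The baseline hazard changes with t, yet the fitted hazard ratio is a single constant: that is the proportional hazards assumption at work.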

  28. Hazard Ratios • Assumption: “Proportional hazards” • The relative risk does not depend on time • That is, “the relative risk is constant over time” • But that is still vague… • Hypothetical example: assume the hazard ratio is 0.5 • Patients in the new therapy group are at half the risk of death of those in the standard treatment group, at any given point in time • Hazard function: $h(t) = \lim_{\Delta t \to 0} P(t \le T < t + \Delta t \mid T \ge t)/\Delta t$, roughly P(die at time t | survived to time t) per unit time

  29. Hazard Ratios • $\text{Hazard ratio} = \dfrac{\text{hazard function for New}}{\text{hazard function for Std}} = \dfrac{h_{\text{New}}(t)}{h_{\text{Std}}(t)}$ • Makes the assumption that this ratio is constant over time

  30. Interpretation Again • For any fixed point in time, individuals in the new treatment group are at half the risk of death of the standard treatment group.

  31. A single hazard ratio is not always a valid summary… Hazard Ratio = 0.71

  32. Refresher of coding covariates • This should be nothing new • Two kinds of ‘independent’ variables • quantitative • qualitative • Quantitative are continuous • need to determine scale • units • transformation? • Qualitative are generally categorical • ordered • nominal • coding affects the interpretation
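A short sketch of how coding choices play out in `coxph()`, assuming the survival package and its built-in `veteran` data. The Karnofsky score `karno` is a quantitative covariate whose units can be changed (here per 10 points), and `celltype` is a nominal covariate with 4 levels. Rescaling or changing the reference level changes the coefficients and their interpretation, but not the fitted model.

```r
library(survival)

# Quantitative: Karnofsky score on its original scale and per 10 points
fit1  <- coxph(Surv(time, status) ~ karno, data = veteran)
fit10 <- coxph(Surv(time, status) ~ I(karno / 10), data = veteran)
coef(fit10) / coef(fit1)     # 10: only the units (and interpretation) change
exp(coef(fit10))             # hazard ratio per 10-point increase

# Nominal: 4 cell types as a factor give 3 coefficients, each a log hazard
# ratio versus the reference level (the first level by default)
fit_cell <- coxph(Surv(time, status) ~ celltype, data = veteran)
coef(fit_cell)

# Changing the reference level changes the coefficients, not the fit
fit_cell2 <- coxph(Surv(time, status) ~ relevel(celltype, ref = "large"),
                   data = veteran)
```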

  33. Tests of the model • Testing $H_0: \beta_k = 0$ for all $k = 1, \dots, p$ • Three main tests • Wald (chi-square) test • Likelihood ratio test • Score test • All three have (asymptotically, under $H_0$) a chi-square distribution with p degrees of freedom
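A sketch of pulling the three global tests out of a fitted model, assuming the survival package and its built-in `veteran` data; the model here (treatment, Karnofsky score, cell type, so p = 5 coefficients) is an arbitrary illustration. `summary()` of a `coxph` fit stores each test as a vector of statistic, df and p-value, and the likelihood ratio statistic can be checked by hand from the two stored log partial likelihoods (null model, fitted model).

```r
library(survival)

fit <- coxph(Surv(time, status) ~ trt + karno + celltype, data = veteran)
s   <- summary(fit)

# Global tests of H0: all beta_k = 0 (p = 5: trt, karno, 3 celltype contrasts)
tests <- rbind(likelihood_ratio = s$logtest,
               wald             = s$waldtest,
               score            = s$sctest)
tests

# The LRT by hand: twice the gain in log partial likelihood over the null model
lrt <- 2 * (fit$loglik[2] - fit$loglik[1])
lrt
```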

  34. Example: TAX327 • Randomized clinical trial of men with hormone-refractory prostate cancer • three treatment arms (Q3 docetaxel, weekly docetaxel, and Q3 mitoxantrone) • other covariates of interest: • PSA doubling time • lymph node involvement • liver metastases • number of metastatic sites • pain at baseline • baseline PSA • tumor grade • alkaline phosphatase • hemoglobin • performance status

  35. Some of the covariates

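  36. Cox PHM approach

```r
library(survival)
attach(data, pos = 2)              # 'data' is the TAX327 data frame; attach before using its columns
st <- Surv(survtime, died)         # survival time and death indicator
reg1 <- coxph(st ~ trtgrp)           # trtgrp as numeric: a single linear (trend) term
reg2 <- coxph(st ~ factor(trtgrp))   # trtgrp as categorical: group 1 is the reference
summary(reg2)
attributes(reg2)
reg2$coefficients
summary(reg2)$coef
```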

  37. Results

```
> summary(reg2)
Call:
coxph(formula = st ~ factor(trtgrp))

  n= 1006

                  coef exp(coef) se(coef)    z      p
factor(trtgrp)2  0.105      1.11   0.0882 1.19 0.2300
factor(trtgrp)3  0.245      1.28   0.0863 2.84 0.0045

                exp(coef) exp(-coef) lower .95 upper .95
factor(trtgrp)2      1.11      0.900     0.935      1.32
factor(trtgrp)3      1.28      0.783     1.079      1.51

Rsquare= 0.008   (max possible= 1 )
Likelihood ratio test= 8.12  on 2 df,   p=0.0173
Wald test            = 8.16  on 2 df,   p=0.0169
Score (logrank) test = 8.19  on 2 df,   p=0.0167
```

  38. Multiple regression • In the published paper, the model included all the covariates in the previous list

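  39. Fitting it in R

```r
# reg3: PSA doubling-time category entered as a numeric (linear trend) term
reg3 <- coxph(st ~ factor(trtgrp) + liverny + numbersites + pain0c + pskar2c +
                proml + probs + highgrade + logpsa0 + logalkp0c + hemecenter +
                psadtmonthcat)
# reg4: the same category entered as a factor (one coefficient per non-reference level)
reg4 <- coxph(st ~ factor(trtgrp) + liverny + numbersites + pain0c + pskar2c +
                proml + probs + highgrade + logpsa0 + logalkp0c + hemecenter +
                factor(psadtmonthcat))
```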

  40. Results: reg3

```
> reg3
Call:
coxph(formula = st ~ factor(trtgrp) + liverny + numbersites + pain0c +
    pskar2c + proml + probs + highgrade + logpsa0 + logalkp0c +
    hemecenter + psadtmonthcat)

                   coef exp(coef) se(coef)     z       p
factor(trtgrp)2  0.1230     1.131   0.1099  1.12 2.6e-01
factor(trtgrp)3  0.3784     1.460   0.1070  3.54 4.0e-04
liverny          0.4813     1.618   0.2168  2.22 2.6e-02
numbersites      0.4757     1.609   0.1430  3.33 8.8e-04
pain0c           0.3708     1.449   0.0925  4.01 6.1e-05
pskar2c          0.3167     1.373   0.1339  2.37 1.8e-02
proml            0.3132     1.368   0.1125  2.78 5.4e-03
probs            0.2568     1.293   0.0991  2.59 9.5e-03
highgrade        0.1703     1.186   0.0922  1.85 6.5e-02
logpsa0          0.1549     1.168   0.0312  4.96 7.0e-07
logalkp0c        0.2396     1.271   0.0483  4.96 7.0e-07
hemecenter      -0.1041     0.901   0.0351 -2.96 3.1e-03
psadtmonthcat   -0.0884     0.915   0.0430 -2.05 4.0e-02

Likelihood ratio test=205  on 13 df, p=0
n=641 (365 observations deleted due to missingness)
>
```

  41. “Local” Tests • Testing individual coefficients • But, more interestingly, testing sets of coefficients • Examples: • testing the PSA variables • testing treatment group (3 categories, so 2 coefficients) • Same three tests as before: • Wald test • Likelihood ratio test • Score test

  42. TAX327

```r
# LRT for treatment group: drop factor(trtgrp) from reg4 (2 df)
reg5 <- coxph(st ~ liverny + numbersites + pain0c + pskar2c + proml + probs +
                highgrade + logpsa0 + logalkp0c + hemecenter + factor(psadtmonthcat))
lrt.trt <- 2 * (reg4$loglik[2] - reg5$loglik[2])
p.trt <- 1 - pchisq(lrt.trt, 2)

# LRT for PSA doubling time (5 categories, so 4 df).
# To compare, both models need the same dataset: set liverny to NA wherever
# psadtmonthcat is missing, so reg6 drops the same rows as reg4.
liverny1 <- ifelse(is.na(psadtmonthcat), NA, liverny)
reg6 <- coxph(st ~ factor(trtgrp) + liverny1 + numbersites + pain0c + pskar2c +
                proml + probs + highgrade + logpsa0 + logalkp0c + hemecenter)
lrt.psadt <- 2 * (reg4$loglik[2] - reg6$loglik[2])
p.psadt <- 1 - pchisq(lrt.psadt, 4)
```
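A sketch of the same nested-model likelihood ratio test on data shipped with the survival package (`lung`), since the TAX327 data are not available here; the covariates chosen are an arbitrary illustration. Instead of masking a variable as in the `liverny1` step, this version builds one complete-case data set up front so both models use the same rows, then tests the set {sex, ph.ecog} (2 df) by hand and with `anova()`, which compares nested `coxph` fits.

```r
library(survival)

# One complete-case data set so the nested models are fitted to the same rows
vars    <- c("time", "status", "sex", "age", "ph.ecog", "wt.loss")
lung_cc <- na.omit(lung[, vars])

full    <- coxph(Surv(time, status) ~ sex + age + ph.ecog + wt.loss, data = lung_cc)
reduced <- coxph(Surv(time, status) ~ age + wt.loss, data = lung_cc)

# LRT for the set {sex, ph.ecog}: 2 df
lrt <- 2 * (full$loglik[2] - reduced$loglik[2])
p   <- pchisq(lrt, df = 2, lower.tail = FALSE)
c(lrt = lrt, p = p)

anova(reduced, full)   # the same comparison via anova()
```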

  43. Proportional? • Recall we are making the strong assumption of proportional hazards for each covariate • We can investigate this to some extent via graphical displays • A formal test of the assumption is also possible • It can be used for quantitative or qualitative variables • Its power depends strongly on N • It simply gives a p-value

  44. Approach for testing proportionality • There are residuals in Cox models • “Schoenfeld residuals” have an interpretation similar to residuals from linear regression • Recall in linear regression: no pattern between residuals and covariates; here we look for no pattern between the residuals and time • “rho” is the estimated correlation between transformed survival time and the scaled Schoenfeld residuals • Test that the correlation is zero vs. non-zero • Under proportional hazards the correlation is zero (a “good” model)

  45. Testing proportionality in R

```
> reg6.z <- cox.zph(reg6)
>
> reg6.z
                     rho    chisq      p
factor(trtgrp)2  0.00417  0.00917 0.9237
factor(trtgrp)3  0.00456  0.01127 0.9154
liverny1        -0.06709  2.44565 0.1179
numbersites      0.01869  0.17896 0.6723
pain0c          -0.08400  3.69796 0.0545
pskar2c         -0.02822  0.41993 0.5170
proml           -0.05994  2.00080 0.1572
probs           -0.04290  1.00064 0.3172
highgrade        0.01497  0.11942 0.7297
logpsa0         -0.02583  0.29315 0.5882
logalkp0c        0.00210  0.00210 0.9635
hemecenter       0.06715  2.54094 0.1109
GLOBAL                NA 16.18273 0.1830
```
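A sketch of the proportionality test on data shipped with the survival package (`veteran`, where the Karnofsky score `karno` is a covariate often used to illustrate non-proportional hazards). As far as I know, recent versions of the survival package print `chisq`, `df` and `p` for each term instead of a `rho` column, so the printout may not look exactly like the output above; the correlation called "rho" can still be computed from the transformed times (`zph$x`) and scaled Schoenfeld residuals (`zph$y`) stored in the `cox.zph` object.

```r
library(survival)

fit <- coxph(Surv(time, status) ~ trt + karno, data = veteran)
zph <- cox.zph(fit)   # default transform = "km": time transformed via the KM curve
zph                   # one test per term plus a GLOBAL test

# "rho": correlation between transformed time and the scaled Schoenfeld
# residuals for a covariate (column 2 corresponds to karno)
rho_karno <- cor(zph$x, zph$y[, 2])
rho_karno
```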

  46. Stata Commands

```stata
insheet using "C:\....\nomogramdata.txt"
stset survtime, fa(died)            // declare survival data: time = survtime, event = died
sts graph, cens(m)                  // Kaplan-Meier curve with censoring marks
sts graph, by(trt) cens(m)
sts test trt                        // log-rank test by treatment
xi: stcox i.trt
* diagnostics
stphplot, by(trt)
stcoxkm, by(trt)
stcoxkm, by(trt) separate
xi: stcox i.trt, schoenfeld(trtsch*)  // save Schoenfeld residuals, then test PH
estat phtest
tab pain0c
stphplot if pain0c<2, by(pain0c)
stcoxkm if pain0c<2, by(pain0c) separate
stcox pain0c if pain0c<2, schoenfeld(painsch*)
estat phtest
```

  47. Stata: Cox Regression

```
. xi: stcox i.trt
i.trtgrp          _Itrtgrp_1-3        (naturally coded; _Itrtgrp_1 omitted)

         failure _d:  died
   analysis time _t:  survtime

Iteration 3:   log likelihood =  -4881.615
Refining estimates:
Iteration 0:   log likelihood =  -4881.615

Cox regression -- Breslow method for ties

No. of subjects =         1006                     Number of obs   =      1006
No. of failures =          800
Time at risk    =  18886.96674
                                                   LR chi2(2)      =      8.09
Log likelihood  =    -4881.615                     Prob > chi2     =    0.0175

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _Itrtgrp_2 |   1.111033   .0980152     1.19   0.233     .9346177    1.320748
  _Itrtgrp_3 |   1.277127   .1101523     2.84   0.005     1.078495    1.512343
------------------------------------------------------------------------------
```
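The Stata output reports the Breslow method for ties, while R's `coxph()` uses the Efron approximation by default; that is one plausible reason the R likelihood ratio statistic for the same treatment model (8.12) differs slightly from Stata's (8.09). A sketch of matching the two in R, assuming the survival package and its built-in `veteran` data as a stand-in for TAX327:

```r
library(survival)

# Same model under the two common approximations for tied event times
fit_efron   <- coxph(Surv(time, status) ~ celltype, data = veteran)                    # R default
fit_breslow <- coxph(Surv(time, status) ~ celltype, data = veteran, ties = "breslow")  # as in the Stata output

cbind(efron = exp(coef(fit_efron)), breslow = exp(coef(fit_breslow)))
```

With few ties the two approximations give nearly identical hazard ratios; the difference grows with the number of tied event times.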

  48. Stata diagnostics

```stata
* diagnostics
stphplot, by(trt)
stcoxkm, by(trt)
stcoxkm, by(trt) separate
```

  49. Test of Proportionality

```
xi: stcox i.trt, schoenfeld(trtsch*)
estat phtest

      Test of proportional-hazards assumption

      Time:  Time
      ----------------------------------------------------------------
                  |       chi2       df       Prob>chi2
      ------------+---------------------------------------------------
      global test |       0.30        2         0.8616
      ----------------------------------------------------------------
```
