
Statistics 262: Intermediate Biostatistics

Statistics 262: Intermediate Biostatistics. May 18, 2004: Cox Regression III: residuals and diagnostics, repeated events. Jonathan Taylor and Kristin Cobb. Residuals. Residuals are used to investigate the lack of fit of a model to a given subject.


Presentation Transcript


  1. Statistics 262: Intermediate Biostatistics. May 18, 2004: Cox Regression III: residuals and diagnostics, repeated events. Jonathan Taylor and Kristin Cobb. Statistics 262

  2. Residuals • Residuals are used to investigate the lack of fit of a model to a given subject. • For Cox regression, there’s no easy analog to the usual “observed minus predicted” residual of linear regression.

  3. Deviance Residuals • Deviance residuals are based on martingale residuals: c_i (1 if event, 0 if censored) minus the estimated cumulative hazard up to t_i (as a function of the fitted model) for individual i: c_i - H(t_i, X_i, β). See Hosmer and Lemeshow for more discussion…

  4. Deviance Residuals • Behave much like residuals from ordinary linear regression • If the model fits, they should be roughly symmetrically distributed around 0 with a standard deviation of about 1.0. • Negative for observations whose observed survival time is longer than expected. • Plot deviance residuals against covariates to look for unusual patterns.
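
The two residuals on the slides above can be sketched in a few lines of Python. This is a minimal illustration, not the SAS implementation: it assumes the estimated cumulative hazard for each subject is already available (and positive), and it uses the usual textbook transformation from martingale to deviance residual, d = sign(M) * sqrt(-2 * [M + c * log(c - M)]). The function names and the input values are hypothetical.

```python
import math

def martingale_residual(event, cum_hazard):
    """Event indicator c_i (1 or 0) minus the estimated cumulative hazard at t_i."""
    return event - cum_hazard

def deviance_residual(event, cum_hazard):
    """sign(M) * sqrt(-2 * [M + c * log(c - M)]); the log term vanishes when c = 0."""
    m = martingale_residual(event, cum_hazard)
    log_term = math.log(event - m) if event else 0.0
    # Guard against tiny negative values caused by floating-point rounding.
    inner = max(0.0, -2.0 * (m + log_term))
    return math.copysign(math.sqrt(inner), m)

# Hypothetical (event indicator, estimated cumulative hazard) pairs.
subjects = [(1, 0.2), (1, 1.0), (0, 0.5), (0, 2.0)]
for c, h in subjects:
    print(c, h, round(martingale_residual(c, h), 3), round(deviance_residual(c, h), 3))
```

A censored subject always gets a negative residual here, and a subject who has the event while the estimated cumulative hazard is still small gets a large positive one, which is the "died sooner than expected" case.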

  5. Deviance Residuals • In SAS, option on the output statement: output out=outdata resdev=

  6. Schoenfeld residuals • Schoenfeld (1982) proposed the first set of residuals for use with Cox regression packages • Schoenfeld D. Residuals for the proportional hazards regression model. Biometrika, 1982, 69(1):239-241. • Instead of a single residual for each individual, there is a separate residual for each individual for each covariate • Based on the individual contributions to the derivative of the log partial likelihood (see chapter 6 in Hosmer and Lemeshow for more math details, p.198-199) • Note: Schoenfeld residuals are not defined for censored individuals.

  7. Schoenfeld residuals • Where k is the covariate of interest, the Schoenfeld residual is the covariate value, X_ik, for the person (i) who actually died at time t_i minus the expected value of the covariate for the risk set at t_i (a weighted average of the covariate, weighted by each individual’s likelihood of dying at t_i). • Plot Schoenfeld residuals against time to evaluate the PH assumption.
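
The verbal definition above can be sketched in Python for a single covariate. This is an illustration under stated assumptions, not the SAS computation: there are no tied event times, the risk set at t_i is everyone whose observed time is at least t_i, and the weight for each person in the risk set is exp(beta * x). The function name, the data, and the value of beta are hypothetical.

```python
import math

def schoenfeld_residuals(times, events, x, beta):
    """One residual per subject for a single covariate; None for censored subjects.

    Assumes no tied event times; the risk set at t_i is everyone with time >= t_i.
    """
    residuals = []
    for t_i, c_i, x_i in zip(times, events, x):
        if not c_i:
            residuals.append(None)  # not defined for censored individuals
            continue
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(beta * x[j]) for j in risk]
        # Weighted average of the covariate over the risk set.
        expected = sum(w * x[j] for w, j in zip(weights, risk)) / sum(weights)
        residuals.append(x_i - expected)
    return residuals

# Hypothetical data: three subjects, the last one censored.
times = [1, 2, 3]
events = [1, 1, 0]
x = [1.0, 0.0, 1.0]
res = schoenfeld_residuals(times, events, x, beta=0.0)
print(res)
```

With beta = 0 every member of the risk set gets the same weight, so the expected value is just the plain average of the covariate among those still at risk.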

  8. Schoenfeld residuals • In SAS: option on the output statement: ressch=

  9. Influence diagnostics • How would the results change if a particular observation were removed from the analysis?

  10. Influence statistics • Likelihood displacement (ld): measures the influence of removing one individual on the model as a whole. What’s the change in the likelihood when this individual is omitted? • DFBETA: how much each coefficient changes when a single observation is removed • A negative DFBETA indicates that the coefficient increases when the observation is removed
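
The sign convention for DFBETA can be checked with a brute-force sketch: fit the model on all subjects, refit with each subject left out, and take the full-data coefficient minus the leave-one-out coefficient. This exact delete-one refit is a stand-in chosen for clarity and is not necessarily how PROC PHREG computes its dfbeta= output, which may use an approximation. The sketch assumes a single covariate, no tied event times, and a small hypothetical data set for which every fit has a finite solution.

```python
import math

def fit_cox_1d(times, events, x, iters=50, tol=1e-10):
    """Newton-Raphson on the partial likelihood of a one-covariate Cox model."""
    beta = 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        for i, (t_i, c_i) in enumerate(zip(times, events)):
            if not c_i:
                continue
            risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            mean = sum(wj * x[j] for wj, j in zip(w, risk)) / s0
            second = sum(wj * x[j] ** 2 for wj, j in zip(w, risk)) / s0
            score += x[i] - mean          # first derivative contribution
            info += second - mean ** 2    # minus the second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

def dfbeta_exact(times, events, x):
    """Full-data coefficient minus the coefficient refit without each subject."""
    full = fit_cox_1d(times, events, x)
    changes = []
    for i in range(len(times)):
        keep = [j for j in range(len(times)) if j != i]
        loo = fit_cox_1d([times[j] for j in keep],
                         [events[j] for j in keep],
                         [x[j] for j in keep])
        changes.append(full - loo)
    return full, changes

# Hypothetical data: six subjects, all with events, one binary covariate.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
x = [1, 0, 1, 0, 0, 1]
full, dfb = dfbeta_exact(times, events, x)
print(round(full, 3), [round(d, 3) for d in dfb])
```

In this toy data the last subject has x = 1 and the longest survival time, so dropping that subject pushes the coefficient up and the DFBETA for that subject is negative, matching the bullet above.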

  11. Influence statistics • In SAS: options on the output statement: ld= dfbeta=

  12. What about repeated events? • Death (presumably) can only happen once, but many outcomes can happen more than once… • Fractures • Heart attacks • Pregnancy • Etc…

  13. Repeated events: Strategy 1 • Run a second Cox regression (among those who had a first event), using the time of the first event as the origin • Repeat for third, fourth, fifth events, etc. • Problem: progressively smaller sample sizes.

  14. Repeated events: Strategy 2 • Treat each interval as a distinct observation, such that someone who had 3 events, for example, contributes 3 observations to the dataset • Major problem: dependence among observations from the same individual
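
The data layout behind Strategies 1 and 2 can be sketched as follows. Each subject's follow-up is split into one record per interval, with the clock reset at every event (the "gap" field). The field names, the subjects, and the choice to keep a trailing censored interval when follow-up continues past the last event are assumptions made for illustration.

```python
def to_intervals(subject_id, event_times, end_of_followup):
    """Split one subject's follow-up into one record per at-risk interval."""
    records, start = [], 0.0
    for order, t in enumerate(sorted(event_times), start=1):
        records.append({"id": subject_id, "order": order, "start": start,
                        "stop": t, "gap": t - start, "event": 1})
        start = t
    if end_of_followup > start:  # censored interval after the last event
        records.append({"id": subject_id, "order": len(event_times) + 1,
                        "start": start, "stop": end_of_followup,
                        "gap": end_of_followup - start, "event": 0})
    return records

# Hypothetical subjects: (id, event times, end of follow-up).
subjects = [("A", [2.0, 5.0, 7.0], 7.0), ("B", [4.0], 9.0), ("C", [], 6.0)]
rows = [r for s in subjects for r in to_intervals(*s)]

# Subjects contributing to the analysis of the k-th event (Strategy 1).
at_risk = {}
for r in rows:
    at_risk[r["order"]] = at_risk.get(r["order"], 0) + 1

for r in rows:
    print(r)
print(at_risk)
```

Stacking all rows gives the Strategy 2 dataset, in which subject A appears three times. Filtering rows by "order" gives the separate datasets of Strategy 1, and the at_risk counts show how the sample shrinks with each successive event.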

  15. Repeated events: Strategy 3 • Stratify by individual (“fixed effects partial likelihood”) • In PROC PHREG: strata id; • Problems: • Does not work well with RCT data • Requires that most individuals have at least 2 events • Can only estimate coefficients for covariates that vary across successive spells within an individual; this excludes constant personal characteristics such as age, education, gender, ethnicity, genotype
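
Before stratifying by individual, it can help to check which covariates actually change within a subject and how many subjects have at least 2 events. The sketch below does both on a spell-level dataset. The record layout, the covariate names, and the helper names are hypothetical.

```python
def within_subject_varying(rows, covariates):
    """Return the covariates that change across spells for at least one subject."""
    seen = {}  # (subject id, covariate name) -> set of observed values
    for r in rows:
        for name in covariates:
            seen.setdefault((r["id"], name), set()).add(r[name])
    return [name for name in covariates
            if any(len(vals) > 1 for (_, n), vals in seen.items() if n == name)]

def subjects_with_multiple_events(rows):
    """Return the ids of subjects who experienced at least 2 events."""
    counts = {}
    for r in rows:
        counts[r["id"]] = counts.get(r["id"], 0) + r["event"]
    return sorted(sid for sid, n in counts.items() if n >= 2)

# Hypothetical spell-level data: one record per interval per subject.
rows = [
    {"id": 1, "spell": 1, "sex": "F", "bmi": 22.0, "event": 1},
    {"id": 1, "spell": 2, "sex": "F", "bmi": 23.5, "event": 1},
    {"id": 2, "spell": 1, "sex": "M", "bmi": 27.0, "event": 1},
    {"id": 2, "spell": 2, "sex": "M", "bmi": 27.0, "event": 0},
]
estimable = within_subject_varying(rows, ["sex", "bmi"])
multi_event_ids = subjects_with_multiple_events(rows)
print(estimable, multi_event_ids)
```

Here sex is constant within every subject, so a model stratified by id could not estimate its coefficient, while bmi varies for subject 1 and remains a candidate.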
