On ranking in survival analysis: Bounds on the concordance index • Vikas C. Raykar | Harald Steck | Balaji Krishnapuram • CAD & Knowledge Solutions (IKM CKS), Siemens Medical Solutions USA, Inc., Malvern, USA • Cary Dehing-Oberije | Philippe Lambin • Maastro clinic, University Hospital Maastricht, University Maastricht-GROW, The Netherlands • NIPS 2007
Organization • Motivation • Brief review of survival analysis • Concordance index • Our proposed ranking approach • Connections to survival analysis • Results
Motivation: Personalized medicine • Predict survival time of lung cancer patients. • Different kinds of treatment: chemo/radiotherapy dosage. • Different patient characteristics: age/gender/health. • Quantity to predict: survival time. • Dataset available from our collaborator, MAASTRO hospital.
Why not use regression? • Survival data are not amenable to standard statistical/machine learning methods because of censored data. • The problem is well studied in statistics as survival analysis.
Review: Survival Analysis • Branch of statistics that deals with the time until the occurrence of an event. • When did a patient die? • When did the disease manifest? • When did the machine fail? • Widely used in medical statistics, epidemiology, reliability engineering, economics, sociology, marketing, insurance, etc.
Censored Data: What is censored data? • Some patients die during the study period. • At the end of the study a lot of patients may still survive. • A patient may be unavailable for follow-up. • The exact survival time may be longer than the observation period. • [Figure: patient timelines on a TIME axis, labeled "Start of the study", "End of study", "Data collected at this time", "Patient 1" and "Death", with the years 2001 and 2005 marked.]
Censored Data • Censoring provides only partial information. • Typically a large portion of the data is censored. • [Equations: definitions of the observed data and of the survival time.]
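The partial information that censoring leaves behind can be sketched in a few lines of Python. The representation below (true survival time, censoring time, observed time as the minimum of the two, and a 0/1 event indicator) is the usual right-censoring convention and is an assumption here, since the slide's own definitions are not preserved; the function name `observe` and all numbers are hypothetical.

```python
def observe(true_times, censor_times):
    """Return (observed times, event indicators) under right censoring.

    Assumed convention: observed time = min(true time, censoring time);
    the indicator is 1 if the event (e.g. death) was actually observed
    and 0 if the subject was censored.
    """
    observed, event = [], []
    for t, c in zip(true_times, censor_times):
        observed.append(min(t, c))
        event.append(1 if t <= c else 0)
    return observed, event


# Hypothetical toy values: the study ends after 4 years for subjects 2 and 4.
true_times = [2.0, 5.5, 1.0, 7.0]
censor_times = [4.0, 4.0, 3.0, 4.0]
y, delta = observe(true_times, censor_times)
print(y)      # [2.0, 4.0, 1.0, 4.0]
print(delta)  # [1, 0, 1, 0]
```

For the censored subjects only a lower bound on the survival time is known, which is why a plain regression on `y` would be biased.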
Proportional Hazard (PH) Model • Has become a standard model for studying the effect of covariates on survival time distributions. • $h(t \mid x) = h_0(t)\,\exp(w^\top x)$, where $h_0(t)$ is the baseline hazard function, $\exp(w^\top x)$ is the relative hazard function, $x$ is the covariate vector, and $w$ are the unknown regression parameters. • Parameter estimates for the PH model are obtained by maximizing Cox's partial likelihood.
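As a rough illustration of what maximizing Cox's partial likelihood involves, the sketch below evaluates the partial log-likelihood for a linear PH model with hazard proportional to exp(w·x). The function name and the toy data are hypothetical, and tied event times are handled in the simplest way (every subject whose observed time is at least the event time counts as at risk); this is an assumption, not a detail taken from the slides.

```python
import math


def cox_partial_log_likelihood(w, X, y, delta):
    """Cox partial log-likelihood for a linear PH model.

    w: parameter list, X: list of covariate lists,
    y: observed times, delta: 1 = event observed, 0 = censored.
    """
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in X]
    total = 0.0
    for i in range(len(y)):
        if delta[i] != 1:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time y[i].
        risk = sum(math.exp(scores[j]) for j in range(len(y)) if y[j] >= y[i])
        total += scores[i] - math.log(risk)
    return total


# Hypothetical toy data: a larger covariate goes with an earlier death.
X = [[2.0], [1.0], [0.0]]
y = [1.0, 2.0, 3.0]
delta = [1, 1, 1]
print(cox_partial_log_likelihood([0.0], X, y, delta))
print(cox_partial_log_likelihood([1.0], X, y, delta))
```

On this toy data the value at `w = [1.0]` is higher than at `w = [0.0]`, which matches the intuition that a positive weight assigns a higher hazard to the subjects who died earlier.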
Concordance Index or c-index • Standard performance measure for model assessment in survival analysis. • Generalization of the area under the ROC curve to regression problems/censored data. • Fraction of all pairs of subjects whose survival times can be ordered in which the subject with the higher predicted survival is the one who actually survived longer.
Concordance Index: no censoring • C = 1: perfect prediction accuracy. • C = 0.5: as good as a random predictor. • [Figure: survival time versus covariate for five subjects, numbered 1 to 5.]
Concordance Index: with censoring • No arrow can go above a censored point. • [Figure: survival times of five subjects, numbered 1 to 5, with censored points marked.]
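The pairwise definition of the concordance index, with and without censoring, can be sketched as follows. A pair is treated as orderable only when the subject with the shorter observed time actually experienced the event; giving tied predictions half credit is a common convention and an assumption here. The function name and the toy data are hypothetical.

```python
def concordance_index(y, delta, pred):
    """Concordance index for right-censored data.

    y: observed times, delta: 1 = event observed, 0 = censored,
    pred: predicted survival (higher means longer predicted survival).
    """
    concordant = 0.0
    comparable = 0
    n = len(y)
    for i in range(n):
        for j in range(n):
            # Orderable pair: i has the shorter time and it is a real event.
            if delta[i] == 1 and y[i] < y[j]:
                comparable += 1
                if pred[i] < pred[j]:
                    concordant += 1.0
                elif pred[i] == pred[j]:
                    concordant += 0.5  # assumed convention for tied predictions
    if comparable == 0:
        raise ValueError("no orderable pairs")
    return concordant / comparable


y = [1.0, 2.0, 3.0, 4.0, 5.0]

# No censoring: all 10 pairs are orderable.
print(concordance_index(y, [1, 1, 1, 1, 1], [1, 2, 3, 4, 5]))  # 1.0
print(concordance_index(y, [1, 1, 1, 1, 1], [5, 4, 3, 2, 1]))  # 0.0

# Subjects 2 and 4 censored: only 6 pairs remain orderable.
print(concordance_index(y, [1, 0, 1, 0, 1], [2, 1, 3, 4, 5]))
```

In the last call the predictions swap the first two subjects, so 5 of the 6 orderable pairs are concordant.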
Proposed approach: Maximize CI directly • While the CI is widely used to evaluate a learnt model, it is not generally used as an objective function for training. • The CI is invariant to any monotone increasing transformation of the survival times. • Hence the model learnt by maximizing the CI is a ranking function (an N-partite ranking problem).
Lower bounds on the CI • Maximizing the CI directly is a discrete optimization problem. • Use a differentiable concave lower bound instead. • The lower bound is related to the PH model.
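To make the idea of a differentiable concave lower bound concrete, the sketch below compares the 0/1 indicator of a correctly ordered pair with two concave functions that never exceed it: a scaled log-sigmoid and an exponential bound. Whether these are exactly the bounds on the original slide is an assumption, since the slide's equations are not preserved; `z` stands for the difference in predicted scores of an orderable pair, and all function names are hypothetical.

```python
import math


def indicator(z):
    """1 if the pair is ranked correctly (z > 0), else 0."""
    return 1.0 if z > 0 else 0.0


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_sigmoid_bound(z):
    """1 + log(sigmoid(z)) / log(2): concave, differentiable, <= indicator."""
    return 1.0 + log_sigmoid(z) / math.log(2.0)


def exponential_bound(z):
    """1 - exp(-z): concave, differentiable, <= indicator."""
    return 1.0 - math.exp(-z)


for z in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    print(z, indicator(z), log_sigmoid_bound(z), exponential_bound(z))
```

Both functions approach 1 as `z` grows and fall below 0 for incorrectly ordered pairs, so averaging either of them over the orderable pairs gives a smooth quantity that cannot exceed the CI.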
Maximize lower bounds on the CI • Linear ranking functions. • Regularization. • Use gradient-based methods to maximize the regularized lower bound.
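A minimal sketch of this recipe, under stated assumptions: a linear ranking function f(x) = w·x (higher score means longer predicted survival), the log-sigmoid lower bound averaged over orderable pairs, an L2 penalty with weight `lam`, and plain fixed-step gradient ascent. The slides do not specify the optimizer, step size, or regularizer form, so those choices, the function names, and the toy data are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def comparable_pairs(y, delta):
    """Pairs (i, j) where i has the shorter observed time and it is an event."""
    n = len(y)
    return [(i, j) for i in range(n) for j in range(n)
            if delta[i] == 1 and y[i] < y[j]]


def fit_ranker(X, y, delta, lam=0.01, step=0.5, iters=500):
    """Gradient ascent on the L2-regularized log-sigmoid lower bound."""
    pairs = comparable_pairs(y, delta)
    if not pairs:
        raise ValueError("no orderable pairs")
    d = len(X[0])
    w = [0.0] * d
    scale = len(pairs) * math.log(2.0)
    for _ in range(iters):
        grad = [-lam * wk for wk in w]  # gradient of -(lam / 2) * ||w||^2
        for i, j in pairs:
            diff = [X[j][k] - X[i][k] for k in range(d)]
            z = sum(w[k] * diff[k] for k in range(d))
            coeff = (1.0 - sigmoid(z)) / scale  # d/dz of log(sigmoid(z)) / scale
            for k in range(d):
                grad[k] += coeff * diff[k]
        w = [w[k] + step * grad[k] for k in range(d)]
    return w


# Hypothetical toy data: the single covariate increases with survival time.
X = [[0.1], [0.4], [0.5], [0.9], [1.2]]
y = [1.0, 2.0, 3.0, 4.0, 5.0]
delta = [1, 0, 1, 1, 0]
w = fit_ranker(X, y, delta)
print(w)
```

Because the objective is concave in `w`, a simple first-order method is enough for a sketch like this; a practical implementation would more likely use a line search or a quasi-Newton method.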
Connection to the PH model • Log-likelihood for correct ranking: [equation]. • For a proportional hazard model we can show that [equation]. • This is a common assumption made in the ranking literature. • We have shown that if we use PH models this is exactly the case.
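One way to see why a PH model yields a sigmoid-shaped pairwise ranking probability is the short derivation below. It is a sketch under stated assumptions rather than a reproduction of the slide's equations: the hazard is written as $h(t \mid x) = h_0(t)\exp(w^\top x)$, the two survival times $T_i$ and $T_j$ are independent, and the baseline survival function $S_0(t)$ decays to zero. The symbols $a_i$, $S_0$ and $\sigma$ are introduced here for illustration.

```latex
\begin{align}
h(t \mid x_i) &= h_0(t)\,a_i, \qquad a_i = \exp(w^\top x_i), \\
S(t \mid x_i) &= S_0(t)^{a_i}, \qquad S_0(t) = \exp\!\Big(-\int_0^t h_0(u)\,du\Big), \\
\Pr[T_i > T_j] &= \int_0^\infty a_j\,h_0(t)\,S_0(t)^{a_j}\,S_0(t)^{a_i}\,dt
  = \frac{a_j}{a_i + a_j}, \\
\Pr[T_i > T_j] &= \frac{1}{1 + \exp\!\big(-w^\top (x_j - x_i)\big)}
  = \sigma\!\big(w^\top (x_j - x_i)\big).
\end{align}
```

With this sign convention a larger $w^\top x$ means a higher hazard, so subject $i$ is more likely to outlive subject $j$ when $w^\top x_j > w^\top x_i$; equivalently, the ranking score is $-w^\top x$ and the pairwise probability is a logistic function of the score difference.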
Penalized log-likelihood • [Equation: penalized log-likelihood.] • Compare this with the objective function obtained using the lower bound approach.
Cox partial likelihood • Our proposed method explicitly maximizes a lower bound on the CI. • Cox's method maximizes the partial likelihood. • Experimental results indicate that both do well. • Conjecture: Is Cox's partial likelihood also a lower bound on the CI?
Results • The proposed method is slightly better than Cox-PH. • However, the differences are not significant.