On ranking in survival analysis: Bounds on the concordance index Harald Steck
On ranking in survival analysis: Bounds on the concordance index
Vikas C. Raykar | Harald Steck | Balaji Krishnapuram
CAD & Knowledge Solutions (IKM CKS), Siemens Medical Solutions USA, Inc., Malvern, USA
Cary Dehing-Oberije | Philippe Lambin
Maastro clinic, University Hospital Maastricht, University Maastricht-GROW, The Netherlands
NIPS 2007

Organization
• Motivation
• Brief review of survival analysis
• Concordance index
• Our proposed ranking approach
• Connections to survival analysis
• Results

Motivation: Personalized medicine
Predict survival time of lung cancer patients.
• Different kinds of treatment: chemo/radiotherapy dosage
• Different patient characteristics: age/gender/health
• Survival time
Dataset available from MAASTRO hospital, our collaborator.

Why not use regression?
• Not amenable to standard statistical/machine learning methods due to censored data.
• Well studied in statistics as survival analysis.

Review: Survival Analysis
Branch of statistics that deals with time until the occurrence of an event.
• When did a patient die?
• When did the disease manifest?
• When did the machine fail?
Widely used in medical statistics, epidemiology, reliability engineering, economics, sociology, marketing, insurance, etc.

What is censored data?
[Figure: patient timelines on a TIME axis (labels 2001 and 2005), marking the start of the study, the end of the study when data are collected, deaths, and censored data]
• Some patients die during the study period.
• Patient unavailable for follow-up.
• At the end of the study a lot of patients may still survive.
• The exact survival time may be longer than the observation period.

Censoring provides only partial information
Typically a large portion of the data is censored.
[Figure: survival time, observed data versus censored data]

Notation: Survival analysis

Proportional Hazard (PH) Model
• Has become a standard model for studying the effect of covariates on survival time distributions.
$h(t \mid x) = h_0(t)\,\exp(w^\top x)$, where $h_0(t)$ is the baseline hazard function, $\exp(w^\top x)$ the relative hazard function, $x$ the covariate, and $w$ the unknown regression parameters.
• Parameter estimates for PH model are obtained by maximizing Cox's partial likelihood.

Concordance Index or c-index
• Standard performance measure for model assessment in survival analysis.
• Generalization of the area under the ROC curve to regression problems/censored data.
• Fraction of all pairs of subjects whose survival times can be ordered, for which the subject with higher predicted survival is the one who actually survived longer.

Concordance Index: no censoring
[Figure: survival time versus covariate for subjects 1 to 5]
• C = 1: perfect prediction accuracy
• C = 0.5: as good as a random predictor

Concordance Index: with censoring
[Figure: survival times of subjects 1 to 5, with censored points marked]
No arrow can go above a censored point.

Proposed approach: Maximize CI directly
• While CI is widely used to evaluate a learnt model, it is not generally used as an objective function for training.
• CI is invariant to monotone transformation of the survival times.
• Hence the model learnt by maximizing the CI is a ranking function. (N-partite ranking problem)

Lower bounds on the CI
• Discrete optimization problem.
• Use a differentiable concave lower bound.
• Related to the PH model.

Maximize lower bounds on the CI
• Linear ranking functions.
• Regularization.
• Use gradient-based methods to maximize this.

Connection to the PH model
Log-likelihood for correct ranking.
For a proportional hazard model we can show that [equation].
This is a common assumption made in ranking literature. We have shown that if we use PH models this is exactly the case.

Penalized log-likelihood
Compare this with the objective function using the lower bound approach.

Cox partial likelihood
• Our proposed method explicitly maximizes a lower bound.
• Cox method maximizes partial likelihood.
• Experimental results indicate that both do well.
• Conjecture: Is Cox's partial likelihood also a lower bound on the CI?

Cox partial likelihood (cont.)
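
The concordance index with censoring and a differentiable lower bound on it, as outlined in the slides above, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function names, the toy data, and the specific choice of the log-sigmoid bound, 1[z > 0] >= 1 + log(sigmoid(z)) / log 2, are assumptions made for the example; the transcript does not show the exact formula used on the slide. Ties in the predicted scores are counted as not concordant here, which is also an assumption.

```python
import math


def comparable_pairs(times, observed):
    """Return pairs (i, j) where subject i is known to have died before subject j.

    A censored subject cannot be the shorter-lived member of a pair, because
    its true survival time may exceed its observed time.
    """
    pairs = []
    n = len(times)
    for i in range(n):
        if not observed[i]:
            continue
        for j in range(n):
            if j != i and times[i] < times[j]:
                pairs.append((i, j))
    return pairs


def concordance_index(scores, times, observed):
    """Fraction of comparable pairs ranked correctly by the predicted scores.

    A higher score means a longer predicted survival (assumed convention).
    """
    pairs = comparable_pairs(times, observed)
    concordant = sum(1 for i, j in pairs if scores[j] > scores[i])
    return concordant / len(pairs)


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_sigmoid_lower_bound(scores, times, observed):
    """Smooth lower bound on the concordance index (assumed log-sigmoid form).

    Each indicator 1[z > 0] is replaced by 1 + log(sigmoid(z)) / log(2),
    which never exceeds the indicator and is concave in z.
    """
    pairs = comparable_pairs(times, observed)
    total = sum(
        1.0 + log_sigmoid(scores[j] - scores[i]) / math.log(2.0)
        for i, j in pairs
    )
    return total / len(pairs)


# Hypothetical toy data: five subjects, two of them censored.
times = [1.0, 2.0, 3.0, 4.0, 5.0]
observed = [True, False, True, True, False]
scores = [0.1, 0.4, 0.3, 0.9, 0.8]

print(concordance_index(scores, times, observed))
print(log_sigmoid_lower_bound(scores, times, observed))
```

Because the bound is differentiable in the scores, it can be maximized with gradient-based methods over the parameters of a linear ranking function, whereas the concordance index itself is piecewise constant.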

Results
Proposed method slightly better than Cox-PH. However, differences not significant.

Thank You! | Questions?