Evaluation of Support Vector Machines for Risk Modeling in Interventional Cardiology Michael E. Matheny, M.D.
Goal • Comparison of support vector machines and logistic regression risk modeling performance over time for the outcome of death in pre-intervention cardiac catheterization patients.
Pre-intervention Risk Assessment • Percutaneous Coronary Intervention (PCI) is a high-volume procedure with significant morbidity and mortality • Risk of death in PCI varies widely with co-morbidities • Accurate case-level risk estimates can greatly aid patient and physician decision-making
Domain Data Quality • The American College of Cardiology has published a standardized data dictionary (ACC-NCDR) and mandates that accredited centers maintain detailed data on all PCI patients • Some states, including Massachusetts, now require reporting of case data based on the ACC-NCDR
Current Risk Model Standard: Logistic Regression (LR) • Gold standard for risk modeling in interventional cardiology • Type of generalized linear model (logit link) • Used in analysis of a binary outcome • Predicted probability bounded by 0 and 1 • Feature (variable) selection • From all available data • Known risk factors from prior studies • Selected subset of data based on study design
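As an illustration of why an LR risk estimate is bounded by 0 and 1, the sketch below passes a linear combination of features through the logistic function. The intercept, coefficients, and feature values are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import math

def logistic(z):
    """Map a linear predictor z to a probability between 0 and 1."""
    # Two branches avoid overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predicted_risk(intercept, coefficients, features):
    """Risk estimate from a fitted LR model: logistic(b0 + sum(b_i * x_i))."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return logistic(z)

# Hypothetical model with two features (illustrative values only).
risk = predicted_risk(-4.0, [0.05, 1.2], [70, 1])
print(round(risk, 3))
```

However extreme the linear predictor becomes, the output stays within [0, 1], which is what lets it be read as a probability of death.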
Alternative Risk Model: Support Vector Machine (SVM) • Key Features • Kernel functions introduce non-linearity in the hypothesis space without explicitly requiring a non-linear algorithm • Linear • Polynomial • Radial basis function (RBF) • Training is a convex optimization problem, so the solution found is a global minimum
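The three kernel families listed above can be sketched in their common textbook forms. The exact parameterization used by the SVM software in this study is not given in the slides, so the `coef0` and `sigma` parameters below are assumptions.

```python
import math

def linear_kernel(x, y):
    """Plain dot product: the SVM stays linear in the input space."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """(x . y + coef0) ** degree; the coef0 offset is an assumed convention."""
    return (linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, sigma=1.0):
    """exp(-||x - y||^2 / (2 * sigma^2)); sigma controls the kernel width."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

x, y = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, y), polynomial_kernel(x, y), rbf_kernel(x, y))
```

Each function returns a similarity between two cases. The SVM algorithm itself is unchanged across kernels, and only this similarity measure is swapped.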
Risk Model Evaluation: Discrimination • Provides an estimate of population-level accuracy • Area under the Receiver Operating Characteristic (ROC) curve • The curve plots sensitivity vs. 1 - specificity at different thresholds
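The area under the ROC curve equals the probability that a randomly chosen case with the outcome receives a higher score than a randomly chosen case without it, with ties counted as one half. A minimal sketch of that pairwise calculation follows; the labels and scores are made-up toy values.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases.

    labels: 1 for the outcome (e.g. death), 0 otherwise.
    scores: model outputs, where higher means higher predicted risk.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Toy data for illustration only.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because only the ordering of scores matters, this measure can stay high even when the predicted probabilities themselves are poorly calibrated.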
Risk Model Evaluation: Calibration • Provides an estimate of case-level accuracy • Hosmer-Lemeshow goodness-of-fit test • Primarily used in logistic regression • Measures how well the observed and expected event frequencies match • Handles data sparsity better than more common methods (Deviance, Pearson's) • P > 0.05 indicates an adequate fit
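A minimal sketch of the Hosmer-Lemeshow statistic: cases are sorted by predicted risk, divided into groups, and observed deaths are compared with the sum of predicted probabilities in each group. Ten equal-sized groups and g - 2 degrees of freedom are the usual conventions and are assumed here, since the slides do not state the grouping. The p-value helper uses a closed form that only applies to an even number of degrees of freedom.

```python
import math

def hosmer_lemeshow(labels, probs, groups=10):
    """Hosmer-Lemeshow chi-square statistic over risk-sorted groups."""
    # Sort by predicted probability only (stable, so ties keep input order).
    pairs = sorted(zip(probs, labels), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected events in group
        observed = sum(y for _, y in chunk)   # observed events in group
        denom = expected * (1.0 - expected / size)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Toy example: predictions of 0.5 with half the cases dying are well calibrated.
stat = hosmer_lemeshow([0, 1] * 50, [0.5] * 100, groups=10)
print(stat, chi2_sf_even_df(stat, df=8))
```

A small statistic (large p-value) means observed and expected counts agree within each risk group, which is the sense in which P > 0.05 is read as acceptable calibration.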
Source Data • Brigham & Women’s Hospital • Interventional Cardiology Database • January 1, 2002 – October 30, 2004 • 5383 cases • Data split two ways, each into 2/3 training (3588) and 1/3 test (1795) • Sequential split • Sorted chronologically • Split at October 27, 2003 • Random split
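The two splitting strategies can be sketched as follows. The function names, the `date_key` parameter, and the random seed are hypothetical, and integers stand in for chronologically ordered cases.

```python
import random

def sequential_split(cases, date_key, train_fraction=2 / 3):
    """Train on the earliest cases, test on the most recent ones."""
    ordered = sorted(cases, key=date_key)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

def random_split(cases, train_fraction=2 / 3, seed=0):
    """Shuffle cases so training and test sets cover the same time span."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Stand-in data: integers 0..5382 play the role of procedure dates.
cases = list(range(5383))
seq_train, seq_test = sequential_split(cases, date_key=lambda c: c)
rnd_train, rnd_test = random_split(cases)
print(len(seq_train), len(seq_test))  # 3588 1795
```

The sequential split tests a model on patients treated after every training case, so it exposes changes in the data over time. The random split spreads any such drift evenly across both sets.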
Logistic Regression: Model Development • STATA 8.2 (College Station, TX) • Backwards stepwise technique • Exclusion threshold (P between 0.05 and 0.15) • Feature selection
Logistic Regression: Feature Selection • Model development • Sequential training set • Backwards stepwise (P = 0.10) used for feature selection • Stepwise feature removal based on optimization of ROC area and Hosmer-Lemeshow (HL) goodness-of-fit
Support Vector Machine: Model Development • GIST 2.1.1 (Columbia University, New York, NY) • STATA 8.2 (College Station, TX) • All variables used • Kernel choice • Polynomial (degree 1-6) • Radial width factor (related to sigma) (0.1-20) • Probabilistic output methodology • Discriminant: distance from hyperplane • LR model using the discriminant as the only feature • Established method to convert SVM classification to regression • Allows use of HL goodness-of-fit
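The probabilistic output step can be sketched as a one-feature logistic regression fitted to the SVM discriminant, similar in spirit to what is often called Platt scaling. The study fitted this model in STATA. The pure-Python Newton's-method fit below is a stand-in under that assumption, and the discriminant values are made up.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_discriminant_lr(discriminants, labels, iterations=25):
    """Fit p = sigmoid(a * d + b) by Newton's method; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        ga = gb = haa = hab = hbb = 0.0
        for d, y in zip(discriminants, labels):
            p = sigmoid(a * d + b)
            w = p * (1.0 - p)
            ga += (y - p) * d      # gradient of log-likelihood w.r.t. a
            gb += (y - p)          # gradient of log-likelihood w.r.t. b
            haa += w * d * d       # entries of the 2x2 information matrix
            hab += w * d
            hbb += w
        det = haa * hbb - hab * hab
        if det < 1e-12:
            break                  # degenerate fit (e.g. separable data)
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b

# Made-up SVM discriminants (signed distances from the hyperplane) and outcomes.
d = [-2.0, -1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 2.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]
a, b = fit_discriminant_lr(d, y)
probs = [sigmoid(a * di + b) for di in d]
print(a, b)
```

The resulting probabilities can then be grouped by risk and compared with observed outcomes, which is what makes the Hosmer-Lemeshow test applicable to an SVM.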
Discussion: Discrimination (All Models) • All models showed excellent discrimination • No model differed significantly from the others in discrimination • This measure was relatively insensitive to changes in the data, even across widely variable levels of calibration
Discussion: LR Calibration • For these data, LR was unable to maintain calibration, likely because of temporal data drift • The LR models required manual feature selection and expert knowledge to calibrate on the training data sets
Discussion: SVM Calibration • Some versions of both kernel types were able to maintain calibration on both data sets • Calibration was maintained across larger parameter ranges of both kernels for the random data set than for the sequential data set • Current assessments of discrimination and calibration on the training set are insufficient to choose the optimal kernel parameters
Conclusions • SVMs could be superior to LR in terms of maintaining calibration over time in this domain • Further exploration is needed to develop additional markers of model robustness • Further work is needed to determine optimal time intervals for creating new models or recalibrating old ones