Novartis Pharmaceuticals, Advanced Exploratory Analytics
A novel statistical learning method for time-to-event outcomes, with comparison to other ML approaches on Heart Failure data
Matthias Kormaksson
May 3rd, 2019
Agenda
• Objectives
• Heart Failure Studies
• Methods
  • LASSO
  • GAM
  • Random Survival Forest
  • GAMLASSO
• Experimental Comparison
  • Discrimination (C-index)
  • Calibration
• Conclusions
Business Use Only
Objectives
• Simpson et al. (2016):
  • Develop multi-domain prognostic models for Heart Failure time-to-event endpoints.
• Can the benchmark model be improved by using state-of-the-art Machine Learning?
Heart Failure Studies
• Data: PARADIGM (training set), ATMOSPHERE (test set)
• Response: time to event (CV Death, HF-Hospitalization, Composite-endpoint, All Cause Death)
• Categorical predictors: Sex, Ethnicity, PriorHeartFailureFLAG, ...
• Continuous predictors: Age, Potassium, NT-proBNP, ...
Methods
• LASSO: regularized regression that shrinks the magnitude of the coefficients (some exactly to zero), thus facilitating variable selection.
• Generalized Additive Model (GAM): models non-linear relationships between risk and baseline predictors.
• Random Survival Forest: an ensemble of survival trees for the analysis of right-censored time-to-event data.
• GAMLASSO: a novel statistical learning method that is a hybrid of GAM and LASSO, thus facilitating both non-linear modeling and variable selection.
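The shrink-to-zero behaviour of the LASSO can be illustrated with the soft-thresholding operator. This is a minimal sketch, not the model fitted in these slides: it assumes squared-error loss with an orthonormal design, where the LASSO estimate is the soft-thresholded least-squares estimate. The coefficient values and the penalty level `lam` are made up for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical least-squares coefficients (illustrative values, not from the slides).
ols_coefficients = [2.5, -0.25, 0.75, -2.0]

# With penalty level 1.0, small coefficients are set to zero (variable selection)
# and large ones are pulled toward zero (shrinkage).
lasso_coefficients = [soft_threshold(b, 1.0) for b in ols_coefficients]
print(lasso_coefficients)
```

The two coefficients smaller than the penalty in absolute value drop out entirely, which is the variable-selection effect described in the first bullet.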
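Cox Methods (predictors: continuous and categorical)
• LASSO: all terms linear; L1-penalty on all coefficients.
• GAM: a penalty on the linear coefficients (or no penalty); a smoothing-type penalty on the non-linear terms (or an alternative penalty).
• GAMLASSO: L1-penalty on the linear coefficients; a smoothing-type penalty on the non-linear terms.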
GAMLASSO Algorithm (predictors: continuous and categorical)
Set […], then iterate (until convergence) between these steps:
• LASSO step: L1-penalty on […]; fit […]; set […]
• GAM step: L1-penalty on […]; fit […]; set […]
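The idea of alternating between a LASSO step and a smoothing step can be sketched as follows. This is an illustration under strong assumptions, not the algorithm from the slide, whose formulas are not available here: it uses a Gaussian response with squared-error loss instead of the Cox partial likelihood, a Gaussian kernel smoother instead of a penalized spline, and passes the other component's current fit as an offset. All function names, the simulated data, `lam`, and `bandwidth` are hypothetical.

```python
import math
import random


def soft_threshold(z, lam):
    return math.copysign(max(abs(z) - lam, 0.0), z)


def lasso_step(Z, target, lam, beta, sweeps=25):
    """Coordinate descent on (1/2n)*||target - Z beta||^2 + lam*||beta||_1."""
    n, p = len(Z), len(Z[0])
    beta = list(beta)
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0
            for i in range(n):
                # Residual with column j left out of the linear predictor.
                partial = target[i] - sum(Z[i][k] * beta[k] for k in range(p) if k != j)
                rho += Z[i][j] * partial
            rho /= n
            scale = sum(row[j] ** 2 for row in Z) / n
            beta[j] = soft_threshold(rho, lam) / scale if scale > 0 else 0.0
    return beta


def smooth_step(x, target, bandwidth):
    """Gaussian kernel smoother of target on x (stand-in for a GAM smooth term)."""
    fitted = []
    for x0 in x:
        weights = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
        fitted.append(sum(w * t for w, t in zip(weights, target)) / sum(weights))
    return fitted


def fit_alternating(Z, x, y, lam=0.05, bandwidth=0.3, max_iter=50, tol=1e-6):
    n, p = len(Z), len(Z[0])
    beta, smooth = [0.0] * p, [0.0] * n
    for _ in range(max_iter):
        previous = list(beta)
        # LASSO step: the smooth part is held fixed and subtracted as an offset.
        beta = lasso_step(Z, [y[i] - smooth[i] for i in range(n)], lam, beta)
        linear = [sum(Z[i][k] * beta[k] for k in range(p)) for i in range(n)]
        # Smoothing step: the linear part is held fixed and subtracted as an offset.
        smooth = smooth_step(x, [y[i] - linear[i] for i in range(n)], bandwidth)
        if max(abs(a - b) for a, b in zip(beta, previous)) < tol:
            break
    return beta, smooth


# Simulated data: two binary predictors (only the first matters) and one
# continuous predictor with a non-linear effect.
rng = random.Random(0)
n = 120
Z = [[rng.randint(0, 1), rng.randint(0, 1)] for _ in range(n)]
x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
y = [1.0 * Z[i][0] + math.sin(x[i]) + rng.gauss(0.0, 0.1) for i in range(n)]

beta, smooth = fit_alternating(Z, x, y)
print(beta)
```

In this toy setting the L1 penalty is expected to shrink the coefficient of the irrelevant binary predictor toward zero while the smoother picks up the sine-shaped effect of the continuous predictor.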
Random Survival Forest
Data → Bootstrap samples → Survival Tree 1, Survival Tree 2, ..., Survival Tree n → CBH 1, CBH 2, ..., CBH n → Ensemble Cumulative Baseline Hazard
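The bootstrap-and-average step of this diagram can be sketched as below. This is a simplification and an assumption on my part: each survival tree is replaced by a single Nelson-Aalen cumulative hazard estimate on the whole bootstrap sample (in effect a tree with one node), so only the ensemble averaging is shown, not the tree growing. The function names and the toy data are hypothetical.

```python
import random


def nelson_aalen(times, events, grid):
    """Nelson-Aalen cumulative hazard, evaluated at each time point in grid."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    values = []
    for g in grid:
        hazard = 0.0
        for t in event_times:
            if t > g:
                break
            deaths = sum(1 for ti, ei in zip(times, events) if ei and ti == t)
            at_risk = sum(1 for ti in times if ti >= t)
            hazard += deaths / at_risk
        values.append(hazard)
    return values


def ensemble_cumulative_hazard(times, events, grid, n_bootstrap=100, seed=0):
    """Average the per-sample cumulative hazards over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(times)
    total = [0.0] * len(grid)
    for _ in range(n_bootstrap):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        estimate = nelson_aalen([times[i] for i in idx], [events[i] for i in idx], grid)
        total = [a + b for a, b in zip(total, estimate)]
    return [a / n_bootstrap for a in total]


# Toy right-censored data: event indicator 1 = event observed, 0 = censored.
times = [1, 2, 3, 4]
events = [1, 0, 1, 1]
grid = [0.5, 1, 3, 4]

single = nelson_aalen(times, events, grid)
ensemble = ensemble_cumulative_hazard(times, events, grid)
print(single)
print(ensemble)
```

A real random survival forest would additionally split each bootstrap sample into terminal nodes and estimate the cumulative hazard within the node a subject falls into.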
Discrimination (C-index)
* Simpson et al. (2016) reported C = 0.71 and C = 0.70, respectively.
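For reference, a concordance index for right-censored data can be computed as in the sketch below. The slides do not say which variant was used; this assumes Harrell's C, where a pair of subjects is comparable if the one with the shorter time had an observed event, and ties in the risk score count as half. The function name and the toy data are hypothetical.

```python
def concordance_index(times, events, risk):
    """Harrell's C: share of comparable pairs in which the subject who
    fails earlier also has the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the third subject is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]

perfect = concordance_index(times, events, [0.9, 0.7, 0.4, 0.1])
reversed_ranking = concordance_index(times, events, [0.1, 0.4, 0.7, 0.9])
uninformative = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
print(perfect, reversed_ranking, uninformative)
```

A value of 0.5 corresponds to a risk score with no discriminating ability and 1.0 to a perfect ranking, which gives a scale for reading values such as 0.70 and 0.71.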
Calibration
Calibration (Nam-D'Agostino)
* Simpson replicated model score = 14.22
* Simpson replicated model score = 9.81
* Simpson replicated model score = 13.10
Compare with […]
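A Nam-D'Agostino-type calibration statistic can be sketched as follows. This reflects my understanding of the statistic rather than the exact computation behind the scores above: subjects are grouped by predicted event probability, the Kaplan-Meier event probability at a fixed horizon is compared with the mean prediction in each group, and the squared differences are summed in chi-square form. The function names, the choice of horizon, the number of groups, and the toy data are assumptions.

```python
def km_event_probability(times, events, horizon):
    """1 minus the Kaplan-Meier survival estimate at the given horizon."""
    survival = 1.0
    for t in sorted({t for t, e in zip(times, events) if e and t <= horizon}):
        deaths = sum(1 for ti, ei in zip(times, events) if ei and ti == t)
        at_risk = sum(1 for ti in times if ti >= t)
        survival *= 1.0 - deaths / at_risk
    return 1.0 - survival


def nam_dagostino(times, events, predicted, horizon, n_groups=10):
    """Sum over risk groups of n_g * (observed - expected)^2 / (expected * (1 - expected)).

    Assumes every group's mean predicted probability lies strictly between 0 and 1.
    """
    n = len(times)
    order = sorted(range(n), key=lambda i: predicted[i])
    statistic = 0.0
    for g in range(n_groups):
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not members:
            continue
        observed = km_event_probability(
            [times[i] for i in members], [events[i] for i in members], horizon
        )
        expected = sum(predicted[i] for i in members) / len(members)
        statistic += len(members) * (observed - expected) ** 2 / (expected * (1.0 - expected))
    return statistic


# Toy data: a low-risk group (1 event in 4 by time 5) and a high-risk group (3 in 4).
times = [1, 10, 10, 10, 1, 2, 3, 10]
events = [1, 0, 0, 0, 1, 1, 1, 0]

well_calibrated = nam_dagostino(times, events, [0.25] * 4 + [0.75] * 4, horizon=5, n_groups=2)
poorly_calibrated = nam_dagostino(times, events, [0.5] * 8, horizon=5, n_groups=2)
print(well_calibrated, poorly_calibrated)
```

Smaller values indicate closer agreement between predicted and observed event probabilities, so lower scores on this slide point to better calibration.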
Conclusions
• The carefully constructed benchmark model is robust and performs comparably to the best Machine Learning (ML) models.
• The advantage of ML models over the carefully constructed benchmark model is automation.
• Random survival forest suffered from poor calibration (an important performance metric that is often overlooked).
• LASSO's main advantage is variable selection, while GAM's main advantage is non-linear modeling. GAMLASSO combines both and fared well in comparison to the other methods.
R-package
Acknowledgements
Key team members
• Guenther Mueller-Velten, DEV Biostatistics CM & EM
  • Core member of HF Study Group
  • Planning and coordination of statistical contributions
• Hui Wang, (former) GCE remote contractor, DEV Biostatistics CM & EM
  • Planning and implementation of the multi-domain prognostic model
• David James, DEV Stats Meth & Consulting*
  • Methodologic consultant
  • Assessment of various machine learning approaches for modeling of outcomes
• Indrayudh Ghosal, PhD Student, Cornell University
  • Implemented the R-package during his summer internship at Novartis.