This article discusses the limitations and issues with common regression modeling approaches in statistical predictions and their impact on decision-making. It provides insights on validation, calibration, discrimination, and clinical usefulness of prediction models. The article also explores strategies to improve performance and prevent poor decision-making.
Why Most Statistical Predictions Cannot Reliably Support Decision-Making: Problems Caused By Common Regression Modeling Approaches
Ewout Steyerberg, Professor of Clinical Biostatistics and Medical Decision Making
Leiden, October 2017
Why most prediction models are false • Methods at development • Rigorous validation
Three competitive models • MMRPredict (NEJM, 2006): common regression modeling approach; small data set • MMRPro (JAMA 2006): Bayesian modeling approach; moderate-size data set • PREMM (JAMA 2006; Gastroenterology 2011; JCO 2016): sensible regression modeling approach; large data set Which model wins? Which may do harm?
Discrimination • Clinic-based vs population-based validation
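Discrimination is typically summarized by the c-statistic (equivalently, the area under the ROC curve): the probability that a random patient with the outcome receives a higher predicted risk than a random patient without it. A minimal sketch with illustrative data (the outcomes and risks below are hypothetical, not from the study):

```python
import numpy as np

def c_statistic(y, p_hat):
    """c-statistic (AUC) via the Mann-Whitney pairwise form:
    fraction of event/non-event pairs ranked correctly, ties counting half."""
    pos = p_hat[y == 1]          # predicted risks for patients with the outcome
    neg = p_hat[y == 0]          # predicted risks for patients without it
    diff = pos[:, None] - neg[None, :]   # all event/non-event comparisons
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

# hypothetical outcomes and predicted risks, for illustration only
y = np.array([1, 1, 0, 0])
p_hat = np.array([0.9, 0.6, 0.7, 0.2])
auc = c_statistic(y, p_hat)      # 3 of 4 pairs ranked correctly -> 0.75
```

A c-statistic of 0.5 means the model ranks patients no better than chance; 1.0 means perfect separation of events from non-events.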
Calibration plots: observed vs predicted risk • Calibration slope as a measure of overfitting
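The calibration slope is the coefficient from regressing the observed outcomes on the linear predictor (the logit of the predicted risk); a slope near 1 indicates good calibration, while a slope below 1 signals overfitting (predictions too extreme). A minimal sketch with simulated data, where the "overfit" model doubles the true logits:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def calibration_slope(y, p_hat):
    """Fit a logistic regression of outcomes on the linear predictor
    (logit of predicted risk) and return its slope."""
    lp = np.log(p_hat / (1 - p_hat)).reshape(-1, 1)
    # near-unpenalized fit, so the estimate is not itself shrunk
    model = LogisticRegression(C=1e6).fit(lp, y)
    return model.coef_[0][0]

rng = np.random.default_rng(42)
n = 5000
lp_true = rng.normal(0, 1, n)                  # true linear predictor
y = rng.binomial(1, 1 / (1 + np.exp(-lp_true)))
p_overfit = 1 / (1 + np.exp(-2 * lp_true))     # overconfident: logits doubled

slope = calibration_slope(y, p_overfit)        # expect roughly 0.5
```

Because the simulated predictions use twice the true logit, a slope of about 0.5 is recovered: exactly the "predictions too extreme" pattern that shrinkage is meant to correct.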
Clinical usefulness • Statistical performance: Discrimination and calibration • Consider full range of predictions • Decision-analytic performance: • Define a decision threshold: act if risk > threshold • TP and FP classifications • Net Benefit as a summary measure: NB = (TP – w FP) / n, with w = harm/benefit (Vickers & Elkin, MDM 2006)
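The Net Benefit formula above can be computed directly: classify patients as "act" when predicted risk exceeds the threshold, count true and false positives, and weight false positives by w = odds(threshold). A minimal sketch with hypothetical outcomes and risks (illustrative values, not study data):

```python
import numpy as np

def net_benefit(y, p_hat, threshold):
    """NB = (TP - w * FP) / n, with w = harm/benefit = odds(threshold)."""
    act = p_hat > threshold            # act if risk exceeds the threshold
    tp = np.sum(act & (y == 1))        # events among those treated
    fp = np.sum(act & (y == 0))        # non-events among those treated
    w = threshold / (1 - threshold)    # e.g. threshold 0.2 -> w = 0.25
    return (tp - w * fp) / len(y)

# hypothetical outcomes and predicted risks
y = np.array([1, 1, 0, 0, 0])
p_hat = np.array([0.9, 0.4, 0.8, 0.2, 0.1])
nb = net_benefit(y, p_hat, 0.2)        # (2 - 0.25 * 1) / 5 = 0.35
```

At a 20% threshold, one unnecessary treatment "costs" a quarter of a correctly treated case, reflecting that a patient willing to act at 20% risk implicitly rates harm:benefit at 1:4.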
Decision curve analysis • Clinic-based vs population-based validation
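A decision curve plots Net Benefit across a range of thresholds and compares the model against the two default strategies: treat everyone and treat no one. A minimal sketch, again with hypothetical data:

```python
import numpy as np

def net_benefit(y, p_hat, t):
    act = p_hat > t
    tp = np.sum(act & (y == 1))
    fp = np.sum(act & (y == 0))
    return (tp - (t / (1 - t)) * fp) / len(y)

def decision_curve(y, p_hat, thresholds):
    """Net Benefit of the model vs 'treat all' and 'treat none',
    evaluated over a range of decision thresholds."""
    prev = y.mean()
    curve = []
    for t in thresholds:
        nb_model = net_benefit(y, p_hat, t)
        nb_all = prev - (t / (1 - t)) * (1 - prev)  # treat everyone
        curve.append((t, nb_model, nb_all, 0.0))    # treat no one: NB = 0
    return curve

# hypothetical outcomes and predicted risks
y = np.array([1, 1, 0, 0, 0])
p_hat = np.array([0.9, 0.7, 0.6, 0.2, 0.1])
curve = decision_curve(y, p_hat, [0.1, 0.3, 0.5])
```

A model is clinically useful at a given threshold only if its Net Benefit exceeds both default strategies; a curve dipping below either line over relevant thresholds indicates the model may do harm there.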
Overview • Clinical context: testing for Lynch syndrome • Statistical and decision-analytic performance • Could poor performance have been foreseen? Prevented?
Sample size issues • Robust: strong, vigorous, sturdy, tough, powerful, powerfully built, solidly built, as strong as a horse/ox, muscular, sinewy, rugged, hardy, strapping, brawny, burly, husky
Poor performance foreseeable? • Simulate modeling strategy • Small sample size • 38 events at development • 35 events vs >2000 at validation • Stepwise selection • Univariate and multivariable statistical testing • Dichotomization • New cohort: n=19,866; 2,051 mutations
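The instability of univariate screening at this sample size is easy to demonstrate by simulation. The sketch below is a simplified stand-in for the slide's strategy: instead of full stepwise logistic regression, it uses a normal-approximation univariable test (z ≈ |corr| · √n) to screen each predictor at p < 0.05, repeated over many simulated small cohorts. All settings (n = 100, a weak true coefficient of 0.3) are illustrative assumptions, not the study's values:

```python
import numpy as np

rng = np.random.default_rng(1)

def univariate_screen(x, y, z_crit=1.96):
    """Approximate univariable test: select x if |corr(x, y)| * sqrt(n)
    exceeds the two-sided 5% critical value."""
    z = abs(np.corrcoef(x, y)[0, 1]) * np.sqrt(len(y))
    return z > z_crit

n_sim, n = 200, 100
selected_weak, selected_noise = 0, 0
for _ in range(n_sim):
    x_weak = rng.normal(size=n)     # true but weak predictor (logit coef 0.3)
    x_noise = rng.normal(size=n)    # pure noise predictor
    p = 1 / (1 + np.exp(-(-0.5 + 0.3 * x_weak)))
    y = rng.binomial(1, p)
    selected_weak += univariate_screen(x_weak, y)
    selected_noise += univariate_screen(x_noise, y)

# with so few events, the true predictor is selected only a minority
# of the time, so the "final model" differs from sample to sample
freq_weak = selected_weak / n_sim
freq_noise = selected_noise / n_sim
```

With this few events, a genuinely predictive variable is dropped more often than it is kept, which is why models selected this way can differ radically between development samples and validate poorly.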
Poor calibration • Poor discrimination
Could poor performance be prevented? • PREMM modeling strategy • Coding of family history • Continuous age
Could poor performance be prevented? • PREMM modeling strategy • Coding of family history • Continuous age • Larger sample size
Better discrimination and calibration if a) more sensible modeling and b) larger sample size
Substantially better decision-making if a) more sensible modeling and b) larger sample size
Discussion • Avoid stepwise selection • Prespecification with summary variables • Advanced estimation • Avoid dichotomization • Keep continuous • Increase sample size • Combining development and validation sets • Collaborative efforts • Rigorous validation • Statistical and decision-analytic perspective
Evaluation of decision-making • Net Benefit: "utility of the method" (Peirce, Science 1884) • Youden index: sens + spec – 1 • Net Benefit (Vickers, MDM 2006) • Weight FP:TP = H:B = odds(threshold) = pt / (1 – pt) (Vergouwe 2003) • Decision Curve Analysis
Avoid miscalibration by overfitting • Shrinkage: reduce coefficients by multiplying by s, with s < 1 (e.g., multiply by 0.8) • Penalization: Ridge regression shrinks during fitting; LASSO shrinks to zero (implicit selection); Elastic Net combines Ridge and LASSO • Machine learning?
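The simplest of these remedies, uniform shrinkage, is just a multiplication of the fitted coefficients. A minimal sketch with hypothetical coefficient values (the numbers are illustrative, not from any fitted model):

```python
import numpy as np

# hypothetical fitted logistic regression coefficients
coefs = np.array([0.8, 0.5, 1.1])

s = 0.8                 # shrinkage factor, s < 1
shrunk = s * coefs      # pulls predictions toward the average risk,
                        # countering the over-extreme predictions of overfitting
# In practice the intercept is then re-estimated so that the mean
# predicted risk matches the observed event rate; Ridge/LASSO instead
# build comparable shrinkage into the fitting itself.
```

Shrinking all coefficients by the same factor directly targets a calibration slope below 1: if validation shows a slope of about 0.8, multiplying the coefficients by 0.8 brings the recalibrated slope back toward 1.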