Validation of predictive regression models Ewout W. Steyerberg, PhD Clinical epidemiologist Frank E. Harrell, PhD Biostatistician
Personal background • Ewout Steyerberg: Erasmus MC, Rotterdam, the Netherlands • Frank Harrell: Health Evaluation Sciences, Univ of Virginia, Charlottesville, VA, USA “Validation of predictions from regression models is of paramount importance”
Learning objectives: knowledge of • common types of regression models • fundamental assumptions of regression models • performance criteria of predictive models • principles of different types of validation
Performance objectives • To be able to explain why validation is necessary for predictive models • To be able to judge the adequacy of a validation procedure
Predictive models provide quantitative estimates of an outcome, e.g. • Quality of life one year after surgery • Death at 30 days after surgery • Long term survival
Predictive models are often based on regression analysis • y ~ a + sum(b_i * x_i)
y: outcome variable
a: intercept
b_i: regression coefficient for predictor i
x_i: predictor variable i
i runs from 1 to the number of predictors, usually 2 to 20
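The regression formula on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `linear_predictor` and all numeric values are made up and do not come from the presentation.

```python
def linear_predictor(intercept, coefficients, predictors):
    """Return a + sum(b_i * x_i) for one subject."""
    if len(coefficients) != len(predictors):
        raise ValueError("need exactly one coefficient per predictor")
    return intercept + sum(b * x for b, x in zip(coefficients, predictors))


# Hypothetical model with two predictors (all values are made up).
a = 1.0
b = [0.5, -2.0]
x = [4.0, 1.0]
y_hat = linear_predictor(a, b, x)  # 1.0 + 0.5*4.0 + (-2.0)*1.0
```

For linear regression the result is the predicted outcome itself; for logistic and Cox regression the same sum is transformed before it is read as a risk.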
3 examples of regression • Quality of life one year after surgery: continuous outcome, linear regression • Death at 30 days after surgery: binary outcome, logistic regression • Long term survival: time-to-outcome, Cox regression
Predictive models make assumptions • Distribution • Linearity of continuous variables • Additivity of effects
Example: a simple logistic regression model • 30-day mortality ~ a + b1*sex + b2*age Assumptions: • Distribution of 30-day mortality is binomial • Age has a linear effect on the log odds • The effects of sex and age can be added on the log-odds scale
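As a rough sketch of how such a model turns sex and age into a predicted risk: the coefficient values below and the coding sex = 1 for male are assumptions chosen for illustration, not estimates from the presentation.

```python
import math


def predicted_risk(sex, age, a=-6.0, b1=0.4, b2=0.06):
    """Predicted 30-day mortality from a logistic model.

    a, b1 and b2 are hypothetical coefficients; sex is coded 0/1
    (1 = male is an assumption made for this example).
    """
    lp = a + b1 * sex + b2 * age          # linear predictor (log odds)
    return 1.0 / (1.0 + math.exp(-lp))    # inverse logit gives a probability


risk_woman_60 = predicted_risk(sex=0, age=60)
risk_man_80 = predicted_risk(sex=1, age=80)
```

Because the effects are additive on the log-odds scale, each extra year of age multiplies the odds of death by the same factor, exp(b2), for both sexes.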
Assessing model assumptions • Examine model residuals • Perform specific tests: add nonlinear terms, e.g. age + age^2; add interaction terms, e.g. sex*age
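One way to picture the "specific tests" is to extend each patient's predictors with the extra terms, refit the model, and check whether the added coefficients differ from zero. The helper below only builds the extended terms; its name and the dictionary layout are hypothetical.

```python
def expand_terms(sex, age):
    """Return the original predictors plus a quadratic and an interaction term."""
    return {
        "sex": sex,
        "age": age,
        "age^2": age ** 2,     # tests linearity of the age effect
        "sex*age": sex * age,  # tests additivity of sex and age
    }


row = expand_terms(sex=1, age=70)
```

If the coefficient for `age^2` or `sex*age` is clearly non-zero in the refitted model, the linearity or additivity assumption is in doubt.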
Model assumptions and predictions • Better predictions if assumptions are met • Some violation inherent in empirical data • Evaluate predictions in new data
Evaluation of predictions • Calibration: is the average of predictions correct? Are low and high predictions correct? • Discrimination: can the model distinguish low-risk from high-risk patients?
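Both criteria can be computed directly for a binary outcome. The sketch below uses two common measures that the slide does not name: the difference between observed and mean predicted risk (often called calibration-in-the-large) and the c-statistic, which equals the area under the ROC curve. The toy data are made up.

```python
def calibration_in_the_large(y, p):
    """Observed event rate minus mean predicted risk; 0 means correct on average."""
    return sum(y) / len(y) - sum(p) / len(p)


def c_statistic(y, p):
    """Share of event/non-event pairs in which the event has the higher prediction.

    Ties count as one half. Equals the area under the ROC curve.
    """
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(nonevents))


# Made-up outcomes (1 = died) and predicted risks for six patients.
y = [0, 0, 1, 0, 1, 1]
p = [0.1, 0.3, 0.35, 0.4, 0.8, 0.9]
citl = calibration_in_the_large(y, p)
auc = c_statistic(y, p)
```

Whether low and high predictions are individually correct needs a further check, for example comparing observed and predicted risk within groups of patients with similar predictions.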
3 types of validation • Apparent: performance on sample used to develop model • Internal: performance on population underlying the sample • External: performance on related but slightly different population
Apparent validity • Easy to calculate • Results in optimistic performance estimates
Apparent estimates optimistic since same data used for: • Definition of model structure: e.g. selection and coding of variables • Estimation of model parameters: e.g. regression coefficients • Evaluation of model performance: e.g. calibration and discrimination
Internal validity • More difficult to calculate • Test model in new data, random from underlying population
Why internal validation? • Honest estimate of performance should be obtained, at least for a population similar to the development sample • Internally validated performance sets an upper limit to what may be expected in other settings (external validity)
External validity • Moderately easy to calculate when new data are available • Test model in new data, different from development population
Why external validation? • Various factors may differ from development population, including • different selection of patients • different definitions of variables • different diagnostic or therapeutic procedures
Internal validation techniques • Split-sample: development / validation • Cross-validation: alternating development / validation; extreme: n-1 develop / 1 validate (‘jack-knife’) • Bootstrap
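The three techniques differ mainly in how patients are assigned to development and validation. A minimal sketch of the index bookkeeping, with hypothetical function names and an arbitrary fixed seed:

```python
import random


def split_sample(n, frac_dev=0.5, seed=0):
    """Randomly split indices 0..n-1 into a development and a validation part."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * frac_dev))
    return idx[:cut], idx[cut:]


def cross_validation_folds(n, k, seed=0):
    """Yield (development, validation) index lists; each patient is validated once.

    Setting k = n gives the extreme case: n-1 develop / 1 validate.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        dev = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield dev, folds[i]


def bootstrap_sample(n, rng):
    """Draw n indices with replacement from 0..n-1."""
    return [rng.randrange(n) for _ in range(n)]


dev, val = split_sample(10)
folds = list(cross_validation_folds(10, 5))
leave_one_out = list(cross_validation_folds(6, 6))
boot = bootstrap_sample(10, random.Random(1))
```

Split-sample validation uses only part of the data for development, whereas cross-validation and the bootstrap reuse every patient.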
Bootstrap is the preferred internal validation technique • bootstrap sample for model development: n patients drawn with replacement • original sample for validation: n patients • difference: optimism • efficiency: development and validation on n patients
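The procedure on this slide can be sketched end to end as follows. Everything here is an assumption made for illustration: the data are simulated, the logistic model is fitted with a toy gradient-ascent routine standing in for real statistical software, performance is measured by the c-statistic, and only 50 bootstrap samples are drawn to keep the run short.

```python
import math
import random


def predict(w, X):
    """Predicted risks from weights w, where w[0] is the intercept."""
    return [1.0 / (1.0 + math.exp(-(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))))
            for x in X]


def fit_logistic(X, y, n_iter=100, lr=0.5):
    """Toy gradient-ascent logistic regression (illustrative only)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(n_iter):
        p = predict(w, X)
        grad = [sum(yi - pi for yi, pi in zip(y, p))]
        for j in range(len(X[0])):
            grad.append(sum((yi - pi) * x[j] for yi, pi, x in zip(y, p, X)))
        w = [wj + lr * g / len(y) for wj, g in zip(w, grad)]
    return w


def c_statistic(y, p):
    """Area under the ROC curve by pairwise comparison; ties count one half."""
    ev = [pi for yi, pi in zip(y, p) if yi == 1]
    ne = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in ev for b in ne)
    return wins / (len(ev) * len(ne))


def bootstrap_validate(X, y, n_boot=50, seed=1):
    """Return (apparent, optimism, optimism-corrected) c-statistic."""
    rng = random.Random(seed)
    n = len(y)
    apparent = c_statistic(y, predict(fit_logistic(X, y), X))
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]   # n patients, with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue                                 # skip samples with one outcome class
        wb = fit_logistic(Xb, yb)                    # develop in the bootstrap sample
        boot_perf = c_statistic(yb, predict(wb, Xb)) # performance in the bootstrap sample
        test_perf = c_statistic(y, predict(wb, X))   # same model tested in the original sample
        diffs.append(boot_perf - test_perf)
    optimism = sum(diffs) / n_boot
    return apparent, optimism, apparent - optimism


# Simulated patients: sex (0/1) and standardized age; true coefficients are made up.
rng = random.Random(42)
X, y = [], []
for _ in range(150):
    sex, age_z = rng.randint(0, 1), rng.gauss(0.0, 1.0)
    risk = 1.0 / (1.0 + math.exp(-(-1.0 + 0.5 * sex + 1.0 * age_z)))
    X.append([sex, age_z])
    y.append(1 if rng.random() < risk else 0)

apparent, optimism, corrected = bootstrap_validate(X, y)
```

If the original model involved variable selection or other data-driven choices, those steps would have to be repeated inside the loop for every bootstrap sample; the sketch refits a fixed two-predictor model only.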
Example: bootstrap results for logistic regression model • 30-day mortality ~ a + b1*sex + b2*age
Apparent area under the ROC curve: 0.77
Mean area of 200 bootstrap samples: 0.772
Mean area of 200 tests in original: 0.762
Optimism in apparent performance: 0.01
Optimism-corrected area: 0.76
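The arithmetic behind these figures is short enough to check directly. The variable names are illustrative; the numbers are the ones reported on the slide.

```python
apparent_auc = 0.77         # model developed and tested on the original sample
mean_auc_bootstrap = 0.772  # mean over 200 models, each tested on its own bootstrap sample
mean_auc_original = 0.762   # mean over the same 200 models, tested on the original sample

# Optimism is the average drop when a bootstrap model meets the original data.
optimism = mean_auc_bootstrap - mean_auc_original

# Subtract the optimism from the apparent performance.
corrected_auc = apparent_auc - optimism
```

Rounded to two decimals this reproduces the optimism of 0.01 and the corrected area of 0.76.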
External validation techniques • Temporal validation: same investigators, validate in recent years • Spatial validation (other place): same investigators, cross-validate in centers • Fully external: other investigators, other centers
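Temporal and spatial validation amount to splitting the data by a non-random characteristic. A minimal sketch, assuming each patient record is a dictionary with hypothetical `year` and `center` fields:

```python
def temporal_split(records, cutoff_year):
    """Develop on patients before cutoff_year, validate on the more recent ones."""
    dev = [r for r in records if r["year"] < cutoff_year]
    val = [r for r in records if r["year"] >= cutoff_year]
    return dev, val


def leave_one_center_out(records):
    """Yield (center, development, validation), holding out each center in turn."""
    for center in sorted({r["center"] for r in records}):
        dev = [r for r in records if r["center"] != center]
        val = [r for r in records if r["center"] == center]
        yield center, dev, val


# Made-up records for illustration.
records = [
    {"year": 1996, "center": "A"},
    {"year": 1997, "center": "B"},
    {"year": 1999, "center": "A"},
    {"year": 2000, "center": "C"},
]
dev, val = temporal_split(records, cutoff_year=1999)
by_center = list(leave_one_center_out(records))
```

Fully external validation has no counterpart in code on the developers' side: other investigators apply the published model to their own patients.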
Example: external validity of logistic regression model • 30-day mortality ~ a + b1*sex + b2*age
Apparent area in 785 patients: 0.77
Tested in 20,318 other patients: 0.74
Tested by other investigators: ?
Summary • Apparent validity gives an optimistic estimate of model performance • Internal validity may be estimated by bootstrapping • External validity should be determined in other populations
Key references
• tutorial and book on multivariable models (Harrell 1996, Stat Med 15: 361-87; Harrell, Regression Modeling Strategies, Springer 2001)
• empirical evaluations of strategies (Steyerberg 2000, Stat Med 19: 1059-79)
• internal validation (Steyerberg 2001, JCE 54: 774-81)
• external validation (Justice 1999, Ann Intern Med 130: 515-24; Altman 2000, Stat Med 19: 453-73)
Links
• Interactive text book on predictive modeling: http://www.neri.org/symptom/mockup/Chapter_8/
• Harrell’s Regression modeling strategies: http://hesweb1.med.virginia.edu/biostat/rms/