Test Set Validation Revisited – Good Validation Practice in QSAR
Knut Baumann, Department of Pharmacy, University of Würzburg, Germany

Presentation Transcript


  1. Test Set Validation Revisited – Good Validation Practice in QSAR. Knut Baumann, Department of Pharmacy, University of Würzburg, Germany

  2. Quantitative Structure-Activity Relationships • Build a mathematical model: Activity = f(Structural Properties) • Use it to predict the activity of novel compounds

  3. Model Validation. Ultimate goal of QSAR → predictivity. Prerequisites: • Valid biological and structural data • Stable mathematical model • Exclusion of chance correlation and overfitting

  4. Outline • Conditions for good external predictivity • Practice of external validation

  5. Levels of Model Validity • Data fit • Internal predictivity → internal validation • External predictivity → external validation

  6. Definition: Data Fit. The same data are used to build and to assess the model → resubstitution error. [Plot: fitted vs. observed activity, GRID-PLS, R² = 0.94.] R²: squared multiple correlation coefficient. Data: HEPT; n = 53
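
To make "resubstitution" concrete, here is a minimal Python sketch, assuming scikit-learn and purely synthetic stand-ins for the descriptor matrix and activities (the HEPT data and GRID fields are not reproduced here):

```python
# Resubstitution (data-fit) R^2 with PLS: the same data build and
# assess the model, so the score is optimistically biased.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(53, 200))   # 53 "compounds", 200 synthetic descriptors
y = X[:, :5] @ rng.normal(size=5) + rng.normal(scale=0.5, size=53)

pls = PLSRegression(n_components=5).fit(X, y)
y_fit = pls.predict(X).ravel()   # predicting the training data itself
print("resubstitution R^2:", r2_score(y, y_fit))
```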

  7. Definition: Internal Predictivity. A measure of predictivity (cross-validation, validation-set prediction) that is used for model selection. [Plot: R² (fit) and R²CV-1 (cross-validation) vs. number of PLS factors, GRID-PLS; the maximum of R²CV-1 is marked.] R²CV-1: leave-one-out cross-validated squared correlation coefficient (Q²). Data: HEPT; n = 53
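
A sketch of the internal criterion itself: leave-one-out Q² as a function of the number of PLS factors, reusing the synthetic X and y from the previous sketch (the factor range is a placeholder):

```python
# Leave-one-out cross-validated Q^2 vs. number of PLS factors --
# the curve whose maximum is used for model selection on this slide.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

for k in range(1, 11):
    y_cv = cross_val_predict(PLSRegression(n_components=k), X, y,
                             cv=LeaveOneOut()).ravel()
    # Q^2 = 1 - PRESS / total sum of squares
    q2 = 1 - np.sum((y - y_cv) ** 2) / np.sum((y - y.mean()) ** 2)
    print(f"{k} PLS factors: Q^2 = {q2:.3f}")
```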

  8. Definition: External Predictivity. A measure of predictivity (cross-validation, test-set prediction) for a set of data that did not influence model selection. The activity values of the test set are concealed and remain unknown to the user during model selection

  9. Example: External Predictivity. [Plot: R² (fit), R²CV-1 (cross-validation), and R²Test (test-set prediction) vs. number of PLS factors, GRID-PLS; the maxima of R²CV-1 and R²Test are marked.] Data: HEPT; n = 53, nTest = 27

  10. Importance of the Selection Criterion. [Plot: R² (fit), R²CV-1 (cross-validation), and R²Test (test-set prediction) vs. number of PLS factors, up to 35; the fit keeps rising while predictivity deteriorates.] Good external predictivity hinges on the quality of the measure of predictivity used for model selection! Data: HEPT; n = 53, nTest = 27
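
Slides 8-10 read as a workflow: select the number of factors by the internal criterion alone, then touch the concealed test activities exactly once. A sketch under the same synthetic-data assumptions (the split size is a placeholder, not the HEPT split):

```python
# Model selection by internal LOO-Q^2 only; the test set is scored
# once, after the final model is fixed.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import (LeaveOneOut, cross_val_predict,
                                     train_test_split)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33,
                                          random_state=1)

def loo_q2(k):
    y_cv = cross_val_predict(PLSRegression(n_components=k), X_tr, y_tr,
                             cv=LeaveOneOut()).ravel()
    return 1 - np.sum((y_tr - y_cv) ** 2) / np.sum((y_tr - y_tr.mean()) ** 2)

best_k = max(range(1, 11), key=loo_q2)        # internal criterion only
final = PLSRegression(n_components=best_k).fit(X_tr, y_tr)
y_pred = final.predict(X_te).ravel()          # test activities used once
print(f"best_k = {best_k}, R^2_Test = {r2_score(y_te, y_pred):.3f}")
```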

  11. Usefulness of Internal Predictivity Do internal measures of predictivity provide useful information? It depends …

  12. Case 1: No Model Selection. Multiple linear regression. [Plots: cross-validation (R²CV-1) vs. test-set prediction (R²Test).] MSEP: mean squared error of prediction

  13. Case 2: Little Model Selection. [Plot: R²CV-1 (cross-validation) and R²Test (test-set prediction) vs. number of PLS factors, GRID-PLS; the two curves track each other closely.] Stable mathematical modelling technique & few models are compared → internal ≈ external

  14. Case 3: Extensive Model Selection. Here: variable subset selection. [Plot: internal R²CV-1 vs. number of models evaluated, up to 45,000.]

  15. Case 3: Extensive Model Selection (continued). Here: variable subset selection. [Plot: internal R²CV-1 rises with the number of models evaluated, while the external R²Test stays far below it.] Extensive model selection → (danger of) overfitting → internal measures of predictivity are of limited usefulness. Data: Steroids; n = 21, nTest = 9
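
The inflation in Case 3 can be demonstrated on pure noise: score enough random variable subsets by LOO-Q² and the best internal value climbs by chance alone. A sketch under invented assumptions (n = 21 mirrors the steroid example; descriptor count, subset size, and search length are illustrative):

```python
# Selection bias on pure noise: keep the best internal Q^2 over many
# random variable subsets and watch it grow without any real signal.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(42)
X = rng.normal(size=(21, 100))   # descriptors carry no real signal
y = rng.normal(size=21)          # activities are pure noise

best_q2 = -np.inf
for i in range(1, 2001):                              # "models evaluated"
    cols = rng.choice(100, size=5, replace=False)     # one random subset
    y_cv = cross_val_predict(PLSRegression(n_components=2),
                             X[:, cols], y, cv=LeaveOneOut()).ravel()
    q2 = 1 - np.sum((y - y_cv) ** 2) / np.sum((y - y.mean()) ** 2)
    best_q2 = max(best_q2, q2)
    if i % 500 == 0:
        print(f"after {i} subsets: best internal Q^2 = {best_q2:.2f}")
```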

  16. Outline • Conditions for good external predictivity • Practice of external validation

  17. Meaningful External Validation • The two problems of external validation: • Data splitting • Variability

  18. Problem 1: Data Splitting. Split activity values and structure descriptors into a training set and a test set. Techniques for splitting: • Experimental design using descriptors → biased¹ • Random partition → variability → Use multiple random splits into training and test sets. 1) E. Roecker, Technometrics 1991, 33, 459-468.
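
A minimal sketch of the multiple-random-splits remedy, again on synthetic stand-in data; the number of splits, split ratio, and model complexity are placeholders:

```python
# Many random training/test splits: report the spread of the external
# error rather than the value from a single, arbitrary split.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(53, 200))
y = X[:, :5] @ rng.normal(size=5) + rng.normal(scale=0.5, size=53)

rmseps = []
for seed in range(100):                       # 100 random splits
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=seed)
    model = PLSRegression(n_components=3).fit(X_tr, y_tr)
    mse = mean_squared_error(y_te, model.predict(X_te).ravel())
    rmseps.append(np.sqrt(mse))

print(f"RMSEP over 100 splits: {np.mean(rmseps):.2f} +/- {np.std(rmseps):.2f}")
```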

  19. Problem 2: Variability • nTest = 5 → rel. sdv(RMSEP) = 32% • nTest = 10 → rel. sdv(RMSEP) = 22% • nTest = 50 → rel. sdv(RMSEP) = 10%. RMSEP: root mean squared error of prediction
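
These percentages match the normal-theory approximation for the relative standard deviation of a root-mean-square error estimate (an inference from the numbers; the slide does not state the formula):

$$\frac{\operatorname{sdv}(\mathrm{RMSEP})}{\mathrm{RMSEP}} \approx \frac{1}{\sqrt{2\,n_{\mathrm{Test}}}}:\qquad \frac{1}{\sqrt{10}} \approx 32\%,\quad \frac{1}{\sqrt{20}} \approx 22\%,\quad \frac{1}{\sqrt{100}} = 10\%.$$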

  20. Problem 2: Variability, Example. Steroid data set, nTest = 9: RMSEP = 0.53 → R²Test = 0.73; RMSEP ± 2·sdv(RMSEP) = 0.53 ± 0.25 → R²Test ∈ [0.40, 0.92]. RMSEP: root mean squared error of prediction
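
The interval can be reproduced (up to rounding) from the usual relation R²Test = 1 − MSEP/Var(y_test); the test-set standard deviation used below is back-calculated from the slide's own figures, not stated on the slide:

```python
# Convert an RMSEP +/- 2*sdv interval into an R^2_Test interval,
# assuming R^2_Test = 1 - (RMSEP / s_y)^2.  s_y is inferred from
# RMSEP = 0.53 and R^2_Test = 0.73 quoted on the slide.
import numpy as np

rmsep, sdv = 0.53, 0.125             # 2 * sdv = 0.25, as on the slide
s_y = rmsep / np.sqrt(1 - 0.73)      # ~= 1.02

for r in (rmsep - 2 * sdv, rmsep, rmsep + 2 * sdv):
    print(f"RMSEP = {r:.2f} -> R^2_Test = {1 - (r / s_y) ** 2:.2f}")
```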

  21. Problem 2: Variability. Unless the test data set is huge (nTest ≥ 100) → use multiple random splits into training and test sets. RMSEP: root mean squared error of prediction

  22. Variability Illustrated I. GRID-PLS; one split into n = 29 and nTest = 15. [Scatter plot: R²Test vs. R²CV-1.] Data: W84

  23. Variability Illustrated I (continued). GRID-PLS; 100 random splits into n = 29 and nTest = 15. [Scatter plot: R²Test vs. R²CV-1 for all splits, mean marked.] Data: W84

  24. Variability Illustrated II: influence of extensive model selection. 100 random splits into n = 29 and nTest = 15. [Scatter plot: R²Test vs. R²CV for variable selection and for GRID-PLS, with the mean of each marked.] → Extensive model selection causes instability. Data: W84

  25. Conclusion • Internal predictivity must reliably characterize model performance • Avoid extensive model selection if possible • Do not use the activity values of the test set until the final model is selected • Model selection: variation of any operational parameter • Use multiple splits into test and training sets unless the test set is huge. Financial support: German Research Foundation, SFB 630 – TP C5. knut.baumann@chemometrix.de

  26. Kubinyi Paradox Explained. Data: Log P

  27. Definition: Data Fit. [Plot: fitted vs. observed, GRID-PLS, R² = 0.99.] The same data are used to build and to assess the model → resubstitution error. Usefulness: strongly biased

  28. Internal Predictivity. [Plot: predicted vs. observed, GRID-PLS; fit R² = 0.99 vs. cross-validation R²CV-1 = 0.62.] Does internal predictivity provide useful information? → It depends!

  29. Definition: Internal Predictivity. [Plot: R² (fit) and R²CV-1 (cross-validation) vs. number of PLS factors, GRID-PLS.] A measure of predictivity (cross-validation, test-set prediction) that was used for model selection. Usefulness: it depends …

  30. Variability Illustrated. [Scatter plot: R²Test vs. R²CV-1 for data sets 26, 27, and 28.]

  31. Conclusion • Internal figures of merit in variable selection (VS) are largely inflated and can, in general, not be trusted • The resulting models are far more complex than anticipated • VS is prone to chance correlation, in particular with LOO-CV and similar statistics as the objective function • Rigorous validation is mandatory. „Trau, Schau, Wem!“ – “Try before you trust”; similar in spirit to “The Importance of Being Earnest”, Tropsha et al. For a PDF reprint of the slides, email knut.baumann@chemometrix.de
