
Alternative approaches for skin sensitization evaluation: Statistical and integrated approach for the combination of non-animal methods. C. Gomes, H. Noçairi, M. Thomas. ESTIV2012.


Presentation Transcript


  1. Alternative approaches for skin sensitization evaluation: Statistical and integrated approach for the combination of non-animal methods C. Gomes, H. Noçairi, M. Thomas ESTIV2012

  2. Overview • Introduction/Context • Specific methodology • Visualization of the methodology • Process of validation rules • Data and Application • Conclusions and Perspectives ESTIV2012

  3. Statutory Context • L'Oréal is developing alternative safety-evaluation approaches for the skin sensitization of ingredients by combining multiple in vitro and in silico data. • Purpose: develop a predictive model for hazard identification: Sensitizer / Non-Sensitizer. • Data: a full data set of 165 chemicals described by 35 different variables, covering in silico predictions (Derek, TIMES, Toxtree), results from the DPRA, MUSST, Nrf-2 and PGE-2 in vitro tests, and numerous experimental or calculated physico-chemical parameters. ESTIV2012

  4. Specific Methodology • Objective: prediction of a binary outcome (Sensitizer / Non-Sensitizer). • A large number of supervised classification models have been proposed in the literature; which one to choose? • Relying on a single statistical approach induces a bias. • Solution: a "stacking" meta-model that combines several approaches. ESTIV2012

  5. Specific Methodology • Choice of different models: Boosting, Naïve Bayes, SVM, Sparse PLS-DA, and Expert Scoring. • Small number of observations: variable selection is therefore stabilized by repeated sub-sampling (see the sketch below). ESTIV2012
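The slide does not detail how the repeated sub-sampling is performed, so the following is only a minimal sketch of the general idea: draw many random subsamples of the small training set, run a selection model on each, and keep the variables that are selected consistently. The L1-penalized logistic regression, the 80% sampling fraction, and the 70% frequency cut-off are illustrative assumptions, not the authors' settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

def stable_variables(X, y, n_rounds=100, keep_fraction=0.8, min_freq=0.7, seed=0):
    """Return indices of variables selected in at least `min_freq` of the subsamples."""
    rng = np.random.RandomState(seed)
    n_samples, n_features = X.shape
    counts = np.zeros(n_features)
    for _ in range(n_rounds):
        # Draw a stratified random subsample of the (small) training set without replacement.
        Xs, ys = resample(X, y, replace=False,
                          n_samples=int(keep_fraction * n_samples),
                          stratify=y, random_state=rng)
        # L1 penalty drives uninformative coefficients to exactly zero.
        model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
        model.fit(Xs, ys)
        counts += (np.abs(model.coef_.ravel()) > 1e-8)  # was the variable kept this round?
    return np.where(counts / n_rounds >= min_freq)[0]
```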

  6. Visualization of the methodology • Input variables: qualitative and quantitative descriptors; binary outcome: Sensitizer (S) / Non-Sensitizer (NS) class for the N subjects. • Five supervised classification methods are fitted in parallel: Naïve Bayes, Score method, Sparse PLS-DA, Boosting and SVM; each model provides a probability of being dangerous. • Stacking is the combination of these 5 supervised classification methods: their probabilities are combined by a stacking meta-model (logistic PLS-DA) to give a robust prediction. [Figure: workflow diagram from the input variables through the five models to the stacking meta-model.] ESTIV2012
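A minimal scikit-learn sketch of this stacking idea is shown below. The authors' pipeline uses five specific base models and a logistic PLS-DA meta-model; here three off-the-shelf base learners and a plain logistic regression meta-learner stand in (the expert score and sparse PLS-DA models have no direct scikit-learn equivalent), so this illustrates the combination scheme rather than reproducing the exact method.

```python
from sklearn.ensemble import AdaBoostClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

base_models = [
    ("naive_bayes", GaussianNB()),
    ("boosting", AdaBoostClassifier(n_estimators=100, random_state=0)),
    ("svm", SVC(kernel="rbf", probability=True, random_state=0)),
]

stacking = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),  # stand-in for the logistic PLS-DA meta-model
    stack_method="predict_proba",          # each base model feeds its class probabilities
    cv=5,                                  # out-of-fold predictions avoid leakage into the meta-model
)

# Usage (X_train, y_train, X_valid assumed to be NumPy arrays, y coded 0 = NS, 1 = S):
# stacking.fit(X_train, y_train)
# prob_sensitizer = stacking.predict_proba(X_valid)[:, 1]
```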

  7. Process of validation rules • Step 1: the data (N observations) are split into a learning set (70%) and a validation set (30%). • Step 2: the learning set is split into Q subsets, each divided into learning (80%) and test (20%) parts. • Step 3: each model is parameterized on these subsets, and the variables selected in common across all subsets are retained. • Step 4: the stacking meta-model is fitted on the learning set with the variables selected in step 3. • Step 5: the global stacking model is evaluated on the validation set. [Figure: flowchart of the five steps, with one stacking meta-model per subset 1…Q feeding the global stacking.] ESTIV2012
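The sketch below follows the five steps of this protocol. Details not stated on the slide are assumptions: stratified splits, Q = 10, and two hypothetical helpers (`select_variables` for step 3 and `fit_stacking` for step 4) standing in for the model-specific work.

```python
from sklearn.model_selection import StratifiedShuffleSplit, train_test_split

def validation_protocol(X, y, select_variables, fit_stacking, Q=10, seed=0):
    # Step 1: learning (70%) / validation (30%) split.
    X_learn, X_valid, y_learn, y_valid = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed)

    # Step 2: Q sub-splits of the learning set (80% learning / 20% test each).
    splitter = StratifiedShuffleSplit(n_splits=Q, test_size=0.20, random_state=seed)
    kept = []
    for train_idx, _test_idx in splitter.split(X_learn, y_learn):
        # Step 3: parameterize the base models on the 80% part and record the
        # variables they keep (the 20% part would serve to tune them).
        kept.append(set(select_variables(X_learn[train_idx], y_learn[train_idx])))

    # Retain only the variables selected in every one of the Q subsets.
    common_vars = sorted(set.intersection(*kept))

    # Step 4: fit the stacking meta-model on the learning set with these variables.
    stacking = fit_stacking(X_learn[:, common_vars], y_learn)

    # Step 5: assess the frozen stacking model on the held-out validation set.
    score = stacking.score(X_valid[:, common_vars], y_valid)
    return stacking, common_vars, score
```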

  8. Data and Application • Decision rule applied to the predicted probability of being a sensitizer: ≥ 85% → Sensitizer conclusion, ≤ 15% → Non-Sensitizer conclusion, otherwise inconclusive. • With the Boosting model alone, N = 67 chemicals reach a conclusive probability (≥ 85% or ≤ 15%); with the Stacking meta-model, N = 135. [Figure: comparison of conclusive vs. inconclusive calls for the Boosting and Stacking models.] ESTIV2012
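The ≥ 85% / ≤ 15% rule from the slide translates directly into a small decision function (thresholds from the slide; the function name is illustrative):

```python
def conclude(prob_sensitizer, upper=0.85, lower=0.15):
    """Turn a predicted sensitizer probability into a hazard conclusion."""
    if prob_sensitizer >= upper:
        return "Sensitizer"
    if prob_sensitizer <= lower:
        return "Non-sensitizer"
    return "Inconclusive"

# e.g. conclude(0.92) -> "Sensitizer"; conclude(0.40) -> "Inconclusive"
```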

  9. Performances on the validation set (N = 50) • Performance comparison on a validation set of 25 sensitizers and 25 non-sensitizers. • Taking into account only confident probabilities (≥ 85% or ≤ 15%), the results show that the stacking model performs better than all the other models taken separately, on a larger set of conclusive chemicals. ESTIV2012

  10. Conclusions and Perspectives • Conclusions: • The Stacking Meta-Model gives a prediction model with better performances for the development of alternative approaches in the safety evaluation of chemicals than each of the five initial models taken separately. • This kind of alternative prediction tool will ultimately contribute to risk-assessment decision making in a Weight of Evidence approach. • Perspectives: • Implementation of another prediction model into the Stacking meta-model. • Link the outputs of the statistical approach with the comprehension of the underlying biological mechanisms. • Obtain a predictive model for potency evaluation of sensitizers (multi-class case). Thank you for your attention. ESTIV2012

  11. Back up

  12. Naïve Bayes • Bayes' theorem relates the conditional and marginal probabilities of stochastic events A and B: P(A|B) = P(B|A) P(A) / P(B). • To refine the a priori probability attached to each test, a quality criterion (Quality Factor, QF) based on Klimisch-like codes 1, 2, 3 is used: Klimisch-like "code 1" (reliable results) QF = 1; "code 2" (doubtful results) QF = 0.8; "code 3" (not reliable results) QF = 0.2. • The aim of this criterion is to correct the observed "raw" prediction by taking the reliability of the test into account: Corrected Sensitivity = 0.5 + QF × (Sensitivity − 0.5); Corrected Specificity = 0.5 + QF × (Specificity − 0.5). • Example with QF = 0.2: Corrected Sensitivity = 0.5 + 0.2 × (0.75 − 0.5) = 0.55; Corrected Specificity = 0.5 + 0.2 × (0.875 − 0.5) = 0.575. [Figure: worked example of the sequential Bayesian update of the a priori probability across tests, using the corrected sensitivities and specificities.]
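A small sketch of this correction and the Bayesian update follows. The QF mapping and the corrected sensitivity/specificity formulas are taken from the slide; the posterior formula is the standard Bayes update for a positive or negative test result, written here as an assumption about how the corrected values are plugged in.

```python
QF = {1: 1.0, 2: 0.8, 3: 0.2}  # Klimisch-like reliability codes -> quality factor

def corrected(value, qf):
    """Shrink a sensitivity or specificity towards 0.5 for less reliable tests."""
    return 0.5 + qf * (value - 0.5)

def update_prior(prior, sensitivity, specificity, positive, klimisch_code):
    """Bayes update of the a priori sensitizer probability after one test result."""
    qf = QF[klimisch_code]
    se = corrected(sensitivity, qf)
    sp = corrected(specificity, qf)
    if positive:   # P(S | test+) = P(test+ | S) P(S) / P(test+)
        return se * prior / (se * prior + (1 - sp) * (1 - prior))
    else:          # P(S | test-) = P(test- | S) P(S) / P(test-)
        return (1 - se) * prior / ((1 - se) * prior + sp * (1 - prior))

# Example from the slide: a code-3 test (QF = 0.2) with raw Se = 0.75 and
# Sp = 0.875 gives corrected(0.75, 0.2) = 0.55 and corrected(0.875, 0.2) = 0.575.
```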

  13. Expert Score • The score method uses a graphical visualization to select the important variables and to fix thresholds. • Example for qualitative variables: each modality of the variable (e.g. Var 1, modalities 1, 2, 3) is assigned one of the score scenarios (A or B). [Figure: score scenario for Var 1 across its modalities.]

  14. Expert Score • The score method uses a graphical visualization to select the important variables and to fix thresholds. • Example for continuous variables: a threshold on the value of the variable (e.g. Var 2) separates the two score scenarios (A and B). [Figure: score scenario for Var 2 as a function of its value, with the chosen threshold.]

  15. Expert Score • Choice of the threshold on the global score: best compromise between sensitivity and specificity, read from the ROC curve. [Table: global score values for all subjects. Figure: ROC curve, sensitivity vs. 1 − specificity.]
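The slide does not name the exact compromise criterion, so the sketch below uses Youden's J statistic (sensitivity + specificity − 1), a common way to pick the point on the ROC curve that balances the two; this is an illustrative choice, not necessarily the authors'.

```python
import numpy as np
from sklearn.metrics import roc_curve

def best_threshold(y_true, scores):
    """Pick the global-score cut-off maximizing sensitivity + specificity - 1."""
    fpr, tpr, thresholds = roc_curve(y_true, scores)  # tpr = sensitivity, fpr = 1 - specificity
    j = tpr - fpr                                      # Youden's J for each candidate threshold
    return thresholds[np.argmax(j)]
```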

  16. Sparse PLS-DA • PLS-DA is a classification technique that combines the properties of PLS regression with the discriminating power of discriminant analysis. • PLS solves the optimization problem: maximize cov(Xw, Yc) over the weight vectors w and c, subject to ||w|| = ||c|| = 1. • Sparse PLS solves the same optimization problem with an additional L1 (lasso) penalty on w, so that only a subset of the variables enters each component. • PLS-DA model: Y = b·X, where X is the scaled n × p data matrix, Y the n × q response matrix and b the regression vector. [Figure: score plot of the first two components (t1, t2) showing the maximum between-class (B) variation.]
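scikit-learn has no sparse PLS-DA, so the sketch below shows plain PLS-DA only: PLS regression on a 0/1-coded response, with the continuous prediction thresholded at 0.5 to assign the class. The sparse variant (the L1 penalty on the loadings) is available elsewhere, e.g. in the R package mixOmics; treat this as an illustration of the PLS-DA idea, not of the authors' sparse implementation.

```python
from sklearn.cross_decomposition import PLSRegression
from sklearn.preprocessing import StandardScaler

def fit_plsda(X, y, n_components=2):
    """Fit PLS-DA: scale X (as on the slide) and regress the 0/1-coded class on it."""
    scaler = StandardScaler().fit(X)
    pls = PLSRegression(n_components=n_components)
    pls.fit(scaler.transform(X), y.astype(float))  # y coded 0 = NS, 1 = S
    return scaler, pls

def predict_plsda(scaler, pls, X_new):
    """Classify new samples by thresholding the continuous PLS prediction at 0.5."""
    scores = pls.predict(scaler.transform(X_new)).ravel()
    return (scores >= 0.5).astype(int)
```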

  17. Boosting • Boosting rule = weak learner 1 + … + weak learner i + … + weak learner n
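The additive combination of weak learners on the slide is exactly what AdaBoost does; the slide does not say which boosting variant the authors used, so the AdaBoost-with-decision-stumps configuration below is only a representative example.

```python
from sklearn.ensemble import AdaBoostClassifier

# Each of the n_estimators weak learners (by default a depth-1 decision stump)
# is added to the rule with a weight reflecting how well it performed.
booster = AdaBoostClassifier(n_estimators=200, random_state=0)

# Usage: booster.fit(X_train, y_train); booster.predict_proba(X_valid)[:, 1]
```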

  18. What are SVMs? How do they work? • Support Vector Machines (SVMs) are a set of machine learning approaches used for classification and regression, developed by Vladimir Vapnik. • SVM is based on the concept of decision planes that define decision boundaries.

  19. Example 1: Linear SVMs • How would you classify these points (two classes, A and B, in the (x1, x2) plane) using a linear discriminant function in order to minimize the error rate? • There is an infinite number of answers; which one is the best? [Figure: scatter plot of Class A and Class B points in the (x1, x2) plane.]

  20. Example 1: Linear SVMs • The linear discriminant function with the maximum margin is the best. • The margin is defined as the maximal width by which the boundary can be moved away from the separating hyperplane before hitting the first data point; it acts as a "safety zone". • Why is it the best? Because it is robust to outliers and therefore has strong generalization ability. [Figure: Class A and Class B points with the separating hyperplane, the margin, and the support vectors x+, x−.]

  21. Example 2: Non-linear SVMs • Datasets that are linearly separable with some noise work out great. • But what are we going to do if the data set is simply impossible to separate into two parts? • How about mapping the data to a higher-dimensional space? [Figure: one-dimensional examples along the x axis illustrating linearly separable vs. non-separable cases.]
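The "mapping to a higher-dimensional space" on the slide is what the kernel trick performs implicitly. The short, self-contained example below (standard scikit-learn usage, with a toy concentric-rings data set as a stand-in for the slide's figure) contrasts a linear SVM, which cannot separate the rings, with an RBF-kernel SVM, which can.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Toy data: two concentric rings, impossible to split with a straight line.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)            # stays in the original space
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)   # implicit high-dimensional mapping

print("linear accuracy:", linear_svm.score(X, y))
print("RBF accuracy:   ", rbf_svm.score(X, y))
```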
