1 / 27

COSMO - RS based predictions for the SAMPL6 logP challenge

COSMO - RS based predictions for the SAMPL6 logP challenge. Christoph Loschen , Jens Reinisch, Andreas Klamt. SAMPL6. Part II: Blind challenge for octanol-water partition coefficients ( logP ) of 11 drug -like molecules / fragments. SAMPL6 blind challenge. Methods used :.

mclay
Download Presentation

COSMO - RS based predictions for the SAMPL6 logP challenge

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. COSMO-RSbasedpredictionsforthe SAMPL6 logPchallenge Christoph Loschen, Jens Reinisch, Andreas Klamt

  2. SAMPL6 Part II: Blind challenge foroctanol-waterpartitioncoefficients (logP) of 11 drug-like molecules / fragments

  3. SAMPL6 blind challenge Methodsused: QSPR & Machinelearning Molecular Dynamics Implicitsolvationmodels (SMD) 3D-RISM COSMO-RS

  4. COSMO-RS In a nutshell s-surface • COSMO: implicitsolvationmodel via DFT Units s: [e/Å] • COSMO-RS: statisticalthermodynamics Intermolecular interactions are based on s surface segments. • Misfit/Electrostatic: • Hydrogen bonds: • Van der Waals: • Combinatorial/entropyterm Klamt, A. J. Phys. Chem. 1995, 99, 2224-2235. Klamt, A. WIREs: Comput. Mol. Sci. 2011, 1, 699-70.

  5. COSMO-RS s-profile p(s): a histogram of charged surface segments of a molecule +

  6. COSMO-RS The s-potential ms(s)is a characteristic function of a system at a given T. From ms(s)one obtains the chemical potential of a substance in solution ms and all related properties: * F. Eckert and A. Klamt, AIChE Journal, 48 (2002) 369-385; A. Klamt and F. Eckert, Fluid Phase Equilibria, 172 (2000) 43-72; A. Klamt, V. Jonas, T. Buerger and J. C. W. Lohrenz, J. Phys. Chem. A, 102 (1998) 5074; A. Klamt, J. Phys. Chem., 99 (1995) 2224.

  7. COSMO-RS Property prediction via equilibriumchemicalpotentials Activitycoefficient: Solubility: Partition coefficient: : pseudo-chemical potential (Naim, A. B. Solvation Thermodynamics; Plenum Press: New York, NY, 1987.)

  8. COSMOthermworkflow • Liquid Phase Statistical Thermodynamics COSMO-RS: • COSMOtherm • Conformationalsampling • in theliquidphase: • COSMOconf1,2 • COSMO-RS basedcomputationofchemicalpotentials in water & octanol4-6 • Considerationofconformationaleffects • Assumingwetoctanol (27.4% mf water) • Parameterization: BP_TZVPD_FINE_19 • Initial 3D Structuregeneration • Iterative conformerreductionaccordingtoenergyandclusteringofstructuresand (liquid) chemicalpotentials • COSMO-Levels used: BP/SV(P) , BP/TZVP & BP/TZVPD (Turbomole v7.3)3 • Identificationof optimal conformationalset COSMOconf 4.3; COSMOlogic GmbH & Co. KG; http://www.cosmologic.de: Leverkusen, Germany, 2018. Klamt, A.; Eckert, F.; Diedenhofen, J. Phys. Chem. B 2009, 113 (14), 4508–4510. TURBOMOLE V7.3; University of Karlsruhe and Forschungszentrum Karlsruhe GmbH, 1989-2007, TURBOMOLE GmbH, since 2007; availablefrom http://www.turbomole.com: Karlsruhe, Germany, 2018. Klamt, A. J. Phys. Chem. 1995, 99 (7), 2224–2235. Klamt, A.; Jonas, V.; Bürger, T.; Lohrenz, J. C. J. Phys. Chem. A 1998, 102 (26), 5074–5085. COSMOtherm, Release 19; COSMOlogic GmbH & Co. KG; http://www.cosmologic.de: Leverkusen, Germany, 2019.

  9. Conformations COSMOconf workflow overview 3D structure and initial conformer generation Clustering by structure and conformer reduction by relative energies BP/TZVP COSMO optimization (TM) Clustering by chemical potential and conformer reduction BP/TZVPD COSMO single point (TM) Relevant conformer selection via iterative chemical potential computation in a diverse solvent set

  10. COSMOquick workflow • Generation of approximated s-surface • Empirical correction term via ML • Molecule perception • COSMOtherm property calculation • Input: SMILES, xyz, SDF • Bond perception • Standardization • Circular fingerprints • QSPR descriptors • Compute s-potentials • Liquid phase properties: chemical potentials, activity coefficients, partition coefficients etc. • Database of about 200,000 COSMO files • Use fingerprints to find best sub-structure matches in database • Construct COSMO surfaces • Deviations from experiment are fitted via machine learning • For new compounds correction term is predicted • Model trained via XGBoost library2 COSMOquick1.7; COSMOlogic GmbH & Co. KG; http://www.cosmologic.de: Leverkusen, Germany, 2018. Chen, T.; Guestrin, C. Xgboost: A ScalableTreeBoosting System. In Proceedingsofthe 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; ACM, 2016; pp 785–794.

  11. COSMOquick: instant s-profile composition cheminformatics backend Idea: Compose larger molecules from a database of pre-calculated molecules* Assumption: s-profiles of compounds are somewhat additive! *Hornig, M. & Klamt, A. J ChemInf Model, 2005, 45, 1169-1177.

  12. COSMOquick: instant s-profile composition Example: SM07 Atomicweightstrings: w={000000000000000010000011111110000000000001001111111000} w={1111000000000011011000000010} w={00000001111000000000000000001100000000} • Compoundsare not wellrepresentedexcept SM11 • significantfragmentationeffectexpected! • Only a fractionof a secondneededfors-profile 1: lowestsimlarity (singleatommatch) 9: highestsimilarity (fullmatch – 8 shells )

  13. COSMOquick ML correction term: COSMOtherm &approx. s XGBoost test set II results variable(i.e. descriptor) importance

  14. Gradient TreeBoosting https://xgboost.readthedocs.io/en/latest/index.html XGBoostlibrary: State-of-the-art in machinelearningfortabulardata Implementation ofstochasticgradientboosting* Loss function, e.g. squarederror learning rate Predictedvalues decisiontree Winninglist in ML competitions: https://github.com/dmlc/xgboost/tree/master/demo#machine-learning-challenge-winning-solutions *Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38:367–378. http://dx.doi.org/10.1016/S0167-9473(01)00065-2

  15. Final Results COSMO-RS basedmethods Accuracy (RMSD) 1st in RMSD COSMOtherm 3rd in RMSD COSMOquick - QSPR - Deep Learning - SMD - Molecular Dynamics- 3D-RISM

  16. Results Comparisonof COSMO-RSSubmissions

  17. Results Comparisonof COSMOthermSubmissionsvsexperiment Correlationsbetween COSMOthermbasedsubmissions: R2=0.76 SM15 (outlierCOSMOquick) : baddatabaserepresentation SM13 (Outlier COSMOtherm):?

  18. SM15 • Most modelspredictlogPtoolow, i.e. they • overestimatethewater solubility COSMO-RS chemicalpotentials: Top 20 SM14 logPexp=1.95 • The mainreasonforthe high logPisthelowchemical potential in octanol! SM15 logPexp=3.07

  19. ConformationalEffects Root mean squared deviation (RMSD) from SAMPL6 experimental data using different conformer generation methods. • Conformationsare Boltzmann weighted, iterative computationofchemicalpotentials • Set ofrather rigid compounds

  20. Test setdata Andcomparisonwith SAMPL6 results • DecreaseofCOSMOquick/ML performance: • Database representationof SAMPL6 substructures • Improvementof COSMOtherm performance: • Test set (literature) data not as clean as SAMPL6 data! Need formoreaccurate experimental data!

  21. SAMPL6 logP Summary & Learnings Need formore such high qualityexperiments! Difficultiesin findinguseful, i.e. predictivetestsets Importanceofconformationaleffects (evenforless flexible compounds) orsuitabledescriptors SAMPL6 compoundsaresomewhatunderrepresented in COSMOquicks-surfacedatabase 3 of top 4 entriesusedstochasticgradientboosting (XGBoost)

  22. Manythanksforyourattention!contact:christoph.loschen@3ds.com

  23. Additional Material

  24. COSMOtherm’s predictivity in blind challenges 2019 SAMPL6 II: logP of drug like molecules Best contribution 2018 SAMPL6 I: pKa of drug like molecules Best contribution by Grimme et al. using TURBOMOLE and COSMOtherm 2016 SAMPL5: Water/cyclohexane partioning of 53 drug-like moleculesBest in deviation and correlation to experimental results 2014 SAMPL4: Host guest association free energy for the CB7-host Best contribution by Grimme et al. using TURBOMOLE and COSMOtherm 2010 6th IFPS: Temperature dependence of a polyether–water mixtureBest contribution 2009 SAMPL2: Blind prediction of hydration free energy Best contribution 2008 IFPS: Partitioning and activity coefficients 1st and 2nd rank 2002 IFPS: Vapor-liquid equilibria Best contribution

  25. COSMOlogicsubmissions COSMOtherm, FINE19 parameterization (id: hmz0n) COSMOquick, version 1.7, TZVP parameterization+ ML correction (id: 3vqbi) https://github.com/MobleyLab/SAMPL6/tree/master/physical_properties/logP/analysis

  26. Results Predictedvs experimental logP (COSMOtherm) Deviationsfromexperimentfor SM13 (10 bestpredictions) SM13

More Related