330 likes | 444 Views
COSMO - RS based predictions for the SAMPL6 logP challenge. Christoph Loschen , Jens Reinisch, Andreas Klamt. SAMPL6. Part II: Blind challenge for octanol-water partition coefficients ( logP ) of 11 drug -like molecules / fragments. SAMPL6 blind challenge. Methods used :.
E N D
COSMO-RSbasedpredictionsforthe SAMPL6 logPchallenge Christoph Loschen, Jens Reinisch, Andreas Klamt
SAMPL6 Part II: Blind challenge foroctanol-waterpartitioncoefficients (logP) of 11 drug-like molecules / fragments
SAMPL6 blind challenge Methodsused: QSPR & Machinelearning Molecular Dynamics Implicitsolvationmodels (SMD) 3D-RISM COSMO-RS
COSMO-RS In a nutshell s-surface • COSMO: implicitsolvationmodel via DFT Units s: [e/Å] • COSMO-RS: statisticalthermodynamics Intermolecular interactions are based on s surface segments. • Misfit/Electrostatic: • Hydrogen bonds: • Van der Waals: • Combinatorial/entropyterm Klamt, A. J. Phys. Chem. 1995, 99, 2224-2235. Klamt, A. WIREs: Comput. Mol. Sci. 2011, 1, 699-70.
COSMO-RS s-profile p(s): a histogram of charged surface segments of a molecule +
COSMO-RS The s-potential ms(s)is a characteristic function of a system at a given T. From ms(s)one obtains the chemical potential of a substance in solution ms and all related properties: * F. Eckert and A. Klamt, AIChE Journal, 48 (2002) 369-385; A. Klamt and F. Eckert, Fluid Phase Equilibria, 172 (2000) 43-72; A. Klamt, V. Jonas, T. Buerger and J. C. W. Lohrenz, J. Phys. Chem. A, 102 (1998) 5074; A. Klamt, J. Phys. Chem., 99 (1995) 2224.
COSMO-RS Property prediction via equilibriumchemicalpotentials Activitycoefficient: Solubility: Partition coefficient: : pseudo-chemical potential (Naim, A. B. Solvation Thermodynamics; Plenum Press: New York, NY, 1987.)
COSMOthermworkflow • Liquid Phase Statistical Thermodynamics COSMO-RS: • COSMOtherm • Conformationalsampling • in theliquidphase: • COSMOconf1,2 • COSMO-RS basedcomputationofchemicalpotentials in water & octanol4-6 • Considerationofconformationaleffects • Assumingwetoctanol (27.4% mf water) • Parameterization: BP_TZVPD_FINE_19 • Initial 3D Structuregeneration • Iterative conformerreductionaccordingtoenergyandclusteringofstructuresand (liquid) chemicalpotentials • COSMO-Levels used: BP/SV(P) , BP/TZVP & BP/TZVPD (Turbomole v7.3)3 • Identificationof optimal conformationalset COSMOconf 4.3; COSMOlogic GmbH & Co. KG; http://www.cosmologic.de: Leverkusen, Germany, 2018. Klamt, A.; Eckert, F.; Diedenhofen, J. Phys. Chem. B 2009, 113 (14), 4508–4510. TURBOMOLE V7.3; University of Karlsruhe and Forschungszentrum Karlsruhe GmbH, 1989-2007, TURBOMOLE GmbH, since 2007; availablefrom http://www.turbomole.com: Karlsruhe, Germany, 2018. Klamt, A. J. Phys. Chem. 1995, 99 (7), 2224–2235. Klamt, A.; Jonas, V.; Bürger, T.; Lohrenz, J. C. J. Phys. Chem. A 1998, 102 (26), 5074–5085. COSMOtherm, Release 19; COSMOlogic GmbH & Co. KG; http://www.cosmologic.de: Leverkusen, Germany, 2019.
Conformations COSMOconf workflow overview 3D structure and initial conformer generation Clustering by structure and conformer reduction by relative energies BP/TZVP COSMO optimization (TM) Clustering by chemical potential and conformer reduction BP/TZVPD COSMO single point (TM) Relevant conformer selection via iterative chemical potential computation in a diverse solvent set
COSMOquick workflow • Generation of approximated s-surface • Empirical correction term via ML • Molecule perception • COSMOtherm property calculation • Input: SMILES, xyz, SDF • Bond perception • Standardization • Circular fingerprints • QSPR descriptors • Compute s-potentials • Liquid phase properties: chemical potentials, activity coefficients, partition coefficients etc. • Database of about 200,000 COSMO files • Use fingerprints to find best sub-structure matches in database • Construct COSMO surfaces • Deviations from experiment are fitted via machine learning • For new compounds correction term is predicted • Model trained via XGBoost library2 COSMOquick1.7; COSMOlogic GmbH & Co. KG; http://www.cosmologic.de: Leverkusen, Germany, 2018. Chen, T.; Guestrin, C. Xgboost: A ScalableTreeBoosting System. In Proceedingsofthe 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; ACM, 2016; pp 785–794.
COSMOquick: instant s-profile composition cheminformatics backend Idea: Compose larger molecules from a database of pre-calculated molecules* Assumption: s-profiles of compounds are somewhat additive! *Hornig, M. & Klamt, A. J ChemInf Model, 2005, 45, 1169-1177.
COSMOquick: instant s-profile composition Example: SM07 Atomicweightstrings: w={000000000000000010000011111110000000000001001111111000} w={1111000000000011011000000010} w={00000001111000000000000000001100000000} • Compoundsare not wellrepresentedexcept SM11 • significantfragmentationeffectexpected! • Only a fractionof a secondneededfors-profile 1: lowestsimlarity (singleatommatch) 9: highestsimilarity (fullmatch – 8 shells )
COSMOquick ML correction term: COSMOtherm &approx. s XGBoost test set II results variable(i.e. descriptor) importance
Gradient TreeBoosting https://xgboost.readthedocs.io/en/latest/index.html XGBoostlibrary: State-of-the-art in machinelearningfortabulardata Implementation ofstochasticgradientboosting* Loss function, e.g. squarederror learning rate Predictedvalues decisiontree Winninglist in ML competitions: https://github.com/dmlc/xgboost/tree/master/demo#machine-learning-challenge-winning-solutions *Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38:367–378. http://dx.doi.org/10.1016/S0167-9473(01)00065-2
Final Results COSMO-RS basedmethods Accuracy (RMSD) 1st in RMSD COSMOtherm 3rd in RMSD COSMOquick - QSPR - Deep Learning - SMD - Molecular Dynamics- 3D-RISM
Results Comparisonof COSMO-RSSubmissions
Results Comparisonof COSMOthermSubmissionsvsexperiment Correlationsbetween COSMOthermbasedsubmissions: R2=0.76 SM15 (outlierCOSMOquick) : baddatabaserepresentation SM13 (Outlier COSMOtherm):?
SM15 • Most modelspredictlogPtoolow, i.e. they • overestimatethewater solubility COSMO-RS chemicalpotentials: Top 20 SM14 logPexp=1.95 • The mainreasonforthe high logPisthelowchemical potential in octanol! SM15 logPexp=3.07
ConformationalEffects Root mean squared deviation (RMSD) from SAMPL6 experimental data using different conformer generation methods. • Conformationsare Boltzmann weighted, iterative computationofchemicalpotentials • Set ofrather rigid compounds
Test setdata Andcomparisonwith SAMPL6 results • DecreaseofCOSMOquick/ML performance: • Database representationof SAMPL6 substructures • Improvementof COSMOtherm performance: • Test set (literature) data not as clean as SAMPL6 data! Need formoreaccurate experimental data!
SAMPL6 logP Summary & Learnings Need formore such high qualityexperiments! Difficultiesin findinguseful, i.e. predictivetestsets Importanceofconformationaleffects (evenforless flexible compounds) orsuitabledescriptors SAMPL6 compoundsaresomewhatunderrepresented in COSMOquicks-surfacedatabase 3 of top 4 entriesusedstochasticgradientboosting (XGBoost)
Manythanksforyourattention!contact:christoph.loschen@3ds.com
COSMOtherm’s predictivity in blind challenges 2019 SAMPL6 II: logP of drug like molecules Best contribution 2018 SAMPL6 I: pKa of drug like molecules Best contribution by Grimme et al. using TURBOMOLE and COSMOtherm 2016 SAMPL5: Water/cyclohexane partioning of 53 drug-like moleculesBest in deviation and correlation to experimental results 2014 SAMPL4: Host guest association free energy for the CB7-host Best contribution by Grimme et al. using TURBOMOLE and COSMOtherm 2010 6th IFPS: Temperature dependence of a polyether–water mixtureBest contribution 2009 SAMPL2: Blind prediction of hydration free energy Best contribution 2008 IFPS: Partitioning and activity coefficients 1st and 2nd rank 2002 IFPS: Vapor-liquid equilibria Best contribution
COSMOlogicsubmissions COSMOtherm, FINE19 parameterization (id: hmz0n) COSMOquick, version 1.7, TZVP parameterization+ ML correction (id: 3vqbi) https://github.com/MobleyLab/SAMPL6/tree/master/physical_properties/logP/analysis
Results Predictedvs experimental logP (COSMOtherm) Deviationsfromexperimentfor SM13 (10 bestpredictions) SM13