In silico calculation of aqueous solubility

Dr John Mitchell, Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, U.K.

Presentation Transcript


  1. In silico calculation of aqueous solubility. Dr John Mitchell, Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, U.K.

  5. In silico calculation of aqueous solubility. Dr John Mitchell … soon moving to the University of St Andrews

  6. Background …

  7. Solubility Measurement …

  8. Diclofenac [figure; labels: Supersaturated Solution, Subsaturated Solution, In Solution, Powder, Intrinsic solubility values]
     ● First precipitation gives the kinetic solubility (not in equilibrium).
     ● "Chasing equilibrium" then gives the thermodynamic, intrinsic solubility (in equilibrium).
     ● Supersaturation factor: SSF = S_kin - S_0
     ● Random error: less than 0.05 log units.
     ● We continue "chasing equilibrium" until a specified number of crossing points have been reached.
     ● A crossing point represents the moment when the solution switches from a supersaturated to a subsaturated solution: the pH does not change, the gradient is zero, and nothing is re-dissolving or precipitating. The solution is in equilibrium.
     "CheqSol" method. * A. Llinàs, J. C. Burley, K. J. Box, R. C. Glen and J. M. Goodman. Diclofenac solubility: independent determination of the intrinsic solubility of three crystal forms. J. Med. Chem. 2007, 50(5), 979-983.
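
The crossing-point idea can be illustrated with a short Python sketch. It assumes, hypothetically, that a titration is logged as paired lists of neutral-species concentration and pH gradient (dpH/dt), with the gradient changing sign each time the solution switches between supersaturated and subsaturated. Each sign change is treated as a crossing point, the zero-gradient concentration is found by linear interpolation, and the intrinsic solubility is taken as the mean of the first `n_required` crossing points. The function names and data layout are illustrative only, not the CheqSol software's interface.

```python
from statistics import mean


def crossing_points(conc, gradient):
    """Hypothetical helper: concentrations where the pH gradient passes through zero.

    conc[i] is the neutral-species concentration at step i and gradient[i] the
    measured dpH/dt. A sign change between consecutive steps marks a crossing
    point, located by linear interpolation. Exact zeros are skipped in this
    simplified sketch.
    """
    if len(conc) != len(gradient):
        raise ValueError("conc and gradient must have the same length")
    points = []
    for i in range(1, len(gradient)):
        g0, g1 = gradient[i - 1], gradient[i]
        if g0 * g1 < 0:  # gradient changed sign between steps i-1 and i
            c0, c1 = conc[i - 1], conc[i]
            t = g0 / (g0 - g1)  # fraction of the step at which gradient = 0
            points.append(c0 + t * (c1 - c0))
    return points


def intrinsic_solubility(conc, gradient, n_required=4):
    """Hypothetical helper: mean of the first n_required crossing points."""
    points = crossing_points(conc, gradient)
    if len(points) < n_required:
        raise ValueError(f"only {len(points)} crossing points found")
    return mean(points[:n_required])
```

For example, `intrinsic_solubility([1.0, 2.0, 1.5, 2.5], [1.0, -1.0, 1.0, -3.0], n_required=3)` interpolates crossing points at 1.5, 1.75 and 1.75 and averages them to about 1.67 (in whatever concentration units were logged).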

  9. Modelling and Predicting Solubility …

  10. How should we approach the prediction/estimation/calculation of the aqueous solubility of druglike molecules? Two (apparently) fundamentally different approaches

  11. How should we approach the prediction/estimation/calculation of the aqueous solubility of druglike molecules? Two (apparently) fundamentally different approaches

  12. The Two Faces of Computational Chemistry: Theoretical Chemistry and Informatics

  13. Informatics “The problem is too difficult to solve using physics and chemistry, so we will design a black box to link structure and solubility”

  14. Informatics and Empirical Models • In general, informatics methods represent phenomena mathematically, but not in a physics-based way. • The model relating inputs to output is an empirically parameterised equation or a more elaborate mathematical model. • They do not attempt to simulate reality. • Usually high throughput.
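
As a minimal illustration of an "empirically parameterised equation", the Python sketch below fits a linear relation between log S and a few descriptors by ordinary least squares with NumPy. The descriptors, the linear form and the function names are hypothetical choices for illustration; nothing here reproduces a specific published model.

```python
import numpy as np


def fit_linear_model(descriptors, log_s):
    """Least-squares fit of log S = c0 + c1*d1 + ... + ck*dk.

    descriptors: (n_molecules, k) values of hypothetical descriptors
    log_s:       (n_molecules,) measured log solubilities
    Returns the coefficient vector [c0, c1, ..., ck].
    """
    X = np.asarray(descriptors, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept
    design = np.column_stack([np.ones(len(X)), X])
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(log_s, dtype=float), rcond=None)
    return coeffs


def predict(coeffs, descriptors):
    """Apply the fitted equation to new descriptor vectors."""
    X = np.asarray(descriptors, dtype=float)
    return coeffs[0] + X @ coeffs[1:]
```

The model makes no attempt to simulate the physics of dissolution; its parameters are whatever best reproduce the training data, which is exactly what makes such models fast to apply.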

  15. Informatics and Empirical Models • In general, Informatics methods represent phenomena mathematically, but not in a physics-based way. • Inputs and output model are based on an empirically parameterised equation or more elaborate mathematical model. • Do not attempt to simulate reality. • Usually High Throughput.

  16. Informatics and Empirical Models • In general, Informatics methods represent phenomena mathematically, but not in a physics-based way. • Inputs and output model are based on an empirically parameterised equation or more elaborate mathematical model. • Do not attempt to simulate reality. • Usually High Throughput.

  17. Informatics and Empirical Models • In general, Informatics methods represent phenomena mathematically, but not in a physics-based way. • Inputs and output model are based on an empirically parameterised equation or more elaborate mathematical model. • Do not attempt to simulate reality. • Usually High Throughput.

  18. Theoretical Chemistry “The problem is difficult, but by making suitable approximations we can solve it at reasonable cost based on our understanding of physics and chemistry”

  19. Theoretical Chemistry • Calculations and simulations based on real physics. • Calculations are either quantum mechanical or use parameters derived from quantum mechanics. • Attempt to model or simulate reality. • Usually Low Throughput.

  20. Theoretical Chemistry • Calculations and simulations based on real physics. • Calculations are either quantum mechanical or use parameters derived from quantum mechanics. • Attempt to model or simulate reality. • Usually Low Throughput.

  21. Theoretical Chemistry • Calculations and simulations based on real physics. • Calculations are either quantum mechanical or use parameters derived from quantum mechanics. • Attempt to model or simulate reality. • Usually Low Throughput.

  22. Theoretical Chemistry • Calculations and simulations based on real physics. • Calculations are either quantum mechanical or use parameters derived from quantum mechanics. • Attempt to model or simulate reality. • Usually Low Throughput.

  23. Our Methods … (1) Random Forest (informatics)

  24. Our Random Forest Model … We want to construct a model that will predict solubility for druglike molecules … We don’t expect our model either to use real physics and chemistry or to be easily interpretable … We do expect it to be fast and reasonably accurate …

  25. Our Random Forest Model … We want to construct a model that will predict solubility for druglike molecules … We don’t expect our model either to use real physics and chemistry or to be easily interpretable … We do expect it to be fast and reasonably accurate …

  26. Our Random Forest Model … We want to construct a model that will predict solubility for druglike molecules … We don’t expect our model either to use real physics and chemistry or to be easily interpretable … We do expect it to be fast and reasonably accurate …

  27. Our Random Forest Model … We want to construct a model that will predict solubility for druglike molecules … We don’t expect our model either to use real physics and chemistry or to be easily interpretable … We do expect it to be fast and reasonably accurate …

  28. Random Forest Machine Learning Method

  29. Random Forest for Solubility Prediction: A Forest of Regression Trees • The dataset is partitioned into consecutively smaller subsets (of similar solubility) • Each partition is based upon the value of one descriptor • The descriptor used at each split is selected so as to minimise the MSE. Leo Breiman, "Random Forests", Machine Learning 45, 5-32 (2001).
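
The split rule on this slide can be sketched in a few lines of Python: for each descriptor, try thresholds midway between sorted distinct values, predict each child subset by its mean, and keep the split with the lowest mean squared error. This is an illustrative exhaustive search with a hypothetical function name; a random forest additionally restricts each split to a random subset of descriptors, which is omitted here.

```python
def best_split(X, y):
    """Find the (descriptor index, threshold) split that minimises the MSE.

    X: list of descriptor vectors (one per molecule); y: list of log S values.
    Returns (feature, threshold, mse) or None if no split is possible.
    """
    def sse(values):
        # Sum of squared deviations from the subset mean (the subset's prediction)
        if not values:
            return 0.0
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    if not X:
        return None
    n = len(y)
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: midpoints between consecutive distinct values
        vals = sorted(set(row[j] for row in X))
        for lo, hi in zip(vals, vals[1:]):
            thr = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][j] <= thr]
            right = [y[i] for i in range(n) if X[i][j] > thr]
            mse = (sse(left) + sse(right)) / n
            if best is None or mse < best[2]:
                best = (j, thr, mse)
    return best
```

Applying the same function recursively to each child subset, until the subsets reach a chosen minimum size, grows one regression tree.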

  30. Random Forest for Predicting Solubility • A forest of regression trees • Each tree is grown until its terminal nodes contain a specified number of molecules • No need to prune back • High predictive accuracy • Includes descriptor selection • No training problems: largely immune from overfitting • "Out-of-bag" validation, using those molecules not in the bootstrap samples.
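
A hedged scikit-learn sketch of these ideas: `RandomForestRegressor` with `oob_score=True` keeps, for each training molecule, a prediction made only by the trees whose bootstrap sample did not contain it (`oob_prediction_`), from which an out-of-bag RMSE follows. The parameter values and the helper name are placeholders, not those used in the work described here, and `min_samples_leaf` is only roughly analogous to the terminal-node-size rule on the slide.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def oob_random_forest(X, y, n_trees=500, leaf_size=5, seed=0):
    """Hypothetical helper: fit a random forest and return it with its OOB RMSE."""
    y = np.asarray(y, dtype=float)
    model = RandomForestRegressor(
        n_estimators=n_trees,
        min_samples_leaf=leaf_size,  # stand-in for a minimum terminal-node size
        max_features=1 / 3,          # fraction of descriptors tried at each split
        bootstrap=True,              # each tree sees a bootstrap sample of molecules
        oob_score=True,              # keep out-of-bag predictions for validation
        random_state=seed,
    )
    model.fit(X, y)
    errors = model.oob_prediction_ - y
    rmse_oob = float(np.sqrt(np.mean(errors ** 2)))
    return model, rmse_oob
```

Trying only a fraction of the descriptors at each split (one third of them here, mirroring a common regression default) is what decorrelates the trees; with `max_features=1.0` the ensemble reduces to bagged regression trees.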

  31. Dataset • Literature data • Compiled from the Huuskonen dataset and the AquaSol database: pharmaceutically relevant molecules • All molecules solid at room temperature • n = 988 molecules • Training = 658 molecules • Test = 330 molecules • MOE descriptors (2D/3D) ● Aqueous solubility: the thermodynamic solubility in unbuffered water (at 25 °C)

  38. Dataset • Datasets compiled from diverse literature data may have significant random and systematic errors.
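
The slides do not say how the 988 molecules were divided into 658 training and 330 test molecules (roughly two thirds and one third). The Python sketch below simply assumes a seeded random split to make the bookkeeping concrete; the helper name and the integer molecule identifiers are placeholders.

```python
import random


def split_dataset(molecules, n_test, seed=0):
    """Hypothetical helper: random split into (training, test) of fixed test size."""
    molecules = list(molecules)
    if not 0 < n_test < len(molecules):
        raise ValueError("n_test must be between 1 and len(molecules) - 1")
    random.Random(seed).shuffle(molecules)  # seeded, so the split is reproducible
    return molecules[n_test:], molecules[:n_test]
```

Calling `split_dataset(range(988), 330)` returns 658 training and 330 test identifiers with no overlap.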

  39. Random Forest: Solubility Results • Out-of-bag (oob): RMSE = 0.68, r² = 0.90, bias = 0.01 • Test set (te): RMSE = 0.69, r² = 0.89, bias = -0.04 • Training set (tr): RMSE = 0.27, r² = 0.98, bias = 0.005 • DS Palmer et al., J. Chem. Inf. Model., 47, 150-158 (2007)

  42. These results are competitive with those of any other informatics or QSPR solubility prediction method (DS Palmer et al., J. Chem. Inf. Model., 47, 150-158 (2007)).
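
For reference, the RMSE, r² and bias statistics quoted for the random forest can be computed as in the Python sketch below. The conventions are assumptions: bias is taken as mean(predicted - observed) and r² as the squared Pearson correlation, and the cited paper may define either differently.

```python
from math import sqrt
from statistics import mean


def regression_stats(observed, predicted):
    """Return (RMSE, r2, bias) for paired observed/predicted log S values.

    Assumed conventions: bias = mean(predicted - observed);
    r2 = squared Pearson correlation between observed and predicted.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need at least two paired values")
    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = sqrt(mean(e * e for e in errors))
    bias = mean(errors)
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("r2 is undefined for constant values")
    r2 = cov * cov / (var_o * var_p)
    return rmse, r2, bias
```

Note that a squared correlation can be high even when predictions are systematically shifted; the bias term is what exposes such a shift, which is why the slide reports both.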

  43. References

  44. Our Methods … (2) Thermodynamic Cycle (A hybrid of theoretical chemistry & informatics)

  45. Our Thermodynamic Cycle method … We want to construct a theoretical model that will predict solubility for druglike molecules … We expect our model to use real physics and chemistry and to give some insight … We may need to include some empirical parameters… We don’t expect it to be fast by informatics or QSPR standards, but it should be reasonably accurate …
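
The slides do not spell out the cycle at this point. One common form, assumed here rather than taken from the talk, routes the solid through the gas phase, so that the solution free energy is the sum of a sublimation and a hydration term: ΔG_sol = ΔG_sub + ΔG_hyd. With 1 mol/L standard states for both the gas and the solution and ideal-solution behaviour, log10 S = -ΔG_sol / (RT ln 10), with S in mol/L. A minimal Python sketch of that bookkeeping, with a hypothetical function name:

```python
from math import log

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def log_solubility(dg_sub, dg_hyd, temperature=298.15):
    """Hypothetical helper: log10 S (mol/L) from a sublimation/hydration cycle.

    dg_sub: standard free energy of sublimation (solid -> gas), kJ/mol
    dg_hyd: standard free energy of hydration (gas -> aqueous), kJ/mol
    Assumes 1 mol/L standard states for gas and solution and an ideal solution,
    so that dG_sol = dg_sub + dg_hyd and S = exp(-dG_sol / RT).
    """
    dg_sol = dg_sub + dg_hyd
    return -dg_sol / (log(10) * R_KJ * temperature)
```

With illustrative values ΔG_sub = 40 kJ/mol and ΔG_hyd = -25 kJ/mol, `log_solubility(40.0, -25.0)` gives about -2.63. At 298 K every 5.7 kJ/mol of error in either term shifts log S by one unit, which is why each leg of the cycle has to be computed accurately.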

  50. For this study, Toni Llinàs measured 30 solubilities using the CheqSol method, and we took another 30 from other high-quality studies (Bergström & Rytting). We used a Sirius GLpKa instrument.