Training pKa and logP prediction Jozsef Szegezdi Solutions for Cheminformatics
logP calculation models in Marvin
Three models are provided in Marvin. They share the same atom type definitions, taken from Viswanadhan, V. N., et al. J.Chem.Inf.Comput.Sci., 1989, 29, 3, 163-172. Unfortunately, we cannot tell in advance which model will be better for a molecule that is not included in the training set.
Problem with logP models
Frequently occurring problems in constructing logP models:
- The logP training set is too small
- The logP training set is unrepresentative
- The specification of atom types and interactions is subjective
- The number of logP parameters is restricted in order to ensure ‘predictive power’
As a result, the models will lack some interactions and atom types.
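To make the consequence of missing atom types concrete, here is a minimal sketch of an additive atom-type model. It assumes logP is approximated as the sum of per-atom-type contributions; the atom type names and contribution values are made up for illustration and are not Marvin's or Viswanadhan's parameters.

```python
# Illustrative atom-type contributions. The names and values are hypothetical
# placeholders, not parameters from Marvin or from Viswanadhan et al.
CONTRIB = {"C_aromatic": 0.30, "O_hydroxyl": -0.40, "H_on_O": -0.10}


def additive_logp(atom_counts, contrib=CONTRIB):
    """Sum contribution * count over known atom types.

    Returns the partial sum and the sorted list of atom types that have
    no parameter in the model (these are silently left out of the sum).
    """
    missing = sorted(t for t in atom_counts if t not in contrib)
    total = sum(contrib[t] * n for t, n in atom_counts.items() if t in contrib)
    return total, missing


# A molecule with one atom type the model has never seen.
total, missing = additive_logp({"C_aromatic": 6, "O_hydroxyl": 1, "N_nitro": 1})
print(round(total, 2), missing)
```

Any atom type reported in `missing` contributes nothing to the estimate, which is one way a global model can be systematically off for molecules outside its training set.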
Example for creating a local logP model
The logP values of the molecules, calculated with the standard ‘weighted’ method, are shown in the figure below. The ‘principle of uniformity of nature’ would suggest that other ‘OH’-containing molecules can be predicted reasonably well by the standard ‘weighted’ method. Is this true? We test it with the ‘hydroquinone’ molecule.
Test of standard models
The logP value of hydroquinone is 0.59. The next table summarizes the logP errors of the standard models. The error of the standard models is relatively large. How can one improve the accuracy of the prediction? The prediction error can be reduced by creating a local model, using linear regression, for the 25 molecules mentioned above. Command line call for creating the local model:
cxcalc -T logP -t LOGP -o logPparameters.txt training25.sdf
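The idea of fitting a local model by linear regression can be sketched as follows. This is an assumption-laden illustration, not the procedure cxcalc runs internally: it treats logP as a linear function of atom-type counts and solves the least-squares normal equations. The function name `fit_contributions` and the toy data are hypothetical; the data are synthetic and are not the 25 training molecules.

```python
def fit_contributions(counts, logp):
    """Least-squares fit of per-atom-type contributions.

    counts: one row per molecule, each row a list of atom-type counts.
    logp:   one target logP value per molecule.
    Solves the normal equations (X^T X) a = X^T y by Gauss-Jordan
    elimination with partial pivoting.
    """
    n_types = len(counts[0])
    # Build X^T X and X^T y.
    xtx = [[sum(row[i] * row[j] for row in counts) for j in range(n_types)]
           for i in range(n_types)]
    xty = [sum(row[i] * y for row, y in zip(counts, logp))
           for i in range(n_types)]
    # Augmented matrix [X^T X | X^T y].
    aug = [xtx[i] + [xty[i]] for i in range(n_types)]
    for col in range(n_types):
        pivot = max(range(col, n_types), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("atom-type counts are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n_types):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n_types] / aug[i][i] for i in range(n_types)]


# Synthetic toy data: columns are counts of two hypothetical atom types.
# Targets were generated from contributions 0.5 and -1.0, so the fit
# should recover those two values.
toy_counts = [[1, 1], [2, 1], [3, 1], [6, 2]]
toy_logp = [-0.5, 0.0, 0.5, 1.0]
params = fit_contributions(toy_counts, toy_logp)
print([round(p, 3) for p in params])
```

With a small, homogeneous training set, the fitted contributions reflect that set only, which is why the same atom type can receive different values in different local models.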
User’s model
The logP values of the 25 molecules containing ‘OH’ groups, calculated with the ‘user defined’ method after logP training, are shown in the figure below.
Comparison of the standard and the user model
The user-trained local model based on 25 molecules outperforms all of the standard models.
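One simple way to quantify such a comparison is the mean absolute error of each model against experimental values. The sketch below uses placeholder numbers only; they are not the values from the slides, and the slides do not state which error measure was used.

```python
def mean_abs_error(predicted, experimental):
    """Mean absolute difference between predicted and experimental values."""
    if len(predicted) != len(experimental):
        raise ValueError("inputs must have the same length")
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(experimental)


# Placeholder values for illustration only (not data from the presentation).
experimental = [1.0, 2.0, 0.5]
standard_model = [1.4, 1.7, 1.0]
user_model = [1.1, 2.1, 0.4]

print("standard:", round(mean_abs_error(standard_model, experimental), 3))
print("user:    ", round(mean_abs_error(user_model, experimental), 3))
```

For a fair comparison, the error should also be checked on molecules that were not part of the training set, as was done with hydroquinone.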
Conclusions
The local model based on 25 molecules is more accurate than any of the standard global models. Depending on the training set, different parameter values will be assigned to the same atom type. This is one of the main characteristics of the user model. A ‘carefully’ created set of local models must be superior to any ‘large’ model. We plan to develop a model that combines many local models.
Apparent pKa and ionization %-pH curve
The ionization %-pH curve is shown in blue for basic centers and in red for acidic centers.
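For a single, isolated ionizable center, the shape of such a curve follows from the Henderson-Hasselbalch relation. The sketch below is a simplified single-site illustration; it ignores the interaction between several ionizable centers and is not claimed to be how Marvin computes the curve. The function name and the example pKa of 4.0 are hypothetical.

```python
def percent_ionized(ph, pka, kind):
    """Percent of a single ionizable center that is ionized at a given pH.

    kind: "acid" (deprotonated form is the ionized one) or
          "base" (protonated form is the ionized one).
    """
    if kind == "acid":
        return 100.0 / (1.0 + 10.0 ** (pka - ph))
    if kind == "base":
        return 100.0 / (1.0 + 10.0 ** (ph - pka))
    raise ValueError("kind must be 'acid' or 'base'")


# Tabulate a curve for a hypothetical acidic center with pKa 4.0.
for ph in range(0, 15, 2):
    print(ph, round(percent_ionized(ph, 4.0, "acid"), 2))
```

At pH equal to the pKa the center is half ionized; an acidic center approaches full ionization at high pH, a basic center at low pH.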
Method for predicting pKa and training
Marvin’s prediction model considers:
• partial charges
• polarizability
• the effect of ionizable centers on each other
Training refines the existing parameters for ionizable centers and, at the same time, creates new modifier parameters based on the structures and experimental values specified by the user.
Example for training pKa prediction
Curating experimental pKa data
The input ‘sdf’ file may be created in IJC. The training can be run using this command line:
cxcalc -T pka -o c:/output InputpKadata.sdf
Conclusions
• The user-defined pKa model is more accurate than the built-in default model.
• IJC can be used for curating input data for the training.
• The new model is only a refinement of the default model, so the training assumes a robust base model, which is provided in Marvin.