QSAR Study of HIV Protease Inhibitors Using Neural Network and Genetic Algorithm
Akmal Aulia,1 Sunil Kumar,2 Rajni Garg,*3 A. Srinivas Reddy4
Introduction
• Linear and non-linear regression techniques were employed to analyze a large dataset of 334 HIV protease inhibitors (Kempf et al.).
• The dataset was studied using MLR (Multiple Linear Regression) and ANN (Artificial Neural Network) techniques to develop QSAR (Quantitative Structure-Activity Relationship) models.
• Each ligand (inhibitor or drug molecule) was described by physico-chemical and structural descriptors (features) that encode constitutional, electrostatic, geometrical, quantum, and topological properties.
• The capability of the descriptors to capture variation among the ligands was linked to the predictive power of the QSAR models.
• Combined information from these models helps in 'transforming data into information and information into knowledge' from a cheminformatics point of view.

Materials and Methods

Research Design
• Reported dataset (Kempf et al.) with experimental biological activity (EC50 and IC50).
• A lower-energy conformation was obtained for each compound by molecular mechanics minimization.
• A total of 277 descriptors were calculated.

Descriptor Thinning
• Objective descriptors (Matlab): IC50 dataset reduced from 277 to 148; EC50 dataset reduced from 277 to 157.
• Subjective descriptors (WEKA/GA): IC50 dataset reduced from 148 to 9; EC50 dataset reduced from 157 to 7.

Results
[Figure panels not reproduced: IC50 dataset, Total Descriptors; IC50 set, Final Descriptors; EC50 dataset; EC50 set, Final Descriptors]

Summary & Future Work
• For the IC50 dataset, the constitutional and topological properties make the largest contribution, while for the EC50 dataset, electrostatic and topological properties are significant.
• Non-linear models have better predictive capability. However, the linear models can be interpreted better mechanistically.
• The presence of similar descriptors in both types of models validates our results.
• Further studies using other statistical and ANN-based regression techniques are in progress, in order to find the best QSAR models and descriptors.
• These models will serve as useful computational tools for predicting the biological activity of this class of HIV protease inhibitors.

Both MLR and FNN methods were implemented in WEKA.

References
(1) Fernandez et al. "Quantitative structure-activity relationship to predict differential inhibition of aldose reductase by flavonoid compounds." Bioorganic and Medicinal Chemistry, 2005, 13, 3269-3277.
(2) (a) CODESSA software, Semichem Inc., USA; (b) MATLAB, The MathWorks Inc.; (c) WEKA software, The University of Waikato, New Zealand.
(3) Fernandez, M.; Caballero, J. "Linear and nonlinear modeling of antifungal activity of some heterocyclic ring derivatives using multiple linear regression and Bayesian-regularized neural networks." J. Mol. Model., 2006, 12, 168-181.
(4) Goldberg, D. E. Genetic Algorithms in Search, Optimization & Machine Learning; Addison-Wesley: Reading, MA, 2000.
(5) "Data Mining: Practical Machine Learning Tools and Techniques", 2nd Edition, Morgan Kaufmann, San Francisco, 2005.

1 Computational Science Research Center, San Diego State University, CA
2 ECE Dept., San Diego State University, San Diego, CA
3 Chem. Dept., California State University, San Marcos, CA
4 Molecular Modeling Group, IICT, Hyderabad, India
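Illustrative sketch of two-stage descriptor thinning

The poster reports an objective reduction of the descriptor pool (Matlab) followed by a GA-based subjective reduction (WEKA), but it does not state the criteria that were used. The Python sketch below is therefore only one plausible reading, not the authors' procedure: it assumes the objective step drops near-constant descriptors and descriptors highly correlated with one already kept, and that the GA scores a descriptor subset by the R^2 of a multiple linear regression. The function names (`thin_descriptors`, `mlr_r2`, `ga_select`), the thresholds, and the synthetic data are all hypothetical.

```python
import random
from statistics import mean, pvariance


def pearson(u, v):
    """Pearson correlation of two equal-length, non-constant lists."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


def thin_descriptors(cols, var_tol=1e-8, corr_max=0.95):
    """Objective thinning under ASSUMED criteria (not given on the poster):
    skip near-constant columns, then skip any column whose |r| with an
    already-kept column exceeds corr_max. Returns kept column indices."""
    kept = []
    for j, col in enumerate(cols):
        if pvariance(col) <= var_tol:
            continue
        if all(abs(pearson(col, cols[k])) <= corr_max for k in kept):
            kept.append(j)
    return kept


def mlr_r2(cols, y):
    """R^2 of a least-squares fit y ~ intercept + cols (normal equations)."""
    A = [[1.0] * len(y)] + [list(c) for c in cols]  # one row per regressor
    p = len(A)
    # Augmented matrix [A A^T | A y], solved by Gauss-Jordan elimination.
    M = [[sum(a * b for a, b in zip(A[i], A[j])) for j in range(p)]
         + [sum(a * t for a, t in zip(A[i], y))] for i in range(p)]
    for i in range(p):
        piv = max(range(i, p), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        for r in range(p):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [x - f * z for x, z in zip(M[r], M[i])]
    beta = [M[i][p] / M[i][i] for i in range(p)]
    pred = [sum(beta[i] * A[i][n] for i in range(p)) for n in range(len(y))]
    ybar = mean(y)
    ss_res = sum((t - q) ** 2 for t, q in zip(y, pred))
    ss_tot = sum((t - ybar) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def ga_select(cols, y, n_keep=3, pop_size=20, generations=20, seed=0):
    """Toy GA: an individual is a tuple of n_keep column indices;
    fitness is the MLR R^2 of that subset (an assumed fitness function)."""
    rng = random.Random(seed)
    n = len(cols)
    fitness = lambda ind: mlr_r2([cols[j] for j in ind], y)
    pop = [tuple(sorted(rng.sample(range(n), n_keep))) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = rng.sample(sorted(set(a) | set(b)), n_keep)  # crossover
            unused = [j for j in range(n) if j not in child]
            if unused and rng.random() < 0.3:                    # mutation
                child[rng.randrange(n_keep)] = rng.choice(unused)
            children.append(tuple(sorted(child)))
        pop = parents + children
    best = max(pop, key=fitness)
    return list(best), fitness(best)


if __name__ == "__main__":
    rng = random.Random(42)
    n_obs = 40
    cols = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(6)]
    cols.append([3.0] * n_obs)                  # constant descriptor
    cols.append([2 * v + 1 for v in cols[0]])   # redundant copy of descriptor 0
    y = [cols[0][i] - cols[2][i] + rng.gauss(0, 0.1) for i in range(n_obs)]
    kept = thin_descriptors(cols)
    subset, r2 = ga_select([cols[j] for j in kept], y, n_keep=2)
    print("kept after thinning:", kept)
    print("GA subset (indices into kept):", subset, "R^2 = %.3f" % r2)
```

A fitted R^2 always favors larger subsets, so a real study would more likely score subsets with a cross-validated statistic; the fixed subset size `n_keep` is used here only to keep the sketch short.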