Development of Novel Geometrical Chemical Descriptors and Their Application to the Prediction of Ligand-Protein Binding Affinity. Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling School of Pharmacy University of North Carolina at Chapel Hill
November 28, 2014
Problem: Given a protein-ligand complex, predict the ligand binding affinity.
Knowledge-based (Statistical) Potentials
• Two-body potentials
    • PMF: Muegge, I.; Martin, Y. C. J. Med. Chem. 1999, 42, 791-804
    • BLEEP: Mitchell, J. B.; Laskowski, R. A.; Alex, A.; Thornton, J. M. J. Comp. Chem. 1999, 20, 1165-1176
    • DrugScore: Gohlke, H.; Hendlich, M.; Klebe, G. J. Mol. Biol. 2000, 295, 337-356
    • SMoG: DeWitte, R. S.; Shakhnovich, E. I. J. Am. Chem. Soc. 1996, 118, 11733-11744
    • SMoG2001: Ishchenko, A. V.; Shakhnovich, E. I. J. Med. Chem. 2002, 45, 2770-2780
• Four-body contact potential (by Jun Feng)
Full Atom-based Delaunay tessellation of Protein-ligand Interface (5HVP)
Three Types of Tetrahedra at Protein-ligand Interface
• RRRL: formed by 3 receptor atoms and 1 ligand atom
• RRLL: formed by 2 receptor atoms and 2 ligand atoms
• RLLL: formed by 1 receptor atom and 3 ligand atoms
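The three interfacial types can be told apart by counting ligand vertices in each tetrahedron. The sketch below assumes the tessellation has already been computed elsewhere (for example with `scipy.spatial.Delaunay`, whose `simplices` attribute lists the four vertex indices of each tetrahedron). The function names and the `is_ligand` flag list are hypothetical, not taken from the slides.

```python
from collections import Counter

def classify_tetrahedron(simplex, is_ligand):
    """Return 'RRRL', 'RRLL' or 'RLLL' for an interfacial tetrahedron.

    simplex   -- four atom indices (one Delaunay tetrahedron)
    is_ligand -- list of booleans, True where the atom belongs to the ligand
    Returns None when all four atoms come from the same molecule,
    i.e. the tetrahedron is not at the interface.
    """
    n_lig = sum(1 for idx in simplex if is_ligand[idx])
    if n_lig in (0, 4):
        return None
    return "R" * (4 - n_lig) + "L" * n_lig

def count_interfacial_types(simplices, is_ligand):
    """Count how many tetrahedra of each interfacial type occur."""
    counts = Counter()
    for simplex in simplices:
        kind = classify_tetrahedron(simplex, is_ligand)
        if kind is not None:
            counts[kind] += 1
    return counts

# Toy input: atoms 0-3 are receptor atoms, atoms 4-7 are ligand atoms.
is_ligand = [False] * 4 + [True] * 4
simplices = [(0, 1, 2, 3), (0, 1, 2, 4), (0, 1, 4, 5), (0, 4, 5, 6), (4, 5, 6, 7)]
counts = count_interfacial_types(simplices, is_ligand)
```

Tetrahedra made only of receptor atoms or only of ligand atoms are skipped, so only the interface contributes.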
Earlier work: Four-Body Statistical Contact Scoring Function Based on Delaunay Tessellation
Correlation between experimental and calculated binding free energy for PMF dataset using four-body scoring function
Multiple CG descriptors of protein-ligand interface and correlation with ligand affinity
• Define the ligand-receptor interface by means of Delaunay tessellation (DT)
• Calculate chemical descriptors for nearest-neighbor atom quadruplets
• Use a statistical data modeling approach to correlate descriptors with affinity
Descriptors derived from atomic electronegativity
• µ: electronegativity (chemical potential) of atoms
• Q: partial charges on atoms
• η: hardness kernel
Atom Type Definition Based on EN Values
There are 554 possible interfacial quadruplet composition types. After processing 517 complexes, 100 were found to occur with high frequency (at least 50 times).
Descriptor Calculation
Example tetrahedron (vertex: EN value): C_R: 2.5, N_R: 3.0, S_L: 2.4, O_L: 3.4
• m: m-th tetrahedral composition type
• j: vertex of a tetrahedron
• n: number of tetrahedra of the m-th composition type
Thus, there are 100 descriptors for each protein-ligand complex.
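The descriptor formula itself did not survive on this slide. The sketch below therefore assumes, from the symbol definitions alone, that the descriptor for composition type m is the sum of the EN values of the four vertices j, accumulated over all n tetrahedra of that type. The EN table reuses the four example values shown on the slide; the function names and the composition-type label format are hypothetical.

```python
from collections import defaultdict

# EN values for the four example atom types shown on the slide.
EN_VALUES = {"C_R": 2.5, "N_R": 3.0, "S_L": 2.4, "O_L": 3.4}

def composition_type(atom_types):
    """Order-independent label for a tetrahedron, e.g. 'C_R-N_R-O_L-S_L'."""
    return "-".join(sorted(atom_types))

def en_descriptors(tetrahedra, en_values=EN_VALUES):
    """Assumed descriptor: for each composition type m, sum the EN values
    of the four vertices over every tetrahedron of that type."""
    descriptors = defaultdict(float)
    for atom_types in tetrahedra:
        key = composition_type(atom_types)
        descriptors[key] += sum(en_values[a] for a in atom_types)
    return dict(descriptors)

# Two tetrahedra with the same composition, listed in different vertex order.
example = [("C_R", "N_R", "S_L", "O_L"), ("O_L", "S_L", "N_R", "C_R")]
result = en_descriptors(example)
```

Under this assumption a single example tetrahedron contributes 2.5 + 3.0 + 2.4 + 3.4 to its composition type, and each complex ends up with one value per frequent composition type.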
Flowchart of Novel Descriptor Generation (264 complexes)
• Process files and assign atom types based on EN value
• Define the interaction interface with DT and record all interfacial tetrahedra
• Classify interfacial tetrahedra into different composition types and calculate their EN values (descriptors)
• Correlate with binding
Data Modeling
{Binding affinity} = K{descriptor diversity}

Structure    Binding      CG Descriptors
Comp. 1      Value 1      D1  D2  D3  D4
Comp. 2      Value 2      "   "   "   "
Comp. 3      Value 3      "   "   "   "
...          ...          ...
Comp. N      Value N      "   "   "   "      (N = 264)

Goal: Establish correlations between descriptors and binding affinity that are capable of predicting binding of novel complexes.
Data Modeling Workflow
• Start with 264 complexes
• Randomly exclude 24 complexes as an external set
• Split the remaining 240 into multiple training and test sets
• Build models using kNN with variable selection
• Apply Y-randomization
• Only accept models that have q2 > 0.6, R2 > 0.6, etc.
• Validate predictive models with the randomly selected external sets (24)
• Binding prediction
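The splitting and acceptance steps of the workflow can be sketched as follows. This is a minimal illustration, not the authors' procedure: the slides use a random external set of 24 but do not say how training and test sets are chosen, so a plain random split is assumed. The test-set size of 49 is borrowed from the later results slide, and the function names and seed are hypothetical.

```python
import random

def split_complexes(ids, n_external=24, n_test=49, seed=0):
    """Randomly set aside an external set, then split the rest
    into a training set and a test set."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    external = shuffled[:n_external]
    modeling = shuffled[n_external:]   # the 240 complexes used for modeling
    test = modeling[:n_test]
    training = modeling[n_test:]
    return training, test, external

def accept_model(q2, r2, threshold=0.6):
    """Acceptance rule from the slide: q2 > 0.6 and R2 > 0.6.
    The slide's 'etc.' implies further criteria that are not listed."""
    return q2 > threshold and r2 > threshold

training, test, external = split_complexes(range(264))
```

Calling `split_complexes` with different seeds gives the multiple training and test sets that the workflow mentions.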
k Nearest Neighbor (kNN) with Variable Selection
• Randomly select a subset of descriptors (a hypothetical descriptor pharmacophore)
• Leave out one complex from the training set and calculate the distance between the eliminated complex and all remaining complexes (in the original 100 descriptor space)
• Find the k nearest neighbors in the training set
• Predict the binding affinity of the eliminated complex by weighted kNN using the identified k nearest neighbors
• Repeat the leave-one-out steps N times and calculate the predictive ability (q2) of the model
• Repeat with new descriptor subsets chosen by SA (simulated annealing), N times
• Select acceptable models (with q2 > 0.6)
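The leave-one-out loop for one fixed descriptor subset can be sketched as below. Several choices are assumptions, since the slide does not specify them: Euclidean distance, inverse-distance weights, distances measured on the selected columns only, and q2 computed as 1 - PRESS / SS_total. The simulated-annealing search over subsets is omitted, and `loo_knn_q2` is a hypothetical name.

```python
import math

def loo_knn_q2(X, y, k, selected):
    """Leave-one-out q2 of a distance-weighted kNN model.

    X        -- list of descriptor vectors, one per complex
    y        -- list of binding affinities
    k        -- number of nearest neighbors
    selected -- indices of the descriptor columns used for distances
    """
    n = len(X)
    predictions = []
    for i in range(n):
        # Distances from the left-out complex to every remaining complex.
        neighbors = []
        for j in range(n):
            if j == i:
                continue
            d = math.sqrt(sum((X[i][c] - X[j][c]) ** 2 for c in selected))
            neighbors.append((d, y[j]))
        neighbors.sort(key=lambda pair: pair[0])
        nearest = neighbors[:k]
        # Inverse-distance weights (assumed); epsilon avoids division by zero.
        weights = [1.0 / (d + 1e-9) for d, _ in nearest]
        weighted_sum = sum(w * value for w, (_, value) in zip(weights, nearest))
        predictions.append(weighted_sum / sum(weights))
    mean_y = sum(y) / n
    press = sum((obs - pred) ** 2 for obs, pred in zip(y, predictions))
    ss_total = sum((obs - mean_y) ** 2 for obs in y)
    return 1.0 - press / ss_total

# Toy data: affinity equals the first descriptor; the second is noise.
X = [[0.0, 5.0], [1.0, 1.0], [2.0, 4.0], [3.0, 0.0], [4.0, 3.0]]
y = [0.0, 1.0, 2.0, 3.0, 4.0]
q2_first = loo_knn_q2(X, y, k=2, selected=[0])
```

A variable-selection search would call this function repeatedly with different `selected` subsets and keep those whose q2 exceeds 0.6.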
Correlation of Actual vs. Predicted Binding Affinity for 49 Test Set Complexes
Correlation of Actual vs. Predicted Binding Affinity for 24 Complexes with Best Model
Conclusions
• Novel geometrical chemical descriptors have been developed
• These simple yet fundamental descriptors can be used to predict binding affinity using correlation approaches, and they have high predictive power for diverse ligand-protein structures
• The statistical models can be used for fast and accurate scoring of complexes resulting from docking studies