1 / 13

An Atomic Four-Body Potential for the Prediction of Protein-Ligand Binding Affinity

An Atomic Four-Body Potential for the Prediction of Protein-Ligand Binding Affinity. Majid Masso School of Systems Biology, George Mason University Manassas, Virginia 20110, USA CSBW – BIBM 2012, Philadelphia, Pennsylvania. Knowledge-Based Potentials of Mean Force.

jaxon
Download Presentation

An Atomic Four-Body Potential for the Prediction of Protein-Ligand Binding Affinity

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. An Atomic Four-Body Potential for the Prediction of Protein-Ligand Binding Affinity Majid Masso School of Systems Biology, George Mason University Manassas, Virginia 20110, USA CSBW – BIBM 2012, Philadelphia, Pennsylvania

  2. Knowledge-Based Potentials of Mean Force • Generated via statistical analysis of observed features in a diverse training set of structures selected from the PDB • Alternative to physics or molecular mechanics energy functions • Assumption: observed features follow a Boltzmann distribution • Examples: • Well-documented in the literature: distance-dependent pairwise interactions at the atomic or amino acid level • This study: inclusion of higher-order contributions by developing an all-atom four-body statistical potential • Motivation (our prior work): • Four-body protein potential at the amino acid level

  3. Motivational Example:Pairwise Amino Acid Potential • The 20-letter protein alphabet yields 210 residue pairs • Obtain a diverse PDB training set of single protein chains; represent each protein as a set of amino acid points in 3D • For each residue pair (i, j), calculate the relative frequency fij with which they appear within a given distance (e.g., 12 angstroms) of each other in all the protein structures • Calculate a rate pij expected by chance alone by using a background or reference distribution (more later…) • Apply inverted Bolzmann principle: sij = log(fij / pij) quantifies interaction propensity and is proportional to the energy of interaction (by a factor of ‘–RT’) for the pair

  4. All-Atom Four-Body Statistical Potential • Diverse PDB training set of 1417 single chain and multimeric proteins, many complexed to ligands (see paper for text file) • Six-letter atomic alphabet: C, N, O, S, M (metals), X (other) • Apply Delaunay tessellation to the atomic point coordinates of each PDB file – objectively identifies all nearest-neighbor quadruplets of atoms in the structure (8 angstrom cutoff)

  5. All-Atom Four-Body Statistical Potential • The six-letter atomic alphabet yields 126 distinct quadruplets • Calculate observed rate fijklof quad (i, j, k, l) occurrence among all tetrahedra from the 1417 structure tessellations • Compute rate pijkl expected by chance from a multinomial reference distribution: • an = proportion of atoms from all structures that are of type n • tn = number of occurrences of atom type n in the quad

  6. Summary Data for the 1417 Structure Files and their Delaunay Tessellations

  7. All-Atom Four-Body Statistical Potential

  8. Topological Score (TS) • Delaunay tessellation of any macromolecular structure yields an aggregate of tetrahedral simplices • Each simplex can be scored using the all-atom four-body potential based on the quad present at the four vertices • Topological score (or ‘total potential’) of the structure: sum the scores of all constituent simplices in tessellation sijkl TS = Σsijkl

  9. Topological Score Difference (ΔTS)

  10. Application of ΔTS: Predicting Protein – Ligand Binding Energy • MOAD – repository of exp. dissociation constants (kd) for protein–ligand complexes whose structures are in PDB • Collected kd values for 300 complexes reflecting diverse protein structures • Obtained exp. binding energy from kd via ΔGexp= –RTln(kd) • Calculated ΔTS for complexes

  11. Predicting Protein – Ligand Binding Energy • Randomly selected 200 complexes to train a model • Correlation coefficient r = 0.79 between ΔTS and ΔGexp • Empirical linear transformation of ΔTS to reflect energy values: ΔGcalc = L (ΔTS) • Linear => same r = 0.79value between ΔGcalc and ΔGexp • Also, standard error of SE = 1.98 kcal/mol and fitted regression line of y = 1.18x (y = ΔGcalc and x = ΔGexp)

  12. Predicting Protein – Ligand Binding Energy • For the test set of 100 remaining complexes: • r = 0.79 between ΔGcalc and ΔGexp • SE = 1.93 kcal/mol • Fitted regression line is y = 1.11x – 0.63 • All training/test data is available online as a text file (see paper)

  13. References and Acknowledgments • PDB (structure DB): http://www.rcsb.org/pdb • MOAD (ligand binding DB): http://bindingmoad.org/ • Qhull (Delaunay tessellation): http://www.qhull.org/ • UCSF Chimera (ribbon/ball-stick structure visualization): http://www.cgl.ucsf.edu/chimera/ • Matlab (tessellation visualization): http://www.mathworks.com/products/matlab/

More Related