
A Four-Body Statistical Potential For Protein Fold Recognition

Bala Krishnamoorthy and Alex Tropsha, UNC Chapel Hill. Nov 17, 2003.


Presentation Transcript


  1. A Four-Body Statistical Potential For Protein Fold Recognition. Bala Krishnamoorthy and Alex Tropsha, UNC Chapel Hill. Nov 17, 2003.

  2. Four-Body Potentials. Outline: motivation; hypothesis; four-body statistical potentials; application to folding simulations; application to predictions from CASP5 and Livebench 6.

  3. Motivation. Knowledge of protein structure is essential to understanding protein function. The number of proteins with known sequences is growing exponentially. Traditional methods for determining protein structure (X-ray crystallography, NMR, etc.) do not yield quick results. Statistical methods that help with protein fold recognition are therefore needed.

  4. Hypothesis. Specific nearest-neighbor residue contacts in protein structures have non-random propensities for occurrence. The propensities of occurrence of nearest-neighbor clusters can be used to score the compatibility between a protein sequence and a structure.

  5. SNAPP: Simplicial Neighborhood Analysis of Protein Packing. [Figure: 2-D packing, 3 neighbors in mutual contact; 3-D packing, clusters of 4 neighbors.]

  6. An objective definition of the nearest neighborhood of each residue is needed. Use the Voronoi diagram of the protein: it gives a convex cell (polyhedron) around each residue (represented as a point) that defines the nearest neighborhood of the residue. The Delaunay triangulation is defined as the dual of the Voronoi diagram.

  7. Tessellation of protein structure (in 3D). Residues are represented by their side-chain centers (or by their C-α atoms). The protein structure is represented as an aggregate of space-filling, non-intersecting, irregular tetrahedra. Nearest-neighbor residues are identified as unique sets of four residues each (tetrahedral quadruplets).
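The tessellation step can be illustrated with a minimal brute-force sketch. It is not the authors' implementation: it assumes residues have already been reduced to 3-D points, and it uses the empty-circumsphere property of Delaunay tetrahedra with an O(n^5) loop that only suits tiny inputs. The function names `circumsphere` and `delaunay_tetrahedra` are hypothetical; real structures would normally go through a computational geometry library.

```python
from itertools import combinations


def _det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def circumsphere(a, b, c, d):
    """Return (center, radius_squared) of the sphere through four 3-D points,
    or None when the points are (nearly) coplanar."""
    # With u = center - a, each other point p satisfies 2 (p - a) . u = |p - a|^2.
    rows = [[p[i] - a[i] for i in range(3)] for p in (b, c, d)]
    rhs = [sum(x * x for x in row) for row in rows]
    det = _det3(rows)
    if abs(det) < 1e-12:
        return None
    u = []
    for col in range(3):  # Cramer's rule, one unknown at a time
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]
        u.append(_det3(m) / (2.0 * det))
    center = tuple(a[i] + u[i] for i in range(3))
    return center, sum(x * x for x in u)


def delaunay_tetrahedra(points, eps=1e-9):
    """Return index quadruplets whose circumsphere contains no other point."""
    tetrahedra = []
    for quad in combinations(range(len(points)), 4):
        sphere = circumsphere(*(points[i] for i in quad))
        if sphere is None:
            continue
        center, r2 = sphere
        if all(sum((points[j][k] - center[k]) ** 2 for k in range(3)) >= r2 - eps
               for j in range(len(points)) if j not in quad):
            tetrahedra.append(quad)
    return tetrahedra
```

For four corner points plus one interior point, the sketch returns the four tetrahedra that share the interior point, which is the expected set of tetrahedral quadruplets.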

  8. Four-body statistical potentials. Denote each quadruplet by {i, j, k, l}, where i, j, k and l can each be any of the 20 amino acids. Order within a quadruplet is ignored, so the total number of possible quadruplets is 8855. Examples: AALV, VALI, TLKM, YYYY, …
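The count of 8855 matches the number of unordered selections of four items, with repetition, from 20 amino acids: C(20 + 4 - 1, 4) = C(23, 4). A short check follows; the one-letter alphabet string and the `canonical` helper are illustrative choices, not taken from the slides.

```python
from itertools import combinations_with_replacement
from math import comb

# Standard one-letter codes for the 20 amino acids, in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Every unordered quadruplet, written with its letters in sorted order.
QUADRUPLETS = ["".join(q) for q in combinations_with_replacement(AMINO_ACIDS, 4)]


def canonical(quad):
    """Order-independent key for a quadruplet, e.g. 'VALI' -> 'AILV'."""
    return "".join(sorted(quad))


print(len(QUADRUPLETS), comb(20 + 4 - 1, 4))
```

Using a sorted key like this means VALI, LIVA and AILV all map to the same table entry.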

  9. Based on the backbone connectivity of {i,j,k,l}, there can be five types of tetrahedra (indexed 0, 1, 2, 3 and 4, respectively). The propensities of the {i,j,k,l} quadruplets of each type t could be used to develop four-body statistical potentials.
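One way to assign a type is to look at how the four residues group into runs of consecutive sequence positions. The sketch below does that. The mapping from run pattern to index (0 for no neighbors in sequence, up to 4 for four consecutive residues) is an assumption, because the transcript does not preserve the figure that defines the indexing. The sketch also assumes a single chain with contiguous residue numbering, and `tetrahedron_type` is a hypothetical name.

```python
# Assumed indexing: run-length pattern of sequence-consecutive residues -> type.
TYPE_BY_PATTERN = {
    (1, 1, 1, 1): 0,  # no two residues adjacent in sequence
    (2, 1, 1): 1,     # one consecutive pair
    (2, 2): 2,        # two separate consecutive pairs
    (3, 1): 3,        # three consecutive residues plus one distant residue
    (4,): 4,          # four consecutive residues
}


def tetrahedron_type(positions):
    """Classify a quadruplet by the sequence positions of its four residues."""
    pos = sorted(positions)
    runs, run = [], 1
    for prev, cur in zip(pos, pos[1:]):
        if cur == prev + 1:
            run += 1          # still inside a stretch of consecutive residues
        else:
            runs.append(run)  # stretch ended, start a new one
            run = 1
    runs.append(run)
    return TYPE_BY_PATTERN[tuple(sorted(runs, reverse=True))]
```

For example, positions (3, 4, 25, 26) form two consecutive pairs and fall under type 2 in this assumed indexing.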

  10. Four-body compositional propensities of Delaunay simplices:

      $$q_{ijkl_t} = \log \frac{f_{ijkl_t}}{p_{ijkl_t}}, \qquad p_{ijkl_t} = C \, a_i \, a_j \, a_k \, a_l \, p_t$$

      • $f_{ijkl_t}$ – observed frequency of occurrence in the training set of quad {ijkl} in a type t tetrahedron
      • $p_{ijkl_t}$ – expected frequency of occurrence in the training set of residues i, j, k and l in a type t tetrahedron
      • $a_i$ – individual AA frequency
      • $p_t$ – frequency of type t tetrahedra
      • $C$ – combinatorial factor
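The log-likelihood can be sketched in a few lines under two assumptions that the slide does not state. First, the combinatorial factor is taken to be the multinomial count C = 4! / (n_1! n_2! ...), where n_m is the number of times each distinct amino acid appears in the quadruplet. Second, the logarithm is taken to base 10. All function and parameter names are hypothetical.

```python
from collections import Counter
from math import factorial, log10


def combinatorial_factor(quad):
    """Assumed form of C: number of distinct orderings of the four letters."""
    c = factorial(4)
    for n in Counter(quad).values():
        c //= factorial(n)
    return c


def expected_frequency(quad, t, aa_freq, type_freq):
    """p = C * a_i * a_j * a_k * a_l * p_t for quadruplet `quad` of type `t`."""
    p = combinatorial_factor(quad) * type_freq[t]
    for aa in quad:
        p *= aa_freq[aa]
    return p


def log_likelihood(quad, t, observed_freq, aa_freq, type_freq):
    """q = log(f / p); base 10 is an assumption, the slide only says 'log'."""
    return log10(observed_freq / expected_frequency(quad, t, aa_freq, type_freq))
```

With this form, a quadruplet observed ten times more often than expected scores +1, and one observed ten times less often scores -1.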

  11. A diverse training set of 1166 protein chains with known structure is used. For a test conformation, the total log-likelihood score is calculated by adding the score for each tetrahedron in its Delaunay tessellation. Higher score ↔ better structure.
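The summation over tetrahedra amounts to a table lookup per simplex. The sketch below assumes each tetrahedron is supplied as a pair of residue indices and a type, and that the trained scores live in a dictionary keyed by the sorted quadruplet and the type. Scoring unseen quadruplets as 0.0 is an assumption, and `total_score` and `q_table` are hypothetical names.

```python
def total_score(tetrahedra, sequence, q_table, default=0.0):
    """Sum the per-tetrahedron log-likelihood scores of one conformation.

    tetrahedra: iterable of (indices, t), where indices are four positions
                into `sequence` and t is the tetrahedron type.
    q_table:    dict mapping (sorted quadruplet string, t) to a score.
    default:    score for quadruplets absent from the table (an assumption).
    """
    score = 0.0
    for indices, t in tetrahedra:
        quad = "".join(sorted(sequence[i] for i in indices))
        score += q_table.get((quad, t), default)
    return score


# Toy usage with made-up scores, purely to show the data layout.
example = total_score(
    [((0, 1, 2, 3), 4), ((0, 1, 2, 4), 3)],
    "AVLIK",
    {("AILV", 4): 0.5, ("AKLV", 3): -0.2},
)
```

Because the score is a plain sum, conformations of the same sequence can be compared directly, with the higher total indicating the better structure.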

  12. MD simulation of proteins. Comparison of pre- and post-TS (transition state) structures of CI2 vs. native CI2*: pre-TS (six structures), post-TS (20 structures), native. Go potentials (native-structure specific) fail to discriminate between the three! *Structures courtesy of Dr. E. Shakhnovich, Harvard (Ref: J. Mol. Biol. 296 (2000) p1183-1188).

  13. Comparison of total scores for pre- and post-TS structures of CI2 vs. native CI2. N.B.: the 5th pre-TS instance actually had a 0.10 probability of folding (the other five pre-TS structures had ~0 probability of folding).

  14. Structure profiles of pre-TS vs. post-TS structure of CI2. [Figure panels: Profile; ProCAM of post-TS structure. Residue labels: L49, I20, V47, I57, A16, V51, L8, V13, I29, V31, V13, V31, V51, L49.]

  15. SNAPP analysis of pre-TS vs. post-TS structure of CI2. [Figure panels: Pre-TS; Post-TS.]

  16. Structure profiles of pre-TS vs. post-TS structure of SH3. [Figure residue labels: I48, A37, F18, L16, Y8, W35, Y52, G46.]

  17. Scoring Livebench 6 and CASP5 predictions. Livebench: automated evaluation of structure prediction servers; set 6 had 32 “easy” and 66 “hard” targets. CASP5: 3D coordinate models were submitted for 56 targets; the native structure of 33 targets has been released. Approach: rank the 3D predictions using four-body potentials, then compare with the ranking obtained using global structural similarity measures (like MaxSub).

  18. To compare rankings, use the predictive index (PI), where E denotes experimental values and P denotes predicted values.
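The slide's formula for PI did not survive in the transcript. The sketch below therefore assumes a commonly used pairwise definition: each pair of items is weighted by |E_j - E_i| and contributes +1 when the prediction orders the pair correctly, -1 when it orders it wrongly, and 0 when the predictions tie. Whether the authors used exactly this form is not confirmed by the source, and `predictive_index` is a hypothetical name.

```python
def predictive_index(experimental, predicted):
    """Weighted rank agreement in [-1, 1] under the assumed pairwise definition."""
    if len(experimental) != len(predicted):
        raise ValueError("experimental and predicted must have the same length")
    numerator = 0.0
    denominator = 0.0
    n = len(experimental)
    for i in range(n):
        for j in range(i + 1, n):
            d_exp = experimental[j] - experimental[i]
            d_pred = predicted[j] - predicted[i]
            weight = abs(d_exp)  # larger experimental gaps count for more
            if d_exp == 0 or d_pred == 0:
                agreement = 0
            else:
                agreement = 1 if d_exp * d_pred > 0 else -1
            numerator += weight * agreement
            denominator += weight
    if denominator == 0:
        raise ValueError("all experimental values are equal; PI is undefined")
    return numerator / denominator
```

Under this definition a perfect ranking gives 1, a fully inverted ranking gives -1, and a random ranking is expected to give a value near 0.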

  19. Livebench 6. 10 models for each target were made by PMODELLER. PI is reported for 28 “easy” targets and 38 “hard” targets (those for which at least one model had a non-zero MaxSub score).

  20. CASP5. For 18 targets (out of 33), the native structure ranked better than all predictions. For 26 targets (out of 33), the native structure was ranked within the top 3.5% of all the predictions.

  21. Conclusions. A four-body statistical scoring function is developed based on the Delaunay tessellation of proteins. It discriminates native from decoy structures in most cases. It distinguishes pre- and post-transition-state structures and the native structure in MD folding simulation trajectories. It is highly effective at ranking Livebench 6 and CASP5 predictions accurately.
