Protein-Protein Interaction Hotspots Carved into Sequences • Yanay Ofran (1,2), Burkhard Rost (1,2,3) • 1. Department of Biochemistry and Molecular Biophysics, Columbia University • 2. Columbia University Center for Computational Biology and Bioinformatics (C2B2) • 3. NorthEast Structural Genomics Consortium (NESG), Columbia University • Presented by Navreet Virk, CISC-841
Protein-Protein Interactions (PPI) • Refer to associations of protein molecules • Important for almost all biological processes, e.g. signals from the exterior of a cell are relayed to its interior by PPI of signaling molecules • Understanding biological mechanisms requires knowledge of PPI and the underlying molecular principles
Protein-Protein Interface Residues • Studies have discovered networks of PPI in cells and identified pairs of interacting proteins • To understand PPI, the residues that account for binding of proteins and stabilizing complexes, i.e. the interface residues, need to be identified • Prediction methods have mainly focused on the entire interfaces of two interacting proteins
Hotspots • Only a few of the interface residues are essential for recognition and binding to other proteins • This small subset of essential residues is referred to as “hotspots” • If mutated, these residues hamper the protein-protein interaction
Significance of Hotspots • For large interfaces, fewer than 5% of interface residues contribute most of the total binding energy • For small interfaces, it could be a single amino acid on each protein • Identification of these residues may benefit function prediction • Hotspot predictions assist in identifying binding sites for drugs or desired drug targets
Protein-protein Interfaces, Hotspots and Predictions • Human growth hormone (yellow) bound to the extracellular portion of its receptor • The receptor chains are 201 residues long. The PPI interface covers 31 of these residues (blue and red) on each chain. Mutating any one of the 6 red residues hampers the interaction. • The prediction method ISIS captures 5 of the interface residues (colored green)
Identifying Hotspots • Given the 3-D structure of a complex, the residues crucial for binding can be predicted • 3-D structures are known for fewer than 1% of interacting pairs • More efficient method: alanine scans, i.e. experimentally mutate residues to alanine and measure the effect of the substitution on the interaction • The method is theoretically debatable
Current methods to predict hotspots • Currently no method identifies hotspots from sequence alone • Methods that identify all interface residues have good accuracy and low coverage • Accuracy = TP / (TP + FP) • Coverage = TP / (TP + FN) • The authors proposed a hypothesis to explain the low coverage of prediction methods
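A minimal sketch of the two measures, assuming accuracy means the fraction of predicted residues that are truly in the interface and coverage means the fraction of true interface residues that are predicted. The function name and the residue positions are hypothetical and are not taken from the paper.

```python
def accuracy_and_coverage(predicted, observed):
    """Return (accuracy, coverage) for two collections of residue positions."""
    predicted, observed = set(predicted), set(observed)
    tp = len(predicted & observed)   # predicted and observed
    fp = len(predicted - observed)   # predicted but not observed
    fn = len(observed - predicted)   # observed but missed
    accuracy = tp / (tp + fp) if predicted else 0.0
    coverage = tp / (tp + fn) if observed else 0.0
    return accuracy, coverage


# Hypothetical residue positions, chosen only to illustrate the pattern
# "good accuracy, low coverage".
predicted = {12, 15, 40}
observed = {12, 15, 16, 40, 41, 77, 80, 81, 90, 95}
acc, cov = accuracy_and_coverage(predicted, observed)
print(f"accuracy={acc:.2f} coverage={cov:.2f}")
```

With few but reliable predictions, accuracy is high while most observed interface residues remain uncovered.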
Hypothesis • The residues that are missed by prediction methods are more similar to the general population of surface residues than to the essential residues • A machine learning algorithm trained on all protein-protein interfaces discards non-hotspot residues as noise • Only hotspots are identified as the signal to be learned
Test of hypothesis using ISIS • Authors applied the ISIS method to the task of identifying hotspots • ISIS: Interaction Sites Identified from Sequence • Knowledge-based method developed to identify interface residues from sequence • Based on a system of neural networks • Takes as input: - sequence environment of residues - evolutionary profile of residues in the window - predicted solvent accessibility - predicted secondary structure - predicted conservation score
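A rough sketch of how a "sequence environment" input could be built for one residue: a one-hot encoding of a sliding window around it. This is not the ISIS implementation. The window width, the padding slot, the function names and the example sequence are all assumptions, and the other inputs listed above (evolutionary profile, predicted accessibility, secondary structure, conservation) are left out.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue):
    """Encode one residue as 21 slots: 20 amino acids plus a padding/unknown slot."""
    vec = [0] * (len(AMINO_ACIDS) + 1)
    idx = AMINO_ACIDS.find(residue)
    vec[idx if idx >= 0 else len(AMINO_ACIDS)] = 1
    return vec


def window_features(sequence, center, half_width=4):
    """Concatenate one-hot vectors for the window centered on `center`.

    half_width=4 gives a 9-residue window; this width is an assumption.
    Positions that fall outside the sequence are encoded as padding.
    """
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        residue = sequence[pos] if 0 <= pos < len(sequence) else "-"
        features.extend(one_hot(residue))
    return features


seq = "MKTAYIAKQR"  # hypothetical sequence
feats = window_features(seq, center=0)
print(len(feats))  # 9 window positions x 21 slots each
```

A vector like this would be one part of the input to a network that labels the central residue as interface or non-interface.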
Overlap between two subsets • 90% accuracy and 5% coverage • Results of ISIS confirmed the hypothesis • There was a surprising overlap between: - the subset of residues confirmed to be hotspots by experimental alanine mutation - the subset of residues predicted by ISIS to be protein-protein interface residues
ISIS prediction method to identify hotspots • Dataset: ASEdb database of experimental alanine scans • Used 296 point mutations from 30 proteins • Used 2 representative examples initially • Compared experimental results of alanine scans with residues predicted by ISIS • No 3-D structures of the proteins were used
Example 1: HIV gp120 / CD4 Receptor Complex • Ashkenazi et al. sequentially mutated many residues in the V1 domain of the CD4 receptor • 25 positions within 94 residues were known to substantially affect the affinity of CD4 • ISIS predicted 30 residues • 19 of them were also identified experimentally • Of the 6 missed, 4 were next to predicted residues • 5 of the predicted residues were not experimentally mutated
Example 2: Voltage-Gated Potassium Channel • A membrane protein that helps control the voltage gradient across the plasma membrane • Within a region of 29 consecutive residues, 8 had a significant effect on the affinity of the channel for its inhibitors agitoxin2 and charybdotoxin • ISIS predicted 13 residues, including 7 of the 8 experimentally found hotspots • The missed residue was buried in the structure and did not have a direct effect on binding • 5 of the 6 predicted residues that did not coincide with hotspots had a significant but less dramatic effect than the others
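The counts quoted in the two examples can be turned into rates under one possible reading of the slides. The assumptions are: predicted residues that were never mutated are excluded, every other predicted residue not among the experimental hotspots counts as a false positive, and accuracy and coverage are taken as TP / (TP + FP) and TP / (TP + FN). The resulting percentages are derived here and are not quoted from the paper.

```python
def rates(true_pos, false_pos, false_neg):
    """Return (accuracy, coverage) from raw counts."""
    accuracy = true_pos / (true_pos + false_pos)
    coverage = true_pos / (true_pos + false_neg)
    return accuracy, coverage


# CD4: 30 predicted, 19 confirmed, 5 never mutated (excluded here),
# 25 experimentally known positions.
cd4_acc, cd4_cov = rates(true_pos=19, false_pos=30 - 19 - 5, false_neg=25 - 19)

# Potassium channel: 13 predicted, 7 of the 8 experimental hotspots found.
kch_acc, kch_cov = rates(true_pos=7, false_pos=13 - 7, false_neg=8 - 7)

print(f"CD4: accuracy={cd4_acc:.2f} coverage={cd4_cov:.2f}")
print(f"K+ channel: accuracy={kch_acc:.2f} coverage={kch_cov:.2f}")
```

For the channel, counting the 5 milder-effect residues as correct instead of as false positives would raise the accuracy figure, so the strict reading above is a lower bound.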
Performance over Entire Dataset • Almost all binding residues predicted by ISIS were experimentally found to be significant • 90% of negative predictions were experimentally confirmed • By choosing different points on the curve, accuracy (true positives / predicted positives) can be increased at the cost of coverage (true positives / all observed positives)
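The trade-off between accuracy and coverage can be sketched by sweeping a cutoff over per-residue prediction scores. The scores, labels and thresholds below are invented toy values and `sweep` is a hypothetical helper. Accuracy is taken as TP / (TP + FP) and coverage as TP / (TP + FN).

```python
def sweep(scores, labels, thresholds):
    """Return (threshold, accuracy, coverage) for each cutoff on the scores."""
    rows = []
    for t in thresholds:
        predicted = [s >= t for s in scores]
        tp = sum(p and l for p, l in zip(predicted, labels))
        fp = sum(p and not l for p, l in zip(predicted, labels))
        fn = sum((not p) and l for p, l in zip(predicted, labels))
        accuracy = tp / (tp + fp) if tp + fp else None  # undefined if nothing is predicted
        coverage = tp / (tp + fn) if tp + fn else None  # undefined if nothing is observed
        rows.append((t, accuracy, coverage))
    return rows


# Toy data: network output per residue and whether the residue is a hotspot.
scores = [0.95, 0.90, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [True, True, False, True, False, True, False, False]

rows = sweep(scores, labels, thresholds=[0.25, 0.5, 0.85])
for t, acc, cov in rows:
    print(f"threshold={t:.2f} accuracy={acc:.2f} coverage={cov:.2f}")
```

In this toy data, raising the cutoff keeps only the most confident predictions, so accuracy rises while coverage falls.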
Hotspots are easy to identify but hard to define • Neural networks were able to disregard most of the residues observed in the interface, i.e. the pupil (neural network) ignored the teacher (labeled data) • NNs identified a set of complex nonlinear correlations between input features and hotspot residues • It is not possible to translate these subtle and complex dependencies into simple explanations • Authors inferred which features are more or less relevant
Training Neural Networks on Different Parameters • NNs trained on sequence environment alone performed only slightly better than random • Adding evolutionary information significantly improved performance on both interface residues and hotspots • Conservation of hotspots is only marginally different from that of other residues • Although the difference is marginal, performance improved significantly • NNs learned to distinguish between conservation that is indicative of hotspots and conservation that is not
Sequence Conservation • X-axis is the conservation score of residues • Y-axis gives the fraction of residues at a given level of conservation • Compares the distribution of residue conservation between the entire dataset, hotspots, and residues with no effect • Differences are marginal but the overall effect of conservation is substantial
Results of different methods to predict hotspots • Hydrophobic moment method represents the approach that relies on local physicochemical features • Knowledge-based method using amino acid information represents the sequence-only approach • Evolutionary Trace method represents evolutionary conservation • ISIS represents the combination of all of them • ProMate represents methods that use a known 3-D structure
What does it take to predict hotspots? • ISIS and ProMate were most successful • Linear combinations of features are not enough to predict hotspots • Results show that a complex combination of: - physicochemical - evolutionary - structural features is required to predict hotspots
How Hotspots Differ from Other Residues? • ISIS was applied to a large dataset to create large sets of predicted hotspots and predicted non-hotspots • Characteristics of hotspots (positive values) and non-hotspots (negative values) were compared • A multiple sequence alignment is built for each protein • For each interface residue, the average occupancy of its position by each type of amino acid is calculated
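The occupancy calculation can be sketched as column frequencies in a multiple sequence alignment, averaged over a chosen set of positions. The toy alignment, the column indices and the function names are hypothetical, and skipping gaps when counting is an assumption made here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_occupancy(alignment, column):
    """Fraction of aligned sequences showing each amino acid at one column (gaps skipped)."""
    residues = [row[column] for row in alignment if row[column] in AMINO_ACIDS]
    counts = Counter(residues)
    total = len(residues)
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}


def mean_occupancy(alignment, columns):
    """Average the per-column occupancies over a non-empty list of columns."""
    per_column = [column_occupancy(alignment, c) for c in columns]
    return {aa: sum(occ[aa] for occ in per_column) / len(per_column)
            for aa in AMINO_ACIDS}


# Toy alignment of four sequences; column 2 plays the role of a predicted hotspot,
# columns 1 and 3 play the role of the remaining interface positions.
alignment = ["MKWA", "MRWA", "MKYG", "MK-A"]
hot = mean_occupancy(alignment, columns=[2])
rest = mean_occupancy(alignment, columns=[1, 3])
print(f"W in hotspot positions: {hot['W']:.2f}, elsewhere: {rest['W']:.2f}")
```

Comparing the two dictionaries amino acid by amino acid gives the kind of hotspot versus non-hotspot contrast described on this slide.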
Position Occupancy in Hotspots versus Rest of the Interface • The p-value is the probability of obtaining a result at least as extreme as a given data point under the null hypothesis • For some amino acids there are significant differences between hotspot and non-hotspot interface residues, while for others there are no substantial differences • Experimental values are close to the values obtained for predicted hotspots
Secondary Structure Comparison of Hotspots and Non-hotspots • The secondary structure state of 39% of non-hotspot interface residues is loop • In predicted hotspots, 57% of residues are in a loop state • In both categories, the rest of the residues were divided roughly equally between helices and strands • There is again a similarity between experimentally determined and predicted values • (Table comparing structural features)
Conclusion • Alanine scans indeed capture some genuine physicochemical commonalities of interaction hotspots (as shown by the Baker Lab) that can be identified by a general method • While in silico alanine scans require a high-resolution structure of the protein complex, ISIS needs only the sequence of a single chain, regardless of its binding partner • In silico alanine scans produce numerical predictions in terms of binding energy (∆∆G), whereas ISIS produces a binary prediction (hotspot/non-hotspot) • For similar levels of positive accuracy, the coverage of ISIS is roughly half that of in silico alanine scanning • Thus, when the 3-D structure of the complex is available, the in silico alanine scan is a powerful tool for identifying hotspots, whereas when only sequence is available, ISIS can provide accurate predictions • Analyzing a single protein using ISIS typically requires a few minutes, which allows large-scale analysis of hotspots at a relatively small CPU cost • Server for prediction available at http://www.rostlab.org/services/isis