Predicting Protein Folding Pathways • Zaki, Nadimpally, Bardhan, and Bystroff • Data Mining in Bioinformatics • Presented by: Michelle Cavallo, I.B. PhD Student • Advisor: Dr. R. Narayanan
Overview • Problem: • Identify a time-ordered sequence of folding events that make up a structured protein folding pathway • Solution: • Novel “unfolding” approach for predicting the folding pathway • Apply graph-based methods on a weighted secondary structure graph of a protein to predict the sequence of unfolding events • Reverse the event sequence to see the folding pathway • Experiments: • Successful predictions for proteins with partially known folding pathways
Introduction • Proteins fold spontaneously and reproducibly in an aqueous solution • Structure is determined by sequence • Function is determined by structure Hemoglobin: a globular protein
Protein Problems • Two major protein problems for bioinformatics: • The Structure Prediction Problem • Determine the 3D tertiary (3º) structure from the linear amino acid sequence • The Pathway Prediction Problem • Given an amino acid sequence and its 3D structure, determine the folding pathway that leads from the linear structure to the 3º structure • The major focus so far has been on structure prediction
Structure Prediction • Traditional approaches to structure prediction have focused on: • Evolutionary homology • Fold recognition (goodness-of-fit score for sequence-structure alignment) • Ab initio simulations (conformational search for the lowest energy state) • The conformational search space is huge • Proteins nevertheless fold in milliseconds, so a structured folding pathway must play an important role in this conformational search • Experimental evidence indicates that certain events always occur early in the folding process and certain others always occur later
Towards Pathway-based Structure Prediction • To make pathway-based approaches to structure prediction a reality, plausible protein folding pathways need to be predicted. • The ability to predict folding pathways can greatly enhance structure prediction methods.
Studying Folding Pathways • One approach to studying folding pathways is to enumerate the folding possibilities of an unfolded protein. • This is infeasible because there are too many possibilities. • The approach used in this study is to start with a folded protein in its final state and learn how to “unfold” the protein. • The reversed unfolding sequence could then be a plausible protein folding pathway. • The solution: Use minimum cuts on weighted graphs to determine a plausible sequence of unfolding steps.
Protein Contact Maps • A protein contact map summarizes the distances between every pair of residues of a 3D protein structure in a 2D matrix. • It is represented as a symmetric, square Boolean matrix of pairwise inter-residue contacts • “A contact map for a protein with N residues is an N x N binary matrix C whose element C (i, j) = 1 if residues i and j are in contact and C (i, j) = 0 otherwise” • Protein contact maps can be created using different tools, e.g. BioPython, Structer
Protein Contact Maps Figure 7.2 shows the 3º structure and contact map for IgG-binding protein from PDB
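The contact map definition quoted above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in the study: the function name `contact_map`, the use of one coordinate per residue (for example the Cα atom), and the 7 Å cutoff are all assumptions made for the example, and the diagonal is left at 0 by choice.

```python
import math

def contact_map(coords, cutoff=7.0):
    """Return an N x N binary contact map as a list of lists.

    coords: one (x, y, z) tuple per residue (assumed: C-alpha positions).
    cutoff: contact distance in angstroms (assumed value, not from the slides).
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Residues i and j are "in contact" if they are within the cutoff.
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1  # keep the matrix symmetric
    return cmap

# Toy example: four residues on a straight line, 3.8 angstroms apart.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
for row in contact_map(toy_coords):
    print(row)
```

With these toy coordinates only sequence neighbors fall within the cutoff, so the map is a simple band around the diagonal. Real structures also show off-diagonal contacts between residues that are far apart in sequence but close in space.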
Graphs and Minimum Cuts • A protein can be represented as a weighted secondary structure element graph (WSG) • Vertices = the SSEs that make up the protein • β-strands represented as triangles • α-helices represented as circles • Edges denote proximity relationships between SSEs • Edges weighted by strength of interactions between SSEs • Edge construction and weights are determined from the contact map
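As a rough sketch of how a WSG could be derived from a contact map, the Python below sums the contacts between the residues of each pair of SSEs and uses that count as the edge weight. The function name `build_wsg`, the SSE labels, the residue ranges, and the rule "add an edge whenever at least one contact exists" are assumptions for illustration. The slides do not give the exact edge-construction rule.

```python
def build_wsg(cmap, sses):
    """Build a weighted SSE graph as a dict of dicts: graph[u][v] = weight.

    cmap: N x N binary contact map (list of lists).
    sses: dict mapping an SSE label to its (first, last) residue index, inclusive.
    """
    names = list(sses)
    graph = {name: {} for name in names}
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            a_lo, a_hi = sses[a]
            b_lo, b_hi = sses[b]
            # Edge weight = number of residue-residue contacts between the two SSEs.
            weight = sum(cmap[i][j]
                         for i in range(a_lo, a_hi + 1)
                         for j in range(b_lo, b_hi + 1))
            if weight > 0:  # assumed rule: any contact creates an edge
                graph[a][b] = weight
                graph[b][a] = weight
    return graph

# Hypothetical 6-residue protein with three SSEs of two residues each.
contacts = [(1, 2), (3, 4), (0, 5), (1, 4), (0, 4)]
toy_map = [[0] * 6 for _ in range(6)]
for i, j in contacts:
    toy_map[i][j] = toy_map[j][i] = 1

toy_sses = {"b1": (0, 1), "a1": (2, 3), "b2": (4, 5)}
print(build_wsg(toy_map, toy_sses))
```

In this toy case the two strands share three contacts while the helix touches each strand once, so the b1-b2 edge is the heaviest in the graph.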
Solution/Approach Outline • Approach to predicting a folding pathway using the idea of “unfolding” • “Use a graph representation of a protein, where a vertex denotes a 2º structure and an edge denotes the interactions between the two SSEs” (SSEs = 2º structure elements). • Unfold the protein through a series of mincuts
Unfolding via Mincuts • Unfold one piece at a time, each time choosing the cut which will have the least impact on the remaining structure • The sequence can then be reversed to identify plausible pathways for protein folding • This series of mincuts predicts the most likely sequence of unfolding events
Unfolding via Mincuts • A mincut is the set of edges whose removal partitions a WSG into two components with the smallest total edge weight (the fewest bonds) between them • The Stoer-Wagner (SW) deterministic polynomial-time mincut algorithm was used because it is simple and fast. • “The SW algorithm works iteratively by merging the vertices until only one unmerged vertex remains”
Unfolding via Mincuts • “SW starts with an arbitrary vertex and adds the most highly connected vertex to the current set” • This process is repeated until all vertices have been added in order of decreasing attraction to the first
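The merge-and-grow procedure described above can be sketched in plain Python as follows. This is a compact teaching version of the Stoer-Wagner idea under simple assumptions (an undirected graph stored as a dict of dicts with non-negative weights, at least two vertices). It is not the authors' implementation, and the example graph is hypothetical.

```python
def stoer_wagner(graph):
    """Return (cut_weight, (side_a, side_b)) for a global minimum cut.

    graph: dict of dicts, graph[u][v] = weight, symmetric, >= 2 vertices.
    """
    if len(graph) < 2:
        raise ValueError("need at least two vertices")
    g = {u: dict(nbrs) for u, nbrs in graph.items()}  # working copy
    members = {u: [u] for u in g}  # original vertices inside each merged vertex
    best_weight, best_side = float("inf"), None
    while len(g) > 1:
        # One phase: start anywhere, then repeatedly add the vertex that is
        # most strongly connected to the set built so far.
        start = next(iter(g))
        order = [start]
        attraction = {v: g[start].get(v, 0) for v in g if v != start}
        while attraction:
            nxt = max(attraction, key=attraction.get)
            cut_of_phase = attraction.pop(nxt)
            order.append(nxt)
            for v, w in g[nxt].items():
                if v in attraction:
                    attraction[v] += w
        s, t = order[-2], order[-1]
        # The last vertex added, cut off from everything else, is a candidate cut.
        if cut_of_phase < best_weight:
            best_weight, best_side = cut_of_phase, list(members[t])
        # Merge t into s, adding up parallel edge weights.
        for v, w in g[t].items():
            if v != s:
                g[s][v] = g[s].get(v, 0) + w
                g[v][s] = g[s][v]
            del g[v][t]
        del g[t]
        members[s].extend(members.pop(t))
    side_a = set(best_side)
    return best_weight, (side_a, set(graph) - side_a)

# Hypothetical WSG: three strands with strong mutual contacts and a loosely attached helix.
wsg = {
    "a1": {"b1": 1, "b3": 1},
    "b1": {"a1": 1, "b2": 5},
    "b2": {"b1": 5, "b3": 4},
    "b3": {"b2": 4, "a1": 1},
}
print(stoer_wagner(wsg))
```

On this example the lightest cut has weight 2 and separates the helix a1 from the three strands, which matches the intuition that the most weakly attached element unfolds first.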
Unfolding via Mincuts “An unfolding event is a set of edges that form a mincut in the WSG for a protein.”
The UNFOLD Algorithm • Determine mincut for initial WSG • Break ties arbitrarily • Delete edges forming this cut from WSG • This yields two new connected subgraphs • Recursively process each subgraph to yield a sequence of mincuts corresponding to the unfolding events • Reverse this sequence to obtain predicted folding pathway
The UNFOLD Algorithm • Sequence of mincuts that can be visualized as a tree • Nodes represent sets of vertices (graphs) produced by mincuts • Children of a node represent partitions resulting from the mincut
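The recursive structure of UNFOLD can be illustrated with the short Python sketch below. To keep the example small, the mincut step is a brute-force search over all two-way partitions, which is only practical for a handful of SSEs and is a stand-in for the Stoer-Wagner algorithm named in the slides. The function names, the event format, and the example graph are assumptions for illustration. Ties are broken by keeping the first minimum found.

```python
from itertools import combinations

def min_cut(graph, vertices):
    """Brute-force mincut of the subgraph induced by `vertices` (stand-in for SW)."""
    verts = sorted(vertices)
    first, rest = verts[0], verts[1:]
    best = None
    # Fix `first` on one side so each partition is enumerated exactly once.
    for k in range(len(rest)):
        for combo in combinations(rest, k):
            side = {first, *combo}
            weight = sum(w for u in side for v, w in graph[u].items()
                         if v in vertices and v not in side)
            if best is None or weight < best[0]:
                best = (weight, side, set(verts) - side)
    return best

def unfold(graph, vertices=None, events=None):
    """Return the unfolding events as (cut_weight, side_a, side_b) tuples."""
    if vertices is None:
        vertices = set(graph)
    if events is None:
        events = []
    if len(vertices) < 2:
        return events  # a single SSE cannot be split further
    weight, side_a, side_b = min_cut(graph, vertices)
    events.append((weight, sorted(side_a), sorted(side_b)))
    # Cutting the edges leaves two subgraphs; process each recursively.
    unfold(graph, side_a, events)
    unfold(graph, side_b, events)
    return events

# Hypothetical WSG with one helix and three strands.
wsg = {
    "a1": {"b1": 1, "b3": 1},
    "b1": {"a1": 1, "b2": 5},
    "b2": {"b1": 5, "b3": 4},
    "b3": {"b2": 4, "a1": 1},
}
unfolding = unfold(wsg)
folding = list(reversed(unfolding))  # reversed unfolding = predicted folding order
for event in folding:
    print(event)
```

Read as a folding pathway, the reversed list says that b1 and b2 associate first, b3 then joins them, and a1 docks last. The nesting of the recursive calls corresponds to the tree described above, where each node's children are the two partitions produced by its mincut.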
Consideration • “Allowance should be made for several folding events to take place simultaneously. • However, there may be intermediate stages that must happen before higher order folding can take place.” • “The results should not be taken to imply a strict folding timeline, but rather as a way to understand major events that are mandatory in the folding pathway.”
Experimentation • No one has determined a complete protein folding pathway • However, there is evidence supporting intermediate pathway stages for several well-studied proteins • Proteins with known intermediate pathway stages were analyzed with UNFOLD
Detailed Test Case: 4DFR • Dihydrofolate Reductase (PDB ID: 4DFR) • Involved in nucleotide metabolism • Has an adenine binding domain which is formed (folded) early on in the folding pathway • An α1 and β2 interaction • 4DFR has four α-helices and eight β-strands.
4DFR Detailed Test Case Continued • Shown below are the WSG, unfolding sequence, and a series of intermediate stages in the folding pathway • “According to the mincut-based UNFOLD algorithm, the vertex set {β2α2β3β1} lies on the folding pathway in agreement with the experimental results.”
4DFR Detailed Test Case Continued Predicted folding sequence for 4DFR
Pathways for Other Proteins • Several other proteins with known folding pathway intermediate stages were UNFOLDed • Bovine Pancreatic Trypsin Inhibitor, Chymotrypsin Inhibitor 2, Human Procarboxypeptidase A2, Cell Cycle Protein p13suc1, β-lactoglobulin, Interleukin-1β, Protein Acylphosphatase, Twitchin Ig Superfamily Domain Protein, Myoglobin, and Leghemoglobin • UNFOLD results reflected experimental results
Conclusion • A repeated mincut approach (the UNFOLD algorithm) can be used for automated prediction of protein folding pathways
Future Perspectives • Plan to test UNFOLD on the entire collection of proteins in the PDB • Want to study proteins from the same family to look for prediction of consistent pathways • Similarities and dissimilarities are both of interest
Limitations • “UNFOLD arbitrarily picks only one mincut out of perhaps several mincuts that have the same capacity” • Constructing all possible pathways might provide stronger evidence of intermediate states • “All native interactions are considered energetically equivalent, and thus larger stabilizing interactions are not differentiated.” • Simplified model based on topology • Folding mechanism inferred from native structure alone • This may be acceptable, because investigations indicate that folding mechanisms are largely determined by topology
Biology Perspectives • “The ability to predict folding pathways can greatly enhance structure prediction methods” • We want to predict structures to assign putative functions to novel genes!
Biology Perspectives Continued • It is very difficult to determine a protein structure in the lab • X-ray crystallography • Technique is difficult to perform • Results are difficult to interpret • We would like to have fast, easy methods for predicting structure in silico.
Biology Perspectives Continued • Protein folding pathway prediction is of particular interest in prion research • Prions = misfolded proteins which cause transmissible spongiform encephalopathies, such as: • Creutzfeldt-Jakob Disease • Gerstmann-Sträussler-Scheinker Syndrome (GSS) • Fatal Familial Insomnia (FFI) • Kuru