Presented by: Michelle Cavallo I.B. PhD Student Advisor: Dr. R. Narayanan

Predicting Protein Folding Pathways • Zaki, Nadimpally, Bardhan, and Bystroff • Data Mining in Bioinformatics

Overview • Problem: • Identify a time-ordered sequence of folding events that make up a structured protein folding pathway • Solution: • Novel “unfolding” approach for predicting the folding pathway • Apply graph-based methods on a weighted secondary structure graph of a protein to predict the sequence of unfolding events • Reverse the event sequence to see the folding pathway • Experiments: • Successful predictions for proteins with partially known folding pathways

Introduction • Proteins fold spontaneously and reproducibly in an aqueous solution • Structure is determined by sequence • Function is determined by structure • [Figure: hemoglobin, a globular protein]

Protein Problems • Two major protein problems for bioinformatics: • The Structure Prediction Problem • Determine the 3D (3º, tertiary) structure from the linear amino acid sequence • The Pathway Prediction Problem • Given an amino acid sequence and its 3D structure, determine the folding pathway that leads from the linear chain to the 3º structure • The major focus so far has been on structure prediction

Structure Prediction • Traditional approaches to structure prediction have focused on: • Evolutionary homology • Fold recognition (a goodness-of-fit score for sequence-structure alignment) • Ab initio simulations (conformational search for the lowest-energy state) • The conformational search space is huge • Yet proteins fold in milliseconds, so a structured folding pathway must play an important role in narrowing this conformational search • Experimental evidence does indicate that certain events always occur early in the folding process and certain others always occur later

Towards Pathway-based Structure Prediction • To make pathway-based approaches to structure prediction a reality, plausible protein folding pathways need to be predicted. • The ability to predict folding pathways can greatly enhance structure prediction methods.

Studying Folding Pathways • One approach to studying folding pathways is to enumerate the folding possibilities of an unfolded protein. • This is infeasible: there are too many possibilities. • The approach used in this study is to start with a folded protein in its final (native) state and learn how to “unfold” it. • The reversed unfolding sequence could then be a plausible protein folding pathway. • The solution: use minimum cuts on weighted graphs to determine a plausible sequence of unfolding steps.

Protein Contact Maps • A protein contact map summarizes the distances between every pair of residues of a 3D protein structure in a 2D matrix • It is represented as a symmetric, square Boolean matrix of pairwise inter-residue contacts, where two residues are typically counted as in contact when their distance falls below a chosen cutoff • “A contact map for a protein with N residues is an N x N binary matrix C whose element C (i, j) = 1 if residues i and j are in contact and C (i, j) = 0 otherwise” • Protein contact maps can be created using different tools, e.g. Biopython, Structer
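The contact-map definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming one representative point per residue (for example the Cα atom) and an 8 Å distance cutoff; neither the atom choice nor the cutoff comes from the slides, and `contact_map` is a hypothetical helper, not part of Biopython or any other tool.

```python
import math


def contact_map(coords, cutoff=8.0):
    """Build an N x N binary contact map from per-residue 3D coordinates.

    Assumptions (not from the slides): one point per residue, e.g. the
    C-alpha atom, and an 8 angstrom cutoff for calling two residues "in contact".
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # self-pairs on the diagonal are left at 0
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1  # symmetric by construction
    return cmap


# Made-up coordinates for a straight 4-residue chain (not a real structure).
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
for row in contact_map(toy):
    print(row)
```

Filling both `cmap[i][j]` and `cmap[j][i]` in the same step guarantees the symmetry the definition requires; in practice the coordinates would come from a parsed PDB entry rather than a hand-written list.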

Protein Contact Maps Figure 7.2 shows the 3º structure and contact map for an IgG-binding protein from the PDB

Graphs and Minimum Cuts • A protein can be represented as a weighted secondary structure element graph (WSG) • Vertices = the secondary structure elements (SSEs) that make up the protein • β-strands represented as triangles • α-helices represented as circles • Edges denote proximity relationships between SSEs • Edges are weighted by the strength of interactions between SSEs • Edge construction and weights are determined from the contact map
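As a sketch of how WSG edges could be derived from a contact map, the function below assumes each SSE is given as an inclusive residue range and takes an edge's weight to be the number of residue pairs in contact between the two SSEs. The name `build_wsg`, the range format, and the contact-count weighting are assumptions for illustration; the slides say only that edges and weights come from the contact map.

```python
from itertools import combinations


def build_wsg(cmap, sses):
    """Weighted SSE graph (WSG) from a binary contact map.

    cmap: N x N 0/1 matrix. sses: dict mapping an SSE label to an inclusive
    (start, end) residue range. Assumption: an edge's weight is the number of
    residue-residue contacts between two SSEs; SSE pairs with no contacts get
    no edge.
    """
    edges = {}
    for a, b in combinations(sses, 2):
        (s1, e1), (s2, e2) = sses[a], sses[b]
        weight = sum(cmap[i][j]
                     for i in range(s1, e1 + 1)
                     for j in range(s2, e2 + 1))
        if weight > 0:
            edges[frozenset((a, b))] = weight
    return edges


# Hypothetical 6-residue protein with three SSEs and four contacts.
n = 6
cmap = [[0] * n for _ in range(n)]
for i, j in [(0, 4), (1, 4), (1, 5), (2, 5)]:
    cmap[i][j] = cmap[j][i] = 1
sses = {"b1": (0, 1), "a1": (2, 3), "b2": (4, 5)}
wsg = build_wsg(cmap, sses)
print(wsg)
```

Keying each edge by a `frozenset` of its two SSE labels keeps the graph undirected, so the same edge cannot be stored twice in opposite directions.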

Graphs and Minimum Cuts

Solution/Approach Outline • Approach to predicting a folding pathway using the idea of “unfolding” • “Use a graph representation of a protein, where a vertex denotes a 2º structure and an edge denotes the interactions between the two SSEs” (SSEs = 2º structure elements) • Unfold the protein through a series of mincuts

Unfolding via Mincuts • Unfold one piece at a time, each time choosing the cut which will have the least impact on the remaining structure • The sequence can then be reversed to identify plausible pathways for protein folding • This series of mincuts predicts the most likely sequence of unfolding events

Unfolding via Mincuts • A mincut is the set of edges that partitions a WSG into two components with the smallest total edge weight (the weakest overall interaction) between them • The Stoer-Wagner (SW) algorithm, a deterministic polynomial-time mincut algorithm, was used because it is simple and fast • “The SW algorithm works iteratively by merging the vertices until only one unmerged vertex remains”

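Unfolding via Mincuts • “SW starts with an arbitrary vertex and adds the most highly connected vertex to the current set” • This process is repeated until all vertices have been added, each time choosing the vertex most strongly connected to the set built so far • The cut separating the last-added vertex from all the others is the “cut of the phase”; the last two vertices added are then merged, and the lightest cut of the phase over all phases is the global mincut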
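The Stoer-Wagner procedure described above can be sketched as follows. This is a generic implementation written for illustration, not the authors' code; the dict-of-dicts graph format and the helper names `from_edges` and `stoer_wagner` are assumptions. Each phase builds a maximum-adjacency ordering, records the cut of the phase, and merges the last two vertices.

```python
def from_edges(edge_list):
    """Build a symmetric dict-of-dicts graph from (u, v, weight) triples."""
    g = {}
    for u, v, w in edge_list:
        g.setdefault(u, {})
        g.setdefault(v, {})
        g[u][v] = g[u].get(v, 0) + w
        g[v][u] = g[v].get(u, 0) + w
    return g


def stoer_wagner(graph):
    """Global minimum cut; returns (cut_weight, one_side_as_frozenset)."""
    g = {u: dict(nbrs) for u, nbrs in graph.items()}  # work on a copy
    members = {u: {u} for u in g}  # original vertices inside each merged vertex
    best_weight, best_side = float("inf"), None
    while len(g) > 1:
        # One phase: maximum-adjacency ordering from an arbitrary start vertex.
        start = next(iter(g))
        added = [start]
        conn = {v: g[start].get(v, 0) for v in g if v != start}
        while conn:
            nxt = max(conn, key=conn.get)  # most tightly connected to the set
            cut_of_phase = conn.pop(nxt)
            added.append(nxt)
            for v, w in g[nxt].items():
                if v in conn:
                    conn[v] += w
        s, t = added[-2], added[-1]
        # cut_of_phase = total weight between t and every other vertex.
        if cut_of_phase < best_weight:
            best_weight, best_side = cut_of_phase, frozenset(members[t])
        # Merge t into s, summing parallel edge weights.
        for v, w in g[t].items():
            if v != s:
                g[s][v] = g[s].get(v, 0) + w
                g[v][s] = g[v].get(s, 0) + w
            del g[v][t]
        del g[t]
        members[s] |= members.pop(t)
    return best_weight, best_side


# Small made-up test graph.
toy = from_edges([(1, 2, 2), (1, 5, 3), (2, 3, 3), (2, 5, 2), (2, 6, 2),
                  (3, 4, 4), (3, 7, 2), (4, 7, 2), (4, 8, 2), (5, 6, 3),
                  (6, 7, 1), (7, 8, 3)])
weight, side = stoer_wagner(toy)
print(weight, sorted(side))
```

On this toy graph the lightest cut has weight 4 and separates {1, 2, 5, 6} from {3, 4, 7, 8}. For real use, NetworkX already provides `networkx.stoer_wagner(G, weight="weight")`, which returns the cut value and the two-part partition.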

Unfolding via Mincuts “An unfolding event is a set of edges that form a mincut in the WSG for a protein.”

The UNFOLD Algorithm • Determine mincut for initial WSG • Break ties arbitrarily • Delete edges forming this cut from WSG • This yields two new connected subgraphs • Recursively process each subgraph to yield a sequence of mincuts corresponding to the unfolding events • Reverse this sequence to obtain predicted folding pathway
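The UNFOLD steps above can be sketched recursively. To keep the example short and self-contained, it swaps in a brute-force mincut that tries every bipartition (practical only for small graphs) instead of Stoer-Wagner; both return a minimum-weight cut. The function names, the toy WSG, and the order in which events from the two subgraphs are listed (depth-first, with the side holding the alphabetically first vertex processed first) are assumptions, since the slides do not specify them.

```python
from itertools import combinations


def min_cut(vertices, edges):
    """Brute-force global mincut; ties go to the first cut enumerated.

    edges: dict mapping frozenset({u, v}) -> weight, restricted to `vertices`.
    Returns (weight, side_a, side_b).
    """
    vs = sorted(vertices)
    best = None
    # Keep vs[0] on side A so each bipartition is tried exactly once.
    for k in range(len(vs) - 1):
        for extra in combinations(vs[1:], k):
            a = frozenset((vs[0],) + extra)
            w = sum(wt for e, wt in edges.items() if len(e & a) == 1)
            if best is None or w < best[0]:
                best = (w, a, frozenset(vs) - a)
    return best


def unfold(vertices, edges):
    """Return the unfolding events as a list of (cut_weight, side_a, side_b)."""
    if len(vertices) < 2:
        return []
    w, a, b = min_cut(vertices, edges)
    events = [(w, a, b)]
    for side in (a, b):
        # Keeping only edges inside `side` deletes the cut edges.
        sub_edges = {e: wt for e, wt in edges.items() if e <= side}
        events += unfold(side, sub_edges)
    return events


# Hypothetical WSG: three strands and one helix.
toy = {frozenset(p): w for p, w in [(("b1", "b2"), 5), (("b2", "b3"), 4),
                                    (("a1", "b1"), 1), (("a1", "b3"), 2)]}
events = unfold({"a1", "b1", "b2", "b3"}, toy)
pathway = events[::-1]  # reversing the unfolding events gives the folding order
for w, a, b in pathway:
    print(sorted(a), "+", sorted(b), "weight", w)
```

Reading the reversed list, b1 and b2 associate first, b3 joins them, and the helix a1 docks last, because the weakest interactions are the first to be cut during unfolding.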

The UNFOLD Algorithm • The sequence of mincuts can be visualized as a tree • Nodes represent the sets of vertices (graphs) produced by mincuts • The children of a node represent the partitions resulting from its mincut
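The tree view described above can be sketched as a nested structure. In this illustration each node records its vertex set, the weight of the mincut that split it (`None` for single-SSE leaves), and its two children. As before, a brute-force mincut stands in for Stoer-Wagner, and the names `mincut_tree` and `show` as well as the toy WSG are hypothetical.

```python
from itertools import combinations


def min_cut(vertices, edges):
    """Brute-force mincut (a stand-in for Stoer-Wagner on tiny graphs)."""
    vs = sorted(vertices)
    options = []
    for k in range(len(vs) - 1):
        for extra in combinations(vs[1:], k):
            a = frozenset((vs[0],) + extra)
            w = sum(wt for e, wt in edges.items() if len(e & a) == 1)
            options.append((w, a))
    w, a = min(options, key=lambda o: o[0])  # first minimum wins ties
    return w, a, frozenset(vs) - a


def mincut_tree(vertices, edges):
    """Node = vertex set, weight of the cut that split it, and children."""
    vertices = frozenset(vertices)
    if len(vertices) < 2:
        return {"vertices": vertices, "cut": None, "children": []}
    w, a, b = min_cut(vertices, edges)
    children = [mincut_tree(side, {e: x for e, x in edges.items() if e <= side})
                for side in (a, b)]
    return {"vertices": vertices, "cut": w, "children": children}


def show(node, depth=0):
    """Print the tree with one indented line per node."""
    label = "{" + "".join(sorted(node["vertices"])) + "}"
    cut = "" if node["cut"] is None else f"  cut weight {node['cut']}"
    print("  " * depth + label + cut)
    for child in node["children"]:
        show(child, depth + 1)


toy = {frozenset(p): w for p, w in [(("b1", "b2"), 5), (("b2", "b3"), 4),
                                    (("a1", "b1"), 1), (("a1", "b3"), 2)]}
tree = mincut_tree({"a1", "b1", "b2", "b3"}, toy)
show(tree)
```

With this toy WSG the root {a1b1b2b3} splits into {a1} and {b1b2b3} (cut weight 3), {b1b2b3} splits into {b1b2} and {b3} (weight 4), and {b1b2} splits into its two strands (weight 5). Internal nodes such as {b1b2b3} are the candidate intermediate states.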

Consideration • “Allowance should be made for several folding events to take place simultaneously. • However, there may be intermediate stages that must happen before higher order folding can take place.” • “The results should not be taken to imply a strict folding timeline, but rather as a way to understand major events that are mandatory in the folding pathway.”

Experimentation • No one has determined a complete protein folding pathway • However, there is evidence supporting intermediate pathway stages for several well-studied proteins • Proteins with known intermediate pathway stages were analyzed with UNFOLD

Detailed Test Case: 4DFR • Dihydrofolate Reductase (PDB ID: 4DFR) • Involved in nucleotide metabolism • Has an adenine-binding domain that is formed (folded) early in the folding pathway • An α1 and β2 interaction • 4DFR has four α-helices and eight β-strands.

4DFR Detailed Test Case Continued • Shown below are the WSG, unfolding sequence, and a series of intermediate stages in the folding pathway • “According to the mincut-based UNFOLD algorithm, the vertex set {β2α2β3β1} lies on the folding pathway in agreement with the experimental results.”

4DFR Detailed Test Case Continued Predicted folding sequence for 4DFR

Pathways for Other Proteins • Several other proteins with known protein folding pathway intermediate stages were UNFOLDed • Bovine Pancreas Trypsin Inhibitor, Chymotrypsin Inhibitor 2, Human Procarboxypeptidase A2, Cell Cycle Protein p13suc1, β-lactoglobulin, Interleukin-1β, Protein Acylphosphatase, Twitchin Ig Superfamily Domain Protein, Myoglobin and leghemoglobin • UNFOLD results were consistent with the experimental results

Conclusion • A repeated mincut approach (the UNFOLD algorithm) can be used for automated prediction of protein folding pathways

Future Perspectives • Plan to test UNFOLD on the entire collection of proteins in the PDB • Want to study proteins from the same family to look for prediction of consistent pathways • Similarities and dissimilarities are both of interest

Limitations • “UNFOLD arbitrarily picks only one mincut out of perhaps several mincuts that have the same capacity” • Constructing all possible pathways might provide stronger evidence of intermediate states • “All native interactions are considered energetically equivalent, and thus larger stabilizing interactions are not differentiated.” • Simplified model based on topology • Folding mechanism inferred from native structure alone • This may be acceptable, because investigations indicate that folding mechanisms are largely determined by topology
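The first limitation can be made concrete: even a small graph can have several minimum cuts of equal capacity, and UNFOLD keeps only one. The brute-force sketch below lists every tied cut, each of which could seed a separate branch if all possible pathways were constructed. The name `all_min_cuts` and the ring-shaped toy WSG are hypothetical.

```python
from itertools import combinations


def all_min_cuts(vertices, edges):
    """Return (capacity, [(side_a, side_b), ...]) for every minimum cut.

    Brute force over all bipartitions, so only practical for small WSGs.
    edges: dict mapping frozenset({u, v}) -> weight.
    """
    vs = sorted(vertices)
    cuts = []
    for k in range(len(vs) - 1):
        for extra in combinations(vs[1:], k):
            a = frozenset((vs[0],) + extra)
            w = sum(wt for e, wt in edges.items() if len(e & a) == 1)
            cuts.append((w, a, frozenset(vs) - a))
    capacity = min(w for w, _, _ in cuts)
    return capacity, [(a, b) for w, a, b in cuts if w == capacity]


# Hypothetical WSG: four strands in a ring, every contact weight equal.
ring = {frozenset(p): 1 for p in [("b1", "b2"), ("b2", "b3"),
                                 ("b3", "b4"), ("b4", "b1")]}
capacity, ties = all_min_cuts({"b1", "b2", "b3", "b4"}, ring)
print(capacity, len(ties))
```

Here six of the seven possible bipartitions share the minimum capacity of 2; only splitting {b1, b3} from {b2, b4}, which breaks all four contacts, is heavier. A single arbitrary pick therefore discards most of the equally plausible first unfolding events.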

Biology Perspectives • “The ability to predict folding pathways can greatly enhance structure prediction methods” • We want to predict structures to assign putative functions to novel genes!

Biology Perspectives Continued • It is very difficult to determine a protein structure in the lab • X-ray crystallography • Technique is difficult to perform • Results are difficult to interpret • We would like to have fast, easy methods for predicting structure in silico.

Biology Perspectives Continued • Protein folding pathway prediction is of particular interest in prion research • Prions = misfolded proteins that cause transmissible spongiform encephalopathies • Creutzfeldt-Jakob Disease • Gerstmann-Sträussler-Scheinker Syndrome (GSS) • Fatal Familial Insomnia (FFI) • Kuru
