Protein Structure Prediction

Protein Structure Prediction

What is PSP ?

Primary sequence (1D) Tertiary Structure (3D) …ACLLYYTTCAT… all bonds angles, dihedral angles and bond lengths between each amino acid residue in protein

“Solving” PSP

View PSP as a search Given any primary sequence of an unknown protein (in the sense of it’s 3D structure) Consider PSP as performing a search through the configuration space of the given protein The space of different configurations

Steps in solving PSP Given primary sequence predict the final 3D structure (1D 3D). 2 Step process (1D 2D, 2D 3D) 1st find configuration for the secondary structure (SS Prediction) 2nd find configuration for the side-chains (side-chain conformation)

Required “components” in solving PSP All methods require the definition of a protein model A simplified protein structure model A potential energy function

Simplified structure model of Protein By the above we mean that the protein in question has simpler physical properties then an actual protein This is needed as trying to solve PSP is too complex for real proteins Good simplified models give a good approximation for the actual shape of the protein Determining a good model is a research area by itself

Simplified model of the protein backbone “Actual” model of the protein backbone Example of Simplified Model of Protein 3 dihedral angles Bond angles is the only dihedral angle

Potential Energy Function In thermodynamics, A molecule is most stable when it’s free energy is at a minimum How do we know when a predicted structure is the native shape of the protein ? native shape is at a free energy minimum • The potential energy function is a simplification of actual forces acting on a real protein molecule and it’s formulation is based on the given simplified structural model

Purpose : Minimize Example of Potential Energy Function Hydrophobic Interaction Van der Waals Interaction Etotal= EHH+ Evdw Evdw = Cv · fvdw Van der Waals Potential Summation over all atoms with rij < 8A A = Angstrom = 1 ten-billionth of meter rij = distance between atom i and j Ri = van der waals radii of atom i

Different approaches to PSP Ab Initio Methods Knowledge Based Methods

Ab Initio Methods

Ab Initio means from 1st principles • Use thermodynamic laws to figure out the configuration of the fold of the given protein protein folding problem • Global/semi-global minimization of the function • 1D 2D = secondary structure problem • 2D 3D = side-chain conformation What is Ab Initio ?

Some Ab Initio Methods Molecular Dynamic Simulation Using complex energy functions simulate folding of the primary sequence until it reaches it’s native state (1D->3D) Genetic Algorithm Used in refining a given potential function so that it can best predict the native state of a protein Simulated Annealing Branch and Bound Methods (usually used in side-chain conformation) Approximation algorithms Comparative/Homologue Modeling Threading Docking

Knowledge Based Methods

Knowledge Based Methods Using knowledge of currently known protein folds, predict the shape of the target protein Assumption is the native fold of the target protein is similar to a currently known one i.e in the same family Unable to predict any novel folds, i.e new fold family

Some Knowledge-Based Methods Comparative/Homologue Modeling Threading Docking

Primary Protein Sequence Methodological Framework for solving PSP Knowledge-base, e.g PDB Ab Initio Methods Homologue Modeling Threading Predicted 3D Structure of Protein

Side-Chain Prediction Find a conformation of the all the side chains along the given main chain of a protein Usually done as the 2nd step in predicting the 3D structure of protein Also useful in drug design, where drug structures have to be designed to be easily docked by enzymes for breaking down

Side-Chain Prediction The main chain fold has been computed and given as input choose positions of all side chains so as to minimize some potential energy function Problem if solved Ab Initio is proven to be NP-Complete (reduce Clique to it)

Central Dogma The more tightly packed Side-chains are, the more stable they will be. Ponder & Richards have shown that there are a fixed set of rotations (rotamers) side-chains can take. Most methods now make use of this library of rotamers (abt 67 different rotations) Main concern is the search strategy to find the best conformation

Methods in Side-chain prediction Simulated Annealing A* algorithm Monte Carlo Minimization Molecular Dynamics Simulation Dead End Elimination Genetic Algorithm

Dead End Elimination Deterministic method to determine the global minimum energy conformation (GMEC) of set of side-chains. Continuously eliminate rotamers from consideration in the GMEC, until only 1 rotamer is left in each side-chain position (thus giving final conformation). DEE can be viewed as a mathematical criteria that a rotamer must fulfill in order not to be eliminated

Dead End Elimination Potential function is described in terms of pair-wise interactions of all rotamers at all positions. Therefore energy function to minimized can be formulated as Sum of pairwise interaction energy between rotamer r at position j and rotamer u at positions j energy of the given backbone fold Sum of energy of rotamer r at side chain i

Dead End Elimination Assuming p side-chains and n rotamers for each side-chain Time complexity of finding the configuration that minimizes the energy function takes O(p*np) Not feasible to use the original formulation

Original DEE A rotamer ir can be eliminated from consideration if there is an alternative rotamer it at the same position that satisifies Maximum pairwise interaction energy between itand every other side-chain j Energy resulting from using rotamer r at position i Minimum pairwise interaction energy between irand every other side-chain j Energy resulting from using rotamer t at position i

Original DEE Rotamer r at side-chain i can only be eliminated only if maximum energy conformation using rotamer t at side-chain i is smaller than the minimum energy conformation of using rotamer r Given some relevant energy landscape, the previous inequality in fact says the following

Original DEE A simplistic implementation of Original DEE by simply translating the inequality to code result in a time complexity of O(n2*p2) There is however still a problem if the following happens

Simple Goldstein DEE A rotamer ir can be eliminated from consideration if there is an alternative rotamer it at the same position that satisifies Energy resulting from using rotamer r at position i Energy resulting from using rotamer t at position i Difference in energy of conformation using ir and conformation using itwhich are at the point of closest contact

Simple Goldstein DEE Rotamer r at side-chain i can only be eliminated by both totamer t1 and rotamer t2 since the difference is +ve at the points of closest contact. Meaning for any given conformation using t1 or t2 will result in a smaller overall energy than using r Given some relevant energy landscape, the previous inequality in fact says the following

Simple Goldstein DEE A simplistic implementation of Simple Goldstein DEE by simply translating the inequality to code result in a time complexity of O(n3*p2) There is however still a problem if energy profiles of rotamer r and every other rotamer intersect. More powerful criteria will have to be used General GoldStein DEE, Simple Split DEE and general Split DEE The more powerful the criterion the higher it’s time complexity

Conclusion Myriad of methods to attempt to solve the protein prediction problem Knowledge-based methods have gained a edge over Ab initio methods However not much improvement in the prediction power of modern heuristics, since the 1st experiment by Anfisen 3 decades ago Either problem is too hard / More discovery awaits the adventurous researcher

Protein Structure Prediction