Protein Folding Prediction Lauren M. Yarholar Rufei Lu Warren Yates Armando Diaz Miguel J Bagajewicz, Ph.D. School of Chemical, Biological, and Materials Engineering, College of Engineering, University of Oklahoma
Overview • Introduction • Background • Problem • Energy Forms • Methods • Genetic Algorithm • Results and Discussion • Conclusion • VBA (Visual Basic for Applications) Program Demonstration
Background • A protein is a chain of amino acids connected by peptide bonds, running from the N-terminus to the C-terminus. • Amino acid classes: • Acidic • Basic • Aliphatic • Polar uncharged • Aromatic
Appropriate protein folding is critical for function and health • Proteins catalyze over 1,000 biochemical reactions in the human body.
Appropriate protein folding is critical for function and health • Protein misfolding is responsible for over 20 diseases. • Mad cow disease is caused by an “evil” (misfolded) protein: the “evil” protein and the normal protein have identical primary structures, but their tertiary structures differ. [Figure: normal PrP and diseased PrP]
Difficulties of Predicting Protein Structures • Some proteins fold in as little as a millionth of a second • In theory, a protein of only 100 amino acids that folded by trial and error would take 100 billion years to try out all possible conformations! • Protein structures are highly dependent on environmental parameters such as temperature, pH, and solvent.
Protein Folding Prediction Methods • Comparative - Uses evolutionarily related proteins • Advantages: fast and simple • Disadvantages: conformation depends on environmental parameters • Folding Recognition - Uses a database of known 3-D protein structures • Advantages: more accurate than comparative • Disadvantages: not enough NMR-confirmed protein structures • Ab Initio - Uses both scientific and engineering approaches • Advantages: has the potential to predict the exact shape and intermediate structures • Disadvantages: computing limitations, difficulty in selecting the correct potential energy function
Problems with the Current Prediction Methods • Not enough NMR-confirmed protein structures in the Protein Data Bank (PDB) • Evolutionary relatedness does not necessarily translate to similar structure • Ab initio difficulties • Hydrophilic and hydrophobic modeling gives only the general arrangement of the protein • 2-D modeling does not predict the 3-D shape of the protein • Monte Carlo computing is time-consuming and does not necessarily reach the global minimum
Objectives • Develop a genetic-algorithm-based program to predict protein conformation • Reduce the number of generations needed for prediction, thereby improving the efficiency of the search • Explore additional operators to modify the genetic algorithm • Predict the conformation of a short 5-AA peptide, enkephalin
Energy Calculation • Potential Energy Model • Energy Calculation • Energy VBA program
Potential Energy Models • Electrostatic Energy • Nonbonding Energy • Hydrogen Bonding Energy • Cysteine-Cysteine Loop Energy
Electrostatic Energy • Energy term calculated over atom pairs • Modeled on Coulomb's law • Force between two charges separated by a distance (rij)
Electrostatic Energy [Figure: electrostatic term, energy E (joules) versus separation r (angstroms) for two positive charges]
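As a rough illustration of the electrostatic term, the sketch below evaluates Coulomb's law for atom pairs. The function names, the SI constants, and the `dielectric` parameter are assumptions made for this example; the slides do not show the exact form or units used in the VBA program.

```python
import math

COULOMB_CONSTANT = 8.9875517923e9    # N*m^2/C^2
ELEMENTARY_CHARGE = 1.602176634e-19  # C
ANGSTROM = 1e-10                     # m

def coulomb_energy(q_i, q_j, r_ij, dielectric=1.0):
    """Pair energy in joules; charges in units of e, distance in angstroms."""
    if r_ij <= 0:
        raise ValueError("distance must be positive")
    charge_product = (q_i * ELEMENTARY_CHARGE) * (q_j * ELEMENTARY_CHARGE)
    return COULOMB_CONSTANT * charge_product / (dielectric * r_ij * ANGSTROM)

def total_electrostatic(charges, coords, dielectric=1.0):
    """Sum the pair energy over every distinct atom pair (hypothetical helper)."""
    total = 0.0
    for i in range(len(charges)):
        for j in range(i + 1, len(charges)):
            r_ij = math.dist(coords[i], coords[j])
            total += coulomb_energy(charges[i], charges[j], r_ij, dielectric)
    return total
```

Like charges give a positive (repulsive) energy that falls off as 1/r, while opposite charges give a negative (attractive) one.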
Nonbonding Energy • Two types of Lennard-Jones potential • 1-4 interactions: atoms connected by three bonds • 1-5 and higher interactions: atoms connected by more than three bonds
Nonbonded Energy • Modeled on the Lennard-Jones potential • Repulsive and attractive forces [Figure: forces F and -F between two atoms, shown for 1-4 and 1-5 interactions]
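A minimal sketch of a 6-12 Lennard-Jones term with separate parameter sets for 1-4 and 1-5 (or higher) pairs. The parameter values, the `PARAMETERS` table, and both function names are hypothetical placeholders, not the values used in GAPSS.

```python
def lennard_jones(r, well_depth, r_min):
    """6-12 potential with its minimum of -well_depth at r = r_min."""
    if r <= 0:
        raise ValueError("distance must be positive")
    ratio6 = (r_min / r) ** 6
    return well_depth * (ratio6 * ratio6 - 2.0 * ratio6)

# Hypothetical (well_depth, r_min) pairs; real values depend on the atom types.
PARAMETERS = {
    "1-4": (0.05, 3.4),   # atoms separated by exactly three bonds
    "1-5": (0.10, 3.8),   # atoms separated by more than three bonds
}

def nonbonded_energy(r, bonds_apart):
    """Pick the parameter set from the number of bonds between the two atoms."""
    if bonds_apart < 3:
        raise ValueError("1-2 and 1-3 pairs are not treated as nonbonded here")
    key = "1-4" if bonds_apart == 3 else "1-5"
    well_depth, r_min = PARAMETERS[key]
    return lennard_jones(r, well_depth, r_min)
```

The r^-12 term dominates at short range (repulsion) and the r^-6 term at longer range (attraction), so the energy is positive for very close atoms and slightly negative near `r_min`.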
Hydrogen Bonding • Energy associated with the hydrogen bonding in the protein.
Cysteine-Cysteine Loop Closing • Included if there are one or more intramolecular disulfide bonds
Atom Position Calculation • Backbone Calculation • Side-Chain Calculation • Branch of Side-Chain Calculation
Introduction • A dihedral angle is the rotational angle between the bond joining one pair of adjacent atoms and the bond joining the next pair, measured about the bond that connects them • Phi is the rotation about the N-Cα bond, psi about the Cα-C’ bond, and omega about the C’-N bond
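To make the definition concrete, the sketch below computes the dihedral angle defined by four consecutive atom positions using the standard atan2 formulation. The function and helper names are illustrative and do not come from the original program.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)            # normal of the plane p0, p1, p2
    n2 = _cross(b2, b3)            # normal of the plane p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))
```

For phi, the four points would be C’(i-1), N, Cα, C’; for psi, N, Cα, C’, N(i+1). A result of 0 means the outer atoms are eclipsed (cis) and 180 means they are opposite (trans).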
Backbone Calculation • The first 3 atoms on the peptide chain are fixed • The coordinate system is set arbitrarily, with its origin at the first H atom of the N-terminus • Assumptions: • Minimal bond length stretch • Bond angles stay constant • The torsion angle (dihedral angle) applies from the 4th atom onward • Fixed atoms (x, y, z): H (0, 0, 0), N (-1.04, 0, 0), Cα (-1.52, 1.37, 0) [Figure: coordinate axes x, y, z with bond angle θ and torsion angle ω]
Backbone Calculation • The first 3 Bn parameters are fixed by the previous assumptions; B1, B2, and B3 correspond to the H, N, and Cα atoms
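One common way to place the 4th and later atoms from fixed bond lengths, fixed bond angles, and a variable torsion angle is sketched below. This is a generic internal-to-Cartesian construction, not necessarily the Bn formulation used on the slide; the function name and the example bond length (1.53) and bond angle (110 degrees) are assumptions for illustration.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(v):
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / length, v[1] / length, v[2] / length)

def place_atom(a, b, c, bond_length, bond_angle_deg, torsion_deg):
    """Return atom d bonded to c, given the previous atoms a, b, c.

    bond_angle_deg is the b-c-d angle and torsion_deg the a-b-c-d dihedral.
    Assumes a, b, c are not collinear.
    """
    theta = math.radians(bond_angle_deg)
    phi = math.radians(torsion_deg)
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))   # normal to the a, b, c plane
    m = _cross(n, bc)                   # completes the local frame
    # Coordinates of d in the local (bc, m, n) frame centered on c.
    dx = -bond_length * math.cos(theta)
    dy = bond_length * math.sin(theta) * math.cos(phi)
    dz = bond_length * math.sin(theta) * math.sin(phi)
    return tuple(c[k] + dx * bc[k] + dy * m[k] + dz * n[k] for k in range(3))

# Usage with the three fixed atoms from the slide (H, N, Calpha):
h, n_atom, c_alpha = (0.0, 0.0, 0.0), (-1.04, 0.0, 0.0), (-1.52, 1.37, 0.0)
next_atom = place_atom(h, n_atom, c_alpha, 1.53, 110.0, 180.0)
```

Repeating this step along the chain turns a list of torsion angles into a full set of backbone coordinates, which is what the energy terms need.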
Side and Branch Group Calculation • Fischer projections are used to determine the dihedral angles of side-group atoms • Assumptions: • Tetrahedral structure: 120° apart • Bent structure: 180° apart • With ω1 the dihedral angle of the first atom: ω2 = ω1 + 120° (tetrahedral) or ω2 = ω1 + 180° (bent)
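The offset rule above can be written as a small helper that also wraps the result back into the usual -180 to 180 degree range. The names `wrap_angle`, `OFFSETS`, and `second_dihedral` are assumptions for this sketch.

```python
def wrap_angle(angle):
    """Map any angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

# Offsets between the first and second side-group dihedral, per the slide.
OFFSETS = {"tetrahedral": 120.0, "bent": 180.0}

def second_dihedral(w1, geometry):
    """Dihedral angle of the second atom given the first atom's dihedral w1."""
    return wrap_angle(w1 + OFFSETS[geometry])
```

For example, a tetrahedral center with ω1 = 170° gives ω2 = 290°, which wraps to -70°.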
Genetic Algorithm • GA Search and Optimization • Fitness Function • Genetic Operators • α-helix and β-sheet implementation • Binary GA
Genetic Algorithm • Search and optimization method that mimics natural selection • Terms to define • Chromosome – a set of torsion angles • Gene – an individual torsion angle • Generation – a single pass through the GA search loop • Loops through the reproduction, mutation, and adaptation processes to obtain the best-fit model
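The loop described above can be sketched as follows, with a chromosome stored as a list of torsion angles in degrees. This is a simplified outline under stated assumptions: `toy_energy` is a hypothetical stand-in for the real potential energy, the adaptation step is omitted, and one-point crossover with elitism is used for brevity rather than the GAPSS operators.

```python
import random

def toy_energy(chromosome):
    # Hypothetical stand-in for the potential energy: zero when every angle is -60.
    return sum((angle + 60.0) ** 2 for angle in chromosome)

def run_ga(energy, n_genes, pop_size=20, generations=50,
           mutation_rate=0.2, seed=0):
    """Minimize `energy` over chromosomes of n_genes angles (n_genes >= 2)."""
    rng = random.Random(seed)
    population = [[rng.uniform(-180.0, 180.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):              # one pass = one generation
        population.sort(key=energy)           # lowest energy first
        next_generation = [population[0][:]]  # elitism: keep the best as is
        parents = population[:pop_size // 2]  # better half reproduces
        while len(next_generation) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)   # one-point crossover
            child = mother[:cut] + father[cut:]
            for i in range(n_genes):          # uniform mutation of single genes
                if rng.random() < mutation_rate:
                    child[i] = rng.uniform(-180.0, 180.0)
            next_generation.append(child)
        population = next_generation
    return min(population, key=energy)
```

Because the best chromosome is copied unchanged into each new generation, the best energy found can never get worse from one generation to the next.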
Genetic Algorithm • Uses a computer simulation to perform an intelligent search/optimization for the native protein conformation, the one with the lowest energy [Figure: native conformation]
Genetic Algorithm based Protein Structure Search (GAPSS) • GAPSS was developed in the VBA (Visual Basic for Applications) environment • Modified genetic operators • Fitness-function-based selection • Multiple entries crossover • Non-uniform mutation • Adaptation • Advantages • Faster convergence • User-friendly
Fitness Function • Three primary energy terms: Electrostatic, Nonbonded (6-12), and Hydrogen Bonding • Torsion energy is excluded • Not a real interaction energy • A penalty is introduced for positive torsion angles • The cysteine loop-closing term is included only when more than one cysteine is present in the protein
Genetic Operator - Selection • Selection Operator • Ranked Selection – the higher the rank, the higher the probability of being chosen • Fitness Selection – the better the fitness, the higher the probability of being chosen • Benefits of Selection • Aids the elitism search [Figure: individuals ordered from higher rank or better fitness to lower rank or worse fitness]
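Both selection schemes can be sketched as weighted random draws, assuming that lower energy means better fitness. The linear weighting in `fitness_weights` (including its small floor of 0.01) is an assumption chosen for this example; the slides do not state how GAPSS maps energy to selection probability.

```python
import random

def rank_weights(energies):
    """Weight N for the lowest-energy individual down to 1 for the highest."""
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    weights = [0] * len(energies)
    for rank, index in enumerate(order):
        weights[index] = len(energies) - rank
    return weights

def fitness_weights(energies):
    """Weights proportional to how far each energy lies below the worst one."""
    worst, best = max(energies), min(energies)
    span = (worst - best) or 1.0          # avoid dividing by zero
    return [(worst - e) / span + 0.01 for e in energies]

def select_parents(population, energies, k, method="rank", rng=None):
    """Draw k parents with replacement using the chosen weighting."""
    rng = rng or random.Random()
    if method == "rank":
        weights = rank_weights(energies)
    else:
        weights = fitness_weights(energies)
    return rng.choices(population, weights=weights, k=k)
```

Rank weights depend only on ordering, so one very low-energy outlier cannot dominate the draw; fitness weights preserve the size of the energy differences.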
Genetic Operator - Mutation • Mutation Operator • Uniform Mutation – randomly replace a gene with a value from -180 to 180 • Non-uniform Mutation – add or subtract a random value between 0 and 180 • Effects of Mutation • Introduces variance into the search • Aids the search for the global minimum by moving the gradient search out of local minima
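The two mutation operators, as described on the slide, might look like the sketch below. The per-gene `rate` parameter and the wrapping of results back into [-180, 180) are assumptions added for this example.

```python
import random

def wrap_angle(angle):
    """Map any angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def uniform_mutation(chromosome, rate, rng):
    """Replace each gene, with probability `rate`, by a fresh random angle."""
    return [rng.uniform(-180.0, 180.0) if rng.random() < rate else gene
            for gene in chromosome]

def nonuniform_mutation(chromosome, rate, rng):
    """Shift each gene, with probability `rate`, by a random 0-180 degree step."""
    mutated = []
    for gene in chromosome:
        if rng.random() < rate:
            direction = rng.choice((-1.0, 1.0))      # add or subtract
            gene = wrap_angle(gene + direction * rng.uniform(0.0, 180.0))
        mutated.append(gene)
    return mutated
```

Uniform mutation discards the old value entirely, while the non-uniform version keeps the new angle tied to the old one, so small steps remain possible.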
Genetic Operator - Crossover • Crossover Operator • Random 2-point Crossover – randomly exchange angles between parents, 2 at a time • Multiple Entries Crossover – multiple random exchanges • Benefits of Crossover • Aids the search for elites • Optimizes the search by keeping the optimal folding segments
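A sketch of the two crossover styles on list-based chromosomes. The 2-point version below uses the textbook definition (swap the segment between two random cut points), which may differ from the exact GAPSS variant; `multiple_entries_crossover` and its `n_swaps` parameter are one assumed reading of "multiple random exchange".

```python
import random

def two_point_crossover(parent1, parent2, rng):
    """Swap the segment between two random cut points."""
    i, j = sorted(rng.sample(range(len(parent1) + 1), 2))
    child1 = parent1[:i] + parent2[i:j] + parent1[j:]
    child2 = parent2[:i] + parent1[i:j] + parent2[j:]
    return child1, child2

def multiple_entries_crossover(parent1, parent2, n_swaps, rng):
    """Swap the genes at n_swaps distinct, randomly chosen positions."""
    child1, child2 = list(parent1), list(parent2)
    for position in rng.sample(range(len(parent1)), n_swaps):
        child1[position], child2[position] = child2[position], child1[position]
    return child1, child2
```

In both cases each position still holds one gene from each parent across the two children, so no torsion angle is lost or duplicated.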
Genetic Operator - Adaptation • Adaptation Operator • Gradient search applied to each chromosome • Predicts the energy profile • Benefits of Adaptation • Provides a local-minimum search • Determines the energy profile of the native folding process
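The adaptation step can be illustrated with a simple finite-difference gradient descent on one chromosome. The step-halving rule, the parameter defaults, and the quadratic `toy_energy` are assumptions made for this sketch; the slides do not specify the gradient method used.

```python
def toy_energy(chromosome, targets=(-60.0, -45.0)):
    # Hypothetical smooth energy with a single minimum at the target angles.
    return sum(((a - t) / 10.0) ** 2 for a, t in zip(chromosome, targets))

def adapt(chromosome, energy, learning_rate=10.0, delta=0.01, iterations=200):
    """Refine a chromosome toward a nearby local energy minimum."""
    angles = list(chromosome)
    current = energy(angles)
    for _ in range(iterations):
        gradient = []
        for i in range(len(angles)):
            up, down = angles[:], angles[:]
            up[i] += delta
            down[i] -= delta
            # Central difference estimate of dE/d(angle_i).
            gradient.append((energy(up) - energy(down)) / (2.0 * delta))
        trial = [a - learning_rate * g for a, g in zip(angles, gradient)]
        trial_energy = energy(trial)
        if trial_energy < current:
            angles, current = trial, trial_energy   # accept downhill move
        else:
            learning_rate *= 0.5                    # overshoot: shrink the step
    return angles, current

refined, refined_energy = adapt([-20.0, -75.0], toy_energy)
```

Only downhill moves are accepted, so the sequence of accepted energies also gives a crude energy profile along the path to the local minimum.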
Three GA Approaches • Free GA search – no restriction on dihedral angles, with the exception of omega and ring structures • Advantages: usable in any protein search, an empirical way of obtaining protein conformation, and useful for energy profile search • α-helix and β-sheet specific GA search – randomly selects segments of the protein as α-helices and β-sheets • Advantages: enhances the speed of the free GA and gives an accurate search for α-helices and β-sheets • Binary GA search – uses binary instead of decimal to represent dihedral angles • Advantages: no barrier when doing crossover
α-helix and β-sheet Implementation • Creates α-helices and β-sheets of random lengths at random start positions • Each α-helix or β-sheet created in this way is described by two parameters • Crossover involves trading the two parameters between two individuals
α-helix and β-sheet Implementation • When α-helices are crossed over, each individual’s new energy is compared to its old energy. If there is a net improvement, the crossover is kept. • The “former helix” regions are filled with random torsion angles, as in the normal search [Figure: green region and blue region]
Binary Code Implementation • Convert torsion angles to binary code • Integer and decimal parts are coded separately to shorten the total number of digits: 17 digits altogether • The idea is to represent the torsion angles on a single chromosome as one long continuous chain • Crossover and mutation operators are all similar to those of the standard GA • Example: 101001010100100001010011101011000010101101010010000101001010100100001010010101001010010100101010011100
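One possible 17-bit layout is sketched below: 1 sign bit, 8 bits for the integer part, and 8 bits for two decimal places. This split is an assumption chosen to match the 17-digit total; the slides do not give the actual bit layout used in GAPSS.

```python
def encode_angle(angle):
    """Encode an angle in [-180, 180] as a 17-character bit string."""
    sign = "1" if angle < 0 else "0"
    hundredths = round(abs(angle) * 100)
    whole, fraction = divmod(hundredths, 100)
    return sign + format(whole, "08b") + format(fraction, "08b")

def decode_angle(bits):
    """Invert encode_angle for one 17-bit gene."""
    sign = -1.0 if bits[0] == "1" else 1.0
    whole = int(bits[1:9], 2)
    fraction = int(bits[9:17], 2)
    return sign * (whole + fraction / 100.0)

def encode_chromosome(angles):
    """Join all genes into one continuous bit string."""
    return "".join(encode_angle(a) for a in angles)

def decode_chromosome(bits):
    """Split a bit string into 17-bit genes and decode each one."""
    return [decode_angle(bits[i:i + 17]) for i in range(0, len(bits), 17)]
```

Because the chromosome is one continuous string, a crossover cut can fall anywhere, including inside a gene. With this assumed layout, bit-level crossover or mutation can produce values outside the valid range (for example an integer part above 180), so a real implementation would need to repair or reject such genes.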
Results and Discussion • Individual AA Prediction • Enkephalin Prediction • Performance Analysis • Discussion
Single AA Prediction • All single amino acids were predicted with GAPSS • GA parameters • Initial population: 20 • Generation limit: 15 • Percentage of mutations: 90% • Results were compared to the native single-AA folding
Single AA Prediction • Asparagine (N, Asn) • Alanine (A, Ala) • Aspartic Acid (D, Asp) • Cysteine (C, Cys) • Glutamine (Q, Gln) • Glutamic Acid (E, Glu) • Glycine (G, Gly) • Isoleucine (I, Ile)
Single AA Prediction • Leucine (L, Leu) • Serine (S, Ser) • Methionine (M, Met) • Valine (V, Val) • Threonine (T, Thr)
Enkephalin Prediction • Enkephalin is a pentapeptide that is involved in regulating pain • Two forms of enkephalin • Methionine-enkephalin – Tyr-Gly-Gly-Phe-Met • Leucine-enkephalin – Tyr-Gly-Gly-Phe-Leu • Short enough to confirm the accuracy of GAPSS, yet still contains complex ring side groups
Enkephalin Prediction • Zero-gradient conformations suggest that GAPSS is capable of reaching local minima • Backbone conformations showed strong similarity • Side-group conformations still show discrepancies between predicted and theoretical structures
Local Minimum Conformations • GAPSS was able to locate a few local-minimum protein conformations
Enkephalin Prediction - Backbone • Backbone structure was predicted by GAPSS [Figure: GA-predicted backbone structure and NMR-confirmed backbone structure]
Enkephalin Prediction • Discrepancies between side groups are attributed to the missing entropy and solvation energy terms and to the center partial charge assumption [Figure: GA-predicted backbone structure and NMR-confirmed backbone structure]