Doug Raiford Lesson 19 Protein Conformation Prediction (Part III)
Review: two folding models
• Framework model
  • Secondary structure forms first
  • Secondary structure segments then assemble
• Hydrophobic collapse
  • Molten globule: compact but denatured
  • Secondary structure forms afterward, as the chain settles in
  • van der Waals forces and hydrogen bonds require close proximity
Review: approaches
• Two main approaches: homology based (e.g., threading) and de novo
• Focus of this lesson: de novo
Review
• Took a quick look at threading (homology based)
• Chou-Fasman (frequency of occurrence of amino acids at specific locations in a structure)
• Looked at HMMs (HMMER and the Protein Families database, Pfam)
• Looked at ROSETTA (de novo, knowledge based)

Name            P(a)  P(b)  P(turn)
Alanine          142    83       66
Arginine          98    93       95
Aspartic Acid    101    54      146
Valine           106   170       50
An ab initio example
• Lattice approach
• Abstraction: take a problem of extreme complexity and simplify it
• Levinthal's paradox (Levinthal: physicist; Berkeley, MIT, Columbia)
  • A protein with 100 amino acids => $3^{100}$ possible structures
  • Even if sampling were very fast ($10^{-13}$ seconds per structure)
  • It would take about $1.6 \times 10^{27}$ years to go through all structures
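The Levinthal figure is plain arithmetic, so it can be checked directly. A minimal sketch, assuming (as the slide does) 3 states per residue, 100 residues, and $10^{-13}$ s per sampled structure; the function name `levinthal_search_years` is hypothetical.

```python
# Back-of-the-envelope check of Levinthal's paradox as stated on the slide.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # Julian year


def levinthal_search_years(
    n_residues: int = 100,
    states_per_residue: int = 3,
    seconds_per_structure: float = 1e-13,
) -> float:
    """Years needed to visit every conformation one at a time."""
    n_structures = states_per_residue ** n_residues  # exact integer, ~5.2e47 for 3**100
    return n_structures * seconds_per_structure / SECONDS_PER_YEAR


print(f"{levinthal_search_years():.2e} years")
```

Changing `states_per_residue` or `seconds_per_structure` shows how insensitive the conclusion is: even a million-fold faster sampler still needs around $10^{21}$ years.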
Approach: big picture
• Premise: proteins fold into their lowest-energy conformation
• Reduce complexity by restricting amino acid locations to evenly spaced lattice points
• Generate all possible conformations (within certain constraints)
• The lowest-energy models should be representative of the native structure
Reduce complexity
• Only occupy nodes of a lattice
• Globular
  • Limit the number of nodes to 50
  • Ellipsoidal bounding volume
  • Every node must have at least 2 connecting edges (no dead ends)
• Fewer nodes than amino acids in the sequence (about n/2)
  • The sequence must be aligned to the nodes after the fact
  • From 0 to 3 residues between nodes
Reduce complexity (cont'd)
• Limit sequence length to 100 (n)
• Energy function statistically derived (versus computationally expensive energy calculations)
• Minimal-edge lattice: diamond lattice
• Between $10^5$ and $10^7$ enumerated conformations
How Exhausting is Exhaustive: Time
• "We are able to do exhaustive searches of compact, bounded lattice structures with up to approximately 40 vertices. These searches take on the order of a few hours on a fast workstation, and can easily be executed in parallel over several machines."
Complexity reduction: tetrahedral lattice
• At most 3 choices at each node
• Self-avoiding, therefore much pruning
• Constrained to a small volume (ellipsoid)
• Probably recursive enumeration with self-avoidance
• Filter
  • Symmetry check: remove conformations that differ only in their orientation
• 26 already
• Remember, total of 50
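The slide only guesses at the enumeration ("probably recursive"), so the following is a minimal Python sketch under stated assumptions, not the original program: diamond-lattice sites with bond vectors (±1, ±1, ±1), an axis-aligned ellipsoid centred on the first site, and fixing the first bond as a crude, partial orientation filter. `enumerate_walks` and `inside_ellipsoid` are hypothetical names.

```python
from typing import List, Tuple

Point = Tuple[int, int, int]
Axes = Tuple[float, float, float]

# Tetrahedral bond vectors of the diamond lattice. Sites alternate between two
# sublattices: from an even-indexed site the chain steps by +d, from an
# odd-indexed site by -d.
BOND_DIRS: List[Point] = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]


def inside_ellipsoid(p: Point, axes: Axes) -> bool:
    """True if p lies inside the axis-aligned ellipsoid centred on the origin."""
    return sum((c / a) ** 2 for c, a in zip(p, axes)) <= 1.0


def enumerate_walks(n_sites: int, axes: Axes) -> List[List[Point]]:
    """Recursively enumerate self-avoiding chains of n_sites lattice sites.

    The first bond is fixed to BOND_DIRS[0]; this removes some (not all)
    copies that differ only by orientation.
    """
    if n_sites < 2:
        raise ValueError("need at least two sites")
    start: Point = (0, 0, 0)
    first: Point = BOND_DIRS[0]
    if not (inside_ellipsoid(start, axes) and inside_ellipsoid(first, axes)):
        return []
    path: List[Point] = [start, first]
    occupied = {start, first}
    walks: List[List[Point]] = []

    def extend() -> None:
        if len(path) == n_sites:
            walks.append(list(path))
            return
        sign = 1 if (len(path) - 1) % 2 == 0 else -1
        x, y, z = path[-1]
        for dx, dy, dz in BOND_DIRS:
            nxt = (x + sign * dx, y + sign * dy, z + sign * dz)
            # Self-avoidance rejects the step straight back, leaving at most 3 choices.
            if nxt in occupied or not inside_ellipsoid(nxt, axes):
                continue
            path.append(nxt)
            occupied.add(nxt)
            extend()
            occupied.remove(nxt)
            path.pop()

    extend()
    return walks
```

Because the diamond lattice has no rings shorter than six bonds, every chain of up to six sites is automatically self-avoiding and the count is $3^{n-2}$; from the seventh site on, ring closures start being pruned. A full symmetry filter would also canonicalise the remaining rotations and reflections, which this sketch does not do.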
Given all possible conformations…
• How to align the sequence
  • Remember there are more amino acids than nodes (from 0 to 3 residues between nodes)
• How to score the overall energy of a conformation
• How to judge similarity to the known (native) conformation of the protein
Aligning
• Iterative/dynamic
• Start out with residues evenly spaced over the nodes
• For each node, determine the seven possible residues (the current residue shifted by up to 3 positions either way)
• Choose the lowest-energy residue not already taken
• Rinse and repeat
• Converges in 3 to 5 iterations
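One way to read the alignment loop as code, as a minimal sketch rather than the original algorithm. Assumptions: the energy is supplied by a hypothetical callback `energy(node, residue, assignment)`; "not taken" means not currently assigned to another node; candidates are the current residue shifted by -3..+3; consecutive nodes must keep 0 to 3 residues between them. `align_sequence` is a hypothetical name.

```python
from typing import Callable, List

# Hypothetical callback: energy of placing `residue` on `node` given the current assignment.
EnergyFn = Callable[[int, int, List[int]], float]

MAX_GAP = 3  # from 0 to 3 residues between consecutive nodes


def align_sequence(seq_len: int, n_nodes: int, energy: EnergyFn, max_iters: int = 10) -> List[int]:
    """Return the residue index assigned to each lattice node."""
    if not 2 <= n_nodes <= seq_len:
        raise ValueError("need 2 <= n_nodes <= seq_len")
    # Start evenly spaced along the sequence.
    assign = [i * (seq_len - 1) // (n_nodes - 1) for i in range(n_nodes)]
    for _ in range(max_iters):
        changed = False
        for i in range(n_nodes):
            taken = set(assign[:i] + assign[i + 1:])
            best_r, best_e = assign[i], energy(i, assign[i], assign)
            for shift in range(-MAX_GAP, MAX_GAP + 1):  # seven candidates
                r = assign[i] + shift
                if r in taken or not 0 <= r < seq_len:
                    continue
                # Keep the chain order and 0..3 residues between neighbouring nodes.
                if i > 0 and not 1 <= r - assign[i - 1] <= MAX_GAP + 1:
                    continue
                if i < n_nodes - 1 and not 1 <= assign[i + 1] - r <= MAX_GAP + 1:
                    continue
                e = energy(i, r, assign)
                if e < best_e:
                    best_r, best_e = r, e
            if best_r != assign[i]:
                assign[i] = best_r
                changed = True
        if not changed:  # converged
            break
    return assign
```

With a toy energy that simply prefers a target residue per node, the loop settles in two passes:

```python
targets = [0, 3, 5, 7, 9]
print(align_sequence(10, 5, lambda node, r, a: abs(r - targets[node])))
```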
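Scoring energy
• Energy associated with the m,n contact: a weighted average of 5 pairwise energies involving residues m-1, m, m+1 and n-1, n, n+1
• The m,n pair is given double weight; the rest are given single weight
• Average of all energies (divide by 6)

$$E_{m,n} = \frac{2\,e_{r_m, r_n} + e_{r_{m-1}, r_n} + e_{r_{m+1}, r_n} + e_{r_m, r_{n-1}} + e_{r_m, r_{n+1}}}{6}$$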
Scoring energy
• But where did $e_{r_m, r_n}$ come from?
• Statistically derived
Scoring energy
• Given a database of proteins, the energy of any given combination of two amino acids u, v is derived from the ratio

$$\frac{\text{number of } v\text{'s next to } u\text{'s, across all proteins}}{\text{expected number of } u,v \text{ contacts}}$$

  where the expected count takes into account how "contacty" each protein is
• If the ratio is 1, then across all proteins there are about as many u,v contacts as expected
• If > 1, there are more than expected
• If < 1, there are fewer than expected
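A minimal sketch of the observed/expected ratio. Assumptions not on the slide: each protein is given as a sequence plus a list of contacting residue index pairs; contacts are counted in both orientations; and a protein with $C_p$ contacts and composition fractions $x_u, x_v$ is expected to contribute $2 C_p x_u x_v$ ordered (u, v) contacts. Converting the ratio to an energy with $-\ln$ is a common convention for knowledge-based potentials, included here as an assumption. `contact_ratios` and `pseudo_energy` are hypothetical names.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Protein = Tuple[str, List[Tuple[int, int]]]  # (sequence, contacting residue index pairs)


def contact_ratios(proteins: List[Protein]) -> Dict[Tuple[str, str], float]:
    """Observed / expected counts of ordered (u, v) contacts over a database."""
    observed: Counter = Counter()
    expected: Dict[Tuple[str, str], float] = defaultdict(float)
    for seq, contacts in proteins:
        composition = Counter(seq)
        n = len(seq)
        c = len(contacts)  # how "contacty" this protein is
        for i, j in contacts:
            u, v = seq[i], seq[j]
            observed[(u, v)] += 1
            observed[(v, u)] += 1
        for u in composition:
            for v in composition:
                expected[(u, v)] += 2 * c * (composition[u] / n) * (composition[v] / n)
    return {pair: observed[pair] / exp for pair, exp in expected.items() if exp > 0}


def pseudo_energy(ratio: float) -> float:
    """Assumed conversion: enriched pairs (ratio > 1) get negative energy."""
    return -math.log(ratio) if ratio > 0 else math.inf


ratios = contact_ratios([("AAVV", [(0, 3), (1, 2)])])
print(ratios[("A", "V")], ratios[("A", "A")])
```

In the toy protein both contacts are A-V, so A-V pairs are twice as common as composition alone predicts and A-A pairs never occur.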
Discrete state off-lattice models
• Instead of limiting residues to regularly spaced lattice nodes in space…
• Limit the backbone dihedral angles phi (φ) and psi (ψ) to a reduced set of discrete values
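To make the discrete-state idea concrete, here is a minimal sketch that enumerates every assignment of one (φ, ψ) state per residue. The three-state set is a hypothetical illustration using approximate textbook values for the right-handed α-helix, antiparallel β-strand, and left-handed helix regions; real discrete-state models choose their own sets. `enumerate_backbones` and `conformation_count` are hypothetical names.

```python
import itertools
from typing import Dict, Iterator, List, Tuple

Angles = Tuple[float, float]  # (phi, psi) in degrees

# Hypothetical discrete state set, approximate values only.
STATES: Dict[str, Angles] = {
    "alpha": (-57.0, -47.0),
    "beta": (-139.0, 135.0),
    "alpha_L": (57.0, 47.0),
}


def enumerate_backbones(n_residues: int, states: Dict[str, Angles] = STATES) -> Iterator[List[Angles]]:
    """Yield every assignment of one discrete (phi, psi) state per residue."""
    for labels in itertools.product(states, repeat=n_residues):
        yield [states[label] for label in labels]


def conformation_count(n_residues: int, n_states: int) -> int:
    """Size of the discrete search space: n_states ** n_residues."""
    return n_states ** n_residues


print(sum(1 for _ in enumerate_backbones(3)), conformation_count(100, len(STATES)))
```

The count shows the trade-off: discretising angles makes the space finite, but with a few states per residue it still grows exponentially, which is why these models rely on energy-guided search rather than full enumeration for realistic lengths.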
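Scoring energy
• Off-lattice models often attempt to minimize total energy
• G: free energy, H: enthalpy, S: entropy

$$\Delta E = q - w \qquad S = k \ln \Omega \qquad \Delta H = \Delta E + \Delta(PV) \qquad \Delta G = \Delta H - T\,\Delta S$$

$$\Delta G = \Delta G_{\text{van der Waals}} + \Delta G_{\text{H-bonds}} + \Delta G_{\text{solvent}} + \Delta G_{\text{Coulomb}}$$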
Scoring accuracy
• Backbone RMSD (root mean square deviation)
• Usually choose the top 100 or so predictions and show that the actual (native) structure resides in the set
[Figure: list of the top 100 conformations, with the actual structure highlighted among them]
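A minimal sketch of backbone RMSD and the "is the native in the top N?" check. It assumes the predicted and native coordinates list the same backbone atoms in the same order and have already been optimally superimposed (for example with the Kabsch algorithm, which is not shown). The RMSD cutoff is left to the caller because the slide does not give one. `rmsd` and `native_in_top_k` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root mean square deviation between matched, pre-superimposed atoms."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    sq = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(a, b)
    )
    return math.sqrt(sq / len(a))


def native_in_top_k(
    predictions: List[Tuple[float, Sequence[Coord]]],
    native: Sequence[Coord],
    k: int,
    cutoff: float,
) -> Tuple[bool, float]:
    """Rank (energy, coords) predictions by energy; report the best RMSD among the top k."""
    ranked = sorted(predictions, key=lambda p: p[0])[:k]
    best = min(rmsd(coords, native) for _, coords in ranked)
    return best <= cutoff, best
```

Reporting the best RMSD within the top k, rather than the RMSD of the single lowest-energy model, matches the slide's point that a coarse energy function may not rank the native-like model first.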
PDB files
Columns: record type, atom serial number, atom name, residue name, chain ID, residue number, X, Y, Z (Å), occupancy (Occu), temperature factor (Temp), element
ATOM      1  N   THR A   5      23.200  72.500  13.648  1.00 51.07           N
ATOM      2  CA  THR A   5      23.930  72.550  12.350  1.00 51.27           C
ATOM      3  C   THR A   5      23.034  72.048  11.220  1.00 50.34           C
ATOM      4  O   THR A   5      22.819  72.747  10.228  1.00 51.19           O
ATOM      5  CB  THR A   5      25.221  71.703  12.416  1.00 51.94           C
ATOM      6  OG1 THR A   5      26.159  72.326  13.305  1.00 53.51           O
ATOM      7  CG2 THR A   5      25.849  71.583  11.046  1.00 53.33           C
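ATOM records are fixed-width, so fields should be sliced by column rather than split on whitespace (atom names and chain IDs can run together). A minimal sketch following the column layout of the PDB format's ATOM record, which also keeps only the backbone atoms (N, CA, C, O) used for backbone RMSD. `Atom`, `parse_atom_line`, and `backbone_atoms` are hypothetical names; it does not handle alternate locations, HETATM records, or multiple models.

```python
from typing import Iterable, List, NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float
    element: str


BACKBONE = {"N", "CA", "C", "O"}


def parse_atom_line(line: str) -> Atom:
    """Slice one ATOM record by its fixed columns (0-based Python slices)."""
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        temp_factor=float(line[60:66]),
        element=line[76:78].strip(),
    )


def backbone_atoms(lines: Iterable[str]) -> List[Atom]:
    """Keep ATOM records whose atom name is a backbone atom."""
    return [
        parse_atom_line(line)
        for line in lines
        if line.startswith("ATOM") and line[12:16].strip() in BACKBONE
    ]
```

Applied to the Thr 5 records above, `backbone_atoms` returns N, CA, C and O and drops the side-chain atoms CB, OG1 and CG2.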
Algorithm
Chou-Fasman parameters: P(a), P(b), P(turn) are the α-helix, β-sheet, and turn propensities (×100); f(i) to f(i+3) are the bend frequencies at the four positions of a turn.

Name            P(a)  P(b)  P(turn)   f(i)  f(i+1) f(i+2) f(i+3)
Alanine          142    83       66  0.060  0.076  0.035  0.058
Arginine          98    93       95  0.070  0.106  0.099  0.085
Aspartic Acid    101    54      146  0.147  0.110  0.179  0.081
Asparagine        67    89      156  0.161  0.083  0.191  0.091
Cysteine          70   119      119  0.149  0.050  0.117  0.128
Glutamic Acid    151    37       74  0.056  0.060  0.077  0.064
Glutamine        111   110       98  0.074  0.098  0.037  0.098
Glycine           57    75      156  0.102  0.085  0.190  0.152
Histidine        100    87       95  0.140  0.047  0.093  0.054
Isoleucine       108   160       47  0.043  0.034  0.013  0.056
Leucine          121   130       59  0.061  0.025  0.036  0.070
Lysine           114    74      101  0.055  0.115  0.072  0.095
Methionine       145   105       60  0.068  0.082  0.014  0.055
Phenylalanine    113   138       60  0.059  0.041  0.065  0.065
Proline           57    55      152  0.102  0.301  0.034  0.068
Serine            77    75      143  0.120  0.139  0.125  0.106
Threonine         83   119       96  0.086  0.108  0.065  0.079
Tryptophan       108   137       96  0.077  0.013  0.064  0.167
Tyrosine          69   147      114  0.082  0.065  0.114  0.125
Valine           106   170       50  0.062  0.048  0.028  0.053
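The table is usually applied with two scans, sketched below in Python. The turn rule (bend probability $p_t = f(j)\,f(j+1)\,f(j+2)\,f(j+3) > 7.5 \times 10^{-5}$, with mean P(turn) above 100 and above the mean P(a) and P(b)) comes from the standard Chou-Fasman method rather than from this slide. The helix rule is simplified: a nucleus is any 6-residue window with at least 4 residues whose P(a) exceeds 100. `turn_candidates` and `helix_nuclei` are hypothetical names, and helix/sheet extension is not shown.

```python
from typing import Dict, List, Tuple

# One-letter code -> (P(a), P(b), P(turn), f(i), f(i+1), f(i+2), f(i+3)), from the table above.
CF: Dict[str, Tuple[float, ...]] = {
    "A": (142, 83, 66, 0.060, 0.076, 0.035, 0.058),
    "R": (98, 93, 95, 0.070, 0.106, 0.099, 0.085),
    "D": (101, 54, 146, 0.147, 0.110, 0.179, 0.081),
    "N": (67, 89, 156, 0.161, 0.083, 0.191, 0.091),
    "C": (70, 119, 119, 0.149, 0.050, 0.117, 0.128),
    "E": (151, 37, 74, 0.056, 0.060, 0.077, 0.064),
    "Q": (111, 110, 98, 0.074, 0.098, 0.037, 0.098),
    "G": (57, 75, 156, 0.102, 0.085, 0.190, 0.152),
    "H": (100, 87, 95, 0.140, 0.047, 0.093, 0.054),
    "I": (108, 160, 47, 0.043, 0.034, 0.013, 0.056),
    "L": (121, 130, 59, 0.061, 0.025, 0.036, 0.070),
    "K": (114, 74, 101, 0.055, 0.115, 0.072, 0.095),
    "M": (145, 105, 60, 0.068, 0.082, 0.014, 0.055),
    "F": (113, 138, 60, 0.059, 0.041, 0.065, 0.065),
    "P": (57, 55, 152, 0.102, 0.301, 0.034, 0.068),
    "S": (77, 75, 143, 0.120, 0.139, 0.125, 0.106),
    "T": (83, 119, 96, 0.086, 0.108, 0.065, 0.079),
    "W": (108, 137, 96, 0.077, 0.013, 0.064, 0.167),
    "Y": (69, 147, 114, 0.082, 0.065, 0.114, 0.125),
    "V": (106, 170, 50, 0.062, 0.048, 0.028, 0.053),
}


def turn_candidates(seq: str, threshold: float = 7.5e-5) -> List[int]:
    """Start indices of 4-residue windows predicted as turns."""
    hits = []
    for j in range(len(seq) - 3):
        rows = [CF[aa] for aa in seq[j:j + 4]]
        # Bend probability: f(i) of residue j, f(i+1) of residue j+1, and so on.
        p_t = rows[0][3] * rows[1][4] * rows[2][5] * rows[3][6]
        avg_a = sum(r[0] for r in rows) / 4
        avg_b = sum(r[1] for r in rows) / 4
        avg_t = sum(r[2] for r in rows) / 4
        if p_t > threshold and avg_t > 100 and avg_t > avg_a and avg_t > avg_b:
            hits.append(j)
    return hits


def helix_nuclei(seq: str, window: int = 6, min_formers: int = 4) -> List[int]:
    """Start indices of windows with enough helix formers (simplified: P(a) > 100)."""
    return [
        j
        for j in range(len(seq) - window + 1)
        if sum(CF[aa][0] > 100 for aa in seq[j:j + window]) >= min_formers
    ]
```

For example, `turn_candidates("NPDG")` flags the window at index 0 (Asn-Pro-Asp-Gly combines high turn propensities with high positional bend frequencies), while a run of valines produces no turns.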