PhD Lecture
The Biological Physics of Protein Folding: the Random Energy Model and Beyond
Péter HANTZ
Supervisor: Professor Michel DROZ, University of Geneva, Department of Theoretical Physics
Geneva, 4 May 2006
Outline of the lecture:
1. Modeling disordered systems
• Spin glasses, frustration, Random Energy Model (REM)
2. Proteins: Building Elements and Structure
• Primary, Secondary and Tertiary Structure, Classification
3. The Problem of Protein Folding
• Anfinsen Experiment, Levinthal Paradox
• A microscopic model: random heteropolymer (RHP)
• Phenomenological models: Gō, REM
• Sequence Design, Minimal Frustration
• Kinetics: Funnel Hypothesis, Nucleation, Reaction Coordinate
Spin Glasses
What is a spin glass?
• interacting system of spins
• low temperature: frozen in random orientations
What is necessary for this?
• (at least partially) random interactions
• competing interactions
Simple model Hamiltonians:
• Sherrington-Kirkpatrick model
• p-spin model
• distribution of the coupling constants
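For reference, the standard textbook forms of the two model Hamiltonians and of the coupling distribution are sketched below. The external field term $H$, the mean $J_0/N$ and the variance $J^2/N$ follow the usual SK convention and are assumptions here, not necessarily the normalization used on the slide.

```latex
% Sherrington-Kirkpatrick model: all pairs of Ising spins S_i = \pm 1 interact
H_{\mathrm{SK}} = -\sum_{i<j} J_{ij}\, S_i S_j \;-\; H \sum_i S_i

% p-spin model: couplings act on groups of p spins
H_p = -\sum_{i_1 < \dots < i_p} J_{i_1 \dots i_p}\, S_{i_1} \cdots S_{i_p}

% Gaussian couplings (assumed normalization: mean J_0/N, variance J^2/N)
P(J_{ij}) = \sqrt{\frac{N}{2\pi J^2}}\;
            \exp\!\left[-\frac{N\,(J_{ij} - J_0/N)^2}{2J^2}\right]
```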
Frustration…
• no configuration is uniquely favoured by all of the interactions
• “fully frustrated” systems: hypercube/hypertetrahedron where $J_{ij} = \pm 1$, and …
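A minimal sketch of frustration, assuming the smallest possible example: three Ising spins on a triangle with antiferromagnetic couplings $J_{ij} = -1$ (this toy system is an illustration, not taken from the slide). Enumerating all eight configurations shows that at least one bond is always unsatisfied and that the ground state is degenerate.

```python
from itertools import product

def bond_energy(spins, couplings):
    """Ising energy H = -sum_{i<j} J_ij * S_i * S_j over the listed bonds."""
    return -sum(J * spins[i] * spins[j] for (i, j), J in couplings.items())

def count_unsatisfied(spins, couplings):
    """A bond is satisfied when J_ij * S_i * S_j > 0; count the others."""
    return sum(1 for (i, j), J in couplings.items() if J * spins[i] * spins[j] < 0)

# Hypothetical toy system: antiferromagnetic triangle (product of couplings < 0)
triangle = {(0, 1): -1, (1, 2): -1, (0, 2): -1}

configs = list(product((-1, 1), repeat=3))
energies = [bond_energy(s, triangle) for s in configs]
ground_energy = min(energies)
ground_states = [s for s, e in zip(configs, energies) if e == ground_energy]
min_unsatisfied = min(count_unsatisfied(s, triangle) for s in configs)

print("ground-state energy:", ground_energy)
print("number of ground states:", len(ground_states))
print("minimum number of unsatisfied bonds:", min_unsatisfied)
```

No configuration satisfies all three bonds, and six of the eight configurations share the lowest energy.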
And its consequences…
• rugged energy landscape
Figure: “barrier tree” of a p-spin model, p=3, N=7 (Fontanari, 2001)
($F = E - TS$ ⇒ when calculating the entropy, restrict to the valleys)
And its consequences…
• high degree of ground-state degeneracy in several models: three very different configurations can have the same ground-state energy (Plischke, 1994)
And its consequences…
• great relevance of broken ergodicity (Palmer, 1983)
- pure systems: in the mean-field theory of ferromagnets the time average differs from the Gibbs average: $\langle S_i \rangle_t = \pm m$, $\langle S_i \rangle_G = 0$
- spin glasses: in the limit of large N, the state space becomes partitioned into mutually inaccessible “valleys” (Fischer, 1993)
Averages in disordered systems
• quenched average (of the free energy)
- taken “over the realizations of the disorder”
- the randomness of the system, $J_{ij}$, is fixed (time-scale problem)
Note: averaging the logarithm is difficult.
• annealed average (of the free energy)
- both the spins and the randomness $J_{ij}$ are thermodynamic variables
Essential difference (case of a protein sequence):
• quenched: SUM of the free energies of the various sequences
• annealed: SUM of the EXPs (Boltzmann factors) of the sequences
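The two averages can be written compactly as follows. This is a sketch in standard notation; the symbols $Z_J$ for the partition function of one realization and $\langle \cdot \rangle_J$ for the average over the disorder are introduced here for illustration.

```latex
Z_J = \sum_{\{S\}} e^{-H_J(\{S\})/kT}

% quenched: average the logarithm (free energies of the realizations are summed)
F_{\mathrm{q}} = -kT\, \langle \ln Z_J \rangle_J

% annealed: average the partition function first (Boltzmann factors are summed)
F_{\mathrm{a}} = -kT\, \ln \langle Z_J \rangle_J

% since ln is concave, Jensen's inequality gives
\langle \ln Z_J \rangle_J \le \ln \langle Z_J \rangle_J
\quad\Longrightarrow\quad F_{\mathrm{a}} \le F_{\mathrm{q}}
```

The annealed free energy is therefore a lower bound on the quenched one.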
Averages in disordered systems
• self-averaging quantities
- extensive quantities: a macroscopic system can be divided into many subsystems, each sampling the disorder independently
• Z is not self-averaging (e.g. one sample with low free energy could dominate the sum)
The Random Energy Model (REM)
• the total energy E of a system = sum of independent contributions
• central limit theorem ⇒ Gaussian distribution of the energies
A particular set $\{E(\{J\})_1, E(\{J\})_2, \dots, E(\{J\})_\Omega\}$ represents the energy levels of one particular realization {J} of the modeled system
• the energies $E(\{J\})_i$ of the different microstates of a realization are statistically independent
• number of microstates: $\Omega$ (e.g. $\Omega = 2^N$ in the case of N Ising spins)
Properties of the REM
• average density of states (average over the realizations of the disorder)
Figure: spectra of two realizations (1) and (2) (e.g. {J}, polymer chains) (Pande et al., 1997)
• below an average threshold energy $E_C$ the average density no longer describes a single realization, since the density n(E) is self-averaging only in the middle region of P(E)
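A minimal numerical sketch of one REM realization, assuming $\Omega = 2^N$ independent Gaussian levels with zero mean and variance $N\sigma^2$ (this normalization and the names `rem_spectrum` and `threshold_energy` are assumptions made for illustration). With that choice the average density $\langle n(E) \rangle \approx 2^N P(E)$ drops to about one state at $E_C = -N\sigma\sqrt{2\ln 2}$, and the lowest sampled level should sit close to that threshold.

```python
import math
import random

def rem_spectrum(n_spins, sigma, rng):
    """One realization: 2**N independent Gaussian energies, variance N * sigma**2."""
    std = sigma * math.sqrt(n_spins)
    return [rng.gauss(0.0, std) for _ in range(2 ** n_spins)]

def threshold_energy(n_spins, sigma):
    """E_C where 2**N * exp(-E**2 / (2 N sigma**2)) falls to one state."""
    return -n_spins * sigma * math.sqrt(2.0 * math.log(2.0))

N_SPINS, SIGMA = 14, 1.0          # illustrative values
rng = random.Random(0)            # fixed seed: one particular realization {J}

levels = rem_spectrum(N_SPINS, SIGMA, rng)
e_c = threshold_energy(N_SPINS, SIGMA)
e_min = min(levels)
n_below = sum(1 for e in levels if e < e_c)

print(f"threshold energy E_C = {e_c:.2f}")
print(f"lowest level E_min   = {e_min:.2f}")
print(f"levels below E_C     = {n_below}")
```

For finite N the lowest level typically lies somewhat above $E_C$, and the region below $E_C$ is empty or nearly empty, which is why the density of states is not self-averaging in the tails.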
Properties of the REM
• entropy
The entropy cannot be negative. If $E < E_C$, then $S(E) = 0$ and the system is “frozen”.
• critical temperature $T_C$: the temperature where $S = 0$ (but $s = S/N$ not necessarily 0)
Properties of the REM
• free energy
If $T > T_C$, the free energy follows from $F = E - TS$. However, if $T < T_C$, then $S = 0$ and the free energy equals the frozen energy.
• partition function
If n(E) is self-averaging, Z does not depend on the disorder.
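The REM thermodynamics summarized above can be reconstructed in a few lines. This is a sketch that assumes a Gaussian level distribution with zero mean and variance $N\sigma^2$, and $\Omega = 2^N$ microstates; the slides may use a different normalization, in which case the prefactors change but not the structure.

```latex
% average density of states
\langle n(E) \rangle = \Omega\, P(E) \simeq \exp\!\left[N \ln 2 - \frac{E^2}{2N\sigma^2}\right]

% threshold energy: \langle n(E_C) \rangle = 1
E_C = -N\sigma\sqrt{2\ln 2}

% entropy for E_C < E < 0 (and S = 0 below E_C)
S(E) = k \ln \langle n(E) \rangle = k\left[N\ln 2 - \frac{E^2}{2N\sigma^2}\right]

% temperature: 1/T = dS/dE
\frac{1}{T} = -\frac{kE}{N\sigma^2}
\quad\Longrightarrow\quad E(T) = -\frac{N\sigma^2}{kT}

% critical temperature: E(T_C) = E_C
kT_C = \frac{\sigma}{\sqrt{2\ln 2}}

% free energy F = E - TS
F(T) =
\begin{cases}
 -NkT\ln 2 - \dfrac{N\sigma^2}{2kT}, & T > T_C \\[2mm]
 E_C = -N\sigma\sqrt{2\ln 2}, & T < T_C
\end{cases}
```

Both branches of $F$ agree at $T = T_C$, so the free energy is continuous at the freezing transition.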
Digression: Order parameters
• distinguishing between the high-temperature paramagnetic and the low-temperature frozen states (Edwards and Anderson, 1975)
• some other important quantities:
- statistical-mechanical order parameter
- degree of broken ergodicity
• “similarity” between states (e.g. phases) of the system; +1: full similarity
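One common way to quantify the “similarity” between two states is the overlap $q_{ab} = \frac{1}{N}\sum_i S_i^a S_i^b$. The sketch below assumes Ising spins $\pm 1$ and uses this standard definition, which may differ in detail from the quantity shown on the slide.

```python
def overlap(state_a, state_b):
    """Overlap q_ab = (1/N) * sum_i S_i^a * S_i^b of two Ising configurations."""
    if len(state_a) != len(state_b):
        raise ValueError("configurations must have the same number of spins")
    return sum(a * b for a, b in zip(state_a, state_b)) / len(state_a)

reference = [1, -1, 1, 1, -1, -1]
flipped = [-s for s in reference]

print(overlap(reference, reference))             # identical states
print(overlap(reference, flipped))               # globally reversed state
print(overlap([1, 1, -1, -1], [1, -1, 1, -1]))   # uncorrelated toy pair
```

The overlap is +1 for identical states, -1 for globally reversed ones, and close to 0 for unrelated configurations.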
Digression: Phase diagram of the SK model (T, $J_0$, H)
• Replica trick to perform the quenched averaging of F
By simplifying this expression, introducing the new variables $q_{rs}$, and performing a saddle-point analysis, we arrive at the phase diagram (Binder, 1986).
Spin glass phase: $q \neq 0$, $\langle S_i \rangle = 0$
(Sherrington and Kirkpatrick, 1978; H-T plane: de Almeida and Thouless, 1978)
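The replica trick rests on a simple identity for the logarithm, sketched here in standard notation. The symbols $Z_J$, $\langle \cdot \rangle_J$ and the replica indices $r, s$ are introduced for illustration, and the identification of $q_{rs}$ with the replica overlap is the usual one, assumed rather than taken from the slide.

```latex
% identity behind the trick
\ln Z = \lim_{n \to 0} \frac{Z^n - 1}{n}

% quenched free energy from the moments of Z
\langle F \rangle_J = -kT\, \langle \ln Z_J \rangle_J
                    = -kT \lim_{n \to 0} \frac{\langle Z_J^n \rangle_J - 1}{n}

% for integer n, Z^n is the partition function of n identical replicas
Z_J^n = \sum_{\{S^1\}} \cdots \sum_{\{S^n\}}
        \exp\!\left[-\frac{1}{kT} \sum_{r=1}^{n} H_J(\{S^r\})\right]

% averaging over J couples the replicas through their overlaps
q_{rs} = \frac{1}{N} \sum_i S_i^r S_i^s , \qquad r \neq s
```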
Protein Synthesis
Transcription: DNA (A, G, C, T) → pre-mRNA → splicing → mRNA (A, G, C, U)
Translation: ribosomes, tRNA
• genetic code (degenerate!)
• initiation: usually Met (AUG)
• stop: UAA, UAG, UGA
Folding: with or without chaperones
Covalent modifications: disulfide bonds, proteolytic modifications, glycosylation…
Figure: a chaperone
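The transcription and translation steps can be sketched in a few lines, assuming the standard genetic code and ignoring splicing and all regulation. The function names `transcribe` and `translate` are hypothetical helpers, and the input is taken to be the coding strand of the DNA.

```python
from collections import Counter

BASES = "UCAG"
# Standard genetic code in UCAG order, one-letter amino acid codes, '*' = stop
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def transcribe(coding_strand):
    """DNA coding strand -> mRNA (T replaced by U); splicing is ignored."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna):
    """Read codons from the first AUG until a stop codon (UAA, UAG, UGA)."""
    start = mrna.find("AUG")
    if start < 0:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# Degeneracy: number of codons per amino acid (or stop signal)
degeneracy = Counter(AMINO)

print(translate(transcribe("ATGGCCTGGTAA")))
print("codons for Leu:", degeneracy["L"], "| for Met:", degeneracy["M"])
```

The codon counts make the degeneracy explicit: 64 codons encode only 20 amino acids plus the stop signal.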
Protein Structure
Primary structure: the amino acid sequence
Figure: Ramachandran plot
Protein Structure
Secondary structure: common regular local structures
• α-helix
• β-sheet
Right-handed helices are more common than left-handed ones.
Protein Structure and Classification
• Tertiary structure: overall three-dimensional structure of a protein molecule
• motifs = common “blocks”, domains = independently folding regions
• Classification:
- globular proteins: lysozyme, heat shock protein
- fibrous proteins: collagen
- natively unfolded proteins: substantial regions of disordered structure; usually have a target ligand; disorder-order transition when binding
Protein Structure
Quaternary structure: the arrangement of several polymer molecules (chains) in one structure
Protein Folding
Interactions stabilizing the proteins
• hydrophobic effect: entropic origin
• hydrogen bonds: polar molecules
• van der Waals interactions: induced dipoles
• Coulomb interactions
• in some proteins, disulfide bonds
$kT = 4 \times 10^{-21}$ J ≈ 0.025 eV
Anfinsen’s experiment
Denaturation of the ribonuclease enzyme; after restoring the original conditions, the enzyme STARTED TO WORK AGAIN
• gentle heating / chemical treatment (urea, mercaptoethanol) → denaturation
• restoring the original conditions → spontaneous refolding (time scale: seconds)
⇒ Building of the 3D structure is SPONTANEOUS (in many cases)
Levinthal’s paradox
• Anfinsen: there is a native state (F = minimum)
• small protein, N = 100 amino acids
• assume 3 rotamers/monomer
Total number of structures: $3^{100} \approx 5 \times 10^{47}$
• one microstate visited in $10^{-13}$ s
Time necessary for finding the native state: $\approx 5 \times 10^{34}$ s, i.e. of the order of $10^{27}$ years
Thermodynamic + kinetic problem
Solution: biasing towards the native state is necessary
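The Levinthal estimate is plain arithmetic and can be checked directly from the numbers on the slide (3 rotamers per monomer, 100 monomers, $10^{-13}$ s per microstate). The constant names below are illustrative.

```python
ROTAMERS_PER_MONOMER = 3
N_MONOMERS = 100
TIME_PER_STATE_S = 1e-13                    # one microstate visited in 1e-13 s
SECONDS_PER_YEAR = 365.25 * 24 * 3600

n_states = ROTAMERS_PER_MONOMER ** N_MONOMERS       # exhaustive search space
search_time_s = n_states * TIME_PER_STATE_S
search_time_years = search_time_s / SECONDS_PER_YEAR

print(f"number of structures: {n_states:.2e}")
print(f"search time: {search_time_s:.2e} s = {search_time_years:.2e} years")
```

The resulting time exceeds the age of the universe by many orders of magnitude, while real proteins fold in seconds or less.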
Microscopic Models
A typically used Hamiltonian:
• $a_I$: monomer species 1...20 (I: index along the chain)
• N: number of monomers
• $\mathbf{r}_I$: position of monomer I
• Δ: interaction range function (usual lattice models: 1 for nearest neighbours, 0 otherwise)
• $\varepsilon(a_I, a_J)$: interactions between amino acids I and J (N×N)
• $\varepsilon_{ij}$: amino acid interaction matrix (20×20)
Including hydrophobicity:
- the 21st species is water, occupying the “empty” sites
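With the symbols defined above, the contact Hamiltonian presumably takes the usual heteropolymer form sketched below. The exact expression on the slide is not reproduced here, so this should be read as the standard form consistent with the listed definitions.

```latex
% pairwise contact energy of a chain of N monomers
H = \sum_{I<J}^{N} \varepsilon(a_I, a_J)\, \Delta(\mathbf{r}_I - \mathbf{r}_J)

% lattice version of the interaction range function
\Delta(\mathbf{r}_I - \mathbf{r}_J) =
\begin{cases}
 1, & \text{if } I \text{ and } J \text{ are nearest neighbours on the lattice} \\
 0, & \text{otherwise}
\end{cases}
```

Contacts between monomers bonded along the chain contribute the same constant to every conformation and are often left out of the sum.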
Digression: the Gō model
• assumption: we know the folded, native conformation
• this conformation is energetically very well optimized
• energy: function of the native contacts
$\varepsilon_{IJ} = -w$ if I and J are first neighbours in the native state
$\varepsilon_{IJ} = 0$ otherwise
η: the number of native contacts
Does it “use the answer to answer the question”? This model does not help structure prediction, but it is helpful when studying how the protein reaches its native state.
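A minimal sketch of the Gō energy on a two-dimensional square lattice, assuming $E = -w\,\eta$ with η the number of native contacts present in a conformation. The 9-monomer native structure, the helper names and the choice to skip bonded neighbours are assumptions made for illustration.

```python
def contacts(coords):
    """Non-bonded nearest-neighbour pairs (I, J), I < J, of a lattice conformation."""
    pairs = set()
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):      # skip monomers bonded along the chain
            dx = abs(coords[i][0] - coords[j][0])
            dy = abs(coords[i][1] - coords[j][1])
            if dx + dy == 1:
                pairs.add((i, j))
    return pairs

def go_energy(coords, native_contacts, w=1.0):
    """Go energy E = -w * eta, where eta counts the native contacts that are formed."""
    eta = len(contacts(coords) & native_contacts)
    return -w * eta

# Hypothetical native state: a compact 3x3 "snake"
native = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1), (0, 2), (1, 2), (2, 2)]
native_contacts = contacts(native)

extended = [(i, 0) for i in range(9)]
partly_folded = native[:6] + [(-1, 1), (-2, 1), (-3, 1)]

print("native:       ", go_energy(native, native_contacts))
print("partly folded:", go_energy(partly_folded, native_contacts))
print("extended:     ", go_energy(extended, native_contacts))
```

By construction the native state has the lowest energy, and the energy decreases monotonically with the number of native contacts, which is why the model is useful for studying the folding pathway rather than for predicting the structure.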
Energy spectrum of random heteropolymers
The energy spectrum (400 lowest states) resembles that of the REM (Sali et al., 1994).
Indeed, O(N) ≈ independent terms ⇒ Central Limit Theorem ⇒ Gaussian distribution
Only some sequences would fold repeatedly to the same state.
KEY: a single low-lying ground-state energy
Essential: the ground state
• threshold energy of the REM
• extreme value statistics: Gumbel distribution
• it can be shown: width of the energy gap
Problems with the REM (thermodynamics):
• no flexibility against changing conditions
• no mutation stability: when matrix elements are changed by ±b, the energy levels change as well (ΔE is not large enough for a unique native state: freezing, escape)
• there must be some correlation between the energy levels…
A Way Out: Sequence Design
“Pulling down” the energy of a target conformation
Canonical design
• given a 3D conformation C*
• search for the sequence of amino acids that minimizes E for the given C*
Algorithm: the sequence is annealed
Movement in the sequence space: Metropolis MC method
What about $T_{des}$?
• too high: random walk
• too low: can be useless
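The design procedure can be sketched as a Metropolis walk in sequence space at fixed target conformation. Everything specific below is an assumption for illustration: a two-letter (H/P) alphabet instead of 20 amino acids, a toy contact matrix, swap moves that keep the composition fixed, and a 9-monomer target given only by its contact list.

```python
import math
import random

# Hypothetical two-letter contact energies; real designs use a 20x20 matrix
EPS = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "H"): 0.0, ("P", "P"): 0.0}

def energy(seq, contacts):
    """E(sequence, C*): sum of eps(a_I, a_J) over the contacts (I, J) of the target."""
    return sum(EPS[(seq[i], seq[j])] for i, j in contacts)

def design(seq, contacts, t_des, steps, rng):
    """Anneal the sequence at temperature t_des with Metropolis swap moves."""
    seq = list(seq)
    e = energy(seq, contacts)
    best_seq, best_e = seq[:], e
    for _ in range(steps):
        i, j = rng.sample(range(len(seq)), 2)
        seq[i], seq[j] = seq[j], seq[i]              # trial move: swap two monomers
        e_new = energy(seq, contacts)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t_des):
            e = e_new                                 # accept the move
            if e < best_e:
                best_seq, best_e = seq[:], e
        else:
            seq[i], seq[j] = seq[j], seq[i]          # reject: undo the swap
    return "".join(best_seq), best_e

# Contacts of a hypothetical compact 3x3 target conformation C*
TARGET_CONTACTS = [(0, 5), (1, 4), (3, 8), (4, 7)]
start = "HHHHPPPPP"

designed, e_designed = design(start, TARGET_CONTACTS, t_des=0.5, steps=2000,
                              rng=random.Random(1))
print(start, energy(start, TARGET_CONTACTS))
print(designed, e_designed)
```

At very high `t_des` almost every swap is accepted and the walk is random; at very low `t_des` the walk can get trapped in the first local minimum it finds, which mirrors the two limits listed above.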
Phase Diagram of Designed Proteins (Pande et al., 2000)
“Folded globule”:
• proteins with a stable target conformation
• they are “minimally frustrated”
Digression
Interpretation of a chaperone function
• avoiding aggregation, e.g. between hydrophobic residues (Clark, 2004)
Prion proteins
• diseases transmitted by proteins
• $PrP^{Sc}$ can induce the $PrP^{C} \to PrP^{Sc}$ transition
• $PrP^{C}$ might be “off-path”
Kinetics: The Funnel Hypothesis
How do we solve Levinthal’s paradox?
• significantly low-energy native state: partially native structures will also have lower energies than the others
• bumps: due to competing interactions
⇒ FUNNEL
Kinetics: Free Energy Barriers and Nucleation
Barriers of F: energetic and entropic
Nucleation:
• liquid-gas transition: homogeneous shrinking is disadvantageous in both ΔE and ΔS; solution: states with non-uniform density
• protein folding: folding seems to be a first-order transition; nucleus: a small, native secondary structure, e.g. an α-helix; subsequent structure formation is sped up
Digression: Super-Arrhenius behaviour
Most probable energy in the REM:
Assumption: these probable conformations are surrounded by ones of average energy.
Transition-state theory: the argument of the exponential is quadratic rather than linear in 1/T (“Ferry law”)
⇒ roughness (σ) slows down folding
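The quadratic temperature dependence follows from a short argument, sketched here under stated assumptions: the relevant energies are Gaussian with mean $\bar{E}$ and variance $\sigma^2$, the neighbours of a typical occupied conformation have energy $\bar{E}$, and $\tau_0$ is a microscopic attempt time. The symbols $\bar{E}$, $E_{\mathrm{mp}}$ and $\tau_0$ are introduced for illustration.

```latex
% Boltzmann-weighted distribution of occupied energies
P(E)\, e^{-E/kT} \propto \exp\!\left[-\frac{(E-\bar{E})^2}{2\sigma^2} - \frac{E}{kT}\right]

% its maximum gives the most probable energy
E_{\mathrm{mp}}(T) = \bar{E} - \frac{\sigma^2}{kT}

% barrier to escape through neighbours of average energy
\Delta E^{\ddagger} = \bar{E} - E_{\mathrm{mp}} = \frac{\sigma^2}{kT}

% transition-state theory: escape time
\tau = \tau_0\, e^{\Delta E^{\ddagger}/kT} = \tau_0 \exp\!\left[\frac{\sigma^2}{(kT)^2}\right]
```

The exponent grows as $1/T^2$ instead of the Arrhenius $1/T$, so a rougher landscape (larger σ) slows the dynamics sharply at low temperature.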
Reaction Coordinate
Simple (bimolecular) chemical reactions: A + BC → AB + C
• potential energy surface PES($r_{AB}$, $r_{BC}$)
• reaction coordinate: the minimum-energy path via a saddle point
Protein folding: the choice is difficult, no general solution
• similarity to the native state, Q
• an alternative choice: $P_{fold}$, or “commitment”
$P_{fold}$: the probability of folding before ever reaching an unfolded state
Digression: Alternative Reaction Coordinate
“Development” on the graph
• lattice model
• conformation space {C} ↔ graph
• conformations $C_1$ and $C_2$ differ by a single “elementary step” ↔ nodes $C_1$ and $C_2$ are connected
• $n_C$: occupation number (e.g. number of independent simulations)
• $m_C$: degree of the node
• “potential” on the graph nodes: $\Phi_C$
• “development”: Metropolis Monte Carlo (MMC) dynamics
$I_{C \to C'} = (n_C/m_C)\,\min\{1;\ (m_C/m_{C'})\,e^{E(C)-E(C')}\}$
$I_{C' \to C} = (n_{C'}/m_{C'})\,\min\{1;\ (m_{C'}/m_C)\,e^{E(C')-E(C)}\}$
• define: $R_{CC'} = \max\{m_C\,e^{U(C)};\ m_{C'}\,e^{U(C')}\}$
⇒ Ohm’s law: $I = I_{C \to C'} - I_{C' \to C} = [\Phi_C - \Phi_{C'}]/R_{CC'}$
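The Ohm's-law form of the net flux can be checked numerically. The sketch below assumes that energies are measured in units of kT, that $U(C) = E(C)$, and that the node potential is $\Phi_C = n_C\,e^{E(C)}$; these identifications are inferred from the two flux expressions and are not stated explicitly in the transcript.

```python
import math
import random

def net_current(n_c, m_c, e_c, n_d, m_d, e_d):
    """Net MMC flux I = I(C -> C') - I(C' -> C) between two connected nodes."""
    forward = (n_c / m_c) * min(1.0, (m_c / m_d) * math.exp(e_c - e_d))
    backward = (n_d / m_d) * min(1.0, (m_d / m_c) * math.exp(e_d - e_c))
    return forward - backward

def ohm_current(n_c, m_c, e_c, n_d, m_d, e_d):
    """Same flux written as (Phi_C - Phi_C') / R_CC' (assumed Phi = n * exp(E))."""
    phi_c = n_c * math.exp(e_c)
    phi_d = n_d * math.exp(e_d)
    resistance = max(m_c * math.exp(e_c), m_d * math.exp(e_d))
    return (phi_c - phi_d) / resistance

def max_discrepancy(trials, rng):
    """Largest difference between the two expressions over random node pairs."""
    worst = 0.0
    for _ in range(trials):
        args = (rng.uniform(0, 100), rng.randint(1, 6), rng.uniform(-3, 3),
                rng.uniform(0, 100), rng.randint(1, 6), rng.uniform(-3, 3))
        worst = max(worst, abs(net_current(*args) - ohm_current(*args)))
    return worst

worst = max_discrepancy(1000, random.Random(0))
print(f"largest discrepancy over 1000 random pairs: {worst:.2e}")
```

The two expressions agree to rounding error, so under these assumptions the MMC dynamics maps onto a resistor network in which occupation flows from high to low potential.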
Digression: Alternative Reaction Coordinate
“First return” (casino) problem
• “particle” (money) at $x_0$
• I will end up with 0 money ↔ all the flux is going to 0
• electric circuit analogy
$P_{fold}$: probability of arriving at the folded state FOR THE FIRST TIME (Grosberg, 2003)
$p_{fold} = R_{CU}/(R_{CU}+R_{CF})$
$p_{unfold} = R_{CF}/(R_{CU}+R_{CF})$
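The casino analogy can be checked on the simplest case, an unbiased random walk on a chain of sites 0..L with the unfolded state at 0 and the folded state at L. Assuming one unit resistor per link, $R_{CU} = x_0$ and $R_{CF} = L - x_0$, so the resistance formula should reproduce the classical gambler's-ruin result $p_{fold} = x_0/L$. The chain geometry and function names are illustrative assumptions.

```python
def committor(length, sweeps=5000):
    """p[x]: probability that an unbiased walk from x reaches `length` before 0."""
    p = [0.0] * (length + 1)
    p[length] = 1.0                          # absorbing "folded" end; p[0] = 0 is "unfolded"
    for _ in range(sweeps):                  # Gauss-Seidel relaxation
        for x in range(1, length):
            p[x] = 0.5 * (p[x - 1] + p[x + 1])   # first-step (harmonic) condition
    return p

def p_fold_from_resistances(x0, length):
    """p_fold = R_CU / (R_CU + R_CF) for a chain of unit resistors."""
    r_cu = float(x0)                         # resistance between C and the unfolded end
    r_cf = float(length - x0)                # resistance between C and the folded end
    return r_cu / (r_cu + r_cf)

LENGTH = 10
p = committor(LENGTH)
for x0 in range(1, LENGTH):
    print(x0, round(p[x0], 6), p_fold_from_resistances(x0, LENGTH))
```

The closer the starting point is to the folded end, the smaller $R_{CF}$ and the larger $p_{fold}$, which is what makes $p_{fold}$ usable as a reaction coordinate.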
Conclusion
• protein folding: self-assembly
• low-energy ground state
• biased walk: correlations, funnel hypothesis
• “nucleation”
• sequence design
Acknowledgements
I’m indebted to Michel DROZ, Alexander GROSBERG, Géza GYÖRGYI, Gabriella NETTING, Zoltán RÁCZ, Zoltán SZABÓ, László SZILÁGYI and many others…