
Conformation Networks: an Application to Protein Folding

Center for Nonlinear Studies. Conformation Networks: an Application to Protein Folding. Zoltán Toroczkai. Erzsébet Ravasz. Center for Nonlinear Studies. Gnana Gnanakaran (T-10). Theoretical Biology and Biophysics. Los Alamos National Laboratory. Proteins.


Presentation Transcript


  1. Center for Nonlinear Studies Conformation Networks: an Application to Protein Folding Zoltán Toroczkai Erzsébet Ravasz Center for Nonlinear Studies Gnana Gnanakaran (T-10) Theoretical Biology and Biophysics Los Alamos National Laboratory

  2. Proteins • the most complex molecules in nature • globular or fibrous • basic functional units of a cell • chains of amino acids (50 to $10^{3}$) • peptide bonds link the backbone. Native state: • unique 3D structure (native physiological conditions) • biological function • fold in nanoseconds to minutes • about 1000 known 3D structures: X-ray crystallography, NMR

  3. Myoglobin • 153 residues, mol. weight = 17181 Da, 1260 atoms • Main function: primary oxygen storage and carrier in muscle tissue • It contains a heme (iron-containing porphyrin) group in the center: C34H32N4O4FeHO

  4. Protein conformations • Amino acid: defined by dihedral angles • 2 angles with 2-3 local minima of the torsion energy • N monomers → about $10^{N}$ different conformations

  5. Levinthal’s paradox • Anfinsen: thermodynamic hypothesis • native state is at the global minimum of the free energy. Epstein, Goldberger, & Anfinsen, Cold Spring Harbor Symp. Quant. Biol. 28, 439 (1963) • Levinthal’s paradox, 1968 • finding the native state by random sampling is not possible • 40-monomer polypeptide, sampled at $10^{13}$ conf/s → $3 \times 10^{19}$ years to sample all; the universe is ~ $2 \times 10^{10}$ years old. Levinthal, J. Chim. Phys. 65, 44-45 (1968); Wetlaufer, P.N.A.S. 70, 691 (1973) • nucleation • folding pathways
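The arithmetic behind the Levinthal estimate can be checked in a few lines. This is a minimal sketch that assumes about 10 conformations per monomer (the figure from the previous slide) and a Julian year; the function name `exhaustive_search_years` is made up for illustration.

```python
# Levinthal-style estimate using the numbers quoted on the slide.
# Assumption: ~10 conformations per monomer, so N monomers give ~10**N states.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year

def exhaustive_search_years(n_monomers, rate_per_second, per_monomer=10):
    """Years needed to visit every conformation once at a fixed sampling rate."""
    n_conformations = per_monomer ** n_monomers
    return n_conformations / rate_per_second / SECONDS_PER_YEAR

years = exhaustive_search_years(n_monomers=40, rate_per_second=1e13)
age_of_universe_years = 2e10  # value quoted on the slide

print(f"exhaustive search: {years:.1e} years")
print(f"ratio to the age of the universe: {years / age_of_universe_years:.1e}")
```

The result is of order $10^{19}$ years, in line with the slide, which is why random sampling cannot be the folding mechanism.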

  6. Free energy landscapes • Bryngelson & Wolynes, 1987 • free energy landscape. Bryngelson & Wolynes, P.N.A.S. 84, 7524 (1987) • a random hetero-polymer typically does NOT fold • Experiment: • random sequences • GLU, ARG, LEU • 80-100 amino acids • ~95% did not fold in a stable manner. Davidson & Sauer, P.N.A.S. 91, 2146 (1994)

  7. Funnels Given any amino-acid sequence: can we tell if it is a good folder? • experiments (X-ray, NMR) • molecular dynamics simulations • homology modeling (all difficult and slow). Energy funnels • Leopold, Montal & Onuchic, 1992 • many folding pathways. Leopold, Montal & Onuchic, P.N.A.S. 89, 8721 (1992)

  8. Molecular dynamics • State of the art • supercomputer (LANL) • Ribosome in explicit solvent: • targeted MD • $2.64 \times 10^{6}$ atoms ($2.5 \times 10^{5}$ + water) • Q machine, 768 processors • 260 days of simulation (event: 2 ns), ~ $10^{16}$ times slower than the real event. Sanbonmatsu, Joseph & Tung, P.N.A.S. 102, 15854 (2005) • distributed computing (Stanford, Folding@home) • more than 100,000 CPUs • simulation of complete folding event • BBA5, 23-residue, implicit water • 10,000 CPU days/folding event (~1 μs). Shirts & Pande, Science 290, 1903 (2000); Snow, Nguyen, Pande, Gruebele, Nature 420, 102 (2002)
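The quoted slowdown for the ribosome run follows directly from the two numbers on the slide (260 days of wall-clock time for a 2 ns event). A minimal check, with variable names chosen here for illustration:

```python
# Ratio of wall-clock time to simulated time for the ribosome run.
wall_clock_seconds = 260 * 24 * 3600   # 260 days of simulation
simulated_seconds = 2e-9               # the simulated event lasts 2 ns

slowdown = wall_clock_seconds / simulated_seconds
print(f"simulation is about {slowdown:.1e} times slower than the real event")
```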

  9. Configuration networks • Protein conformations: dihedral angles have few preferred values (Ramachandran map of PDB structures: helix, sheet, other). Ramachandran & Sasisekharan, J. Mol. Biol. 7, 95 (1963) • NODE: configuration • LINK: change of one degree of freedom (angle) • refinement of angle values → continuous case

  10. Why networks? • VERY LARGE: 100 monomers → $10^{100}$ nodes. However: generic features of folding are determined by STATISTICAL properties of the configuration network • degree distribution • average distance • clustering • degree correlations • toolkit from network research • captures the high dimensionality. Albert & Barabási, Rev. Mod. Phys. 74, 67 (2002); Newman, SIAM Rev. 45, 167 (2003) • faster algorithms to simulate folding events • pre-screening synthetic proteins • insights into misfolding
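Three of the statistical properties listed above (degree distribution, average distance, clustering) can be computed for any small network stored as an adjacency dictionary. The sketch below uses only the standard library and a 3-dimensional cube as a toy network; the function names are illustrative, and degree correlations are left out to keep it short.

```python
from collections import Counter, deque
from itertools import combinations, product

def degree_distribution(adj):
    """Fraction of nodes having each degree k."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / len(adj) for k, c in sorted(counts.items())}

def average_distance(adj):
    """Mean shortest-path length over connected ordered pairs (BFS per node)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def clustering(adj):
    """Average local clustering coefficient (0 for nodes with degree < 2)."""
    coeffs = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)

# Toy network: the 3-cube, nodes are bit triples, links flip one bit.
nodes = list(product((0, 1), repeat=3))
cube = {u: {v for v in nodes if sum(a != b for a, b in zip(u, v)) == 1}
        for u in nodes}

print(degree_distribution(cube), average_distance(cube), clustering(cube))
```

For networks of realistic size these quantities would be estimated by sampling rather than by visiting every node.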

  11. A real example • The Protein Folding Network: F. Rao, A. Caflisch, J. Mol. Biol. 342, 299 (2004) • beta3s: 20 monomers, antiparallel beta sheets • MD simulation, implicit water • 330 K, equilibrium folded ↔ random coil • NODE: 8 letters per amino acid (local secondary structure) • LINK: 2ps transition

  12. Its native conformation has been studied by NMR experiments: De Alba et al., Prot. Sci. 8, 854 (1999). Beta3s in aqueous solution forms a monomeric triple-stranded antiparallel beta sheet in equilibrium with the denatured state. • Simulations at 330 K • The average folding time from the denatured state ~ 83 ns • The average unfolding time ~ 83 ns • Simulation time ~ 12.6 μs • Coordinates saved every 20 ps ($5 \times 10^{5}$ snapshots in 10 μs) • Secondary structures: H, G, I, E, B, T, S, - (α-helix, $3_{10}$ helix, π-helix, extended, isolated β-bridge, hydrogen-bonded turn, bend and unstructured) • The native state: -EEEESSEEEEEESSEEEE- • There are approx. $8^{18} \approx 10^{16}$ conformations • Nodes: conformations; links: transitions.
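The node and link definitions above translate into a simple counting procedure over a trajectory of secondary-structure strings. The following is a sketch, not the procedure of Rao and Caflisch: the function name `build_conformation_network` and the short trajectory are made up for illustration, and links are treated as undirected.

```python
from collections import Counter

def build_conformation_network(snapshots):
    """Nodes are secondary-structure strings; a link joins two distinct
    strings seen in consecutive snapshots. Returns (node counts, link counts)."""
    node_counts = Counter(snapshots)
    link_counts = Counter()
    for a, b in zip(snapshots, snapshots[1:]):
        if a != b:
            link_counts[frozenset((a, b))] += 1
    return node_counts, link_counts

native = "-EEEESSEEEEEESSEEEE-"
# Toy trajectory, invented for illustration (not simulation data).
trajectory = [
    native,
    "-EEEESSEEEEEESSEEES-",
    native,
    "-EEESSSEEEEEESSEEEE-",
    "-EEESSSEEEEEESSEEES-",
]

nodes, links = build_conformation_network(trajectory)

# Degree of each node = number of distinct links it takes part in.
degree = Counter()
for pair in links:
    for node in pair:
        degree[node] += 1

# Upper bound on the number of nodes: 8 letters on each of 18 free positions.
n_possible = 8 ** 18
print(len(nodes), len(links), degree[native], n_possible)
```

Only a tiny fraction of the possible strings is ever visited in a finite run, so the observed network is a sampled subgraph of the full configuration network.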

  13. Scale-free network • hubs • compared: beta3s and its randomized version. Many real-world networks are scale free: • co-authorship ($\gamma$ = 1 to 2.5) • citations ($\gamma$ = 3) • sexual contacts ($\gamma$ = 3.4) • movie actors ($\gamma$ = 2.3) • Internet ($\gamma$ = 2.4) • World Wide Web ($\gamma$ = 2.1/2.5) • Genetic regulation ($\gamma$ = 1.3) • Protein-protein interactions ($\gamma$ = 2.4) • Metabolic pathways ($\gamma$ = 2.2) • Food webs ($\gamma$ = 1.1). Barabási & Albert, Science 286, 509 (1999). Many reasons behind SF topology • Why is the protein network scale free? • Why does the randomized chain have similar degree distribution? • Why is $\gamma = -2$?

  14. Robot arm networks [Figure: state networks of the robot arm for n = 0, 1, 2; nodes are labeled by strings of the digits 0, 1, 2, such as 00 to 22 and 000, 010, 020, 021, 100, 200] • Steric constraints? • missing nodes • missing links • n-dimensional hypercube • binomial degree distribution • Homogeneous "Swiss cheese"
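A robot-arm state network of this kind can be generated explicitly for small n. The sketch below assumes that each of the n joints takes one of three positions (0, 1, 2) and that a link changes exactly one joint to either of its other two positions; steric constraints are imitated by deleting a random 30% of the states, which is an arbitrary choice and not the geometric rule used in the talk.

```python
import random
from collections import Counter
from itertools import product

def robot_arm_network(n, allowed=lambda state: True):
    """Adjacency lists over ternary strings of length n; a link changes one digit."""
    states = [s for s in product(range(3), repeat=n) if allowed(s)]
    state_set = set(states)
    adj = {}
    for s in states:
        nbrs = []
        for i in range(n):
            for value in range(3):
                if value != s[i]:
                    t = s[:i] + (value,) + s[i + 1:]
                    if t in state_set:
                        nbrs.append(t)
        adj[s] = nbrs
    return adj

# Unconstrained arm: every state has the same degree, 2n.
full = robot_arm_network(4)

# "Swiss cheese": forbid a random subset of states (illustrative stand-in
# for steric clashes) and look at how the degrees spread out.
rng = random.Random(0)
forbidden = {s for s in full if rng.random() < 0.3}
holes = robot_arm_network(4, allowed=lambda s: s not in forbidden)
degree_hist = Counter(len(nbrs) for nbrs in holes.values())

print(sorted(degree_hist.items()))
```

When states are removed independently, each of the 2n potential neighbors survives with the same probability, so the degrees follow a binomial distribution: the network has holes but stays homogeneous, with no hubs.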

  15. A bead-chain model • Beads on a chain in 3D: robot arm model • similar to C$\alpha$ protein models • fixed rod-rod angle • 3 positions around axis. Honeycutt & Thirumalai, Biopolymers 32, 695 (1992) • Examples: N = 6, angle = 90°; N = 18, angle = 120° (example state: 2212112212111122) • Homogeneous network

  16. Another example: L = 7, angle = 75°, r = 0.25. [Figure: network around the “00100” state; legend: allowed state, forbidden state]

  17. Adding monomers not only increases the number of nodes in the network but also its dimensionality!! The combined effect is small-world.

  18. Shortcuts in Folding Space

  19. The “dilemma” • SCALE FREE: from polypeptide MD simulations (beta3s, randomized version) • HOMOGENEOUS: from studies of conformation networks (bead chain, robot arm) • ?

  20. Gradient Networks Gradients of a scalar (temperature, concentration, potential, etc.) induce flows (heat, particles, currents, etc.). Naturally, gradients will induce flows on networks as well. Ex.: Load balancing in parallel computation and packet routing on the internet Y. Rabani, A. Sinclair and R. Wanka, Proc. 39th Symp. On Foundations of Computer Science (FOCS), 1998:“Local Divergence of Markov Chains and the Analysis of Iterative Load-balancing Schemes” References: Z. T. and K.E. Bassler, “Jamming is Limited in Scale-free Networks”, Nature, 428, 716 (2004) Z. T., B. Kozma, K.E. Bassler, N.W. Hengartner and G. Korniss “Gradient Networks”, http://www.arxiv.org/cond-mat/0408262

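  21. Setup: Let $G = G(V, E)$ be an undirected graph, which we call the substrate network. The vertex set: $V = \{0, 1, \ldots, N-1\}$. The edge set: $E \subset V \times V$. A simple representation of $E$ is via the $N \times N$ adjacency (or incidence) matrix $A = \{a_{ij}\}$, with $a_{ij} = 1$ if $(i, j) \in E$ and $a_{ij} = 0$ otherwise (1). Let us consider a scalar field $h = \{h_0, \ldots, h_{N-1}\}$ defined on the nodes. Set of nearest neighbor nodes on $G$ of $i$: $S_i^{(1)} = \{ j \in V : a_{ij} = 1 \}$.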

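  22. Definition 1 The gradient $\nabla h(i)$ of the field $\{h\}$ in node $i$ is a directed edge: $\nabla h(i) = (i, \mu(i))$ (2), which points from $i$ to that node $\mu(i) \in S_i^{(1)} \cup \{i\}$ (where $S_i^{(1)}$ is the set of nearest neighbors of $i$ on $G$) for which the increase in the scalar is the largest, i.e.: $\mu(i) = \arg\max_{j \in S_i^{(1)} \cup \{i\}} h_j$ (3). The weight associated with edge $(i, \mu)$ is given by $|\nabla h(i)| = h_{\mu} - h_i$. The self-loop $(i, i)$ is a loop through $i$ with zero weight. Definition 2 The set $F$ of directed gradient edges on $G$ together with the vertex set $V$ forms the gradient network: $\nabla G = \nabla G(V, F)$. If (3) admits more than one solution, then the gradient in $i$ is degenerate.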
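Definitions 1 and 2 can be sketched in code as follows. The reading assumed here, following the cited "Gradient Networks" preprint, is that each node points to whichever node carries the largest scalar among itself and its nearest neighbors, with a self-loop when the node itself is the largest. The function name `gradient_network` and the five-node path graph are illustrative only.

```python
def gradient_network(adj, h):
    """Return {i: mu(i)}, where mu(i) maximizes h over i and its neighbors.
    mu(i) == i encodes the zero-weight self-loop."""
    return {i: max([i, *nbrs], key=lambda j: h[j]) for i, nbrs in adj.items()}

# Substrate: a path graph 0 - 1 - 2 - 3 - 4 with distinct scalar values,
# so every gradient is non-degenerate.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
h = {0: 0.2, 1: 0.5, 2: 0.1, 3: 0.4, 4: 0.9}

mu = gradient_network(adj, h)
weights = {i: h[j] - h[i] for i, j in mu.items()}  # edge weight h_mu - h_i

for i in adj:
    print(i, "->", mu[i], "weight", round(weights[i], 3))
```

Nodes 1 and 4 are local maxima of the field and keep their self-loops, and every other node points uphill toward one of them.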

  23. In the following we will only consider scalar fields with non-degenerate gradients. This means: Theorem 1 Non-degenerate gradient networks form forests. Proof:

  24. [Figure: example graph with a scalar value on each node] Theorem 2 The number of trees in this forest = number of local maxima of {h} on G.
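Theorem 2 can be checked numerically on a random substrate. The sketch below assumes an Erdős-Rényi graph with 200 nodes and link probability 0.05 carrying i.i.d. uniform scalars; these parameters, the seed and the helper names are arbitrary choices for illustration. Following gradient edges uphill from any node must end at a self-loop, and the set of such end points should coincide with the local maxima.

```python
import random
from collections import Counter

def erdos_renyi(n, p, rng):
    """Undirected random graph: each pair is linked independently with prob. p."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def gradient_targets(adj, h):
    """mu(i): node with the largest scalar among i and its neighbors."""
    return {i: max([i, *adj[i]], key=lambda j: h[j]) for i in adj}

def root_of(i, mu):
    """Follow gradient edges uphill until a self-loop is reached."""
    while mu[i] != i:
        i = mu[i]
    return i

rng = random.Random(1)
n = 200
adj = erdos_renyi(n, 0.05, rng)
h = [rng.random() for _ in range(n)]  # i.i.d. scalars, ties are negligible

mu = gradient_targets(adj, h)
roots = {root_of(i, mu) for i in adj}             # one root per tree
local_maxima = {i for i in adj if all(h[i] > h[j] for j in adj[i])}
in_degree = Counter(t for i, t in mu.items() if t != i)

print(len(roots), "trees,", len(local_maxima), "local maxima")
```

The loop in `root_of` always terminates because the scalar strictly increases along every gradient edge that is not a self-loop, which is also why no cycles can form.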

  25. For Erdős - Rényi random graph substrates with i.i.d random numbers as scalars, the in-degree distribution is:

  26. The Configuration model A. Clauset, C. Moore, Z.T., E. Lopez, to be published.

  27. Generating functions: K-th Power of a Ring

  28. Power law with exponent $\gamma = -3$ [plot label: 2K+l]

  29. The energy landscape • Energy associated with each node (configuration) • the gradient network • most favorable transitions • T=0 backbone of the flow • MD simulation • tracks the flow network • biased walk close to the gradient network • trees • basins of local minima. What generates $\gamma = -2$? The REM (random energy model) generates an exponent of -1.

  30. Model ingredients • A network model of configuration spaces • network topology • homogeneous • degree correlations • how to associate energies: constrained (folded) states have small $k_{\mathrm{conf}}$ and lower energy; loose (random coil) states have large $k_{\mathrm{conf}}$ and higher energy; $k$ and $E$ increase together

  31. Random geometric graph • R = 0.113, <k> = 20 • Energy proportional to connectivity: $E \propto k$ • random geometric graph. Dall & Christensen, Phys. Rev. E 66, 026121 (2002) • in higher D: similar to hypercube with holes • degree correlations
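The model on this slide can be sketched as follows: build a random geometric graph, assign each node an energy proportional to its degree, and let every node point to the lowest-energy node among itself and its neighbors. The number of points (N = 500), the periodic boundary conditions and the tiny random term used to break ties between equal degrees are assumptions made here; only R = 0.113 and the target <k> ≈ 20 come from the slide.

```python
import random
from collections import Counter

def random_geometric_graph(n, radius, rng):
    """n uniform points in the unit square with periodic boundaries;
    two points are linked when their distance is below radius."""
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    adj = {i: [] for i in range(n)}
    for i in range(n):
        xi, yi = pts[i]
        for j in range(i + 1, n):
            dx = abs(xi - pts[j][0])
            dy = abs(yi - pts[j][1])
            dx = min(dx, 1.0 - dx)   # wrap around (torus)
            dy = min(dy, 1.0 - dy)
            if dx * dx + dy * dy < radius * radius:
                adj[i].append(j)
                adj[j].append(i)
    return adj

rng = random.Random(42)
N, R = 500, 0.113                    # N chosen so that N * pi * R**2 is about 20
adj = random_geometric_graph(N, R, rng)
mean_degree = sum(len(nbrs) for nbrs in adj.values()) / N

# Energy proportional to connectivity, plus a tiny tie-breaking noise.
energy = {i: len(adj[i]) + 1e-6 * rng.random() for i in adj}

# Gradient network of h = -E: each node points to the lowest-energy node
# among itself and its neighbors.
target = {i: min([i, *adj[i]], key=lambda j: energy[j]) for i in adj}
self_loops = sum(1 for i, t in target.items() if t == i)
in_degree = Counter(t for i, t in target.items() if t != i)

print(f"<k> = {mean_degree:.1f}, local minima = {self_loops}")
print(sorted(Counter(in_degree.values()).items()))
```

A system this small only hints at the shape of the in-degree distribution; a clean exponent needs a much larger network.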

  32. N=30000, <k> = 1000, d=2.

  33. Exponent is -2. Two essential ingredients: • $k_1$-$k_2$ degree correlations • <E> monotonic in k

  34. Bead-chain model • more realistic model: bead-chain • configuration network • excluded volume • energy: Lennard-Jones potential (repulsive and attractive parts)
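For reference, the standard 12-6 form of the Lennard-Jones pair potential is $V(r) = 4\epsilon[(\sigma/r)^{12} - (\sigma/r)^{6}]$, with the first term repulsive and the second attractive. The sketch below uses reduced units ($\epsilon = \sigma = 1$) as an assumption; the parametrization used for the bead-chain model in the talk is not given on the slide.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones pair energy at separation r."""
    x = (sigma / r) ** 6
    return 4.0 * epsilon * (x * x - x)   # x*x: repulsive, -x: attractive

r_min = 2 ** (1 / 6)   # position of the minimum, in units of sigma

for r in (0.9, 1.0, r_min, 1.5, 2.5):
    print(f"r = {r:.3f}  V = {lennard_jones(r):+.4f}")
```

The energy is positive below $r = \sigma$ (excluded volume), reaches its minimum of $-\epsilon$ at $r = 2^{1/6}\sigma$, and decays toward zero at large separation.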

  35. L = 30, angle = 75°

  36. The case of the α-helix • AKA peptide • ALA: orange • LYS: blue • TYR: green • MD simulations, no water.

  37. The MD traced network • T = 400 • More than one simulation: 3 different runs (yellow, red and green) • The role of temperature

  38. Conclusions • A network approach was introduced to study sterically constrained conformations of ball-chain-like objects. • This network approach is based on the “statistical dogma” stating that generic features must be the result of statistical properties of the networks and should not depend on details. • Protein conformation dynamics happens in high-dimensional spaces that are not adequately described by simplistic reaction coordinates. • The dynamics performs a locally biased sampling of the full conformational network. For low enough temperatures the sampled network is a gradient graph, which is typically a scale-free structure. • The -2 degree exponent appears at and below the temperature where the basins of the local energy minima become kinetically disconnected. • Understanding the protein folding network has the potential of leading to faster simulation algorithms, helping to close the gap between nature’s speed and ours. Coming up: conditions on side chain distributions for the existence of funneled energy landscapes.
