All-atom molecular simulations of protein folding and unfolded-state dynamics and structure with accelerated calculations on GPU. Cezary Czaplewski Faculty of Chemistry University of Gdańsk Poland. The 10th Protein Folding Winter School, KIAS, February, 7-11, 2011.
Molecular Simulation of ab Initio Protein Folding for a Millisecond Folder NTL9(1-39)
Vincent A. Voelz (1), Gregory R. Bowman (2), Kyle Beauchamp (2), Vijay S. Pande (1,2,3)
(1) Department of Chemistry, Stanford University; (2) Biophysics Program, Stanford University; (3) Department of Structural Biology, Stanford University
J. Am. Chem. Soc. 2010, 132, 1526–1528
Computer simulations, validated by experiment, can help gain a complete understanding of how proteins fold.
• Folding rates span more than a million-fold range, which points to possible diversity in folding mechanisms.
• Folding@Home on GPUs produced several folding trajectories of the 39-residue NTL9(1-39), the slowest-folding protein (~1.5 ms folding time) folded ab initio with an all-atom MD model to date.
• Insights into the folding mechanism are based on a Markov state model (MSM).
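[Figure: time axis from 10^-15 s (femto) through 10^-12 s (pico), 10^-9 s (nano), 10^-6 s (micro) and 10^-3 s (milli) to 10^0 s (seconds). Processes marked on the axis: all-atom MD step, bond vibration, side-chain rotation, loop closure, helix formation, folding of β-hairpins, protein folding.]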
GPU
• A processor attached to a graphics card, dedicated to floating-point calculations
• Incorporates stream-processing units that implement special mathematical operations
• Stream processing: applications can use multiple computational units without explicitly managing allocation, synchronization, or communication among those units.
CPU vs. GPU CPU – 4 cores
Proteins folded ab initio by all-atom MD
Trp-cage: 4.1 μs (Pitera, Swope, PNAS 2003)
Fip35 WW: 13 μs (Ensign, Pande, Biophys. J. 2009)
Villin headpiece: 10 μs (Zagrovic, Snow, Shirts, Pande, JMB 2002)
Fast-folding villin variant: <1 μs (Ensign, Kasson, Pande, JMB 2007)
Folding@Home uses Gromacs with the OpenMM library, written specially for GPUs, allowing dramatically longer trajectories
• AMBER ff96 with Onufriev, Bashford, Case GBSA
• Up to 10,000 parallel MD simulations at 300, 330, 370 and 450 K
• Starting from native, random-coil and extended conformations
• Aggregate simulation time: 1.52 ms
• Out of the ~3000 trajectories started from unfolded states at 370 K, only two reach <3.5 Å RMSD and eight reach <4 Å RMSD
• The number of folding events is consistent with a simple model of parallel uncoupled folding as a two-state Poisson process: ⟨n⟩ = ∫ M(t) k exp(−M(t) k t) dt, where M(t) is the number of parallel simulations that reach time t and k is the experimental folding rate, ~640/s.
[Figure panels: distributions of RMSD for native-state simulations of NTL9(1−39) after 10 μs; posterior predictions of the folding rate; the number of parallel simulations at 370 K that reach time t.]
A snapshot from a folding trajectory (3.1 Å RMSD). Non-native and native-like hydrophobic core arrangements.
Markov state model (MSM)
• An MSM constitutes a kinetic clustering
• Conformations that can interconvert rapidly are grouped into the same state
• Conformations that can only interconvert slowly are grouped into separate states
• Satisfies the Markov property: the identity of the next state depends only on the identity of the current state and not on any of the previous states
• The transition probability matrix T propagates the state probabilities p
• An implied timescale τ_k for a given lag time τ can be calculated from the eigenvalues μ_k of matrix T: τ_k = −τ / ln μ_k
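The MSM bullets above can be illustrated with a minimal NumPy sketch. It is not the MSMBuilder implementation: the function names, the count symmetrization and the toy two-state matrix are assumptions made for illustration. It shows a row-stochastic transition matrix T estimated from a discrete state trajectory, one propagation step p(t + lag) = p(t) T, and implied timescales computed as −lag / ln(eigenvalue).

```python
import numpy as np

def estimate_transition_matrix(dtraj, n_states, lag):
    """Row-normalized transition counts at a lag given in frames.

    Assumes every state is visited at least once (no empty rows).
    """
    counts = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1.0
    # Symmetrizing the counts is one simple way to enforce detailed balance.
    counts = counts + counts.T
    return counts / counts.sum(axis=1, keepdims=True)

def implied_timescales(T, lag_time):
    """Implied timescales -lag_time / ln(mu) for eigenvalues 0 < mu < 1 of T."""
    mu = np.sort(np.linalg.eigvals(T).real)[::-1]
    mu = mu[1:]            # drop the stationary eigenvalue (mu = 1)
    mu = mu[mu > 0.0]      # the logarithm is undefined for mu <= 0
    return -lag_time / np.log(mu)

# Toy two-state model (hypothetical numbers), lag time of 1 ns.
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
p0 = np.array([1.0, 0.0])      # all probability starts in state 0
p1 = p0 @ T                    # state probabilities one lag time later
timescales = implied_timescales(T, lag_time=1.0)   # in ns
```

Repeating the timescale calculation for models built at several lag times, and checking that the values level off, is the usual way to pick a lag time at which the Markov assumption is reasonable.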
Details of the MSMBuilder package
• 100,000 microstates were generated by clustering conformations separated by 10 ns using the k-centers algorithm
• The remaining 90% of the data was then assigned to these clusters
• The resulting microstates had an average radius of ~4.5 Å
• A macrostate model was generated by lumping the microstates into 2,000 macrostates using the Robust Perron Cluster Analysis (PCCA+) algorithm
• Although only a few folding trajectories were observed directly, a network of many possible pathways can be inferred from the overlapping sampling of local transitions.
• The top 10 folding fluxes were calculated by a greedy backtracking algorithm
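The clustering step described above can be sketched in a few lines of Python. This is a generic greedy k-centers routine, not MSMBuilder's code. It uses Euclidean distance between toy 2-D points as a stand-in for a structural distance such as RMSD, and the function names and the choice of the first point as the initial center are assumptions.

```python
import math

def k_centers(points, k, dist=math.dist):
    """Greedy k-centers: repeatedly promote the point farthest from its center."""
    centers = [0]                                  # arbitrary first center
    d_min = [dist(p, points[0]) for p in points]   # distance to nearest center
    assign = [0] * len(points)                     # index into `centers`
    while len(centers) < k:
        new = max(range(len(points)), key=d_min.__getitem__)
        centers.append(new)
        for i, p in enumerate(points):
            d = dist(p, points[new])
            if d < d_min[i]:
                d_min[i] = d
                assign[i] = len(centers) - 1
    return centers, assign, d_min

def assign_to_centers(data, center_points, dist=math.dist):
    """Assign data left out of the clustering to the nearest existing center."""
    return [min(range(len(center_points)),
                key=lambda c: dist(p, center_points[c]))
            for p in data]

# Toy data: cluster a small subsample, then assign the held-out points.
points = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (10, 0)]
centers, assign, d_min = k_centers(points, k=3)
held_out = assign_to_centers([(9, 1), (0, 0.5)], [points[c] for c in centers])
```

The largest value in `d_min` is the cluster radius that k-centers tries to keep small, which is the quantity the "average radius" bullet refers to in spirit.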
[Figure: implied timescales of Markov State Models (MSMs) built at lag times between 1 and 32 ns. Panels: 100,000-microstate model; 2000-macrostate model.]
A scatter plot of the 2000 macrostates. Shown in red are the 14 macrostates transited by the top ten pathway fluxes.
A 2000-state Markov State Model (MSM). The top 10 folding pathways account for ∼25% of the total flux and transit 14 of the 2000 macrostates
Contact profile subspaces used to calculate Qα, Qβ12 and Qβ13. c(x): contact profile indexed by x = (i, j).
The 14 macrostates plotted along structural and kinetic reaction coordinates
Contact profiles for the 14 macrostates involved in the top folding pathways
Values of Q for each of the 14 macrostates involved in the top ten folding pathways
Macrostates l, m and n have very similar structural ensembles and similar pfold values. These states differ mostly in their hairpin registrations and packing of the hairpin loop.
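For readers unfamiliar with pfold: it is the probability that a trajectory starting in a given state reaches the folded state before the unfolded state. A minimal sketch of how it could be computed from an MSM transition matrix follows. The three-state matrix, the choice of folded and unfolded states, and the function name are hypothetical, and the simple fixed-point iteration is chosen for brevity, not because the authors used it.

```python
def pfold(T, unfolded, folded, tol=1e-12, max_iter=100000):
    """Committor q: q = 0 in unfolded states, q = 1 in folded states,
    and q_i = sum_j T[i][j] * q_j for every other state."""
    n = len(T)
    q = [1.0 if i in folded else 0.0 for i in range(n)]
    free = [i for i in range(n) if i not in folded and i not in unfolded]
    for _ in range(max_iter):
        change = 0.0
        for i in free:
            new = sum(T[i][j] * q[j] for j in range(n))
            change = max(change, abs(new - q[i]))
            q[i] = new
        if change < tol:   # stop once no intermediate state moves any more
            break
    return q

# Hypothetical three-state chain: unfolded (0) <-> intermediate (1) <-> folded (2).
T = [[0.9, 0.1, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.1, 0.9]]
q = pfold(T, unfolded={0}, folded={2})
```

In this symmetric toy chain the intermediate state is equally likely to fold or unfold first, so its pfold sits halfway between the two end states.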
Conclusions
• Existing force field models using implicit solvent are accurate enough to fold proteins ab initio at long time scales, opening the door to simulating more structurally complex proteins.
• There need not be a single pathway or a single, dominant mechanism for the folding of a given protein.
• Multiple mechanisms could be simultaneously present.
• The sequence of the protein, coupled with the chemical environment, controls the extent to which each mechanistic pathway is observed.
Take-home message
• A GPU can speed up your simulations 10 times
• Existing force field models using implicit solvent are accurate enough to fold proteins during MD.
• With only a few folding trajectories observed directly, a network of many possible pathways can be inferred from kinetic clustering using a Markov State Model.
• There can be several pathways for the folding of a given protein.
• Multiple folding mechanisms (diffusion-collision or nucleation-condensation) could be simultaneously present.