290 likes | 306 Views
Computer simulations of proteins: all-atom and coarse-grained models. Valeri Barsegov Department of Chemistry University of Massachusetts Lowell. YITP, Kyoto University, Japan (2008). Outline:. Introduction:
E N D
Computer simulations of proteins: all-atom and coarse-grained models Valeri Barsegov Department of Chemistry University of Massachusetts Lowell YITP, Kyoto University, Japan (2008)
Outline: • Introduction: • single molecule spectroscopy of protein unfolding: biological relevance; pulling experiments (AFM, laser/optical tweezers, force protocols) • single molecule spectroscopy of unbinding: biological relevance; experimental probes; resolution of forces, lifetimes, and extension • II. Molecular simulations of proteins: • proteins: structure, fold types, examples • all-atom Molecular Dynamics (MD) simulations: force fields, examples, simulations of IR spectra • coarse-grained description of proteins: approximations, examples • III. New direction - computer simulations using graphics cards: • basic facts, computer architecture, algorithms • applications
I.1 Single-molecule dynamic force spectroscopy of forced unfolding of proteins: biological relevance Fact 1:“mechanically active” proteins perform their biological function in linear tandems of “head-to-tail” (C-terminal-to-N-terminal) connected protein domains • Examples: • Titin contains tandems of immunoglobulin (Ig) domains, separated by short linkers sequences (muscle function) • Actin-crosslinking filamins contain rod-like tandem of ddFLN domains (cellular locomotion) • Fibronectin tandems consist of nonidentical Fn domains (extracellular matrix, cell elasticity) • Ubiquitin is a multimeric protein (Ub)n of n=9 identical Ub repeats (protein degradation, signaling pathways)
I.2 Single-molecule dynamic force spectroscopy of forced unfolding of proteins: AFM experiment force-clamp mode force-ramp mode M. Rief, M. Gautel, F. Oesterhelt, J. Fernandez & H. Gaub, Science, 276, 1109 (1997); R. Zinober, D. Brockwell, G. Beddard, A. Blake, P. Olmsted, S. Radford & D. Smith, Protein Sci., 11, 2759 (2002) J. Brujic, R. Hermans, K. Walther & J. Fernandez, Nature Phys., 2, 282 (2006); J. Fernandez & H. Li, Science, 303, 1674 (2004)
I.3 Single-molecule dynamic force spectroscopy of forced unbinding of proteins: biological relevance
I.4 Single-molecule dynamic force spectroscopy of forced unbinding of proteins: leukocyte rolling on endothelium J.-G. Geng, M. Chen, K.-C. Chou, Curr Med Chem, 11, 2153 (2004); L. M. Coussens, Z. Werb, Nature, 420, 860 (2002); Y. J. Kim, L. Borgis, N. M. Varki, A. Varki, Proc. Natl. Acad. Sci. USA, 95, 9325 (1998); J. Weisel, H. Shuman, R. Litvinov, Curr Opin Struct Biol, 13, 227 (2003)
I.5 Single-molecule dynamic force spectroscopy of forced unfolding of proteins: pulling force AFM experiment f-constant f(t)=rf t t, s f, pN J. Weisel, H. Shuman, R. Litvinov, Curr Opin Struct Biol, 13, 227 (2003); M. Schlierf, H. Li, J. Fernandez, PNAS, 101, 7299 (2004); J. Liphardt, D. Smith, C. Bustamante, Curr Opin Struct Biol, 19, 279 (2000); J.-F. Allemand, D. Bensimon, V. Croquette, ibid, 13, 266 (2003); S. Weiss, Science, 283, 1676 (1999); E. Evans, PNAS, 98, 3784 (2001)
I.6 Single-molecule dynamic force spectroscopy of proteins: experimental resolution of unfolding forces, times, and distances • Experimental resolution: • protein extensionX ~1 nm; • stretching forcefS 100pN • force-quenchfQ5-10pN • relaxation intervalT 10-100s J. Fernandez & H. Li, Science, 303, 1674 (2004); I. Schwaiger, M. Schleicher, A. Noegel & M. Rief, EMBO Reports, 6, 46 (2005); J. Brujic, R. Hermans, K. Walther & J. Fernandez, Nature Phys., 2, 282 (2006)
II.1 Molecular simulations of proteins: levels of structure of proteins • Amino acids in proteins (or polypeptides) are joined together by peptide bonds. • The sequence of R-groups along the chain is called the primary structure. • Secondary structure refers to the local folding of the polypeptide chain. • Tertiary structure is the arrangement of secondary structure elements in 3D • Quaternary structure describes the arrangement of a protein's subunits. The PDB is the single worldwide repository of 3D structure data of proteins and nucleic acids: ~35,000 structures as of August 2005. (www.rcsb.org/pdb) Other Web Resources: 1. NCBI 2. The European Bioinformatics Institute (EBI) (www.ebi.ac.uk) 3. The RNA world (www.imb-jena.de/RNA.html)
II.2 Molecular simulations of proteins: secondary and tertiary structure of proteins Φ = -57o , Ψ = -47o right handed alpha-helix Chain has directionality!
II.3 Molecular simulations of proteins: secondary and tertiary structure of proteins Φ = (-110o, -140o), Ψ = (110o, -135o)=> beta-sheet
II.4 Molecular simulations of proteins: quaternary structure of proteins Alpha-beta folds Multi-domain proteins a) Control protein b) Immunoglobulin(muscles) c) Fibronectin d) Growth factor Knotted proteins
II.5 All-atom classical Molecular Dynamics (MD) simulations: force fields I. Potential for bonded interactions: VBL-bondlength potential,VBA-bond-angle potential,VDIH-dihedral angle potential,VSS – disulfide bond potential II. Potential for non-bonded interactions: VPP- protein-protein interaction potential,VWW- wa-ter-water potential,VWP- water-protein interaction potential III. Software (open-source): IV. Water models: • GROMACS (force field: OPLS and GROMOS ) • NAMD (force fields: CHARMM22, CHARMM27) • GROMACS (SPC, SPC/E, SPC-fw) • NAMD (TIP, TIP3P) GROMACS (Univ. of Groeningen, Netherlands): ftp://ftp.gromacs.org/pub/ NAMD (Univ. of Illinois at Urbana Shampaign, USA): http://www.ks.uiuc.edu/Research/namd/
II.6 All-atom MD simulations of proteins: examples of fibrinogen and A-knob-a-hole complex of fibrin • Fibrin polymerisation: ~2,400 a.a., ~48nm • essential for blood clotting • implicated in heart attack and stroke
II.7All-atom MD simulations of proteins: IR spectroscopy of proteins - infrared light (vibrations of bonds) Amide I & Amide II are the major bands: - conformationally sensitive - localized at individual a.a site Amide I : C=O-stretching (90%)+C-N-stretch (10%) Amide II: N-H-bending (60%)+C-N-stretch (40%) Amide I Krimm & Bandekar, Adv. Prot. Chem., 38, 181 (1986); Woutersen & Hamm, J. Phys: Cond. Matt. 14, R1035 (2002); Venyaminov & Kalnin, Biopolymers, 30, 1243 (1990); Chergadze $ Nevskaya, ibid, 15, 637 (1976)
II.8All-atom MD simulations of proteins: IR spectroscopy of proteins 1. Vibrational exciton Hamiltonian: 2. Transition dipole coupling (TDC): 3. Linear absorption spectrum: Cheatum et al, JCP, 120, 8201 (2004); Torii & Tasumi, JCP, 96, 3379 (1992); S. Mukamel, Principles of Nonlinear Spectroscopy
II.9All-atom MD simulations of proteins: IR spectroscopy of proteins Assumptions used in the vibrational exciton Hamiltonian: - dynamics of in the near-equilibrium state - fast bath relaxation (fixed line broadening, ) - fitting parameters (diagonal energies, peak amplitudes, frequency splitting) - energies/amplitudes are from ab initio maps of N-methylacetamide, glycine dipeptide analogs - transferability of ab initio maps to larger proteins Direct calculation of IR spectra of Amide I from MD: - Amide I CO-vibration with - Correction Factor due to assumptions/harmonic force field Advantages of correlation functions: - IR obtained directly from classical MD - beyond ensemble average - far-from-equilibrium regime
II.10All-atom MD simulations of proteins: IR spectroscopy of proteins Ubiquitin (1UBQ, 76 a.a): - water box (4,600 TIP3P, 47Å51Å 57Å) - 8 trajectories (t=4ps, dt=0.1fs, NVE) at T=300K - Ewald sum method (long range electrostatics) - 12Å cutoff for L-J forces A16-22 (3KLVFFAE, 21 a.a): - water box (2000 TIP3P, 44Å41Å 36Å) - - Ewald sum method; 12Å cutoff for L-J forces - 12 trajectories (t=8ps, dt=0.1fs, NVE) at T=300K Correction Factor=0.985 (CHARMM22) Chung et al, PNAS, 102, 612 (2005) Cheatum et al, JCP, 120, 8201 (2004)
II.11Coarse-grained (CG) descriptions of proteins: building the CG model I. Coarse-grained model for P-selectin: • Step 1: creating structure file of Ca & centers of mass of residues from PDB structure of P-selectin (www.rcsb.org) • mimicking hydrogen bonds • modeling S-S bonds Step 2: computing potential energy of ob-tained conformation of P-selectin: Step 3: follow Langevin Dynamics K. Dill et al, Protein Sci, 4, 561 (1995); D. Thirumalai, D. Klimov, PNAS, 97, 2544 (2000); J. Bryngelson et al,Protein, 21, 167 (1995); M. Karplus, A. Sali, Curr Opin Struct Biol, 5, 58 (1995); Kolinski, J. Skolnick, Polymer, 45, 511 (2004)
II.11Coarse-grained (CG) descriptions of proteins: force field I. Scales of energy/length/mass/time: - hydrophobic interaction (1.25 kcal/mol); -bond length (3.8 Å) - residue mass ( ); - the timescale (~3ps ) II. Harmonic connectivity potentials: III. Dihedral angle potential: turn β-sheet α-helix
II.11Coarse-grained (CG) descriptions of proteins: force field IV. Hydrogen bond potential: V. Potential for native contacts: bij-contact interaction matrix; -contact distance (Kolinski et al, JCP, 98, 7420 (1993)) VI. Nonbonded potential: VII. Unfolding/unbinding trajectories:
II.12Coarse-grained (CG) descriptions of proteins: forced rupture of the P-selectin-sPSGL noncovalent bond N-terminus of P-selectin C-terminus of sPSGL-1
III.1Computer simulations using graphics cards: basic facts • CPU: • Advantages: • can perform very sophisticated flow control (IF/THEN – cycles, conditionals, etc.) • single CPU cores are faster (3.0GHz) or faster • a lot of well-tested (commercial) software is available • Disadvantages: • has no more than 6 cores (today) • parallel programming on CPU is difficult • data exchange b/w nodes in a cluster occurs through relatively slow network • GPU: • Advantages: • up to 240 cores (GeForce 280, Tesla C1060) • easy to write parallel codes with CUDA language (extension of C) • memory bandwidth is high because all cores are local • Disadvantages: • single core clock is not as fast as CPU core (0.5GHz) • can’t be used for applications with sophisticated flow control • not many software available for GPU (started in ~2006)
III.2Computer simulations using graphics cards: hardware • GPU: • highly parallel • multythreaded • manycore processor • Historically,GPU • was designed for compute-intensive, highly parallel computation • more transistors are devoted to data processing rather than data caching and flow control • well-suited for problems that involve data-parallel computations, i.e. the same program is executed on many data elements in parallel (MD, coarse-grained simulations).
III.3Computer simulations using graphics cards: programming mode • CUDA: • consist of a minimal extension to the C language • parallel programming model and software environment • designed to overcome the challenge of creating software that transparently scales on manycore processors • Example: • vecAdd() function is called N times on GPU. • <<<1, N>>> means that the procedure runs in one 1D block with N threads. • i = threadIdx.x is a way for thread to identify, which element of the vector it should work with.
III.4Computer simulations using graphics cards: software organization Thread hierarchy: • thread index is a 3D vector, so that threads can be identified using a 1D, 2D or 3D index forming 1D, 2D or 3D thread block • 2. multiple blocks can be organized into 1D or 2D grid. Each block can be identified within grid using 1D, 2D or 3D block index • 3. all threads in one block are doing the same thing with different data • 4. threads can synchronize and pass data to each other within block using shared memory • 5. threads can pass the data to the CPU through the GPU global memory
III.5Computer simulations using graphics cards: software organization Memory hierarchy: • each thread has it own local memory (in cache) for storing temporary variables • each block has shared memory (in cache) for synchronizing the threads within block • device has global memory that can be accessed from any thread on the GPU • local memory and shared memory are much faster than global, but they are available only locally and exists only during the lifetime of a thread or a block. • 5. global memory is relatively slow, and can be also accessed from CPU.
III.6Computer simulations using graphics cards: hardware model Hardware organization: • each device have Nmultiprocessors • multiprocessors can share data only through device (global) memory • A multiprocessor have M processors (ALUs) • 4. number of threads that can run at the same time is equal to NxM; for GeForce 8800GT, M=8, N=14 (number of processors = 112!!!) • 5. one block can run only on one multiprocessor so the number of blocks in program should be at least equal to the number of multiproces-sors on the device.
III.7Computer simulations using graphics cards: applications • MD and CG simulations are suitable for GPU: • same potential (force field) for all atoms • (beads) • integration scheme is explicit • systems have huge number of atoms (beads) • Example: • Rouse chain model of homopolymer • Lennard-Jones potential (self-avoidance) • 1,000,000 time steps for each chain • Intel Xeon 2GHz Dual Core (CPU, ~$350) vs GeForce 8800 GT (GPU, ~$130)