Dynameomics

Valerie Daggett, Bioengineering Department, Biomedical and Health Informatics, University of Washington, Seattle, WA. Topics: background, high-throughput (HT) MD, target selection, and database mining (native DB, reference unfolded peptide DB, unfolding protein DB, prion protein and amyloid DB).

Presentation Transcript


  1. Dynameomics. Valerie Daggett, Bioengineering Department, Biomedical and Health Informatics, University of Washington, Seattle, WA. Outline: background; high-throughput (HT) MD; target selection; database mining (native DB, reference unfolded peptide DB, unfolding protein DB, prion protein and amyloid DB).

  2. Central dogma of biology: DNA (genomes; the information to make protein), e.g. …AAAGTCCAGGCAGAATATAATTCTATAAAG GGAACTCCTTCAGAGGCTGAAATCTTT… → transcription → RNA (the template to make protein) → translation → protein, e.g. …LEVVAATPTSLLISWDAPAVTVRYYTYGETGGNSPVQEFTVPGS… → function, phenotype → life.

  3. Central dogma of biology, annotated: motion is critical. DNA (genomes; the information to make protein), e.g. …AAAGTCCAGGCAGAATATAATTCTATAAAG GGAACTCCTTCAGAGGCTGAAATCTTT… → transcription → RNA (the template to make protein) → translation → protein, e.g. …LEVVAATPTSLLISWDAPAVTVRYYTYGETGGNSPVQEFTVPGS… → function, phenotype → life.
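The transcription and translation steps in the central dogma can be sketched in a few lines of Python. This is a minimal illustration using the standard genetic code; it assumes the slide's DNA fragment is the coding strand read in frame 1, and the names `transcribe`, `translate` and `CODON_TABLE` are hypothetical. The DNA and protein fragments on the slide are illustrative and need not come from the same gene.

```python
# Standard genetic code: codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"   # first base T
    "LLLLPPPPHHQQRRRR"   # first base C
    "IIIMTTTTNNKKSSRR"   # first base A
    "VVVVAAAADDEEGGGG"   # first base G
)
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def transcribe(dna: str) -> str:
    """Coding-strand DNA -> mRNA (T replaced by U)."""
    return dna.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """mRNA -> one-letter protein sequence, frame 1, stopping at the first stop codon."""
    seq = mrna.upper().replace("U", "T")  # look codons up in the DNA-alphabet table
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

print(translate(transcribe("AAAGTCCAGGCAGAATATAATTCTATAAAG")))  # KVQAEYNSIK
```

The lookup table is generated rather than typed out so that all 64 codons come from one 64-character string in a fixed base order.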

  4. Dynamic cleft discovered through MD Cytochrome b5 Storch et al., Biochem, 1995, 1999a,b, 2000

  5. Protein folding is embedded in the central dogma (genomes: DNA → transcription → RNA → translation → protein → life). The protein folding problem: D, denatured (biologically inactive) → ? → N, native (biologically active).

  6. Protein folding is embedded in the central dogma (genomes: DNA → transcription → RNA → translation → protein → life). The protein un/folding problem: what is the process or pathway (?) connecting D, denatured (biologically inactive), and N, native (biologically active)?

  7. Unfolding pathway of CI2 in water [simulation contains 500,000 structures]. [Figure: 373 K; N (94 ns), TS (21 ns), D (30 ns), D (94 ns).] • MD unfolding process in good agreement with experiment • TS in quantitative agreement with experiment (prediction) • Residual structure in D verified experimentally • Atomic-level characterization of transition, intermediate and denatured state ensembles. Daggett and Fersht, TIBS; PNAS; and others.

  8. Conformational ensembles in folding: N, TS and D (100 simulations). Day and Daggett, PNAS, 2005.

  9. Refolding by quenching the TS. [Plot: Cα RMSD (Å, 0–8) vs. time (ns, 0–3) for trajectories started from 'D' and from the TS, with a native control, N.] Brute-force MD can refold proteins from the TS. Plan: predict TS structures, perform MD simulations, and solve the protein folding problem. But we need information to predict the TS (the TS is easier than D). DeJong et al., JMB 2002.
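The quench-refolding plot tracks Cα RMSD from the native structure over time. A common way to compute that quantity is a Kabsch least-squares superposition followed by the RMSD; the sketch below assumes NumPy and (N, 3) arrays of Cα coordinates, and `kabsch_rmsd` and `rmsd_trace` are hypothetical names, not the authors' analysis code.

```python
import numpy as np

def kabsch_rmsd(mobile, reference):
    """RMSD (input units, e.g. Å) after optimal rigid-body superposition of mobile onto reference."""
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(reference, dtype=float)
    P = P - P.mean(axis=0)                      # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                                 # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # forbid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T     # optimal rotation taking P onto Q
    diff = P @ R.T - Q                          # rows are atoms, so rotate with R transposed
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def rmsd_trace(frames, reference):
    """RMSD of each trajectory frame relative to a reference (e.g. native) structure."""
    return [kabsch_rmsd(frame, reference) for frame in frames]
```

Plotting `rmsd_trace(frames, native)` against simulation time gives a curve of the kind shown on the slide.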

  10. Reversible folding and unfolding at 348 K in water, the Tm of the protein. [Figure: crystal structure and snapshots at 5 ns, 25.6 ns and 200 ns, with residues A16, I20, L49 and I57 labeled; distances 4.8 Å, 4.0 Å and 8.9 Å.] • Refolding = unfolding: the detailed pathway is reversed • A16/I20 orientation maintained. Day and Daggett, JMB, 2007; McCully et al., Biochem, in press (EnHD).

  11. Reverse the central dogma of biology: determine pathways for many proteins and ascertain general features of the process or pathway (?) from D, denatured (biologically inactive), to N, native (biologically active); then decode genomes, working back from protein to RNA to DNA.

  12. Proteins • Proteins are life’s machines, tools and structures • Many jobs, many shapes, many sizes

  13. Dynameomics goals: 1. Perform HT MD simulations of representatives of all folds (41,000 structures in the PDB → 1130 fold families). 2. Construct a novel relational/multidimensional database to house these data and facilitate discovery: • Native state: information relevant to disease and drug design targets, SNPs • Unfolding: disease, and a solution to the protein folding problem. Infrastructure: NERSC • DOE • Unix • The Wall • Windows • Athena @ MS. Beck et al., Prot Eng Des Sel, 2008.

  14. Fold space. [Plots: population vs. fold rank (0–200) and cumulative coverage (0–1.0) vs. fold rank (0–1000).] 30 folds represent ~50% of known protein structures. • Divide protein structures into folds • Consensus of SCOP, CATH and Dali • Rank folds based on population • Choose a representative protein from each fold. Day et al., Prot. Sci., 2003.
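The cumulative coverage curve follows directly from ranking folds by population and summing. A minimal sketch, assuming a plain list of per-fold structure counts; `coverage_curve` and `folds_needed` are hypothetical names, and apart from 424 (the CheY fold population quoted in slide 15) the numbers in the example are invented for illustration.

```python
from itertools import accumulate

def coverage_curve(populations):
    """Cumulative fraction of all structures covered by the top-k folds, k = 1..n."""
    ranked = sorted(populations, reverse=True)   # fold rank 1 = most populated fold
    total = sum(ranked)
    return [count / total for count in accumulate(ranked)]

def folds_needed(populations, target=0.5):
    """Smallest number of top-ranked folds that together cover `target` of all structures."""
    for k, fraction in enumerate(coverage_curve(populations), start=1):
        if fraction >= target:
            return k
    return len(populations)

# Hypothetical fold populations (only 424 comes from the slides).
example = [424, 300, 150, 80, 30, 10, 6]
print(folds_needed(example))  # 2
```

Applied to real fold populations, the same calculation yields the slide's observation that about 30 folds cover half of the known structures.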

  15. Target selection. Selection criteria: • structure quality • protein size • experimental data available • biomedical relevance • globular proteins first, then membrane proteins. Example: CheY [PDB:3chy], fold rank 2, population 424. Amanda Jonsson

  16. Targets with biomedical relevance: • Amyloid-β precursor protein: Alzheimer's disease • HIV-1 protease: HIV • Glutathione S-transferase: chemotherapy resistance • Triosephosphate isomerase: neurodegeneration • MAP30: HIV and cancer • Serum amyloid P component: amyloidosis

  17. Top 30 folds: these represent ~50% of all known protein structures. Data and metadata for the 'Top 30' at www.dynameomics.org

  18. Dynameomics protocol • One 298 K native state simulation (21–60 ns; average 26 ns) • At least three 310 K native simulations (for some targets) • At least five 498 K unfolding simulations: two long simulations (at least 31 ns; average 36 ns) and at least three short simulations (2 ns; average 14 ns) • (5 simulations ~ 100 simulations) Trade-off: sampling of different folds and different sequences, as opposed to more thorough sampling of an individual protein (~400 simulations of PrP).

  19. Validation of trajectories • Computational checks: energy conservation • Native state: NOEs, S2 order parameters from NMR relaxation experiments, etc. • Unfolding process: Φ values, residual structure in the denatured state, intermediates. David Beck

  20. Native state simulations: ubiquitin • NOEs (2727): MD 95.2%, crystal structure 94.4% • Proton chemical shifts: R = 0.98
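One common way to score a simulation against NOE data is to r⁻⁶-average each restrained proton-proton distance over the trajectory and count the restraints whose averaged distance stays within the experimental upper bound. The slide does not say exactly how its percentages were computed, so treat this as an assumed procedure; the function names, the input layout and the `tolerance` parameter are hypothetical.

```python
def r6_average(distances):
    """<r^-6>^(-1/6) ensemble average of one interproton distance (Å)."""
    return (sum(r ** -6 for r in distances) / len(distances)) ** (-1.0 / 6.0)

def percent_satisfied(restraints, tolerance=0.0):
    """Percentage of NOE restraints whose r^-6-averaged distance is at or below its upper bound.

    restraints: iterable of (per_frame_distances, upper_bound) pairs, all in Å.
    tolerance: hypothetical slack (Å) added to every upper bound.
    """
    restraints = list(restraints)
    met = sum(1 for dists, upper in restraints if r6_average(dists) <= upper + tolerance)
    return 100.0 * met / len(restraints)
```

The r⁻⁶ weighting mirrors the distance dependence of the NOE itself, so brief close approaches dominate the average. For example, `r6_average([2.0, 10.0])` is about 2.2 Å rather than the arithmetic mean of 6 Å.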

  21. Comparison with available NMR • The 27 proteins with available data (by PDB code) are: 1aa3, 1c06, 1d1r, 1gle, 1kjs, 2ife, 3gcc, 1bf0, 1cmz, 1cok, 1cz4, 1d1n, 1d8v, 1enh, 1fad, 1fvl, 1fzt, 1ght, 1i11, 1iyu, 11dl, 1mut, 1sso, 1tfb, 1ubq, 1uxc, 3chy. • Proton chemical shifts from MD structures were calculated with SHIFTS (Osapay and Case, 1991). The 15 proteins with data available (by PDB code): 1mjc, 1hcc, 1ubq, 1baz, 1cz4, 1a2p, 1e65, 1ill, 3chy, 1ght, 1cmz, 1gpr, 1byl, 1fzt, 1b10.

  22. Dynameomics status • Dataset includes over 500 proteins and nearly 4000 simulations, for a total of >60 μs of simulation time and >65M structures • >64 TB, not including 637 amyloid simulations

  23. Comprehensive data/metadata In theory, build a warehouse Andrew Simms

  24. Build a data warehouse (not so easy) • The data set is large (~6 months to load protein coordinates) • Storing protein data only, no solvent data • Only a single simulation per table (10M–90M rows) • 4000 simulations × 10 analyses right now (40K tables) • And we are growing at a rate of ~2000 simulations per year (10K tables) • Approach for scaling: multiple servers, multiple databases per server, 100 targets per database. Andrew Simms; Simms et al., Prot Eng Des Sel, 2008
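The scaling approach (several servers, several databases per server, 100 targets per database, one table per simulation and analysis) amounts to a deterministic mapping from a target and simulation to a storage location. The sketch below illustrates that idea only. The 100-targets-per-database figure comes from the slide; `DATABASES_PER_SERVER`, the table-naming scheme and all function names are hypothetical.

```python
from dataclasses import dataclass

TARGETS_PER_DATABASE = 100  # from the slide
DATABASES_PER_SERVER = 4    # hypothetical value, not given on the slide

@dataclass(frozen=True)
class Location:
    server: int
    database: int
    table: str

def locate(target_index: int, simulation_id: int, analysis: str) -> Location:
    """Map a (target, simulation, analysis) triple to a server, database and table."""
    database = target_index // TARGETS_PER_DATABASE
    server = database // DATABASES_PER_SERVER
    table = f"sim{simulation_id:05d}_{analysis}"  # hypothetical naming: one table per simulation per analysis
    return Location(server, database, table)

def table_count(n_simulations: int, n_analyses: int) -> int:
    """Number of tables when every simulation gets its own table for each analysis."""
    return n_simulations * n_analyses

print(table_count(4000, 10))  # 40000, the ~40K tables quoted on the slide
print(locate(250, 7, "contacts"))
```

Because the mapping is pure arithmetic, any client can find a table without consulting a central catalog. The trade-off is that the number of tables grows linearly with simulations × analyses.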

  25. Though our data set may be large, our requirements are typical in the scientific world: • large, complex and often multidimensional data sets • analytical rather than transactional processing • need for performance and storage efficiency. Multidimensional cubes for complex data analysis: on-line analytical processing (OLAP); MOLAP, multidimensional OLAP. Catherine Kehl
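The central OLAP operation, aggregating a measure over any chosen subset of dimensions (a roll-up), can be illustrated with a small in-memory fact table. This is only a sketch of the concept. The dimension names (`protein`, `temperature`, `residue`), the `rmsf` measure and the values are hypothetical. A real MOLAP server precomputes and stores such aggregates in a cube rather than scanning rows on every query.

```python
from collections import defaultdict
from statistics import mean

def roll_up(facts, dims, measure):
    """Average `measure` over all fact rows that share the same values of `dims`."""
    groups = defaultdict(list)
    for row in facts:
        key = tuple(row[d] for d in dims)
        groups[key].append(row[measure])
    return {key: mean(values) for key, values in groups.items()}

# Hypothetical fact rows: one per (protein, temperature, residue) with an invented per-residue measure.
facts = [
    {"protein": "3chy", "temperature": 298, "residue": 57, "rmsf": 0.5},
    {"protein": "3chy", "temperature": 298, "residue": 58, "rmsf": 1.5},
    {"protein": "3chy", "temperature": 498, "residue": 57, "rmsf": 4.0},
]
print(roll_up(facts, ["protein", "temperature"], "rmsf"))
```

Choosing a different `dims` list (for example, `["residue"]`) slices the same facts along another axis of the cube.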

  26. Molecular dynamics • MD provides atomic resolution of native dynamics. [Movie: 3chy, waters and hydrogens hidden.]

  27. Molecular dynamics • MD provides atomic resolution of native dynamics. [Movie: native state simulation of 3chy at 298 K, Asp 57.]

  28. Native-state dynamics: helix motion in CheY at 298 K. [Figure: standard deviation of helix angles (degrees) for the α2:α3 and α3:α4 pairs; snapshots at 0, 5, 10, 15 and 20 ns with helices α2–α5 labeled.] α2 and α3 are dynamic; α4 and α5 form a stable structural scaffold.
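Helix-angle statistics like these can be approximated from Cα coordinates alone. This sketch assumes each helix axis is the vector from the centroid of its first four Cα atoms to the centroid of its last four. That is a simple approximation, because the slide does not state how axes were defined. The sketch then reports the population standard deviation of the interhelix angle across frames. All function names are hypothetical.

```python
import math
from statistics import pstdev

def centroid(points):
    """Mean position of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[axis] for p in points) / n for axis in range(3))

def helix_axis(ca_coords, window=4):
    """Approximate helix axis: centroid of the first turn -> centroid of the last turn."""
    start = centroid(ca_coords[:window])
    end = centroid(ca_coords[-window:])
    return tuple(e - s for s, e in zip(start, end))

def angle_deg(u, v):
    """Angle between two 3-vectors in degrees."""
    cosine = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))  # clamp rounding error

def helix_angle_sd(frames_a, frames_b):
    """Standard deviation (degrees) of the angle between two helices over a trajectory.

    frames_a, frames_b: per-frame lists of Cα (x, y, z) coordinates for helix A and helix B.
    """
    angles = [angle_deg(helix_axis(a), helix_axis(b)) for a, b in zip(frames_a, frames_b)]
    return pstdev(angles)
```

A rigid helix pair gives a standard deviation near zero, while pairs involving a mobile helix such as α2 or α3 would give larger values.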

  29. CheY binding partners. Structures of CheY complexes (CheY-CheZ, CheY-CheA, CheY-FliM) show binding to α4 and α5. [Figure: α4:α5 distances between ends of helices; snapshot at 20 ns with α2–α5 labeled.] • Functionally important face of the protein is stable • Asp 57, site of phosphorylation • Motion in α2 and α3 does not disrupt function: an entropy sink? Rudesh Toofanny

  30. Catechol O-methyltransferase (COMT) compared with CheY. • Both proteins: rank 2, Rossmann fold • COMT polymorphism: Val108 → Met • 108M: increased risk for diseases such as breast cancer and OCD, but improved memory • MD of 108M: α6 and α7 are mobile in COMT, too • In 108M, movement of α6 propagates 16 Å and disrupts the active site [snapshots at 15 ns and 30 ns]. Importance of characterizing dynamics. Rutherford et al., Biochem. 2006

  31. SNP-induced changes in COMT. [Figure: native-like and intermediate structures of 108V and 108M COMT, with α6, α7 and α8 labeled.] Mutation to Met leads to loosening of the active site. Followed up with CD, NMR, crystallography and fluorescence. Rutherford et al., BBA, JMB, JMB, Biochem, 2008

  32. The SNP leads to a broader conformational ensemble at 310 K. [Figure: Cα-RMSD distributions (Å) relative to the starting structure for 108V and 108M COMT at 25 °C, 37 °C and 50 °C.]

  33. SNP-omics. COMT: the SNP leads to subtle differences in packing near the mutation site that propagate to the active site. Similar behavior is now seen in 4 other members of this methyltransferase family (fold rank 2). These effects are NOT apparent in static structures. Large-scale effort to investigate dynamic effects of SNPs, starting with 80 proteins: multiple 310 K simulations added to the Dynameomics protocol.

  34. SLIRP • Structural Library of Intrinsic Residue Propensities (SLIRP), to determine structural propensities for design • GGXGG peptides in water at 298 K and 498 K and in 8 M urea at 298 K (multiple simulations, 100 ns) • Unbiased coil library, main chain and side chain, exhaustive sampling • Dynamic protein side-chain rotamer library • Rotamer populations, improved over static libraries derived from crystal structures • S2axis, waiting times between rotamers
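Rotamer populations and waiting times of the kind listed for SLIRP can be computed from a χ1 time series. Each angle is binned into one of the three staggered wells, and the uninterrupted runs in each well are then measured. This is a hedged sketch. The bin edges at 0°, 120° and 240°, the g+/t/g- labels (naming conventions differ between groups) and the frame spacing `dt_ps` are assumptions, not SLIRP's definitions.

```python
from collections import Counter
from itertools import groupby

STATES = ("g+", "t", "g-")  # hypothetical labels; conventions vary

def rotamer_state(chi1: float) -> str:
    """Bin a chi1 angle (degrees) into one of three staggered wells (assumed edges 0/120/240)."""
    a = chi1 % 360.0
    if a < 120.0:
        return "g+"
    if a < 240.0:
        return "t"
    return "g-"

def populations(chi1_series):
    """Fraction of frames spent in each rotamer state."""
    counts = Counter(rotamer_state(x) for x in chi1_series)
    n = len(chi1_series)
    return {s: counts[s] / n for s in STATES}

def dwell_times(chi1_series, dt_ps=1.0):
    """(state, duration in ps) for every uninterrupted run in a single rotamer state."""
    states = [rotamer_state(x) for x in chi1_series]
    return [(s, sum(1 for _ in run) * dt_ps) for s, run in groupby(states)]
```

Averaging the durations per state gives a mean waiting time before a rotamer transition. This dynamic quantity cannot be obtained from static crystal-structure libraries.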

  35. “Random coil” peptides: Ala. [Figure: Φ (°) distributions for Ala in GGAGG, protein MD and protein PDB, with region populations 16%, 26%, 4%, 26%, 24%.] Chemical shifts (HN, Hα, Hβ, NH, Cα, Cβ and C′) for GGAGG are very close to the corresponding experimentally derived values (R = 0.999 over 28 points: 7 atoms × 4 independent simulations).
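The R values quoted for chemical shifts are correlation coefficients between simulated and experimental values. For reference, a Pearson correlation is written out below in plain Python. The slides do not name the statistic, so Pearson R is an assumption, and `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

On Python 3.10 and later, `statistics.correlation` computes the same quantity. For the slide's comparison, `x` would hold the 28 predicted shifts and `y` the matching experimental values.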

  36. Chemical shifts for GGXGG: MD vs. experiment. Predictions calculated with ShiftX v1.0 (Neal et al., 2003, J Biomol NMR); experimental data taken from Schwarzinger et al., J Biomol NMR, 2000.

  37. “Random coil” peptides vs. protein: Ala. [Figure: Φ (°) distributions for GGAGG, protein MD and protein PDB.] • Ala distributions in protein MD (188 proteins) are similar to the PDB; Ala in GGXGG is different • GGAGG vs. experimental helix propensities: R = 0.28 • Protein MD vs. helix propensities: R = 0.92 • Host-guest studies reflect the host more than the guest.

  38. Mining the database • SLIRP, to determine structural propensities for design • Dynamic region conserved among members of a protein family: in one case critical for biological function, in another a mutation in the region leads to disease • Inflexible regions across 188 proteins: identified novel structural elements associated with loop structure (antifreeze). Rudesh Toofanny, Noah Benson

  39. Solving the protein folding problem? [Scheme: unfolding N → TS → D; refolding D → TS → N, with the unknowns marked '?'.] • Data mining of the Dynameomics database for information to predict TS structures • Bootstrapping to native state prediction by refolding from predicted transition state structures. Dustin Schaeffer

  40. Contact analysis • Determined contact probabilities by amino acid type and by separation between the amino acids, from mining of the Dynameomics DB. [Figure: probability of contacts i → i+x by residue type 1, residue type 2 and residue separation (i → i+1, i → i+2, i → i+3), with Leu-Leu at i → i+3 as an example.]
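A contact-probability table like this one can be built by counting how often each residue-type pair at each sequence separation is in contact across simulation frames. The sketch assumes contacts have already been determined for each frame, since the slide does not give the contact criterion. It uses hypothetical function and argument names and only illustrates the counting, not the actual Dynameomics mining query.

```python
from collections import defaultdict

def contact_probabilities(sequence, frames, max_sep=3):
    """Estimate P(contact) keyed by (residue type i, residue type j, separation j - i).

    sequence: one-letter amino-acid string.
    frames: list of sets of (i, j) residue-index pairs in contact, with i < j.
    """
    hits = defaultdict(int)
    trials = defaultdict(int)
    for i in range(len(sequence)):
        for sep in range(1, max_sep + 1):
            j = i + sep
            if j >= len(sequence):
                break
            key = (sequence[i], sequence[j], sep)
            trials[key] += len(frames)                                  # opportunities for a contact
            hits[key] += sum((i, j) in contacts for contacts in frames)  # observed contacts
    return {key: hits[key] / trials[key] for key in trials}
```

Pooled over many proteins and frames, entries such as `("L", "L", 3)` give the Leu-Leu, i → i+3 probability shown on the slide.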

  41. Coordinates from contacts: most probable contacts → distance geometry (DG) → protein structure. A set of distances for a particular sequence can be converted into coordinates by singular value decomposition (SVD) of a distance matrix: distance geometry.
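The distance-to-coordinates step can be illustrated with classical multidimensional scaling, the textbook form of this embedding. The squared distance matrix is double-centered into a Gram matrix, and its three largest eigencomponents are kept. For a symmetric positive semidefinite Gram matrix, this eigendecomposition coincides with the SVD mentioned on the slide. This sketch assumes NumPy and a complete, noise-free distance matrix, and `coords_from_distances` is a hypothetical name. Real distance-geometry programs work from incomplete distance bounds and refine the result.

```python
import numpy as np

def coords_from_distances(D, dim=3):
    """Embed points in `dim` dimensions from a full matrix of pairwise distances."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    G = -0.5 * J @ (D ** 2) @ J                      # Gram matrix of centered coordinates
    evals, evecs = np.linalg.eigh(G)                 # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:dim]              # indices of the largest eigenvalues
    scale = np.sqrt(np.clip(evals[top], 0.0, None))  # clip tiny negative values from round-off
    return evecs[:, top] * scale                     # (n, dim) coordinates
```

The result is determined only up to rotation, translation and reflection. Distances cannot distinguish a structure from its mirror image, so chirality has to be checked separately.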

  42. TS predictions for Fyn SH3: the prediction from mined data via distance geometry (too compact) compared with the MD-generated TS ensemble, RMSD = 3.8 ± 0.37 Å.

  43. Solving the folding problem with MD. DB info + DG: we have TSs for 80% of known protein structures. MD: we have refolded from the TS. High-throughput structure prediction should be possible by refolding from transition states: sequence → TS structure → N structure.

  44. Dynameomics conclusions • Native state simulations to probe protein function, for drug design and SNP-omics • Unfolding simulations for structure prediction, protein design/redesign and unfolding diseases • SLIRP (Structural Library of Intrinsic Residue Propensities): intrinsic main-chain conformations, dynamic side-chain rotamer library, coil library • Dynameomics.org • Noah Benson
