
Folding@Home and Genome@home: Protein folding and design with distributed computing

Stefan Larson, Pande Group, Dept. of Chemistry and Biophysics Program, Stanford University.


Presentation Transcript


  1. Folding@Home and Genome@home: Protein folding and design with distributed computing. Stefan Larson, Pande Group, Dept. of Chemistry and Biophysics Program, Stanford University

  2. Credits
     • Pande Group: Dr. Vijay Pande
     • Folding@home: Siraj Khaliq, Young Min Rhee, Michael Shirts, Chris Snow, Eric Sorin, Bojan Zagrovic, Sidney Elmer
     • Genome@home: Stefan Larson, Vishal Vaidyanathan, Amit Garg, Guha Jayachandran
     • Collaborators: Adam Beberg (Mithral), Dr. Jed Pitera (IBM), Dr. Bill Swope (IBM), Dr. Jay Ponder (Wash U), Folding@home users, Dr. John Desjarlais (Xencor), Jeremy England (Harvard), Genome@home users

  3. Molecular simulations in computational biology

  4. Common challenges of computational biology
     • Problems related to folding
       • Structure prediction
       • Binding
       • Protein-protein interaction
     • Issues:
       • Models
         • Force fields (e.g. Charmm, Amber)
         • Lots of parameters, constrained by experiment: good enough?
       • Sampling
         • Can simulate 1 ns = 10^-9 sec in a day
         • Need to sample 10^4 to 10^6 ns!
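The sampling gap on this slide can be made concrete with a little arithmetic. The sketch below is only an illustration: the function name `wall_clock_years` and the 365.25-day year are assumptions, while the 1 ns/day rate and the 10^4 to 10^6 ns targets are the figures quoted on the slide.

```python
DAYS_PER_YEAR = 365.25  # assumption: average calendar year


def wall_clock_years(target_ns, ns_per_day=1.0):
    """Years of wall-clock time for one processor to simulate target_ns.

    ns_per_day defaults to the slide's figure of 1 ns of simulated time per day.
    """
    return target_ns / ns_per_day / DAYS_PER_YEAR


if __name__ == "__main__":
    # The slide's sampling targets: 10^4 ns and 10^6 ns.
    for target_ns in (1e4, 1e6):
        years = wall_clock_years(target_ns)
        print(f"{target_ns:.0e} ns -> about {years:,.0f} years on one processor")
```

At the quoted rate, a single processor would need decades to millennia, which is the motivation for spreading the work over many machines.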

  5. Why simulate?
     • Physics → chemistry → biology
       • Start from the laws of physics and chemistry, explain the properties of biomolecules
     • Experiments: less detailed
       • Spectroscopies, FRET, NMR, etc.
       • Crystals are static
     • Simulations: very detailed
       • Femtosecond time resolution
       • Angstrom spatial resolution
       • Much like having thousands of completely detailed single-molecule experiments

  6. Goals
     • Can we characterize folding computationally?
       • Accurate rates
       • Detailed mechanisms
     • Can we design proteins?
       • Specific stable structure
       • Retention of function

  7. Challenges of simulation
     • Sampling (tractability)
     • Analysis (insight)
     • Models (force fields)

  8. Simulating protein folding

  9. The challenges of protein folding simulation
     • How can we overcome the long timescales?
       • Fastest proteins fold in 10s to 100s of μs
       • Simulations are orders of magnitude shorter
     • Are force fields good enough?
       • Would we reach the native state (without native-state information)?
       • Would we quantitatively predict folding rates, ΔG, etc. under experimental conditions (30 °C)?
     • Can we use simulation to learn about folding?
       • By what mechanism do they fold?
       • Do we agree with any folding theories?

  10. Relevant timescales
      [Figure: logarithmic time axis running from 10^-15 s (femto) through 10^-12 (pico), 10^-9 (nano), 10^-6 (micro) and 10^-3 (milli) to 10^0 seconds. Events marked: bond vibration, isomerisation, water dynamics, helix formation, fastest folders, typical folders, slow folders. Annotations: "MD step", "long MD run", "where we need to be", "where we'd love to be".]
      • 16 orders of magnitude range
      • Femtosecond timesteps
      • Need to simulate micro- to milliseconds

  11. Traditional parallel MD: few, long trajectories
      • Divide the force calculations between processors
      • Spatial decomposition for work division
      • Requires fast communication
      • Shown: T3E supercomputer; IBM Blue Gene; Duan and Kollman, Science (1998)
      Problem: we need WAY more time than is available at current supercomputer centers

  12. Our method: many, short trajectories
      • Advantages of exponential kinetics:
        • Number that fold in time t: M f(t) = M[1 − exp(−kt)] ≈ Mkt for small kt
        • With M ~ 10,000 processors, k ~ 1/(10,000 ns) and t ~ 20 ns per processor, expect Mkt ~ 20 simulations to fold
      • Computationally economical
        • Doesn't waste resources on communication
        • Natural for large, heterogeneous clusters
      • Important for folding
        • Heterogeneity of paths, statistics
        • Ergodicity
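The estimate on this slide can be checked numerically. The following is a minimal sketch that assumes independent trajectories and single-exponential (two-state) folding kinetics, as the slide's formula implies; the function names are illustrative and not taken from the Folding@home code.

```python
import math


def expected_folds(num_trajectories, rate_per_ns, time_ns):
    """Expected number of folded trajectories: M * [1 - exp(-k t)]."""
    return num_trajectories * (1.0 - math.exp(-rate_per_ns * time_ns))


def linear_estimate(num_trajectories, rate_per_ns, time_ns):
    """Small-kt approximation from the slide: M * k * t."""
    return num_trajectories * rate_per_ns * time_ns


def prob_at_least_one_fold(num_trajectories, rate_per_ns, time_ns):
    """Chance that at least one of M independent trajectories folds by time t."""
    return 1.0 - math.exp(-num_trajectories * rate_per_ns * time_ns)


if __name__ == "__main__":
    # Slide values: M ~ 10,000 processors, k ~ 1/(10,000 ns), t ~ 20 ns each.
    M, k, t = 10_000, 1.0 / 10_000, 20.0
    print("exact expectation:   ", expected_folds(M, k, t))
    print("linear approximation:", linear_estimate(M, k, t))
    print("P(at least one fold):", prob_at_least_one_fold(M, k, t))
```

With kt = 0.002 the linear approximation and the exact expression agree to within about 0.1%, so the slide's figure of roughly 20 folding events holds even though each trajectory is far shorter than the mean folding time.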

  13. http://folding.stanford.edu

  14. Distributed computing
      • Server: sends and receives the work units (essentially just protein structures and sequences). It verifies, collates and stores the returned data, completes initial analyses, and computes user statistics for the website.
      • Client: uses the spare CPU cycles on a user's computer to run the simulation algorithm on the assigned structure. Results are automatically returned and exchanged for a new work unit on a daily basis.
      • Clients run at home, in the lab or office… anywhere
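The exchange described on this slide (return a result, receive a new work unit) can be sketched as a toy in-memory loop. Everything here is hypothetical: `WorkUnit`, `Server`, `run_client` and `fake_simulate` are illustrative names, the "verification" is a trivial placeholder, and nothing below reflects the actual Folding@home protocol or network layer.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class WorkUnit:
    unit_id: int
    structure: str  # placeholder for a protein structure or sequence


class Server:
    """Toy server: hands out work units, stores results, counts user credits."""

    def __init__(self, units):
        self.pending = deque(units)
        self.results = {}   # unit_id -> returned data
        self.credits = {}   # user -> number of accepted work units

    def assign(self):
        """Return the next pending work unit, or None when the queue is empty."""
        return self.pending.popleft() if self.pending else None

    def submit(self, user, unit, result):
        """Verify and store a result, then exchange it for a new work unit."""
        if isinstance(result, str) and result:  # placeholder verification
            self.results[unit.unit_id] = result
            self.credits[user] = self.credits.get(user, 0) + 1
        else:
            self.pending.append(unit)  # requeue a unit whose result was rejected
        return self.assign()


def run_client(server, user, simulate):
    """Toy client: simulate each assigned unit and trade the result for the next."""
    unit = server.assign()
    while unit is not None:
        unit = server.submit(user, unit, simulate(unit))


def fake_simulate(unit):
    """Stand-in for the simulation algorithm: just reverses the string."""
    return unit.structure[::-1]


if __name__ == "__main__":
    server = Server([WorkUnit(i, s) for i, s in enumerate(["AAA", "ABC", "XYZ"])])
    run_client(server, "alice", fake_simulate)
    print(server.results, server.credits)
```

Because each work unit is self-contained, clients never talk to one another, which is what makes the scheme tolerant of slow or intermittent machines.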

  15. Worldwide distributed computing

  16. Protein folding results

  17. What to fold? … fastest folders
      [Figure: log-scale chart with axes labelled "Nanoseconds, CPU-days" and "CPU years" for the alpha helix, beta hairpin, PPA, BBA5 and villin.]

  18. Rates: predicted vs experiment
      [Figure: log-log plot of predicted folding time (nanoseconds) against experimental measurement (nanoseconds), both axes spanning 1 to 100,000 ns, with points for PPA, alpha helix, beta hairpin, BBAW and villin.]
      Experiments:
      • villin: Raleigh, et al, SUNY, Stony Brook
      • BBAW: Gruebele, et al, UIUC
      • beta hairpin: Eaton, et al, NIH
      • alpha helix: Eaton, et al, NIH
      • PPA: Gruebele, et al, UIUC

  19. Mechanism: How did these proteins fold?
      • Form secondary structure first
        • Form helices & hairpins
        • Hierarchical, decrease in entropy
      • Collapse first
        • Hydrophobically driven
        • Need to remove water to form hydrogen bonds
      • Form rough native shape first
        • Need to find the right "topology" first
        • Then pack side chains

  20. What have we learned?
      • Can tackle sampling today
      • Force fields sufficient?
        • Folding to the native state
        • Folding rate prediction
      • Role of water
        • Explicit solvent not crucial to rate determination?
        • Compare to explicit solvent simulation
      • Universal mechanism of folding?
        • Maybe no universal mechanism: all proteins could be different?

  21. Protein design

  22. Exploring sequence space: large scale protein design
      • Stanford University: Stefan Larson, Amit Garg, Guha Jayachandran, Dr. Vijay Pande
      • Harvard University: Jeremy England
      • Xencor, Inc.: Dr. John Desjarlais
      gah.stanford.edu

  23. Utility of large sequence libraries
      • Directed evolution
        • Constrain and guide mutagenesis steps
        • Enrich starting material in "structured" sequences
      • Homology modeling
        • Broader sequence database for finding homologues
        • Generate sequence profiles for alignments, etc.
      • Drug design
        • In silico screening of peptide and peptide-mimetic ligands to reduce lead libraries for drug design

  24. Computational exploration of sequence space
      • Approach
        • Detailed all-atom protein representations
        • Standard molecular mechanics force fields
        • Generate large sequence libraries
        • Apply results to relevant biomedical questions
      • Challenges
        • Modeling backbone flexibility
        • Generating sequence diversity
        • Large scale iteration of design process

  25. Sequence prediction algorithm
      • Energy function
        • Amber/OPLS parameters
        • Implicit solvation
      • Sampling
        • Genetic algorithm
        • Structure-dependent rotamer space
      References:
      • Wollacott AM, Desjarlais JR. "Virtual interaction profiles of proteins." J Mol Biol. 2001, 313(2):317-42.
      • Raha K, Wollacott AM, Italia MJ, Desjarlais JR. "Prediction of amino acid sequence from structure." Protein Sci. 2000, 9(6):1106-19.
      • Johnson EC, Lazar GA, Desjarlais JR, Handel TM. "Solution structure and dynamics of a designed hydrophobic core variant of ubiquitin." Structure Fold Des. 1999, 7(8):967-76.
      • Desjarlais JR, Handel TM. "Side-chain and backbone flexibility in protein core design." J Mol Biol. 1999, 290(1):305-18.
      • Lazar GA, Desjarlais JR, Handel TM. "De novo design of the hydrophobic core of ubiquitin." Protein Sci. 1997, 6(6):1167-78.
      • Desjarlais JR, Handel TM. "De novo design of the hydrophobic cores of proteins." Protein Sci. 1995, 4(10):2006-18.

  26. Structural ensembles
      • Increased sequence diversity
      • Decreased identity to native sequence
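To illustrate the sampling idea only, here is a toy genetic algorithm over amino acid sequences. It is not the Genome@home algorithm: the slide's energy function (Amber/OPLS parameters with implicit solvation) and rotamer search are replaced by a deliberately crude, hypothetical score, `toy_energy`, that counts mismatches against a buried/exposed pattern. The hydrophobic residue set, population size, mutation rate and all function names are assumptions made for this sketch.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVLIMFW")  # assumption: a simple hydrophobic residue set


def toy_energy(seq, buried):
    """Toy score (lower is better): count positions where hydrophobicity
    does not match burial. A stand-in for a real force-field energy."""
    return sum((aa in HYDROPHOBIC) != is_buried for aa, is_buried in zip(seq, buried))


def design(buried, pop_size=50, generations=100, mutation_rate=0.05, seed=0):
    """Evolve sequences for a burial pattern; returns (best sequence, history).

    history[i] is the best toy energy at the start of generation i.
    Requires len(buried) >= 2 so that a crossover point exists.
    """
    rng = random.Random(seed)
    n = len(buried)
    pop = ["".join(rng.choice(AMINO_ACIDS) for _ in range(n)) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda s: toy_energy(s, buried))
        history.append(toy_energy(pop[0], buried))
        parents = pop[: pop_size // 2]  # truncation selection; parents survive unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Point mutations: each position is resampled with probability mutation_rate.
            child = "".join(
                rng.choice(AMINO_ACIDS) if rng.random() < mutation_rate else aa
                for aa in child
            )
            children.append(child)
        pop = parents + children
    best = min(pop, key=lambda s: toy_energy(s, buried))
    return best, history


if __name__ == "__main__":
    pattern = [True, False] * 6  # hypothetical alternating buried/exposed positions
    best, history = design(pattern)
    print(best, toy_energy(best, pattern), history[0], history[-1])
```

Because the better half of each generation is carried over unchanged, the best score never gets worse from one generation to the next; a production design code would likely add diversity-preserving steps, which this sketch omits.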

  27. Large scale sequence generation Diversity study:

  28. Sequence quality

  29. Designability

  30. Ongoing work
      • Characterization of sequence space
      • Natural sequence diversity (SH3)
      • Homology modeling database
      • SH3 peptide ligand design
      New directions
      • Experimental validation of designed sequences
      • Hybrid approaches to protein design
      • Design of peptide-mimetic ligands
      • Design of functional proteins
      • New design algorithms and parameter sets