
Structural Analysis "Big Science at Small Colleges" Curriculum Development Workshop



Presentation Transcript


  1. Structural Analysis "Big Science at Small Colleges" Curriculum Development Workshop • Ranyee Chiang, University of California, San Francisco

  2. Outline – Structural Analysis • Key themes • Structural visualization tools (hands-on tutorial) • Structure prediction • Structure prediction tools (hands-on activity) • Genetic variation and structures • Hemoglobin exploration (hands-on activity)

  3. Proteins have many functions • Enzymes – catalyze chemical reactions • Transport proteins – transport other molecules across cell membranes and throughout the body • Structural proteins – provide support • Receptors – regulate and coordinate bodily activities

  4. Proteins have characteristic shapes • A protein is made of a chain of amino acids (for example, RGAEEVWWPILG…) • After the protein is produced, it folds up • The 20 amino acids: Arg, His, Lys, Asp, Glu, Ser, Thr, Asn, Gln, Cys, Gly, Pro, Ala, Ile, Leu, Met, Phe, Trp, Tyr, Val

  5. sequence → structure → function • Sequence: KNGTIVTADGI… • Structure: D-hydantoinase • Function: cleavage of a 5-membered cyclic diamide

  6. same sequences → same structures • similar sequences → similar structures (RGAEEVWWPILG… and RGRGEVWWPILG…) → Way to deduce function from structure or sequence → Way to deduce structure from sequence

  7. How to measure protein similarity? • Comparing sequences… RGRGEVWWPILK and RGAEEVWWPILG • Comparing structures…

  8. How similar are these two sequences? • RGAEEVWWPILGRRKHGPKRLGRRKHGPKR • RGATEVRWPILGRRKHGPKRLGRRKHGPKR • These sequences have 30 amino acids; as printed here, 28 are identical (they differ at positions 4 and 7) → Sequence identity = 28/30 ≈ 93%
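A minimal Python sketch of the percent-identity calculation, assuming two ungapped sequences of equal length that are compared position by position. The function name `percent_identity` is illustrative and not taken from any tool named in the slides; the input is the shorter 12-residue pair from slide 7.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count the positions where both sequences carry the same residue.
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# The pair from slide 7 differs at positions 3, 4 and 12, so 9 of 12 match.
print(percent_identity("RGRGEVWWPILK", "RGAEEVWWPILG"))  # 75.0
```

Real sequence comparisons usually start from an alignment that may contain gaps, so this position-by-position count only applies to pairs that are already aligned without insertions or deletions.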

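  9. How similar are two structures? • RMSD = √((d₁² + d₂² + d₃²)/3) = √((9 + 16 + 25)/3) ≈ 4.1 • More similar pairs of structures → lower RMSD • (Figure: structures GRK and GKK superimposed, with distances of 3.0, 4.0 and 5.0 between the three pairs of corresponding residues.)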
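RMSD is a root-mean-square quantity: the squared distances are averaged over the N pairs of corresponding atoms before the square root is taken. A small Python sketch, assuming the two structures are already superimposed and the per-pair distances are known (the function name `rmsd` is illustrative):

```python
import math


def rmsd(distances):
    """Root-mean-square deviation from per-pair distances of superimposed structures."""
    if not distances:
        raise ValueError("need at least one distance")
    # Mean of the squared distances, then the square root.
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# The three distances shown on the slide: 3.0, 4.0 and 5.0.
print(round(rmsd([3.0, 4.0, 5.0]), 2))  # 4.08
```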

  10. Tutorials • Introduction to structural visualization • http://www.cgl.ucsf.edu/chimera/current/docs/UsersGuide/tutorials/menutut.html OR • http://www.cgl.ucsf.edu/chimera/current/docs/UsersGuide/tutorials/getting_started.html • Structure comparisons • http://www.cgl.ucsf.edu/chimera/current/docs/UsersGuide/tutorials/squalene.html • Sequence and structural alignments • http://www.cgl.ucsf.edu/chimera/current/docs/UsersGuide/tutorials/super.html

  11. Other resources • Fold It - http://fold.it/portal/adobe_main • Solve protein folding puzzles • Enables researchers to collect data about human pattern recognition and problem-solving • Learn elements of protein folding and stability • List of sequence alignments (pre-generated and online servers) - http://www.cgl.ucsf.edu/home/meng/sources.html • List of structure alignments (pre-generated and online servers) – http://www.cgl.ucsf.edu/home/meng/grpmt/structalign.html

  12. Protein Structure Prediction with Emphasis on Comparative or Homology Modeling • Introduction and motivation • Types of protein structure prediction methods • Comparative modeling • Errors in comparative models • Modeling of loops in protein structures • Prediction of errors in comparative models • Structural genomics • Tools

  13. Protein structure provides important information • Knowledge of a protein’s structure helps us • design drugs that target that protein • engineer new functions for that protein • determine its evolutionary relationship to other proteins

  14. Experimental determination of structures is costly • How much does it cost to determine the crystal structure of a protein? • NIH estimate: $250,000 [1] • [1] R. Service. Structural Genomics, Round 2. Science 307, 1554-1558, 2005.

  15. Why Protein Structure Prediction? We have an experimentally determined atomic structure for only ~1% of the known protein sequences.

  16. Principles of protein structure • Sequence: GFCHIKAYTRLIMVG… • Folding (physics) → Ab initio prediction • Evolution (“statistical” rules) → Threading, Comparative Modeling • (Figure: related structures from Anacystis nidulans, Anabaena 7120, Chondrus crispus and Desulfovibrio vulgaris.) D. Baker & A. Sali. Science 294, 93, 2001.

  17. The “physics” principle The native structure of a protein is determined by its amino acid sequence, under native conditions (uniqueness, stability, kinetic accessibility). C.B. Anfinsen

  18. The “comparative modeling” principle

  19. Evolution of protein families • Families (very similar sequences): 30,000 • Superfamilies (similar sequences): 10,000 • Folds (similar 3D structure): 3,000; ~30% are known • (Figure: Cα RMSD in Å (% EQV), with ticks 2 (50), 1 (80), 0 (100), against % sequence identity, with ticks 20, 50, 100; examples from Anacystis nidulans, Anabaena 7120, Chondrus crispus, Desulfovibrio vulgaris and Clostridium mp.) 10/2/02

  20. Comparative Protein Structure Modeling • Flavodoxin family • Sequence: KIGIFFSTSTGNTTEVA… • (Figure: Cα RMSD in Å (% EQV), with ticks 2 (50), 1 (80), 0 (100), against % sequence identity, with ticks 20, 50, 100; family members from Anacystis nidulans, Anabaena 7120, Chondrus crispus, Desulfovibrio vulgaris and Clostridium mp.)

  21. Protein structure modeling • Ab initio prediction: Applicable to any sequence. Not very accurate (>4 Å RMSD). Attempted for proteins of <100 residues. Accuracy and applicability are limited by our understanding of the protein folding problem. • Comparative modeling: Applicable only to those sequences that share recognizable similarity to a template structure. Fairly accurate (<3 Å RMSD), typically comparable to a low-resolution X-ray experiment. Not limited by size. Accuracy and applicability are limited by the number of known folds.

  22. Steps in Comparative Protein Structure Modeling • START • Template Search • Target-Template Alignment • Model Building • Model Evaluation • OK? Yes → END; No → repeat • TARGET sequence: ASILPKRLFGNCEQTSDEGLKIERTPLVPHISAQNVCLKIDDVPERLIPERASFQWMNDK • Alignment, TARGET: ASILPKRLFGNCEQTSDEGLKIERTPLVPHISAQNVCLKIDDVPERLIPE • Alignment, TEMPLATE: MSVIPKRLYGNCEQTSEEAIRIEDSPIV---TADLVCLKIDEIPERLVGE • M. Marti-Renom et al. Ann. Rev. Biophys. Biomolec. Struct. 29, 291, 2000. N. Eswar et al. Curr. Protocols Bioinformatics 5.6, 2006. http://salilab.org/
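The target-template alignment on this slide contains gaps, so its identity has to be counted over alignment columns. A hedged Python sketch, assuming that columns with a gap in either sequence are excluded from the count (other conventions for the denominator exist); `alignment_identity` is an illustrative name, not a function of MODELLER.

```python
def alignment_identity(aligned_a: str, aligned_b: str, gap: str = "-"):
    """Return (identical, aligned) column counts for two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = aligned = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are not counted
        aligned += 1
        if a == b:
            identical += 1
    return identical, aligned


# The aligned segment shown on the slide.
target = "ASILPKRLFGNCEQTSDEGLKIERTPLVPHISAQNVCLKIDDVPERLIPE"
template = "MSVIPKRLYGNCEQTSEEAIRIEDSPIV---TADLVCLKIDEIPERLVGE"
identical, aligned = alignment_identity(target, template)
print(f"{identical} of {aligned} aligned positions are identical "
      f"({100.0 * identical / aligned:.1f}%)")
```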

  23. Comparative modeling by satisfaction of spatial restraints (MODELLER) • Alignment, 3D: GKITFYERGFQGHCYESDC-NLQP… • Alignment, SEQ: GKITFYERG---RCYESDCPNLQP… • 1. Extract spatial restraints • 2. Satisfy spatial restraints: F(R) = Π_i p_i(f_i / I) • A. Šali & T. Blundell. J. Mol. Biol. 234, 779, 1993. J.P. Overington & A. Šali. Prot. Sci. 3, 1582, 1994. A. Fiser, R. Do & A. Šali, Prot. Sci., 9, 1753, 2000. http://salilab.org/
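The slide's objective F(R) is a product of per-restraint probability densities. The toy Python sketch below illustrates only that idea and is not MODELLER's implementation or API: it assumes every restraint is a Gaussian density on one inter-atom distance, and the restraint values and coordinates are invented for illustration. Working with ln F(R) (a sum of logs) avoids numerical underflow of the product.

```python
import math


def log_gaussian(x, mean, sigma):
    """Natural log of a Gaussian probability density evaluated at x."""
    z = (x - mean) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def log_F(coords, restraints):
    """ln F(R): sum of the log densities of all distance restraints.

    Each restraint is (i, j, mean, sigma) for the distance between atoms i and j.
    """
    return sum(
        log_gaussian(math.dist(coords[i], coords[j]), mean, sigma)
        for i, j, mean, sigma in restraints
    )


# Hypothetical restraints, as if extracted from a template (values in Å).
restraints = [(0, 1, 3.8, 0.1), (1, 2, 3.8, 0.1), (0, 2, 6.0, 0.5)]

# Two candidate models: one close to the restraints, one fully extended.
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (4.74, 3.68, 0.0)]
extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]

print(log_F(bent, restraints) > log_F(extended, restraints))  # True
```

A modeling program would search for coordinates that maximize F(R); here two fixed candidates are simply compared.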

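  24. Extracting spatial restraints from template • template: GKTIFYERKRD… • target: GKITFY–RGRF… • spatial restraint: limit on a structural feature of the model • (Figure: a distance of 12 Å measured in the template; the equivalent distance in the target is marked “?”.)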

  25. Comparative modeling by satisfaction of spatial restraints (MODELLER) • Alignment, 3D: GKITFYERGFQGHCYESDC-NLQP… • Alignment, SEQ: GKITFYERG---RCYESDCPNLQP… • 1. Extract spatial restraints • 2. Satisfy spatial restraints: F(R) = Π_i p_i(f_i / I) • A. Šali & T. Blundell. J. Mol. Biol. 234, 779, 1993. J.P. Overington & A. Šali. Prot. Sci. 3, 1582, 1994. A. Fiser, R. Do & A. Šali, Prot. Sci., 9, 1753, 2000. http://salilab.org/

  26. Key components of modeling • Representation • Sampling • Scoring

  27. Some restraints in MODELLER that are useful in comparative modeling • Homology-based (from related structures): p(distance / d’,a,g,s,i); p(SDCH / R,S’,R’,t,s); p(MNCH / R,M’,R,s) • Statistical potentials (from all known structures): p(distance / atom types); p(MNCH / residue type); p(SDCH / residue type) • MM Force-Field (structure-independent): CHARMM-19, 22, α Generalized Born / Surface Area solvation • Šali & Blundell. J. Mol. Biol. 234, 779, 1993. Overington & Šali. Prot. Sci. 3, 1582, 1994. Fiser, Do, Šali. Prot. Sci. 9, 1753, 2000. Melo, Sanchez, Šali. Prot. Sci. 11, 430, 2002. M.-Y. Shen, B. Webb, M. Karplus et al.

  28. Protein Structure Prediction with Emphasis on Comparative or Homology Modeling • Introduction and motivation • Types of protein structure prediction methods • Comparative modeling • Errors in comparative models • Modeling of loops in protein structures • Prediction of errors in comparative models • Structural genomics • Tools

  29. Model accuracy as a function of target-template sequence identity Fraction of Cα atoms within 3.5Å of their correct positions. R. Sánchez & A. Šali, Proc. Natl. Acad. Sci. USA 95, 13597, 1998.
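The accuracy measure named on this slide can be sketched in a few lines of Python. The sketch assumes that the model and the experimental structure are already superimposed and that their Cα atoms are listed in the same order; the function name and the coordinates are illustrative only.

```python
import math


def fraction_within(model, reference, cutoff=3.5):
    """Fraction of paired atoms whose positions differ by at most `cutoff` Å."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    close = sum(math.dist(m, r) <= cutoff for m, r in zip(model, reference))
    return close / len(model)


# Hypothetical Cα coordinates: three atoms are displaced by 0, 1 and 2 Å,
# the fourth by 4 Å, which is outside the 3.5 Å cutoff.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (3.8, 1.0, 0.0), (7.6, 2.0, 0.0), (11.4, 4.0, 0.0)]
print(fraction_within(model, reference))  # 0.75
```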

  30. Typical errors in comparative models • Distortion/shifts in aligned regions • Region without a template • Sidechain packing • Incorrect template • Misalignment • (Figure legend: MODEL, X-RAY, TEMPLATE) • Marti-Renom et al. Annu. Rev. Biophys. Biomol. Struct. 29, 291-325, 2000.

  31. Protein structure models can be useful, despite errors D. Baker & A. Sali. Science 294, 93, 2001.

  32. Protein Structure Prediction with Emphasis on Comparative or Homology Modeling • Introduction and motivation • Types of protein structure prediction methods • Comparative modeling • Errors in comparative models • Modeling of loops in protein structures • Prediction of errors in comparative models • Structural genomics • Tools

  33. Loop Modeling in Protein Structures • antiparallel β-barrel • α+β barrel: flavodoxin • IG fold: immunoglobulin • A. Fiser, R. Do & A. Šali, Prot. Sci. 9, 1753, 2000.

  34. Loop modeling strategies • Database (DB) search • Conformational search • Even in DB search, the different conformations must be ranked • Loops longer than 4 residues need extensive optimization • DB method is efficient for specific families (e.g., canonical loops in Ig’s, β-hairpins)

  35. Protein Structure Prediction with Emphasis on Comparative or Homology Modeling • Introduction and motivation • Types of protein structure prediction methods • Comparative modeling • Errors in comparative models • Modeling of loops in protein structures • Prediction of errors in comparative models • Structural genomics • Tools

  36. Model Evaluation Methods • Is the fold correct? • How correct is the overall structure? • What regions are modeled incorrectly? • What is the best model in the set of alternative models? • Does the model satisfy the restraints used to calculate it? • Stereochemistry test (PROCHECK) • Residue environment test (Profiles3D) • Statistical potential tests (PROSAII) • Other statistical tests, including tests with multiple criteria (GA341). • Molecular mechanics force field tests.

  37. Structural Genomics • Goal: Characterize most protein sequences based on related known structures. • The number of “families” is much smaller than the number of proteins. • Any one of the members of a family is fine. • Sali. Nat. Struct. Biol. 5, 1029, 1998. Sali et al. Nat. Struct. Biol. 7, 986, 2000. Sali. Nat. Struct. Biol. 7, 484, 2001. Baker & Sali. Science 294, 93, 2001.

  38. Eswar et al. Nucl. Acids Res. 31, 3375–3380, 2003.

  39. MODPIPE: Automated Large-Scale Comparative Modeling • START • For each target sequence: get profile for sequence (SP/TrEMBL) • For each template profile: align sequence profile with multiple structure profile using local dynamic programming • Select templates using permissive E-value cutoff • Build models for target segment by satisfaction of spatial restraints (MODELLER) • Evaluate models • END • R. Sánchez & A. Šali, Proc. Natl. Acad. Sci. USA 95, 13597, 1998. Eswar et al. Nucl. Acids Res. 31, 3375–3380, 2003. Pieper et al., Nucl. Acids Res. 32, 2004. • N. Eswar, M. Marti-Renom, M.S. Madhusudhan, B. John, A. Fiser, R. Sánchez, F. Melo, N. Mirkovic, B. Webb, M.-Y. Shen, A. Šali.
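The alignment step in this pipeline uses local dynamic programming on profiles. As a simplified illustration of the technique only, the Python sketch below aligns two plain sequences with the Smith-Waterman recurrence; it does not use profiles and is not the MODPIPE code. The scores (match = 2, mismatch = -1, linear gap = -2) are arbitrary assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b: returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best score ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never drops below zero.
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Sequence fragments from the "Extracting spatial restraints" slide, gaps removed.
score, top, bottom = smith_waterman("GKTIFYERKRD", "GKITFYRGRF")
print(score)
print(top)
print(bottom)
```

Production tools use substitution matrices and affine gap penalties instead of these fixed scores, and a profile-based method scores each column against position-specific residue frequencies.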

  40. MODBASE: models for domains in ~1.6 million sequences • http://salilab.org/modbase • Pieper et al. MODBASE, a database of annotated comparative protein structure models, and associated resources. Nucleic Acids Research, 2006. • (Screenshots: Search Page, Model Details, Sequence Overview, Model Overview)

  41. Seminal papers • Sippl, M. J. Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins. J. Mol. Biol. 213, 859-883 (1990). • Ponder, J. W. and Richards, F. M. Tertiary templates for proteins. Use of packing criteria in the enumeration of allowed sequences for different structural classes. J. Mol. Biol. 193(4), 775-791 (1987). • Sali, A. and Blundell, T. L. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779-815 (1993). • Bowie, J. U., Luthy, R. and Eisenberg, D. A method to identify protein sequences that fold into a known three-dimensional structure. Science 253(5016), 164-170 (1991).

  42. ModBase Activity • Uploaded document

  43. Other resources • An Interactive NCBI Mini-Course • http://www.ncbi.nlm.nih.gov/Class/minicourses/quickstructure.html • Identify conserved domains • Search for other proteins with conserved domains • Explore modeling template • Find distant homologs

  44. Understanding the impact of human genetic variation is a key challenge • Disease predispositions: Hemoglobin E6V → Sickle cell anemia • Response to medications: Cytochrome P450 R144C → Warfarin-induced bleeding

  45. Most disease-associated human genetic variants are missense mutants • DNA: AGTGAC → AGTGTC • Protein: ALFLDVSDQTPINSIIFSHED → ALFLDVSVQTPINSIIFSHED (D → V)
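A missense change can be checked by translating the codons before and after the substitution. The Python sketch below uses a deliberately partial codon table that covers only the three codons on this slide (AGT = Ser, GAC = Asp, GTC = Val in the standard genetic code); the function names are illustrative, and positions are numbered within the short fragment, not within a full protein.

```python
# Partial codon table: only the codons needed for this example.
CODON_TABLE = {"AGT": "S", "GAC": "D", "GTC": "V"}


def translate(nucleotides: str) -> str:
    """Translate a coding fragment (DNA or RNA letters) into one-letter amino acids."""
    dna = nucleotides.upper().replace("U", "T")  # accept RNA spelling as well
    if len(dna) % 3 != 0:
        raise ValueError("length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def missense_changes(ref: str, alt: str):
    """List amino-acid substitutions between two fragments, e.g. ['D2V']."""
    ref_aa, alt_aa = translate(ref), translate(alt)
    return [
        f"{r}{pos}{a}"
        for pos, (r, a) in enumerate(zip(ref_aa, alt_aa), start=1)
        if r != a
    ]


print(translate("AGTGAC"), translate("AGTGTC"))  # SD SV
print(missense_changes("AGTGAC", "AGTGTC"))      # ['D2V']
```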

  46. There are many ways that missense mutants can impact protein function • Protein aggregates and does not fold • Protein is destabilized • Binding interfaces are disrupted • Active sites are disrupted

  47. There are many ways that the functional impact of a missense mutant can be assessed • Biochemical experiments • Physics • Epidemiology • Bioinformatics

  48. Predicting functional impact of nsSNP
