
What is comparative modeling and why is it useful? Steps in CM (overview + some details)

Comparative Protein Structure Modeling Lecture 4.1. Roberto Sanchez Structural Biology Program, Mount Sinai School of Medicine New York, NY 10029, USA. roberto.sanchez@physbio.mssm.edu http://physbio.mssm.edu/~sanchez/. What is comparative modeling and why is it useful?


Presentation Transcript


  1. Comparative Protein Structure Modeling, Lecture 4.1. Roberto Sanchez, Structural Biology Program, Mount Sinai School of Medicine, New York, NY 10029, USA. roberto.sanchez@physbio.mssm.edu http://physbio.mssm.edu/~sanchez/ • What is comparative modeling and why is it useful? • Steps in CM (overview + some details) • Accuracy of comparative models • Loop modeling • CM and Structural Genomics

  2. Function via Structure: Sequence (GFCHIKAYTRLIM…) → Structure → Function

  3. Why is it useful to know the structure of a protein, not only its sequence? • The biochemical function (activity) of a protein is defined by its interactions with other molecules. • The biological function is in large part a consequence of these interactions. • The 3D structure is more informative than sequence because interactions are determined by residues that are close in space but are frequently distant in sequence. In addition, because evolution tends to conserve function, and function depends more directly on structure than on sequence, structure is more conserved in evolution than sequence. The net result is that patterns in space are frequently more recognizable than patterns in sequence.

  4. Why Protein Structure Prediction? • Known sequences (5/30/01): 694,000 • Known structures (5/29/01): 15,200 • We know the experimental 3D structure for less than 3% of the protein sequences. For the remaining 97% or more, we need some sort of 3D structure prediction.
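
The "less than 3%" figure follows directly from the two counts on the slide. A minimal Python check that uses only those two numbers:

```python
# Counts quoted on the slide (May 2001)
known_sequences = 694_000
known_structures = 15_200

# Fraction of sequences with an experimentally determined structure
coverage = known_structures / known_sequences
print(f"structures known for {coverage:.1%} of sequences")      # 2.2%
print(f"structure prediction needed for {1 - coverage:.1%}")    # 97.8%
```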

  5. What is Comparative Protein Structure Modeling? Protein structure prediction from sequence (example: …SDVIFTEDGILICNRK…)

  6. Principles of Protein Structure: Folding leads to ab initio prediction; Evolution leads to fold recognition and comparative modeling. Example: sequence GFCHIKAYTRLIMVG… and structures from Anacystis nidulans, Anabaena 7120, Chondrus crispus and Desulfovibrio vulgaris.

  7. Steps in Comparative Protein Structure Modeling: START → Template Search → Target–Template Alignment → Model Building → Model Evaluation → OK? Yes: END. No: repeat the earlier steps.
     Target sequence: ASILPKRLFGNCEQTSDEGLKIERTPLVPHISAQNVCLKIDDVPERLIPERASFQWMNDK
     Target–template alignment:
     TARGET    ASILPKRLFGNCEQTSDEGLKIERTPLVPHISAQNVCLKIDDVPERLIPE
     TEMPLATE  MSVIPKRLYGNCEQTSEEAIRIEDSPIV---TADLVCLKIDEIPERLVGE
     A. Šali, Curr. Opin. Biotech. 6, 437, 1995. R. Sánchez & A. Šali, Curr. Opin. Str. Biol. 7, 206, 1997. M. A. Martí-Renom et al. Ann. Rev. Biophys. Biomolec. Struct. 29, 291, 2000.
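
The percent sequence identity of the target–template alignment on this slide can be computed directly from the two gapped strings. A minimal Python sketch, assuming identity is counted over columns where neither sequence has a gap (other conventions, such as dividing by the length of the shorter sequence, give different numbers):

```python
def percent_identity(target: str, template: str) -> tuple[int, int, float]:
    """Return (identical, aligned, percent) for two gapped, equal-length sequences.

    Assumption: only columns without a gap in either sequence are counted.
    """
    if len(target) != len(template):
        raise ValueError("aligned sequences must have the same length")
    aligned = identical = 0
    for a, b in zip(target, template):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        aligned += 1
        if a == b:
            identical += 1
    return identical, aligned, 100.0 * identical / aligned


target = "ASILPKRLFGNCEQTSDEGLKIERTPLVPHISAQNVCLKIDDVPERLIPE"
template = "MSVIPKRLYGNCEQTSEEAIRIEDSPIV---TADLVCLKIDEIPERLVGE"
print(percent_identity(target, template))
```

For this alignment the sketch counts 29 identical residues in 47 ungapped columns, about 62% identity.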

  8. Template Search Methods • Sequence similarity searches (BLAST, FastA) • Profile and iterative methods (HMMs, PSI-BLAST) • Structure-based threading (THREADER, PROFIT)

  9. Target–Template Alignment Methods • Dynamic Programming Pairwise Alignments • Multiple Alignments, Profiles, HMMs • Structure-Based Approaches (Threading)
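
Dynamic programming pairwise alignment, the first method listed, can be sketched as the classic Needleman–Wunsch global algorithm. The scores here (match +1, mismatch −1, linear gap −2) are illustrative assumptions; practical target–template alignment uses substitution matrices such as BLOSUM62, affine gap penalties, and often profiles.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global pairwise alignment by dynamic programming with a linear gap penalty.

    Returns (gapped_a, gapped_b, score) for one optimal alignment.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,             # gap in b
                              score[i][j - 1] + gap)             # gap in a

    # Traceback from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

Usage: `needleman_wunsch("GKITFYERGFQGHCYESDCNLQP", "GKITFYERGRCYESDCPNLQP")` returns the two gapped strings and the score; where the gaps land depends on the scoring choices and need not reproduce the hand-made alignment on slide 11.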

  10. Model Building Methods • Rigid Body Assembly (COMPOSER) • Segment Matching (SEGMOD) • Satisfaction of Spatial Restraints (MODELLER). A. Šali, Curr. Opin. Biotech. 6, 437, 1995. R. Sánchez & A. Šali, Curr. Opin. Str. Biol. 7, 206, 1997

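  11. Comparative Modeling by MODELLER: EXTRACT spatial restraints from the template structure, then SATISFY them in the target model. Example alignment (3D = template structure, SEQ = target sequence):
     3D   GKITFYERGFQGHCYESDC-NLQP
     SEQ  GKITFYERG---RCYESDCPNLQP
     The restraints are combined into a molecular probability density function $F(R) = \prod_i p_i(f_i \mid I)$, where $p_i$ is the probability density of restrained feature $f_i$ of model $R$, given the information $I$ from the alignment and template. http://guitar.rockefeller.edu/modeller/ A. Šali & T. Blundell, J. Mol. Biol. 234, 779, 1993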
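
The EXTRACT/SATISFY idea can be illustrated with a toy example: take pairwise distances from template coordinates as Gaussian restraints, so that maximizing F(R) is equivalent to minimizing -ln F(R), a sum of squared, σ-scaled distance violations, then move the model coordinates to satisfy them. This is a minimal sketch under stated assumptions: the coordinates, the restraint width and the use of plain gradient descent are all hypothetical. MODELLER uses many more restraint types and optimizes with conjugate gradients and molecular dynamics with simulated annealing.

```python
import math
import random

# Hypothetical template C-alpha coordinates (angstroms) for five residues that are
# assumed to be aligned one-to-one with the target (no gaps in this toy example).
TEMPLATE_XYZ = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0),
                (8.6, 4.2, 1.0), (9.9, 7.8, 1.5)]


def extract_restraints(template_xyz, sigma=0.5):
    """EXTRACT: one Gaussian distance restraint per residue pair, centred on the
    template distance. sigma (angstroms) is a hypothetical restraint width."""
    n = len(template_xyz)
    return [(i, j, math.dist(template_xyz[i], template_xyz[j]), sigma)
            for i in range(n) for j in range(i + 1, n)]


def objective(xyz, restraints):
    """-ln F(R) up to an additive constant, for Gaussian restraint pdfs."""
    return sum((math.dist(xyz[i], xyz[j]) - d0) ** 2 / (2 * s * s)
               for i, j, d0, s in restraints)


def satisfy(xyz, restraints, steps=2000, lr=0.01):
    """SATISFY: plain gradient descent on -ln F(R)."""
    xyz = [list(p) for p in xyz]
    for _ in range(steps):
        grad = [[0.0, 0.0, 0.0] for _ in xyz]
        for i, j, d0, s in restraints:
            d = math.dist(xyz[i], xyz[j]) or 1e-9  # avoid division by zero
            coeff = (d - d0) / (s * s * d)         # d(objective)/dd divided by d
            for k in range(3):
                g = coeff * (xyz[i][k] - xyz[j][k])
                grad[i][k] += g
                grad[j][k] -= g
        for p, g in zip(xyz, grad):
            for k in range(3):
                p[k] -= lr * g[k]
    return xyz


random.seed(0)
restraints = extract_restraints(TEMPLATE_XYZ)
start = [[c + random.uniform(-1.0, 1.0) for c in p] for p in TEMPLATE_XYZ]
model = satisfy(start, restraints)
print(objective(start, restraints), "->", objective(model, restraints))
```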

  12. Model Evaluation Methods • Stereochemistry (PROCHECK) • Environment (Profiles3D) • Methods based on statistical potentials (PROSA II)
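
Statistical (knowledge-based) potentials of the kind used by PROSA II convert observed frequencies into energies by inverse Boltzmann statistics, E = -kT ln(f_obs / f_ref). A minimal sketch of that conversion for one residue-pair type over distance bins; the counts are hypothetical, and this is the general idea rather than PROSA II's actual implementation, which also has to handle sparse data and many residue-pair types:

```python
import math

KT = 0.593  # RT in kcal/mol at about 298 K


def statistical_potential(observed, reference, kt=KT):
    """Inverse-Boltzmann energies E = -kT ln(f_obs / f_ref) for each distance bin.

    observed and reference are raw counts per bin; bins with no data get None
    (real potentials smooth sparse data instead, e.g. with pseudocounts).
    """
    n_obs, n_ref = sum(observed), sum(reference)
    energies = []
    for o, r in zip(observed, reference):
        if o == 0 or r == 0:
            energies.append(None)
        else:
            energies.append(-kt * math.log((o / n_obs) / (r / n_ref)))
    return energies


# Hypothetical counts for one residue-pair type in three distance bins.
energies = statistical_potential([5, 30, 15], [10, 20, 20])
print(energies)  # negative where contacts are enriched relative to the reference
```

Summing such energies over all residue pairs in a model gives a score in which unusually favourable or unfavourable environments stand out.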

  13. Model Evaluation: Alignment Errors R. Sánchez & A. Šali, Proteins, Suppl. 1, 50-58, 1997

  14. Are models useful if they are just copies of the template?

  15. Predicting features of a model that are not present in the template. Do mast cell proteases bind proteoglycans? Where? When? • Do mMCPs bind negatively charged proteoglycans through electrostatic interactions? • Comparative models were used to find clusters of positively charged surface residues. • The predictions were tested by site-directed mutagenesis. Figure panels: native mMCP-7 at pH 5 (His+) and native mMCP-7 at pH 7 (His0). Huang et al. J. Clin. Immunol. 18, 169, 1998. Matsumoto et al. J. Biol. Chem. 270, 19524, 1995. Šali et al. J. Biol. Chem. 268, 9023, 1993.
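
Finding clusters of positively charged surface residues in a model can be sketched as single-linkage clustering of Lys, Arg and (when protonated) His side chains, which also shows why the His+ and His0 panels differ. The residue list, coordinates and 8 Å cutoff below are hypothetical; this is not the procedure used in the cited studies:

```python
import math

# Hypothetical surface residues of a model: (residue name, number, (x, y, z)), where the
# coordinates stand for the side-chain charge centre in angstroms.
SURFACE = [
    ("LYS", 10, (0.0, 0.0, 0.0)),
    ("ARG", 14, (4.0, 0.0, 0.0)),
    ("HIS", 20, (8.0, 1.0, 0.0)),
    ("LYS", 40, (30.0, 0.0, 0.0)),
    ("ASP", 50, (2.0, 2.0, 0.0)),
]


def positive_residues(residues, histidine_protonated):
    """Lys and Arg count as positive; His only when protonated (His+, e.g. at pH 5)."""
    charged = {"LYS", "ARG"} | ({"HIS"} if histidine_protonated else set())
    return [r for r in residues if r[0] in charged]


def clusters(residues, cutoff=8.0):
    """Single-linkage clustering: a residue within `cutoff` angstroms (a hypothetical
    value) of any cluster member joins that cluster. Largest clusters first."""
    unvisited = list(range(len(residues)))
    result = []
    while unvisited:
        stack, members = [unvisited.pop()], []
        while stack:
            i = stack.pop()
            members.append(residues[i][1])
            near = [j for j in unvisited
                    if math.dist(residues[i][2], residues[j][2]) < cutoff]
            for j in near:
                unvisited.remove(j)
            stack.extend(near)
        result.append(sorted(members))
    return sorted(result, key=len, reverse=True)


print("pH 7 (His0):", clusters(positive_residues(SURFACE, False)))  # [[10, 14], [40]]
print("pH 5 (His+):", clusters(positive_residues(SURFACE, True)))   # [[10, 14, 20], [40]]
```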

  16. Model Accuracy

  17. Typical Errors in Comparative Models • Incorrect template • Misalignment • Region without a template • Distortion in correctly aligned regions • Side-chain packing (figure compares the MODEL, X-RAY and TEMPLATE structures)

  18. CASP: Lessons from Blind Predictions • Build models for proteins of unknown structure. • Structures are determined after the models are submitted. • Models are evaluated by comparing them with the corresponding experimental structures.

  19. CASP: Lessons from Blind Predictions: Multiple Template Models • Comparative modeling (by MODELLER) can combine the best regions from each template. • The per-residue accuracy of comparative models cannot be higher than that of the most accurate template. • The overall accuracy of models can be higher than that of any of the templates.

  20. CASP: Lessons from Blind Predictions (DFR) R. Sánchez & A. Šali, Proteins, Suppl. 1, 50-58, 1997

  21. Model Accuracy as a Function of Target-Template Sequence Identity

  22. Some Models Can Be Surprisingly Accurate (in Some Regions). Figure panels: YGL203C with template 1ac5 (25% sequence identity) and YJL001W with template 1rypH (24% sequence identity); labeled residues: His 488, Ser 176, Asp 383.

  23. Applications of Comparative Models

  24. Loop Modeling in Protein Structures. Figure panels: α+β barrel: flavodoxin; Ig fold: immunoglobulin; antiparallel β-barrel. A. Fiser, R. Do & A. Šali, Prot. Sci. 9, 1753, 2000

  25. Loop modeling strategies: database search vs. conformational search • The database is complete only for loops of up to 4-6 residues. • Even in a database search, the different conformations must be ranked. • Loops longer than 4 residues need extensive optimization. • The database method is efficient for specific families (e.g., canonical loops in Igs, β-hairpins, etc.).
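
In its simplest form, a database search for loops filters stored fragments by length and by how well their end-to-end distance matches the distance between the two anchor residues, then ranks the survivors, since database hits still have to be ranked. A minimal sketch with a hypothetical fragment table and tolerance, ranking by anchor fit only:

```python
# Hypothetical fragment table: (fragment id, loop length, end-to-end distance in angstroms).
DATABASE = [
    ("frag_a", 6, 9.8),
    ("frag_b", 6, 11.5),
    ("frag_c", 5, 10.1),
    ("frag_d", 6, 10.3),
]


def search_loop_database(database, loop_length, anchor_distance, tolerance=1.0):
    """Keep fragments of the right length whose end-to-end distance matches the
    gap between the anchors within `tolerance` (hypothetical), ranked best first."""
    hits = [
        (abs(dist - anchor_distance), frag_id)
        for frag_id, length, dist in database
        if length == loop_length and abs(dist - anchor_distance) <= tolerance
    ]
    return [frag_id for _, frag_id in sorted(hits)]


print(search_loop_database(DATABASE, loop_length=6, anchor_distance=10.2))
# ['frag_d', 'frag_a']
```

The approach breaks down as loops get longer because the database no longer contains every conformation, which is where conformational search takes over.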

  26. Loop Modeling by Conformational Search • Protein representation. • Energy (scoring) function. • Optimization algorithm.

  27. Energy Function for Loop Modeling. The energy function is a sum of many terms: 1) statistical preferences for dihedral angles; 2) restraints from the CHARMM-22 force field; 3) a statistical potential for non-bonded contacts.

  28. Mainchain Terms for Loop Modeling

  29. Optimization of Objective Function

  30. Calculating an Ensemble of Loop Models
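
Loop modeling by conformational search (a representation, an energy that is a sum of terms, an optimizer, and an ensemble of independently optimized models) can be sketched on a toy representation of backbone dihedral angles. Everything here is hypothetical: the three energy terms only mirror the structure of slide 27, and generic Metropolis Monte Carlo with simulated annealing is swapped in as the optimizer; the transcript does not describe the actual optimizer.

```python
import math
import random

# Hypothetical toy terms on a loop's backbone dihedral angles (radians). They mirror
# only the *structure* of the slide's energy (a sum of terms), not the published forms.


def dihedral_term(phis):
    """Stand-in for statistical dihedral preferences: favours angles near -60 degrees."""
    return sum(1.0 - math.cos(p - math.radians(-60.0)) for p in phis)


def restraint_term(phis):
    """Stand-in for force-field restraints: penalises abrupt changes between neighbours."""
    return 0.5 * sum((math.sin(a) - math.sin(b)) ** 2 for a, b in zip(phis, phis[1:]))


def contact_term(phis):
    """Stand-in for a non-bonded contact potential between residues i and i+2."""
    return sum(max(0.0, math.cos(a - b) - 0.9) for a, b in zip(phis, phis[2:]))


def loop_energy(phis):
    """Total energy as a sum of terms."""
    return dihedral_term(phis) + restraint_term(phis) + contact_term(phis)


def anneal(n_residues, rng, steps=5000, t_start=2.0, t_end=0.01):
    """One simulated-annealing run: Metropolis Monte Carlo with geometric cooling."""
    phis = [rng.uniform(-math.pi, math.pi) for _ in range(n_residues)]
    energy = loop_energy(phis)
    for k in range(steps):
        temperature = t_start * (t_end / t_start) ** (k / (steps - 1))
        trial = phis.copy()
        trial[rng.randrange(n_residues)] += rng.gauss(0.0, 0.3)
        trial_energy = loop_energy(trial)
        accept = trial_energy <= energy or (
            rng.random() < math.exp((energy - trial_energy) / temperature))
        if accept:
            phis, energy = trial, trial_energy
    return energy, phis


def loop_ensemble(n_models=10, n_residues=8, seed=0):
    """Independent runs from random starts, returned sorted by energy (best first)."""
    rng = random.Random(seed)
    return sorted((anneal(n_residues, rng) for _ in range(n_models)), key=lambda m: m[0])


best_energy, best_phis = loop_ensemble()[0]
print(f"lowest energy in the ensemble: {best_energy:.3f}")
```

Running several independent optimizations and keeping the lowest-energy model guards against any single run getting trapped in a poor local minimum.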

  31. Accuracy of loop models

  32. Assessing Accuracy of Loop Models

  33. Accuracy of Loop Modeling • High accuracy (<1 Å): 50% (30%) of 8-residue loops • Medium accuracy (<2 Å): 40% (48%) of 8-residue loops • Low accuracy (>2 Å): 10% (22%) of 8-residue loops • Example loops: RMSD = 0.6 Å, 1.1 Å and 2.8 Å. A. Fiser, R. Do & A. Šali, Prot. Sci. 9, 1753, 2000
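
The accuracy classes on this slide are thresholds on loop RMSD. A minimal Python sketch of the RMSD calculation and the classification, assuming the model and experimental loop coordinates are already superposed and their atoms matched one-to-one; an RMSD of exactly 2 Å, which the slide leaves unassigned, is treated here as low accuracy:

```python
import math


def rmsd(model, reference):
    """Root-mean-square deviation between matched atoms.
    Assumes both coordinate sets are already superposed (no fitting is done here)."""
    if len(model) != len(reference) or not model:
        raise ValueError("need equal-length, non-empty coordinate lists")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(model, reference))
    return math.sqrt(squared / len(model))


def accuracy_class(value):
    """Slide thresholds: high < 1 A, medium < 2 A, low otherwise."""
    if value < 1.0:
        return "high"
    if value < 2.0:
        return "medium"
    return "low"


for value in (0.6, 1.1, 2.8):  # the three example loops on the slide
    print(value, accuracy_class(value))
# 0.6 high
# 1.1 medium
# 2.8 low
```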

  34. Fraction of Loops Modeled With at Least Medium Accuracy

  35. Problems in Practical Loop Modeling • Deciding which regions to model as loops. • Correctly aligning the anchor regions and environment. • Modeling the loop itself. Examples: T0076, residues 46-53: mainchain RMSD of loop = 1.37 Å, of anchors = 1.52 Å; T0058, residues 80-85: mainchain RMSD of loop = 1.09 Å, of anchors = 0.29 Å.

  36. How can Comparative Modeling be used in Structural Genomics?

  37. Structural Genomics • Definition: The aim of structural genomics is to put every protein sequence within a modeling distance of a known protein structure. • Size of the problem: • There are a few thousand domain fold families. • There are ~20,000 sequence families (30% sequence identity). • Solution: • Determine many protein structures. • Increase modeling distance. Šali. Nat. Struct. Biol. 5, 1029, 1998. Burley et al. Nat. Genet. 23, 151, 1999. Šali & Kuriyan. TIBS 22, M20, 1999. Sanchez et al. Nat. Str. Biol. 7, 986, 2000

  38. How can Comparative Modeling be used in Structural Genomics? • Target Selection: How many structures need to be solved? Which structures should we solve first? • Target Amplification: How much of the sequence space is covered by a new structure? By all structures?

  39. Target Selection for Structural Genomics: Select targets such that every protein sequence is within a modeling distance of a known protein structure. Modeling distance: correct alignment, corresponding to >30% sequence identity. G. Kurban, R. Sánchez, A. Šali, T. Gaasterland.
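
Choosing targets so that every sequence ends up within modeling distance of some structure is a set-cover problem. A minimal sketch using greedy set cover, a standard heuristic that is not necessarily the method of the work cited on the slide; the candidate names and coverage sets are hypothetical, and computing the coverage sets (e.g. from >30% identity) is assumed to have been done already:

```python
def select_targets(coverage):
    """Greedy set cover.

    coverage maps each candidate target to the set of sequences that would fall
    within modeling distance of its structure once solved.
    """
    uncovered = set().union(*coverage.values())
    chosen = []
    while uncovered:
        # Pick the candidate that brings the most still-uncovered sequences in range.
        best = max(coverage, key=lambda t: len(coverage[t] & uncovered))
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen


# Hypothetical candidates and the sequences each would bring within modeling distance.
COVERAGE = {
    "A": {"s1", "s2", "s3"},
    "B": {"s3", "s4"},
    "C": {"s4", "s5"},
    "D": {"s1"},
}
print(select_targets(COVERAGE))  # ['A', 'C']
```

The greedy order also answers "which structures should we solve first?" in a simple way: each pick is the one that adds the most coverage at that point.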

  40. Leveraging Templates by Comparative Modeling: Quantifying Productivity of Structural Genomics. http://www.nysgrc.org Models are in MODBASE at http://guitar.rockefeller.edu/modbase/

  41. MODPIPE: Large-Scale Comparative Protein Structure Modeling. START. For each sequence (PSI-BLAST): 1) prepare a PSI-BLAST PSSM by comparing the sequence against the NR database of sequences; 2) use the sequence PSSM to search against the representative set of PDB chains (F and no-F); 3) use the PDB chain PSSMs to search against the sequence (F and no-F); 4) select templates using a permissive E-value cutoff. For each template (MODELLER): 5) align the matched part of the target sequence with the template structure; 6) build a model for the target segment by satisfaction of spatial restraints; 7) evaluate the model. END. R. Sánchez & A. Šali, Proc. Natl. Acad. Sci. USA 95, 13597, 1998. R. Sánchez, F. Melo, N. Mirkovic, A. Šali, in preparation
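
The nested "for each sequence / for each template" control flow of MODPIPE can be sketched with the search, model-building and evaluation steps passed in as functions. This is a minimal sketch, not MODPIPE's code: `run_pipeline`, `Hit` and the stub functions are hypothetical, and both thresholds are illustrative values chosen only so that a permissive cutoff admits the E-value of 65 reported for YIL073C on slide 42.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    template: str   # PDB chain identifier
    e_value: float  # E-value reported by the profile search


def run_pipeline(sequences, search, build, evaluate, e_cutoff=100.0, min_score=0.9):
    """search, build and evaluate are caller-supplied stand-ins for the PSI-BLAST
    searches, MODELLER model building and model evaluation; e_cutoff and
    min_score are hypothetical thresholds, not MODPIPE's settings."""
    accepted = []
    for seq in sequences:                                           # for each sequence
        hits = [h for h in search(seq) if h.e_value <= e_cutoff]   # permissive cutoff
        for hit in hits:                                            # for each template
            model = build(seq, hit.template)    # align, then satisfy spatial restraints
            score = evaluate(model)             # model evaluation
            if score >= min_score:
                accepted.append((seq, hit.template, score))
    return accepted


# Stubs reproducing the YIL073C example of slide 42; "9xyz" is a made-up template
# that fails the E-value cutoff.
def fake_search(seq):
    return [Hit("1a17", 65.0), Hit("9xyz", 500.0)]


def fake_build(seq, template):
    return (seq, template)


def fake_evaluate(model):
    return 0.97


print(run_pipeline(["YIL073C"], fake_search, fake_build, fake_evaluate))
# [('YIL073C', '1a17', 0.97)]
```

Keeping the template cutoff permissive and relying on model evaluation to reject poor models is what lets hits with weak E-values, like the one on slide 42, still yield usable models.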

  42. MODPIPE Model of Yeast Hypothetical Protein YIL073C: YIL073C model built on PDB template 1a17 (E-value = 65, sequence identity = 20%, pG = 0.97). Das et al. EMBO J. 17, 1192, 1998. The tetratricopeptide repeat (TPR) is a degenerate 34-amino-acid sequence motif found in a variety of proteins; it occurs in tandem arrays and mediates protein-protein interactions. R. Sánchez, F. Melo, N. Mirkovic, A. Šali.

  43. Mycoplasma genitalium MODPIPE Models

  44. Mycoplasma genitalium MODPIPE Models

  45. Factors affecting coverage: PDB growth, fold assignments, reliable models

  46. Organism Statistics: top 10 organisms by number of models

  47. Organism Statistics: top 10 organisms by number of models

  48. MODBASE. R. Sánchez, U. Pieper, N. Mirkovic, P. I. W. de Bakker, E. Wittenstein, and A. Šali. Nucl. Acids Res. 28, 250, 2000. R. Sánchez and A. Šali. Bioinformatics 15, 1060, 1999
