Prediction of Protein Structure and Function on a Proteomic Scale

Presentation Transcript


  1. Prediction of Protein Structure and Function on a Proteomic Scale. Jeff Skolnick, Director, Center of Excellence in Bioinformatics

  2. General Approach

  3. Prediction of Protein Structure

  4. Overview of CASP5 Results:

  5. Comparative Modeling (CM) Results

  6. T0153 (CM): coordinate superposition, RMSD = 1.74 Å (129/134 aa); native: discontinuous line, predicted: continuous line; 1mq7 A; rank #1
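
The RMSD values on these slides are measured after superimposing the predicted coordinates onto the native ones, over the residues listed (e.g. 129 of 134). The slides do not name the superposition code. The following is a hedged sketch that assumes the standard Kabsch algorithm with NumPy and shows how such a value can be computed for matched Cα coordinates:

```python
import numpy as np


def superposed_rmsd(native: np.ndarray, model: np.ndarray) -> float:
    """RMSD between two N x 3 coordinate sets after optimal superposition (Kabsch).

    Rows must refer to the same residues, e.g. only the aligned residues of a target.
    """
    if native.shape != model.shape or native.ndim != 2 or native.shape[1] != 3:
        raise ValueError("expected two N x 3 arrays with matching residues")
    # Remove translation by centering both sets on their centroids.
    p = model - model.mean(axis=0)
    q = native - native.mean(axis=0)
    # The SVD of the 3 x 3 covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Usage: a rotated and translated copy of a structure superposes with RMSD close to 0.
rng = np.random.default_rng(0)
native = rng.normal(size=(20, 3))
t = np.pi / 3
rz = np.array([[np.cos(t), -np.sin(t), 0.0],
               [np.sin(t), np.cos(t), 0.0],
               [0.0, 0.0, 1.0]])
model = native @ rz.T + np.array([5.0, -2.0, 1.0])
print(superposed_rmsd(native, model))
```

The reflection check matters here. Without it, a mirror-image model could report an artificially low RMSD.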

  7. Fold Recognition (FR) results

  8. T0135 FR(A): global coordinate superposition, RMSD = 4.80 Å (106/106 aa); native: discontinuous line, predicted: continuous line; rank #1

  9. T0135 FR(A): global coordinate superposition, RMSD = 4.80 Å (106/106 aa); native: discontinuous line, predicted: continuous line; rank #1. Yellow line: region originally aligned to the template (1h6kX)

  10. New Fold (NF) results

  11. T0181 (NF) PREDICTED: rank #2

  12. How representative is the set of solved PDB structures?

  13. The PDB is a covering set of protein structures at low resolution: results from a new structure alignment program, SAL (Kihara & Skolnick, J. Mol. Biol. 2003, 334:793-802)

  14. Structural alignments to proteins of different secondary structure: different CATH IDs, 100-residue proteins

  15. Use of best structural alignments: can we build good models starting from protein templates with an average sequence identity of 7%?

  16. TASSER: Threading/ASSEmbly/Refinement

  17. Very large scale structure prediction benchmark

  18. Comprehensive benchmark set of PDB structures: length range 41-200 residues; sequence identity cut-off 35%; 1489 proteins in total
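
The two selection criteria (length window and sequence identity cut-off) can be applied with a greedy non-redundancy filter. This is a hedged sketch. The greedy order, the strict "below the cut-off" comparison and the hypothetical `identity` callback are assumptions. In practice the identities would come from pairwise sequence alignment. None of this is the procedure reported on the slide.

```python
from __future__ import annotations

from typing import Callable, List, Sequence, Tuple


def select_benchmark(
    chains: Sequence[Tuple[str, int]],
    identity: Callable[[str, str], float],
    min_len: int = 41,
    max_len: int = 200,
    max_identity: float = 0.35,
) -> List[str]:
    """Greedy non-redundant selection (hypothetical procedure).

    Keeps a chain if its length lies in [min_len, max_len] and its sequence
    identity to every chain kept so far is below max_identity.
    """
    kept: List[str] = []
    for chain_id, length in chains:
        if not min_len <= length <= max_len:
            continue
        if all(identity(chain_id, other) < max_identity for other in kept):
            kept.append(chain_id)
    return kept


# Toy pairwise identities (made up); a real pipeline would compute them by alignment.
TOY_IDENTITY = {("a", "b"): 0.90, ("a", "c"): 0.20, ("b", "c"): 0.25}


def toy_identity(x: str, y: str) -> float:
    return TOY_IDENTITY.get((x, y), TOY_IDENTITY.get((y, x), 0.0))


chains = [("a", 120), ("b", 118), ("c", 150), ("d", 35), ("e", 250)]
print(select_benchmark(chains, toy_identity))  # ['a', 'c']
```

A greedy filter depends on input order. Sorting the chains first, for example by resolution, is one way to make the kept representative deterministic.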

  19. Summary of Results

  20. Summary of Overall Folding Results
                          SAL      TASSER                     MODELLER
                          Best     Align    Top-5    Top-1    Align    Top-5    Top-1
      <RMSD> (Å)          2.510    1.877    2.246    2.352    2.708    3.740    4.318
      <COV>               82%      82%      100%     100%     82%      100%     100%
      N(RMSD < 6.0 Å)     1489     1489     1487     1481     1462     1326     1202
      N(RMSD < 5.5 Å)     1485     1489     1485     1475     1431     1266     1138
      N(RMSD < 5.0 Å)     1472     1489     1481     1464     1395     1195     1060
      N(RMSD < 4.5 Å)     1440     1489     1468     1450     1336     1116     962
      N(RMSD < 4.0 Å)     1369     1488     1447     1423     1255     984      841
      N(RMSD < 3.5 Å)     1255     1476     1396     1359     1141     834      697
      N(RMSD < 3.0 Å)     1064     1422     1259     1206     1008     647      551
      N(RMSD < 2.5 Å)     776      1250     987      928      750      475      397
      N(RMSD < 2.0 Å)     498      922      623      582      520      300      244
      N(RMSD < 1.5 Å)     218      411      253      241      244      124      85
      N(RMSD < 1.0 Å)     46       83       52       49       37       20       15
      N(RMSD < x Å): number of the 1489 benchmark proteins with RMSD below x Å.
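
Each N(RMSD < x Å) row counts how many targets fall strictly below a threshold, so each column is non-increasing from top to bottom. The sketch below rebuilds such rows from a list of per-target RMSDs. The toy RMSD values are made up and are not the benchmark data. The function name is hypothetical.

```python
from __future__ import annotations

from typing import Dict, Iterable


def count_below(rmsds: Iterable[float], thresholds: Iterable[float]) -> Dict[float, int]:
    """Number of targets whose RMSD is strictly below each threshold (one table row each)."""
    values = list(rmsds)
    return {t: sum(1 for r in values if r < t) for t in thresholds}


# The table's thresholds: 6.0, 5.5, ..., 1.0 Å.
TABLE_THRESHOLDS = [6.0 - 0.5 * i for i in range(11)]

toy_rmsds = [0.8, 1.9, 2.4, 3.1, 5.9, 7.2]  # made-up values, not benchmark data
counts = count_below(toy_rmsds, TABLE_THRESHOLDS)
print(counts[6.0], counts[3.0], counts[1.0])  # 5 3 1
```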

  21. Some Examples:

  22. Summary
      • At low resolution, the PDB is most likely complete for single-domain proteins
      • Can build acceptable full-length models in the majority of cases
      • Can refine the initial structures to move closer to the native structure, even when starting from the best structural alignment

  23. Results from threading/refinement: the “real-life” situation

  24. TASSER: Threading/ASSEmbly/Refinement

  25. “Easy” cases:
      • at least two threading templates identified with a significant consensus region, or
      • one template with a highly significant z-score

  26. “Medium” cases:
      • at least two threading templates identified without any significant consensus region, or
      • one template with a z-score above the threshold for correct fold assignment
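
The Easy and Medium criteria can be read as a small decision procedure, with every remaining target counted as Hard (the category used in the E. coli results). This is a hedged sketch. The slides give no numerical z-score thresholds, so `Z_HIGHLY_SIGNIFICANT` and `Z_FOLD_ASSIGNMENT` are hypothetical placeholders. The consensus test is taken as a precomputed boolean.

```python
from dataclasses import dataclass

# Hypothetical z-score thresholds: the slides do not state numerical values.
Z_HIGHLY_SIGNIFICANT = 7.0
Z_FOLD_ASSIGNMENT = 4.0


@dataclass
class ThreadingSummary:
    n_templates: int             # number of threading templates identified
    best_z: float                # z-score of the best template
    significant_consensus: bool  # do the templates share a significant consensus region?


def classify_target(s: ThreadingSummary) -> str:
    """Easy / Medium / Hard following the slide criteria (thresholds assumed)."""
    if s.n_templates >= 2 and s.significant_consensus:
        return "Easy"
    if s.n_templates >= 1 and s.best_z >= Z_HIGHLY_SIGNIFICANT:
        return "Easy"
    if s.n_templates >= 2:  # two or more templates, but no significant consensus
        return "Medium"
    if s.n_templates >= 1 and s.best_z >= Z_FOLD_ASSIGNMENT:
        return "Medium"
    return "Hard"


print(classify_target(ThreadingSummary(2, 5.0, True)))   # Easy
print(classify_target(ThreadingSummary(1, 5.0, False)))  # Medium
print(classify_target(ThreadingSummary(1, 3.0, False)))  # Hard
```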

  27. Composite Threading Results
      • We can identify the correct global fold in 92% of the entire representative set of small PDB structures
      • Can generate good template alignments in 59% of the cases
      • Good substructures in 67% of the cases

  28. Summary of Results

  29. Examples of alignment improvement (Medium and Easy cases; template and final model shown for each). Thin lines: native; thick lines: template/model.
      Two factors mainly contribute to the improvement:
      • geometric connectivity
      • better packing of local structure and side groups, owing to the force field

  30. Comparison to the ensemble of NMR structures (predicted structure to centroid / farthest NMR structure to centroid); thick line: predicted structure

  31. Benchmark set of larger proteins (201-300 residues): 745 proteins in total, of which 487 are single-domain, 236 two-domain, and 22 three- or four-domain proteins

  32. Successful Predictions of Transmembrane Proteins

  33. Application to ORFs of <201 residues in E. coli (1360 ORFs): 61% Easy (829/1360), 38% Medium (521/1360), 10 Hard; TASSER: good models for 68% (920/1360)
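
The Easy, Medium and Hard counts add up to all 1360 ORFs, and the rounded percentages on the slide can be checked directly. The helper name below is only illustrative.

```python
from __future__ import annotations

from typing import Dict


def percentages(counts: Dict[str, int], total: int) -> Dict[str, float]:
    """Percentage of `total` for each category, rounded to one decimal place."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


ecoli = {"Easy": 829, "Medium": 521, "Hard": 10, "TASSER good models": 920}
print(percentages(ecoli, 1360))
# {'Easy': 61.0, 'Medium': 38.3, 'Hard': 0.7, 'TASSER good models': 67.6}
```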

  34. Summary
      • Acceptable model in about 2/3 of the cases (969/1489)
      • Application to E. coli yields similar results: ~2/3 of proteins should have a good model
        - almost all (90%) have a good template

  35. Development of Active Site Descriptors

  36. Representation of an Automated Functional Template (AFT)
      Types of functional sites from SwissProt: METAL, BINDING, ACT_SITE, SITE
      Set of distances between:
      • the Cα atoms (Cα k) and the side-chain centers of mass (cm SC k) of the 3 to 5 functional residues, and
      • the Cα atoms of the adjacent residues (Cα k-1, Cα k+1)
      [Figure: functional residues i, j, k with their Cα atoms, side-chain centers of mass and neighboring Cα atoms]
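
An AFT is a set of distances among Cα atoms and side-chain centers of mass, so building one amounts to collecting those points and taking all pairwise distances. This is a hedged sketch with several assumptions. The helper names are hypothetical. The point order (Cα k-1, Cα k, side-chain center k, Cα k+1) is one possible choice, and points shared by adjacent functional residues are kept once. Residues k-1 and k+1 are assumed to be present.

```python
from __future__ import annotations

import math
from itertools import combinations
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def aft_points(ca: Dict[int, Point], sc_cm: Dict[int, Point], functional: List[int]) -> List[Point]:
    """Ordered AFT points (hypothetical layout) for 3 to 5 functional residues."""
    if not 3 <= len(functional) <= 5:
        raise ValueError("an AFT uses 3 to 5 functional residues")
    labels: List[Tuple[str, int]] = []
    for k in functional:
        for label in (("CA", k - 1), ("CA", k), ("SC", k), ("CA", k + 1)):
            if label not in labels:  # adjacent functional residues share neighbors
                labels.append(label)
    return [ca[i] if kind == "CA" else sc_cm[i] for kind, i in labels]


def distance_set(points: List[Point]) -> List[float]:
    """All pairwise distances, in a fixed order so two AFTs compare entry by entry."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]


# Toy coordinates (made up): residues on a line, side chains offset by 2 Å.
ca = {i: (3.8 * i, 0.0, 0.0) for i in range(1, 31)}
sc = {i: (3.8 * i, 2.0, 0.0) for i in range(1, 31)}
pts = aft_points(ca, sc, [5, 15, 25])
print(len(pts), len(distance_set(pts)))  # 12 66
```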

  37. Specificity parameters of AFTs
      [Figure: number of positive and negative hits in a subset of the PDB vs. DRMSD (Å, 0.0-2.5), marking DRMSD MaxPos, DRMSD MinNeg and the high- and low-confidence DRMSD intervals]
      Restrictive cutoff: the average of DRMSD MaxPos and DRMSD MinNeg.
      Permissive cutoff: the expected number of false-positive matches in a random structure is less than 0.005.
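
A DRMSD compares two equally ordered distance sets. The restrictive cutoff is stated as the average of DRMSD MaxPos and DRMSD MinNeg. The sketch below reads those as the largest DRMSD among positive hits and the smallest among negative hits. That reading is an assumption, and the function names are hypothetical. The permissive cutoff needs a model of random structures that the slide does not describe, so it is not reproduced here.

```python
import math
from typing import Sequence


def drmsd(d_template: Sequence[float], d_candidate: Sequence[float]) -> float:
    """Root-mean-square difference between two equally ordered distance sets (Å)."""
    if len(d_template) != len(d_candidate) or not d_template:
        raise ValueError("distance sets must be non-empty and of equal length")
    sq = sum((a - b) ** 2 for a, b in zip(d_template, d_candidate))
    return math.sqrt(sq / len(d_template))


def restrictive_cutoff(positive: Sequence[float], negative: Sequence[float]) -> float:
    """Average of the largest positive-hit DRMSD and the smallest negative-hit DRMSD
    (assumed reading of DRMSD MaxPos and DRMSD MinNeg)."""
    return (max(positive) + min(negative)) / 2.0


print(drmsd([1.0, 2.0], [1.0, 4.0]))                    # sqrt(2), about 1.414
print(restrictive_cutoff([0.4, 0.9, 1.1], [1.5, 2.0]))  # about 1.3
```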

  38. Fraction of decoys correctly annotated vs. ranking of the best true-positive hit
      [Figure panels: global Cα cRMSD from the native structure; local Cα dRMSD from the native structure; values shown: 73%, 56%, 48%, 35%]
      Recognition by an AFT that matches the first three components of the true EC number is counted as a true-positive hit.
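
The true-positive rule compares only the first three components of the EC number. In this hedged sketch (hypothetical function name), a "-" placeholder in those components is assumed not to count as a match.

```python
def matches_ec(aft_ec: str, true_ec: str, levels: int = 3) -> bool:
    """True if the first `levels` EC components agree and are fully specified."""
    a = aft_ec.split(".")
    t = true_ec.split(".")
    if len(a) < levels or len(t) < levels:
        return False
    head_a, head_t = a[:levels], t[:levels]
    if "-" in head_a or "-" in head_t:
        return False  # assumption: an unspecified component cannot confirm a match
    return head_a == head_t


print(matches_ec("3.4.21.4", "3.4.21.5"))  # True: same first three components
print(matches_ec("3.4.21.4", "3.4.22.1"))  # False
```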

  39. Threading of Entire Genomes

  40. Summary of Fold Assignments

  41. Comments on fold distribution
      • Protein folds can be assigned to 72-85% of the genes in each genome.
      • 30-50% of the total amino acids in a genome are covered by the assigned folds.
      • In general, the distribution of folds is similar in the 5 organisms.
      • Folds of α/β type are abundant.
      • Folds with multiple functions are abundant in a genome.
      • The kinase fold appears in the top 5 only in S. cerevisiae.
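
The first two bullets use different denominators: the fraction of genes with at least one assigned fold, and the fraction of all residues covered by assigned regions. The sketch below computes both. It assumes assignments are given as 1-based inclusive residue ranges per gene. Overlapping regions are counted once. The toy genome is made up.

```python
from __future__ import annotations

from typing import Dict, List, Tuple


def fold_coverage(gene_lengths: Dict[str, int],
                  assignments: Dict[str, List[Tuple[int, int]]]) -> Tuple[float, float]:
    """Return (fraction of genes with >= 1 assigned fold, fraction of residues covered).

    `assignments` maps a gene to 1-based inclusive (start, end) regions; overlapping
    regions are counted once and clipped to the gene length.
    """
    covered = 0
    genes_with_fold = 0
    for gene, length in gene_lengths.items():
        residues = set()
        for start, end in assignments.get(gene, []):
            residues.update(range(max(1, start), min(end, length) + 1))
        if residues:
            genes_with_fold += 1
        covered += len(residues)
    return genes_with_fold / len(gene_lengths), covered / sum(gene_lengths.values())


lengths = {"g1": 100, "g2": 200, "g3": 100}  # toy genome, not real data
regions = {"g1": [(1, 60), (41, 80)], "g2": [(11, 110)]}
print(fold_coverage(lengths, regions))  # 2 of 3 genes, 180 of 400 residues
```

A genome can therefore have a high per-gene assignment rate and still a much lower residue coverage, because assigned folds often cover only part of each gene.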

  42. MULTIPROSPECTOR: Prediction of Protein-Protein Interactions (L. Lu, H. Lu, J. Skolnick. Proteins, 2002, 49, 350-364)

  43. Overall Idea of Multimer Threading
      [Figure: monomer threading of sequences X and Y, then multimer threading against a multimer structure library (chains A and B); the fold is assigned on the basis of the Z-score and the interface energy]
      X: GELPIAPIGRIIKNA GAERVSDDARIALAK VLEEMGEEIASEAVK LAKHAGRKTIKAEDI KLARKMFK
      Y: GEVPIAPLGRIIKNA GAERVSDDARIALAK VLEEMGEEIASEAIR LAKHAGRKTIKAEDV KLAKKMFK
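
The slide assigns a multimeric fold from the threading Z-score and the interface energy. A common definition of such a Z-score measures how far a hit's energy lies below the mean of a background distribution, in standard deviations. The sketch below assumes that definition. It also uses two made-up acceptance thresholds (`Z_MIN`, `INTERFACE_ENERGY_MAX`). It is not MULTIPROSPECTOR's actual scoring code.

```python
import statistics
from typing import Sequence

# Hypothetical acceptance thresholds; the slide names the criteria, not the values.
Z_MIN = 3.0
INTERFACE_ENERGY_MAX = 0.0


def threading_z_score(energy: float, background: Sequence[float]) -> float:
    """Z-score of a threading energy against a background of energies.

    Lower energy is better, so a favourable hit gets a positive z-score.
    """
    mu = statistics.fmean(background)
    sd = statistics.pstdev(background)
    if sd == 0:
        raise ValueError("background energies have zero spread")
    return (mu - energy) / sd


def accept_multimer(z: float, interface_energy: float) -> bool:
    """Accept a multimer assignment only if both (assumed) criteria hold."""
    return z >= Z_MIN and interface_energy <= INTERFACE_ENERGY_MAX


z = threading_z_score(-50.0, [-10.0, -30.0])  # mean -20, sd 10, so z = 3.0
print(z, accept_multimer(z, -12.5))           # 3.0 True
```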

  44. Preliminary test on known dimers and monomers: 58 homodimers, 20 heterodimers, 96 monomers
      [Figure: bar chart of proteins predicted to be dimers vs. proteins predicted to be monomers; values shown: 20, 20, 58, 91, 96, 54, 5, 4]

  45. Procedure for genomic scale prediction of protein-protein interactions by MULTIPROSPECTOR

  46. Comparison of colocalization index for different methods

  47. Distribution of predicted interactions in functional categories

  48. Conclusions
