
Classification: understanding the diversity and principles of protein structure and function

Classification: understanding the diversity and principles of protein structure and function. MCSG 2001 structures. Protein structure classification. Main reference: Robert B. Russell (2002) Classification of Protein Folds. Molecular Biotechnology 20:17-28.

gaius

Presentation Transcript


  1. Classification: understanding the diversity and principles of protein structure and function. MCSG 2001 structures

  2. Protein structure classification • Main reference: Robert B. Russell (2002) Classification of Protein Folds. Molecular Biotechnology 20:17-28. • Importance: central to studies of protein structure, function, and evolution • Philosophy: phyletic vs. phenetic • Method: structure comparison + human knowledge

  3. Philosophy of classification • Phyletic: based on phylogenetic relationship • Phenetic: based on study of phenomena (phenomenological)

  4. Classification Unit: Domain, a LEGO piece Ranganathan

  5. From domain to assembly • Domains are shuffled, duplicated and fused to make proteins • On average, a domain is 173 amino acids long, compared with 466 amino acids for a yeast protein • Most natural domain sequences assume one of a few thousand folds, of which ~1000 are already known • There is no satisfactory estimate yet for the number of macromolecular complexes • On average, a yeast complex may consist of 7.5 proteins (Sali et al. 2003)

  6. Distribution of protein size (Swiss-Prot)

  7. Structural vs. functional domain

  8. Russian doll: a conceptual problem Singh

  9. Approaches • Hierarchical • Based on the types and arrangements of secondary structures • Unit (level): domain • Domain assignment - structural vs. functional (fold or function in isolation) - automated assignment methods (structure vs. sequence)

  10. A. P. Singh

  11. Assignment of Class • All α or all β (could be subjective) • α/β (βαβ unit) or α+β • Other classes
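
The class assignment on this slide can be approximated from a per-residue secondary-structure string. The following is a minimal sketch, assuming a DSSP-style one-letter string (H/G/I counted as helix, E/B as strand) and a hypothetical 15% content threshold (`min_frac`). The function names and the threshold are illustrative assumptions, not taken from the slides or from any classification database. Separating the two mixed classes needs strand direction and 3D packing, so the sketch deliberately leaves them unsplit.

```python
def sse_fractions(ss):
    """Return (helix_fraction, strand_fraction) for a DSSP-style string."""
    n = len(ss)
    if n == 0:
        raise ValueError("empty secondary-structure string")
    helix = sum(1 for c in ss if c in "HGI")   # helical states
    strand = sum(1 for c in ss if c in "EB")   # strand / bridge states
    return helix / n, strand / n


def assign_class(ss, min_frac=0.15):
    """Assign a coarse structural class from secondary-structure content.

    min_frac is a hypothetical cutoff: a state counts as 'present' only
    if it covers at least this fraction of the residues.
    """
    helix, strand = sse_fractions(ss)
    has_helix = helix >= min_frac
    has_strand = strand >= min_frac
    if has_helix and has_strand:
        return "mixed alpha and beta"   # alpha/beta vs. alpha+beta not resolved here
    if has_helix:
        return "all-alpha"
    if has_strand:
        return "all-beta"
    return "other"


if __name__ == "__main__":
    print(assign_class("HHHHHHHHHHCC"))
    print(assign_class("HHHHHCEEEEECHHHHHCEEEEE"))
```

Borderline domains move between classes as `min_frac` changes, which is one concrete way to see why the slide calls the assignment subjective.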

  12. Class assignment could be subjective

  13. All-alpha structures

  14. All-beta structures Superoxide dismutase

  15. Alpha/beta structures Open twisted sheet Closed barrel

  16. β-α-β motif (barrel) (sheet)

  17. α/β vs. α+β

  18. Assignment of Fold • Defined by the number, type, and arrangement of SSEs • Connectivity (e.g. circular permutation, scrambled proteins)
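
The role of connectivity can be illustrated with a small sketch that reduces a domain to its ordered string of secondary-structure elements (SSEs) and tests whether two such strings are circular permutations of each other. It assumes a DSSP-style per-residue string restricted to H (helix), E (strand) and anything else as coil, plus a hypothetical minimum run length `min_len`. It compares SSE order only and ignores the 3D arrangement, so it is not a fold-assignment method.

```python
from itertools import groupby


def sse_topology(ss, min_len=3):
    """Collapse a per-residue string into its ordered SSEs, e.g. 'EHE'.

    Runs of H or E shorter than min_len (a hypothetical cutoff) are
    treated as noise and dropped.
    """
    elements = []
    for state, run in groupby(ss):
        length = sum(1 for _ in run)
        if state in "HE" and length >= min_len:
            elements.append(state)
    return "".join(elements)


def is_circular_permutation(top_a, top_b):
    """True if top_b is a rotation of top_a (same SSEs, shifted termini)."""
    return len(top_a) == len(top_b) and top_b in top_a + top_a


if __name__ == "__main__":
    a = sse_topology("CCEEEEECCHHHHHHCCEEEECC")
    print(a)
    print(is_circular_permutation("EHHEE", "HEEEH"))
```

Two domains that pass this test have the same number and type of SSEs but different chain connectivity, which is the case the slide flags under circular permutation. An identical SSE order also passes, since it is the trivial rotation.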

  19. Assignment of Superfamily • Homologous even in the absence of significant sequence similarity - certain level of structural similarity - unusual structural features - low but significant sequence similarity from structural alignment - key active site residues - sequence similarity bridges • Divergence vs. convergence

  20. Divergent vs. convergent evolution • Divergent evolution: descent from a common ancestor; become variant due to mutation • Convergent evolution: no common ancestor; become similar due to functional or physical constraint

  21. Anti-freeze protein: convergent evolution crystal.biochem.queensu.ca

  22. Homologous fold Ranganathan

  23. Analogous fold Ranganathan

  24. Analogous or homologous? Scallop Myosin Regulatory Domain C chain vs. Aldehyde Oxidoreductase A chain (figure: chain termini labeled N, C and N’, C’)

  25. Assignment of Family • significant sequence similarity
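
Sequence similarity at the family level is commonly summarized as percent identity over an existing pairwise alignment. The sketch below assumes two pre-aligned, equal-length strings with `-` as the gap character. The 30% cutoff in `suggest_level` is an illustrative assumption, chosen only because the deck itself uses 30% sequence identity as a threshold elsewhere; it is not a rule of SCOP or CATH, and identity alone does not establish statistical significance.

```python
def percent_identity(aln_a, aln_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aln_a, aln_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def suggest_level(identity, family_cutoff=30.0):
    """Hypothetical rule of thumb for where to look in the hierarchy."""
    if identity >= family_cutoff:
        return "family candidate"
    return "needs structural evidence (superfamily or fold level)"


if __name__ == "__main__":
    pid = percent_identity("ACDE-FG", "ACDQWFG")
    print(round(pid, 1), suggest_level(pid))
```

Pairs that fall below the cutoff are the ones for which the superfamily criteria (structural similarity, key active-site residues, sequence bridges) become necessary.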

  26. Classification databases • SCOP - careful assignment of evolutionary relationships; homologous vs. analogous • CATH - A:architecture • FSSP - a list of structural neighbors

  27. CATH • Class: SSE composition & packing • Architecture: overall shape of domain, ignore SSE connectivity • Topology (Fold): consider connectivity • Homologous superfamily: a common ancestor Singh
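
The four CATH levels map naturally onto a dotted four-number code (C.A.T.H), so the deepest level two domains share can be read off by comparing codes from left to right. This is a minimal sketch under that assumption; the names `CathCode`, `parse_cath` and `shared_level` are hypothetical helpers, and the codes in the usage lines are illustrative inputs only.

```python
from collections import namedtuple

# One integer per hierarchy level, ordered from coarsest to finest.
CathCode = namedtuple("CathCode", "cls architecture topology superfamily")

LEVELS = ("class", "architecture", "topology", "homologous superfamily")


def parse_cath(code):
    """Parse a dotted four-number code such as '3.40.50.720'."""
    parts = code.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError("expected four dot-separated integers: %r" % code)
    return CathCode(*(int(p) for p in parts))


def shared_level(a, b):
    """Return the deepest level at which two codes agree, or None."""
    shared = None
    for name, x, y in zip(LEVELS, a, b):
        if x != y:
            break
        shared = name
    return shared


if __name__ == "__main__":
    d1 = parse_cath("3.40.50.720")
    d2 = parse_cath("3.40.50.300")
    print(shared_level(d1, d2))
```

Because the comparison stops at the first mismatch, two domains can share a topology without sharing a homologous superfamily, which mirrors the analogous vs. homologous distinction made earlier in the deck.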

  28. Classification databases

  29. Genome-scale structure analysis Curr. Opin. Str. Biol., 2003

  30. genome-scale structure annotation

  31. Some statistics • 80% of sequence families belong to 400 folds (top 10 folds account for 40% of sequence families) • >60% of genes encode multi-domain proteins (80% for eukaryotes) • ~50,000 protein families and ~150,000 singletons • structural superfamilies ~1800 (+/-50) and ~10,000 unifolds • 50-60% of distant homologs (<25% seq. id.) can be recognized by profile-based sequence comparison methods (e.g. PSI-BLAST, HMM, etc.) • 50-60% of the enzymes in yeast and E. coli are common, and >80% of pathways are shared

  32. superfolds, superfamilies, supersites • TIM barrel, Rossmann-like, ferredoxin-like, β-propellers, 4-helix bundle, Ig-like, β-jelly rolls, oligonucleotide/oligosaccharide binding (OB) fold, SH3-like • Structure -> function (only 50% correct)

  33. Structure implicates function?

  34. Assessing the Progress of Structural Genomics Projects 1 Nov. 2002, Science

  35. Target Tracking by PDB (Sep 2002)

  36. PDB content growth (May 2005)

  37. Some statistics • 11 SG consortia contributed 316 non-redundant PDB entries comprising 459 CATH and 393 SCOP domains • 14% of the targets have a homolog (>30% sequence identity) solved by another consortium • 67% of SG domains in CATH are unique vs. 21% of non-SG domains • 19% and 11% contributed new superfamilies and new folds, respectively • These structures allow new and reliable homology models for 9287 non-redundant gene sequences in 208 completely sequenced genomes

  38. PSI Structure Statistics 2002-2003 • Unique structures (30% seq. ID): PSI 70%, PDB 10% • New folds: PSI 12%, PDB 3% NIGMS Protein Structure Initiative

  39. Average total cost per structure • PSI Pilot phase: 01 $650 K (7 centers); 02 $400 K (9 centers); 03 $240 K; 04 ?; 05 $100 K (goal) • PSI-2 Production phase: 06-10 $50 K (goal) • Comparison: ~$250-300 K NIGMS Protein Structure Initiative

  40. PSI Pilot Phase -- Lessons Learned • Structural genomics pipelines can be constructed and scaled-up • High throughput operation works for many proteins • Genomic approach works for structures • Bottlenecks remain for some proteins • A coordinated, 5-year target selection policy must be developed • Homology modeling methods need improvement NIGMS Protein Structure Initiative

  41. PSI-2 Production Phase (2005) • Interacting network for high throughput protein structure determination with three components: - Large-scale centers for protein structure production of selected targets - Specialized centers for technology development leading to high throughput structure determination of difficult proteins - Specialized centers for protein structures relevant to disease (other NIH Institutes and Centers) • Included in NIH Structural Biology Roadmap plans NIGMS Protein Structure Initiative

  42. Computational structural genomics

  43. Summary table

  44. Fold occurrence matrix

  45. Common Folds

  46. Unique Folds
