
A Global View of the Protein Structure Universe and Protein Evolution

A Global View of the Protein Structure Universe and Protein Evolution. Sung-Hou Kim, University of California, Berkeley, CA, U.S.A. June 27, 2006. Topics: I. Global view of the protein structure universe; II. Mapping of protein functions on the structural universe


Presentation Transcript


  1. A Global View of the Protein Structure Universe and Protein Evolution • Sung-Hou Kim, University of California, Berkeley, CA, U.S.A. • June 27, 2006

  2. Topics • I. Global view of the protein structure universe • II. Mapping of protein functions on the structural universe • III. Global view of the evolution of proteins

  3. J. Hou, G. Sims, I.-G. Choi, S.-R. Jun, C. Zhang

  4. I. Mapping the Protein Structure Universe: Structural Demography

  5. The Protein Universe • 500 – 20,000 genes per organism • >13.6 × 10⁶ species • >10¹⁰ – 10¹² protein sequences, but... • ~10⁵ protein sequence families • ~10⁴ protein structure families • ~10³ protein fold domain families

  6. “Mapping” by Metric Matrix Distance Geometry (Classical Multidimensional Scaling) • Pair-wise relational distances with “errors” • Most likely (consistent) global relational “mapping” [Figure: four points x1, x2, x3, x4 joined by the pair-wise distances d1,2, d1,3, d1,4, d2,3, d2,4, d3,4]
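The slide names the technique without spelling out the algebra. As a reminder, the textbook formulation of classical multidimensional scaling can be written as follows. The symbols (D, J, B, V, Λ, X) are assumed here for illustration and do not come from the presentation.

```latex
% Notation is assumed for illustration; the slide does not define these symbols.
% d_{ij} is the dissimilarity between items i and j, n is the number of items.
\begin{align}
D^{(2)} &= \bigl[d_{ij}^{2}\bigr]_{i,j=1}^{n},
  \qquad J = I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top} \\
B &= -\tfrac{1}{2}\, J\, D^{(2)} J \\
B &= V \Lambda V^{\top},
  \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \\
X_k &= V_k\, \Lambda_k^{1/2}
\end{align}
```

Here X_k holds the coordinates of the n items in the k leading dimensions. When the distances are exactly Euclidean, B is positive semidefinite and the full embedding reproduces them. With distances that carry "errors", some eigenvalues can be negative, and the truncated embedding is a best-fit approximation in that sense.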

  7. Method • Take all protein structures in PDB (>35,000) • Construct a non-redundant set at 25% sequence identity (~2000 structures) • Calculate all-to-all pair-wise structural similarities, then convert to dissimilarity scores • Apply metric matrix distance geometry to find the global position of each structure in N-dimensional space • 3-D plot to capture the major features of the protein structure space
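The last three steps of this method can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: the presentation does not say how similarity scores were converted to dissimilarities, so `similarity_to_dissimilarity` uses a hypothetical "maximum minus score" rule, and both function names are made up for illustration.

```python
import numpy as np


def similarity_to_dissimilarity(S):
    """Hypothetical conversion (an assumption, not from the slides):
    d_ij = max(S) - s_ij, with zeros forced on the diagonal."""
    S = np.asarray(S, dtype=float)
    D = S.max() - S
    np.fill_diagonal(D, 0.0)
    return D


def classical_mds(D, k=3):
    """Classical MDS: embed n items in k dimensions from an n x n
    symmetric dissimilarity matrix D. Returns (coords, eigenvalues)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-centre the squared dissimilarities to get a Gram-like matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # eigh returns eigenvalues in ascending order; sort descending.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    # Negative eigenvalues (non-Euclidean "errors") are clipped to zero.
    top = np.clip(eigvals[:k], 0.0, None)
    coords = eigvecs[:, :k] * np.sqrt(top)
    return coords, eigvals
```

For a matrix of about 2000 structures the dense eigendecomposition is inexpensive. Calling `classical_mds(D, k=3)` yields the three coordinates per structure that a 3-D plot would use, and the returned eigenvalues show how much structure is left in the discarded dimensions.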

  8. Protein Structure Distance Matrix (~2000 structures with <25% sequence ID) [Figure: square matrix with rows and columns labeled P1, P2, ..., P1898; the entry D3,4 is the distance between structures P3 and P4]

  9. Eigenvalues • Positional coordinates in 1898-dimensional space • Major feature extraction in 3 dimensions
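One common way to judge how much a 3-dimensional projection retains is to compare the three largest eigenvalues with the sum of all positive eigenvalues. The slides do not report such a figure, so the sketch below only illustrates the calculation; the function name `variance_captured` is hypothetical.

```python
import numpy as np


def variance_captured(eigvals, k=3):
    """Fraction of the total positive eigenvalue mass held by the
    k largest eigenvalues. Negative eigenvalues are treated as zero."""
    vals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    positive = np.clip(vals, 0.0, None)
    total = positive.sum()
    if total == 0.0:
        raise ValueError("no positive eigenvalues to compare against")
    return positive[:k].sum() / total
```

A value close to 1 would indicate that three dimensions capture most of the spread in the full 1898-dimensional coordinates; a small value would mean the 3-D plot shows only the dominant trends.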

  10. The Protein Structure Universe (2005)

  11. Four demographic regions of the protein structure universe • A1: (2ERL:_) MATING PHEROMONE ER-1 • A2: (1ELW:B) TPR1-DOMAIN OF HOP • A3: (1A6M:_) MYOGLOBIN • A4: (1E85:A) CYTOCHROME C’ • A5: (1M57:C) CYTOCHROME C OXIDASE [Figure: positions of A1 to A5 marked on the map]

  12. Four Protein Fold Classes • α • β • α+β • α/β

  13. Major Features of the Protein Structural Space • Protein structural space is sparsely populated • Four elongated regions corresponding to four protein “fold” classes • Small to large size distribution along three of four “feature axes”

  14. II. Mapping of Functions (1) Enzymatic functions

  15. EC Molecular functions: Basic chemistry

  16. EC3: Hydrolases

  17. EC6: Ligases

  18. II. Mapping of Functions (2) Metal Binding

  19. Metal Binding [Figure legend: Ca, Co, Cu, Fe, Mn, Mo, Ni, Zn, multi-bound, not bound]

  20. Zn

  21. Cu

  22. Major Features of Functional Mapping • Maximum diversity in architectural preference for a given molecular function: “scaffold” selection vs. design

  23. III. Evolution of Proteins (a) “Ages” of Protein Families

  24. Method: “Common Structural Ancestor”

  25. The “age” of the “common structural ancestor” (CSA) of a protein family [Figure axis: “Age” of CSA]

  26. Ages of the Common Structural Ancestors • Population averaged • Chain length has similar distribution

  27. III. Evolution of Proteins (b) Protein Fold Classes

  28. ML Relative “age” of common structural ancestors

  29. III. Evolution of Proteins (e) Protein Families

  30. Hypothesis: Multiple Origins of Protein Families

  31. Summary • Mapping of protein structures: sparse except four highly populated demographic regions (structural selection) • Mapping of molecular functions: opportunistic use of structural features for molecular function (selection, not design) • Mapping of CSA ages: (1) evolution of protein fold classes; (2) “multiple origin model” for the evolution of protein families

  32. Organismic evolution by natural selection for environment may be founded on molecular evolution by structural selection for function
