A Global View of the Protein Structure Universe and Protein Evolution
Sung-Hou Kim
University of California, Berkeley, CA, U.S.A.
June 27, 2006
Topics
I. Global view of the protein structure universe
II. Mapping of protein functions on the structural universe
III. Global view of the evolution of proteins
J. Hou, G. Sims, I.-G. Choi, S.-R. Jun, C. Zhang
I. Mapping the Protein Structure Universe: Structural Demography
The Protein Universe
• 500 – 20,000 genes per organism
• >13.6 × 10⁶ species
• >10¹⁰ – 10¹² protein sequences, but…
• ~10⁵ protein sequence families
• ~10⁴ protein structure families
• ~10³ protein fold domain families
“Mapping” by Metric Matrix Distance Geometry (Classical Multidimensional Scaling)
• Input: pair-wise relational distances with “errors”
• Output: the most likely (consistent) global relational “mapping”
[Figure: four points x₁, x₂, x₃, x₄ linked by the pair-wise distances d₁,₂, d₁,₃, d₁,₄, d₂,₃, d₂,₄, d₃,₄]
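The metric matrix distance geometry (classical multidimensional scaling) step can be sketched in Python with NumPy. This is a minimal textbook version, not the authors' code. It double-centers the squared distance matrix, takes the eigenvectors with the largest eigenvalues, and scales them by the square roots of those eigenvalues. The function name `classical_mds` is hypothetical. The four-point configuration mirrors the x₁ to x₄ figure but uses illustrative coordinates.

```python
import numpy as np


def classical_mds(D, k=3):
    """Classical (metric) multidimensional scaling.

    D -- n x n symmetric matrix of pairwise distances with a zero diagonal.
    k -- number of dimensions to keep.
    Returns (n x k coordinates, all eigenvalues of the metric matrix, descending).
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Centering matrix J = I - (1/n) 11^T moves the origin to the centroid
    J = np.eye(n) - np.ones((n, n)) / n
    # Metric (Gram) matrix B = -1/2 J D^2 J, with D^2 taken element-wise
    B = -0.5 * J @ (D ** 2) @ J
    # B is symmetric, so eigh returns real eigenvalues (ascending); reorder to descending
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Distances with "errors" can yield negative eigenvalues; clip before the square root
    scale = np.sqrt(np.clip(eigvals[:k], 0.0, None))
    return eigvecs[:, :k] * scale, eigvals


# Four illustrative points x1..x4 (corners of a 3 x 4 rectangle) and their pairwise distances
pts = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
X, eigvals = classical_mds(D, k=2)
# The recovered map reproduces every pairwise distance (up to rotation, reflection, translation)
D_rec = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(np.allclose(D, D_rec))  # True
```

Because these toy distances are exactly Euclidean, two axes reproduce all six of them. With noisy structural dissimilarities the reconstruction from a few axes is only approximate.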
Method
• Take all protein structures in the PDB (>35,000)
• Construct a non-redundant set at 25% sequence identity (~2000 structures)
• Calculate all-to-all pair-wise structural similarities, then convert them to dissimilarity scores
• Apply metric matrix distance geometry to find the global position of each structure in N-dimensional space
• Plot in 3-D to capture the major features of the protein structure space
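The slides specify only the 25% identity threshold for building the non-redundant set, not the procedure. One simple approach, assumed here purely for illustration, is a greedy pass: keep a structure only if its sequence identity to every structure already kept is below 25%. The `identity` callable, the priority ordering, and the toy identity values are all hypothetical.

```python
def non_redundant_subset(ids, identity, threshold=0.25):
    """Greedily keep structures whose sequence identity to every kept structure
    is below `threshold`.

    ids      -- structure identifiers in priority order (hypothetical ordering)
    identity -- hypothetical callable identity(a, b) -> fraction in [0, 1]
    """
    kept = []
    for s in ids:
        # Reject s if it is too similar in sequence to any structure already kept
        if all(identity(s, k) < threshold for k in kept):
            kept.append(s)
    return kept


# Made-up toy identities for three hypothetical structures A, B, C
TOY_IDENTITY = {
    frozenset(("A", "B")): 0.90,
    frozenset(("A", "C")): 0.10,
    frozenset(("B", "C")): 0.12,
}


def toy_identity(a, b):
    return TOY_IDENTITY[frozenset((a, b))]


kept = non_redundant_subset(["A", "B", "C"], toy_identity)
print(kept)  # ['A', 'C']
```

B is dropped because it shares 90% identity with A, which was kept first. The result of a greedy filter therefore depends on the order in which structures are considered.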
Protein Structure Distance Matrix (~2000 structures with <25% sequence identity)
[Figure: 1898 × 1898 matrix with rows and columns P1, P2, …, P1898; entry D₃,₄ is the distance between structures P3 and P4]
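The slides do not say how similarity scores become the dissimilarities in this matrix. A minimal sketch, assuming similarity scores normalized to [0, 1] with 1 meaning identical structures, uses d = 1 - s. It averages the two alignment directions so that the matrix is symmetric, and it sets the diagonal to zero. The function name and the conversion rule are hypothetical, and the example scores are made-up toy values.

```python
import numpy as np


def similarity_to_distance(S):
    """Convert an n x n matrix of normalized structural similarity scores
    (1 = identical) into a symmetric dissimilarity matrix with a zero diagonal.

    Assumes scores lie in [0, 1]; d = 1 - s is one simple, hypothetical rule.
    """
    S = np.asarray(S, dtype=float)
    # Alignment scores need not be symmetric (i vs j != j vs i); average both directions
    S_sym = (S + S.T) / 2.0
    D = 1.0 - S_sym
    # Every structure is at distance zero from itself
    np.fill_diagonal(D, 0.0)
    # Guard against scores falling slightly outside [0, 1]
    return np.clip(D, 0.0, 1.0)


# Made-up, slightly asymmetric similarity scores for three structures
S = [[1.0, 0.8, 0.3],
     [0.6, 1.0, 0.2],
     [0.3, 0.4, 1.0]]
D = similarity_to_distance(S)
print(D)
```

The resulting matrix can then be passed to a classical MDS routine.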
Eigenvalues → positional coordinates in 1898-dimensional space → major feature extraction in 3 dimensions
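Keeping only 3 of the 1898 axes is a choice about how much of the eigenvalue spectrum to retain. A common heuristic, assumed here rather than taken from the slides, reports the share of the positive eigenvalue sum captured by the top axes. It also counts negative eigenvalues, which indicate that the input distances are not exactly Euclidean (the “errors” in the pair-wise distances). The function name is hypothetical and the eigenvalues below are made-up toy values.

```python
def spectrum_summary(eigvals, k=3):
    """Summarize how much of the metric-matrix spectrum the first k axes capture.

    eigvals -- eigenvalues of the double-centered metric matrix, in any order.
    Returns (fraction of the positive eigenvalue sum in the top k, number of
    negative eigenvalues). Negative eigenvalues are excluded from the total.
    """
    vals = sorted(eigvals, reverse=True)
    positive_total = sum(v for v in vals if v > 0)
    if positive_total <= 0:
        raise ValueError("no positive eigenvalues to summarize")
    captured = sum(v for v in vals[:k] if v > 0)
    n_negative = sum(1 for v in vals if v < 0)
    return captured / positive_total, n_negative


frac, n_neg = spectrum_summary([16.0, 9.0, 4.0, 1.0, -0.5], k=3)
print(round(frac, 3), n_neg)  # 0.967 1
```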
Four demographic regions of the protein structure universe
[Figure: 3-D map of the structure space with example structures A1–A5 labeled]
Example structures (PDB ID:chain):
A1: (2ERL:_) MATING PHEROMONE ER-1
A2: (1ELW:B) TPR1-DOMAIN OF HOP
A3: (1A6M:_) MYOGLOBIN
A4: (1E85:A) CYTOCHROME C’
A5: (1M57:C) CYTOCHROME C OXIDASE
Four Protein Fold Classes: α, β, α+β, α/β
Major Features of the Protein Structural Space
• Protein structural space is sparsely populated
• Four elongated regions correspond to the four protein “fold” classes
• Small-to-large size distribution along three of the four “feature axes”
Molecular Functions (EC, Enzyme Commission classes): Basic Chemistry
Metal Binding
[Figure legend: Ca, Co, Cu, Fe, Mn, Mo, Ni, Zn, multi-bound, not bound]
Major Features of Functional Mapping
• Maximum diversity in architectural preference for a given molecular function: “scaffold” selection vs. design
The “Age” of the “Common Structural Ancestor” (CSA) of a Protein Family
Ages of the Common Structural Ancestors (population averaged)
• Chain length has a similar distribution
ML Relative “age” of common structural ancestors
Summary
• Mapping of protein structures: sparse except for four highly populated demographic regions (structural selection)
• Mapping of molecular functions: opportunistic use of structural features for molecular function (selection, not design)
• Mapping of CSA ages: (1) evolution of protein fold classes; (2) “multiple origin model” for the evolution of protein families
Organismic evolution by natural selection for the environment may be founded on molecular evolution by structural selection for function.