Comparing the Complete S. cerevisiae and C. elegans Proteomes: Orthology and Divergence
Stephen A. Chervitz
Saccharomyces Genome Database, NCBI, Boston University
Goals of this study
• Explore protein sequence and domain conservation between S. cerevisiae and C. elegans.
  • Unicellular vs. multicellular lifestyles
• Classify yeast and worm similarity groups using functional annotation of yeast genes.
• Enhance the SGD website and add value to the worm genomic sequence.
Organization of this study
• Shared core biology
  • Whole protein sequence comparisons
• Divergence
  • Protein domain comparisons
• No gene predictions
• No mitochondrial sequence
Definitions
• Orthologs: Genes from different species that perform the same biological function and have likely evolved from a common ancestral gene.
• Paralogs: Genes in the same species that perform different biological functions and likely arose by duplication and divergence from a common ancestral gene.
Genome Scorecards
                        Saccharomyces cerevisiae   Caenorhabditis elegans
Image magnification:    x20,000                    x200
No. of cells:           1                          ~1000
Size (Mbp):             12                         97
Chromosomes:            16                         6
Predicted ORFs:         6,217                      19,099
Percent coding:         72%                        27%
ORFs with gene names:   3,344 (53%)                688 (4%)
Building a Biological Rosetta Stone
Worm orthologs with functional description; Yeast ORFs with functional description
P-Value
1e-10     86%   64%
1e-20     89%   69%
1e-40     93%   61%
1e-60     96%   74%
1e-80     96%   74%
1e-100    98%   77%
1e-200    98%   88%
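The slide reports percentages at a series of P-value cutoffs. As a rough sketch of how such a tally might be computed, the example below assumes that each row counts all matches at or below the cutoff (the slide does not say whether rows are cumulative). The `hits` list and the function name `described_fraction` are invented for illustration and are not the study's data or code.

```python
# Hypothetical hits: (P-value of the match, whether a functional description exists).
# These values are invented for illustration; they are not results from the study.
hits = [
    (1e-250, True),
    (1e-120, True),
    (1e-45, False),
    (1e-30, True),
    (1e-12, False),
    (1e-5, True),   # weaker than every cutoff below, so never counted
]

# Cutoffs taken from the slide.
cutoffs = [1e-10, 1e-20, 1e-40, 1e-60, 1e-80, 1e-100, 1e-200]


def described_fraction(hits, cutoff):
    """Fraction of hits with P-value <= cutoff that have a functional description."""
    selected = [described for p, described in hits if p <= cutoff]
    if not selected:
        return None  # no hits pass this cutoff
    return sum(selected) / len(selected)


table = {cutoff: described_fraction(hits, cutoff) for cutoff in cutoffs}
for cutoff, frac in table.items():
    label = "n/a" if frac is None else f"{frac:.0%}"
    print(f"{cutoff:.0e}  {label}")
```

Stricter cutoffs keep fewer, stronger matches, so the described fraction can change from row to row as weak matches drop out.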
Distribution of core biological functions conserved in both yeast and worm
Core Biological Functions
• Signal Transduction: kinases, phosphatases, Ras superfamily and other GTP-binding proteins, GDP/GTP exchange factors, ADP-ribosylation factors, adenylyl/guanylyl cyclases, phosphatidylinositol kinases, EF-hand proteins
• DNA/RNA Metabolism: polymerases, helicases, topoisomerases, repair/recombination-related, nucleases, primases, splicing factors, initiation/elongation factors (transcription & translation), tRNA synthetases, histone acetylases/deacetylases
• Transport & Secretion: ABC transporters, permeases, vesicle coat & fusion proteins, clathrin-associated, protein targeting, signal recognition particle, nuclear pore-associated
• Cytoskeletal: actin, myosin, tubulin, actin-related proteins, actin-interacting proteins, septins, cytokinesis-related proteins
Core Biological Functions (cont’d)
• Ribosomal: ribosomal proteins (small & large subunit), ribosome processing proteins
• Protein Folding and Degradation: heat shock proteins, chaperonins, proteasome subunits, ubiquitin-related, peptidyl prolyl cis-trans isomerase, protein disulfide isomerases, aminopeptidases, post-translational modifying enzymes (farnesyltransferase, myristoyltransferase, glycosylation, GPI-anchoring)
• Intermediary Metabolism: dehydrogenases, reductases, mutases, lyases, isomerases, carboxylases, decarboxylases, nucleotide biosynthetic enzymes, transaminases, deaminases, epimerases, oxygenases, cytochromes, flavoproteins
Domain Analysis
• 122 common eukaryotic protein domains.
• Associated with regulation of gene expression and signal transduction.
• Compare occurrence and domain architectures in yeast and worm protein sequences.
• Position-dependent weight matrices (profiles) to detect domains (PSI-BLAST).
• Classify worm-only, yeast-only, and shared domains.
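To make the idea of a position-dependent weight matrix concrete, here is a toy sketch that builds log-odds scores from a few aligned motif instances and slides the resulting profile along a protein. It is not PSI-BLAST and not the procedure used in the study. The aligned sequences, the uniform background, the pseudocount, and the function names are all assumptions chosen for illustration.

```python
import math

# Toy aligned instances of a hypothetical 4-residue motif (invented, not a real domain).
aligned = ["GKST", "GKSS", "GKTT", "AKST"]
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(ALPHABET)  # assumption: uniform background frequency
PSEUDOCOUNT = 1.0                 # assumption: one pseudocount per residue type


def build_profile(seqs):
    """Return one dict of log-odds scores (residue -> score) per alignment column."""
    profile = []
    for col in range(len(seqs[0])):
        column = [s[col] for s in seqs]
        total = len(column) + PSEUDOCOUNT * len(ALPHABET)
        scores = {}
        for aa in ALPHABET:
            freq = (column.count(aa) + PSEUDOCOUNT) / total
            scores[aa] = math.log2(freq / BACKGROUND)
        profile.append(scores)
    return profile


def best_match(profile, protein):
    """Slide the profile along the protein; return (best score, start index)."""
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(protein) - width + 1):
        window = protein[start:start + width]
        score = sum(profile[i][aa] for i, aa in enumerate(window))
        best = max(best, (score, start))
    return best


profile = build_profile(aligned)
score, start = best_match(profile, "MLVAGKSTLLNR")
print(start, round(score, 2))
```

Because each column has its own scores, a residue that is favorable at one position of the motif can be penalized at another, which is what distinguishes a profile from a single substitution matrix.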
Worm-Only Domains
• Nuclear hormone receptors
• Epidermal growth factor
• Degenerins
• FMRFamides (neuropeptides)
• Cadherin
• PTB (phosphotyrosine binding)
• T-box, SMAD (transcription factor domains)
• Insulin-like peptides
• Laminin NT
Yeast-Only Domains
• C6 (Zn-binding cluster)
• APSES (DNA-binding)
Shared Domains (Yeast & Worm)
• Protein kinase catalytic
• C2H2 Finger
• AAA ATPase
• DAG Kinase
• Arrestin
• Ankyrin
• SWI/SNF helicase
• RING-finger
• bHLH
• RHO GAP/GEF
• Pleckstrin homology
• SH3
• Ubiquitin
• SH2
• cNMP-signaling domains
• CaM EF-hands
• Homeodomains
• Potassium channels
• 7TM receptors
• HINT
• Immunoglobulin
• LRR
• vWA
• MATH
• POZ
• LIM
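Once domains have been detected in each proteome, the worm-only, yeast-only, and shared classes reduce to set operations. The sketch below uses a handful of domain names from the slides; the variable names are illustrative, and the sets are small subsets rather than the full 122-domain survey.

```python
# Small subsets of the domains named on the slides, grouped by where they were detected.
yeast_domains = {"C6 (Zn-binding cluster)", "Protein kinase catalytic", "SH3", "RING-finger"}
worm_domains = {"Nuclear hormone receptors", "Cadherin", "PTB",
                "Protein kinase catalytic", "SH3", "RING-finger"}

shared = yeast_domains & worm_domains      # detected in both proteomes
yeast_only = yeast_domains - worm_domains  # detected only in yeast
worm_only = worm_domains - yeast_domains   # detected only in worm

for label, group in [("shared", shared), ("yeast-only", yeast_only), ("worm-only", worm_only)]:
    print(f"{label}: {', '.join(sorted(group))}")
```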
Frequency of occurrence of common domains
Domain counts are normalized to the number of proteins with a given domain per 1000 genes.
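The normalization lets two genomes of very different size be compared on one scale. A minimal sketch follows, using the predicted ORF totals from the Genome Scorecards slide as the gene counts (an assumption about which denominator was used). The raw per-domain counts are invented for illustration and are not the study's figures.

```python
YEAST_ORFS = 6217   # predicted ORFs, from the Genome Scorecards slide
WORM_ORFS = 19099   # predicted ORFs, from the Genome Scorecards slide

# Hypothetical numbers of proteins containing each domain (invented for illustration).
raw_counts = {
    "Protein kinase catalytic": {"yeast": 120, "worm": 400},
    "SH3": {"yeast": 25, "worm": 60},
}


def per_thousand(protein_count, total_genes):
    """Proteins with the domain per 1000 genes in that genome."""
    return 1000.0 * protein_count / total_genes


normalized = {
    domain: {
        "yeast": per_thousand(counts["yeast"], YEAST_ORFS),
        "worm": per_thousand(counts["worm"], WORM_ORFS),
    }
    for domain, counts in raw_counts.items()
}

for domain, values in normalized.items():
    print(f"{domain}: yeast {values['yeast']:.1f}, worm {values['worm']:.1f} per 1000 genes")
```

Each protein is counted once per domain type here, regardless of how many copies of the domain it contains, matching the slide's wording "number of proteins with a given domain".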
Conclusions
• Core biological functions are carried out by orthologous proteins occurring in comparable numbers in yeast and worm.
• These represent approx. 40% of the predicted yeast ORFs and 20% of the predicted worm ORFs.
• Regulatory and signaling proteins in worm do not have orthologs in yeast but often share domains.
• Complete results are available online at SGD at http://genome-www.stanford.edu/Saccharomyces/worm
Future Directions
• Incorporate more sensitive sequence search results.
• More sophisticated clustering scheme.
  • Multi-domain proteins and weak similarities.
• Keep up to date with changes in the genomic datasets.
  • Add/remove protein coding regions
  • Correction of errors in the genomic sequence
  • Sequence name changes
• Extended annotation support.
  • Controlled vocabularies, gene function ontologies.
• Comparative genomics framework for additional genomes.
• More flexible browsing of genome-wide similarities.
  • Prototype yeast genome protein similarity Java viewer
Genome-wide protein similarity view
• Explore protein sequence similarities within or between genomes
• Graphical user interface
• Available at SGD for the yeast genome
  • Sequence Resources, Protein Similarity View
Acknowledgements
• Saccharomyces Genome Database (Stanford)
  • Gavin Sherlock
  • Cathy Ball
  • Selina Dwight
  • Midori Harris
  • Kara Dolinski
  • Shuai Weng
  • Eric Hester
  • Mike Cherry
  • David Botstein
Acknowledgements (cont’d)
• NCBI (Nat’l Library of Medicine)
  • L. Aravind
  • Eugene Koonin
• Boston University
  • Scott Mohr
  • James Freeman
  • Temple Smith
• Neomorphic Software (Berkeley)
  • www.neomorphic.com
Single-linkage clustering and multi-domain proteins: “Chaining” (figure, steps 1 to 3)
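The chaining effect can be reproduced with a few lines of code. In the sketch below, proteins A and C share no domain, yet single-linkage clustering places them in one group because the two-domain protein B is similar to both. The protein names, their domain content, and the use of "shares at least one domain" as the similarity test are assumptions made for illustration, not the study's clustering procedure.

```python
def single_linkage(items, similar_pairs):
    """Group items so that any chain of pairwise similarities joins them (union-find)."""
    parent = {item: item for item in items}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for item in items:
        clusters.setdefault(find(item), set()).add(item)
    return list(clusters.values())


# Hypothetical proteins and their domains: B is the multi-domain "bridge".
domains = {"A": {"kinase"}, "B": {"kinase", "SH3"}, "C": {"SH3"}}
names = sorted(domains)

# Two proteins count as similar here if they share at least one domain.
pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
         if domains[a] & domains[b]]

clusters = single_linkage(names, pairs)
print(pairs)
print(clusters)
```

A and C never appear as a similar pair, yet they end up in the same cluster, which is why multi-domain proteins call for a more sophisticated clustering scheme.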
Whole genomic DNA microarray
DeRisi et al. (1997) Science 278: 680