Comparing the Complete S. cerevisiae and C. elegans Proteomes: Orthology and Divergence



Presentation Transcript


  1. Comparing the Complete S. cerevisiae and C. elegans Proteomes: Orthology and Divergence • Stephen A. Chervitz • Saccharomyces Genome Database • NCBI • Boston University

  2. Goals of this study • Explore protein sequence and domain conservation between S. cerevisiae and C. elegans. • Unicellular vs. multicellular lifestyles • Classify yeast and worm similarity groups using functional annotation of yeast genes. • Enhance the SGD website and add value to the worm genomic sequence.

  3. Organization of this study • Shared core biology • Whole protein sequence comparisons • Divergence • Protein domain comparisons • No gene predictions • No mitochondrial sequence

  4. Definitions • Orthologs: Genes from different species that perform the same biological function and likely evolved from a common ancestral gene. • Paralogs: Genes in the same species that perform different biological functions and likely arose by duplication and divergence from a common ancestral gene.

  5. Genome Scorecards
                               Saccharomyces cerevisiae   Caenorhabditis elegans
     Image magnification:      x20,000                    x200
     No. of cells:             1                          ~1000
     Size (Mbp):               12                         97
     Chromosomes:              16                         6
     Predicted ORFs:           6,217                      19,099
     Percent coding:           72%                        27%
     ORFs with gene names:     3,344 (53%)                688 (4%)

  6. Core biology is carried out by similar numbers of proteins

  7. Building a Biological Rosetta Stone
     P-Value    Worm orthologs with functional description    Yeast ORFs with functional description
     1e-10      86%                                           64%
     1e-20      89%                                           69%
     1e-40      93%                                           61%
     1e-60      96%                                           74%
     1e-80      96%                                           74%
     1e-100     98%                                           77%
     1e-200     98%                                           88%

  8. Distribution of core biological functions conserved in both yeast and worm

  9. Core Biological Functions • Signal Transduction: kinases, phosphatases, Ras superfamily and other GTP-binding proteins, GDP/GTP exchange factors, ADP-ribosylation factors, adenylyl/guanylyl cyclases, phosphatidylinositol kinases, EF-hand proteins • DNA/RNA Metabolism: polymerases, helicases, topoisomerases, repair/recombination-related, nucleases, primases, splicing factors, initiation/elongation factors (transcription & translation), tRNA synthetases, histone acetylases/deacetylases • Transport & Secretion: ABC transporters, permeases, vesicle coat & fusion proteins, clathrin-associated, protein targeting, signal recognition particle, nuclear pore-associated • Cytoskeletal: actin, myosin, tubulin, actin-related proteins, actin-interacting proteins, septins, cytokinesis-related proteins

  10. Core Biological Functions (cont’d) • Ribosomal: ribosomal proteins (small & large subunit), ribosome processing proteins • Protein Folding and Degradation: heat shock proteins, chaperonins, proteasome subunits, ubiquitin-related, peptidyl prolyl cis-trans isomerase, protein disulfide isomerases, aminopeptidases, post-translational modifying enzymes (farnesyltransferase, myristoyltransferase, glycosylation, GPI-anchoring) • Intermediary Metabolism: dehydrogenases, reductases, mutases, lyases, isomerases, carboxylases, decarboxylases, nucleotide biosynthetic enzymes, transaminases, deaminases, epimerases, oxygenases, cytochromes, flavoproteins

  11. Constructing Sequence Similarity Groups

  12. Similarity Groups: MCM DNA replication initiator complex

  13. Similarity Groups: Tubulin

  14. Multiple Sequence Alignments

  15. Domain Analysis • 122 common eukaryotic protein domains. • Associated with regulation of gene expression and signal transduction. • Compare occurrence and domain architectures in yeast and worm protein sequences. • Position-dependent weight matrices (profiles) to detect domains (PSI-BLAST). • Classify worm-only, yeast-only, and shared domains.
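
The profile idea in slide 15 can be illustrated with a toy position-dependent weight matrix slid along a sequence. This is a minimal sketch under stated assumptions: the three-column matrix, the default score, and the test sequence are all hypothetical, and the code is not PSI-BLAST and does not reproduce the profiles used in the study.

```python
# Toy position-dependent weight matrix (profile) scan.
# The matrix, default score, and sequence are hypothetical.

PROFILE = [
    {"G": 2.0, "A": 0.5},  # column 1: per-residue scores
    {"K": 1.5, "R": 1.0},  # column 2
    {"S": 2.0, "T": 1.5},  # column 3
]
DEFAULT_SCORE = -1.0  # applied to any residue a column does not list


def score_window(window):
    """Sum the column scores for one window as wide as the profile."""
    return sum(col.get(res, DEFAULT_SCORE) for col, res in zip(PROFILE, window))


def best_hit(sequence):
    """Return (start, score) of the highest-scoring window (first one on ties)."""
    width = len(PROFILE)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the profile")
    starts = range(len(sequence) - width + 1)
    start = max(starts, key=lambda i: score_window(sequence[i:i + width]))
    return start, score_window(sequence[start:start + width])


if __name__ == "__main__":
    print(best_hit("MAGKSLL"))
```

The point of a profile is that each column scores residues differently, so a match depends on position as well as composition. A real search tool would also handle gaps and assess statistical significance, which this sketch omits.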

  16. Worm-Only Domains • Nuclear hormone receptors • Epidermal growth factor • Degenerins • FMRFamides (neuropeptides) • Cadherin • PTB (phosphotyrosine binding) • T-box, SMAD (transcription factor domains) • Insulin-like peptides • Laminin NT

  17. Yeast-Only Domains • C6 (Zn-binding cluster) • APSES (DNA-binding)

  18. Shared Domains (Yeast & Worm) • Protein kinase catalytic • C2H2 Finger • AAA ATPase • DAG Kinase • Arrestin • Ankyrin • SWI/SNF helicase • RING-finger • bHLH • RHO GAP/GEF • Pleckstrin homology • SH3 • Ubiquitin • SH2 • cNMP-signaling domains • CaM EF-hands • Homeodomains • Potassium channels • 7TM receptors • HINT • Immunoglobulin • LRR • vWA • MATH • POZ • LIM

  19. Frequency of occurrence of common domains • Domain counts are normalized: each value is the number of proteins containing a given domain per 1000 genes.
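
The per-1000-genes normalization in slide 19 is a simple rescaling that makes domain counts comparable across genomes of different size. A minimal sketch, assuming the predicted ORF totals from slide 5 serve as the gene counts; the domain counts in `hypothetical_counts` are invented for illustration and are not results from the study.

```python
# Normalize domain counts to "proteins with the domain per 1000 genes".
# Gene totals are the predicted ORF counts from slide 5 (an assumption
# about which denominator was used); the domain counts are hypothetical.

PREDICTED_ORFS = {"yeast": 6217, "worm": 19099}


def per_thousand(proteins_with_domain, total_genes):
    """Scale a raw count of domain-containing proteins to a per-1000 rate."""
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")
    return 1000.0 * proteins_with_domain / total_genes


if __name__ == "__main__":
    hypothetical_counts = {"yeast": 120, "worm": 400}  # made-up example values
    for organism, count in hypothetical_counts.items():
        rate = per_thousand(count, PREDICTED_ORFS[organism])
        print(f"{organism}: {rate:.1f} per 1000 genes")
```

Without this scaling, the worm would appear enriched for almost every domain simply because it has roughly three times as many predicted ORFs.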

  20. Conclusions • Core biological functions are carried out by orthologous proteins occurring in comparable numbers in yeast and worm. • These represent approx. 40% of the predicted yeast ORFs and 20% of the predicted worm ORFs. • Regulatory and signaling proteins in worm do not have orthologs in yeast but often share domains. • Complete results are available online at SGD at http://genome-www.stanford.edu/Saccharomyces/worm
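
The approximate percentages in slide 20 can be converted into rough ORF counts using the predicted ORF totals from slide 5. This is only back-of-the-envelope arithmetic: the percentages are stated as approximate, so the results are order-of-magnitude estimates, and the function name is illustrative.

```python
# Rough conversion of the slide 20 percentages into ORF counts.
# Totals come from slide 5; fractions are the approximate values in slide 20.

PREDICTED_ORFS = {"yeast": 6217, "worm": 19099}
APPROX_FRACTION = {"yeast": 0.40, "worm": 0.20}


def approx_core_orfs(organism):
    """Estimate how many predicted ORFs fall in the conserved core set."""
    return round(PREDICTED_ORFS[organism] * APPROX_FRACTION[organism])


if __name__ == "__main__":
    for organism in PREDICTED_ORFS:
        print(organism, approx_core_orfs(organism))
```

This gives roughly 2,500 yeast ORFs and 3,800 worm ORFs, which is the sense in which the two core sets are of comparable size despite the threefold difference in total ORFs.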

  21. Future Directions • Incorporate more sensitive sequence search results. • More sophisticated clustering scheme. • Multi-domain proteins and weak similarities. • Keep up to date with changes in the genomic datasets. • Add/remove protein coding regions • Correction of errors in the genomic sequence • Sequence name changes • Extended annotation support. • Controlled vocabularies, gene function ontologies. • Comparative genomics framework for additional genomes. • More flexible browsing of genome-wide similarities. • Prototype yeast genome protein similarity Java viewer

  22. Genome-wide protein similarity view • Explore protein sequence similarities within or between genomes • Graphical user interface • Available at SGD for the yeast genome • Sequence Resources, Protein Similarity View

  23. Acknowledgements • Saccharomyces Genome Database (Stanford) • Gavin Sherlock • Cathy Ball • Selina Dwight • Midori Harris • Kara Dolinski • Shuai Weng • Eric Hester • Mike Cherry • David Botstein

  24. Acknowledgements (cont’d) • NCBI (Nat’l Library of Medicine) • L. Aravind • Eugene Koonin • Boston University • Scott Mohr • James Freeman • Temple Smith • Neomorphic Software (Berkeley) • www.neomorphic.com

  25. Extra slides

  26. Single-linkage clustering and multi-domain proteins: “Chaining” (three-step diagram)
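
The chaining effect named in slide 26 can be reproduced with a small union-find sketch of single-linkage clustering. The proteins and similarity pairs below are hypothetical: protein B is imagined as a two-domain protein that matches A through one domain and C through the other, while A and C share no similarity themselves. This illustrates the general technique and is not the study's actual clustering code.

```python
# Single-linkage clustering via union-find, showing "chaining".
# Protein names and similarity pairs are hypothetical.


def single_linkage(items, similar_pairs):
    """Group items so that any chain of pairwise similarities joins them."""
    parent = {item: item for item in items}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for item in items:
        groups.setdefault(find(item), []).append(item)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    proteins = ["A", "B", "C", "D"]
    # A~B (shared domain X) and B~C (shared domain Y); A and C never match.
    pairs = [("A", "B"), ("B", "C")]
    print(single_linkage(proteins, pairs))
```

A, B, and C end up in one group even though A and C were never directly similar, which is why multi-domain proteins can merge otherwise unrelated families under single linkage.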

  27. Whole genomic DNA microarray • DeRisi et al. (1997) Science 278: 680

  28. Building a Biological Rosetta Stone
