Data Content of the BioCyc Databases
BioCyc Tier 1 Databases
EcoCyc Project – EcoCyc.org • E. coli Encyclopedia • Review-level Model-Organism Database for E. coli • Tracks evolving annotation of the E. coli genome and cellular networks • The two paradigms of EcoCyc • “Multi-dimensional annotation of the E. coli K-12 genome” • Positions of genes; functions of gene products – 76% / 66% exp • Gene Ontology terms; MultiFun terms • Gene product summaries and literature citations • Evidence codes • Multimeric complexes • Metabolic pathways • Regulation of gene expression and of protein activity • Karp, Gunsalus, Collado-Vides, Paulsen • Nuc. Acids Res. 35:7577 2007 • ASM News 70:25 2004 • Science 293:2040
EcoCyc = E. coli Dataset + Pathway/Genome Navigator • URL: EcoCyc.org • EcoCyc v13.6 • Genes: 4,492 • Proteins: 4,479 • Complexes: 895 • RNAs: 285 • Pathways: 246 • Reactions: Metabolic: 1,394; Transport: 246 • Compounds: 1,830 • Gene Regulation: Operons: 3,369; Trans Factors: 196; Promoters: 1,796; TF Binding Sites: 2,205 • Citations: 19,000
EcoCyc Gene and Protein Information • Gene locations and protein functions updated through literature curation and in collaboration with RefSeq, EcoGene, and UniProt • EcoCyc curators author minireview summaries for gene products, complexes, pathways, and transcription units • Gene Ontology terms curated by EcoCyc and imported regularly from UniProt • Protein features regularly imported from UniProt
EcoCyc Regulation • Multiple types of regulatory information present in EcoCyc • Transcriptional regulation and operon organization • Attenuation • Regulation of translation by small RNAs and proteins • Regulation of protein activity by covalent and non-covalent means
Other E. coli Genomes in BioCyc • Currently BioCyc contains ~40 other E. coli and Shigella genomes • New genomes will be included from RefSeq as BioCyc expands • SRI is building orthology-based curation tools that will allow us to propagate curation from EcoCyc to these other strains
EcoCyc Accelerates Science • Experimentalists • E. coli experimentalists • Experimentalists working with other microbes • Analysis of expression data • Computational biologists • Biological research using computational methods • Genome annotation • Study connectivity of E. coli metabolic network • Study phylogenetic extent of metabolic pathways and enzymes in all domains of life • Bioinformaticists • Training and validation of new bioinformatics algorithms – predict operons, promoters, protein functional linkages, protein-protein interactions • Metabolic engineers • “Design of organisms for the production of organic acids, amino acids, ethanol, hydrogen, and solvents” • Educators
EcoliHub Resource • www.ecolihub.org • Hub search • Simultaneously searches 12 different E. coli databases • EcoliHub Omics • Omics data repository and analysis for E. coli • EcoliHouse • Queryable MySQL server containing multiple E. coli databases • EcoliWiki • Community contributed content about E. coli
MetaCyc: Metabolic Encyclopedia • Describe a representative sample of every experimentally determined metabolic pathway • Describe properties of metabolic enzymes • Literature-based DB with extensive references and commentary • Pathways, reactions, enzymes, substrates • Jointly developed by • P. Karp, R. Caspi, C. Fulcher, SRI International • L. Mueller, A. Pujar, Boyce Thompson Institute • S. Rhee, P. Zhang, Carnegie Institution • Nucleic Acids Research 2008
MetaCyc Pathway Ontology • Provides a classification system for metabolic pathways
Biosynthesis [902] • Amino acids Biosynthesis [105] • Aromatic Compounds Biosynthesis [13] • Carbohydrates Biosynthesis [70] • Cell structures Biosynthesis [31] • Cofactors, Prosthetic Groups, Electron Carriers Biosynthesis [160] • Hormones Biosynthesis [40] • Fatty Acids and Lipids Biosynthesis [101] • Metabolic Regulators Biosynthesis [4] • Nucleosides and Nucleotides Biosynthesis [20] • Amines and Polyamines Biosynthesis [32] • Secondary Metabolites Biosynthesis [351] • Antibiotic Biosynthesis [20] • Fatty Acid Derivatives Biosynthesis [7] • Flavonoids Biosynthesis [70] • Nitrogen-Containing Secondary Compounds Biosynthesis [64] • Alkaloids Biosynthesis [43] • Phenylpropanoid Derivatives Biosynthesis [46] • Phytoalexins Biosynthesis [25] • Sugar Derivatives Biosynthesis [10] • Terpenoids Biosynthesis [103] • Siderophore Biosynthesis [7]
Degradation/Utilization/Assimilation [639] • Alcohols Degradation [14] • Aldehyde Degradation [12] • Amines and Polyamines Degradation [40] • Amino Acids Degradation [113] • Aromatic Compounds Degradation [152] • C1 Compounds Utilization and Assimilation [24] • Carbohydrates Degradation [52] • Carboxylates Degradation [30] • Chlorinated Compounds Degradation [39] • Cofactors, Prosthetic Groups, Electron Carriers Degradation [2] • Fatty Acid and Lipids Degradation [18] • Inorganic Nutrients Metabolism [72] • Nitrogen Compounds Metabolism [15] • Phosphorus Compounds Metabolism [3] • Sulfur Compounds Metabolism [54] • Nucleosides and Nucleotides Degradation and Recycling [9] • Secondary Metabolites Degradation [58] • Nitrogen-Containing Secondary Compounds Degradation [13] • Sugar Derivatives Degradation [31] • Terpenoids Degradation [10]
Detoxification [16] • Acid Resistance [2] • Arsenate Detoxification [3] • Mercury Detoxification [1] • Methylglyoxal Detoxification [8]
Generation of precursor metabolites and energy [124] • Chemoautotrophic Energy Metabolism [14] • Hydrogen Oxidation [2] • Electron Transfer [11] • Fermentation [34] • Glycolysis [6] • Methanogenesis [12] • Pentose Phosphate Pathways [4] • Photosynthesis [6] • Respiration [25] • Aerobic Respiration [9] • Anaerobic Respiration [14] • TCA cycle [9]
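The pathway ontology slides above list each class with a pathway count in brackets. As a rough illustration of how such a classification could be held in memory, here is a minimal Python sketch that models the Detoxification class and its four listed subclasses as a small tree. The `PathwayClass` type and the helper names are hypothetical and are not part of MetaCyc or Pathway Tools, and the parent/child nesting is inferred from the slide layout.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class PathwayClass:
    """Hypothetical node type: one ontology class and its bracketed count."""
    name: str
    pathway_count: int
    children: List["PathwayClass"] = field(default_factory=list)


# Counts copied from the Detoxification slide; the nesting is an assumption.
detox = PathwayClass("Detoxification", 16, [
    PathwayClass("Acid Resistance", 2),
    PathwayClass("Arsenate Detoxification", 3),
    PathwayClass("Mercury Detoxification", 1),
    PathwayClass("Methylglyoxal Detoxification", 8),
])


def walk(node: PathwayClass, depth: int = 0) -> Iterator[Tuple[int, PathwayClass]]:
    """Yield (depth, node) pairs in depth-first order."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


def child_total(node: PathwayClass) -> int:
    """Sum the counts of the direct children only."""
    return sum(child.pathway_count for child in node.children)


# Print the class tree with indentation that mirrors the nesting.
for depth, node in walk(detox):
    print("  " * depth + f"{node.name} [{node.pathway_count}]")

print("sum of listed subclasses:", child_total(detox))
```

In this example the listed subclasses sum to 14 while the parent class shows 16, so a parent's count should not be assumed to equal the sum of the subclasses shown. The slides do not say why the numbers differ.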
Curation Level • EcoCyc and MetaCyc have many types of data that you will not see in Tier 3 databases • Examples: • Regulation • Minireview summaries • Citations • GO terms • Protein features
BioCyc Ortholog Data • Currently, BioCyc ortholog data are obtained from CMR all-vs-all protein BLAST comparisons • Orthologs require bidirectional best BLAST hits with at least 10% identity, at least 40% similarity, and a P-value under 1 • Not all organisms currently have ortholog data • CMR lacks entries for some organisms • Some BioCyc genomes were not obtained from CMR
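The ortholog criteria above can be sketched as a small bidirectional-best-hit filter. This is an illustrative Python sketch only, not the CMR or BioCyc implementation. The `Hit` record, the function names, and the example hits are all hypothetical. Two further assumptions: the "best" hit is taken to be the one with the highest alignment score, and the thresholds are applied before the best hit is chosen. The slide specifies neither.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one BLAST hit (fields are assumptions)."""
    query: str
    subject: str
    identity: float    # percent identity
    similarity: float  # percent similarity
    p_value: float
    score: float       # alignment score, used here to rank hits


def passes(hit: Hit, min_identity: float = 10.0,
           min_similarity: float = 40.0, max_p: float = 1.0) -> bool:
    """Apply the thresholds quoted on the slide."""
    return (hit.identity >= min_identity
            and hit.similarity >= min_similarity
            and hit.p_value < max_p)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject among passing hits."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if not passes(hit):
            continue
        current = best.get(hit.query)
        if current is None or hit.score > current.score:
            best[hit.query] = hit
    return {query: hit.subject for query, hit in best.items()}


def bidirectional_best_hits(a_vs_b: Iterable[Hit],
                            b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Keep (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Made-up hits between genome A (a1..a3) and genome B (b1..b3).
a_vs_b = [
    Hit("a1", "b1", 90.0, 95.0, 1e-50, 500.0),
    Hit("a1", "b2", 30.0, 50.0, 1e-5, 80.0),
    Hit("a2", "b2", 35.0, 55.0, 1e-8, 100.0),
    Hit("a3", "b3", 8.0, 45.0, 0.5, 20.0),   # fails the 10% identity cutoff
]
b_vs_a = [
    Hit("b1", "a1", 90.0, 95.0, 1e-50, 500.0),
    Hit("b2", "a1", 30.0, 50.0, 1e-5, 80.0),
    Hit("b2", "a2", 35.0, 55.0, 1e-8, 100.0),
    Hit("b3", "a3", 8.0, 45.0, 0.5, 20.0),
]

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

With these made-up hits, a1/b1 and a2/b2 are reported as ortholog pairs, while a3/b3 is dropped because its identity falls below 10%.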