GO Term Integration and Curation in Pathway Tools and EcoCyc Ingrid M. Keseler Bioinformatics Research Group SRI International keseler@ai.sri.com

History of Classification and GO terms in EcoCyc
The MultiFun classification scheme was (and still is) used to classify genes and gene products in EcoCyc.
• Developed by Monica Riley and collaborators
• Hierarchical classification scheme with 10 major categories of cellular function
• In 2005, we began adding support for assigning GO terms to genes/gene products.

Why go with GO?
• GO has become the standard ontology/classification scheme for gene products
• GO is actively developed with input from its user communities
• GO allows standardized annotation across all domains of life
• Data mining across genomes
• Genome annotation by similarity (e.g. via InterPro, Pfam, TIGRFAM, COG mappings)
• Tools that take advantage of GO annotations, e.g. microarray data clustering

The Evolution of GO Within EcoCyc
• 12/2005 -- Mapping of MultiFun terms to GO terms (multifun2go – Ashburner and Lomax): multiple specific GO terms were sometimes mapped to one general MultiFun term, resulting in misleading GO term annotations in EcoCyc; no evidence codes or citations
• 12/2007 -- Mapping of EC reactions to GO terms (ec2go): imported GO terms for enzymes that catalyze reactions with full EC number assignments; no evidence codes or citations
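
The ec2go step keeps only enzymes whose reactions have full EC numbers. Below is a minimal Python sketch of that filter. It assumes the `EC:<number> > GO:<name> ; GO:<id>` line layout used by the GO Consortium's external2go mapping files. The function names and the sample line are illustrative only and are not taken from the Pathway Tools importer.

```python
import re

# A full EC number has four numeric fields, e.g. "1.1.1.1";
# partial assignments such as "1.1.1.-" do not match.
FULL_EC = re.compile(r"^\d+\.\d+\.\d+\.\d+$")


def parse_ec2go(lines):
    """Parse external2go-style lines into {EC number: [GO IDs]}."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):  # skip blank and header lines
            continue
        left, right = line.split(" > ", 1)
        ec = left.removeprefix("EC:")
        go_id = right.rsplit(";", 1)[1].strip()  # text after the last ';'
        mapping.setdefault(ec, []).append(go_id)
    return mapping


def go_terms_for_enzyme(ec_numbers, ec2go):
    """Collect GO IDs for an enzyme, using only its full EC numbers."""
    terms = []
    for ec in ec_numbers:
        if not FULL_EC.match(ec):
            continue  # partial EC numbers are skipped entirely
        for go_id in ec2go.get(ec, []):
            if go_id not in terms:  # avoid duplicate GO IDs
                terms.append(go_id)
    return terms


sample = [
    "! illustrative header line",
    "EC:1.1.1.1 > GO:alcohol dehydrogenase (NAD+) activity ; GO:0004022",
]
print(go_terms_for_enzyme(["1.1.1.1", "1.1.1.-"], parse_ec2go(sample)))
# ['GO:0004022']
```

Like the 2007 import, this sketch attaches no evidence code or citation to the resulting terms.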

The Evolution of GO Within EcoCyc
• 4/2008 -- Importing GO term assignments from UniProt; mostly computational evidence codes
• Since ~2007 -- Manual curation of GO terms based on publications, with evidence codes (mostly experimental) and literature citations
• Since ~2008 -- EcoCyc and EcoliWiki are the source of the official E. coli gene-association file (in collaboration with J. Hu and D. Siegele, EcoliWiki, Texas A&M)

Of Requirements and Differences
• Specific requirements for the GO gene-association file:
  • Presence of evidence codes and citations
  • Pathway Tools uses a different evidence code ontology, so the evidence codes must be mapped carefully
  • Some types of evidence require a With/From qualifier in GO, e.g. IPI, ISS
• Annotation with other qualifiers (e.g. NOT, contributes_to, colocalizes_with) is not required by GO and is not (yet) supported by Pathway Tools
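
These requirements can be checked annotation by annotation during export. The Python sketch below maps a Pathway Tools evidence code to a GO evidence code. It rejects annotations that lack a citation, and it rejects IPI or ISS annotations that have no With/From value. The Pathway Tools code names in `PTOOLS_TO_GO` are assumed, illustrative stand-ins and do not reproduce its evidence ontology. The `Annotation` record type is also hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping: the keys are assumed examples of Pathway Tools
# evidence codes; the values are standard GO evidence codes.
PTOOLS_TO_GO = {
    "EV-EXP-IDA": "IDA",
    "EV-EXP-IMP": "IMP",
    "EV-EXP-IGI": "IGI",
    "EV-EXP-IPI": "IPI",
    "EV-COMP-HINF-SIMILAR-TO-CONSENSUS": "ISS",
    "EV-AS-TAS": "TAS",
}

# GO evidence codes that, per the slide, require a With/From value.
NEEDS_WITH_FROM = {"IPI", "ISS"}


@dataclass
class Annotation:
    gene: str
    go_id: str
    ptools_evidence: str
    citation: Optional[str]
    with_from: Optional[str] = None


def to_go_evidence(ann: Annotation) -> str:
    """Return the GO evidence code, or raise if a GO requirement is unmet."""
    go_code = PTOOLS_TO_GO.get(ann.ptools_evidence)
    if go_code is None:
        raise ValueError(f"no GO mapping for {ann.ptools_evidence}")
    if not ann.citation:
        raise ValueError(f"{ann.gene}/{ann.go_id}: citation required")
    if go_code in NEEDS_WITH_FROM and not ann.with_from:
        raise ValueError(f"{ann.gene}/{ann.go_id}: {go_code} needs With/From")
    return go_code
```

If the mapping fails loudly, an unmapped or incomplete annotation is held back for a curator to review. It does not silently enter the gene-association file.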

Tools for the Curator
• GO classification editor is accessible via the protein editor
• GO database can be searched in the editor; term definitions are available
• Tools available locally (ask developers about general availability):
  • Import new GO database (for newly created terms etc.)
  • Export gene-association file
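
The export step produces one tab-separated line per annotation. The Python sketch below formats a single line. It assumes the 15-column GAF 1.0 layout published by the GO Consortium and may not match the local export tool. The object type "protein", the taxon ID, and all example values (`EG00000`, `geneX`, `PMID:0000000`) are hypothetical placeholders.

```python
from datetime import date

# GO namespace -> GAF "Aspect" column (column 9).
ASPECT = {"molecular_function": "F", "biological_process": "P", "cellular_component": "C"}


def gaf_row(db, object_id, symbol, go_id, reference, evidence, with_from,
            namespace, name, synonyms, taxon_id, assigned_by, when):
    """Build one tab-separated GAF 1.0 line (15 columns)."""
    cols = [
        db,                       # 1  DB
        object_id,                # 2  DB Object ID
        symbol,                   # 3  DB Object Symbol
        "",                       # 4  Qualifier (NOT etc.; left empty here)
        go_id,                    # 5  GO ID
        reference,                # 6  DB:Reference
        evidence,                 # 7  Evidence code
        with_from or "",          # 8  With/From
        ASPECT[namespace],        # 9  Aspect
        name,                     # 10 DB Object Name
        "|".join(synonyms),       # 11 DB Object Synonym(s)
        "protein",                # 12 DB Object Type (assumed)
        f"taxon:{taxon_id}",      # 13 Taxon
        when.strftime("%Y%m%d"),  # 14 Date
        assigned_by,              # 15 Assigned By
    ]
    return "\t".join(cols)


row = gaf_row(
    db="EcoCyc", object_id="EG00000", symbol="geneX", go_id="GO:0003824",
    reference="PMID:0000000", evidence="IDA", with_from=None,
    namespace="molecular_function", name="hypothetical enzyme",
    synonyms=["b0000"], taxon_id=83333, assigned_by="EcoCyc",
    when=date(2009, 8, 1),
)
print(row)
```

The Qualifier column is always written empty. This reflects the slide's point that qualifiers such as NOT are not yet supported.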

Manual Curation of GO terms
• Ongoing when we curate or re-curate gene products within EcoCyc
• No particular effort to back-fill GO terms; e.g. metabolic enzymes get experimental GO term assignments when we re-curate old metabolic pathways, or when new literature appears
• The Texas A&M team is part of the Reference Genome Annotation Project; GO term assignments from EcoliWiki are imported into EcoCyc on a regular basis

GO Term Statistics for E. coli (8/2009)
• 3721 gene products annotated with at least one GO term
• 42724 GO term annotations in total, of which 6330 are non-IEA (i.e., not Inferred from Electronic Annotation)
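
Figures like these can be tallied directly from the annotation set. By the slide's own numbers, 6330 of 42724 annotations (about 14.8%) are non-IEA. The Python sketch below counts one annotation per (gene product, GO ID, evidence) record, which is an assumption about how the totals were computed. The toy records are made up for illustration and are not EcoCyc data.

```python
from collections import namedtuple

# Hypothetical record type: one row per (gene product, GO term, evidence).
GoAnnotation = namedtuple("GoAnnotation", "gene_product go_id evidence")


def go_statistics(annotations):
    """Tally annotated gene products, total annotations, and non-IEA annotations."""
    products = {a.gene_product for a in annotations}
    non_iea = sum(1 for a in annotations if a.evidence != "IEA")
    return {
        "annotated_gene_products": len(products),
        "total_annotations": len(annotations),
        "non_iea_annotations": non_iea,
    }


toy = [
    GoAnnotation("geneA", "GO:0003824", "IEA"),
    GoAnnotation("geneA", "GO:0003824", "IDA"),
    GoAnnotation("geneB", "GO:0005737", "IEA"),
]
print(go_statistics(toy))
# {'annotated_gene_products': 2, 'total_annotations': 3, 'non_iea_annotations': 1}
```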

Acknowledgements
• Peter Karp
• Suzanne Paley
• Markus Krummenacker
• Tomer Altman
• Jim Hu
• Debby Siegele
• GO experts at the GO Consortium