GOA: Looking after GO annotations. Emily Dimmer Gene Ontology Annotation (GOA) Database European Bioinformatics Institute Cambridge UK. E. Coli hub. http://www.geneontology.org. Reactome. Gene Ontology Annotation (GOA) Database. Member of the GO Consortium since 2001
GOA: Looking after GO annotations Emily Dimmer Gene Ontology Annotation (GOA) Database European Bioinformatics Institute Cambridge UK
E. Coli hub http://www.geneontology.org Reactome
Gene Ontology Annotation (GOA) Database • Member of the GO Consortium since 2001 • Largest open-source contributor of annotations to GO • Provides annotation for more than 139,000 species • GOA’s priority is to annotate the human proteome • GOA is responsible for human, chicken and bovine annotations in the GO Consortium
GOA Group GOA office EMBL-EBI Wellcome Trust Genome Campus, Hinxton, Cambridge, UK goa@ebi.ac.uk
GOA Group: Evelyn Camon (senior GOA curator); David Binns (QuickGO, protein2go tools); Rachael Huntley (GOA curator); Daniel Barrell (GOA file releases & database); Emily Dimmer (GOA coordinator). Along with the help of UniProt curators at the EBI, UniProt controlled vocabularies, the HAMAP group, the InterPro group, IntAct curators, the IPI group, Ensembl and other EBI groups … and of course the GO editors and the other GO Consortium annotation groups.
How does GOA annotate to the GO? Two methods: Electronic Annotation and Manual Annotation • Both methods have their advantages • They can be easily distinguished by the evidence code used.
Status of GOA Annotation: October 2007 Stats • Annotations provided to over 140,000 taxa • Total of 415,576 PubMed references included as evidence • Manual annotations integrated from external model organism and multi-species databases: AgBase, DictyBase, Ensembl, FlyBase, GDB, GeneDB (S. pombe), Gramene, HGNC, MGI, Reactome, RGD, Roslin, SGD, TAIR, TIGR, WormBase, ZFIN, the IntAct protein-protein interaction database, LIFEdb and the Proteome Inc dataset
Core information needed for a GO annotation
1. Gene or gene product identifier, e.g. Q9ARH1
2. GO term ID, e.g. GO:0004674 (protein serine/threonine kinase activity)
3. Reference ID, e.g. PubMed ID: 12374299 or GO_REF:0000001
4. Evidence code, e.g. IDA
…and also, in some cases:
• Qualifiers, which modify the interpretation of an annotation: NOT, contributes_to, colocalizes_with
• ‘With’ column information, which gives further detail on the method behind the evidence code
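The four core fields and the two optional ones can be pictured as a small record type. The following is a minimal sketch only: the class name `GOAnnotation`, its field names and the `PMID:` prefix are illustrative assumptions, not GOA's actual data model.

```python
import re
from dataclasses import dataclass, field
from typing import List

# GO term IDs have the form "GO:" followed by seven digits, as in GO:0004674.
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")


@dataclass
class GOAnnotation:
    """Hypothetical record holding the core fields of one GO annotation."""
    object_id: str   # 1. gene or gene product identifier
    go_id: str       # 2. GO term ID
    reference: str   # 3. reference ID (PubMed ID or GO_REF)
    evidence: str    # 4. evidence code
    qualifiers: List[str] = field(default_factory=list)  # optional
    with_info: str = ""                                   # optional 'With' column

    def __post_init__(self) -> None:
        # All four core fields must be present.
        for name in ("object_id", "go_id", "reference", "evidence"):
            if not getattr(self, name):
                raise ValueError(f"missing core field: {name}")
        if not GO_ID_PATTERN.match(self.go_id):
            raise ValueError(f"malformed GO term ID: {self.go_id!r}")


# The example values from the slide.
perk1 = GOAnnotation(
    object_id="Q9ARH1",
    go_id="GO:0004674",
    reference="PMID:12374299",
    evidence="IDA",
)
```

Keeping the qualifier and 'With' fields optional mirrors the slide: most annotations need only the four core values.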
Electronic Annotation • A number of different techniques are used by different GO Consortium annotation groups • All resulting annotations must be high-quality and provide an explanation of the method (GO_REF) • Techniques: 1. Mapping of external concepts to GO terms 2. Automatic transfer of annotations to orthologs
Electronic annotation: GO mappings
• Fatty acid biosynthesis (SwissProt keyword) → GO:fatty acid biosynthesis (GO:0006633)
• EC:6.4.1.2 (EC number) → GO:acetyl-CoA carboxylase activity (GO:0003989)
• IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry) → GO:acetyl-CoA carboxylase activity (GO:0003989)
• MF_00527: Putative 3-methyladenine DNA glycosylase (HAMAP) → GO:DNA repair (GO:0006281)
Camon et al. BMC Bioinformatics. 2005; 6 Suppl 1:S17
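In code, the mapping approach amounts to a lookup table from external identifiers to GO terms. This is a rough sketch only: the key prefixes (`SP_KW`, `InterPro`, `HAMAP`), the function name and the protein ID `EXAMPLE1` are hypothetical. The `IEA` evidence code is assumed here as the code for electronic annotation, and `GO_REF:0000001` is reused from the slide purely as a placeholder reference.

```python
# External concept -> GO term IDs, using the four mappings shown on the slide.
# The key prefixes are illustrative labels, not an official naming scheme.
EXTERNAL_TO_GO = {
    "SP_KW:Fatty acid biosynthesis": ["GO:0006633"],  # fatty acid biosynthesis
    "EC:6.4.1.2": ["GO:0003989"],                     # acetyl-CoA carboxylase activity
    "InterPro:IPR000438": ["GO:0003989"],             # acetyl-CoA carboxylase activity
    "HAMAP:MF_00527": ["GO:0006281"],                 # DNA repair
}


def electronic_annotations(protein_id, external_ids, go_ref):
    """Return (protein, GO ID, reference, evidence, source) tuples.

    One tuple is produced per mapped external identifier; identifiers
    with no mapping are skipped silently.
    """
    annotations = []
    for external_id in external_ids:
        for go_id in EXTERNAL_TO_GO.get(external_id, []):
            # The external identifier is kept so the method can be traced.
            annotations.append((protein_id, go_id, go_ref, "IEA", external_id))
    return annotations


# Hypothetical protein carrying an EC number and an InterPro entry.
example = electronic_annotations(
    "EXAMPLE1", ["EC:6.4.1.2", "InterPro:IPR000438"], "GO_REF:0000001"
)
```

Both identifiers in the example lead to the same GO term. The sketch keeps one annotation per source so that each remains traceable to the mapping that produced it.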
Automatic transfer of annotations to orthologs • Ensembl COMPARA • Homologies between different species are calculated • GO terms are projected from MANUAL annotation only (IDA, IEP, IGI, IMP, IPI) • Only one-to-one and apparent one-to-one orthologies are used • http://www.ensembl.org/info/data/compara
[Figure: projection of annotations between species, including Human, Mouse, Rat, Zebrafish, Xenopus, Drosophila, Macaque, Chimpanzee, Anopheles, Guinea Pig, Aedes aegypti, Tetraodon, Dog, Chicken and Fugu]
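The two filters on this slide (manual evidence codes only, one-to-one orthologies only) can be sketched as follows. This is an illustration under stated assumptions, not the Ensembl Compara pipeline: the orthology labels `one2one` and `apparent_one2one`, the gene names and the `IEA` code given to projected annotations are all assumptions made for the example.

```python
# Evidence codes from which GO terms are projected, as listed on the slide.
PROJECTABLE_CODES = {"IDA", "IEP", "IGI", "IMP", "IPI"}

# Hypothetical labels for the two accepted orthology types.
ACCEPTED_ORTHOLOGY = {"one2one", "apparent_one2one"}


def project_annotations(annotations, orthologs):
    """Project GO terms from source genes onto their orthologs.

    annotations: iterable of (gene, go_id, evidence_code)
    orthologs:   iterable of (source_gene, target_gene, orthology_type)
    Returns a list of (target_gene, go_id, "IEA", source_gene).
    """
    # Keep only annotations backed by an accepted manual evidence code.
    terms_by_gene = {}
    for gene, go_id, evidence in annotations:
        if evidence in PROJECTABLE_CODES:
            terms_by_gene.setdefault(gene, []).append(go_id)

    projected = []
    for source, target, orthology_type in orthologs:
        if orthology_type not in ACCEPTED_ORTHOLOGY:
            continue  # one-to-many and many-to-many pairs are ignored
        for go_id in terms_by_gene.get(source, []):
            projected.append((target, go_id, "IEA", source))
    return projected


# Hypothetical genes: only the IDA annotation crosses the one-to-one pair.
result = project_annotations(
    annotations=[
        ("human_geneA", "GO:0004674", "IDA"),
        ("human_geneA", "GO:0006281", "TAS"),
        ("human_geneB", "GO:0006633", "IMP"),
    ],
    orthologs=[
        ("human_geneA", "mouse_geneA", "one2one"),
        ("human_geneB", "mouse_geneB", "one2many"),
    ],
)
```

Recording the source gene alongside each projected term keeps the origin of the transfer visible, in the spirit of the 'With' column.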
Manual Annotation • High-quality, specific annotations made using: peer-reviewed papers; a range of evidence codes to categorize the types of evidence found in a paper • Very time-consuming and requires trained biologists
Finding Annotations …for B. napus PERK1 protein (Q9ARH1), PubMed ID: 12374299
“In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GTP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein … these kinases have been implicated in early stages of wound response…”
Terms found in the text: serine/threonine kinase activity; integral membrane protein; wound response
Evidence Codes • IDA (Inferred from Direct Assay): enzyme assays; in vitro reconstitution; immunofluorescence; cell fractionation • TAS (Traceable Author Statement): the literature source cites the original experiments it refers to.
The ‘Qualifier’ Column The Qualifier column is used to modify the interpretation of an annotation. Allowable values are: NOT, colocalizes_with, contributes_to
The ‘NOT’ qualifier • 'NOT' is used to make an explicit note that the gene product is not associated with the GO term. It is particularly important where associating a GO term with a gene product should be avoided but might otherwise be made, especially by an automated method. • Also used to document conflicting claims in the literature. • NOT can be used with all three GO ontologies. e.g. This protein does not have ‘kinase activity’ because it has been found to have a disrupted or missing ‘ATP binding’ domain.
The ‘colocalizes_with’ qualifier • Gene products that are transiently or peripherally associated with an organelle or complex may be annotated to the relevant cellular component term, using the 'colocalizes_with' qualifier. • Only used with the GO Component Ontology.
The ‘contributes_to’ qualifier • Used where an individual gene product that is part of a complex is annotated to terms that describe the action (function or process) of the whole complex, i.e. annotating 'to the potential of the complex'. • Distinguishes the role of an individual subunit from the functions of the complex. • All gene products annotated using 'contributes_to' must also be annotated to a cellular component term representing the complex that possesses the activity. • Only used with the GO Function Ontology.
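The rules for which qualifier may be used with which ontology can be written as a small table. This is a hedged sketch: the function name is hypothetical, and the single-letter aspect codes F, P and C (function, process, component) are assumed to be those of the Aspect column in gene association files.

```python
# Qualifier -> ontology aspects it may be used with, per the slides above.
# F = molecular function, P = biological process, C = cellular component.
QUALIFIER_ASPECTS = {
    "NOT": {"F", "P", "C"},     # all three ontologies
    "colocalizes_with": {"C"},  # Component ontology only
    "contributes_to": {"F"},    # Function ontology only
}


def qualifier_allowed(qualifier: str, aspect: str) -> bool:
    """Return True if the qualifier is permitted for the given aspect."""
    return aspect in QUALIFIER_ASPECTS.get(qualifier, set())


# A function term may carry 'contributes_to' but not 'colocalizes_with'.
checks = {
    ("contributes_to", "F"): qualifier_allowed("contributes_to", "F"),
    ("colocalizes_with", "F"): qualifier_allowed("colocalizes_with", "F"),
    ("NOT", "P"): qualifier_allowed("NOT", "P"),
}
```

The check covers only the ontology restriction. The further rule that a 'contributes_to' annotation needs a companion cellular component annotation for the complex would require looking at all annotations of the gene product together.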
QuickGO browser: Human Insulin Receptor (P06213)… etc. http://www.ebi.ac.uk/quickgo
Gene Association Files Tab delimited files: http://www.geneontology.org/GO.current.annotations.shtml * = optional field
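The column table from this slide did not survive in the transcript. As a sketch of how such a tab-delimited line could be read, the parser below assumes the 15-column layout of the GO gene association file format. The column names are informal labels chosen for this example, and the sample line combines the slide's PERK1 values with placeholder values for the remaining fields.

```python
# Assumed 15-column layout of a gene association file line.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence", "with_from", "aspect", "db_object_name",
    "synonym", "db_object_type", "taxon", "date", "assigned_by",
]


def parse_gaf_line(line):
    """Parse one annotation line into a dict; return None for comment lines."""
    if line.startswith("!"):
        return None  # header and comment lines start with '!'
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(GAF_COLUMNS):
        raise ValueError(f"expected {len(GAF_COLUMNS)} columns, got {len(fields)}")
    record = dict(zip(GAF_COLUMNS, fields))
    # Multiple qualifiers are pipe-separated; an empty column means none.
    record["qualifier"] = [q for q in record["qualifier"].split("|") if q]
    return record


# Illustrative line: PERK1 values from the slides, placeholders elsewhere.
example_line = "\t".join([
    "UniProtKB", "Q9ARH1", "PERK1", "", "GO:0004674",
    "PMID:12374299", "IDA", "", "F", "",
    "", "protein", "taxon:0", "20071001", "UniProtKB",
])
record = parse_gaf_line(example_line)
```

Optional columns are left as empty strings rather than dropped, so every record has the same keys regardless of which fields a given annotation fills in.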
ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/ http://www.ebi.ac.uk/GOA/downloads.html
Cow Output from the GOA database Redundant Non-Redundant based on IPI (International Protein Index) 625 proteome sets ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
… annotations are also displayed in: • All GO Consortium Model Organism Databases, which integrate and exchange GO annotation data to ensure a comprehensive set of annotations for their organism/area of interest • Array products and data analysis: Affymetrix, Spotfire, Almac
… and Numerous Third Party Tools (http://www.geneontology.org/GO.tools.shtml)
Reference Genomes • Comprehensive annotation of a set of conserved pathway and disease-related proteins in human and orthologs in 11 other selected genomes • Empowers comparative methods used in first pass annotation of other proteomes • Genomes: Arabidopsis thaliana, Caenorhabditis elegans, Danio rerio (zebrafish), Dictyostelium discoideum, Drosophila melanogaster, Escherichia coli, Homo sapiens, Saccharomyces cerevisiae, Mus musculus, Schizosaccharomyces pombe, Gallus gallus, Rattus norvegicus
GOA annotation focuses
• Cardiovascular GO annotation: grant with the British Heart Foundation to support a collaboration with HGNC curators to provide full Gene Ontology annotation to genes associated with cardiovascular processes. wiki: http://wiki.geneontology.org/index.php/Cardiovascular
• Immune GO annotation: interest in actively GO annotating immune-relevant genes. GOA, UCL and MGI are collaborating to improve annotation for immunologically important genes; WT grant pending. wiki: http://wiki.geneontology.org/index.php/Immunology
Electronic Annotation developments • New mappings: Swiss-Prot Subcellular Location to GO (just released); Swiss-Prot UniPathway • Expansion of existing methods: Ensembl Compara species expansion
Acknowledgements: Rolf Apweiler (Head of the EBI protein sequence database group), Emily Dimmer, Evelyn Camon, Rachael Huntley, Daniel Barrell, David Binns. Contact the GOA team: goa@ebi.ac.uk GOA web page: http://www.ebi.ac.uk/goa The Gene Ontology Consortium and 1.5 members of GOA are currently supported by a P41 grant from the National Human Genome Research Institute (NHGRI) [grant HG002273]. GOA is also supported by core EMBL funding and a BBSRC Tools and Resources grant.