Introduction to the Gene Ontology
GO Workshop, 3-6 August 2010
Introduction to GO
• GO and the GO Consortium (GOC)
• What the GOC does (and doesn’t do)
• GO Groups
• Working groups
• GO Wiki
• Dilemma: annotation strategies
• Sources for GO
The GO Consortium
• Began in 1998 as a collaboration between FlyBase (Drosophila), the Saccharomyces Genome Database (SGD) and the Mouse Genome Database (MGD)
• GO Consortium groups are actively involved in developing the GO, providing annotations and supporting use of the GO
http://www.geneontology.org/GO.consortiumlist.shtml
The GO Consortium provides:
• a central repository for ontology updates and annotations
• a central mechanism for changing GO terms (adding, editing, deleting)
• quality checking for annotations
• consistency checks for how annotations are made by different groups
• a central source of information for users
• co-ordination of annotation effort
GO Consortium and GO Groups:
• groups decide which gene product set to annotate
• biocurator training
• tool development, mostly by groups
• many non-consortium groups
• education and training by groups
• outreach to biocurators/databases by the GOC
GO Wiki
http://wiki.geneontology.org/index.php/Main_Page
Information about:
• Development projects
• Meetings
• Annotation projects
• Changes to the GO
The Annotation Dilemma
• Exponential increase in biological data
• More important than ever to provide annotation for this data
• How to keep up?
Annotation Strategy
• Experimental data
  • Many species have a body of published, experimental data
  • Detailed, species-specific annotation: ‘depth’
  • Requires manual annotation of the literature, so it is slow
• Computational analysis
  • Can be automated, so it is faster
  • Gives ‘breadth’ of coverage across the genome
  • Annotations are general
  • Relatively few annotation pipelines
Examples of annotation by different groups:
• GO & PO: literature annotation for rice; computational annotation for rice, maize, sorghum, Brachypodium
• Literature annotation for Agrobacterium tumefaciens, Dickeya dadantii, Magnaporthe grisea, Oomycetes; computational annotation for Pseudomonas syringae pv. tomato, Phytophthora spp. and the nematode Meloidogyne hapla
• Literature annotation for chicken, cow, maize, cotton; computational annotation for agricultural species & pathogens
• Literature annotation for human; computational annotation for UniProtKB entries (237,201 taxa)
Community Annotation
• Researchers are the domain experts – but relatively few contribute to annotation
  • time
  • 'reward' & 'employer/funding agency recognition'
  • training – easy-to-use tools, clear instructions
• Required submission
• Community annotation
  • Groups with a special interest do focused annotation or ontology development
  • As part of a meeting/conference or distributed (e.g. wikis)
• Students!
Releasing GO Annotations
• GO annotations are stored at individual databases
• Sanity checks as data is entered: is all the required data filled in?
• Databases do quality control (QC) checks and submit to GO
• GO Consortium runs additional QC and collates annotations
• Checked annotations are picked up by GO users
  • e.g. public databases, genome browsers, array vendors, GO expression analysis tools
AgBase Quality Checks & Releases
[Figure: flow diagram of annotation release. Labelled elements: AgBase biocurators, AgBase biocuration interface and AgBase database; EBI GOA Project, UniProt db and QuickGO browser; GO Consortium database, public databases and AmiGO browser. Each database feeds GO analysis tools and microarray developers, with ‘sanity’ checks and GOC QC between stages.]
‘Sanity’ check: checks to ensure all appropriate information is captured, no obsolete GO:IDs are used, etc.
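The ‘sanity’ check described above (all required information captured, no obsolete GO:IDs used) could be sketched roughly as follows. This is a minimal illustration, not the AgBase or GOC implementation: the field names in `REQUIRED_FIELDS`, the function `sanity_check` and the set of obsolete IDs are all hypothetical, and `GO:9999999` is a made-up identifier used only for the example. The only fact relied on is that GO identifiers have the form `GO:` followed by seven digits.

```python
import re

# Hypothetical required fields for one annotation record (not an official GOC list).
REQUIRED_FIELDS = ("gene_product_id", "go_id", "evidence_code", "reference")

# GO identifiers are written as "GO:" followed by seven digits.
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")


def sanity_check(record, obsolete_go_ids):
    """Return a list of problems found in one annotation record (empty if none)."""
    problems = []

    # Check 1: is all the required data filled in?
    for field in REQUIRED_FIELDS:
        if not str(record.get(field) or "").strip():
            problems.append(f"missing {field}")

    # Check 2: is the GO ID well formed, and not obsolete?
    go_id = str(record.get("go_id") or "").strip()
    if go_id:
        if not GO_ID_PATTERN.match(go_id):
            problems.append(f"malformed GO ID: {go_id}")
        elif go_id in obsolete_go_ids:
            problems.append(f"obsolete GO ID: {go_id}")

    return problems


# Illustrative data only: GO:9999999 stands in for an obsolete term.
obsolete = {"GO:9999999"}
record = {
    "gene_product_id": "P12345",
    "go_id": "GO:9999999",
    "evidence_code": "",
    "reference": "PMID:0000000",
}
print(sanity_check(record, obsolete))
# → ['missing evidence_code', 'obsolete GO ID: GO:9999999']
```

In practice the set of obsolete IDs would be taken from the current ontology release, so the check needs to be rerun whenever the ontology is updated.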
Sources of GO
• Primary sources of GO: from the GO Consortium (GOC) & GOC members
  • most up to date
  • most comprehensive
• Secondary sources: other resources that use GO provided by GOC members
  • public databases (e.g. NCBI, UniProtKB)
  • genome browsers (e.g. Ensembl)
  • array vendors (e.g. Affymetrix)
  • GO expression analysis tools
Sources of GO annotation
• Different tools and databases display GO annotations differently.
• Because GO terms change continually and GO annotations are added continually, users need to know when a source's GO annotations were last updated.
Secondary Sources of GO annotation
• EXAMPLES:
  • public databases (e.g. NCBI, UniProtKB)
  • genome browsers (e.g. Ensembl)
  • array vendors (e.g. Affymetrix)
• CONSIDERATIONS:
  • What is the original source?
  • When was it last updated?
  • Are evidence codes displayed?
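Two of the considerations above (when the annotations were last updated, and which evidence codes they carry) can be checked directly when a source offers its annotations for download. The sketch below assumes the file is in the tab-delimited GO Annotation File (GAF) format, in which comment lines start with `!`, column 7 holds the evidence code and column 14 holds the annotation date as YYYYMMDD. The slides do not name a file format, so treat this as an assumption. The function name `summarize_gaf` is hypothetical.

```python
from collections import Counter

# Classic experimental evidence codes; IEA marks electronic (computational) annotation.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def summarize_gaf(lines):
    """Count evidence codes and find the most recent annotation date in GAF lines."""
    evidence_counts = Counter()
    latest_date = None

    for line in lines:
        # Skip blank lines and "!" comment/header lines.
        if not line.strip() or line.startswith("!"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 15:  # GAF rows have at least 15 columns
            continue

        evidence_counts[columns[6]] += 1  # column 7: evidence code
        date = columns[13]                # column 14: date, YYYYMMDD
        # YYYYMMDD strings sort chronologically, so plain comparison is enough.
        if latest_date is None or date > latest_date:
            latest_date = date

    experimental = sum(
        count for code, count in evidence_counts.items() if code in EXPERIMENTAL_CODES
    )
    return {
        "evidence_counts": dict(evidence_counts),
        "experimental": experimental,
        "electronic": evidence_counts.get("IEA", 0),
        "latest_date": latest_date,
    }
```

A call such as `summarize_gaf(open(path))`, where `path` points to a downloaded annotation file, would return the counts and the latest date. A latest date that is much older than the source's stated release suggests the annotations are out of date, and a large share of IEA suggests mostly computational 'breadth' rather than literature-based 'depth'.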
Differences in displaying GO annotations: secondary/tertiary sources.
Learning more about the GO
At the GO Consortium website (http://www.geneontology.org/):
• FAQs
• Mailing groups
• Tools that use GO
• News about changes and updates
• Publications