
Gene Ontology Overview and Perspective

Explore the Gene Ontology (GO) Consortium's shared language for molecular annotation, ontology building, cross-database queries, and data analysis. Learn about GO's three biological domains and the importance of genomic data analysis using GO. Discover how annotations of gene products are curated and their impact on data-driven biology.

austind


Presentation Transcript


  1. Gene Ontology Overview and Perspective
     Lung Development Ontology Workshop

  2. A biological ontology is: a (machine) interpretable representation of some aspect of biological reality
     • what kinds of things exist?
     • what are the relationships between these things?
     Figure (eye example): eye is_a sense organ; sclera part_of eye; eye develops from optic placode. Image source: http://www.macula.org/anatomy/eyeframe.html
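The two questions on this slide (what things exist, and how they relate) map naturally onto a graph of terms joined by typed edges. The following is a minimal sketch, not GO's actual data model: it assumes the flattened figure labels read as "eye is_a sense organ", "sclera part_of eye", and "eye develops from optic placode", and the function names are hypothetical.

```python
from collections import defaultdict

# Typed edges (child, relation, parent). Assumption: this is how the
# flattened figure labels on the slide are meant to be read.
EDGES = [
    ("eye", "is_a", "sense organ"),
    ("sclera", "part_of", "eye"),
    ("eye", "develops_from", "optic placode"),
]


def build_graph(edges):
    """Index edges by child term so parents can be looked up quickly."""
    graph = defaultdict(list)
    for child, relation, parent in edges:
        graph[child].append((relation, parent))
    return graph


def ancestors(graph, term, relations=("is_a", "part_of")):
    """Return every term reachable from `term` via the given relation types."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for relation, parent in graph.get(current, []):
            if relation in relations and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


GRAPH = build_graph(EDGES)
print(sorted(ancestors(GRAPH, "sclera")))  # ['eye', 'sense organ']
```

Restricting the walk to is_a and part_of is a design choice for this sketch; develops_from is kept in the data but not followed, so "optic placode" is not reported as an ancestor of "sclera".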

  3. Gene Ontology (GO) Consortium
     • Formed to develop a shared language adequate for the annotation of molecular characteristics across organisms; a common language to share knowledge.
     • Seeks to achieve a mutual understanding of the definition and meaning of any word used, which in turn supports cross-database queries.
     • Members agree to contribute gene product annotations and associated sequences to the GO database, facilitating data analysis and semantic interoperability.

  4. Gene Ontology widely adopted [Figure: adopting resources, including AgBase]

  5. GO represents three biological domains
     • Molecular Function = elemental activity/task
       • the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity
     • Biological Process = biological goal or objective
       • broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions
     • Cellular Component = location or complex
       • subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme

  6. Terms are defined graphically relative to other terms

  7. The Gene Ontology (GO)
     • Build and maintain logically rigorous and biologically accurate ontologies
     • Comprehensively annotate reference genomes
     • Support genome annotation projects for all organisms
     • Freely provide ontologies, annotations and tools to the research community

  8. Building the ontologies
     • The GO is still developing daily, both in ontological structures and in domain knowledge
     • Ontology development workshops focus on specific domains needing revision and bring together ontology developers and domain experts
     • Currently running ~2 workshops / year
       • Metabolism and cell cycle (Aug 2004)
       • Immunology and defense response (Nov 2005, Apr 2006)
       • Early CNS development (June 2006)
       • Peripheral nervous system development (Feb 2007)
       • Blood Pressure Regulation (June 2007)
       • Muscle Development (July 2007)

  9. Building the ontology: Immune System Process
     • 725 new terms related to immunology
     • 127 new terms added to cell type ontology
     Figure legend: red = part_of, blue = is_a. Credit: Alex Diehl

  10. Annotating Gene Products using GO
      • Gene Product: P05147
      • GO Term: GO:0047519
      • Evidence: IDA
      • Reference: PMID:2976880
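The four parts of an annotation named on this slide (gene product, GO term, evidence, reference) can be held in a small record type. This is an illustrative sketch only: the whitespace-separated line layout and the names `Annotation` and `parse_annotation` are assumptions made for the example, not the file format the GO Consortium distributes.

```python
import re
from dataclasses import dataclass

# GO identifiers are "GO:" followed by seven digits.
GO_ID = re.compile(r"^GO:\d{7}$")


@dataclass(frozen=True)
class Annotation:
    gene_product: str  # e.g. a protein accession such as P05147
    go_term: str       # e.g. GO:0047519
    evidence: str      # evidence code, e.g. IDA
    reference: str     # e.g. PMID:2976880


def parse_annotation(line):
    """Parse 'gene_product go_term evidence reference' (hypothetical layout)."""
    gene_product, go_term, evidence, reference = line.split()
    if not GO_ID.match(go_term):
        raise ValueError(f"not a GO identifier: {go_term!r}")
    return Annotation(gene_product, go_term, evidence, reference)


# The example values are the ones shown on the slide.
example = parse_annotation("P05147 GO:0047519 IDA PMID:2976880")
print(example.gene_product, example.go_term, example.evidence, example.reference)
```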

  11. Annotations are assertions
      • There is evidence that this gene product can be best classified using this term
      • The source of the evidence and other information is included
      • There is agreement on the meaning of the term

  12. Annotations for App: amyloid beta (A4) precursor protein
      Annotations are assertions. Annotations are the connections between genomic information and the GO. Experiments provide the data that enable us to annotate gene products with terms from the ontologies.

  13. We use evidence codes to describe the basis of the annotation
      • IDA: Inferred from direct assay
      • IPI: Inferred from physical interaction
      • IMP: Inferred from mutant phenotype
      • IGI: Inferred from genetic interaction
      • IEP: Inferred from expression pattern
      • IEA: Inferred from electronic annotation
      • ISS: Inferred from sequence or structural similarity
      • TAS: Traceable author statement
      • NAS: Non-traceable author statement
      • IC: Inferred by curator
      • RCA: Reviewed computational analysis
      • ND: No data available
      Grouping labels on the slide: Direct Experiment in organism; NO Direct Experiment; Inferred from evidence
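Evidence codes make it possible to filter annotations by how they were supported. The sketch below stores the twelve codes from this slide in a lookup table and keeps only annotations backed by a direct experiment. Treating IDA, IPI, IMP, IGI and IEP as the experimental group follows the GO Consortium's own grouping of experimental evidence codes; the slide itself does not spell out which codes fall under which label. Apart from the P05147 row from the earlier slide, the sample rows are placeholders, not real annotations.

```python
EVIDENCE_CODES = {
    "IDA": "Inferred from direct assay",
    "IPI": "Inferred from physical interaction",
    "IMP": "Inferred from mutant phenotype",
    "IGI": "Inferred from genetic interaction",
    "IEP": "Inferred from expression pattern",
    "IEA": "Inferred from electronic annotation",
    "ISS": "Inferred from sequence or structural similarity",
    "TAS": "Traceable author statement",
    "NAS": "Non-traceable author statement",
    "IC": "Inferred by curator",
    "RCA": "Reviewed computational analysis",
    "ND": "No data available",
}

# Assumption: codes treated here as "direct experiment" evidence.
EXPERIMENTAL = frozenset({"IDA", "IPI", "IMP", "IGI", "IEP"})


def describe(code):
    """Return the long name of an evidence code (KeyError if unknown)."""
    return EVIDENCE_CODES[code]


def experimental_only(annotations):
    """Keep (gene_product, go_term, evidence) rows with experimental evidence."""
    return [row for row in annotations if row[2] in EXPERIMENTAL]


ANNOTATIONS = [
    ("P05147", "GO:0047519", "IDA"),  # example from the slides
    ("GENE_A", "GO:0000001", "IEA"),  # placeholder row for illustration
    ("GENE_B", "GO:0000002", "IMP"),  # placeholder row for illustration
]

for gene_product, go_term, code in experimental_only(ANNOTATIONS):
    print(gene_product, go_term, describe(code))
```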

  14. GO Annotation Stats (April 24, 2007)
      • Total manual GO annotations: 388,633
      • Total proteins with manual annotations: 80,402
      • Contributing groups (including MGI): 19
      • Total PubMed references: 346,002
      • Total number of predicted annotations: 17,029,553
      • Total number of taxa: 129,318
      • Total number of distinct proteins: 2,971,374

  15. Annotations of gene products to GO are genome specific. Now we can query across all annotations based on shared biological activity.

  16. GO is a functional annotation system of great utility to the data-driven biologist

  17. GO enables genomic data analysis
      • Microarrays allow biologists to record changes in gene function across entire genomes
      • Result: vast amounts of gene expression data desperately needing cataloging and tagging
      • Many data analysis tools use the GO graph structure to statistically evaluate clusters of co-expressed genes based on shared functional annotations
        • 680 publications (of 1517) on the GO list
        • 46 microarray tools contributed
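The slide does not say which statistic these tools apply. One common choice for asking whether a cluster of co-expressed genes is enriched for a GO term is the hypergeometric test, sketched here with the standard library only. The function name and the numbers in the usage line are illustrative assumptions, not results from any real dataset.

```python
from math import comb


def hypergeom_pvalue(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes in a cluster.

    population: total number of genes measured
    annotated:  genes in the population that carry the GO term
    sample:     number of genes in the cluster
    hits:       genes in the cluster that carry the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the upper tail P(X = k) for k = hits .. upper.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes, 3 carry the term, a cluster of 2 contains 2 of them.
print(hypergeom_pvalue(population=10, annotated=3, sample=2, hits=2))
```

A small p-value suggests the term is over-represented in the cluster. In practice many terms are tested at once, so a multiple-testing correction would normally be applied on top of this.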

  18. GO supports functional classifications: Cancer Genome Projects (Oct 13, 2006)

  19. GO is wildly successful
      Nature, January 2007. Figure 3: Representative cell-type-specific genes and corresponding molecular functions.

  20. Comprehensively annotate Reference Genomes • Human • Mouse • Fly • Rat • Chicken • Zebrafish • Worm • Dicty • E. coli • Saccharomyces cerevisiae • Schizosaccharomyces pombe • Arabidopsis thaliana

  21. Reference Genome Annotation Project
      • Priority genes: those implicated in human diseases
      • Determine orthologs/homologs in reference genomes
      • For these genes, comprehensively curate biomedical literature
      Credit: Mary Dolan

  22. Reference Genome Development Projects
      • Shared annotation focus = coordinated attention to ontology structure
      • Orthology/homology set across primary model organisms
      • Reference ID mappings including associations of sequences, genes/proteins, and human diseases
      • Ultimately, transparent access to comprehensive information about genes among the primary data providers

  23. Ongoing Challenges for the GO Consortium
      1. Verifying and maintaining domain representations in the ontology that reflect best knowledge of the real world
         - Depends on the involvement of biologists (domain experts)
         - Difficult to automate
         - Must accommodate continuing changes in what we think we understand about biological systems
      2. Providing comprehensive annotations, where experimental evidence is available, for all genes
         - Dependent on the quality of annotations from experimental literature
         - Combines manual curation by highly trained scientists with computationally inferred (predicted) annotations
         - Comprehensiveness may depend on changes in biomedical publishing

  24. Acknowledgements
      • GO: Michael Ashburner (Cambridge), J. Michael Cherry (Stanford), Suzanna Lewis (LBNL), Rex Chisholm (NWU), David Hill (Jackson Lab), Midori Harris (EBI), Chris Mungall (LBNL), Jane Lomax (EBI), Eurie Hong (Stanford), Jen Clark (EBI)
      • GO @ MGI: Alex Diehl, Mary Dolan, Harold Drabkin, David Hill, Li Ni, Dmitry Sitnikov
      • MGI: Carol Bult, Janan Eppig, Jim Kadin, Joel Richardson, Martin Ringwald, Lois Maltais, TBK Reddy, Monica McAndrews-Hill, Nancy Butler

  25. Gene Ontology: www.geneontology.org
      Mouse Genome Informatics: www.informatics.jax.org
      GO Consortium is supported by NIH-NHGRI and by the European Union RTD Programme.
      MGI projects are supported by NIH [NHGRI, NICH, and NCI].
      PRO is supported by NIGMS.
      Corpora is supported by NLM.
      Bar Harbor, Maine, USA
