1 / 100

GO: The Gene Ontology

GO: The Gene Ontology. Pascale Gaudet dictyBase curator Northwestern University, Chicago, IL. Outline. Introduction to the Gene Ontology Gene Ontology annotations Editing the Gene Ontology Practical applications for the Gene Ontology

talmai
Download Presentation

GO: The Gene Ontology

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. GO: The Gene Ontology Pascale Gaudet dictyBase curator Northwestern University, Chicago, IL

  2. Outline • Introduction to the Gene Ontology • Gene Ontology annotations • Editing the Gene Ontology • Practical applications for the Gene Ontology • The Gene Ontology as one of many biological ontologies

  3. Sequence databases: GenBank, EMBL, DDBJ

  4. Genome Databases * Mouse Genome Informatics * FlyBase: Drosophila * WormBase: C. elegans * The Arabidopsis Information Resource * dictyBase: Dictyostelium discoideum * Saccharomyces Genome Database: Budding Yeast * ZFIN: Zebrafish * EcoGene - E. coli • GeneCards • Human ensembl • NCBI human genome resources * manually curated by scientists

  5. Published Literature • PubMed: over 15 million citations • Basic search:rad51 → 1038 articles • Limit search:rad51, Human (organism) → 485 • Boolean operators:rad51 AND cancer → 234 articles

  6. Gene Ontology • Gene annotation system • Controlled vocabulary that can be applied to all organisms • Used to describe gene products

  7. What’s in a name? • What is a cell?

  8. Cell

  9. Cell

  10. Cell

  11. Cell

  12. Cell Image from http://microscopy.fsu.edu

  13. What’s in a name? • The same name can be used to describe different concepts

  14. What’s in a name?

  15. What’s in a name? • Glucose synthesis • Glucose biosynthesis • Glucose formation • Glucose anabolism • Gluconeogenesis • All refer to the process of making glucose from simpler components

  16. What’s in a name? • The same name can be used to describe different concepts • A concept can be described using different names  Comparison is difficult – in particular across species or across databases

  17. What is the Gene Ontology? A (part of the) solution: • A controlled vocabulary that can be applied to all organisms • Used to describe gene products - proteins and RNA - in any organism

  18. Ontology • In philosophy, the most fundamental branch of metaphysics. It studies being or existence as well as the basic categories thereof—trying to find out what entities and what types of entities exist. – Wikipedia • Ontologies provide controlled, consistent vocabularies to describe concepts and relationships, thereby enabling knowledge sharing – Gruber 1993

  19. Ontology Includes: • A vocabulary of terms (names for concepts) • Definitions • Defined logical relationships to each other

  20. Ontology Structure node edge node node Ontologies can be represented as graphs, where the nodes are connected by edges • Nodes = concepts in the ontology • Edges = relationships between the concepts

  21. Ontology Structure • The Gene Ontology is structured as a hierarchical directed acyclic graph (DAG) • Terms can have more than one parent and zero, one or more children • Terms are linked by two relationships • is-a • part-of

  22. Simple hierarchies (Trees)Directed Acyclic Graphs Single parent One or more parents

  23. Directed Acyclic Graphs (DAG) protein complex organelle [other organelles] mitochondrion [other protein complexes] fatty acid beta-oxidation multienzyme complex is-a part-of

  24. True Path Rule • The path from a child term all the way up to its top-level parent(s) must always be true cell • cytoplasm • chromosome • nuclear chromosome • nucleus • nuclear chromosome • is-a • part-of

  25. How does GO work? • What does the gene product do? • Why does it perform these activities? • Where does it act? What information might we want to capture about a gene product?

  26. GO: Three ontologies What does it do? Molecular Function What processes is it involved in? Biological Process Where does it act? Cellular Component gene product

  27. Cellular Component • where a gene product acts

  28. Mitochondrial membrane

  29. Biological Process

  30. Gluconeogenesis

  31. Molecular Function • A single reaction or activity, not a gene product • A gene product may have several functions • Sets of functions make up a biological process

  32. Molecular Function

  33. Carbonate dehydratase activity

  34. What’s in a GO term? term: gluconeogenesis id: GO:0006094 definition: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol.

  35. Content of GO • Molecular Function 7,309 terms • Biological Process 10,041terms • Cellular Component1,629 terms • Total 18, 975 terms • Definitions: 94.9 % • Obsolete terms: 992 As of October 2005

  36. Outline • Introduction to the Gene Ontology • Gene Ontology annotations • Editing the Gene Ontology • Practical applications for the Gene Ontology • The Gene Ontology as one of many biological ontologies

  37. Annotation of gene products with GO terms Mitochondrial P450

  38. Cellular component: mitochondrial inner membrane GO:0005743 Biological process: Electron transport GO:0006118 Molecular function: monooxygenase activity GO:0004497 substrate + O2 = CO2 +H20 product

  39. Other gene products annotated to monooxygenase activity (GO:0004497) - monooxygenase, DBH-like 1 (mouse) - prostaglandin I2 (prostacyclin) synthase (mouse) - flavin-containing monooxygenase (yeast)    - ferulate-5-hydrolase 1 (arabidopsis)

  40. Two types of GO Annotations:  Electronic Annotation  Manual Annotation • All annotations must: • be attributed to a source • indicate what evidence was found to support the GO term-gene/protein association

  41. Manual Annotations • High–quality, specific gene/gene product associations made, using: • Peer-reviewed papers • Evidence codes to grade evidence BUT – is very time consuming and requires trained biologists

  42. Electronic Annotations • Provides large-coverage • High-quality • BUT–annotations tend to use high-level GO terms and provide little detail.

  43. Electronic Annotations: Methods • Database entries • Manual mapping of GO terms to concepts external to GO (‘translation tables’) • Proteins then electronically annotated with the relevant GO term(s) • Automatic sequence similarity analyses to transfer annotations between highly similar gene products

  44. Fatty acid biosynthesis (Swiss-Prot Keyword) EC:6.4.1.2 (EC number) IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry) GO:Fatty acid biosynthesis (GO:0006633) GO:acetyl-CoA carboxylaseactivity (GO:0003989) GO:acetyl-CoA carboxylase activity (GO:0003989) Electronic Annotations

  45. Mappings of external concepts to GO EC:1.1.1.1 > GO:alcohol dehydrogenase activity ; GO:0004022 EC:1.1.1.10 > GO:L-xylulose reductase activity ; GO:0050038 EC:1.1.1.104 > GO:4-oxoproline reductase activity ; GO:0016617 EC:1.1.1.105 > GO:retinol dehydrogenase activity ; GO:0004745

  46. Manual Annotations: Methods • Extract information from published literature • Curators performs manual sequence similarity analyses to transfer annotations between highly similar gene products (BLAST, protein domain analysis)

  47. Finding GO terms …for B. napus PERK1 protein (Q9ARH1) In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity, In addition, the location of a PERK1-GFP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response… serine/threonine kinase activity, integral membrane protein wound response PubMed ID: 12374299 Function: protein serine/threonine kinase activity GO:0004674 Component: integral to plasma membrane GO:0005887 Process: response to wounding GO:0009611

  48. Additional points • A gene product can have several functions, cellular locations and be involved in many processes • Annotation of a gene product to one ontology is independent from its annotation to other ontologies • Annotations are only to terms reflecting a normal activity or location • Usage of ‘unknown’ GO terms

More Related