1. Ontology Overview Microarray Database Systems
3. Ontology
4. Not Ontology a collection of facts that arise from an actual, specific situation
a model for an application domain (which would be a theory)
a database schema which defines categories and their data types
a tree
instance of, part of, etc
5. Why Ontology Consolidation of the understandings for a given area
through public discussion and review
Knowledge sharing
automatic
uniform queries and assertions
6. Gene Ontology™ Consortium www.geneontology.org
produce a dynamic controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing.
joint project
FlyBase, Mouse Genome Informatics (MGI), Saccharomyces Genome Database (SGD)
7. GO Background Growing sequence data
Growing functional analysis of genes
8. Why GO For genome research
permit powerful analysis methods if multiple databases use the same ontology to describe data
e.g. query multiple organisms using shared biology
help discover new gene functions for sequences
unification of biology
automatic annotation transformation
9. The Topic: Genes Genes are expressed in temporally and spatially characteristic patterns.
Gene product a protein or RNA
ribosomal RNA
Gene product groups entities that function as complexes
ribosome
10. The Topic: Genes (Attributes) Gene
may have more than one product, each may have distinct function
Gene product
gene products are often located in specific cellular compartments
gene products may be part of multi-component complexes
may have one or more biochemical, physiological, or structural functions
may include small molecules
11. The Topic: Genes (Q & A) Where is a gene expressed?
What is the (sub)-cellular localization of a gene product?
When is a gene expressed?
What is the function of a gene product?
What larger process is the function of a gene product a part of?
By what processes are a gene’s activities controlled?
What larger complex is this function a component of?
What genes in species A have a function of gene X in species B?
etc.
12. GO Objective to provide controlled vocabularies for the description of gene products
molecular function
biological process
cellular component
Notes
independent attributes
many to many relationship to gene products
Gene product groups may include small molecules; these are not represented in GO
13. Molecular Function a capability that a physical gene product (group) carries as a potential
what a gene product can do, not where/when
often a gene product is named by its function
a product has many to many relationship with a molecular function
Example
enzyme, transporter, ligand, motor protein
adenylate cyclase, Toll receptor ligand
14. Molecular Function (2)
15. Biological Process a biological objective accomplished via one or more ordered assemblies of molecular functions
temporal
transformational
more than one step
Examples
cell growth and maintenance, signal transduction
pyrimidine metabolism, cAMP biosynthesis
16. Biological Process (2)
17. Cellular Component a component of a cell
part of some larger object
Examples
anatomical structure: nucleus
gene product group: ribosome, proteasome
18. Cellular Component (2)
19. What GO is NOT not a way to unify biological databases
knowledge changes and updates lag behind
individual curators evaluate data differently
many aspects of biology are not included (domain structure, 3D structure, evolution, expression, etc)
not a dictated standard
not a database of gene sequences
not a catalog of gene products
20. Data Representation Terms
directed acyclic graphs (DAGs)
text format
Attributes
unique identifier: GO:nnnnnnn
%: is-a relationship
< : part-of relationship
synonym
database cross-reference
21. GO Terms file: GO.defs
syntax: tag: text or value
All tags are mandatory with the exception of the "comment" tag.
tags
term: the term cardinality 1
goid: the goid of the term cardinality 1
definition: the definition of the term cardinality 1
comment: a free text comment for the help of GO annotators cardinality 0, 1
definition_reference: a reference for the definition cardinality 1, >1
22. GO Terms (2) term: 1,3-beta-glucanosyltransferase
goid: GO:0042124
definition: Catalysis of the splitting and linkage of 1,3-beta-glucan molecules, resulting in 1,3-beta-glucan chain elongation.
definition_reference: GO:jl
definition_reference: PMID:10809732
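The tag: value syntax above can be read with a few lines of Python. This is a minimal sketch, not the GO project's own tooling: the function name `parse_defs_record` is hypothetical, and the cardinality checks simply follow the tag list on the previous slide (all tags mandatory except "comment"; only definition_reference may repeat).

```python
from collections import defaultdict

# Cardinalities as listed on the slide; "comment" is the only optional tag.
REQUIRED_TAGS = {"term", "goid", "definition", "definition_reference"}
SINGLE_VALUED = {"term", "goid", "definition", "comment"}


def parse_defs_record(text):
    """Parse one 'tag: value' record into a dict of tag -> list of values."""
    record = defaultdict(list)
    for line in text.strip().splitlines():
        # Split on the first colon only, so values such as "GO:0042124" survive.
        tag, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'tag: value' line: {line!r}")
        record[tag.strip()].append(value.strip())
    missing = REQUIRED_TAGS - record.keys()
    if missing:
        raise ValueError(f"missing mandatory tags: {sorted(missing)}")
    for tag in SINGLE_VALUED:
        if len(record.get(tag, [])) > 1:
            raise ValueError(f"tag {tag!r} may appear at most once")
    return dict(record)


EXAMPLE = """\
term: 1,3-beta-glucanosyltransferase
goid: GO:0042124
definition: Catalysis of the splitting and linkage of 1,3-beta-glucan molecules, resulting in 1,3-beta-glucan chain elongation.
definition_reference: GO:jl
definition_reference: PMID:10809732
"""

record = parse_defs_record(EXAMPLE)
```

Each tag maps to a list so that the repeatable definition_reference tag needs no special case.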
23. GO Terms (GO ID) each term defined has a unique ID
if the wording but not the meaning of a term is changed, the GO ID stays the same
if the meaning is changed, a new ID is added
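The ID rules can be sketched as follows. The seven-digit pattern comes from the GO:nnnnnnn form shown earlier; the `TermTable` class and its method names are hypothetical and only illustrate that rewording keeps the ID, while a change of meaning is entered under a new ID.

```python
import re

# Format shown on the slides: "GO:" followed by seven digits.
GO_ID = re.compile(r"GO:\d{7}")


def is_go_id(text):
    """Return True if text has the GO:nnnnnnn form."""
    return GO_ID.fullmatch(text) is not None


class TermTable:
    """Hypothetical in-memory table mapping GO ID -> term wording."""

    def __init__(self):
        self.names = {}

    def add(self, go_id, name):
        """Register a term (or a changed meaning) under a new, unused ID."""
        if not is_go_id(go_id):
            raise ValueError(f"malformed GO ID: {go_id!r}")
        if go_id in self.names:
            raise ValueError(f"GO ID already in use: {go_id}")
        self.names[go_id] = name

    def reword(self, go_id, new_name):
        """Wording changed, meaning unchanged: the ID stays the same."""
        if go_id not in self.names:
            raise KeyError(go_id)
        self.names[go_id] = new_name


table = TermTable()
table.add("GO:0042124", "1,3-beta-glucanosyltransferase")
table.reword("GO:0042124", "1,3-beta-glucanosyltransferase activity")  # illustrative wording only
```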
24. GO Terms (DB Ref) Database Cross References
form: database:ID
Function ontology
EC: Enzyme Commission e.g.: EC:3.5.1.6
TC: Transport Catalog e.g.: TC:2.A.29.10.1
UM-BBD_enzymeID: e.g.: UM-BBD_enzymeID:e0310
Process ontology
UM-BBD_pathwayID: e.g.: UM-BBD_pathwayID:dcb
MetaCyc: MetaCyc e.g.: MetaCyc:2ASDEG-PWY
Component ontology
none
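A cross-reference in the database:ID form can be split at its first colon. The sketch below assumes that database abbreviations never contain a colon; `split_dbxref` is a hypothetical helper name, and the sample values are the ones from the slide.

```python
def split_dbxref(xref):
    """Split 'database:ID' into (database, identifier) at the first colon."""
    database, sep, identifier = xref.partition(":")
    database, identifier = database.strip(), identifier.strip()
    if not sep or not database or not identifier:
        raise ValueError(f"not in database:ID form: {xref!r}")
    return database, identifier


EXAMPLES = [
    "EC:3.5.1.6",
    "TC:2.A.29.10.1",
    "UM-BBD_enzymeID:e0310",
    "UM-BBD_pathwayID:dcb",
    "MetaCyc:2ASDEG-PWY",
]

pairs = [split_dbxref(x) for x in EXAMPLES]
```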
25. Syntax Parent-child relationship (by indentation)
parent_term
 child_term
Instance relationship:
%term0
 %term1 % term2
term1 being an instance of term0 and also an instance of term2
e.g. process.ontology
26. Syntax (2) Part of relationship:
%term0
 %term1 < term2 < term3
term1 being an instance of term0 and also a part of term2 and term3
Line syntax:
< | % term [; db cross ref]* [; synonym:text]* [ < | % term]*
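The line syntax can be read with a small parser. This is a sketch under stated assumptions: term names are assumed not to contain "%" or "<" (no escaping is handled), leading spaces are taken as the indentation depth, and any field after the term name that does not start with "synonym:" is treated as a database cross-reference. The function name and the sample ID GO:0000001 are placeholders for illustration.

```python
import re

RELATIONS = {"%": "is_a", "<": "part_of"}


def parse_ontology_line(line):
    """Parse one flat-file line into its depth, term, and extra parents."""
    depth = len(line) - len(line.lstrip(" "))
    body = line.strip()
    if not body or body[0] not in RELATIONS:
        raise ValueError(f"line must start with '%' or '<': {line!r}")
    parsed = []
    # Each segment starts with a relationship symbol and runs to the next one.
    for symbol, text in re.findall(r"([%<])([^%<]*)", body):
        fields = [f.strip() for f in text.split(";")]
        extras = [f for f in fields[1:] if f]
        parsed.append({
            "relation": RELATIONS[symbol],
            "name": fields[0],
            "xrefs": [f for f in extras if not f.startswith("synonym:")],
            "synonyms": [f[len("synonym:"):].strip()
                         for f in extras if f.startswith("synonym:")],
        })
    term, *extra_parents = parsed
    return {"depth": depth, "term": term, "extra_parents": extra_parents}


# Placeholder line: term1 is an instance of its indentation parent
# and also a part of term2 and term3.
result = parse_ontology_line(" %term1 ; GO:0000001 ; synonym:alias one < term2 < term3")
```

The relation on the first segment describes the link to the parent given by indentation; the remaining segments name additional parents.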
27. XML Syntax <?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE go:go>
<go:go xmlns:go="http://www.geneontology.org/xml-dtd/go.dtd#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<go:version timestamp="Wed May 9 23:55:02 2001" />
<rdf:RDF>
<go:term rdf:about="http://www.geneontology.org/go#GO:0016209"> <go:accession>GO:0016209</go:accession>
<go:name>antioxidant</go:name> <go:definition></go:definition>
<go:isa rdf:resource="http://www.geneontology.org/go#GO:0003674" />
<go:association>
<go:evidence evidence_code="ISS">
<go:dbxref>
<go:database_symbol>fb</go:database_symbol>
<go:reference>fbrf0105495</go:reference>
</go:dbxref>
</go:evidence>
<go:gene_product> <go:name>CG7217</go:name>
<go:dbxref>
<go:database_symbol>fb</go:database_symbol>
<go:reference>FBgn0038570</go:reference>
</go:dbxref> </go:gene_product>
</go:association>
</go:term>
</rdf:RDF> </go:go>
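A document of this shape can be read with Python's standard xml.etree.ElementTree. The sketch below uses a trimmed, well-formed copy of the slide's example and assumes that each go:association is nested inside its go:term; `read_terms` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {
    "go": "http://www.geneontology.org/xml-dtd/go.dtd#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

# Trimmed copy of the slide's example (assumed nesting: association inside term).
DOC = """<go:go xmlns:go="http://www.geneontology.org/xml-dtd/go.dtd#"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:RDF>
    <go:term rdf:about="http://www.geneontology.org/go#GO:0016209">
      <go:accession>GO:0016209</go:accession>
      <go:name>antioxidant</go:name>
      <go:isa rdf:resource="http://www.geneontology.org/go#GO:0003674" />
      <go:association>
        <go:evidence evidence_code="ISS" />
        <go:gene_product><go:name>CG7217</go:name></go:gene_product>
      </go:association>
    </go:term>
  </rdf:RDF>
</go:go>"""


def read_terms(xml_text):
    """Return one dict per go:term with its accession, name, parents, products."""
    root = ET.fromstring(xml_text)
    resource_attr = "{%s}resource" % NS["rdf"]
    terms = []
    for term in root.iterfind("rdf:RDF/go:term", NS):
        terms.append({
            "accession": term.findtext("go:accession", namespaces=NS),
            "name": term.findtext("go:name", namespaces=NS),
            # The parent ID is the fragment after '#' in the rdf:resource URL.
            "is_a": [isa.get(resource_attr).rsplit("#", 1)[-1]
                     for isa in term.iterfind("go:isa", NS)],
            "gene_products": [gp.findtext("go:name", namespaces=NS)
                              for gp in term.iterfind("go:association/go:gene_product", NS)],
        })
    return terms


terms = read_terms(DOC)
```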
28. XML Syntax (2) RDF, not plain XML
semantic network
XML id and idref restrictions (e.g. no multiple parentage)
in RDF, unique URL as the ID
29. Data Representation Term definitions
Evidence code
GO Term Bibliography
30. The 3 Ontologies Biological Process (process.ontology)
Molecular Function (function.ontology)
Cellular Component (component.ontology)
31. Annotation Collaborating databases annotate their gene products (or genes) with GO terms, providing references and indicating what kind of evidence is available to support the annotations.
32. Annotation (2)
33. Annotation (3) Indices of other Classification systems to GO
SWISS-PROT spkw2go
Enzyme Commission ec2go
EGAD egad2go
GenProtEC genprotec2go
TIGR role tigr2go
InterPro interpro2go
MIPS Funcat mips2go
34. Database Abbreviations The annotation file syntax calls for an identifier from a foreign database to be prefixed by the abbreviation of that database.
syntax: DB:identifier. (e.g. EC:1.8.1.4 )
legal database abbreviations: GO.xrf_abbs
abbreviation: ENSEMBL
database: Database of automatically annotated genomic data.
object: Identifier.
example: ENSEMBL:ENSP00000265949
generic_url: http://www.ensembl.org/
url_syntax:
example_url: http://www.ensembl.org/perl/protview?peptide=ENS…
35. True Path Rule the pathway from a child term all the way up to its top level parent(s) must always be true.
36. True Path Rule (2) chitin metabolism
 chitin biosynthesis
 chitin catabolism
 cuticle chitin metabolism
  cuticle chitin biosynthesis
  cuticle chitin catabolism
 cell wall chitin metabolism
  cell wall chitin biosynthesis
  cell wall chitin catabolism
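One way to read the true path rule: every ancestor reached from a term must also hold for anything annotated to that term. The sketch below walks the chitin example upward. The parent links are an assumption inferred from the term names (each cuticle and cell wall term is placed under the matching "... chitin metabolism" term); the slide itself does not spell them out.

```python
# Assumed parent links, inferred from the term names on the slide.
PARENTS = {
    "chitin metabolism": [],
    "chitin biosynthesis": ["chitin metabolism"],
    "chitin catabolism": ["chitin metabolism"],
    "cuticle chitin metabolism": ["chitin metabolism"],
    "cuticle chitin biosynthesis": ["cuticle chitin metabolism"],
    "cuticle chitin catabolism": ["cuticle chitin metabolism"],
    "cell wall chitin metabolism": ["chitin metabolism"],
    "cell wall chitin biosynthesis": ["cell wall chitin metabolism"],
    "cell wall chitin catabolism": ["cell wall chitin metabolism"],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


# A gene product annotated to this term is implicitly annotated to all of these.
implied = ancestors("cuticle chitin biosynthesis")
```

Because a term in a DAG may have several parents, the walk keeps a `seen` set so that shared ancestors are visited once.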
37. True Path Rule (3)
38. Parent-child Relationships child:
instance of
part of
Example
casein kinase II has two children
casein kinase II regulator (part of)
casein kinase II catalyst (part of)
39. Logical Relationships if A part_of B and C instance_of B
A part_of C
if A instance_of B and B instance_of C
A instance_of C
if A part_of B and B part_of C
A part_of C
if A instance_of B and C part_of A
not necessarily C part_of B
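These rules can be applied mechanically until no new relationship appears. The sketch below stores facts as (subject, relation, object) triples with placeholder names; it implements transitivity of instance_of and of part_of, plus the rule that a part of B is also a part of every instance of B. The function name `infer` is hypothetical. The last case on the slide is deliberately not a rule, so nothing is derived for it.

```python
def infer(facts):
    """Return the closure of a set of (subject, relation, object) triples."""
    facts = set(facts)
    while True:
        new = set()
        for a, r1, b in facts:
            for c, r2, d in facts:
                # Transitivity: a r b and b r d  =>  a r d (same relation).
                if r1 == r2 and b == c:
                    new.add((a, r1, d))
                # a part_of b and c instance_of b  =>  a part_of c.
                if r1 == "part_of" and r2 == "instance_of" and b == d:
                    new.add((a, "part_of", c))
        if new <= facts:
            return facts
        facts |= new


# Rule 1: A part_of B, C instance_of B.
closure_1 = infer({("A", "part_of", "B"), ("C", "instance_of", "B")})

# Last case: A instance_of B, C part_of A. "C part_of B" must not be derived.
closure_4 = infer({("A", "instance_of", "B"), ("C", "part_of", "A")})
```

The loop terminates because only finitely many triples can be formed from the names in the input.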
40. Other Guidelines avoid species-specific definitions
no more specific than any of its children
no mutant processes
avoid cellular components and gene products in molecular function ontology
components that reside in multiple locations
e.g. not all chromosomes are inside nucleus
41. Other Guidelines (2)
42. Other Guidelines (3)
43. Application