amo amos amot amomus amotis amont. Happy birthday Swiss-Prot. Fortaleza, August 2006
Three (Orthogonal) Ontologies • Biological Process: goal or objective within a cell, tissue, etc. • Molecular Function: elemental activity or task • Cellular Component: location or complex
Content of GO • molecular function: 7,432 terms • biological process: 10,740 terms • cellular component: 1,772 terms • all: 19,994 terms • definitions: 19,042 (96%)
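A quick arithmetic check of these figures, taking the numbers exactly as printed on the slide. The three per-ontology counts add up to 19,944 rather than the stated 19,994, and 19,042 / 19,994 comes to about 95.2%. One possible explanation is that the counts were taken from slightly different snapshots, but that is an assumption; the slide does not say.

```python
# Figures exactly as printed on the slide.
counts = {
    "molecular function": 7_432,
    "biological process": 10_740,
    "cellular component": 1_772,
}
stated_total = 19_994
definitions = 19_042

namespace_sum = sum(counts.values())      # 19,944
coverage = definitions / stated_total     # share of terms that have a definition

print(f"sum of the three ontologies: {namespace_sum:,}")
print(f"definitions / stated total: {coverage:.1%}")
```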
!version v.4.2
!date 4 November 1998
!author Michael Ashburner
$Gene Ontology ; GO:0000001 ; remark:
$function ; GO:0000002 ; remark:
%macromolecule ; GO:0000003 ; remark:
%protein ; GO:0000004 ; remark:
%enzyme ; GO:0000005 ; remark:
%alpha-alpha-trehalase ; GO:0000006 ; remark: ; EC:3.2.1.28
%alpha-alpha-trehalose-phosphate synthase (UDP-forming) ; GO:0000007 ; remark: ; EC:2.4.1.15
%alpha-L-fucosidase ; GO:0000008 ; remark: ; EC:3.2.1.51
%alpha-N-acetylglucosaminidase ; GO:0000009 ; remark: ; EC:3.2.1.50
%alpha-amylase ; GO:0000010 ; remark: ; EC:3.2.1.1
%alpha-glucosidase II ; GO:0000011 ; remark: ; EC:3.1.2.20
%alpha-ketoacid dehydrogenase complex ; GO:0000012 ; remark:
<oxoglutarate dehydrogenase (lipoamide) ; GO:0000013 ; remark: ; EC:1.2.4.2
....
%DNA-directed DNA polymerase ; GO:0000054 ; remark: ; EC:2.7.7.7
%nuclear DNA-directed DNA polymerase ; GO:0000055 ; remark:
%alpha DNA polymerase ; GO:0000056 ; remark:
<alpha DNA polymerase, 180Kd-subunit ; GO:0000057 ; remark:
ma11> wc gene_ontology.v4.1
3081 22643 192480 gene_ontology.v4.1
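Each term line in this early flat file packs a relation symbol, a name, a GO id, a remark and optional cross-references (EC numbers) into one ";"-separated record. A minimal parsing sketch follows. It assumes the symbol convention of later GO flat files ("$" root, "%" is_a, "<" part_of) also holds for this 1998 version, and it ignores the leading indentation that is assumed to have encoded depth but was lost in this transcript. The `FlatTerm` and `parse_flat_line` names are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Assumed meaning of the leading symbol (later GO flat-file convention).
RELATION_SYMBOLS = {"$": "root", "%": "is_a", "<": "part_of"}


@dataclass
class FlatTerm:
    relation: str   # relation to the enclosing (less indented) term
    name: str
    go_id: str
    remark: str = ""
    xrefs: list[str] = field(default_factory=list)  # e.g. EC numbers


def parse_flat_line(line: str) -> FlatTerm | None:
    """Parse one term line, e.g. '%alpha-amylase ; GO:0000010 ; remark: ; EC:3.2.1.1'.
    Header lines ('!version', '!date', ...) and blank lines return None."""
    # Leading spaces are assumed to encode depth; this sketch discards them.
    line = line.strip()
    if not line or line.startswith("!"):
        return None
    symbol = line[0]
    if symbol not in RELATION_SYMBOLS:
        raise ValueError(f"unknown relation symbol: {symbol!r}")
    parts = [part.strip() for part in line[1:].split(";")]
    name, go_id = parts[0], parts[1]
    remark, xrefs = "", []
    for extra in parts[2:]:
        if extra.startswith("remark:"):
            remark = extra[len("remark:"):].strip()
        elif extra:
            xrefs.append(extra)
    return FlatTerm(RELATION_SYMBOLS[symbol], name, go_id, remark, xrefs)


print(parse_flat_line("%alpha-amylase ; GO:0000010 ; remark: ; EC:3.2.1.1"))
```

Rebuilding the hierarchy would additionally need the indentation depth of each line, which is why a one-symbol-per-line format was hard to extend to multiple parents.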
Banbury Center, CSH Labs, August 1998 The founding meeting of the Gene Ontology Consortium
Problems with the GO: • is_a and part_of relationships are poorly defined and not used consistently. • It carries the baggage of implicit ontologies. • It lacks relationships between the three GO ontologies.
Implicit ontologies within the GO: • cysteine biosynthesis (ChEBI) • myoblast fusion (Cell Type Ontology) • hydrogen ion transporter activity (ChEBI) • snoRNA catabolism (Sequence Ontology) • wing disc pattern formation (Drosophila anatomy) • epidermal cell differentiation (Cell Type Ontology) • regulation of flower development (Plant anatomy) • interleukin-18 receptor complex (not yet in OBO) • B-cell differentiation (Cell Type Ontology)
Integrating ontologies [Figure: CL terms (blood cell, lymphocyte, B-cell) shown alongside GO terms (cell differentiation, lymphocyte differentiation, B-cell activation, B-cell differentiation), linked by is_a]
CELL Ontology
[Term]
id: CL:0000236
name: B-cell
is_a: CL:0000542 ! lymphocyte
develops_from: CL:0000231 ! B-lymphoblast
Augmented GO
[Term]
id: GO:0030183
name: B-cell differentiation
is_a: GO:0042113 ! B-cell activation
is_a: GO:0030098 ! lymphocyte differentiation
intersection_of: is_a GO:0030154 ! cell differentiation
intersection_of: has_participant CL:0000236 ! B-cell
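The two intersection_of lines define B-cell differentiation as "a cell differentiation that has_participant a B-cell": one genus from GO plus one differentia pointing into CL. A minimal reading sketch, assuming the simple tag: value layout shown here. It is not a full OBO parser (no escapes, quoted strings, qualifiers or header), and `parse_obo_terms` and `genus_and_differentia` are hypothetical names.

```python
from __future__ import annotations
from collections import defaultdict


def parse_obo_terms(text: str) -> list[dict[str, list[str]]]:
    """Read [Term] stanzas into {tag: [values]} dicts, dropping trailing
    '! comment' text. A simplification: escapes, quoted strings,
    trailing qualifiers and header tags are not handled."""
    terms: list[dict[str, list[str]]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current = defaultdict(list)
            terms.append(current)
        elif line.startswith("[") and line.endswith("]"):
            current = None  # another stanza type, e.g. [Typedef]
        elif current is not None and ":" in line:
            tag, value = line.split(":", 1)
            current[tag.strip()].append(value.split("!", 1)[0].strip())
    return [dict(t) for t in terms]


def genus_and_differentia(term: dict[str, list[str]]):
    """Split intersection_of values into the genus (parent class) and the
    differentia (relation, filler) pairs. Accepts both the slide's
    'is_a GO:...' form and a bare 'GO:...' genus."""
    genus, differentia = [], []
    for value in term.get("intersection_of", []):
        parts = value.split()
        if len(parts) == 1 or parts[0] == "is_a":
            genus.append(parts[-1])
        else:
            differentia.append((parts[0], parts[1]))
    return genus, differentia


STANZA = """
[Term]
id: GO:0030183
name: B-cell differentiation
is_a: GO:0042113 ! B-cell activation
is_a: GO:0030098 ! lymphocyte differentiation
intersection_of: is_a GO:0030154 ! cell differentiation
intersection_of: has_participant CL:0000236 ! B-cell
"""

term = parse_obo_terms(STANZA)[0]
print(genus_and_differentia(term))
# (['GO:0030154'], [('has_participant', 'CL:0000236')])
```

Because the differentia names a CL identifier, a reasoner can connect the GO term to the Cell Ontology instead of relying on the words "B-cell" inside the GO term name.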
obo.sf.net
The OBO Foundry • To create the conditions for a step-by-step evolution towards robust gold standard reference ontologies in the biomedical domain. • To introduce some of the features of scientific peer review into biomedical ontology development.
The OBO Foundry A subset of OBO ontologies whose developers agree in advance to accept a common set of principles designed to assure • intelligibility to biologist curators, annotators, users • formal robustness • stability • compatibility • interoperability • support for logic-based reasoning
The OBO Foundry • The ontology is open and available to be used by all. • The developers of the ontology agree in advance to collaborate with developers of other OBO Foundry ontologies where domains overlap. The importance of community collaboration cannot be overstated. • The ontology is in, or can be instantiated in, a common formal language. • The ontology possesses a unique identifier space within OBO. • The ontology provider has procedures for identifying distinct successive versions.
The OBO Foundry • The ontology has a clearly specified and clearly delineated content. • The ontology includes textual definitions for all terms. • The ontology is well-documented. • The ontology has a plurality of independent users. • The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.
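Two of these principles can be checked mechanically: every term carries a textual definition, and every identifier lies in the ontology's own identifier space. The sketch below is a hypothetical checker, not an OBO Foundry tool. It assumes terms are held as dicts with `id` and `def` keys; the definition strings are placeholders, since only their presence matters.

```python
from __future__ import annotations


def check_principles(terms: list[dict[str, str]], id_space: str) -> list[str]:
    """Hypothetical checker for two mechanically testable principles:
    every term has a textual definition, and every id lies in the
    ontology's own identifier space (e.g. 'CL' for CL:0000236)."""
    problems = []
    for term in terms:
        term_id = term.get("id", "<no id>")
        if term_id.split(":", 1)[0] != id_space:
            problems.append(f"{term_id}: outside the {id_space} id space")
        if not term.get("def", "").strip():
            problems.append(f"{term_id}: no textual definition")
    return problems


# Placeholder definitions; only their presence matters here.
terms = [
    {"id": "CL:0000236", "name": "B-cell", "def": "placeholder definition"},
    {"id": "CL:0000542", "name": "lymphocyte", "def": ""},
    {"id": "GO:0030183", "name": "B-cell differentiation", "def": "placeholder definition"},
]
for problem in check_principles(terms, "CL"):
    print(problem)
```

The remaining principles (clear scope, documentation, independent users, relations defined as in the Relation Ontology) need human review rather than a script.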
Foundational relations: is_a, part_of • Spatial relations: located_in, contained_in, adjacent_to • Temporal relations: transformation_of, derives_from, preceded_by • Participation relations: has_participant, has_agent, regulates
Good ontologies require: • consistent use of terms, supported by logically coherent (non-circular) definitions, in equivalent human-readable and computable formats • coherent, shared treatment of relations to allow cascading inference both within and between ontologies
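"Cascading inference" can be illustrated with one rule applied until nothing new follows: if X R Y and Y is_a Z, then X R Z. The sketch assumes every relation is read as "every X stands in R to some Y", which is what makes the rule valid. With R = is_a it gives transitivity within GO; with R = has_participant it carries a GO assertion up the CL hierarchy, i.e. inference between ontologies. The edges reuse the B-cell example; the GO path from lymphocyte differentiation to cell differentiation is shortened to a single edge for illustration.

```python
from __future__ import annotations
from collections import defaultdict


def infer(edges: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Forward-chain one rule to a fixed point:
        X R Y  and  Y is_a Z   =>   X R Z
    With R = is_a this is transitivity of is_a; with other relations it
    propagates an assertion up the is_a hierarchy of its target."""
    closed = set(edges)
    while True:
        parents = defaultdict(set)
        for s, r, o in closed:
            if r == "is_a":
                parents[s].add(o)
        new = {(s, r, z) for s, r, o in closed for z in parents.get(o, ())} - closed
        if not new:
            return closed
        closed |= new


edges = {
    ("GO:0030183", "is_a", "GO:0030098"),             # B-cell differentiation -> lymphocyte differentiation
    ("GO:0030098", "is_a", "GO:0030154"),             # lymphocyte differentiation -> cell differentiation (shortened)
    ("GO:0030183", "has_participant", "CL:0000236"),  # B-cell differentiation -> B-cell
    ("CL:0000236", "is_a", "CL:0000542"),             # B-cell -> lymphocyte (CL)
}
for edge in sorted(infer(edges) - edges):
    print(edge)
```

The second inferred edge, that B-cell differentiation has a lymphocyte as participant, only exists because GO and CL share the same treatment of is_a.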
Ontology = A Representation of Types • Each node of an ontology consists of: • preferred term • term identifier • synonyms • definition, glosses, comments
Ontology = A Representation of Types • Nodes in an ontology are connected by relations, primarily is_a (= is subtype of) and part_of • These relations are designed to support search, reasoning and annotation
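The node structure from the previous slide, together with its outgoing relations, maps onto a small record type. This is a sketch only; the field names and the `search` helper are hypothetical, and the synonym "B lymphocyte" is used purely for illustration. It shows how synonyms support search: a query finds a term under any of its names.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    term_id: str                 # term identifier
    preferred_term: str
    synonyms: list[str] = field(default_factory=list)
    definition: str = ""
    comments: list[str] = field(default_factory=list)  # glosses, comments
    # relations to other nodes as (relation, target term_id), e.g. ("is_a", "CL:0000542")
    relations: list[tuple[str, str]] = field(default_factory=list)


def search(nodes: list[Node], text: str) -> list[str]:
    """Ids of nodes whose preferred term or any synonym contains text,
    ignoring case."""
    needle = text.casefold()
    return [
        n.term_id
        for n in nodes
        if needle in n.preferred_term.casefold()
        or any(needle in s.casefold() for s in n.synonyms)
    ]


nodes = [
    Node("CL:0000236", "B-cell", synonyms=["B lymphocyte"],
         relations=[("is_a", "CL:0000542")]),
    Node("CL:0000542", "lymphocyte"),
]
print(search(nodes, "b lymphocyte"))  # ['CL:0000236']
```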
The aims of SO • Develop a shared set of terms and concepts to annotate biological sequences. • Apply these in our separate projects to provide consistent query capabilities between them. • Provide a software resource to assist in the application and distribution of SO.
The scope of the SO • Features that can be located on a sequence with coordinates, e.g. exon, promoter, binding_site • Properties of these features: • Sequence attributes, e.g. maternally_imprinted_gene • Consequences of mutation, e.g. mutation_affecting_editing • Chromosome variation, e.g. aneuploid
What is a pseudogene? • Human: a sequence similar to a known protein but containing frameshift(s) and/or stop codons that disrupt the ORF. • Neisseria: a gene that is inactive but may be activated by translocation (e.g. by gene conversion) to a new chromosomal site. • Note: such a gene would be called a “cassette” in yeast.
Give me all the dicistronic genes • Define a dicistronic gene in terms of the cardinality of the transcript-to-open-reading-frame relationship and the spatial arrangement of the open reading frames.
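Once the definition is stated in terms of cardinality and spatial arrangement, the query becomes a simple filter. The rule below is one possible reading, assumed for illustration rather than taken from SO: a gene is dicistronic if some transcript carries exactly two ORFs (cardinality) that do not overlap (spatial arrangement). The gene names and coordinates are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class ORF:
    start: int  # transcript coordinates, 1-based inclusive
    end: int

    def overlaps(self, other: ORF) -> bool:
        return self.start <= other.end and other.start <= self.end


@dataclass
class Transcript:
    name: str
    orfs: list[ORF]


def is_dicistronic(transcripts: list[Transcript]) -> bool:
    """Assumed rule: at least one transcript carries exactly two ORFs
    (cardinality) that do not overlap (spatial arrangement)."""
    for tx in transcripts:
        if len(tx.orfs) == 2 and not tx.orfs[0].overlaps(tx.orfs[1]):
            return True
    return False


# Hypothetical genes, each given as its list of transcripts.
genes = {
    "geneA": [Transcript("geneA-t1", [ORF(10, 300), ORF(350, 900)])],
    "geneB": [Transcript("geneB-t1", [ORF(10, 300)])],
    "geneC": [Transcript("geneC-t1", [ORF(10, 300), ORF(200, 500)])],
}
print([name for name, txs in genes.items() if is_dicistronic(txs)])  # ['geneA']
```

Without an explicit definition like this, the same query run against differently curated databases would return incomparable answers.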
ISA: 927 relationships • PARTOF: 186 relationships, each linking a meronym (the part) to its holonym (the whole)
Relationships allow reasoning. • VALIDATION: we can check the internal consistency of an annotation against the ontology. We can also check that any topological assertions are true. • 3’ UTR part_of mRNA (consistent) • intron part_of mRNA (inconsistent: introns are spliced out of the mature mRNA)
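Both kinds of validation can be sketched for a single annotated part_of assertion: is the type-level relation allowed by the ontology, and do the part's coordinates fall within the whole's? The `ALLOWED_PART_OF` table is a hypothetical, heavily simplified fragment standing in for SO, and `validate_part_of` is an illustrative name.

```python
from __future__ import annotations

# Hypothetical, simplified fragment of SO part_of relations between types.
ALLOWED_PART_OF = {
    ("five_prime_UTR", "mRNA"),
    ("three_prime_UTR", "mRNA"),
    ("intron", "primary_transcript"),
}


def validate_part_of(part_type: str, part_span: tuple[int, int],
                     whole_type: str, whole_span: tuple[int, int]) -> list[str]:
    """Check one annotated 'part part_of whole' assertion.
    Spans are (start, end), 1-based inclusive, on the same sequence.
    Returns a list of problems; an empty list means it passed."""
    problems = []
    # Consistency with the ontology: is this type-level relation allowed?
    if (part_type, whole_type) not in ALLOWED_PART_OF:
        problems.append(f"ontology: {part_type} part_of {whole_type} not allowed")
    # Topology: a part must lie within the coordinates of its whole.
    (ps, pe), (ws, we) = part_span, whole_span
    if not ws <= ps <= pe <= we:
        problems.append("topology: part extends outside the whole")
    return problems


print(validate_part_of("three_prime_UTR", (900, 1000), "mRNA", (1, 1000)))  # []
print(validate_part_of("intron", (200, 300), "mRNA", (1, 1000)))
```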
Classical Extensional Mereology • The formal properties of parts: • If A is a proper part of B then B is not a part of A (nothing is a proper part of itself) • If A is a part of B and B is a part of C then A is a part of C • Because of these rules, we can apply functions to parts…
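The rules above translate directly into checks on a part_of graph. Transitivity lets us compute every implied part_of pair. Irreflexivity and asymmetry of proper parthood mean that no term may end up as a part of itself in that closure, so any such pair exposes a cycle in the curated relations. A minimal sketch using abstract term names; `part_of_closure` and `mereology_violations` are hypothetical helpers.

```python
from __future__ import annotations
from collections import defaultdict


def part_of_closure(pairs: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Transitive closure of proper part_of:
    A part_of B and B part_of C  =>  A part_of C."""
    parents = defaultdict(set)
    for part, whole in pairs:
        parents[part].add(whole)
    closure = set()
    for start in list(parents):
        stack, seen = list(parents[start]), set()
        while stack:
            whole = stack.pop()
            if whole in seen:
                continue
            seen.add(whole)
            closure.add((start, whole))
            stack.extend(parents.get(whole, ()))
    return closure


def mereology_violations(pairs: set[tuple[str, str]]) -> list[str]:
    """Terms that end up as a proper part of themselves, which violates
    irreflexivity (and, via a cycle, asymmetry)."""
    return sorted(a for a, b in part_of_closure(pairs) if a == b)


print(part_of_closure({("A", "B"), ("B", "C")}))
print(mereology_violations({("X", "Y"), ("Y", "X")}))  # ['X', 'Y']
```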
Gene Ontology Consortium: http://www.geneontology.org • DictyBase • The Pathogen Group • Schizosaccharomyces pombe Genome Sequencing Project