The CROP (Common Reference Ontologies for Plants) Initiative Barry Smith September 13, 2013 http://ontology.buffalo.edu/smith. The OBO Foundry Principles Reference ontologies vs. application ontologies Other ontology consortia The CROP Initiative Examples of ontologies within CROP.
The CROP (Common Reference Ontologies for Plants) Initiative Barry Smith September 13, 2013 http://ontology.buffalo.edu/smith
Agenda • The OBO Foundry • Principles • Reference ontologies vs. application ontologies • Other ontology consortia • The CROP Initiative • Examples of ontologies within CROP
How to find data? How to find other people’s data? How to reason with data when you find it? How to work out what data does not yet exist?
How to solve the problem of making the data we find queryable and re-usable by others? Part of the solution must involve: standardized terminologies and coding schemes
But there are multiple kinds of standardization for biological data, and they do not work well together Proposed solution: Ontology-based annotation of data
ontologies = standardized labels designed for use in annotations to make the data cognitively accessible to human beings and algorithmically accessible to computers
ontologies = high quality controlled structured vocabularies for the annotation (description) of data, images, journal articles …
Ramirez et al. Linking of Digital Images to Phylogenetic Data Matrices Using a Morphological Ontology Syst. Biol. 56(2):283–294, 2007
ontologies used in curation of literature what cellular component? what molecular function? what biological process?
Proposed framework: the Semantic Web • HTML demonstrated the power of the Web to allow sharing of information • can we use semantic technology to create a Web 2.0 which would allow algorithmic reasoning with online information based on a common Web Ontology Language (OWL)? • can we use netcentricity, common URLs, to break down silos and create useful integration of online data and information?
Ontology success stories, and some reasons for failure A fragment of the “Linked Open Data” in the biomedical domain
The more ontology-building is successful, the more it fails OWL breaks down data silos via controlled vocabularies for the description of data dictionaries Unfortunately the very success of this approach led to the creation of multiple, new, semantic silos – because multiple ontologies are being created in ad hoc ways
http://bioportal.bioontology.org/ Many ontologies in BioPortal are created by importing content from existing ontologies and giving the imported terms new names and new IDs. The result is chaos, with bits and pieces of the same ontologies scattered across multiple places. This leads to massively redundant effort, forking and doom.
A standard engineering methodology • It is easier to write useful software if one works with a simplified model • (“…we can’t know what reality is like in any case; we only have our concepts…”) • This looks like a useful model to me • (One week goes by:) This other thing looks like a useful model to him • Data in Pittsburgh does not interoperate with data in Vancouver • Science is siloed
A good solution to this silo problem must be: • modular • incremental • independent of hardware and software • bottom-up • evidence-based • revisable • incorporate a strategy for motivating potential developers and users
main reason for GO’s success Gene Ontology and associated databases “make it possible to systematically dissect large gene lists in an attempt to assemble a summary of the most enriched and pertinent biology” PMC2615629
GO provides a controlled system of terms for use in annotating (describing, tagging) data • multi-species, multi-disciplinary, open source • contributing to the cumulativity of scientific results obtained by distinct research communities • compare use of kilograms, meters, seconds in formulating experimental results
GO is 3 ontologies cellular component molecular function biological process
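To make the idea of annotation concrete, here is a minimal Python sketch of data records tagged with terms from GO's three ontologies. The record names (`gene_x`, `gene_y`) and the helper `records_annotated_with` are hypothetical and exist only for illustration; the term labels are ordinary GO labels, and real annotations would use term identifiers rather than labels.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    record: str  # hypothetical data-record identifier
    aspect: str  # which of GO's three ontologies the term comes from
    term: str    # term label (a real system would store the term ID)


# Toy annotation set; the records are invented for this sketch.
ANNOTATIONS = [
    Annotation("gene_x", "cellular component", "nucleus"),
    Annotation("gene_x", "molecular function", "DNA binding"),
    Annotation("gene_x", "biological process", "DNA repair"),
    Annotation("gene_y", "cellular component", "nucleus"),
]


def records_annotated_with(term: str,
                           annotations: List[Annotation] = ANNOTATIONS) -> List[str]:
    """Return the sorted, de-duplicated records annotated with the given term."""
    return sorted({a.record for a in annotations if a.term == term})
```

Because every group uses the same controlled term, a single query such as `records_annotated_with("nucleus")` can retrieve records that were produced and annotated independently.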
Top-Level Architecture [diagram: Continuant (Independent Continuant, Dependent Continuant) and Occurrent (Process, Event); universals shown above their instances]
Problem with the GO • it covers only three types of entities • no diseases • no laboratory artifacts • no anatomy (above the cell) • only species-terms for development • no phenotypes
Rationale of OBO Foundry coverage [table; axes: RELATION TO TIME, GRANULARITY]
First step (2001) a shared portal for (so far) 58 ontologies (low regimentation) http://obo.sourceforge.net NCBO BioPortal
OBO builds on the principles successfully implemented by the GO recognizing that ontologies need to be developed in tandem
Second step (2006) The OBO Foundry http://obofoundry.org/
Initial OBO Foundry coverage [table; axes: RELATION TO TIME, GRANULARITY]
OBO Foundry Principles • common formal architecture • clearly delineated content (redundant – overlaps with orthogonality) • the ontology is well-documented (– overlaps with rules for definitions; needs expanding, for developers, for users, minimal metadata) • plurality of independent users • single locus of authority, trackers, help desk
OBO Foundry Principles • textual definitions plus formal definitions • all definitions should be of the genus-species form A =def. a B which Cs where B is the parent term of A in the ontology hierarchy • formal definitions use OBO format or OWL
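The genus-species pattern can be sketched as a small data structure. The Python below is only an illustration under stated assumptions: the `Term` class, the `check_genus` helper, and the three toy definitions are invented for this sketch and are not taken from any Foundry ontology. It renders a definition in the form "A =def. a B which Cs" and flags terms whose genus B is not itself a term in the hierarchy.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Term:
    label: str                  # A
    parent: Optional[str]       # B, the is_a parent; None for a root term
    differentia: Optional[str]  # the "which Cs" clause; None for a root term

    def textual_definition(self) -> str:
        """Render the definition in the form 'A =def. a B which Cs'."""
        if self.parent is None or self.differentia is None:
            raise ValueError(f"{self.label} is a root term and has no genus")
        return f"{self.label} =def. a {self.parent} which {self.differentia}"


def check_genus(terms: Dict[str, Term]) -> List[str]:
    """Return labels of terms whose genus is not a term in the ontology."""
    return [t.label for t in terms.values()
            if t.parent is not None and t.parent not in terms]


# Toy definitions, invented for illustration only.
TERMS = {
    "cell": Term("cell", None, None),
    "plant cell": Term("plant cell", "cell", "is part of a plant"),
    # The genus "epidermal cell" is deliberately missing from TERMS.
    "guard cell": Term("guard cell", "epidermal cell", "surrounds a stoma"),
}
```

Tying the genus to the parent term means the textual definition and the hierarchy cannot drift apart unnoticed: `check_genus(TERMS)` reports `guard cell`, whose genus has no entry.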
Orthogonality • For each domain, there should be convergence upon a single ontology that is recommended for use by those who wish to become involved with the Foundry initiative • Part of the goal here is to avoid the need for mappings – which are in any case too expensive, too fragile, too difficult to keep up-to-date as mapped ontologies change • Orthogonality means: • everyone knows where to look to find out how to annotate each kind of data • everyone knows where to look to find content for application ontologies
Orthogonality = non-redundancy for the reference ontologies inside the Foundry • application ontologies can overlap, but then only in those areas where common coverage is supplied by a reference ontology
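A rough way to picture the non-redundancy requirement is a check for terms claimed by more than one reference ontology. The sketch below is a simplification under loud assumptions: the ontology names and term labels are invented, and comparing labels is only a stand-in, since a real check would have to compare what the terms mean rather than how they are spelled.

```python
from collections import defaultdict
from typing import Dict, List, Set


def find_overlaps(ontologies: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each label defined by more than one ontology to its owners."""
    owners = defaultdict(list)
    # Iterate in sorted order so the owner lists are deterministic.
    for name, labels in sorted(ontologies.items()):
        for label in labels:
            owners[label].append(name)
    return {label: names for label, names in owners.items() if len(names) > 1}


# Hypothetical reference ontologies, invented for this sketch.
REFERENCE_ONTOLOGIES = {
    "anatomy_ontology": {"leaf", "root", "stem"},
    "trait_ontology": {"leaf", "leaf length"},
}
```

Here `find_overlaps(REFERENCE_ONTOLOGIES)` reports `leaf` as owned by both ontologies. Under orthogonality, one of them would own the term and the other would reuse it.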
PRINCIPLES • COMMON FORMAL ARCHITECTURE: The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the Basic Formal Ontology (BFO) • http://www.ifomis.uni-saarland.de/bfo/ ‘formal’= domain neutral
Basic Formal Ontology [diagram: Continuant → Independent Continuant (cell component), Dependent Continuant (molecular function); Occurrent → biological process]
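The placement of GO's three ontologies under BFO can be sketched as a parent map. This assumes the slide's diagram places cellular component under independent continuant, molecular function under dependent continuant, and biological process under occurrent; the `PARENT` dictionary and the `ancestors` helper are illustrative names, not part of BFO or GO.

```python
from typing import List

# is_a parent of each category, as read from the diagram (an assumption).
PARENT = {
    "independent continuant": "continuant",
    "dependent continuant": "continuant",
    "cellular component": "independent continuant",
    "molecular function": "dependent continuant",
    "biological process": "occurrent",
}


def ancestors(term: str) -> List[str]:
    """Walk the is_a chain upward and return the ancestors, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain
```

Any term that resolves to a shared top-level category in this way can be related to terms from other ontologies built on the same architecture.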
OBO Foundry provides guidelines (traffic laws) to new groups of ontology developers in ways which can counteract current dispersion of effort
New principle: Employ the methodology of cross-products. Compound terms in ontologies are to be defined as cross-products of simpler terms, e.g., elevated blood glucose is a cross-product of PATO: increased concentration with FMA: blood and ChEBI: glucose. = factoring out of ontologies into discipline-specific modules (orthogonality)
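The elevated blood glucose example can be written down as a small composition of terms. This is a minimal sketch: `TermRef` and `CrossProduct` are hypothetical names, and term identifiers are deliberately left out rather than guessed, so each component is referenced by ontology prefix and label only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TermRef:
    ontology: str  # prefix of the source ontology, e.g. "PATO"
    label: str     # term label; real term IDs are intentionally omitted


@dataclass(frozen=True)
class CrossProduct:
    label: str
    components: Tuple[TermRef, ...]

    def source_ontologies(self) -> List[str]:
        """Sorted prefixes of the ontologies this compound term draws on."""
        return sorted({c.ontology for c in self.components})

    def depends_on(self, ontology: str) -> bool:
        """True if a revision of `ontology` could affect this compound term."""
        return any(c.ontology == ontology for c in self.components)


elevated_blood_glucose = CrossProduct(
    label="elevated blood glucose",
    components=(
        TermRef("PATO", "increased concentration"),
        TermRef("FMA", "blood"),
        TermRef("ChEBI", "glucose"),
    ),
)
```

Because the compound term only points at terms owned elsewhere, a revision to any one source ontology shows up through `depends_on`, which is one way to read the requirement that the ontologies be maintained in tandem.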
The methodology of cross-products • enforcing use of common relations in linking terms drawn from Foundry ontologies serves to ensure that the ontologies are maintained and revised in tandem • logically defined relations serve to bind terms in different ontologies together to create a network
environments Environment Ontology
Extension Strategy + Modular Organization [diagram: top level: Basic Formal Ontology (BFO); mid-level; domain level]