Standards and Ontology The OBO Foundry Barry Smith Center of Excellence in Bioinformatics & Life Sciences, University at Buffalo IFOMIS, Saarland University http://ontology.buffalo.edu/smith
we are accumulating huge amounts of sequence data, image data, pharma data, ... • how do we know what data we have ? • how do I know what data you have ? • how do we know what data we don’t have ? • how do we make different sorts of data combinable, as we need to do in large domains such as neurodevelopment, immunology, cancer ...?
genomic medicine, molecular medicine, translational medicine, personalized medicine ... need • methods for data integration to enable reasoning across data at multiple granularities to identify biomedically relevant relations on the side of the entities themselves
where in the body ? what kind of disease process ? we need semantic annotation of data = we need ontologies
how to create broad-coverage semantic annotation systems for biomedicine? • Semantic Web, Moby, wikis, etc. • let a million flowers (and weeds) bloom • to create integration, rely on (automatically generated?) post hoc mappings
most successful, thus far: UMLS • built by trained experts • massively useful for information retrieval and information integration • UMLS Metathesaurus: a system of post hoc mappings between separately built source vocabularies
UMLS-based mappings fall short of creating interoperability • because local usage is respected • regimentation frowned upon, no concern for cross-framework consistency • UMLS terminologies have different grades of formal rigor, different degrees of completeness, different update policies
with UMLS-based annotations • we can know what data we have (via term searches), but it is noisy • we can map between data at single granularities (via ‘synonyms’), but synonymy information is noisy • how do we know what data we don’t have ? • how do we reason with data (as at the molecular level), when there is no common logical backbone ?
for science what is to be done? • to develop high quality annotation resources in a collaborative, community effort? • create an evolutionary path towards improvement of terminologies, of the sort we find elsewhere in science • find ways to reward early adopters of the results
for science for science, consistency is a sine qua non • science works out from a consensus core, and strives to isolate and resolve inconsistencies as it extends at the fringes • we need to create a consensus core • start with what for human beings are trivialities (low hanging fruit) and work out from there
[Figure: fragment of the FMA (Foundational Model of Anatomy) hierarchy with is_a and part_of links. Terms shown: Anatomical Structure, Anatomical Space, Organ, Organ Part, Organ Subdivision, Organ Component, Organ Cavity, Organ Cavity Subdivision, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Tissue, Pleural Sac, Pleura (Wall of Sac), Pleural Cavity, Parietal Pleura, Visceral Pleura, Interlobar recess, Mediastinal Pleura, Mesothelium of Pleura]
for science clinical medicine relies on anatomy and molecular biology to provide integration across medical specialisms • include ontologies corresponding to the basic biomedical sciences in the core
for science but we need more • where do we find scientifically validated information linking gene products and other entities represented in biochemical databases to semantically meaningful terms pertaining to disease, anatomy, development, histology in different model organisms?
The methodology of annotations • science basis of the GO: trained experts curating peer-reviewed literature • different model organism databases employ scientific curators who use the experimental observations reported in the biomedical literature to associate GO terms with gene products in a coordinated way
A set of standardized textual descriptions of • cellular locations • molecular functions • biological processes • used to annotate the entities represented in the major biochemical databases • thereby creating integration across these databases and making them available to semantic search
what cellular component? what molecular function? what biological process?
This process • leads to improvements and extensions of the ontology • which in turn leads to better annotations • a virtuous cycle of improvement in the quality and reach of both future annotations and the ontology itself • RESULT: a slowly growing computer-interpretable map of biological reality within which major databases are automatically integrated in semantically searchable form
Five bangs for your GO buck • science base • cross-species database integration • cross-granularity database integration • through links to the things which are of biomedical relevance • semantic searchability links people to software
but now • need to improve the quality of GO to support more rigorous logic-based reasoning across the data annotated in its terms • need to extend the GO by engaging ever broader community support for the addition of new terms and for the correction of errors
but also need to extend the methodology to other domains, including clinical domains • need for disease ontology • immunology ontology • symptom (phenotype) ontology • clinical trial ontology • ...
the problem • existing clinical vocabularies are of variable quality and low mutual consistency • need for prospective standards to ensure mutual consistency and high quality of clinical counterparts of GO • need to ensure consistency of the new clinical ontologies with the basic biomedical sciences • if we do not start now, the problem will only get worse
the solution • establish common rules governing best practices for creating ontologies and for using these in annotations • apply these rules to create a complete suite of orthogonal interoperable biomedical reference ontologies • this solution is already being implemented
First step (2003) • a shared portal for (so far) 58 ontologies • (low regimentation) • http://obo.sourceforge.net NCBO BioPortal
Second step (2004) reform efforts initiated, e.g. linking GO to other OBO ontologies to ensure orthogonality
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
GO + Cell type = New Definition • Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
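The osteoblast entry above is written as tag-value lines, so it can be read mechanically. The following is a minimal Python sketch, not an official OBO parser: it assumes each line has the form `tag: value` and that `relationship` values are a relation name followed by a target identifier. The function names `parse_stanza` and `relation_triples` are hypothetical, introduced only for illustration.

```python
from collections import defaultdict

# The osteoblast stanza from the slide, one tag-value pair per line.
STANZA = '''\
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
'''


def parse_stanza(text):
    """Parse 'tag: value' lines into a dict mapping each tag to a list of values."""
    fields = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first ': ' only, so identifiers like CL:0000062 stay intact.
        tag, _, value = line.partition(": ")
        fields[tag].append(value)
    return dict(fields)


def relation_triples(fields):
    """Turn is_a and relationship tags into (subject, relation, object) triples."""
    subject = fields["id"][0]
    triples = [(subject, "is_a", parent) for parent in fields.get("is_a", [])]
    for rel in fields.get("relationship", []):
        relation, target = rel.split(None, 1)
        triples.append((subject, relation, target))
    return triples


term = parse_stanza(STANZA)
triples = relation_triples(term)
for triple in triples:
    print(triple)
```

Once the links to other terms are explicit triples, a definition that combines a GO process with a Cell Ontology term can point at the cell type by identifier rather than by free text.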
Third step (2006) The OBO Foundry http://obofoundry.org/
The OBO Foundry • a family of interoperable gold standard biomedical reference ontologies to serve the annotation of inter alia • scientific literature • model organism databases • clinical trial data
A prospective standard • designed to guarantee interoperability of ontologies from the very start (contrast to: post hoc mapping) • established March 2006 • 12 initial candidate OBO ontologies, focused primarily on basic science domains • several being constructed ab initio by influential consortia who have the authority to impose their use on large parts of the relevant communities.
undergoing rigorous reform • GO Gene Ontology • ChEBI Chemical Ontology • CL Cell Ontology • FMA Foundational Model of Anatomy • PaTO Phenotype Quality Ontology • SO Sequence Ontology • CARO Common Anatomy Reference Ontology • CTO Clinical Trial Ontology • FuGO Functional Genomics Investigation Ontology • PrO Protein Ontology • RnaO RNA Ontology • RO Relation Ontology new
Annotations plus ontologies yield an ever-growing computer-interpretable map of biological reality.
Under consideration: • Disease Ontology (DO) • Biomedical Image and Image Process Ontology (BiiO) • Upper Biomedical Ontology (OBO UBO) • Ontology of Biomedical Investigations (OBI) • Clinical Trial Ontology (CTO)
OBO Foundry = a subset of OBO ontologies, whose developers have agreed in advance to accept a common set of principles reflecting best practice in ontology development designed to ensure • tight connection to the biomedical basic sciences • compatibility • interoperability, common relations • formal robustness • support for logic-based reasoning
CRITERIA • The ontology is OPEN and available to be used by all. • The ontology is in, or can be instantiated in, a COMMON FORMAL LANGUAGE. • The developers of the ontology agree in advance to COLLABORATE with developers of other OBO Foundry ontologies where domains overlap.
CRITERIA • UPDATE: The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement. • ORTHOGONALITY: They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single controlled vocabulary.
for science orthogonality of ontologies implies additivity of annotations • if we annotate a database or body of literature with one high-quality biomedical ontology, we should be able to add annotations from a second such ontology without conflicts
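Additivity can be pictured as a merge that never has to arbitrate between two terms for the same thing. The following is a minimal Python sketch under loud assumptions: disjoint identifier prefixes stand in for orthogonal domains, the record name `record-1` and the `PROC:` identifier are hypothetical placeholders rather than real database entries or ontology terms, and `merge_annotations` is an illustrative helper, not part of any OBO tool.

```python
def id_prefixes(annotations):
    """Collect the identifier prefixes (the part before ':') used in an annotation set."""
    return {term.split(":", 1)[0] for terms in annotations.values() for term in terms}


def merge_annotations(first, second):
    """Combine two annotation sets keyed by annotated item.

    Raises ValueError if both sets draw on the same identifier prefix,
    which this sketch treats as a sign that the source ontologies overlap.
    """
    overlap = id_prefixes(first) & id_prefixes(second)
    if overlap:
        raise ValueError(f"ontologies overlap on prefixes: {sorted(overlap)}")
    merged = {}
    for source in (first, second):
        for item, terms in source.items():
            merged.setdefault(item, set()).update(terms)
    return merged


# Hypothetical annotations of the same record from two orthogonal ontologies.
cell_annotations = {"record-1": {"CL:0000062"}}
process_annotations = {"record-1": {"PROC:0000001"}}  # placeholder identifier

merged = merge_annotations(cell_annotations, process_annotations)
print(merged)
```

With orthogonal ontologies the second set simply adds to the first. If two vocabularies covered the same domain, the merge would instead need a post hoc mapping to decide which terms coincide.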
CRITERIA • IDENTIFIERS: The ontology possesses a unique identifier space within OBO. • VERSIONING: The ontology provider has procedures for identifying distinct successive versions to ensure BACKWARDS COMPATIBILITY with annotation resources already in common use. • The ontology includes TEXTUAL DEFINITIONS and where possible equivalent formal definitions of its terms.
CRITERIA • CLEARLY BOUNDED: The ontology has a clearly specified and clearly delineated content. • DOCUMENTATION: The ontology is well-documented. • USERS: The ontology has a plurality of independent users.
CRITERIA • COMMON ARCHITECTURE: The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.* • * Smith et al., Genome Biology 2005, 6:R46
OBO Relation Ontology
IT WILL GET HARDER • Further criteria will be added over time in light of lessons learned in order to bring about a gradual improvement in the quality of Foundry ontologies • ALL FOUNDRY ONTOLOGIES WILL BE SUBJECT TO CONSTANT UPDATE IN LIGHT OF SCIENTIFIC ADVANCE
IT WILL GET HARDER • But not everyone needs to join • The Foundry is not seeking to serve as a check on flexibility or creativity • ALL FOUNDRY ONTOLOGIES WILL ENCOURAGE COMMUNITY CRITICISM, CORRECTION AND EXTENSION WITH NEW TERMS
GOALS • to introduce some of the features of SCIENTIFIC PEER REVIEW into biomedical ontology development • CREDIT for high quality ontology development work • KUDOS for early adopters of high quality ontologies / terminologies e.g. in reporting clinical trial results
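One practical payoff of unambiguously defined relations is that software can chain them. The following is a minimal Python sketch, assuming is_a is treated as transitive (as in the Relation Ontology paper cited above). The three is_a links are an assumption made for illustration, using term names from the FMA slide, since the flattened figure does not preserve which terms are linked; `ancestors` is a hypothetical helper.

```python
# Assumed is_a links (child -> parent), for illustration only.
IS_A = {
    "Pleural Sac": "Serous Sac",
    "Serous Sac": "Organ",
    "Organ": "Anatomical Structure",
}


def ancestors(term, parent_of):
    """Follow is_a links upward and return every ancestor, nearest first."""
    found = []
    seen = {term}
    while term in parent_of:
        term = parent_of[term]
        if term in seen:
            # A cycle would make is_a ill-defined, so refuse to continue.
            raise ValueError(f"cycle detected at {term!r}")
        seen.add(term)
        found.append(term)
    return found


print(ancestors("Pleural Sac", IS_A))
```

Because the relation is transitive, data annotated with the most specific term can also be retrieved by a query for any of its ancestors, without a curator having to assert each link separately.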
GOALS • to provide a FRAMEWORK OF RULES to counteract the current policy of ad hoc creation of new annotation schemas by each clinical research group • REUSABILITY: if data-schemas are formulated using a single well-integrated framework ontology system in widespread use, then this data will to this degree itself become more widely accessible and usable
GOALS • to serve as BENCHMARK FOR IMPROVEMENTS in discipline-focused terminology resources • once a system of interoperable reference ontologies is there, it will make sense to calibrate existing terminologies in its terms in order to achieve more robust alignment and greater domain coverage • exploit the avenue of EVIDENCE-BASED MEDICINE (NIH CLINICAL RESEARCH NETWORKS) to foster their use by clinicians