
The OBO Foundry



Presentation Transcript


  1. The OBO Foundry Barry Smith

  2. History of Ontology as Computational Artifact • 1970s: AI (based on first-order logic, FOL: McCarthy, Hayes) • 1980s: KR (knowledge representation), Knowledge Interchange Formats (Gruber, Hobbs ...) • 1999: GO, OBO format (Ashburner, ...) • 2000s: Semantic Web (based on OWL; Horrocks, Hendler; 1000 lite ontologies) • 2009: Reconciliation of OBO with OWL; but still two methodologies: OBO Foundry vs. NCBO BioPortal

  3. Ontology and the Semantic Web • HTML demonstrated the power of the Web to allow sharing of information • can we use semantic technology to create a Web 2.0 that would allow algorithmic reasoning with online information, based on XML, RDF and above all OWL (Web Ontology Language)? • can we use RDF and OWL to break down silos and create useful integration of online data and information?

  4. People tried, but the more they succeeded, the more they failed • OWL breaks down data silos via controlled vocabularies for the description of data dictionaries • Unfortunately, the very success of this approach led to the creation of multiple new semantic silos, because multiple ontologies are being created in ad hoc ways

  5. Reasons for this effect • Semantic Web (original) idea: if a million ‘lite ontologies’ bloom, then somehow intelligence will be created • let’s all build new ones (shrink-wrapped software mentality: you will not get paid for reusing existing ontologies) • requirements-driven software development promotes forking and reduces the potential for secondary uses

  6. Ontology success stories, and some reasons for failure (figure: a fragment of the “Linked Open Data” in the biomedical domain)

  7. What you get with ‘mappings’ HPO: all phenotypes (excess hair loss, duck feet ...)

  8. What you get with ‘mappings’ HPO: all phenotypes (excess hair loss, duck feet ...) NCIT: all organisms

  9. What you get with ‘mappings’ all phenotypes (excess hair loss, duck feet) all organisms allose (a form of sugar)

  10. What you get with ‘mappings’ all phenotypes (excess hair loss, duck feet) all organisms allose (a form of sugar) Acute Lymphoblastic Leukemia (A.L.L.)
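Slides 7 to 10 appear to show what a mapping built on surface strings can produce: the abbreviation ALL lands on “all phenotypes”, “all organisms”, “allose” and Acute Lymphoblastic Leukemia. The Python sketch below reproduces that kind of collision with a deliberately crude, hypothetical matcher (lowercase, strip punctuation, prefix-match tokens). The `LABELS` table and the “source C/D” names are invented for illustration; real mapping tools use richer heuristics, but any approach that matches strings rather than meanings is exposed to the same kind of error.

```python
import re

# Hypothetical label table; the phrases echo the slides, "source C/D" are invented names.
LABELS = {
    "HPO": ["all phenotypes"],
    "NCIT": ["all organisms"],
    "source C": ["allose"],
    "source D": ["Acute Lymphoblastic Leukemia (A.L.L.)"],
}


def tokens(text: str) -> list[str]:
    """Lowercase, drop punctuation (so 'A.L.L.' becomes 'all'), split on whitespace."""
    return re.sub(r"[^\w\s]", "", text.lower()).split()


def naive_matches(query: str, labels: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return every (source, label) having a token that starts with the normalized query.

    This prefix rule is a hypothetical, intentionally naive heuristic."""
    q = "".join(tokens(query))
    hits = []
    for source, names in labels.items():
        for name in names:
            if any(tok.startswith(q) for tok in tokens(name)):
                hits.append((source, name))
    return hits


if __name__ == "__main__":
    for source, label in naive_matches("ALL", LABELS):
        print(f"{source:10} {label}")
```

A single query returns four unrelated concepts, and each such false match is something a mapping authority would then have to detect and police by hand.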

  11. Mappings are hard • They are fragile and expensive to maintain • They need new authorities to maintain them (one for each pair of mapped ontologies), yielding a new risk of forking: who will police the mappings? • The goal should be to minimize the need for mappings by avoiding redundancy in the first place • Invest resources in disjoint ontology modules that work well together, reducing the need for mappings to the minimum possible
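The cost argument on slide 11 can be made concrete with a little arithmetic. If every pair of ontologies covering the same domain needs its own mapping (and its own authority to maintain it), then k overlapping ontologies need k(k-1)/2 mappings, while one ontology per domain needs none. The sketch assumes, as a simplification, that ontologies in different (disjoint) domains need no mappings at all.

```python
from math import comb


def mappings_needed(ontologies_per_domain: list[int]) -> int:
    """Pairwise mappings needed if each pair of ontologies in the same domain
    must be mapped, while ontologies in different domains need no mapping
    (a simplifying assumption)."""
    return sum(comb(k, 2) for k in ontologies_per_domain)


if __name__ == "__main__":
    # Three domains, each covered by five overlapping ontologies: 3 * 10 = 30 mappings.
    print(mappings_needed([5, 5, 5]))
    # One ontology per domain: nothing to map, nothing to police.
    print(mappings_needed([1, 1, 1]))
```

The quadratic growth within a domain is what makes “avoid redundancy in the first place” cheaper than maintaining mappings after the fact.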

  12. Why should you care? • you need to create systems for data mining and text processing that will yield useful digitally coded output • if the codes you use are constantly in need of ad hoc repair, huge resources will be wasted • serious investment in annotation will be defeated from the start • relevant data will not be found, because it will be lost in multiple semantic cemeteries

  13. How to do it right? • how to create an incremental, evolutionary process, where what is good survives and what is bad fails • where the number of ontologies needing to be linked is small • where links are stable • create a scenario in which people will find it profitable to reuse ontologies, terminologies and coding systems that have been tried and tested

  14. Reasons why GO has been successful • It is a system for prospective standardization, built with a coherent top level but with content contributed and monitored by domain specialists • Based on community consensus • Updated every night • Clear versioning principles ensure backwards compatibility; prior annotations do not lose their value • Initially low-tech to encourage users, with movement to more powerful formal approaches (including OWL-DL, though the GO community still recommends caution)
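One mechanism that lets prior annotations keep their value is obsoleting terms instead of deleting them: in the OBO format an obsolete term keeps its ID and is marked `is_obsolete: true`, optionally with a `replaced_by:` pointer to a single successor or `consider:` pointers to candidates a curator must choose between. The Python sketch below resolves annotated IDs through these tags. The `EX:` IDs and the `resolve` helper are hypothetical; this illustrates the idea and is not GO's own tooling or its full versioning policy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TermRecord:
    term_id: str
    is_obsolete: bool = False
    replaced_by: Optional[str] = None                   # OBO `replaced_by:` tag
    consider: list[str] = field(default_factory=list)   # OBO `consider:` tags


def resolve(term_id: str, terms: dict[str, TermRecord], max_hops: int = 10) -> tuple[str, list[str]]:
    """Follow replaced_by links from an annotated ID to a live term.

    Returns (id, suggestions). Suggestions are non-empty only when the chain ends
    at an obsolete term with no replacement, so a curator has to choose."""
    current = term_id
    for _ in range(max_hops):
        rec = terms[current]
        if not rec.is_obsolete:
            return current, []
        if rec.replaced_by is None:
            return current, list(rec.consider)
        current = rec.replaced_by
    raise ValueError(f"replaced_by chain from {term_id} exceeds {max_hops} hops")


# Hypothetical term records.
TERMS = {
    "EX:0000001": TermRecord("EX:0000001", is_obsolete=True, replaced_by="EX:0000002"),
    "EX:0000002": TermRecord("EX:0000002"),
    "EX:0000003": TermRecord("EX:0000003", is_obsolete=True, consider=["EX:0000002", "EX:0000004"]),
    "EX:0000004": TermRecord("EX:0000004"),
}

if __name__ == "__main__":
    for annotated in ["EX:0000001", "EX:0000003", "EX:0000004"]:
        print(annotated, "->", resolve(annotated, TERMS))
```

An annotation made years ago against `EX:0000001` is still interpretable today, because the old ID never disappears.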

  15. GO has learned the lessons of successful cooperation • Clear documentation • The terms chosen are already familiar • Fully open source (allows thorough testing in manifold combinations with other ontologies) • Subjected to considerable third-party critique • Tracker for user input with rapid turnaround and help desk

  16. GO has been amazingly successful in overcoming the data balkanization problem, but it covers only generic biological entities of three sorts: • cellular components • molecular functions • biological processes. Not covered: diseases, symptoms, disease biomarkers, protein interactions, experimental processes …
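The three sorts of entity listed above correspond to the three values of the `namespace` tag in GO's OBO-format release, whose root terms are GO:0008150 (biological_process), GO:0003674 (molecular_function) and GO:0005575 (cellular_component). The minimal reader below is only a sketch: it collects tag/value pairs from `[Term]` stanzas and groups term IDs by namespace, ignoring escapes, qualifiers, `[Typedef]` stanzas and most of the OBO grammar, so a maintained OBO or OWL library would be needed for real files.

```python
from collections import defaultdict

SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: GO:0008150
name: biological_process
namespace: biological_process

[Term]
id: GO:0003674
name: molecular_function
namespace: molecular_function

[Term]
id: GO:0005575
name: cellular_component
namespace: cellular_component
"""


def parse_obo_terms(text: str) -> list[dict[str, list[str]]]:
    """Collect tag/value pairs from [Term] stanzas; header and other stanzas are skipped."""
    terms: list[dict[str, list[str]]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = defaultdict(list) if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is not None and ":" in line:
            tag, value = line.split(":", 1)
            # Naively drop a trailing "! comment", as in "is_a: GO:0008150 ! biological_process".
            current[tag.strip()].append(value.split(" !", 1)[0].strip())
    return terms


def group_by_namespace(terms: list[dict[str, list[str]]]) -> dict[str, list[str]]:
    """Map each namespace value to the IDs of the terms declared in it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for term in terms:
        namespace = term["namespace"][0] if term.get("namespace") else "(none)"
        groups[namespace].append(term["id"][0])
    return dict(groups)


if __name__ == "__main__":
    print(group_by_namespace(parse_obo_terms(SAMPLE_OBO)))
```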

  17. OBO (Open Biomedical Ontology) Foundry proposal (Gene Ontology in yellow)

  18. Environment Ontology (environments are here)

  19. Population-level ontologies

  20. Ontology success stories, and some reasons for failure

  21. http://obofoundry.org

  22. The OBO Foundry: a step-by-step, evidence-based approach to expand the GO • Developers commit to working to ensure that, for each domain, there is community convergence on a single ontology • and agree in advance to collaborate with developers of ontologies in adjacent domains. http://obofoundry.org

  23. OBO Foundry Principles • Common governance (coordinating editors) • Common training • Common architecture to overcome Tim Berners-Lee-ism: • simple shared top-level ontology • shared Relation Ontology: www.obofoundry.org/ro

  24. Open Biomedical Ontologies Foundry • Seeks to create high-quality, validated terminology modules across all of the life sciences, which will be • one ontology for each domain, so no need for mappings • close to the language use of experts • evidence-based • incorporate a strategy for motivating potential developers and users • revisable as science advances

  25. Principles http://obofoundry.org/wiki/index.php/OBO_FoundryPrinciples

  26. Pistoia Alliance: open standards for data and technology interfaces in the life science research industry • consortium of major pharmaceutical and life science companies • can we address the data silo problems created by the multiplicity of proprietary terminologies by declaring terminology ‘pre-competitive’ • and require shared use of something like OBO Foundry ontologies in the presentation of information?

  27. Virtual Physiological Human

  28. Only with a prospective standard like that of the OBO Foundry could something like the VPH work • designed to guarantee interoperability of ontologies from the very start (and to keep out weeds) • initial set of 10 criteria, tested in the annotation of scientific literature, model organism databases, and life science experimental results

  29. OBO Foundry coverage (figure axes: relation to time; granularity)

  30. ORTHOGONALITY • modularity ensures • annotations can be additive • division of labor amongst domain experts • high value of training in any given module • lessons learned in one module can benefit work on other modules • incentivization of those responsible for individual modules

  31. Benefits of coordination • Can more easily reuse what is made by others • Can more easily inspect and criticize what is made by others • Leads to innovations (e.g. the MIREOT strategy for importing terms into ontologies)
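MIREOT (Minimum Information to Reference an External Ontology Term) rests on the observation that importing a single term does not require importing the whole source ontology: the minimum to record is the external term's IRI, the IRI of its source ontology, and the class in the importing ontology under which the term is placed. The Python sketch below packages those three items and emits two RDF-style triples. Using `rdfs:isDefinedBy` for provenance and the `http://example.org/...` superclass IRI are assumptions made for illustration; MIREOT tooling may record the source with other annotation properties.

```python
from dataclasses import dataclass

RDFS = "http://www.w3.org/2000/01/rdf-schema#"


@dataclass(frozen=True)
class MireotImport:
    """The three pieces of information MIREOT asks for when importing one external term."""
    source_ontology_iri: str    # ontology the term comes from
    term_iri: str               # the external term itself
    target_superclass_iri: str  # where the term is placed in the importing ontology

    def triples(self) -> list[tuple[str, str, str]]:
        # Placement in the target hierarchy, plus a provenance link back to the source
        # (rdfs:isDefinedBy is our choice of property here, not mandated by MIREOT).
        return [
            (self.term_iri, RDFS + "subClassOf", self.target_superclass_iri),
            (self.term_iri, RDFS + "isDefinedBy", self.source_ontology_iri),
        ]


if __name__ == "__main__":
    imp = MireotImport(
        source_ontology_iri="http://purl.obolibrary.org/obo/go.owl",
        term_iri="http://purl.obolibrary.org/obo/GO_0008150",
        target_superclass_iri="http://example.org/my-ontology/Process",  # hypothetical
    )
    for s, p, o in imp.triples():
        print(s, p, o)
```

The importing ontology stays small and reuses the external IRI unchanged, which is what keeps it interoperable with every other ontology that references the same term.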

  32. 8 Foundry members (2010) • CHEBI: Chemical Entities of Biological Interest • GO: Gene Ontology • PATO: Phenotypic Quality Ontology • PRO: Protein Ontology • XAO: Xenopus Anatomy Ontology • ZFA: Zebrafish Anatomy Ontology

  33. Current Foundry members in yellow

  34. Prospective Foundry ontologies (in green): • Foundational Model of Anatomy Ontology (FMA) • Cell Ontology (CL) • Sequence Ontology (SO) • RNA Ontology (RnaO)

  35. OBO Foundry Modular Organization (figure: Basic Formal Ontology (BFO) at the top level, above mid-level and domain-level ontologies)
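The layered organization in this figure suggests a simple structural check: following `is_a` links upward from any domain-level term should end at the shared top-level root rather than at some private local root. The sketch below uses plain labels; the upper classes `entity`, `occurrent` and `process` follow BFO's terminology, but the specific links, the `orphan term` and the `local root` are illustrative assumptions, not a copy of any released ontology.

```python
# Hypothetical single-parent is_a table: child label -> parent label.
IS_A = {
    "occurrent": "entity",                  # top level (BFO-style)
    "process": "occurrent",                 # top level (BFO-style)
    "biological_process": "process",        # domain level (GO root); alignment assumed
    "cell division": "biological_process",  # domain-level term, illustrative
    "orphan term": "local root",            # hypothetical term hanging off a private root
}


def ancestors(term: str, is_a: dict[str, str]) -> list[str]:
    """Walk is_a links upward from term; return the chain including term itself."""
    chain = [term]
    while chain[-1] in is_a:
        parent = is_a[chain[-1]]
        if parent in chain:
            raise ValueError(f"is_a cycle at {parent!r}")
        chain.append(parent)
    return chain


def anchored_in(term: str, is_a: dict[str, str], top: str = "entity") -> bool:
    """True if the term's is_a chain ends at the shared top-level root."""
    return ancestors(term, is_a)[-1] == top


if __name__ == "__main__":
    for t in ["cell division", "orphan term"]:
        print(t, anchored_in(t, IS_A), " > ".join(ancestors(t, IS_A)))
```

A term that fails this check is a candidate for exactly the kind of isolated, hard-to-integrate content the shared top level is meant to prevent.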

  36. Problem cases • Common Anatomy Reference Ontology • Disease Ontology • Function Ontologies (Cellular Component Function, Cellular Function, Organ Function, Artifact Function: pumping, transporting ...) • Environment Ontology • Species Ontology (NCBI Taxonomy)

  37. IDO (Infectious Disease Ontology) Core • Follows the GO strategy of providing a canonical ontology of what is involved in every infectious disease (host, pathogen, vector, virulence, vaccine, transmission), accompanied by IDO Extensions for specific diseases, pathogens and vectors • Provides common terminology resources and tested common guidelines for a vast array of different disease communities

  38. IDO (Infectious Disease Ontology) Consortium • MITRE, Mount Sinai, UTSouthwestern – Influenza • IMBB/VectorBase – Vector borne diseases (A. gambiae, A. aegypti, I. scapularis, C. pipiens, P. humanus) • Colorado State University – Dengue Fever • Duke University – Tuberculosis, Staph. aureus • Cleveland Clinic – Infective Endocarditis • University of Michigan – Brucellosis • Duke University, University at Buffalo – HIV

  39. Ontology for General Medical Science • http://code.google.com/p/ogms/ • (OBO) http://purl.obolibrary.org/obo/ogms.obo • (OWL) http://purl.obolibrary.org/obo/ogms.owl
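The ontology PURLs above follow the OBO Library pattern `http://purl.obolibrary.org/obo/<idspace>.owl` (or `.obo`). Individual terms are conventionally given IRIs of the form `http://purl.obolibrary.org/obo/<PREFIX>_<LOCALID>`, so GO:0008150 becomes `http://purl.obolibrary.org/obo/GO_0008150`. The helper functions below are a hypothetical sketch of these conversions; they do not validate prefixes against the OBO registry.

```python
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def ontology_purl(idspace: str, fmt: str = "owl") -> str:
    """PURL for a whole ontology file; ontology_purl('OGMS', 'obo') gives the ogms.obo PURL."""
    if fmt not in ("owl", "obo"):
        raise ValueError(f"unsupported format: {fmt}")
    return f"{OBO_PURL_BASE}{idspace.lower()}.{fmt}"


def curie_to_purl(curie: str) -> str:
    """Turn a prefixed ID such as 'GO:0008150' into its term PURL."""
    prefix, local_id = curie.split(":", 1)
    return f"{OBO_PURL_BASE}{prefix}_{local_id}"


def purl_to_curie(purl: str) -> str:
    """Inverse of curie_to_purl for term PURLs under the OBO base."""
    if not purl.startswith(OBO_PURL_BASE):
        raise ValueError(f"not an OBO PURL: {purl}")
    prefix, local_id = purl[len(OBO_PURL_BASE):].split("_", 1)
    return f"{prefix}:{local_id}"


if __name__ == "__main__":
    print(ontology_purl("OGMS", "obo"))
    print(curie_to_purl("GO:0008150"))
```

Because every Foundry ontology shares one base and one ID pattern, references between modules stay stable without any pairwise mapping tables.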

  40. OGMS-based initiatives • Vital Signs Ontology (VSO) (Welch Allyn) • EHR / Demographics Ontology • Infectious Disease Ontology • Mental Health Ontology • Emotion Ontology

  41. Ontology for General Medical Science • Jobst Landgrebe (then Co-Chair of the HL7 Vocabulary Group): “the best ontology effort in the whole biomedical domain by far”

  42. How to keep clear about the distinction between • processes of observation • the results of such processes (measurement data) • the entities observed
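One way to keep these three things apart in a data model is to give each its own type and link them explicitly: a measurement datum points to the observation process that produced it, and that process points to the entity observed. The Python sketch below is a hypothetical illustration (the class names, fields and body-temperature example are invented, not taken from OGMS or the Vital Signs Ontology); it shows two readings that differ while the entity observed stays one and the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedEntity:
    """The thing observed, e.g. a particular patient's body temperature."""
    entity_id: str
    description: str


@dataclass(frozen=True)
class ObservationProcess:
    """An act of observing or measuring, with the entity it observes as a participant."""
    process_id: str
    observed: ObservedEntity
    method: str


@dataclass(frozen=True)
class MeasurementDatum:
    """The result of an observation process: a record, not the entity itself."""
    datum_id: str
    produced_by: ObservationProcess
    value: float
    unit: str


# Hypothetical example data.
temp = ObservedEntity("e1", "body temperature of patient 42")
oral = ObservationProcess("p1", temp, "oral thermometer")
ear = ObservationProcess("p2", temp, "tympanic thermometer")
d1 = MeasurementDatum("d1", oral, 37.0, "degC")
d2 = MeasurementDatum("d2", ear, 37.4, "degC")

if __name__ == "__main__":
    # Different data, different processes, one and the same entity observed.
    print(d1.value != d2.value, d1.produced_by != d2.produced_by,
          d1.produced_by.observed == d2.produced_by.observed)
```

Keeping the datum separate from the entity lets a system record disagreeing measurements without implying that the patient's temperature itself has two values.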
