Basic Building Blocks for Biomedical Ontologies Barry Smith
Problems with UMLS-style approaches • let a million ontologies bloom, each one close to the terminological habits of its authors • in concordance with the “not invented here” syndrome • then map these ontologies, and use these mappings to integrate your different pots of data
Mappings are hard • They create an N² problem, and are fragile and expensive to maintain • Need new authorities to maintain them (one for each pair of mapped ontologies), yielding new risk of forking – who will police the mappings? • The goal should be to minimize the need for mappings, by avoiding redundancy in the first place – one ontology for each domain • Invest resources in disjoint ontology modules which work well together – reduce need for mappings to minimum possible
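To make the N² growth concrete, here is a minimal sketch that counts the mappings needed when N independently built ontologies must each be mapped to every other one. The function names are illustrative and not from the talk.

```python
from math import comb


def pairwise_mappings(n_ontologies: int) -> int:
    """Mappings to maintain if each unordered pair of ontologies
    needs one mapping: n * (n - 1) / 2."""
    return comb(n_ontologies, 2)


def directed_mappings(n_ontologies: int) -> int:
    """Mappings to maintain if each direction (A to B, B to A)
    is curated separately: n * (n - 1)."""
    return n_ontologies * (n_ontologies - 1)


for n in (2, 10, 100):
    print(n, pairwise_mappings(n), directed_mappings(n))
```

With one ontology per domain and no overlap between modules, the corresponding count is zero, which is the point of the slide.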
Why should you care? • you need to create systems for data mining and text processing which will yield useful digitally coded output • if the codes you use are constantly in need of ad hoc repair, huge resources will be wasted • serious investment in annotation will be defeated from the start • relevant data will not be found, because it will be lost in multiple semantic cemeteries
How to do it right? • how to create an incremental, evolutionary process, where what is good survives, and what is bad fails • where the number of ontologies needing to be used together is small – integration = addition • where these ontologies are stable • by creating a scenario in which people will find it profitable to reuse ontologies, terminologies and coding systems which have been tried and tested
Reasons why GO has been successful • It is a system for prospective standardization built with a coherent top level but with content contributed and monitored by domain specialists • Based on community consensus • Updated every night • Clear versioning principles ensure backwards compatibility; prior annotations do not lose their value • Initially low-tech to encourage users, with movement to more powerful formal approaches (including OWL-DL – though still proceeding with caution)
GO has learned the lessons of successful cooperation • Clear documentation • The terms chosen are already familiar • Fully open source (allows thorough testing in manifold combinations with other ontologies) • Subjected to considerable third-party critique • Tracker for user input and help desk with rapid turnaround
GO has been amazingly successful in overcoming the data balkanization problem, but it covers only generic biological entities of three sorts: • cellular components • molecular functions • biological processes • no diseases, symptoms, disease biomarkers, protein interactions, experimental processes …
OBO (Open Biomedical Ontology) Foundry proposal (Gene Ontology in yellow)
Environment Ontology (ENVO)
The OBO Foundry: a step-by-step, evidence-based approach to expanding the GO • Developers commit to working to ensure that, for each domain, there is community convergence on a single ontology • and agree in advance to collaborate with developers of ontologies in adjacent domains. http://obofoundry.org
OBO Foundry Principles • Common governance (coordinating editors) • Common training • Common architecture: • simple shared top level ontology (BFO) • shared Relation Ontology: www.obofoundry.org/ro
Open Biomedical Ontologies Foundry Seeks to create high quality, validated terminology modules across all of the life sciences which will be • one ontology for each domain, so no need for mappings • close to language use of experts • evidence-based • incorporate a strategy for motivating potential developers and users • revisable as science advances
Principles http://obofoundry.org/wiki/index.php/OBO_FoundryPrinciples
OBO Foundry coverage (figure: ontologies arranged by relation to time and by granularity)
ORTHOGONALITY • modularity ensures • annotations can be additive • division of labor amongst domain experts • high value of training in any given module • lessons learned in one module can benefit work on other modules • incentivization of those responsible for individual modules
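The claim that annotations become additive under modularity can be sketched as follows: if modules do not overlap, integrating them is a plain union with no mapping step. This is a minimal illustration that uses repeated term labels as a crude proxy for overlapping domains. The term IDs, labels, and the `merge_modules` helper are all hypothetical.

```python
def merge_modules(*modules: dict[str, str]) -> dict[str, str]:
    """Combine ontology modules (term ID -> label) by plain union.

    Raises ValueError if a term ID or a label appears in more than one
    module, since that signals redundancy rather than orthogonality.
    """
    merged: dict[str, str] = {}
    seen_labels: set[str] = set()
    for module in modules:
        for term_id, label in module.items():
            if term_id in merged or label in seen_labels:
                raise ValueError(f"redundant term: {term_id} ({label})")
            merged[term_id] = label
            seen_labels.add(label)
    return merged


# Hypothetical modules, for illustration only.
anatomy = {"ANAT:1": "heart", "ANAT:2": "lung"}
processes = {"PROC:1": "cell division"}

combined = merge_modules(anatomy, processes)
print(len(combined))
```

When two modules do define the same term, the helper fails loudly instead of silently picking one, which is where a mapping effort would otherwise begin.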
Benefits of coordination • Can more easily reuse what is made by others • Can more easily inspect and criticize what is made by others • Leads to innovations (e.g. MIREOT strategy for importing terms into ontologies)
Foundry ontologies currently under review • Plant Ontology (PO) • Ontology for Biomedical Investigations (OBI) • Ontology for General Medical Science (OGMS) • Infectious Disease Ontology (IDO)
Basic Formal Ontology (BFO) top level mid-level domain level OBO Foundry Modular Organization
OBI • The Ontology for Biomedical Investigations • http://purl.org/obo/OBI_0000225
Purpose of OBI • To provide a resource for the unambiguous description of the components of biomedical investigations such as the design, protocols and instrumentation, material, data and types of analysis and statistical tools applied to the data • NOT designed to model biology
OBI Collaborating Communities • Crop sciences Generation Challenge Programme (GCP) • Environmental genomics MGED RSBI Group, www.mged.org/Workgroups/rsbi • Genomic Standards Consortium (GSC), www.genomics.ceh.ac.uk/genomecatalogue • HUPO Proteomics Standards Initiative (PSI), psidev.sourceforge.net • Immunology Database and Analysis Portal, www.immport.org • Immune Epitope Database and Analysis Resource (IEDB), http://www.immuneepitope.org/home.do • International Society for Analytical Cytology, http://www.isac-net.org/ • Metabolomics Standards Initiative (MSI) • Neurogenetics, Biomedical Informatics Research Network (BIRN) • Nutrigenomics MGED RSBI Group, www.mged.org/Workgroups/rsbi • Polymorphism • Toxicogenomics MGED RSBI Group, www.mged.org/Workgroups/rsbi • Transcriptomics MGED Ontology Group
Ontology for General Medical Science • http://code.google.com/p/ogms/ • (OBO) http://purl.obolibrary.org/obo/ogms.obo • (OWL) http://purl.obolibrary.org/obo/ogms.owl
OGMS-based initiatives • Vital Signs Ontology (VSO) (Welch Allyn) • EHR / Demographics Ontology • Infectious Disease Ontology • Mental Health Ontology • Emotion Ontology
Ontology for General Medical Science • Jobst Landgrebe (then Co-Chair of the HL7 Vocabulary Group): • “the best ontology effort in the whole biomedical domain by far”
How to keep clear about the distinction between • processes of observation • results of such processes (measurement data) • the entities observed
How is the OBO Foundry organized? • Top-Level: Basic Formal Ontology (BFO) • Mid-Level: IAO, OBI, OGMS ... • Domain-Level: Foundry Bio-Ontologies
BFO: the very top • Continuant • Independent Continuant • Dependent Continuant • Occurrent (Process, Event)
OBO Foundry coverage (figure axes: relation to time, granularity), obofoundry.org
BFO & GO • continuant • independent continuant: cellular component • dependent continuant: molecular function • occurrent: biological processes
Basic Formal Ontology • types: Continuant (Independent Continuant: thing; Dependent Continuant: quality), Occurrent (process, event) • instances
Experience with BFO in building ontologies provides • a community of skilled ontology developers and users (user group has 120 members) • associated logical tools • documentation for different types of users • a methodology for building conformant ontologies by starting with BFO and populating downwards
How to build an ontology • import BFO into an ontology editor such as Protégé • work with domain experts to create an initial mid-level classification • find the ~50 most commonly used terms corresponding to types in reality • arrange these terms into an informal is_a hierarchy according to this universality principle: • A is_a B means that every instance of A is an instance of B • fill in missing terms to give a complete hierarchy • (leave it to domain experts to populate the lower levels of the hierarchy)
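The is_a step above can be sketched as a small data structure. This is a minimal illustration, assuming a single-parent hierarchy stored as a plain dictionary. The type names are taken from the BFO slides in this deck, while the `IS_A` table and the helper functions are hypothetical.

```python
# Direct is_a links: child type -> parent type.
# Single inheritance is an assumption made to keep the sketch short.
IS_A = {
    "independent continuant": "continuant",
    "dependent continuant": "continuant",
    "organism": "independent continuant",
    "quality": "dependent continuant",
    "temperature": "quality",
    "process": "occurrent",
}


def ancestors(term: str) -> list[str]:
    """Follow is_a links upward from term, nearest parent first."""
    chain: list[str] = []
    seen = {term}
    while term in IS_A:
        term = IS_A[term]
        if term in seen:
            raise ValueError("cycle in is_a hierarchy")
        seen.add(term)
        chain.append(term)
    return chain


def is_a(a: str, b: str) -> bool:
    """True if the hierarchy asserts that every instance of a is an
    instance of b (treated as reflexive and transitive)."""
    return a == b or b in ancestors(a)


print(ancestors("temperature"))
print(is_a("temperature", "continuant"), is_a("temperature", "occurrent"))
```

Because is_a is read as "every instance of A is an instance of B", it must be transitive, which is why the check walks the whole chain rather than looking only at the direct parent.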
Users of BFO • PharmaOntology (W3C HCLS SIG) • MediCognos / Microsoft Healthvault • Cleveland Clinic Semantic Database in Cardiothoracic Surgery • Major Histocompatibility Complex (MHC) Ontology (NIAID) • Neuroscience Information Framework Standard (NIFSTD) and Constituent Ontologies • Interdisciplinary Prostate Ontology (IPO) • Nanoparticle Ontology (NPO): Ontology for Cancer Nanotechnology Research • Neural Electromagnetic Ontologies (NEMO) • ChemAxiom – Ontology for Chemistry
Users of BFO • GO Gene Ontology • CL Cell Ontology • SO Sequence Ontology • ChEBI Chemical Ontology • PATO Phenotype (Quality) Ontology • FMA Foundational Model of Anatomy Ontology • ChEBI Chemical Entities of Biological Interest • PRO Protein Ontology • Plant Ontology • Environment Ontology • Ontology for Biomedical Investigations • RNA Ontology
Users of BFO • Ontology for Risks Against Patient Safety (RAPS/REMINE) • eagle-i and VIVO (NCRR) • IDO Infectious Disease Ontology (NIAID) • National Cancer Institute Biomedical Grid Terminology (BiomedGT) • US Army Biometrics Ontology • US Army Command and Control Ontology • Sleep Domain Ontology • Subcellular Anatomy Ontology (SAO) • Translational Medicine On (VO) • Yeast Ontology (yOWL) • Zebrafish Anatomical Ontology (ZAO)
Basic Formal Ontology • continuant • independent continuant (e.g. organism) • dependent continuant • occurrent
Continuants • continue to exist through time, preserving their identity while undergoing different sorts of changes • independent continuants – objects, things, ... • dependent continuants – qualities, attributes, shapes, potentialities ...
Occurrents • processes, events, happenings • your life • this process of accelerated cell division
Qualities • temperature, blood pressure, mass ... • are continuants • they exist through time while undergoing changes
Qualities • temperature / blood pressure / mass ... • are dimensions of variation within the structure of the entity • a quality is something which can change while its bearer remains one and the same
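The idea that a quality changes while its bearer remains one and the same can be sketched in code. This is a minimal illustration, not BFO's formal definition. The `Quality` and `Organism` classes, the identifier, and the values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Quality:
    """A dependent continuant: it exists only as a quality of a bearer."""
    name: str
    value: float
    unit: str


@dataclass
class Organism:
    """An independent continuant that bears qualities."""
    identifier: str
    qualities: dict[str, Quality] = field(default_factory=dict)


patient = Organism("patient-1")
patient.qualities["temperature"] = Quality("temperature", 37.0, "degC")
temperature = patient.qualities["temperature"]

# The quality takes a different value at a later time...
temperature.value = 39.2

# ...but it is still the same quality of the same bearer.
same_quality = patient.qualities["temperature"] is temperature
same_bearer = patient.identifier == "patient-1"
print(same_quality, same_bearer, temperature.value)
```

The design choice here is to mutate the quality in place rather than replace it, mirroring the slide's point that both the bearer and its temperature persist through the change.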