How to Build an Ontology. Barry Smith http://ontology.buffalo.edu/smith. Everywhere databases are being created. too often in such a way that the data is siloed leading to massive expense in integrating data in ad hoc ways
How to Build an Ontology • Barry Smith • http://ontology.buffalo.edu/smith
Everywhere databases are being created • too often in such a way that the data is siloed • leading to massive expense in integrating data in ad hoc ways • if the data could be collected on the basis of shared controlled vocabularies from the start, much of this massive expense could be avoided
Consequences of the Human Genome Project • we can match gene sequences very effectively, for example finding patterns shared between humans and mice • but we can make sense of these gene sequences only if we know • where in the cell they occur • with what molecular functions they are associated • to what biological processes they contribute
GO provides a controlled system of terms for use in annotating (describing, tagging) data • multi-species, multi-disciplinary, open source • contributing to the cumulativity of scientific results obtained by distinct research communities • compare use of kilograms, meters, seconds in formulating experimental results
Hierarchical view representing relations between represented types
[Figure: anatomical hierarchy linked by is_a and part_of relations. Types shown: Anatomical Structure, Anatomical Space, Organ, Organ Part, Organ Component, Organ Subdivision, Organ Cavity, Organ Cavity Subdivision, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Tissue, Pleural Sac, Pleura (Wall of Sac), Pleural Cavity, Parietal Pleura, Visceral Pleura, Interlobar recess, Mediastinal Pleura, Mesothelium of Pleura]
US $100 million invested in literature and data curation using GO • over 11 million annotations relating gene products described in the UniProt, Ensembl and other databases to terms in the GO • experimental results reported in 52,000 scientific journal articles manually annotated by expert biologists using GO
GO has learned the lessons of successful cooperation • Clear documentation • The terms chosen are already familiar • Fully open source (allows thorough testing in manifold combinations with other ontologies) • Subjected to constant third-party critique • Updated every night
ontologies used to annotate databases • databases shown: GlyProt, MouseEcotope, DiabetInGene, GluChem • shared annotation term: sphingolipid transporter activity
annotation using common ontologies yields integration of databases • databases shown: GlyProt, MouseEcotope, DiabetInGene, GluChem • shared annotation term: Holliday junction helicase complex
annotation using common ontologies can yield integration of image data
annotation using common ontologies can support comparison of image data
annotation with Gene Ontology • supports reusability of data • supports search of data by humans • supports reasoning with data by humans and machines • but the method works only to the degree that many, many people use the GO to annotate their data
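The integration idea in the annotation slides above can be illustrated with a minimal Python sketch. It assumes that each database attaches a shared ontology term to its records. The database names and the two terms come from the slides. The record ids and the pairing of records with terms are invented for illustration and are not real data.

```python
from collections import defaultdict

# Hypothetical annotation records: (database, record id, ontology term).
# Database names and terms come from the slides; record ids are invented.
ANNOTATIONS = [
    ("GlyProt", "gp-001", "sphingolipid transporter activity"),
    ("MouseEcotope", "me-417", "sphingolipid transporter activity"),
    ("DiabetInGene", "dg-023", "Holliday junction helicase complex"),
    ("GluChem", "gc-150", "Holliday junction helicase complex"),
    ("GluChem", "gc-151", "sphingolipid transporter activity"),
]


def index_by_term(annotations):
    """Group records from all databases under the shared term that annotates them."""
    index = defaultdict(list)
    for database, record_id, term in annotations:
        index[term].append((database, record_id))
    return dict(index)


def databases_sharing(index, term):
    """Return the sorted names of databases holding a record annotated with term."""
    return sorted({database for database, _ in index.get(term, [])})


if __name__ == "__main__":
    index = index_by_term(ANNOTATIONS)
    for term in index:
        print(term, "->", databases_sharing(index, term))
```

Because every database uses the same term string, records can be joined without a per-pair mapping table. This is the sense in which common annotation yields integration, and it also shows why the method depends on many people using the same ontology.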
GO has been amazingly successful in overcoming the data balkanization problem • but it covers only generic biological entities of three sorts: • cellular components • molecular functions • biological processes • and it does not provide representations of diseases, symptoms, …
Original OBO Foundry ontologies (Gene Ontology in yellow)
environments are here Environment Ontology
Ontology success stories, and some reasons for failure chaos
The OBO Foundry: a step-by-step, evidence-based approach to expand the GO • Developers commit to working to ensure that, for each domain, there is community convergence on a single ontology • and agree in advance to collaborate with developers of ontologies in adjacent domains. http://obofoundry.org
OBO Foundry Principles • Common governance (coordinating editors) • Common training • Common architecture • simple shared top level ontology • shared Relation Ontology: www.obofoundry.org/ro
[Figure: anatomical hierarchy linked by is_a and part_of relations. Types shown: Anatomical Structure, Anatomical Space, Organ, Organ Part, Organ Component, Organ Subdivision, Organ Cavity, Organ Cavity Subdivision, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Tissue, Pleural Sac, Pleura (Wall of Sac), Pleural Cavity, Parietal Pleura, Visceral Pleura, Interlobar recess, Mediastinal Pleura, Mesothelium of Pleura]
Open Biomedical Ontologies Foundry • Seeks to create high quality, validated terminology modules across all of the life sciences, which will be • one ontology for each domain, so no need for mappings • close to language use of experts • evidence-based • designed to incorporate a strategy for motivating potential developers and users • revisable as science advances
Benefits of coordination • Can profit from lessons learned through mistakes made by others • Can more easily reuse what is made by others • Can more easily inspect and criticize results of others’ work • Can more easily train people to do the necessary work
BFO Top-Level Ontology • Continuant • Independent Continuant • Dependent Continuant • Occurrent (always dependent on one or more independent continuants)
[Table: OBO Foundry coverage, organized by relation to time and by granularity]
List of BFO users http://www.ifomis.org/bfo/users
How to build an ontology • import BFO into Protégé • work with domain experts to create an initial mid-level classification • find the ~50 most commonly used terms corresponding to types in reality • arrange these terms into an informal is_a hierarchy according to the principle: A is_a B means that every instance of A is an instance of B • fill in missing terms to give a complete hierarchy • (leave it to domain experts to populate the lower levels of the hierarchy)
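The is_a principle in the steps above can be checked mechanically. The following is a minimal Python sketch, not a replacement for Protégé or a reasoner. The type names are borrowed from the cat/frog example later in the deck, but the specific is_a edges and the instance names are assumptions made for illustration.

```python
# Hypothetical mini-hierarchy: each type maps to its direct is_a parent.
# The edges are assumed for illustration; only the type names appear in the slides.
IS_A = {
    "siamese": "cat",
    "cat": "mammal",
    "mammal": "animal",
    "frog": "animal",
    "animal": "organism",
    "organism": "object",
}

# Hypothetical instances, each asserted under its most specific type.
INSTANCE_OF = {
    "tibbles": "siamese",
    "felix": "cat",
    "kermit": "frog",
}


def ancestors(term, is_a=IS_A):
    """Return the chain of types above term, nearest first; raise on a cycle."""
    chain = []
    seen = {term}
    while term in is_a:
        term = is_a[term]
        if term in seen:
            raise ValueError(f"is_a cycle through {term!r}")
        seen.add(term)
        chain.append(term)
    return chain


def subsumes(general, specific, is_a=IS_A):
    """True if specific is_a general, counting every type as subsuming itself."""
    return general == specific or general in ancestors(specific, is_a)


def instances_of(type_name, instance_of=INSTANCE_OF, is_a=IS_A):
    """All instances of type_name, including instances of its subtypes."""
    return {name for name, t in instance_of.items() if subsumes(type_name, t, is_a)}


if __name__ == "__main__":
    # A is_a B: every instance of A is an instance of B.
    for child, parent in IS_A.items():
        assert instances_of(child) <= instances_of(parent)
    print(ancestors("siamese"))
    print(sorted(instances_of("mammal")))
```

Storing a single parent per type keeps the sketch short; a real ontology may allow several parents, in which case `IS_A` would map each type to a set of parents.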
Basic distinction among entities • type vs. instance • (science text vs. diary) • (human being vs. Tom Cruise) • (science diagram vs. photograph)
Terms in ontologies denote types (‘universals’) • it is generalizations that are important = universals, types, kinds, species
An ontology is a representation of types • We learn about types in reality from looking at the results of scientific experiments in the form of scientific theories • experiments relate to what is particular • science describes what is general
[Figure: hierarchy of types (object, organism, animal, mammal, cat, siamese, frog), with instances shown beneath the types]
Inventory vs. Catalog: Two kinds of representational artifact • Databases represent instances • Ontologies represent types
How do we know which general terms designate types? • Types are repeatables: • cell, electron, weapon, F16, citizen, refugee, ... • Instances are one-off: Bill Clinton, this laptop
BFO Top-Level Ontology • Continuant • Independent Continuant • Dependent Continuant • Occurrent (always dependent on one or more independent continuants)
Two kinds of entities • occurrents (processes, events, happenings) • continuants (objects, qualities, states...)
You are a continuant • Your life is an occurrent • You are 3-dimensional • Your life is 4-dimensional
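The continuant/occurrent split can be made concrete with a small Python sketch. It is a sketch under stated assumptions and is not the BFO specification. The class names follow the top-level slide. The `participants` and `bearer` fields are hypothetical names chosen here to encode dependence on independent continuants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Continuant:
    """An entity such as an object or quality (3-dimensional, in the deck's terms)."""
    name: str


@dataclass(frozen=True)
class IndependentContinuant(Continuant):
    """For example: you, a cell, this laptop."""


@dataclass(frozen=True)
class DependentContinuant(Continuant):
    """For example a quality; `bearer` is an assumed field naming what it depends on."""
    bearer: IndependentContinuant


@dataclass(frozen=True)
class Occurrent:
    """A process or event (4-dimensional), such as your life.

    `participants` is an assumed field name. It encodes the slide's claim that
    an occurrent always depends on one or more independent continuants.
    """
    name: str
    participants: tuple

    def __post_init__(self):
        if not self.participants:
            raise ValueError("an occurrent needs at least one independent continuant")
        for p in self.participants:
            if not isinstance(p, IndependentContinuant):
                raise TypeError(f"{p!r} is not an independent continuant")


if __name__ == "__main__":
    you = IndependentContinuant("you")
    life = Occurrent("your life", (you,))
    print(life.name, "depends on", [p.name for p in life.participants])
```

Rejecting an occurrent with no participants at construction time is one way to keep the dependence rule from being silently violated; a real ontology would state it as an axiom rather than a runtime check.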