This resource provides an overview of the principles and challenges involved in constructing biomedical ontologies, with case studies from the National Center for Biomedical Ontology. It covers topics such as ontology definition, organizational challenges, and the role of ontologies in decision making.
Principles for Building Biomedical Ontologies Suzanna Lewis National Center for Biomedical Ontology 22 October 2005 Advanced Bioinformatics, Cold Spring Harbor
National Center for Biomedical Ontology http://bioontology.org/ • Mark Musen, PI & Core 1: computer science (SMI) • Suzanna Lewis, Co-PI & Core 2: bioinformatics (BiKR; GO) • Barry Smith, Core 6: Outreach and training (ECOR) • Sima Misra, Associate Program Director • Daniel Rubin, Program Director • Michael Ashburner, Core 3: Phenotype Project (Cambridge; FlyBase; and GO) • Monte Westerfield, Core 3: Phenotype Project (UOregon; PI of ZFIN) • Ida Sim, Core 3: HIV clinical trials Project (UCSF)
BiKRs • Sima Misra • Shu Shengqiang • Christopher J. Mungall • Nomi Harris • John Day-Richter • Karen Eilbeck • Mark Gibson
Outline for the Morning • A definition of “ontology” • Four sessions: • Organizational Challenges • Principles for Ontology Construction • Case Studies from the GO • Case Studies for group discussion.
My newbie questions and what I’ve heard • What data is missing? • Organism, environment, data quality and attribution • Where is the data generated? • TIGR, Sanger, JGI, and coming soon to a 954 near you! • How will it be gathered? • Still an issue. Low threshold of effort relative to benefits of complying • What is the motivation? • Data is accumulating on disks across the world and we’d like to be able to locate and use it • The hardest part: Sharing (semantics)
Ontologies help with decision making Where should I eat…? handy ontology tells us what’s there…
Type of cuisine (Presumable) country of origin Ontologies don’t just organize data; they also facilitate inference, and that creates new knowledge, often unconsciously in the user.
What a computer would likely infer about the world from this helpful ontology: Fresh Juice is a national cuisine… Flag of fresh juice Where delicatessen food hails from… ‘Frozen Yogurt’ cuisine in search of a national identity?
Ontology is all about meaning • Communities form (scientific) theories • that seek to explain all of the existing evidence • and can be used for prediction • We make inferences and decisions based upon what we know about (biological) reality.
Make our meanings clear enough for a computer to understand • An ontology is a computable representation of this underlying (biological) reality. • An ontology enables a computer to reason over the data in (some of) the ways that we do • particularly to query and locate relevant data. • A shared, common, backbone taxonomy of relevant entities, and the relationships between them, within an application domain. • Referred to by information scientists as an ‘Ontology’.
But really… • What is an Ontology? • From Aristotle to Artificial Intelligence • It is “a formalism of what exists” • Follows formal rules for creating definitions originally laid down by Aristotle. • A definition is: the specification of the essence (nature, invariant structure) shared by all the members of a class or natural kind.
The Aristotelian Methodology • Topmost nodes are the undefinable primitives. • The definition of a class lower down in the hierarchy is provided by specifying the parent of the class together with the relevant differentia. • Differentia tell us what marks out instances of the defined class within the wider parent class, as in • Plasma membrane • is a cell part [immediate parent] • that surrounds the cytoplasm [differentia]
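The parent-plus-differentia pattern above can be sketched in a few lines of Python. This is only an illustration, not the Gene Ontology's actual data model: the `OntologyClass` type and its field names are hypothetical, and treating "cell part" as a topmost primitive is an assumption made to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OntologyClass:
    """Hypothetical record for one class in an Aristotelian hierarchy."""
    name: str
    parent: Optional["OntologyClass"] = None  # immediate is_a parent (genus)
    differentia: Optional[str] = None         # what marks it out within the parent

    def definition(self) -> str:
        # Topmost nodes are primitives and carry no definition.
        if self.parent is None:
            return f"{self.name}: undefinable primitive"
        # Every other class is defined as: parent + differentia.
        return f"{self.name}: a {self.parent.name} that {self.differentia}"


# Assumption for illustration: "cell part" is treated as a topmost node here.
cell_part = OntologyClass("cell part")
plasma_membrane = OntologyClass(
    "plasma membrane", parent=cell_part, differentia="surrounds the cytoplasm"
)

print(plasma_membrane.definition())
# plasma membrane: a cell part that surrounds the cytoplasm
```

Keeping the parent and the differentia as separate fields, rather than one free-text string, is what lets a program check that every non-root class actually has both.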
[Figure: class hierarchy with the labels physical object (substance), organism, animal, mammal, cat, Siamese and frog, distinguishing classes, leaf classes and instances.] All members of the class frog share a froggy nature.
[Figure: anatomical structures (thorax, lung, heart, cell). Credit: Cornelius Rosse]
Content of FMA Challenge: Duplicate graphical model in symbolic model Universals or classes: Kinds of anatomical entities Adapted from Bloom & Fawcett: Textbook of Histology 1994 12th ed Chapman & Hall
1. Organizational Challenges http://obo.sourceforge.net
So you want an ontology… What do you have to do to make/get/use/steal/beg one?
[Flowchart: decision points Why, Survey, Domain covered?, Public?, Community?, Active?, Applied?, with yes/no branches; outcomes Salvage, Develop, Improve, Collaborate & Learn]
What you must do • Justify exactly why there is a need • Scope it very, very tightly • Communicate with people
The decisions you must make • What domain does it cover? • Is it privately held? • Is it active? • Is it applied?
Due diligence & background research • Step 1: Learn what is out there • The most comprehensive list is on the OBO site. http://obo.sourceforge.net • Assess ontologies critically and realistically. • Make contact
Ontologies must be shared • Proprietary ontologies • Belief that ownership of the terminology gives the owners a competitive edge • For example, Incyte or Monsanto in the past, SNOMED for non-US. • Data cannot be shared if the ontologies describing the data are not shared. • Don’t reinvent—Use the power of combination and collaboration
Pragmatic assessment of an ontology • Is there access to help, e.g.: help-me@weird.ontology.net ? • Does a warm body answer help mail within a ‘reasonable’ time—say 2 working days ?
Use it to improve it • Every ontology improves when it is applied to actual data • It improves even more when these data are used to answer questions • There will be fewer problems in the ontology and more commitment to fixing remaining problems when important research data is involved that scientists depend upon • Be very wary of ontologies that have never been applied
Improve Collaborate and Learn Work with that community • To improve (if you found one) • To develop (if you did not) • Getting it right • It is impossible to get it right the 1st (or 2nd, or 3rd, …) time. • What we know about reality is continually growing
Implication: “prepare for change” • Establish a mechanism for change. • Use CVS or Subversion. • Changes must be reviewed by experts • Unique Identifiers • Versions • Archives
Ontology development is hard • It takes people who: • Have a stake in seeing it work. • Have broad, detailed domain knowledge. • Will engage in vigorous debate without engaging egos. • Will do concrete work and attend frequent working sessions (quarterly), phone conferences (weekly), e-mail correspondence (daily).
Why do we need rules for good ontology? • Ontologies must be intelligible • to humans (for annotation) and • to machines (for reasoning and error-checking) • Unintuitive rules for classification lead to entry errors (problematic links) • Facilitate training of curators • Overcome obstacles to alignment with other ontology and terminology systems • Enhance harvesting of content through automatic reasoning systems • Following basic rules makes more useful ontologies
Aristotle’s categories • Substance • Quantity • Quality • Relation • Location • Time • Position • Possession • Doing • Undergoing • This is Aristotle’s list of types of predication, that is, the different ways in which things can be said to be. He identifies 10 mutually exclusive categories.
SNOMED-CT Top Level • Substance • Body Structure • Specimen • Context-Dependent Categories* • Attribute • Finding* • Staging and Scales • Organism • Physical Object • Events • Environments and Geographic Locations • Qualifier Value • Special Concept* • Pharmaceutical and Biological Products • Social Context • Disease • Procedure • Physical Force
Examples of Rules • Don’t confuse instances with universals • Your navel (instance) is not the abstract representation of all navels • Your microarray result is not the abstract representation of all microarray results • The meaning of an ontology should not change when the programming language changes
First Rule: Univocity • Terms (including those describing relations) should have the same meanings on every occasion of use. • In other words, they should refer to the same kinds of instances in reality
Example of univocity problem in case of part_of relation (Old) Gene Ontology: • ‘part_of’ = ‘may be part of’ • flagellum part_of cell • ‘part_of’ = ‘is at times part of’ • replication fork part_of the nucleoplasm • ‘part_of’ = ‘is included as a sub-list in’
Second Rule: Positivity • Complements of classes are not themselves classes. • Terms such as ‘non-mammal’, or ‘non-frog’, or ‘non-membrane’ do not designate genuine classes.
Third Rule: Objectivity • Which classes exist is not a function of our biological knowledge. • Terms such as ‘unknown’ or ‘unclassified’ do not designate biological natural kinds.
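The positivity and objectivity rules lend themselves to a simple automated check on term names. The sketch below is a name-based heuristic only, not a tool from the talk: the function name `lint_term` and the message wording are made up for illustration, and a real curation pipeline would need a richer test than string matching.

```python
import re
from typing import List

# Term names beginning with "non-" or "non " suggest the complement of a class.
NEGATIVE_PREFIX = re.compile(r"^non[- ]", re.IGNORECASE)
# Words that describe the state of our knowledge rather than a natural kind.
PLACEHOLDER_WORDS = {"unknown", "unclassified"}


def lint_term(name: str) -> List[str]:
    """Return a list of rule violations suggested by a term's name."""
    problems = []
    if NEGATIVE_PREFIX.match(name):
        problems.append("positivity: names the complement of a class")
    words = set(re.findall(r"[a-z]+", name.lower()))
    if words & PLACEHOLDER_WORDS:
        problems.append("objectivity: reflects missing knowledge, not a natural kind")
    return problems


for term in ["non-mammal", "molecular function unknown", "plasma membrane"]:
    print(term, "->", lint_term(term) or "ok")
```

A check like this only flags candidates for a curator to review; a name such as "unknown" can be perfectly legitimate inside a longer definition text.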
Fourth Rule: Single Inheritance • No class in a classificatory hierarchy should have more than one is_a parent on the immediate higher level • I.e. no diamonds [Diagram: classes A, B and C linked by is_a1 and is_a2]
Following the single inheritance rule • The position of a term within the hierarchy enriches its own definition by incorporating automatically the definitions of all the terms above it. • The entire information content of the term hierarchy can be translated very cleanly into a computer representation
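As a rough illustration of why single inheritance translates cleanly into a computer representation, the sketch below stores the hierarchy as a plain child-to-parent mapping. The helper names (`multiple_parents`, `lineage`) are hypothetical, and the example hierarchy is an assumed ordering of class labels chosen only for illustration.

```python
from collections import defaultdict


def multiple_parents(is_a_edges):
    """Return {child: sorted parents} for classes with more than one is_a parent."""
    parents = defaultdict(set)
    for child, parent in is_a_edges:
        parents[child].add(parent)
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}


def lineage(term, parent_of):
    """Walk from a term up to the root of a single-inheritance hierarchy."""
    chain = [term]
    # Stop at the root, or if a parent repeats (guards against accidental cycles).
    while chain[-1] in parent_of and parent_of[chain[-1]] not in chain:
        chain.append(parent_of[chain[-1]])
    return chain


# A diamond: A has two is_a parents, so the rule is violated for A.
print(multiple_parents([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]))
# {'A': ['B', 'C']}

# With one parent per class, a term's position yields a single chain of
# ancestors whose definitions it inherits.
parent_of = {"Siamese": "cat", "cat": "mammal", "mammal": "animal", "animal": "organism"}
print(lineage("Siamese", parent_of))
# ['Siamese', 'cat', 'mammal', 'animal', 'organism']
```

With multiple inheritance the second function would have to return a graph of ancestors rather than one list, which is where the ambiguity about what a term inherits creeps in.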
Problems with multiple inheritance • ‘is_a’ no longer univocal [Diagram: classes A, B and C linked by is_a1 and is_a2]
Fifth Rule: Clarity of Text Definitions • The terms used in a definition should be simpler (more intelligible) than the term to be defined • otherwise the definition provides no assistance to human understanding • Machines can cope with the full formal representation (they don’t need the text)
Sixth Rule: Basis in Reality • When building or maintaining an ontology, always think carefully about how classes (types, kinds, species) relate to instances in reality • Axioms governing instances • Every class has at least one instance (exceptions will occur at top levels) • Each child class has a smaller collection of instances than its parent class
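The two instance axioms can be checked mechanically once classes are linked to the instances they annotate. The sketch below is one possible reading, not a method from the talk: it assumes "smaller collection" means a proper subset of the parent's instances, and the function name, parameters and sample data are all invented for illustration.

```python
def check_instance_axioms(parent_of, instances, top_level=frozenset()):
    """Report classes that break the instance axioms.

    parent_of: {child class: parent class}
    instances: {class: set of instance identifiers}
    top_level: classes exempt from the 'at least one instance' axiom
    """
    problems = []
    for cls, members in instances.items():
        # Axiom 1: every class has at least one instance (top levels excepted).
        if not members and cls not in top_level:
            problems.append(f"{cls}: has no instances")
        # Axiom 2 (assumed reading): a child's instances form a proper subset
        # of its parent's instances.
        parent = parent_of.get(cls)
        if parent is not None and not members < instances.get(parent, set()):
            problems.append(f"{cls}: instances are not a proper subset of {parent}")
    return problems


# Made-up sample data.
parent_of = {"cat": "mammal", "unicorn": "mammal"}
instances = {"mammal": {"tom", "rex"}, "cat": {"tom"}, "unicorn": set()}
print(check_instance_axioms(parent_of, instances))
# ['unicorn: has no instances']
```

A child class that holds exactly the same instances as its parent is also reported, since it adds no distinction that the data can support.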
The reason that rules are important: Interoperability • Ontologies should work together • Avoid redundancy in ontology building • Support reuse • Ontologies should be capable of being used by other ontologies (cumulation)
The problem of ontology re-use • SNOMED, MeSH, UMLS, NCIT, HL7-RIM, … • None of these have clearly defined relations • Still remain too much at the level of TERMINOLOGY • Not based on a common set of rules • Not based on a common set of relations
An example of unclear relationship use • A is_a B • ‘A’ is more specific in meaning than ‘B’ • HL7-RIM: • Individual Allele is_a Act of Observation • cancer documentation is_a cancer • disease prevention is_a disease