Part I: Biomedical Ontologies: A Critical Survey • Barry Smith • http://ontology.buffalo.edu/smith
I. Biomedical Ontologies: A Critical Survey • Ontologies, terminologies and thesauri are now in common use in the domain of biomedical informatics. Their goal is to support search and retrieval, but also to advance genuine reasoning about biomedical phenomena and to enable re-use of heterogeneous data through the use of common systems of annotations. We examine a representative collection of biomedical ontologies in light of these criteria, and draw (somewhat sad) conclusions as to the current state of the field. • II. The Ontology of Biomedical Reality (terminology) • Ontologies to support scientific research and clinical medicine have special characteristics, which we shall outline in terms of a distinction between three levels: (1) the level of reality; (2) the level of cognitive representations; and (3) the level of the publicly accessible concretizations of such cognitive representations, for example in ontologies. Against this background we shall clarify the relations between ontologies, terminologies, information models, databases, and similar artifacts. • III. The OBO Foundry Project: Towards Scientific Standards and Principles-Based Coordination in Biomedical Ontology Development • The OBO Foundry is a collaborative experiment, involving a group of ontology developers who have agreed in advance to the adoption of a growing set of principles specifying best practices in ontology development. The primary objective is to establish gold standard reference ontologies, one for each core domain of biomedical science. We shall describe how this objective is already being realized, and show how it can not only help solve the problems of data retrieval and re-use but also foster the development of the powerful tools that will be needed to reason with biomedical data in the future.
Problem: • how to reason with data deriving from different sources, each of which uses its own system of classification?
Solution: Ontology!
Examples of current needs for ontologies in biomedicine • to enforce semantic consistency within a database • to enable data retrieval, sharing and re-use • to enable data integration (bridging across data at multiple granularities) • to allow querying
General trend • on the part of NIH, FDA and other bodies to consolidate ontology-based standards for the communication and processing of biomedical data.
Old approach • gather terminologies in libraries • Unified Medical Language System • National Library of Medicine
UMLS • SNOMED • DEMONS
New Approach • MusicBeanz
Semantic Web deposits • Pet Profile Ontology • Review Vocabulary • Band Description Vocabulary • Musical Baton Vocabulary • MusicBrainz Metadata Vocabulary • Kissology
http://www.w3.org/ • Beer Ontology • all instances of hops that have ever existed are necessarily ingredients of beer.
Both UMLS- and OWL-type responses involve ad hoc creation of new terminologies by each separate community, and an open-door policy for admission. • Many of these terminologies remain as torsos, gather dust, poison the wells, ...
OWL’s syntactic regimentation is not enough to ensure high-quality ontologies • The use of a common syntax and logical machinery and the careful separating out of ontologies into namespaces does not solve the problem of ontology integration
from Ontological Engineering • location =def. a spatial point identified by a name (p. 12) • arrivalPlace =def. a journey ends at a location (p. 13) • facet = def. ternary relation that holds between a frame, a slot, and the facet (p. 51) • an example of function is Pays, which obtains the price of a room after applying a discount (p. 13)
from Handbook of Ontology • On ‘achieving consistency from multiple sources’: • if exact semantic identity is lacking, terms can be unified at a higher level, and information that is possibly related can be retrieved as well. When the application objective is to study and understand, the end-user can reject misleading records. (p. 94) • owl:InverseFunctionalProperty defines a property that for which two different objects cannot have the same value, e.g. isTheSocialSecurityNumberOf (a social number is assigned to one person only) (p. 78)
UMLS • SNOMED • DEMONS • The Good, the Bad, and the UGLY
A methodology for quality-assurance of ontologies • tested thus far in the biomedical domain on: • FMA • GO + other OBO Ontologies • FuGO • SNOMED • UMLS Semantic Network • NCI Thesaurus • ICF (International Classification of Functioning, Disability and Health) • ISO Terminology Standards • HL7-RIM
The Good • Foundational Model of Anatomy (FMA) • Pro • clear statement of scope: structural human anatomy, at all levels of granularity, from the whole organism to the biological macromolecule • Powerful treatment of definitions, from which the entire FMA hierarchy is generated – can serve as basis for formal reasoning • Con • Some unfortunate artifacts in the ontology deriving from its specific computer representation (Protégé)
[Figure: fragment of the FMA hierarchy, with is_a and part_of links. Node labels: Anatomical Structure, Anatomical Space, Organ, Organ Part, Organ Subdivision, Organ Component, Organ Cavity, Organ Cavity Subdivision, Tissue, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Pleural Sac, Pleura (Wall of Sac), Pleural Cavity, Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Interlobar recess, Mesothelium of Pleura]
The Foundational Model of Anatomy • Follows formal rules for ‘Aristotelian’ definitions • When A is_a B, the definition of ‘A’ takes the form: • an A =def. a B which ... • a human being =def. an animal which is rational
FMA Example • Cell =def. an anatomical structure which consists of cytoplasm surrounded by a plasma membrane with or without a cell nucleus • Plasma membrane =def. a cell part that surrounds the cytoplasm
The FMA regimentation • Each definition reflects the position in the hierarchy to which a defined term belongs. • The position of a term within the hierarchy enriches its own definition by incorporating automatically the definitions of all the terms above it. • The entire information content of the FMA’s term hierarchy can be translated very cleanly into a computer representation
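The regimentation above can be sketched in a few lines of Python. This is a minimal illustration, not the FMA's actual representation: the `Term` class, the `article` helper, and the method names are all hypothetical, and only the two example definitions from the slides are used as data.

```python
from dataclasses import dataclass
from typing import List, Optional


def article(noun: str) -> str:
    """Pick 'a' or 'an' from the first letter (a rough heuristic)."""
    return "an" if noun[0].lower() in "aeiou" else "a"


@dataclass
class Term:
    name: str
    parent: Optional["Term"] = None   # the genus (is_a parent); None for a root
    differentia: str = ""             # the "which ..." clause

    def definition(self) -> Optional[str]:
        """Aristotelian form: an A =def. a B which C. Roots stay undefined."""
        if self.parent is None:
            return None
        return (f"{article(self.name)} {self.name} =def. "
                f"{article(self.parent.name)} {self.parent.name} "
                f"which {self.differentia}")

    def inherited_definitions(self) -> List[str]:
        """Own definition first, then those of every term above it."""
        chain: List[str] = []
        node: Optional[Term] = self
        while node is not None:
            text = node.definition()
            if text is not None:
                chain.append(text)
            node = node.parent
        return chain


# Data taken from the slides; both parents are treated as roots here.
animal = Term("animal")
human = Term("human being", parent=animal, differentia="is rational")
structure = Term("anatomical structure")
cell = Term(
    "cell",
    parent=structure,
    differentia=("consists of cytoplasm surrounded by a plasma membrane "
                 "with or without a cell nucleus"),
)

print(human.definition())
for line in cell.inherited_definitions():
    print(line)
```

Because each definition is built from the parent link, moving a term in the hierarchy changes its definition automatically, which is the property the slide describes.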
Principle • Use Aristotelian definitions • An A is a B which C’s.
Intermediate • GALEN • Pro • Allows formal representation of clinical information • Allows multiple views of relevant detail as needed • Uses powerful Description Logic (DL)-based formal structure • Makes definitions easy to formulate • Con • Remains only partially developed • Contains errors which DLs did not prevent, e.g. Vomitus contains carrot
Principle • An ontology should not remain a torso
Principle • An ontology should have a properly personed help desk
Principle • An ontology should have procedures for up-dating in light of scientific advance
Intermediate • The Gene Ontology • Con • Poor formal architecture • Full of errors • menopause part_of death • Poor support for automatic reasoning and error-checking • Poor treatment of definitions • Not trans-granular • No relation to time or instances
The Gene Ontology • Pro • Open Source • Cross-Species • ... has recognized the need for reform, including explicit representation of granular levels
Old GO Definitions • hemolysis =def. the causes of hemolysis
GO now adopting structured definitions which contain both genus and differentiae • Species =def. Genus + Differentiae • neuron cell differentiation =def. differentiation by which a cell acquires features of a neuron
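One simple quality check suggested by the contrast between the old and the new style of definition is to flag definitions that reuse the defined term in its own definiens. The sketch below is only an illustration: `is_circular` is a hypothetical helper, and it catches literal reuse of the whole term, not circularity that runs through synonyms or through other definitions.

```python
import re


def is_circular(term: str, definiens: str) -> bool:
    """True if the whole defined term reappears inside its own definiens."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, definiens.lower()) is not None


# The two examples from the slides.
old_style = is_circular("hemolysis", "the causes of hemolysis")
new_style = is_circular(
    "neuron cell differentiation",
    "differentiation by which a cell acquires features of a neuron",
)
print(old_style, new_style)
```

The new-style definition passes because it names the genus (differentiation) and a differentia, without repeating the full term being defined.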
Ontology alignment • One of the current goals of GO is to align cell types in GO with cell types in the Cell Ontology (GO term with Cell Ontology term): • cone cell fate commitment with retinal_cone_cell • keratinocyte differentiation with keratinocyte • adipocyte differentiation with fat_cell • dendritic cell activation with dendritic_cell • lymphocyte proliferation with lymphocyte • T-cell homeostasis with T_lymphocyte • garland cell differentiation with garland_cell • heterocyst cell differentiation with heterocyst
Cell Ontology entry:
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
Alignment of the two ontologies will permit the generation of consistent and complete definitions • GO + Cell type = New Definition • Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
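How such a definition might be generated from the Cell Ontology entry can be sketched as follows. This is an assumption-laden illustration, not the procedure GO actually uses: `parse_stanza` handles only simple `tag: value` lines rather than the full OBO format, `differentiation_definition` is a hypothetical template, and the precursor labels are copied from the slide's new definition instead of being looked up from the `develops_from` identifiers.

```python
import re
from collections import defaultdict
from typing import Dict, List

STANZA = '''id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
'''


def parse_stanza(text: str) -> Dict[str, List[str]]:
    """Collect 'tag: value' lines; a tag may occur more than once."""
    tags: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if line:
            tag, _, value = line.partition(": ")
            tags[tag].append(value)
    return dict(tags)


def article(noun: str) -> str:
    return "an" if noun[0].lower() in "aeiou" else "a"


def differentiation_definition(entry: Dict[str, List[str]],
                               precursor_labels: List[str]) -> str:
    """Fill a genus-plus-differentia template from a cell-type entry."""
    name = entry["name"][0]
    quoted = re.match(r'"(.*)"', entry["def"][0]).group(1)
    gloss = quoted.split(". ")[0]            # first sentence of the cell def
    gloss = gloss[0].lower() + gloss[1:]
    precursors = " or ".join(f"{article(p)} {p}" for p in precursor_labels)
    return (f"Processes whereby {precursors} acquires the specialized "
            f"features of {article(name)} {name}, {gloss}.")


entry = parse_stanza(STANZA)
develops_from = [v.split()[1] for v in entry["relationship"]
                 if v.startswith("develops_from ")]
new_definition = differentiation_definition(
    entry, ["osteoprogenitor cell", "cranial neural crest cell"])
print(develops_from)
print(new_definition)
```

The generated text differs from the slide's wording only in keeping the article in "an extracellular matrix", since it reuses the cell-type definition verbatim.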
Other Ontologies to be aligned with GO • Chemical ontologies • 3,4-dihydroxy-2-butanone-4-phosphate synthase activity • Anatomy ontologies • metanephros development
Principle • Exploit existing ontologies when formulating definitions
The Bad • Reactome • Pro • Rich catalogue of biological processes • Con • Incoherent treatment of categories: • ReferentEntity (embracing e.g. small molecules) is a sibling of PhysicalEntity (embracing complexes, molecules, ions and particles). • Similarly CatalystActivity is a sibling of Event.
Principle • An ontology should be in agreement with the truths of basic science (e.g. that molecules are physical entities)
The Ugly • Disease Ontology / ICD-10 • Other problems with special functions • Tuberculosis of unspecified bones and joints, tubercle bacilli not found by bacteriological or histological examination, but tuberculosis confirmed by other methods (inoculation of animals) • Other mineral salts, not elsewhere classified, causing adverse effects in therapeutic use • Other general medical examination for administrative purposes • Assault by other specified means
The Ugly • Disease Ontology / ICD-10 • Other accidental submersion or drowning in water transport accident injuring other specified person • Accident to powered aircraft, other and unspecified, injuring occupant of military aircraft, any rank • Other accidental submersion or drowning in water transport accident injuring occupant of other watercraft - crew
The Ugly • Disease Ontology / ICD-10 • Normal pregnancy • Fall on stairs or ladders in water transport injuring occupant of small boat, unpowered • Railway accident involving collision with rolling stock and injuring pedal cyclist • Injury due to war operations by lasers • Nontraffic accident involving motor-driven snow vehicle injuring pedestrian
The Ugly • Disease Ontology / ICD-10 • Donors of other specified organ or tissue • Fitting and adjustment of wheelchair • Hot (boiling) tap water • Training in use of lead dog for the blind • Person consulting on behalf of another person
Principle • An ontology should have a clearly specified domain (captured by its root node)
“Circular Hierarchical Relationships in the UMLS: Etiology, Diagnosis, Treatment, Complications and Prevention” • Olivier Bodenreider • Topographic regions: General terms • Physical anatomical entity • Anatomical spatial entity • Anatomical surface • Body regions • Topographic regions
Principle • Avoid cycles
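A cycle check of this kind is easy to automate. The sketch below runs a depth-first search over child-to-parent links and returns the first cycle it meets. The function name `find_cycle` is hypothetical, and the example data is an assumption made for illustration: it reads the preceding list of UMLS labels as a chain and closes it with a link from the last label back to the first, which is not something the slide states explicitly.

```python
from typing import Dict, List, Optional


def find_cycle(parents: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of terms (first == last), or None."""
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on current path, finished
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = GREY
        path.append(node)
        for parent in parents.get(node, []):
            colour = state.get(parent, WHITE)
            if colour == GREY:        # parent is already on the current path
                return path[path.index(parent):] + [parent]
            if colour == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        state[node] = BLACK
        return None

    for node in list(parents):
        if state.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Assumed chain of hierarchical links, closed into a ring for illustration.
ring = [
    "Topographic regions: General terms",
    "Physical anatomical entity",
    "Anatomical spatial entity",
    "Anatomical surface",
    "Body regions",
    "Topographic regions",
]
parents = {ring[i]: [ring[i + 1]] for i in range(len(ring) - 1)}
parents[ring[-1]] = [ring[0]]         # the assumed closing link

cycle = find_cycle(parents)
acyclic = find_cycle({"cell": ["anatomical structure"]})
print(cycle)
print(acyclic)
```

Running such a check whenever the hierarchy is edited would catch circular is_a paths before release; for very deep hierarchies an iterative search would avoid Python's recursion limit.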
MeSH • National Socialism is_a Political Systems • National Socialism is_a Anthropology ...
Principle • Use singular nouns
MeSH • National Socialism is_a MeSH Descriptor
Plant Ontology • cell = def. structural and physiological unit of a living organism; it (i.e., plant cell) consists of protoplast and cell wall; ...