Ontologies in Biomedicine: The Good, The Bad and The Ugly Barry Smith http://ontology.buffalo.edu/smith http://ncor.us
The Good • Foundational Model of Anatomy (FMA) • Pro • Very clear statement of scope: structural human anatomy, at all levels of granularity, from the whole organism to the biological macromolecule • Powerful treatment of definitions, from which the entire FMA hierarchy is generated – can serve as basis for formal reasoning • Con • Some unfortunate artifacts in the ontology deriving from its specific computer representation (Protégé)
Intermediate • GALEN • Pro • Allows formal representation of clinical information • Allows multiple views of relevant detail as needed • Uses powerful Description Logic (DL)-based formal structure • Con • Remains only partially developed • Contains errors: Vomitus contains carrot • – which DLs did not prevent
Intermediate • The Gene Ontology • Con • Poor formal architecture • Full of errors • menopause part_of death • Poor support for automatic reasoning and error-checking • Poor treatment of definitions • Not trans-granular • No relation to time or instances
The Gene Ontology • Pro • Open Source • Cross-Species • ... has recognized the need for reform, including explicit representation of granular levels
Problem of Circularity • GO:0042270: • Protection from natural killer cell mediated cytolysis • Definition: The process of protecting a cell from cytolysis by natural killer cells.
GO:0019836 hemolysis • Definition: The processes that cause hemolysis • X =def. the Y of X • this is worse than circular
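A crude check for the kind of circularity shown on this slide can be sketched in a few lines. This is only a minimal illustration, assuming that "circular" means the defined term reappears verbatim in its own definition; the function name `is_circular` and the alternative definition wording are hypothetical and are not taken from GO.

```python
import re


def is_circular(term: str, definition: str) -> bool:
    """Return True if the defined term reappears as a whole word or phrase
    in its own definition (a deliberately naive test)."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, definition.lower()) is not None


# Definition quoted on the slide for GO:0019836.
go_definition = "The processes that cause hemolysis"

# Illustrative non-circular wording (an assumption, not GO's text).
alternative = "The rupture of red blood cells with release of their contents"

print(is_circular("hemolysis", go_definition))  # True
print(is_circular("hemolysis", alternative))    # False
```

A check like this would miss the GO:0042270 case, where the definition merely rearranges the words of the term, so it is a first filter rather than a real test of definitional quality.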
The Bad • Reactome • Pro • Rich catalogue of biological processes • Con • Incoherent treatment of categories: • ReferentEntity (embracing e.g. small molecules) is a sibling of PhysicalEntity (embracing complexes, molecules, ions and particles). • Similarly CatalystActivity is a sibling of Event.
The Bad • National Cancer Institute Thesaurus • Pro • Open source; ambitiously broad coverage; DL-based • Con • Poor realization of DL formalism • Full of mistakes (many inherited from its UMLS sources): • three disjoint classes of plants: Vascular Plant, Non-vascular Plant, Other Plant • three disjoint kinds of cells: Cell, Normal Cell, Abnormal Cell • Normal Cell is_a Microanatomy • See http://ontology.buffalo.edu/medo/NCIT_Smith.html
National Cancer Institute Thesaurus • Duratec, Lactobutyrin and Stilbene Aldehyde classified as: Unclassified Drugs and Chemicals • Pro • NCIT, too, has recognized the need for reform • (NCIT is part of the OBO library)
The Ugly • UMLS Semantic Network • Pros • Broad coverage; no multiple inheritance • Cons • Incoherent use of ‘conceptual entities’ • (e.g. the digestive system as a conceptual part of the organism) • Full of errors
UMLS Semantic Network • Edges in the graph represent merely “possible significant relations”: • Bacterium causes Experimental Model of Disease • Experimental Model of Disease affects Fungus • Experimental Model of Disease is_a Pathologic Function
UMLS Semantic Network • Unclear what the nodes of the graph are: • Drug Delivery Device contains Clinical Drug • Drug Delivery Device narrower_in_meaning_than Manufactured Object • The use-mention confusion: • “Swimming is healthy and has 8 letters”
The Ugly • Clinical Terms Version 2 (The Read Codes) • Classifies chemicals into: • chemicals whose name begins with ‘A’, • chemicals whose name begins with ‘B’, • chemicals whose name begins with ‘C’, ...
The Astonishingly (Criminally?) Ugly • Health Level 7 • HL7 is a UML-based standard for exchange of information between clinical information systems • has proved very crumbly as a standard • The HL7 Reference Information Model (RIM) is supposed to overcome this problem by defining the universe of healthcare data in a rigorous way
HL7-RIM • Animal • Definition: A subtype of Living Subject representing any animal-of-interest to the Personnel Management domain. • Person • A subtype of Living Subject representing single human being [sic] who, in the context of the Personnel Management domain, must also be uniquely identifiable through one or more legal documents. • LivingSubject • Definition: A subtype of Entity representing an organism or complex animal, alive or not.
HL7 RIM: The Problem of Circularity • Person = Person with documents • has the form: ‘An A is an A which is B’ • – useless in practical terms since neither we nor the machine can use them to find out what ‘A’ means • – incorporate a vicious infinite regress • – have the effect of making it impossible to refer to A’s which are not Bs, for example to an undocumented person
HL7 Logically Incoherent • act = the record of an act • This has the form: An X is the Y of an X • again worse than circular
HL7-RIM: Logically Contradictory Definitions • Definition of Act: An Act is an action of interest that has happened, can happen, is happening, is intended to happen, or is requested/demanded to happen. • Definition of Act: An Act is the record of something that is being done, has been done, can be done, or is intended or requested to be done.
HL7 RIM Ontologically Incoherent • The truth about the real world is constructed through a combination and arbitration of attributed statements ... • As such, there is no distinction between an activity and its documentation.
HL7 Incredibly Successful • embraced as US federal standard; • central part of $15 billion program to integrate all UK hospital information systems • made mandatory by Canada Health Infoway • adopted by Oracle as basis for its EHR support programs
HL7 Merchandizing
From molecules to diseases • A good ontology should enable us to organize our information resources in such a way that we can bridge the granularity gap between genomics and proteomics data and phenotype (clinical, pharmacological, patient-centered) data
good ontologies require: Coherent upper level taxonomy distinguishing • continuants (cells, molecules, organisms ...) • occurrents (events, processes) • dependent entities (qualities, functions ...) • independent entities (their bearers) • universals (types, kinds) • instances (tokens, instances) Coherent relation ontology supporting inference both within and between ontologies.
good ontologies require: Consistent use of terms, supported by logically coherent (non-circular) definitions, in both human-readable and computable formats
Open Biomedical Ontologies (OBO) Upper Biomedical Ontology (UBO) • root UBO:0000001:top • subclass BFO:continuant:continuant • subclass BFO:dependent_entity:dependent_entity • subclass UBO:0000023:quality • subclass UBO:0000026:phenotype • subclass UBO:0000025:state • subclass UBO:0000027:disease • subclass UBO:0000005:function • subclass GO:0003674:molecular_function • subclass BFO:disposition:disposition • subclass BFO:independent_entity:independent_entity • subclass UBO:0000002:substance • subclass UBO:0000019:protein • subclass GO:0005575:cellular_component • subclass UBO:0000006:anatomical_entity • subclass UBO:0000008:gross_anatomical_entity • subclass UBO:0000007:organism • subclass UBO:0000015:microbe • subclass UBO:0000014:plant • subclass UBO:0000017:animal • subclass BFO:fiat_part_of_substance:fiat_part_of_substance • subclass BFO:boundary_of_substance:boundary_of_substance • subclass BFO:aggregate_of_substances:aggregate_of_substances • subclass BFO:occurrent:occurrent • subclass BFO:dependent_occurrent:dependent_occurrent • subclass UBO:0000004:process • subclass GO:0008150:biological_process • subclass BFO:fiat_part_of_process:fiat_part_of_process • subclass UBO:0000029:life_cycle_stage • subclass BFO:aggregate_of_processes:aggregate_of_processes • subclass EO:0007359:environment ontology • subclass BFO:temporal_boundary_of_process:temporal_boundary_of_process • subclass BFO:independent_occurrent:independent_occurrent http://ncor.us
OBO Relation Ontology (RO) • Clear distinction between universals (classes, kinds, types) and instances (individuals, tokens) • Precise formal definitions of relations • Automatic applicability to time-indexed instance-data e.g. in Electronic Health Record • Consistency with the Relation Ontology now a criterion for admission to the OBO ontology library • see Genome Biology Apr. 2006
Three types of relations • between instances: • Mary’s heart part_of Mary • between an instance and a universal: • Mary instance_of homo sapiens • between universals: • gastrulation part_of embryonic development
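One way to keep the three types of relations apart in software is to give instances and universals distinct types, so that every assertion can be classified mechanically. The sketch below is only an illustration; the class names `Instance`, `Universal`, `Assertion` and the helper `kind` are hypothetical and are not part of the Relation Ontology.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Instance:
    name: str  # a particular, e.g. Mary


@dataclass(frozen=True)
class Universal:
    name: str  # a type or kind, e.g. homo sapiens


@dataclass(frozen=True)
class Assertion:
    subject: Union[Instance, Universal]
    relation: str
    obj: Union[Instance, Universal]


def kind(a: Assertion) -> str:
    """Classify an assertion by the sorts of its two relata."""
    subject_is_instance = isinstance(a.subject, Instance)
    object_is_instance = isinstance(a.obj, Instance)
    if subject_is_instance and object_is_instance:
        return "instance-instance"
    if subject_is_instance:
        return "instance-universal"
    if not object_is_instance:
        return "universal-universal"
    return "unsupported"  # universal related to an instance: not on the slide


# The three examples from the slide.
examples = [
    Assertion(Instance("Mary's heart"), "part_of", Instance("Mary")),
    Assertion(Instance("Mary"), "instance_of", Universal("homo sapiens")),
    Assertion(Universal("gastrulation"), "part_of", Universal("embryonic development")),
]
for a in examples:
    print(a.relation, "->", kind(a))
```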
A suite of primitive instance-level relations • identical_to • part_of • located_in • adjacent_to • earlier • derives_from • ...
A suite of defined relations between universals
GALEN: Vomitus contains carrot • All portions of vomit contain all portions of carrot • All portions of vomit contain some portion of carrot • Some portions of vomit contain some portion of carrot • Some portions of vomit contain all portions of carrot
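The four readings above come apart as soon as they are evaluated against instance data. The following sketch uses an invented toy data set (two portions of vomit, two portions of carrot, one containment fact); the identifiers are placeholders and nothing here comes from GALEN itself.

```python
# Toy instance data (hypothetical): only v1 contains a portion of carrot.
vomit_portions = {"v1", "v2"}
carrot_portions = {"c1", "c2"}
contains = {("v1", "c1")}


def holds(v: str, c: str) -> bool:
    """Instance-level 'contains' lookup."""
    return (v, c) in contains


readings = {
    "all-all": all(all(holds(v, c) for c in carrot_portions) for v in vomit_portions),
    "all-some": all(any(holds(v, c) for c in carrot_portions) for v in vomit_portions),
    "some-some": any(any(holds(v, c) for c in carrot_portions) for v in vomit_portions),
    "some-all": any(all(holds(v, c) for c in carrot_portions) for v in vomit_portions),
}

for name, value in readings.items():
    print(name, value)
# all-all False, all-some False, some-some True, some-all False
```

On this data only the some-some reading is true, which is why an unquantified "Vomitus contains carrot" gives a reasoner nothing definite to work with.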
all-some structure • A part_of B =def. given any instance a of A there is some instance b of B such that a part_of b on the instance level • Allows automatic ontology integration via cascading reasoning: • A R1 B • B R2 C • A R3 C
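The all-some definition can be checked directly against instance data, and the cascading step can be seen when the instance-level relation is transitive. The sketch below follows the simplified, untimed definition on this slide; the data set and the helper names `all_some` and `transitive_closure` are assumptions made for illustration.

```python
def all_some(instances_a, instances_b, instance_rel) -> bool:
    """A R B holds iff every instance of A bears R to some instance of B.
    Note: vacuously True when A has no instances."""
    return all(
        any((a, b) in instance_rel for b in instances_b) for a in instances_a
    )


def transitive_closure(pairs):
    """Close an instance-level relation under transitivity."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


# Hypothetical instance data: cell c3 has no nucleus.
nuclei = {"n1", "n2"}
cells = {"c1", "c2", "c3"}
organisms = {"o1"}
part_of = transitive_closure(
    {("n1", "c1"), ("n2", "c2"), ("c1", "o1"), ("c2", "o1"), ("c3", "o1")}
)

print(all_some(nuclei, cells, part_of))      # True:  nucleus part_of cell
print(all_some(cells, organisms, part_of))   # True:  cell part_of organism
print(all_some(nuclei, organisms, part_of))  # True:  cascaded conclusion
print(all_some(cells, nuclei, part_of))      # False: the converse fails
```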
adjacent_to • cell wall adjacent_to cytoplasm • intron adjacent_to exon • Golgi apparatus adjacent_to endoplasmic reticulum • periplasm adjacent_to plasma membrane • presynaptic membrane adjacent_to synaptic cleft
A adjacent_to B • every instance of A stands in the instance-level adjacent_to relation to some instance of B
adjacent_to as a relation between universals is not symmetric • nucleus adjacent_to cytoplasm • Not: cytoplasm adjacent_to nucleus • seminal vesicle adjacent_to urinary bladder • Not: urinary bladder adjacent_to seminal vesicle
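The asymmetry follows from the all-some reading even when the instance-level relation is itself symmetric. A minimal sketch, assuming a toy data set in which one portion of cytoplasm belongs to a cell without a nucleus (as in a mature mammalian red blood cell); all identifiers are hypothetical.

```python
def all_some(instances_a, instances_b, instance_rel) -> bool:
    """A R B holds iff every instance of A bears R to some instance of B."""
    return all(
        any((a, b) in instance_rel for b in instances_b) for a in instances_a
    )


# Hypothetical instances: cy2 is the cytoplasm of an enucleated cell.
nuclei = {"n1"}
cytoplasms = {"cy1", "cy2"}

# Instance-level adjacency is symmetric: both directions are recorded.
adjacent_to = {("n1", "cy1"), ("cy1", "n1")}

print(all_some(nuclei, cytoplasms, adjacent_to))  # True:  nucleus adjacent_to cytoplasm
print(all_some(cytoplasms, nuclei, adjacent_to))  # False: cytoplasm adjacent_to nucleus
```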
The Granularity Gulf • most existing data-sources are of fixed, single granularity • many (all?) clinical phenomena cross granularities
Main obstacle to integrating genetic and EHR data No facility for dealing with time and instances (particulars, individuals) in current ontologies
Key idea • To define ontological relations like • part_of, develops_from • it is not enough to look just at universals / classes / types / ‘concepts’ : • we need also to take account of instances and time
transformation_of • A transformation_of B • =def. any instance of A was at some earlier time an instance of B
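Because the definition quantifies over instances and times, it can be tested against time-indexed instantiation facts of the kind an EHR might hold. The sketch below is an illustration only; the fact format `(instance, universal, time)`, the sample data and the function name are assumptions.

```python
def transformation_of(a: str, b: str, facts) -> bool:
    """A transformation_of B iff every instance of A at time t was an
    instance of B at some earlier time t0 < t.
    facts: iterable of (instance, universal, time) triples."""
    facts = list(facts)
    a_facts = [(c, t) for (c, u, t) in facts if u == a]
    return all(
        any(c2 == c and u == b and t0 < t for (c2, u, t0) in facts)
        for (c, t) in a_facts
    )


# Hypothetical time-indexed data (time in years of age).
facts = [
    ("mary", "child", 5),
    ("mary", "adult", 30),
    ("john", "child", 3),
    ("john", "adult", 25),
]

print(transformation_of("adult", "child", facts))  # True
print(transformation_of("child", "adult", facts))  # False
```

The same individual appears under both universals, which is the point of the relation: identity of the instance is preserved while the universal it instantiates changes over time.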
transformation_of • [Figure: the same instance c shown along a time axis, as c at t (an instance of C) and c at t1 (an instance of C1)] • mature RNA transformation_of pre-RNA • adult transformation_of child • carcinomatous colon transformation_of colon
[Figure: c at t (an instance of C) and c at t1 (an instance of C1)] • transformation_of relations cross both time and granularity
Advantages of the methodology of enforcing commonly accepted coherent definitions • promote quality assurance (better coding) • guarantee automatic reasoning across ontologies and across data at different granularities • yield a direct connection to times and instances in the EHR