The Unbearable Lightness of Biomedical Informatics. Barry Smith, Saarbrücken/Buffalo, http://ontologist.com. If Medical WordNet* is the solution, what is the problem? *Coling Proceedings, Vol. 1, pp. 371-380. Cerebellar tumor. Organism. Organ. Tissue. 10⁻¹ m. Cell. Organelle.
The Unbearable Lightness of Biomedical Informatics Barry Smith Saarbrücken/Buffalo http://ontologist.com
if Medical WordNet* is the solution • what is the problem? • *Coling Proceedings, Vol. 1, pp. 371-380
Organism, Organ, Tissue (10⁻¹ m) • Cell, Organelle (10⁻⁵ m) • Protein, DNA (10⁻⁹ m)
The quantity-quality divide • 30,000 genes in humans • 200,000 proteins • 100s of cell types • 100,000s of disease types • 1,000,000s of biochemical pathways (including disease pathways) • … legacy of the Human Genome Project • … and of attempts to institute the electronic health record
FUNCTIONAL GENOMICS • proteomics, • reactomics, • metabonomics, • toxicopharmacogenomics • phenomics, • behaviouromics, • …
The method of annotations • Organism, Organ, Tissue (10⁻¹ m) • Cell, Organelle (10⁻⁵ m) • Protein, DNA (10⁻⁹ m)
The method of indexing • Organism, Organ, Tissue (10⁻¹ m) • Cell, Organelle (10⁻⁵ m) • Protein, DNA (10⁻⁹ m)
The Gene Ontology • menopause • sensitivity to blue light • heptolysis
How can we overcome incompatibilities between different scientific index terms? • immunology • genetics • cell biology
One answer: (statistical) computational linguistics • Pattern recognition based on string searches
String searches need constraints • we can’t leave it to luck to overcome terminological incompatibilities
Remember: different disciplines use different terminologies to refer to the same objects, processes, and features in reality • immunology • genetics • cell biology
An alternative answer: • “Ontology”
Ontology, roughly: • Overcome terminological incompatibilities by creating a standardized framework into which diverse vocabularies can be mapped
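The contrast between bare string matching and mapping into a shared framework can be sketched as follows. This is a minimal illustration, not a method described in the talk: the three term strings and the identifier `CELL:0001` are hypothetical placeholders chosen only for the example.

```python
# Hypothetical index terms that three disciplines might use for the same
# cell type. The terms and the shared identifier are illustrative assumptions.
TERMS = {
    "immunology": "T lymphocyte",
    "genetics": "T-cell",
    "cell biology": "thymus-derived lymphocyte",
}

# A standardized framework: every (discipline, term) pair maps to one shared ID.
MAPPING = {
    ("immunology", "T lymphocyte"): "CELL:0001",
    ("genetics", "T-cell"): "CELL:0001",
    ("cell biology", "thymus-derived lymphocyte"): "CELL:0001",
}


def same_by_string(a: str, b: str) -> bool:
    """Naive string search: two terms match only if the strings are equal."""
    return a.lower() == b.lower()


def same_by_mapping(disc_a: str, disc_b: str) -> bool:
    """Two terms match if both map to the same shared identifier."""
    id_a = MAPPING.get((disc_a, TERMS[disc_a]))
    id_b = MAPPING.get((disc_b, TERMS[disc_b]))
    return id_a is not None and id_a == id_b


if __name__ == "__main__":
    print(same_by_string(TERMS["immunology"], TERMS["genetics"]))  # False
    print(same_by_mapping("immunology", "genetics"))               # True
```

The string comparison fails because the spellings differ, while the mapping succeeds because the link between the terms has been stated explicitly rather than left to luck.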
Kinds of Ontologies (figure, Michael Gruninger) • Groupings: Glossaries & Data Dictionaries; Thesauri, Taxonomies; MetaData, XML Schemas, & Data Models; Formal Ontologies & Inference • Items: Terms; ‘ordinary’ Glossaries; ad hoc Hierarchies (Yahoo!); Data Dictionaries (EDI); structured Glossaries; Thesauri; principled, informal hierarchies; XML DTDs; XML Schema; DB Schema; Data Models (UML, STEP); formal Taxonomies; Frames (OKBC); Description Logics (DAML+OIL); General Logic
Kinds of Ontologies • A shared vocabulary plus a specification of its intended meaning • Two extremes • Meaning specified explicitly in a logically rigorous way
Kinds of Ontologies • A shared vocabulary plus a specification of its intended meaning • Meaning specified explicitly in a logically rigorous way: too expensive
Kinds of Ontologies • A shared vocabulary plus a specification of its intended meaning • Two extremes • Meaning specified informally via natural language
Work on biomedical ontologies grew out of work on medical thesauri and nomenclatures
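Thesaurus as a graph (figure, Goble & Shadbolt) • Nodes: Fruit, Vegetable, Orange, Apfelsine • Fruit similarTo Vegetable • Orange synonymWith Apfelsine • Orange and Fruit linked by NarrowerTerm • Graph with labelled edges (similarTo, Narrower, synonymWith) • Fixed set of edge labels (a.k.a. relations)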
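A thesaurus of this kind can be sketched as a graph whose edges carry labels drawn from a fixed set. The class and method names below are hypothetical, and the direction of the NarrowerTerm edge (Orange is narrower than Fruit) is an assumption read off the figure residue, not something the slide text states.

```python
from typing import NamedTuple

# The fixed set of edge labels (relations) named on the slide.
EDGE_LABELS = {"similarTo", "NarrowerTerm", "synonymWith"}


class Edge(NamedTuple):
    source: str
    label: str
    target: str


class Thesaurus:
    """A graph whose edges must carry one of the fixed labels."""

    def __init__(self) -> None:
        self.edges = []

    def add(self, source: str, label: str, target: str) -> None:
        # Reject any relation outside the fixed vocabulary of edge labels.
        if label not in EDGE_LABELS:
            raise ValueError(f"unknown relation: {label}")
        self.edges.append(Edge(source, label, target))

    def related(self, term: str, label: str) -> list:
        """Terms reachable from `term` by one edge with the given label."""
        return [e.target for e in self.edges
                if e.source == term and e.label == label]


thesaurus = Thesaurus()
thesaurus.add("Fruit", "similarTo", "Vegetable")
thesaurus.add("Orange", "NarrowerTerm", "Fruit")  # assumed direction
thesaurus.add("Orange", "synonymWith", "Apfelsine")

print(thesaurus.related("Orange", "synonymWith"))
```

Because the label set is closed, an edge such as `partOf` cannot be added without extending `EDGE_LABELS` first. That is the limitation the slide points to: a thesaurus can only say what its few relations allow.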
Unified Medical Language System (UMLS) • UMLS Metathesaurus: • 1 million biomedical concepts • 2.8 million concept names • from more than 100 controlled vocabularies and classifications • built by US National Library of Medicine
UMLS Source Vocabularies • MeSH – Medical Subject Headings • … • ICD – International Classification of Diseases • … • GO – Gene Ontology • … • FMA – Foundational Model of Anatomy • …
To reap the benefits of standardization • we need to make ONE SYSTEM out of many different terminologies • = UMLS “Semantic Network” • nearest thing to an “ontology” in the UMLS
UMLS SN • Alexa McCray, “An Upper Level Ontology for the Biomedical Domain”, Comparative and Functional Genomics, 4 (2003), 80-84.
UMLS SN • 134 Semantic Types • 54 types of edges (relations) • yielding a graph containing more than 6,000 edges
UMLS SN Top Level • entity | event • entity: physical object | conceptual entity • physical object: organism
conceptual entity • Organism Attribute • Finding • Idea or Concept • Occupation or Discipline • Organization • Group • Group Attribute • Intellectual Product • Language
Idea or Concept • Functional Concept • Qualitative Concept • Quantitative Concept • Spatial Concept: Body Location or Region; Body Space or Junction; Geographic Area; Molecular Sequence (Amino Acid Sequence, Carbohydrate Sequence, Nucleotide Sequence)
Lake Geneva • is an Idea or Concept
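The inference behind this slide can be sketched as a walk up the is_a chain. This is a minimal sketch under stated assumptions: typing Lake Geneva as a Geographic Area is the assignment the slide implies, the type list is read as a nested hierarchy with Geographic Area under Spatial Concept, and the function name is hypothetical.

```python
# Direct is_a links, read from the semantic type listing as a nested
# hierarchy. Typing "Lake Geneva" as a Geographic Area is an assumption
# implied by the slide rather than stated on it.
IS_A = {
    "Lake Geneva": "Geographic Area",
    "Geographic Area": "Spatial Concept",
    "Spatial Concept": "Idea or Concept",
    "Idea or Concept": "conceptual entity",
}


def ancestors(term: str) -> list:
    """Follow is_a links upward and return every type reached, in order."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


print(ancestors("Lake Geneva"))
# ['Geographic Area', 'Spatial Concept', 'Idea or Concept', 'conceptual entity']
```

Since is_a is transitive, anything typed as a Geographic Area inherits every type above it, so a real lake ends up classified as an idea.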
UMLS • Fingers is_a Body Location or Region • Hand is_a Body Part, Organ, or Organ Component • hand part_of body • BUT NOT • fingers part_of hand
Problem: Running together of concepts and entities in reality • bioinformatics à la UMLS SN (like many “knowledge engineering” disciplines) floats free from reality in a conceptual world of its own creation
Blood Pressure Ontology • The hydraulic equation: • BP = CO*PVR • arterial blood pressure (BP) is directly proportional to the product of blood flow (cardiac output, CO) and peripheral vascular resistance (PVR).
UMLS SN • blood pressure is an Organism Function • cardiac output is a Laboratory or Test Result or Diagnostic Procedure
BP = CO*PVR thus asserts that • blood pressure is proportional either to a laboratory or test result or to a diagnostic procedure
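The substitution that produces this reading can be sketched mechanically: replace each term in the hydraulic equation with its assigned semantic type. The function names are hypothetical. Only the two type assignments given above are used, and peripheral vascular resistance is left as a bare term because the slides give no type for it.

```python
# Semantic type assignments as quoted on the slides. No type is given for
# peripheral vascular resistance, so it is deliberately left out.
SEMANTIC_TYPE = {
    "blood pressure": "Organism Function",
    "cardiac output": "Laboratory or Test Result or Diagnostic Procedure",
}


def as_type(term: str) -> str:
    """Return the term's semantic type, or the term itself if none is known."""
    return SEMANTIC_TYPE.get(term, term)


def read_equation(lhs: str, factors: list) -> str:
    """Render 'lhs = product of factors' with terms replaced by their types."""
    rhs = " × ".join(as_type(f) for f in factors)
    return f"{as_type(lhs)} is proportional to {rhs}"


sentence = read_equation(
    "blood pressure", ["cardiac output", "peripheral vascular resistance"]
)
print(sentence)
```

The resulting sentence relates an organism function to a test result or procedure. The equation itself relates two physical magnitudes in the patient.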
Problem: Confusion of reality with our (ways of gaining) knowledge about reality
UMLS Semantic Network • entity • physical object | conceptual entity