The Unbearable Lightness of Biomedical Informatics

Barry Smith, Saarbrücken/Buffalo, http://ontologist.com. If Medical WordNet* is the solution, what is the problem? *Coling Proceedings, Vol. 1, pp. 371-380.


Presentation Transcript


  1. The Unbearable Lightness of Biomedical Informatics • Barry Smith • Saarbrücken/Buffalo • http://ontologist.com

  2. If Medical WordNet* is the solution, what is the problem? • *Coling Proceedings, Vol. 1, pp. 371-380

  3. Cerebellar tumor

  4. Organism, Organ, Tissue (10⁻¹ m) • Cell, Organelle (10⁻⁵ m) • Protein, DNA (10⁻⁹ m)

  5. The quantity-quality divide • 30,000 genes in human • 200,000 proteins • 100s of cell types • 100,000s of disease types • 1,000,000s of biochemical pathways (including disease pathways) • … legacy of Human Genome Project • … and of attempts to institute the electronic health record

  6. Organism, Organ, Tissue (10⁻¹ m) • Cell, Organelle (10⁻⁵ m) • Protein, DNA (10⁻⁹ m)

  7. FUNCTIONAL GENOMICS • proteomics • reactomics • metabonomics • toxicopharmacogenomics • phenomics • behaviouromics • …

  8. The method of annotations • Organism, Organ, Tissue (10⁻¹ m) • Cell, Organelle (10⁻⁵ m) • Protein, DNA (10⁻⁹ m)

  9. The method of indexing • Organism, Organ, Tissue (10⁻¹ m) • Cell, Organelle (10⁻⁵ m) • Protein, DNA (10⁻⁹ m)

  10. The Gene Ontology • menopause • sensitivity to blue light • heptolysis

  11. How do we overcome incompatibilities between different scientific index terms? • immunology • genetics • cell biology

  12. One answer: (statistical) computational linguistics • Pattern recognition based on string searches

  13. String searches need constraints • we can’t leave it to luck to overcome terminological incompatibilities

  14. Remember: different disciplines are using different terminologies to refer to the same objects, processes, features in reality • immunology • genetics • cell biology

  15. An alternative answer: • “Ontology”

  16. Ontology, roughly: • Overcome terminological incompatibilities by creating a standardized framework into which diverse vocabularies can be mapped
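A minimal sketch of the idea on slide 16: terms from several discipline vocabularies are mapped onto one shared identifier, so that two differently worded terms can be recognised as referring to the same thing. The disciplines are the ones named on the slides, but the terms, the `STD:` identifiers, and the function name `same_referent` are all invented for illustration and do not come from any real terminology.

```python
# Hypothetical mapping from (discipline, term) pairs to shared identifiers.
# Every term and identifier below is made up for illustration.
SHARED_FRAMEWORK = {
    ("immunology", "T lymphocyte"): "STD:0001",
    ("cell biology", "T cell"): "STD:0001",
    ("genetics", "thymus-derived lymphocyte"): "STD:0001",
    ("cell biology", "mitochondrion"): "STD:0002",
}


def same_referent(a, b):
    """Return True when both (discipline, term) pairs map to the same shared ID."""
    id_a = SHARED_FRAMEWORK.get(a)
    id_b = SHARED_FRAMEWORK.get(b)
    # Unmapped terms are never treated as equivalent to anything.
    return id_a is not None and id_a == id_b


print(same_referent(("immunology", "T lymphocyte"), ("cell biology", "T cell")))
print(same_referent(("cell biology", "T cell"), ("cell biology", "mitochondrion")))
```

The point of the design is that no string comparison between the two terms is involved: equivalence is decided only through the shared framework.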

  17. Kinds of Ontologies (diagram, Michael Gruninger) • Items shown: Terms • ‘ordinary’ Glossaries • ad hoc Hierarchies (Yahoo!) • Data Dictionaries (EDI) • Thesauri • structured Glossaries • Principled, informal hierarchies • XML DTDs • DB Schema • XML Schema • Data Models (UML, STEP) • formal Taxonomies • Frames (OKBC) • Description Logics (DAML+OIL) • General Logic • Groupings: Glossaries & Data Dictionaries; Thesauri, Taxonomies; MetaData, XML Schemas, & Data Models; Formal Ontologies & Inference

  18. Kinds of Ontologies • A shared vocabulary plus a specification of its intended meaning • Two extremes • Meaning specified explicitly in a logically rigorous way

  19. Kinds of Ontologies (diagram repeated from slide 17)

  20. Kinds of Ontologies • A shared vocabulary plus a specification of its intended meaning • Meaning specified explicitly in a logically rigorous way: too expensive

  21. Kinds of Ontologies • A shared vocabulary plus a specification of its intended meaning • Two extremes • Meaning specified informally via natural language

  22. Work on biomedical ontologies grew out of work on medical thesauri and nomenclatures

  23. Kinds of Ontologies (diagram repeated from slide 17)

  24. Fruit similarTo Vegetable • Orange NarrowerTerm Fruit • Orange synonymWith Apfelsine • Graph with labelled edges (similarTo, Narrower, synonymWith) • Fixed set of edge labels (a.k.a. relations) • Goble & Shadbolt
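The thesaurus on slide 24 can be sketched as a list of labelled edges whose labels must come from a fixed set. This is only an illustration: the edge directions are one reading of the flattened diagram, and the helper names `add_edge` and `related` are hypothetical.

```python
# The fixed set of edge labels (relations) named on the slide.
ALLOWED_RELATIONS = {"similarTo", "NarrowerTerm", "synonymWith"}


def add_edge(graph, source, relation, target):
    """Append a labelled edge, rejecting any label outside the fixed set."""
    if relation not in ALLOWED_RELATIONS:
        raise ValueError(f"unknown relation: {relation}")
    graph.append((source, relation, target))


def related(graph, term, relation):
    """Return the targets reachable from `term` over edges with this label."""
    return [t for (s, r, t) in graph if s == term and r == relation]


thesaurus = []
# Edge directions below are an assumed reading of the slide's diagram.
add_edge(thesaurus, "Fruit", "similarTo", "Vegetable")
add_edge(thesaurus, "Orange", "NarrowerTerm", "Fruit")
add_edge(thesaurus, "Orange", "synonymWith", "Apfelsine")

print(related(thesaurus, "Orange", "synonymWith"))
```

Because the label set is closed, a relation such as part_of cannot be expressed at all in this structure, which is the limitation the later slides turn on.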

  25. Unified Medical Language System (UMLS) • UMLS Metathesaurus: • 1 million biomedical concepts • 2.8 million concept names • from more than 100 controlled vocabularies and classifications • built by US National Library of Medicine

  26. UMLS Source Vocabularies • MeSH – Medical Subject Headings • … • ICD International Classification of Diseases • … • GO – Gene Ontology • … • FMA – Foundational Model of Anatomy • …

  27. To reap the benefits of standardization • we need to make ONE SYSTEM out of many different terminologies • =UMLS “Semantic Network” • nearest thing to an “ontology” in the UMLS

  28. UMLS SN • Alexa McCray, “An Upper Level Ontology for the Biomedical Domain”, Comparative and Functional Genomics, 4 (2003), 80-84.

  29. UMLS SN • 134 Semantic Types • 54 types of edges (relations) • yielding a graph containing more than 6,000 edges

  30. Fragment of UMLS SN

  31. UMLS SN Top Level • entity, event • entity: physical object, conceptual entity • physical object: organism

  32. conceptual entity • Organism Attribute • Finding • Idea or Concept • Occupation or Discipline • Organization • Group • Group Attribute • Intellectual Product • Language

  33. conceptual entity (list repeated from slide 32)

  34. Idea or Concept • Functional Concept • Qualitative Concept • Quantitative Concept • Spatial Concept: Body Location or Region; Body Space or Junction; Geographic Area; Molecular Sequence (Amino Acid Sequence; Carbohydrate Sequence; Nucleotide Sequence)

  35. Idea or Concept (list repeated from slide 34)

  36. Idea or Concept (list repeated from slide 34)

  37. Idea or Concept (list repeated from slide 34)

  38. Lake Geneva • is an Idea or Concept
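The conclusion on slide 38 follows from chaining is_a links upward: in the UMLS Semantic Network, Geographic Area sits under Spatial Concept, which sits under Idea or Concept, which is in turn a conceptual entity. A minimal sketch of that chain, assuming a single parent per node for simplicity and treating the assignment of Lake Geneva to Geographic Area as given; the function name `ancestors` is hypothetical.

```python
# Child -> parent is_a links; single parent per node is a simplification.
IS_A = {
    "Lake Geneva": "Geographic Area",
    "Geographic Area": "Spatial Concept",
    "Spatial Concept": "Idea or Concept",
    "Idea or Concept": "Conceptual Entity",
}


def ancestors(term):
    """Follow is_a links upward and return every ancestor, nearest first."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


print(ancestors("Lake Geneva"))
```

Since is_a is transitive, anything classified as a Geographic Area inherits every ancestor in the chain, including Idea or Concept.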

  39. Idea or Concept (list repeated from slide 34)

  40. UMLS • Fingers is_a Body Location or Region • Hand is_a Body Part, Organ, or Organ Component • hand part_of body • BUT NOT • fingers part_of hand

  41. Problem: Running together of concepts and entities in reality • Bioinformatics à la UMLS SN (like many “knowledge engineering” disciplines) floats free from reality in a conceptual world of its own creation

  42. Blood Pressure Ontology • The hydraulic equation: • BP = CO*PVR • arterial blood pressure (BP) is directly proportional to the product of blood flow (cardiac output, CO) and peripheral vascular resistance (PVR).
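A small worked example of the hydraulic equation on slide 42. The numeric values and units are illustrative assumptions chosen only to show the proportionality, not clinical reference figures, and the function name `blood_pressure` is hypothetical.

```python
def blood_pressure(cardiac_output, resistance):
    """BP = CO * PVR, as stated on the slide."""
    return cardiac_output * resistance


co = 5.0    # cardiac output in L/min (illustrative value)
pvr = 18.0  # peripheral vascular resistance in mmHg*min/L (illustrative value)

bp = blood_pressure(co, pvr)
bp_doubled = blood_pressure(2 * co, pvr)

# Doubling cardiac output at fixed resistance doubles the pressure.
print(bp, bp_doubled)
```

Both operands here are physical quantities with units; the equation only makes sense when CO denotes the flow itself and not a record or measurement procedure.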

  43. UMLS SN • blood pressure is an Organism Function • cardiac output is a Laboratory or Test Result or Diagnostic Procedure

  44. BP = CO*PVR thus asserts that • blood pressure is proportional either to a laboratory or test result or to a diagnostic procedure

  45. Problem: Confusion of reality with our (ways of gaining) knowledge about reality

  46. UMLS Semantic Network • entity • physical object • conceptual entity