Scale and Context: Issues in Ontologies to link Health- and Bio-Informatics. Alan Rector, Jeremy Rogers, Angus Roberts, Chris Wroe Bio and Health Informatics Forum/ Medical Informatics Group Department of Computer Science, University of Manchester
Contact and links: rector@cs.man.ac.uk • www.cs.man.ac.uk/mig • img.man.ac.uk • www.clinical-escience.org • www.opengalen.org
Organisation of Talk • Informal presentation, motivation & examples • Intro to logic based ontologies • How to use logic based ontologies to represent scales and context • Making context modular – normalisation • Recurrent distinctions • and tests for those distinctions • Making logic based ontologies usable • Views and Intermediate Representations • Summary
Example Problems of Context • Classification by multiple axes • e.g. Molecular action, physiologic, and pathological effects • Chloride transport & Cystic fibrosis • Biological Scope • e.g. Normal/Abnormal, Human/Mouse • Conceptual view • e.g. the Digital Anatomist Foundational Model of organs vs Clinical convention • Is the pericardium a part of the heart?
Basic Approach • Separate information into independent modules • Normalise the ontology • “The truth, the whole truth, and nothing but the truth” • Add explicit contextual information • Don’t distort the structure • Add context to it explicitly
Why use Logic-based Ontologies? Because Knowledge is Fractal! & Requirements are Diverse • Coherence without Uniformity!
Logic-based Ontologies: Conceptual Lego [Figure: separate building blocks to be combined: gene, protein, cell, expression, deletion, polymorphism, hand, extremity, body, lung, inflammation, infection, bacterial, ischaemic, chronic, acute, normal, abnormal]
Logic-based Ontologies: Conceptual Lego • “SNPolymorphism of CFTRGene causing Defect in MembraneTransport of ChlorideIon causing Increase in Viscosity of Mucus in CysticFibrosis…” • “Hand which is anatomically normal”
Logic based ontologies • A formalisation of semantic nets, frame systems, and object hierarchies via KL-ONE and KRL • “is-kind-of” = “implies” (“logical subsumption”) • “Dog is a kind of wolf” means “All dogs are wolves” • Modern examples: DAML+OIL / “OWL”(?) • Older variants: LOOM, CLASSIC, BACK, GRAIL, K-REP, …
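To make “is-kind-of = implies” concrete, here is a minimal sketch that treats each asserted is-kind-of link as an implication and chains them. The hierarchy is hypothetical: it extends the slide's “Dog is a kind of wolf” with assumed parents `Canid` and `Mammal`, and real description-logic reasoners such as those listed above do far more than this transitive closure.

```python
# Hypothetical told hierarchy: child -> set of direct parents.
# "Dog is a kind of wolf" is read as "every Dog is a Wolf" (Dog implies Wolf).
TOLD = {
    "Dog": {"Wolf"},
    "Wolf": {"Canid"},    # assumed for illustration
    "Canid": {"Mammal"},  # assumed for illustration
}


def subsumers(concept: str) -> set[str]:
    """Every concept implied by `concept`: the transitive closure of is-kind-of."""
    result: set[str] = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in TOLD.get(current, set()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def is_kind_of(child: str, parent: str) -> bool:
    """child is-kind-of parent  <=>  being a child implies being a parent."""
    return child == parent or parent in subsumers(child)


print(is_kind_of("Dog", "Mammal"))  # True: implications chain
print(is_kind_of("Mammal", "Dog"))  # False: implication does not run backwards
```

Because subsumption is implication, it is transitive but not symmetric, which is exactly what the two calls show.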
Logic Based Ontologies: The basics [Figure, partly recoverable: Primitives (Thing, Structure, Feature, pathological, Heart, MitralValve, Encrustation); Descriptions (e.g. Structure + feature: pathological; Thing + partOf: Heart; Encrustation + involves: Heart; Encrustation + involves: MitralValve); Definitions; Validating, i.e. constraining cross products (MitralValve * ALWAYS partOf: Heart; Encrustation * ALWAYS feature: pathological); Reasoning (e.g. Thing + partOf: Heart + (feature: pathological))]
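The reasoning step on this slide can be sketched as follows: given the ALWAYS constraints, a description such as “Encrustation + involves: MitralValve” should be classified under both “Encrustation + involves: Heart” and “Structure + feature: pathological”. This is a toy structural-subsumption check, not GRAIL or OWL. Two things are assumptions: the primitive tree (Encrustation under Structure under Thing), and the rule that `involves` propagates from a part to every whole it is ALWAYS `partOf`.

```python
from dataclasses import dataclass

# Hypothetical toy knowledge base reconstructed from the slide's example.
PARENT = {"Encrustation": "Structure", "Structure": "Thing"}  # primitive tree
ALWAYS = {  # validating constraints: "X * ALWAYS role: filler"
    "MitralValve": {("partOf", "Heart")},
    "Encrustation": {("feature", "pathological")},
}


@dataclass(frozen=True)
class Description:
    base: str                       # a primitive, e.g. "Encrustation"
    roles: frozenset = frozenset()  # criteria, e.g. {("involves", "Heart")}


def ancestors(primitive: str) -> list[str]:
    """The primitive itself plus all its primitive ancestors."""
    chain = [primitive]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def wholes_of(entity: str) -> set[str]:
    """Everything `entity` is ALWAYS (transitively) partOf."""
    wholes, frontier = set(), [entity]
    while frontier:
        for role, filler in ALWAYS.get(frontier.pop(), set()):
            if role == "partOf" and filler not in wholes:
                wholes.add(filler)
                frontier.append(filler)
    return wholes


def implied_roles(d: Description) -> set:
    """Told criteria + inherited ALWAYS constraints + involves propagated along partOf (assumed rule)."""
    roles = set(d.roles)
    for a in ancestors(d.base):
        roles |= ALWAYS.get(a, set())
    for role, filler in list(roles):
        if role == "involves":
            roles |= {("involves", whole) for whole in wholes_of(filler)}
    return roles


def subsumes(general: Description, specific: Description) -> bool:
    return general.base in ancestors(specific.base) and general.roles <= implied_roles(specific)


enc_valve = Description("Encrustation", frozenset({("involves", "MitralValve")}))
enc_heart = Description("Encrustation", frozenset({("involves", "Heart")}))
pathological_structure = Description("Structure", frozenset({("feature", "pathological")}))

print(subsumes(enc_heart, enc_valve))               # True
print(subsumes(pathological_structure, enc_valve))  # True
print(subsumes(enc_valve, enc_heart))               # False
```

Nothing about hearts or pathology is stated on `enc_valve` itself; both classifications come from the ALWAYS constraints, which is the point of letting the logic do the work.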
Bridging Bio and Health Informatics • Define concepts with ‘pieces’ from different scales and disciplines and then combine them • “Polymorphism which causes defect which causes disease” • Use concepts which make context explicit • “‘Hand which is anatomically normal’ has five fingers” • “‘Normal human prostate’ has three lobes” • Use different subproperties for different contexts • “Abnormalities of clinical parts of the heart”
Bridging Scales with Ontologies [Figure: scales Species, Genes, Function, Disease] • CFTRGene in humans • Protein coded by (CFTRGene & in humans) • Membrane transport mediated by (Protein coded by (CFTRGene & in humans)) • Disease caused by (abnormality in (Membrane transport mediated by (Protein coded by (CFTRGene & in humans))))
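The chain on this slide is built by nesting one description inside the next. A minimal sketch, assuming a simple structural subsumption with no primitive hierarchy (heads must match exactly), shows both the composition and why it bridges scales: the gene-level disease concept is automatically found by a coarser clinical query about membrane-transport diseases. The relation label `in species` is a hypothetical rendering of “& in humans”.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    head: str
    criteria: tuple = ()  # conjoined (relation, Concept) pairs

    def render(self) -> str:
        text = self.head
        for rel, filler in self.criteria:
            inner = filler.render()
            text += f" {rel} ({inner})" if filler.criteria else f" {rel} {inner}"
        return text


def subsumes(general: Concept, specific: Concept) -> bool:
    """Same head, and every criterion of `general` is matched by a criterion
    of `specific` with the same relation and a filler that `general` subsumes."""
    if general.head != specific.head:
        return False
    return all(
        any(rel == s_rel and subsumes(g_filler, s_filler) for s_rel, s_filler in specific.criteria)
        for rel, g_filler in general.criteria
    )


# Build the slide's chain one scale at a time: species/gene -> protein -> function -> disease.
gene = Concept("CFTRGene", (("in species", Concept("Human")),))
protein = Concept("Protein", (("coded by", gene),))
transport = Concept("MembraneTransport", (("mediated by", protein),))
disease = Concept("Disease", (("caused by", Concept("Abnormality", (("in", transport),))),))

# A coarser, clinical-level query that says nothing about genes or species.
transport_disease = Concept(
    "Disease", (("caused by", Concept("Abnormality", (("in", Concept("MembraneTransport")),))),)
)

print(disease.render())
print(subsumes(transport_disease, disease))  # True
print(subsumes(disease, transport_disease))  # False
```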
Use composition to express context • Normal and abnormal: Hand isSubdivisionOf some UpperExtremity; Hand & AnatomicallyNormal hasSubdivision exactly-5 fingers • Homologies and Orthologies: Thumb of Hand of Human hasFeature Opposable; Thumb of Hand of NonHumanPrimate ¬hasFeature Opposable
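Scoping the constraint to “Hand & AnatomicallyNormal” rather than to “Hand” is what lets an abnormal hand exist without contradiction. The sketch below checks hypothetical individuals against the context-qualified rule. Note it is closed-world (it trusts the recorded finger count), whereas an OWL reasoner is open-world and would only report an inconsistency if the extra fingers were known to be distinct individuals.

```python
from dataclasses import dataclass


@dataclass
class Hand:
    name: str
    finger_count: int
    types: frozenset = frozenset({"Hand"})


# The exactly-5 constraint applies only in the context "Hand & AnatomicallyNormal".
CONTEXTUAL_CONSTRAINTS = [
    (frozenset({"Hand", "AnatomicallyNormal"}), 5),
]


def check(hand: Hand) -> list[str]:
    """Closed-world toy check: report violations of context-scoped cardinalities."""
    errors = []
    for required_types, exact in CONTEXTUAL_CONSTRAINTS:
        if required_types <= hand.types and hand.finger_count != exact:
            errors.append(f"{hand.name}: expected exactly {exact} fingers, found {hand.finger_count}")
    return errors


normal = Hand("h1", 5, frozenset({"Hand", "AnatomicallyNormal"}))
polydactylous = Hand("h2", 6)  # just a Hand: outside the context, no contradiction
wrong = Hand("h3", 6, frozenset({"Hand", "AnatomicallyNormal"}))

print(check(normal))         # []
print(check(polydactylous))  # []
print(check(wrong))          # ['h3: expected exactly 5 fingers, found 6']
```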
More detailed example [Figure: male mammal Body has some Prostate; male human Body has =1 Prostate with =3 Lobes (L1, L2, L3); male mouse Body has a Prostate with =5 lobes (P1 to P5)]
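In a logic-based ontology the species-specific statements refine the mammal-level one rather than overriding it: constraints are conjoined down the hierarchy. This sketch, under a hypothetical encoding of the figure as (min, max) cardinality intervals, computes the effective constraints for each context by intersecting intervals, and raises an error if a refinement would contradict an ancestor.

```python
import math

# Hypothetical encoding of the figure. "Prostate" = prostates of the body,
# "Prostate/Lobe" = lobes of that prostate; values are (min, max) cardinalities.
PARENT = {"HumanMaleBody": "MammalMaleBody", "MouseMaleBody": "MammalMaleBody"}
LOCAL = {
    "MammalMaleBody": {"Prostate": (1, math.inf)},                   # "some Prostate"
    "HumanMaleBody": {"Prostate": (1, 1), "Prostate/Lobe": (3, 3)},  # "=1", "=3"
    "MouseMaleBody": {"Prostate/Lobe": (5, 5)},                      # "=5"
}


def effective(context: str) -> dict:
    """Conjoin a context's constraints with those of all its ancestors."""
    chain = []
    node = context
    while node is not None:
        chain.append(node)
        node = PARENT.get(node)
    result = {}
    for ctx in reversed(chain):  # most general context first
        for key, (lo, hi) in LOCAL.get(ctx, {}).items():
            old_lo, old_hi = result.get(key, (0, math.inf))
            new_lo, new_hi = max(old_lo, lo), min(old_hi, hi)
            if new_lo > new_hi:
                raise ValueError(f"{ctx}: unsatisfiable cardinality for {key}")
            result[key] = (new_lo, new_hi)
    return result


print(effective("HumanMaleBody"))  # {'Prostate': (1, 1), 'Prostate/Lobe': (3, 3)}
print(effective("MouseMaleBody"))  # {'Prostate': (1, inf), 'Prostate/Lobe': (5, 5)}
```

Intersection (rather than replacement) is the design choice that keeps the human and mouse contexts coherent with the shared mammal context.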
Represent context and views by variant properties [Figure: OrganPart and Organ; Heart, CardiacValve, Pericardium; is_part_of with variants is_structurally_part_of and is_clinically_part_of; Disease of Pericardium classified under Disease of part_of Heart via is_clinically_part_of]
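Variant properties give each view its own answer to “is the pericardium part of the heart?”. A minimal sketch with a hypothetical property hierarchy follows. It assumes, as a modelling choice not stated on the slide, that every structural part also counts as a clinical part, so `is_structurally_part_of` sits under `is_clinically_part_of`, which sits under `is_part_of`.

```python
# Hypothetical property hierarchy: property -> its super-property.
# Assumption: every structural part also counts as a clinical part.
SUPER = {
    "is_structurally_part_of": "is_clinically_part_of",
    "is_clinically_part_of": "is_part_of",
}
FACTS = [
    ("CardiacValve", "is_structurally_part_of", "Heart"),
    ("Pericardium", "is_clinically_part_of", "Heart"),  # clinical, not structural, part
]


def entails(prop, query_prop):
    """True if asserting `prop` also asserts `query_prop` (walk up the hierarchy)."""
    while prop is not None:
        if prop == query_prop:
            return True
        prop = SUPER.get(prop)
    return False


def parts(whole, view):
    return sorted(p for p, prop, w in FACTS if w == whole and entails(prop, view))


def diseases_of_heart_parts(view):
    return [f"Disease of {p}" for p in parts("Heart", view)]


print(parts("Heart", "is_structurally_part_of"))  # ['CardiacValve']
print(parts("Heart", "is_clinically_part_of"))    # ['CardiacValve', 'Pericardium']
print(diseases_of_heart_parts("is_clinically_part_of"))
```

The structural (FMA-style) view and the clinical view share the same facts; only the property queried differs.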
What we want to avoid: combinatorial explosions • The “Exploding Bicycle”: From “phrase book” to “dictionary + grammar” • 1980 - ICD-9 (E826): 8 • 1990 - READ-2 (T30..): 81 • 1995 - READ-3: 87 • 1996 - ICD-10 (V10-19 Australian): 587 • V31.22 Occupant of three-wheeled motor vehicle injured in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income • and meanwhile elsewhere in ICD-10 • W65.40 Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity • X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities
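The arithmetic behind “phrase book” versus “dictionary + grammar” is product versus sum. The axis sizes below are hypothetical, loosely in the spirit of the ICD-10 codes quoted above; they are not how the slide's counts were derived.

```python
from math import prod

# Hypothetical axes and sizes, for illustration only.
AXES = {
    "victim vehicle": 10,
    "counterpart": 10,
    "person role": 5,
    "traffic / non-traffic": 2,
    "activity": 5,
}

phrase_book = prod(AXES.values())  # pre-coordinated: one code per combination
dictionary = sum(AXES.values())    # compositional: one term per axis value, plus a grammar

print(phrase_book, dictionary)  # 5000 32
```

Adding one more axis multiplies the phrase book but only adds to the dictionary, which is why enumerated code lists keep exploding.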
The Cost 1: Normalising (untangling) Ontologies [Figure: a tangled hierarchy mixing Structure, Function and Part-whole, separated into independent Structure, Function and Part-whole trees]
The Cost 1: Normalising (untangling) Ontologies: Making each meaning explicit and separate [Figure: the tangled version has PhysSubstance over Protein, Steroid, Hormone and Catalyst, with ProteinHormone, Insulin, SteroidHormone and Enzyme each appearing under more than one parent (marked ^); the untangled version keeps simple trees of primitives, substances (Substance, BodySubstance, Protein, Insulin, Steroid, …) and roles (ActionRole, PhysiologicRole, HormoneRole, CatalystRole, …), and marks the multiply classified concepts (‘Hormone’, ‘ProteinHormone’, ‘SteroidHormone’, ‘Catalyst’, ‘Enzyme’) as defined] • …build it all by combining simple trees • Hormone = Substance & playsRole-HormoneRole • ProteinHormone = Protein & playsRole-HormoneRole • SteroidHormone = Steroid & playsRole-HormoneRole • Catalyst = Substance & playsRole-CatalystRole • Insulin playsRole HormoneRole • Enzyme ?=? Protein & playsRole-CatalystRole
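“Build it all by combining simple trees” can be sketched directly: single-parent primitive trees, plus definitions of the form base & playsRole-X, let the multiple parents of Insulin be inferred rather than asserted. The primitive placement (Insulin under Protein, the roles under a root `Role`) is an assumption for illustration; Enzyme is left undefined, matching the slide's “?=?”.

```python
# Hypothetical normalised primitives: each has exactly one primitive parent.
PRIMITIVE_PARENT = {
    "Protein": "Substance",
    "Steroid": "Substance",
    "Insulin": "Protein",
    "HormoneRole": "Role",
    "CatalystRole": "Role",
}
# Definitions (necessary and sufficient): name -> (base primitive, role played).
DEFINITIONS = {
    "Hormone": ("Substance", "HormoneRole"),
    "ProteinHormone": ("Protein", "HormoneRole"),
    "SteroidHormone": ("Steroid", "HormoneRole"),
    "Catalyst": ("Substance", "CatalystRole"),
}
# Assertion from the slide: "Insulin playsRole HormoneRole".
PLAYS_ROLE = {"Insulin": {"HormoneRole"}}


def ancestors(name):
    chain = [name]
    while chain[-1] in PRIMITIVE_PARENT:
        chain.append(PRIMITIVE_PARENT[chain[-1]])
    return chain


def classify(primitive):
    """Defined concepts a primitive falls under: its extra parents are inferred."""
    bases = ancestors(primitive)
    roles = set()
    for b in bases:  # roles played by an ancestor are inherited
        for r in PLAYS_ROLE.get(b, set()):
            roles.update(ancestors(r))
    return sorted(n for n, (base, role) in DEFINITIONS.items() if base in bases and role in roles)


def subsumes_defined(general, specific):
    g_base, g_role = DEFINITIONS[general]
    s_base, s_role = DEFINITIONS[specific]
    return g_base in ancestors(s_base) and g_role in ancestors(s_role)


print(classify("Insulin"))                            # ['Hormone', 'ProteinHormone']
print(subsumes_defined("Hormone", "ProteinHormone"))  # True
```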
Normalisation: Building ontologies from orthogonal trees • Each tree is homogeneous and based on subsumption • One principle – one of function, structure, cause,… • Every primitive has exactly 1 primitive parent • All multiple classification done by the logic • All self-standing primitives disjoint
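Two of these rules are mechanically checkable. The sketch below, with hypothetical input formats, flags any non-root primitive without exactly one primitive parent, and any pair of sibling primitives not declared disjoint. It checks sibling disjointness only, which is a simplification of “all self-standing primitives disjoint”.

```python
from collections import defaultdict
from itertools import combinations


def normalisation_issues(parents, roots, disjoint_pairs):
    """parents: primitive -> list of primitive parents
    roots: primitives allowed to have no parent
    disjoint_pairs: set of frozensets naming declared-disjoint primitives"""
    issues = []
    for prim, ps in parents.items():
        expected = 0 if prim in roots else 1
        if len(ps) != expected:
            issues.append(f"{prim} has {len(ps)} primitive parents")
    children = defaultdict(list)
    for prim, ps in parents.items():
        for p in ps:
            children[p].append(prim)
    for parent, kids in children.items():
        for a, b in combinations(sorted(kids), 2):
            if frozenset((a, b)) not in disjoint_pairs:
                issues.append(f"siblings {a} and {b} under {parent} are not declared disjoint")
    return issues


tangled = {"Substance": [], "Protein": ["Substance"], "Hormone": ["Substance"],
           "ProteinHormone": ["Protein", "Hormone"]}
print(normalisation_issues(tangled, {"Substance"}, set()))

normalised = {"Substance": [], "Protein": ["Substance"], "Steroid": ["Substance"]}
print(normalisation_issues(normalised, {"Substance"}, {frozenset(("Protein", "Steroid"))}))  # []
```

In the tangled input, ProteinHormone should instead be a defined concept so that its second parent is inferred.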
The Cost 2: Clean Distinctions & Tests • Repeating patterns within levels • Structures vs Substances • Flavours of part-whole • Part-whole vs containment, connection, branching • Process/Event vs Thing (“Perdurant” vs “Endurant”) • … • Repeating patterns across levels • Multiples at one level act as substances at the next • Substances span levels; structures are specific to a level
Repeating Patterns within each level • Structures vs Substances (Discrete vs Mass) • Structures are made of substances • Organs are made of tissue • Parts & portions • Structures have parts & subdivisions,… • Substances have portions • Portions can have proportions & concentrations
Tests • Structures (Discrete) • Can you count it? Is one part different from another? Is it made of something(s)? • Books, organs, ideas, individual cells, organisations, … • Substance (Mass) • Are all bits the same? Can something be made of it? Can you talk about “A piece of it”? “A lump of it”? “A stream of it”? … • Water, sodium, tissue, blood, …
Repeating Patterns within each level • Part-whole vs containment • Parthood is organisational • The wall is part of the cell; • The cornea is part of the eye • Containment is physical • The inclusion is contained in the cell • The marrow is contained in the bone • Often occur together • Nucleus is a part of and contained in the cell • The retina is part of and contained in the eye
Tests • Parts • If I take the part away, is the whole incomplete? • If the part is damaged is the whole damaged? • If I do something to the part do I do something to the whole? • Containment • Is the contained thing inside the container? • Is the relationship spatial/physical? (or temporal?)
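The damage test can be made operational: damage to a part propagates to every whole it is (transitively) part of, while containment is not followed at all. The facts below come from the slide's examples; the names, and treating the test as a propagation rule, are illustrative assumptions.

```python
# Facts from the slide's examples (names are illustrative).
PART_OF = {("CellWall", "Cell"), ("Nucleus", "Cell"), ("Cornea", "Eye"), ("Retina", "Eye")}
CONTAINED_IN = {("Inclusion", "Cell"), ("Marrow", "Bone"), ("Nucleus", "Cell"), ("Retina", "Eye")}


def damaged_if_damaged(entity):
    """Parts test: damage to a part damages each (transitive) whole.
    CONTAINED_IN is deliberately not followed: containment is spatial, not organisational."""
    damaged, frontier = {entity}, [entity]
    while frontier:
        current = frontier.pop()
        for part, whole in PART_OF:
            if part == current and whole not in damaged:
                damaged.add(whole)
                frontier.append(whole)
    return damaged


def relations(a, b):
    """Which of the two relations hold between a and b (they often occur together)."""
    return [name for name, rel in (("part_of", PART_OF), ("contained_in", CONTAINED_IN)) if (a, b) in rel]


print(sorted(damaged_if_damaged("Cornea")))  # ['Cornea', 'Eye']
print(sorted(damaged_if_damaged("Marrow")))  # ['Marrow']
print(relations("Nucleus", "Cell"))          # ['part_of', 'contained_in']
```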
Repeating Patterns bridging levels • Multiples of structures at one level behave as substances at the next • “Blood is made in part of a multiple of red cells” • “Tissue is made in part of a multiple of cells” • “A rash is a multiple of spots” • “Polyposis is a multiple of polyps” • “A flock is a multiple of birds” • Multiples are not Sets • Not defined by members • Membership can change (intensional rather than extensional) • Action on the singleton is not action on the multiple; Action on the whole is (usually) action on the singletons • If I treat a spot, I do not treat the rash • If I treat the rash, I treat the spots
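The difference between a multiple and a set can be sketched with a membership criterion instead of a member list. In this hypothetical model, a rash is defined as “the spots on the left arm”, so new spots join it without redefining it; treating the rash reaches its current spots, but treating one spot does not treat the rash.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Spot:
    """A singleton: countable, individually treatable."""
    name: str
    location: str
    treated: bool = False


@dataclass
class Rash:
    """A multiple: membership is fixed by a criterion (intensional), not by a member list."""
    criterion: Callable[[Spot], bool]
    treated: bool = False

    def members(self, spots: list[Spot]) -> list[Spot]:
        return [s for s in spots if self.criterion(s)]


def treat_spot(spot: Spot) -> None:
    spot.treated = True  # acting on one singleton does not act on the multiple


def treat_rash(rash: Rash, spots: list[Spot]) -> None:
    rash.treated = True
    for spot in rash.members(spots):  # acting on the multiple reaches its current singletons
        spot.treated = True


spots = [Spot("s1", "left arm"), Spot("s2", "left arm"), Spot("s3", "back")]
rash = Rash(criterion=lambda s: s.location == "left arm")

treat_spot(spots[0])
print(rash.treated)                   # False
treat_rash(rash, spots)
print([s.treated for s in spots])     # [True, True, False]
spots.append(Spot("s4", "left arm"))  # membership changes without redefining the rash
print(len(rash.members(spots)))       # 3
```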
Tests • Multiples • Name for the singleton – “grain”, “cell”, “bird”? • Singletons are countable? • Multiple is measurable rather than countable? • Odd to say part-of “This cell is part of the Arm”?
But make it simple • Intermediate representations and views • OWL + Detailed Schema is the Assembler Language • FaCT/SHIQ/… is the machine code • Almost no one writes in assembler • let alone machine code • Separate “terms” and “concepts” • Language/labels from concepts
Layered Architecture [Figure: components DL; Indexed KB (Frame Like); Intermediate Representation; Language; Metadata; Versioning; Provenance; Links to Resources; Tools: Protégé + “OilEd-II” + …?]
Example: An Intermediate Representation for Surgery • "Open fixation of a fracture of the neck of the left femur"
MAIN fixing
    ACTS_ON fracture
        HAS_LOCATION neck of long bone
            IS_PART_OF femur
                HAS_LATERALITY left
    HAS_APPROACH open
The formal “assembler” version
(‘SurgicalProcess’ which isMainlyCharacterisedBy
    (performance which isEnactmentOf
        (‘SurgicalFixing’ which
            hasSpecificSubprocess
                (‘SurgicalAccessing’ hasSurgicalOpenClosedness
                    (SurgicalOpenClosedness which hasAbsoluteState surgicallyOpen))
            actsSpecificallyOn
                (PathologicalBodyStructure which <
                    involves Bone
                    hasUniqueAssociatedProcess FracturingProcess
                    hasSpecificLocation
                        (Collum which isSpecificSolidDivisionOf
                            (Femur which hasLeftRightSelector leftSelection))>))))
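The gap between the two versions can be sketched as a compiler pass: the intermediate representation is a small tree, and an expander rewrites its links into formal properties. The lookup tables below are hypothetical. They borrow class and property names from the formal version, but deliberately flatten it: the real expansion adds structure such as the SurgicalAccessing subprocess and the FracturingProcess, which this sketch omits.

```python
# Hypothetical lookup tables (names borrowed from the formal version, mapping simplified).
CONCEPTS = {
    "fixing": "SurgicalFixing",
    "fracture": "PathologicalBodyStructure",
    "neck of long bone": "Collum",
    "femur": "Femur",
    "left": "leftSelection",
    "open": "surgicallyOpen",
}
LINKS = {
    "ACTS_ON": "actsSpecificallyOn",
    "HAS_LOCATION": "hasSpecificLocation",
    "IS_PART_OF": "isSpecificSolidDivisionOf",
    "HAS_LATERALITY": "hasLeftRightSelector",
    "HAS_APPROACH": "hasSurgicalOpenClosedness",
}

# The intermediate representation as nested (concept, [(link, child), ...]) pairs.
IR = ("fixing", [
    ("ACTS_ON", ("fracture", [
        ("HAS_LOCATION", ("neck of long bone", [
            ("IS_PART_OF", ("femur", [
                ("HAS_LATERALITY", ("left", [])),
            ])),
        ])),
    ])),
    ("HAS_APPROACH", ("open", [])),
])


def expand(node):
    """Rewrite an IR node into a nested '(X which rel Y ...)' expression."""
    concept, links = node
    name = CONCEPTS[concept]
    if not links:
        return name
    body = " ".join(f"{LINKS[link]} {expand(child)}" for link, child in links)
    return f"({name} which {body})"


print(expand(IR))
```

Authors write the six-line tree; the expander, not the author, is responsible for the bracketed detail.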
Result • Training time: 3 mo → 3 days + 3 days • Productivity: 25/day → 100/day • Central reconciliation: 50%+ → 10% • Local cycle time: 3 months → <1 week • “Dependencies”: High → Low • Author satisfaction: Low → High • Disputes: Frequent → Rare • Repeatability: Low → High • Even Pre Web!
Navigation vs Retrieval/Reference: “Access terminology” & “Reference terminology” • Access follows model of use • e.g. MeSH, MEDCIN • Hierarchy is what is needed next “to hand” • Easy for people; hard for software • Retrieval follows model of meaning • Logic based ontologies • Hierarchy means “is-kind-of” / subsumption • People may find it odd; easy for software • Need Both, & visualisations of both • The logic based structure isn’t enough • Views and intermediate representations
What’s in a View / Intermediate Representation? • Language: linguistic generation & search • User Oriented Structures: semantic transformations & Filters • Explicit Context in Ontology: “Assembler”
Summary: Let the logic engine do the work • Logic based ontologies can bridge granularities & represent context explicitly • And manage the potential combinatorial explosions • To do so • Views and Interface – usable, flexible & easy to learn • Entry, Navigation, & Use are different • Structure – explicit & modular – “Normalised” • Conception – clean testable distinctions • Tools & Architecture – layered & comprehensive • The logic is the assembly language
Some Healthcare Terminologies • ICD-9/10 • Traditional paper thesauri • -CM versions essential for billing (and -AM) • CPT – Current Procedural Terminology • “Simple” list • Clinical Terms (Read Codes) V2 • Simple hierarchy • Still dominant in UK general practice • SNOMED-CT • At least “logic assisted” • Political questions… • NCI Cancer Ontology • “Logic based in parts” – work in progress
Others • Standards Related • LOINC – laboratory data • Increasingly structured – “logic assisted” aspirations • HL7 Vocabulary TC • Specialised vocabularies – Inspiration for OHT • Links to RxNorm • SNOMED DICOM Microglossary (SDM) • Image related information – not related to SNOMED • Open Source • OpenGALEN Common Reference Model • Logic based – multilingual – a resource rather than a terminology • Basis of UK Drug Ontology • Open Health Terminology • Watch this space • Focusing on UMLS • Likely to be at least “logic assisted”
Special Purpose • Anatomy • Digital Anatomist Foundational Model of Anatomy (FMA) • Principled frame based representation • Superb reference point for structural anatomy • Needs functional and clinical supplements • http://sig.biostr.washington.edu/projects/da/ • Drugs • RxNorm and VA projects • See Steve Brown & Stuart Nelson • UK Primary Care Drug Dictionary • UKCPRS (Secondary Care) • Drug Ontology (OpenGALEN based) • MedDRA, FDA, Proprietary, …, …, …
Unified Medical Language System (UMLS) • Common reference point and link to MeSH Terms and literature • De facto standard for universal identifiers • Concept Unique Identifiers (CUIs) • Lexical Unique Identifiers (LUIs) • String Unique Identifiers (SUIs) • Valuable in itself: Huge resource for mining and restructuring • Udo Hahn and Stefan Schulz: “CoMMeT – Conceptual Model of Medical Terminology” • http://www.coling.uni-freiburg.de/pub/schulz/commet/ • Alexa McCray is speaking next
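The three identifier layers separate strings, lexical forms, and concepts, echoing “separate terms and concepts”. In this sketch every distinct string gets a SUI, strings that normalise alike share a LUI, and strings curated as synonyms share a CUI. The normaliser is a crude stand-in (lowercase, drop punctuation, sort words); the real UMLS normalisation, done with the SPECIALIST lexical tools, also handles inflectional variants, among other things. The S1/L1/C1 identifiers are hypothetical, not real UMLS identifiers.

```python
import re


def normalise(s):
    """Crude stand-in for UMLS lexical normalisation: lowercase, drop punctuation, sort words."""
    words = re.findall(r"[a-z0-9]+", s.lower())
    return " ".join(sorted(words))


# Hypothetical synonym table: which strings name the same concept (curated in UMLS).
CONCEPT_OF = {
    "Cystic Fibrosis": "cystic fibrosis",
    "cystic fibrosis": "cystic fibrosis",
    "Fibrosis, Cystic": "cystic fibrosis",
    "Mucoviscidosis": "cystic fibrosis",
}


def assign_ids(strings):
    """Return (CUI, LUI, SUI, string) rows with hypothetical sequential identifiers."""
    sui, lui, cui = {}, {}, {}
    rows = []
    for s in strings:
        sui.setdefault(s, f"S{len(sui) + 1}")
        norm = normalise(s)
        lui.setdefault(norm, f"L{len(lui) + 1}")
        concept = CONCEPT_OF[s]
        cui.setdefault(concept, f"C{len(cui) + 1}")
        rows.append((cui[concept], lui[norm], sui[s], s))
    return rows


for row in assign_ids(["Cystic Fibrosis", "cystic fibrosis", "Fibrosis, Cystic", "Mucoviscidosis"]):
    print(row)
```

The first three strings collapse to one lexical form; “Mucoviscidosis” is lexically unrelated but still joins the same concept, which only a curated synonym link can establish.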