Knowledge Driven Software & “Fractal Tailoring”: Ontologies in Development Environments for Clinical Systems Where are the boundaries of ontology? How to get back where we were in 1985?. Alan Rector School of Computer Science, University of Manchester rector@cs.manchester.ac.uk
Knowledge Driven Software & “Fractal Tailoring”: Ontologies in Development Environments for Clinical Systems. Where are the boundaries of ontology? How to get back where we were in 1985? • Alan Rector • School of Computer Science, University of Manchester, rector@cs.manchester.ac.uk • http://www.cs.manchester.ac.uk/~rector
Background: A common set of problems observed in implementing practical systems • Two industrial projects & one major research application using Ontologies & OWL with software • Documentation & order entry software • Make the right information / forms / widgets available at the right time, tailored to patient, task and setting • Minimize cognitive overload • Links to many medical terminologies • ICD, SNOMED, Ontology for Clinical Research and other statistical ontologies, National Center for Biomedical Ontology, GALEN, … • Work on Protégé-OWL, OWL, and related formalisms • And living in a hot-bed of DL experts • Experiments with NHS National Programme for IT standards & specifications • Plus a question from one of the most prominent and successful researchers in Health Informatics • Why can’t we get back to 1985? (hands thrown up in despair) • Zak Kohane: Harvard based, MIT trained in AI & CS, amongst the most experienced & successful NIH bioinformatics and translational medicine researchers
… and the obvious observation: Ontologies have had little impact on software • Artifacts called “Ontologies” have become ubiquitous • “Everybody thinks they need one” – sometimes just a synonym for good • …but… (outside this room) • Ontological methods and OWL remain niche markets • Compare to “Model Driven Architecture” (MDA) • Where are the “Ontology Driven Architectures”? • Constant questions about how OWL relates to UML • … and few helpful responses • Constant queries about relevance of ontologies • Minimal impact outside of “annotation” • And limited there • Compare with XML, UML, and even RDF(S) • Why?
Plan of this talk • Where I come from • And my slant on the history of KR and Ontologies in Information Systems • Our use cases • And why clinical systems are hard • Architecture issues • Dual use of ontologies and “Fractal tailoring” • Ontologies, data structures, and user interfaces • Knowledge issues • Language • Generic Knowledge Representation & Contingent Knowledge • Ontology issues • Ontology issues that do (and don’t) matter to clinical systems • What’s in a code • Metadata, Annotations, Higher order representations • Evidence for choosing options, evaluation and quality assurance • Summary of requirements and issues
Where I come from • [Diagram labels: Clinical Terminology; Data Entry; Electronic Health Records (Clinical Record); Clinical Research; Decision Support & Knowledge Presentation; Healthcare; Best Practice; GALEN; Ontologies & Description Logics]
Guidelines, Patterns, Tools, Views, Transformations … and a long struggle with: Poor fit between problem & solution spaces • [Diagram: Problem space vs Solution space]
Three guiding principles • The user is always right… but the user is usually wrong • about the problem space – the problems they have • about the solution space – how to fix them • There is no one way! • But there are wrong ways • Enumeration does not scale • Medicine is a field of niches • Easily lead to combinatorial explosions
… and pragmatic software development Clinergy/ Pen&PAD (1997)
>50K potential forms and subforms From a tiny KB based on a normalised ontology
The scaling problem: The combinatorial explosion • It keeps happening! • “Simple” brute force solutions do not scale up! • Conditions × anatomy × modifiers × task × setting × user type × … • Huge number of niches • Terms to author / Data structures to specify / GUI screens to construct • Software CHAOS • Massive indexing • Massive task for quality assurance • [Chart: Predicted vs Actual]
The (combinatorially) exploding bicycle (codes for injuries involving cyclists) • 1972 ICD-9 (E826): 8 • READ-2 (T30..): 81 • READ-3: 87 • 1999 ICD-10: …… • ICD = International Classification of Diseases
1999 ICD10: 587 codes • V31.22 Occupant of three-wheeled motor vehicle injured in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income • W65.40 Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity • X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities
Beating the Combinatorial Explosion with “Conceptual Lego” • [Figure: building-block terms: hand, extremity, body, normal, abnormal, gene, protein, polysaccharide, cell, expression, lung, acute, chronic, infection, inflammation, ischaemic, bacterium, virus, mucus, deletion, polymorphism]
A grammar rather than a phrase book: Composition rather than enumeration • “SNPolymorphism of CFTRGene causing Defect in MembraneTransport of Chloride Ion causing Increase in Viscosity of Mucus in CysticFibrosis…” • “Hand which is anatomically normal”
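A rough way to see why a grammar beats a phrase book: with a few independent axes, the number of pre-enumerated codes grows as the product of the axis sizes, while the number of primitives to author grows only as their sum. The axes and values below are invented for illustration, loosely echoing the ICD-10 cycling codes; they are not taken from any real coding system.

```python
from itertools import product

# Hypothetical axes for injury codes; the values are illustrative only.
AXES = {
    "victim": ["pedal cyclist", "pedestrian", "motor vehicle occupant"],
    "counterpart": ["pedal cycle", "car", "bus", "fixed object"],
    "place": ["traffic", "nontraffic"],
    "activity": ["sport", "leisure", "working for income", "resting"],
}


def enumerated_codes(axes):
    """Phrase book: one pre-coordinated code per combination of values."""
    return [" / ".join(combo) for combo in product(*axes.values())]


def primitive_count(axes):
    """Grammar: only the primitives are authored; combinations are composed on demand."""
    return sum(len(values) for values in axes.values())


codes = enumerated_codes(AXES)
print(len(codes), "enumerated codes vs", primitive_count(AXES), "primitives")
```

Adding one more value to any axis adds a single primitive but multiplies the enumerated list, which is the pattern behind the growth from ICD-9 to ICD-10.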
Normalisation & Modularisation: Building complex representations from modularised primitives – Reasoner as terminology compiler • [Diagram: modules for Species, Genes, Protein, Function and Disease composed stepwise: Gene in humans; Protein coded by gene in humans; Function of Protein coded by gene in humans; Disease caused by abnormality in Function of Protein coded by gene in humans]
Our use cases and what role ontologies and reasoners play in solving them
In general, Why might one want a “Knowledge Driven Architecture”? • Consistency of vocabulary and meaning • Controlled vocabulary and ID management • Composition of new entities from old (“post coordination”) • Adaptability & context sensitivity • Dynamic extension of data structures • Scalable maintenance and localisation • Common context-sensitive index for all resources • Transparent declarative representation • Logical Organisation, indexing & Consistency checking • Changes made declaratively in exactly one place with predictable consequences
Fundamental Approach: Dual use of ontologies • Content of the information system: What is carried by the data structures • The model of meaning for the information • What can be said and when • Classifier acts as a “Terminology Compiler” • Index to the information system: Which data structures / UIs / Procedures to use when • Brachman’s “conceptual coatrack” • “Fractal tailoring” & dynamic assembly based on context • Context may include setting, task, user, etc. • Classifier acts as an “Index Compiler” • for components to assemble
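The indexing role can be sketched in a few lines: resources are attached to (condition, setting) pairs, and the most specific pair that subsumes the current context wins. This is only a minimal illustration; the hierarchy, the form names and the `select_form` helper are all hypothetical, and a real system would take the hierarchy from a classifier rather than a hand-written table.

```python
# Hypothetical is-a hierarchy (child -> parents). A real system would obtain
# this from a classifier rather than write it by hand.
PARENTS = {
    "IdiopathicHypertension": ["Hypertension"],
    "Hypertension": ["Disorder"],
    "Disorder": [],
    "Phase2Study": ["Study"],
    "Study": [],
}

# Resources indexed by (condition, setting); the form names are invented.
FORMS = {
    ("Disorder", "Study"): "generic_disorder_form",
    ("Hypertension", "Study"): "hypertension_form",
    ("IdiopathicHypertension", "Phase2Study"): "idiopathic_htn_phase2_form",
}


def ancestors(concept):
    """Map the concept and each of its ancestors to its distance from the concept."""
    found, frontier, depth = {concept: 0}, [concept], 0
    while frontier:
        depth += 1
        next_frontier = []
        for current in frontier:
            for parent in PARENTS.get(current, []):
                if parent not in found:
                    found[parent] = depth
                    next_frontier.append(parent)
        frontier = next_frontier
    return found


def select_form(condition, setting):
    """Return the form indexed at the closest subsuming (condition, setting) pair."""
    cond_anc, set_anc = ancestors(condition), ancestors(setting)
    matches = [
        (cond_anc[c] + set_anc[s], form)
        for (c, s), form in FORMS.items()
        if c in cond_anc and s in set_anc
    ]
    return min(matches)[1] if matches else None


print(select_form("IdiopathicHypertension", "Phase2Study"))
print(select_form("IdiopathicHypertension", "Study"))
```

Adding a more specific form is then a declarative change in one place: a new entry in the index, with no change to the selection code.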
Dual use for ontologies: Indexing & content • [Diagram: “Hypertension” specialised to “Idiopathic Hypertension”, then “In our company’s studies”, then “In Phase 2 studies”, shown once for indexing and once for content, giving “Idiopathic Hypertension in our co’s Phase 2 study”; Age: Adult]
Separation of Lexicon / Language and Symbolic Model of “Ontology” • “Ontologies” are symbolic models implemented in software / logic • Behaviour unchanged under any consistent relabelling • Good practice is that every entity has a clear linguistic description as well as a label • But the descriptions and labels do not affect behaviour of the ontology in software • Language and Logic use different principles • Lexicons are about word usage, grammar, etc. • Many ways to express the same symbolic expression in language • Synonymy, polysemy, metonymy etc. are linguistic phenomena and language specific • Linguists and ontologists have surprising difficulty understanding each other • Research on relation between language and formal ontology still limited despite • John Bateman, Pelletier • MONNET project (UPM Asun Gomez Perez, Oscar Corcho, …) • Udo Hahn et al. • SWAT project: Donia Scott, Richard Power, … • Cyc’s work on language understanding
Language is often misleading wrt model & Arguments about words tend to dogmatism • Misleading • “Heart valve” ≠ “valve located in the heart” • Means one of the four “great heart valves” • “Cardiomyopathy” ≠ “Disorder of cardiac muscle” • Otherwise a “Myocardial Infarction” would be a “Cardiomyopathy”. • Words lead to more dogmatic arguments than substance • “Does ‘neoplastic’ imply ‘malignant’?” • Bitter controversy • “Do we need expressions for all of: ‘Benign tumour’, ‘Malignant tumour’, ‘Tumour, benign or malignant’?” • No controversy • Separating substance and labelling can reduce meeting time by 75%!
Language Generation for ontologies • How does an ordinary user understand a complex expression in an ontology? • Procedure that includes (Removal that has_target some Appendix and occurs_in some (Situation that includes some (Inflammation that has_locus some Peritoneum))) • “Appendicectomy in the presence of Peritonitis” or “Appendectomy with co-occurring Peritonitis” or … • Disorder that has_locus some Great_heart_valve • “Disorder of heart valve”, or “Valvular heart disease”, or … • Powerful mechanism for QA • GALEN used extensively for definitions that often ran to 20 lines or more
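As a minimal sketch of the idea (not GALEN's generator), a nested expression can be rendered both in a formal syntax and as a literal English paraphrase from per-property phrase templates. The tuple encoding and the `PHRASES` table are assumptions made for this example; producing idiomatic output such as “Appendicectomy in the presence of Peritonitis” would additionally need a lexicon for defined concepts.

```python
# An expression is (class_name, [(property, filler_expression), ...]).
# This encoding is an assumption made for the sketch.
APPENDICECTOMY = (
    "Procedure", [("includes", ("Removal", [
        ("has_target", ("Appendix", [])),
        ("occurs_in", ("Situation", [
            ("includes", ("Inflammation", [("has_locus", ("Peritoneum", []))])),
        ])),
    ]))],
)
VALVE_DISORDER = ("Disorder", [("has_locus", ("Great_heart_valve", []))])

# Hypothetical phrase templates, one per property.
PHRASES = {
    "includes": "that includes",
    "has_target": "of",
    "occurs_in": "occurring in",
    "has_locus": "located in",
}


def to_formal(expr):
    """Render the expression in a Manchester-like syntax."""
    name, restrictions = expr
    if not restrictions:
        return name
    parts = []
    for prop, filler in restrictions:
        text = to_formal(filler)
        if filler[1]:  # complex fillers are parenthesised
            text = f"({text})"
        parts.append(f"{prop} some {text}")
    return f"{name} that " + " and ".join(parts)


def to_english(expr):
    """Render a literal English paraphrase using the phrase templates."""
    name, restrictions = expr
    words = name.replace("_", " ").lower()
    for prop, filler in restrictions:
        words += f" {PHRASES[prop]} {to_english(filler)}"
    return words


print(to_formal(VALVE_DISORDER))
print(to_english(APPENDICECTOMY))
```

Even a literal paraphrase like this is useful for QA, since a domain expert can read it and spot a wrong definition without learning the formal syntax.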
Language generation & Multilingual ontologies • [Figure: definitions shown in the original languages and in generated English]
General background knowledge: The flesh on the ontological skeleton • Knowledge driven systems require more background knowledge than just “ontologies” • “Ontologies” are about what is universally true • Almost the only thing that ontologists can agree on • Otherwise “ontology” is just a synonym for “logical theory” • In all ontological formalisms, all statements begin with “for all…” • Implied in DLs • Explicit in predicate logic based formulations • Expressed as lambda abstraction in Conceptual Graphs • Generally true in frames, but ambiguous • Much (most) background knowledge is contingent, i.e. of the forms: • “Generally…”, “Typically…”, “… may …” • Or conventional facts
The issue of “may” • “What may cause pneumonia?” • Conventional answer (e.g. on a med school exam): • Bacteria, Virus, Yeast, Fungus, … • Drill down and you get specific lists, e.g. • Bacteria: Pneumococci, Haemophilus, Staphylococcus, … • Virus: Respiratory syncytial virus, … • But • Not all pneumococci cause pneumonia • … or even have the “disposition” to cause pneumonia • There are other things that can cause pneumonia • E.g. Pneumocystis in immunosuppressed patients • Indeed nearly any micro-organism in weird enough circumstances • Biology is rarely absolute! Almost never exhaustive! Rarely even mutually exclusive!
“May” and “Typically”: Characteristics of “May” statements • Reciprocal • If “A may cause B”, then “B may be caused by A”. • Some alternative FoL approximations • ∃x ∃y . A(x) & B(y) & causes(x,y) • A(x) & B(y) & causes(x,y) is satisfiable • There is a subclass of causal associations C_AB such that (all) C_AB has_topic some A & has_target some B. • (pun) Class A causally_associated_with value (pun) Class B • (all) A may_cause some B; (all) B may_be_caused_by some A; cause → may_cause • Metalogical / procedural / uncertainty • Implicatures: The mentioned entities are “distinguished” • When a text book says: “Pneumonia may be caused by Bacteria, Virus, …” it means more than ∃x ∃y . Pneumonia(x) & Bacteria(y) & causes(y,x)
Many sources of contingent knowledge, e.g. Statistical co-occurrence • The nodes may come from an ontology; the contingent links do not. • From http://barabasilab.neu.edu/projects/hudine/
Associations by Common Metabolic Pathway • From D.-S. Lee et al. 2008
Characteristics of “typical” statements • Not reciprocal • “A typically causes B” does not imply that “B is typically caused by A”, or vice versa • No first order interpretation • Defeasible logics still a research topic, not yet (if ever) for implementation • Inherited down the left side with exceptions • If A typically has C, then unless otherwise specified it is typical of subclasses of A • Metalogical: procedural / uncertainty • Around notions like normative, statistical, probabilistic • Historically, the key function of “frames” • The precursors of modern ontology systems
Experience: Normalised ontologies lead to clean default inheritance • [Diagram: “use of beta blocker in asthma” has (typically) a serious contraindication; its subclass “use of cardioselective beta blocker in asthma” has (exception) a mild contraindication; “cardioselective beta blocker” is a kind of “beta blocker”]
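The beta blocker example can be sketched as default inheritance over a normalised hierarchy: a “typical” value is looked up on the concept itself and then level by level on its ancestors, so the nearest stated default wins and exceptions override what is inherited. The concept names, the two drug-specific subclasses and the `typical` helper are illustrative assumptions, not part of any existing system.

```python
# Hypothetical hierarchy (child -> parents) produced by classification.
PARENTS = {
    "use_of_beta_blocker_in_asthma": [],
    "use_of_cardioselective_beta_blocker_in_asthma": ["use_of_beta_blocker_in_asthma"],
    "use_of_propranolol_in_asthma": ["use_of_beta_blocker_in_asthma"],
    "use_of_atenolol_in_asthma": ["use_of_cardioselective_beta_blocker_in_asthma"],
}

# Defaults are stated once, at the most general concept they apply to.
DEFAULTS = {
    "use_of_beta_blocker_in_asthma": {"contraindication": "serious"},
    "use_of_cardioselective_beta_blocker_in_asthma": {"contraindication": "mild"},
}


def typical(concept, attribute):
    """Return the nearest stated default, searching upwards one level at a time."""
    frontier, seen = [concept], set()
    while frontier:
        values = {
            DEFAULTS[c][attribute]
            for c in frontier
            if attribute in DEFAULTS.get(c, {})
        }
        if len(values) == 1:
            return values.pop()
        if len(values) > 1:
            raise ValueError(f"conflicting defaults for {concept!r}: {sorted(values)}")
        seen.update(frontier)
        frontier = [p for c in frontier for p in PARENTS.get(c, []) if p not in seen]
    return None


print(typical("use_of_propranolol_in_asthma", "contraindication"))
print(typical("use_of_atenolol_in_asthma", "contraindication"))
```

Raising an error when two equally near ancestors disagree is a design choice for this sketch: it surfaces ambiguity for an author to resolve rather than picking one value silently.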
Facts: A-boxes vs Databases • Much of medical knowledge is just data • Product A is licensed for Condition B • The clinics in this hospital are… • The specialists eligible to perform X are … • The allowed values for this field are… • The proformas for this condition are … • The services available to perform this analysis are… • Questions are closed world • Negation as failure • Even if they may use the classification hierarchy as a framework • A common pattern is an open world ontology as schema for a closed world database • But rarely supported by tools or formalism • “Conjunctive queries” come close, but those still involve an A-Box. • Querying of an RDF store according to OWL should be easy but isn’t • OWL to UML consistency testing is understood, but not always the point
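The “open world ontology as schema for a closed world database” pattern might look like the sketch below: the classification hierarchy is used to generalise the question, while the fact table is queried with negation as failure. The products, the conditions and the assumption that a licence for a condition also covers its subtypes are all illustrative.

```python
# Classification hierarchy used as schema (child -> parent); hypothetical.
IS_A = {
    "IdiopathicHypertension": "Hypertension",
    "Hypertension": "CardiovascularDisorder",
    "Asthma": "RespiratoryDisorder",
}

# Closed-world data: a licence that is not listed is taken not to exist.
LICENSED = {("ProductA", "Hypertension"), ("ProductB", "Asthma")}


def subsumed_by(concept, ancestor):
    """True if the concept is the ancestor or lies below it in the hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


def licensed_for(product, condition):
    """Closed-world query: failing to find a matching fact means 'no'.

    Assumes, for illustration only, that a licence for a condition also
    covers the subtypes of that condition.
    """
    return any(
        p == product and subsumed_by(condition, licensed_condition)
        for p, licensed_condition in LICENSED
    )


print(licensed_for("ProductA", "IdiopathicHypertension"))
print(licensed_for("ProductB", "IdiopathicHypertension"))
```

An open-world reasoner asked the second question would answer “unknown” rather than “no”, which is why this kind of data usually belongs in a database rather than an A-Box.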
Procedural Knowledge, e.g. • Prototypical sequences • Plans and partial plans • Calculations & attached procedures • Links to external services • Service Oriented Architectures • Workflows & Business Process Rules • Helpful to index via “ontology”; unhelpful to try to represent in an “ontology” • From http://www.mapofmedicine.com
Summary of architectural issues • Dual role for ontologies • Content • Indexing • Inheritance with defaults works for normalised “ontologies” • gives a basis for “fractal tailoring” • … but “ontologies” are only a small part of the knowledge required for knowledge driven systems
“Ontological Issues” – Intended & unintended consequences of borrowing a word (a metaphor) • Useful, not so useful, and counterproductive “ontological” notions & dogmas • User oriented views, transformations and intermediate representations • Issues too often ignored and need further research • Linking ontologies to information systems • What’s in a code?
Example useful “ontological” distinctions • “Kind” and “Role” • E.g. “Diagnosis” and “Conditions”; “Evidence” and “Observation” • “Parthood” and “containment” • E.g. “Brain” and “Skull” • “Mode” and “Modifier” (“Generic Dependent” vs “Quality”) • e.g. “Family history of X” and “severe X” • “Observation”: • “method” vs “result” vs “observed” vs “copy of data” • …
Where “ontology” / logic can help, e.g. The “equivalence problem” in SNOMED • Almost all attribute-value pairs can be transformed into (quasi) independent entities • A patient has a “haemoglobin” that is “elevated” • Iff • The patient has “elevated haemoglobin”
Formally (in OWL): Method 1 • An equivalence • “has some (Haemoglobin that has_interpretation some Elevated)” == • “has some (Elevated that is_interpretation_of some Haemoglobin)” • given the axioms: • has o has_interpretation → has • has o is_interpretation_of → has • is_interpretation_of == inv(has_interpretation) • …while maintaining: • Haemoglobin that has_interpretation some Elevated =\= Elevated that is_interpretation_of some Haemoglobin
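The effect of the axioms can be checked on a single example with a tiny forward-chaining sketch. This is an instance-level illustration in plain Python, not an OWL reasoner and not a proof of class equivalence; the individuals `patient1`, `hb1` and `e1` are invented.

```python
# Invented individuals and their types.
TYPES = {"hb1": "Haemoglobin", "e1": "Elevated"}
FACTS = {("patient1", "has", "hb1"), ("hb1", "has_interpretation", "e1")}

LINKS = ("has_interpretation", "is_interpretation_of")


def closure(facts):
    """Apply the inverse axiom and the two property chains until nothing new is added."""
    facts = set(facts)
    # is_interpretation_of == inv(has_interpretation)
    facts |= {(o, "is_interpretation_of", s) for s, p, o in facts if p == "has_interpretation"}
    facts |= {(o, "has_interpretation", s) for s, p, o in facts if p == "is_interpretation_of"}
    # has o has_interpretation -> has ; has o is_interpretation_of -> has
    while True:
        new = {
            (s, "has", o2)
            for s, p, o in facts if p == "has"
            for s2, p2, o2 in facts if s2 == o and p2 in LINKS
        }
        if new <= facts:
            return facts
        facts |= new


def has_some(facts, subject, cls, prop, filler_cls):
    """Check: subject has some (cls that prop some filler_cls)."""
    return any(
        s == subject and p == "has" and TYPES.get(o) == cls
        and any(
            s2 == o and p2 == prop and TYPES.get(o2) == filler_cls
            for s2, p2, o2 in facts
        )
        for s, p, o in facts
    )


inferred = closure(FACTS)
print(has_some(inferred, "patient1", "Haemoglobin", "has_interpretation", "Elevated"))
print(has_some(inferred, "patient1", "Elevated", "is_interpretation_of", "Haemoglobin"))
```

Before the closure only the first form holds for `patient1`; after it both do, while `hb1` and `e1` remain distinct individuals of different classes.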
Alternative: “creative ambiguity” in formalism … analogous to “role groups” in SNOMED • “Patient with a fracture of the leg” equivalent to “Patient with a leg that is fractured” • Disorder that has_morphology some Fracture & has_locus some Leg • Potential advantage of regularity for software with simpler reasoners • Measurement that has_target some Haemoglobin & has_interpretation some Elevated
The issue of Context: Large elephant vs Large mouse • OWL and related languages give a useful solution • Use defined classes and classifier to define and organise consistent hierarchies of context, e.g. • has_mass range Mass_quantity • Criterion for some (Large that is_size_of some Elephant) → has_mass some [>= 3500 kg] • Criterion for some (Large that is_size_of some Mouse) → has_mass some [>= 30 g] • Equally important for selection of data structures, procedures, …
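Outside a DL reasoner, the same context dependence can be sketched as a criterion looked up by kind and inherited down the hierarchy. The thresholds are the ones quoted above; the subclasses and helper names are assumptions for the example.

```python
# Thresholds for "large", in kilograms, as quoted in the criteria above.
LARGE_MASS_KG = {"Elephant": 3500.0, "Mouse": 0.030}

# Hypothetical subclasses that inherit the criterion of their parent.
IS_A = {"AfricanElephant": "Elephant", "HouseMouse": "Mouse"}


def large_threshold(kind):
    """Find the 'large' criterion for a kind, walking up the hierarchy if needed."""
    current = kind
    while current is not None:
        if current in LARGE_MASS_KG:
            return LARGE_MASS_KG[current]
        current = IS_A.get(current)
    raise KeyError(f"no criterion for 'large' defined for {kind!r}")


def is_large(kind, mass_kg):
    """'Large' is only meaningful relative to the kind of thing being described."""
    return mass_kg >= large_threshold(kind)


print(is_large("HouseMouse", 0.035))
print(is_large("AfricanElephant", 3000.0))
```

The same lookup shape serves for selecting data structures or procedures by context, with forms or handlers in place of thresholds.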
Conflict between clinical and ontological usage: Why should ontologists claim monopoly on correct use of words? • Example: parthood • Medical usage does not follow mereological theory • Best modelled by a different relation: language is ambiguous • “The thyroid is part of the endocrine system” is a matter of function rather than physical connectedness • “Faults in parts are faults in the whole” comes closer to clinical intuition • FMA driven to “Sets of heterogeneous structures” as a kluge & the “immune system” has no parts! • Lack of functional information is a major limitation on the use of the FMA • We bridge the gap with a hierarchy of parthood relations • Clinical_part, Functional_part, Physical_part, …
Example 2: Clinical distinctions that cut across ontological distinctions. Example found in almost all clinical systems • “Observables” • Attribute-value pairs, e.g. • Serum haemoglobin has_measure value 13mg% • has_interpretation some Normal • vs • “Findings” • Things normally absent that may be present (or vice versa), e.g. • Lump, tumour, diabetes, elevated temperature, fever • has some Diabetes • not has some Diabetes
… but ontological categories are mixed: users require alternative views • “Observables” may be • Qualities of • the body, of parts of the body, of functions, of roles, of processes, etc. • Relations to independent entities – • e.g. “site of radiation of pain” • … • “Findings” may be • Independent • Generic dependent • Reified values of interpretations • Reified relations • …
Bridging the gap • Provide alternative organisations of the ontology • Let the classifier do the work • But requires strict logical consistency & users’ intuitions are not always strictly logical • Provide separate user organisation • Separate browsing / searching layer overlaid on ontology • Thesauri and SKOS seem the natural candidates • Systematic transformations between thesauri and ontologies are a critical research area
Ontological dogma counterproductive for clinical systems: Potential epistemic status is fundamental to medical reporting and reasoning • Most clinical systems distinguish at least two of: • Observation: “Serum Haemoglobin = 7 mg%” • Interpretation: “Serum Haemoglobin ~ low” • Belief (diagnosis): “Anaemia” • Ignoring potential epistemic status cripples an ontology for use in clinical systems • Because different behaviours are required depending on the potential epistemic status
Prohibition of entities that have no instances: Science is driven by hypotheses & medicine by differential diagnoses. An “ontology” for an information system must represent the entities in it: Whether they exist in the world is irrelevant • The information system may contain a representation of a nonexistent entity in order to: • Test for its existence • Describe it if it is suddenly found to exist • Hold data “about it” when it was thought to exist • … • Examples • The toxin responsible for AIDS • The gene for X • Pneumonia caused by Trypanosome • The Higgs Boson • (even Unicorns – e.g. to say that “Narwhal may have been the origin of the myth of the unicorn” – or just to have a catalogue of mythical creatures)
“Metadata”: Many ontologies exist primarily to carry metadata. Not mere “annotations”; Not just “not first order” • Metadata about the artifact, e.g. • Mappings - to other ontologies, terminologies, coding systems, UMLS, standards, Web resources, … • Textual definitions, IDs, … • Editorial Information, e.g. • Authorship, provenance, authority, ... • Meta-models / Schemas: • The structure of the artifact itself to aid in authoring, editing, QA, Interfaces, … • Why don’t ontologies have schemas? • Higher order domain information (NOT metadata), e.g. • “Endangered species”, “Category first described by”, … • “Two different types of injury”
Ontological issues needing more attention: “Prototypes” • Relation of prototypes, individuals fulfilling those prototypes, and collections of those individuals • “Prototypes” authored pre-hoc and then realised in individuals • Blueprints and buildings • Protocols for a trial, individual patients’ histories in the trial, data from the trial, analysis and description of the data, description of the trial • “Prototypes” abstracted post-hoc from observations of individuals • Normative anatomy and biology • Almost all scientific “laws” • Does the distinction matter? In theory? In practice? If so how? • What does it mean to be “conformant” or “normal”? “abnormal”? “missing”?
Ontologies and Software Engineering: Towards multi layer models and defined interfaces • “Ontologies” are not data models • Although data models may be motivated by “ontologies” • Our understanding of the world vs how to store information about the world • A data structure can have a missing entry for “heart beat” • A (live) person cannot have a missing heart beat • Ontology languages are really general logic languages • Can be used to describe either, BUT NOT AT THE SAME TIME