Making Terminologies useful and usable: Clinical Terminologies in the 21st Century: What are they for? What might they look like? Alan Rector, Bio and Health Informatics Forum / Medical Informatics Group, Department of Computer Science, University of Manchester
Making Terminologies useful and usable: Clinical Terminologies in the 21st Century: What are they for? What might they look like? • Alan Rector • Bio and Health Informatics Forum / Medical Informatics Group • Department of Computer Science, University of Manchester • rector@cs.man.ac.uk • www.cs.man.ac.uk/mig • img.man.ac.uk • www.clinical-escience.org • mygrid.man.ac.uk
An Old Problem “On those remote pages it is written that animals are divided into: a. those that belong to the Emperor b. embalmed ones c. those that are trained d. suckling pigs e. mermaids f. fabulous ones g. stray dogs h. those that are included in this classification i. those that tremble as if they were mad j. innumerable ones k. those drawn with a very fine camel's hair brush l. others m. those that have just broken a flower vase n. those that resemble flies from a distance” From The Celestial Emporium of Benevolent Knowledge, Borges
But why in healthcare? • What’s it for? What’s the purpose? • Terminologies are of little use in themselves • How will it make care better? Make new things possible? • How will it make information systems better? • Painful experience of 20 years of over-selling and under-performance • Do we need it: Clinically? Technically? • If we need it • what is ‘it’? Is ‘it’ one thing or many? • How will we know if we have ‘it’? • How will we know if ‘it’ is fit for purpose?
Why Now? • What’s different now? • Web, E-Science, Grids • Web speed • New technologies – OWL, new DLs, hybrid frame-DL environments (www.semanticweb.org) • Post genomic medicine – personalised medicine • Joining up healthcare, medical and bioscience research – CLEF • Systemisation of healthcare • Clinical error reduction, clinical governance, evidence based medicine, … • Does anybody else have similar problems? • Ontologies are ‘flavour of the month’ in E-Science & Web • Bioinformatics is building them very rapidly • What can we learn from them?
A Convergence of Need • Post genomic research • Safe, high quality, evidence based health care • Need more and better clinical information • Which scales • In size • In complexity • Knowledge is fractal
The requirements & Tools chain • Clinical users with needs to improve care / clinical knowledge • Applications for clinical users that meet those needs • Developers’ needs for terminology to build those applications • Terminologies which fit the applications’ builders’ needs to meet the clinical users’ needs
Who is it for? (Useful & usable to whom?) • Clinical users • Carers – prospective • Reviewers – retrospective • Researchers, managers, assessors, … • The community – how it shares its knowledge • Knowledge creators / distributors • Application developers • Easier to re-use what exists than to build new • Re-use or bust • Terminology authors • Quick responsive evolution
Useful and Usable • Useful – for what? • Supports needed applications • Purpose • Does it well • Quality • Usable – by whom? • Intuitive / understandable • Handy • What you need is “to hand” • Timely
Preview of Arguments • The priorities are clinical needs supported by applications supported by terminology • Clinical quality is critical • Useful and usable to: clinical users, developers, ‘reviewers’, authors • In an open evolving world, open managed evolution is the only plausible way forward • Current technology gives us the opportunity to cope • Tools and environments are as important as content
Where we come from [Diagram labels: Clinical Terminology (GALEN Clinical Terminology); Data Entry (PEN&PAD); Clinical Record; Decision Support; Decision Support & Aggregated Data; Electronic Health Records (CLEF); Data Entry / Language Technology (CLEF); Best Practice; a sample health card]
Terminology is Now Middleware: human-machine / machine-machine • Explicit • Machines can only manipulate what is represented explicitly • More re-use, more manipulation, more explicitness • Understandable • People can only build, maintain and use it if they can understand it • Adequate • Expressive enough to do the job but still computationally tractable • Reliable • People can use it consistently • Scalable and maintainable
Where we think we are going: • Pre-1980: paper • Application specific retrospective human oriented systems • ICD, early SNOMED, CPT, OPCS, … • Mid 1980s – 1990s: “electronic paper” • Retrospective reporting + Prospective collection • ICPC, Read I, II • Mid 1990s – mid 2000s: Centralised computer based • Retrospective reporting + Prospective collection • OpenGALEN, Read III, SNOMED-RT… PEN&PAD • Mid 2000s – ?: Web based open managed evolution • ???? – but see the Semantic Web, Gene Ontology, etc.
How we will know when we get there: Criteria for success • Re-use • A recognised growing library of common decision support modules • Stop starting from scratch! • Integration • 2+ independently developed DSSs integrated with 2+ independently developed EPRs without exponentially increasing effort.
Criteria for success • Authoring • No individual invests in their own terminology • enterprise-wide terminology servers • Indexing • Simplification of systems • a sharp drop in special cases and exceptions • a sharp increase in authors’ productivity
Criteria for success • User interfaces • Real systems in real use with real patients by real clinicians • transparent systems
Stones in the Road • Why are we not there yet? • Some background definitions • Some hypotheses
Clinical quality & logical quality • Clinical quality – do users put in the right things? • Repeatability of information capture (inter-rater reliability) • For decision support in prospective use • For retrieval in retrospective use • Salience • Relevance to clinical decisions for prospective use • Significance to questions for retrospective use • A better measure than “coverage” • Logical quality – do systems give the right responses? • Correct organisation (classification) • Correct inferences given correct input
Hypothesis 1 • Most computer oriented terminology development ignores clinical quality … • The EHR as black hole • Bigger is not necessarily better • …although clinical quality was the primary concern of traditional paper/human oriented terminologies (and there are honourable exceptions, e.g. ICPC). • Evidence: • High variability in recorded use • Systematic failure to use data from GP systems in clinical studies (despite PRIMIS) • Our own & colleagues’ experience in repeated studies • Current planned cost of cohort ‘post genomic’ studies
Three models • Meaning – ontologies • Can I depend on the answers? • “Dyspnoea is a respiratory problem” • Clinical significance – decision support • What should I think of / how does it affect decisions? • “Dyspnoea can be a symptom of congestive heart failure” • Model of use – EHR/human factors • Is what I want ‘to hand’ – is it ‘handy’? • “Dyspnoea should be a question on a cardiac history”
Hypothesis 2 • Early terminologies emphasised models of use and significance and failed for lack of model of meaning • “Heart diseases” are in 13 Chapters of ICD9 • Recent terminologies emphasise model of meaning and fail for lack of models of use and significance • Evidence: • User dissatisfaction, non-use, and poor quality data • The few systems based on models of use have been surprisingly popular with doctors, e.g. MedCin, ORCA • But hard to use for retrieval • We have fewer formal models of use than of meaning • We have almost no models of ‘significance’
Grounding cost vs Clean-up cost (with thanks to Enrico Coiera) • “Grounding cost” • The cost of establishing a given quality of communication • How much French do you need to order a meal? • “Clean-up cost” • The cost of fixing miscommunication • How many surprises will you accept? Of what kind?
Special purpose vs Re-usable Multipurpose • Special purpose terminologies • Almost all retrospective • Reporting for remuneration – ICD9-CM, CPT • Reporting for epidemiology – ICD10, OPCS • Multipurpose re-usable terminologies • Aspire to be the glue for ‘Patient centred systems’ & ‘Personalised Medicine’ • Decision support • Electronic Health Records • Research • Integration with Bioscience • … • But too often ‘multipurpose’ means ‘no purpose’, and ‘multiapplication’ means ‘no application’
Need “Multipurpose” mean “no purpose”? • Multiple purposes held by multiple groups • Multiple sources of expertise & authority • One size does not fit all • Multiple collaborations • Multiple legacies • Multiple purposes use multiple applications • Applications are the point of interaction • Applications make needs concrete & testable
Multipurpose means interacting with others: It’s a big open world out there… • Bioscience • Gene Ontology, National Cancer Institute Center for Bioinformatics (NCICB), The Digital Anatomist / Mouse Anatomy / Mammalian Anatomy, BioJava, PRINTS, EMBL, Microarrays, Proteomics, Metabolomics, Systems Biology… • Medicine meets bioscience • Cancer therapeutics, New imaging, … • E-Health: sharing and pooling data: “Collections based research” • BioBank, NTRAC, NCRI, NCTR, CLEF, … • “Health Intelligence” • MRC policy on data sharing • …
Hypothesis 3 • Grounding costs can be delimited for special purpose terminologies • Grounding costs are indefinite for re-usable terminologies (& are historically high) • Without purposes testable through applications there is a danger of the escalating deadly embrace • “Must have terminology to build applications; but must have applications before terminology” • An evolutionary approach is the only exit
Central Control vs Open managed evolution • Académie française vs Oxford English Dictionary • Scholasticism vs Empiricism • The ‘arrogance of the a priori’ • People don’t know what they do • Look to see what is actually used • Language technology shows time and again that our predictions are faulty • Command economy vs Social Market • Participation is the issue rather than money • Somebody will still have to pay • But at least they might pay for something useful
Central management • Owned by one “Authority” • Coupling tight / autonomy low/ participation low • “Grounding costs” high / “Clean up costs” low? • must have everything before you can do anything • Change slow & lockstep • A product
Open managed evolution • “Owned” by the community – multiple “authorities” • Coupling loose/ autonomy high / participation high • To be useful & usable involve users using systems • “Grounding costs” low / “Clean up costs” high? • “Just in time” “Just enough” • Agree where it counts • Change quick and local - “threaded with annealing” • A process
Hypothesis 4 • Single purpose clinical terminologies can be best managed centrally • By definition they are developed in conjunction with an application • Re-usable terminologies can only succeed by open managed evolution • Many purposes require many contributors • Evidence: Speed of uptake of HL7/LOINC; W3C & the evolution of the Web • Re-usable terminologies can only be developed in open collaboration with applications • Otherwise “multipurpose” becomes “no purpose”
Hypothesis 5 • Modern technology provides the means to support open managed evolution without compromising clinical quality or technical stability • Trade lower grounding cost for greater clean-up cost • Focus on minimal stable core. Defer commitments. • Evidence: OpenGALEN, Gene Ontology • Utilise Web/Grid technologies for rapid dissemination and coordination • Evidence: Current developments at Mayo Clinic using LDAP • Distribute terminology like domain names
The technologies • Applications centric development • Decoupled development • Special purpose languages / “Intermediate Representations” • Deferred commitment • Clinical before technical • Logic based ontologies + • Models of clinical significance • Models of clinical use • Models of EHRs • Web services & Grid technology • Authentication/authorisation/accounting • Distributed directories & LDAP • Service discovery
Decoupled development using “Conceptual Lego” • If we manage the connectors and the pieces the users can build most things for themselves • Without compromising quality
Applications centric Development [Diagram labels: Common Terminology/Ontology; Meta-authoring; templates/views; authoring environments; Intermediate Representations; clinical applications; clinicians / Applications builders; Empowered Authors]
Loosely Coupled Development [Diagram: Local author needs new terms for application • Local author uses worldwide resources & templates to formulate definition • Server validates & organises • Local author checks • Local Ontology • Central gurus integrate & fix problems • Central Ontology • Flows labelled: templates, problems, updates]
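The “server validates” step in this workflow can be sketched roughly as a check of each locally authored link against a table of sanctioned combinations. This is only an illustration under stated assumptions, not GALEN's actual mechanism: the `SANCTIONS` table, the `CATEGORY` dictionary, the category names and the `validate` function are all hypothetical. Only the link names are borrowed from the femur-fixation template in this deck.

```python
# Hypothetical sanction table: which link may connect which kinds of thing.
# The categories and the table itself are assumptions for illustration.
SANCTIONS = {
    ("Procedure", "ACTS_ON", "Lesion"),
    ("Procedure", "HAS_APPROACH", "Approach"),
    ("Lesion", "HAS_LOCATION", "BodyPart"),
    ("BodyPart", "IS_PART_OF", "BodyPart"),
    ("BodyPart", "HAS_LATERALITY", "Side"),
}

# Hypothetical local dictionary mapping each known term to a category.
CATEGORY = {
    "fixing": "Procedure",
    "fracture": "Lesion",
    "open": "Approach",
    "neck of long bone": "BodyPart",
    "femur": "BodyPart",
    "left": "Side",
}


def validate(triples):
    """Return a list of problems; an empty list means the definition passes."""
    problems = []
    for subject, link, value in triples:
        # Unknown terms are reported so the author can request them.
        for term in (subject, value):
            if term not in CATEGORY:
                problems.append(f"unknown term: {term}")
        # Known terms must be joined by a sanctioned link.
        if subject in CATEGORY and value in CATEGORY:
            key = (CATEGORY[subject], link, CATEGORY[value])
            if key not in SANCTIONS:
                problems.append(f"not sanctioned: {subject} {link} {value}")
    return problems


good = [("fixing", "ACTS_ON", "fracture"),
        ("fracture", "HAS_LOCATION", "neck of long bone")]
bad = [("femur", "HAS_APPROACH", "open"),
       ("fixing", "ACTS_ON", "tibia")]
```

In this sketch `validate(good)` returns an empty list, while `validate(bad)` reports one unsanctioned link and one unknown term. The unknown term is the kind of “problem” a local author would pass back to the centre.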
The templates are more important than the underlying formalism… • "Open fixation of a fracture of the neck of the left femur" • MAIN fixing • ACTS_ON fracture • HAS_LOCATION neck of long bone • IS_PART_OF femur • HAS_LATERALITY left • HAS_APPROACH open • “Intermediate Representations” are critical
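As a rough sketch, the rubric above can be held as a small nested structure that an authoring tool could render or flatten. The nesting chosen here (which link hangs off which term) is an assumed reading of the slide, and `template`, `render` and `links` are hypothetical names used only for illustration.

```python
# Each node is (term, [(LINK, child_node), ...]).
# The nesting is an assumption: the slide lists the links without indentation.
template = ("fixing", [
    ("ACTS_ON", ("fracture", [
        ("HAS_LOCATION", ("neck of long bone", [
            ("IS_PART_OF", ("femur", [
                ("HAS_LATERALITY", ("left", [])),
            ])),
        ])),
    ])),
    ("HAS_APPROACH", ("open", [])),
])


def render(node, link="MAIN", depth=0):
    """Return the template as indented 'LINK term' lines."""
    term, children = node
    lines = ["  " * depth + f"{link} {term}"]
    for child_link, child in children:
        lines.extend(render(child, child_link, depth + 1))
    return lines


def links(node):
    """Flatten the tree into (term, LINK, value) triples."""
    term, children = node
    triples = []
    for child_link, child in children:
        triples.append((term, child_link, child[0]))
        triples.extend(links(child))
    return triples


if __name__ == "__main__":
    print("\n".join(render(template)))
```

The point of the design is that authors work with a handful of links and familiar terms; how each link maps onto the underlying formal expression can change without the template changing.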
…complex underpinnings can & will change • (‘SurgicalProcess’ which isMainlyCharacterisedBy (performance which isEnactmentOf (‘SurgicalFixing’ which hasSpecificSubprocess (‘SurgicalAccessing’ hasSurgicalOpenClosedness (SurgicalOpenClosedness which hasAbsoluteState surgicallyOpen)) actsSpecificallyOn (PathologicalBodyStructure which < involves Bone hasUniqueAssociatedProcess FracturingProcess hasSpecificLocation (Collum which isSpecificSolidDivisionOf (Femur which hasLeftRightSelector leftSelection))>))))
Decoupling & Flexibility • Use formality to permit flexibility • Change need not mean instability • Formality means effects can be predicted • Most users only need change in tightly controlled areas • Lesson from the Semantic Web: “Forking” is a natural part of development • Harmless if strictly local • Manageable if controlled from standard “Lego” & templates • “Clean-up cost” • 10%-20% central effort is a reasonable target • Necessary to cope with change and ignorance • Evolution by “annealing”
Scalable models of use: PEN&PAD • “Fractal tailoring” • 250,000 forms from 10,000 facts [Screenshot: structured data entry form for fracture surgery, with menu choices including Radius, Ulna, Humerus, Femur, Tibia, Fibula, Wrist, Ankle; Shaft, Neck, Gt Troch; Open / Closed; Left / Right; Fixation, Reduction; current selection: Open, Left, Femur, Fixation, Neck]
Scalable models of use: Fractal tailoring of forms for clinical trials [Diagram labels: Hypertension; Idiopathic Hypertension; In our company’s studies; In Phase 2 studies; Idiopathic Hypertension in our company’s Phase 2 study]
It can work • The Lessons of GALEN • Loosely coupled development based on formal ontologies works • “Coherence without uniformity” • 90% of work done locally • Ontologies can be modular rather than monolithic • “Plug and play” terminology development • The Lessons of PEN&PAD • Models of use based on formal ontologies scale • 250,000+ forms from 10,000 ‘facts’ • The Lessons of the Semantic Web • It works for knowledge management • Growing user community outside of medicine • No longer “rocket science”
Logic-based Ontologies: Conceptual Lego • “SNPolymorphism of CFTRGene causing Defect in MembraneTransport of ChlorideIon causing Increase in Viscosity of Mucus in CysticFibrosis…” • “Hand which is anatomically normal”
Logic based ontologies • A formalisation of semantic nets, frame systems, and object hierarchies via KL-ONE and KRL • “is-kind-of” = “implies” (“logical subsumption”) • “Dog is a kind of wolf” means “All dogs are wolves” • Modern examples: DAML+OIL (“OWL”?) • Older variants: LOOM, CLASSIC, BACK, GRAIL, K-REP, …
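Reading “is-kind-of” as “implies” makes subsumption transitive: whatever a parent implies, every child implies too. A minimal sketch of that reading follows. It is a toy lookup over declared links, not a description logic reasoner, and the hierarchy, `KIND_OF`, `ancestors` and `subsumes` are hypothetical names chosen for the example (only “Dyspnoea is a respiratory problem” comes from an earlier slide).

```python
# Declared "is-kind-of" links: child -> direct parents.
# "ClinicalProblem" is an illustrative top node, not taken from the slides.
KIND_OF = {
    "Dyspnoea": ["RespiratoryProblem"],
    "RespiratoryProblem": ["ClinicalProblem"],
    "ClinicalProblem": [],
}


def ancestors(concept):
    """All concepts that `concept` implies, following is-kind-of transitively."""
    seen = set()
    stack = list(KIND_OF.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(KIND_OF.get(parent, []))
    return seen


def subsumes(general, specific):
    """True if every `specific` is necessarily also a `general`."""
    return general == specific or general in ancestors(specific)
```

Here `subsumes("ClinicalProblem", "Dyspnoea")` holds although that link was never stated directly, while the reverse does not.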
Logic Based Ontologies: The basics • Primitives: Thing, Structure, Heart, MitralValve, Encrustation, Feature, pathological, red • Descriptions: MitralValve * ALWAYS partOf: Heart • Encrustation * ALWAYS feature: pathological • Definitions: Structure + feature: pathological • Structure + feature: pathological + involves: Heart • Encrustation + involves: MitralValve • Reasoning • Validating (constraining cross products): Thing red + partOf: Heart • red + partOf: Heart + (feature: pathological)
Building with Conceptual Lego • Axes: Species, Genes, Function, Disease • Protein • CFTRGene in humans • Protein coded by (CFTRgene & in humans) • Membrane transport mediated by (Protein coded by (CFTRgene in humans)) • Disease caused by (abnormality in (Membrane transport mediated by (Protein coded by (CFTR gene & in humans))))
Avoiding combinatorial explosions • The “Exploding Bicycle”: From “phrase book” to “dictionary + grammar” • 1980 – ICD-9 (E826): 8 • 1990 – READ-2 (T30..): 81 • 1995 – READ-3: 87 • 1996 – ICD-10 (V10-19 Australian): 587 • V31.22 Occupant of three-wheeled motor vehicle injured in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income • and meanwhile elsewhere in ICD-10 • W65.40 Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity • X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities
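The arithmetic behind the “phrase book” problem can be illustrated with a short sketch: enumerating every combination in advance grows as the product of the axis sizes, whereas a “dictionary + grammar” only needs the sum. The axes and values below are hypothetical, loosely modelled on the rubrics above; they are not the real ICD-10 value sets.

```python
from itertools import product

# Hypothetical axes; values are illustrative only.
AXES = {
    "victim": ["pedal cyclist",
               "occupant of three-wheeled motor vehicle",
               "pedestrian"],
    "collision_with": ["pedal cycle", "car",
                       "heavy transport vehicle", "fixed object"],
    "position": ["driver", "passenger", "person on outside of vehicle"],
    "traffic": ["traffic accident", "nontraffic accident"],
    "activity": ["while working for income",
                 "while engaged in sports activity",
                 "while resting"],
}


def phrase_book(axes):
    """Pre-coordinate every combination: one fixed phrase per code."""
    return [", ".join(combo) for combo in product(*axes.values())]


def dictionary_size(axes):
    """Post-coordination only needs each value listed once."""
    return sum(len(values) for values in axes.values())


phrases = phrase_book(AXES)

if __name__ == "__main__":
    print(len(phrases), "phrases vs", dictionary_size(AXES), "dictionary entries")
```

With these five small axes the phrase book already holds 3 × 4 × 3 × 2 × 3 = 216 entries against 15 dictionary entries, and adding one more value to any axis multiplies the first number but only increments the second.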
The Cost: Normalising (untangling) Ontologies [Diagram labels: Structure; Function; Part-whole]
The Cost: Normalising (untangling) Ontologies: Making each meaning explicit and separate [Diagram: a tangled hierarchy under PhysSubstance (Protein, ProteinHormone, Insulin, Enzyme, Steroid, SteroidHormone, Hormone, Catalyst) is separated into simple trees: Substance (BodySubstance, Protein, Insulin, Steroid, …) and ActionRole (PhysiologicRole, HormoneRole, CatalystRole, …)] • …build it all by combining simple trees • Hormone = Substance & playsRole-HormoneRole • ProteinHormone = Protein & playsRole-HormoneRole • SteroidHormone = Steroid & playsRole-HormoneRole • Catalyst = Substance & playsRole-CatalystRole • Insulin playsRole HormoneRole • Enzyme ?=? Protein & playsRole-CatalystRole
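The idea of rebuilding the tangled hierarchy from simple trees plus definitions can be sketched as follows. This is a toy classifier under stated assumptions, not GRAIL or an OWL reasoner: the exact nesting of the substance tree is an assumed reading of the slide, the role tree is omitted, and `SUBSTANCE_TREE`, `PLAYS_ROLE`, `DEFINITIONS`, `kinds` and `classify` are hypothetical names.

```python
# Simple substance tree (child -> parent); the nesting is an assumed reading.
SUBSTANCE_TREE = {
    "Protein": "Substance",
    "Steroid": "Substance",
    "Insulin": "Protein",
}

# Asserted fact from the slide: "Insulin playsRole HormoneRole".
PLAYS_ROLE = {"Insulin": {"HormoneRole"}}

# Definitions from the slide: name = (substance kind, role played).
DEFINITIONS = {
    "Hormone": ("Substance", "HormoneRole"),
    "ProteinHormone": ("Protein", "HormoneRole"),
    "SteroidHormone": ("Steroid", "HormoneRole"),
    "Catalyst": ("Substance", "CatalystRole"),
}


def kinds(substance):
    """The substance itself followed by its ancestors in the substance tree."""
    chain = [substance]
    while chain[-1] in SUBSTANCE_TREE:
        chain.append(SUBSTANCE_TREE[chain[-1]])
    return chain


def classify(substance):
    """Names of all defined classes whose kind and role the substance satisfies."""
    chain = kinds(substance)
    # Roles asserted on an ancestor are inherited by the substance.
    roles = set()
    for kind in chain:
        roles |= PLAYS_ROLE.get(kind, set())
    return sorted(
        name for name, (kind, role) in DEFINITIONS.items()
        if kind in chain and role in roles
    )
```

In this sketch `classify("Insulin")` yields `["Hormone", "ProteinHormone"]` without either link having been asserted by hand, which is the multiple classification that the tangled hierarchy had to maintain manually.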
But none of it works without tools • None of it works without communication & cooperation