Introduction to Clinical Terminology and Classification. AL Rector, OpenGALEN, CO-ODE, The Medical Informatics Group, U of Manchester. www.cs.man.ac.uk/mig/galen www.opengalen.org www.co-ode.org oiled.man.ac.uk rector@cs.man.ac.uk
Where we come from • [Diagram; labels: Best Practice, Data Entry, Clinical Record / Electronic Health Records, Decision Support & Aggregated Data, GALEN Clinical Terminology]
OpenGALEN: Philosophy • Terminology is software • Terminology is the interface between people and machines • Re-use is the key • Patient-centred information • Terminology must have a purpose • Always ask: “What’s it for?” • Not art for art’s sake • Terminology supports clinical applications - not vice versa • Applications for someone to do something for somebody • Keep the ‘Horse before the Cart’ • Always ask: “How will we know if it works?” “How will we know if it fails?”
OpenGALEN: Key ideas • Separation of kinds of knowledge • Terminology, medical record and information system schemas • Models of meaning; Models of Use • Concepts, language, Coding, Indexing, Pragmatics • Machine level, User level • Knowledge is fractal! • There will always be more detail to be added • Therefore terminologies must be extensible • Formal logical Support • Too big and complicated to maintain by hand • Extensibility requires rules • Software needs logical rigour
Axes for kinds of Knowledge • Terminology • Medical Records/Information systems • Decision Support rules • Concepts • Language • Coding • Indexing • Pragmatics & User Interface • Machine level • Human Level
Interface of EHR, Messaging & Decision Support • Significant research topic now • [Diagram; labels: Information Model (Patient Data Model) for patient-specific records; Inference Model (Guideline Model) for dynamic guideline knowledge; Concept Model (Ontology) for static domain knowledge; three ‘interface’ labels between the models; Abstraction]
Uses of Terminology • Clinical • Epidemiology and quality assurance • Reproducibility / Comparability • Indexing • Software • Re-use ! • Integration and Messaging between systems • Authoring and configuring systems • Data capture and presentation (user interface) • Indexing information and knowledge (meta-data, The Web)
An Old Problem “On those remote pages it is written that animals are divided into: a. those that belong to the Emperor b. embalmed ones c. those that are trained d. suckling pigs e. mermaids f. fabulous ones g. stray dogs h. those that are included in this classification i. those that tremble as if they were mad j. innumerable ones k. those drawn with a very fine camel's hair brush l. others m. those that have just broken a flower vase n. those that resemble flies from a distance” From The Celestial Emporium of Benevolent Knowledge, Borges
History: Origins of existing terminologies • Epidemiology • ICD - Farr in 1860s to ICD9 in 1979 • International reporting of morbidity/mortality • ICPC - 1980s • Clinically validated epidemiology in primary care • Now expanded for use in Dutch GP software • Librarianship • MeSH - NLM from around 1900 - Index Medicus & Medline • EMTree - from Elsevier in 1950s - EMBASE • Remuneration • ICD9-CM (Clinical Modification) 1980 • 10 x larger than ICD; aimed at US insurance reimbursement
Traditional Systems • Built by people for interpretation by people (Coding clerks) • Most knowledge implicit in rubrics • Must understand medicine to use intelligently • Not built for software • On paper for use on paper • Enumerated - top down all possibilities listed • Serial - Single use - Single View • Hierarchical Thesauri • Traditional terminological techniques from librarianship • ‘Broader than’ / ‘Narrower than’ (ISO 1087) • no logical foundation • Focused on ‘terms’ • Language and concepts mixed • Synonyms, preferred terms, etc caused confusion
History (2) • Pathology indexing • SNOMED 1970s to 1990 (SNOMED International) • First faceted or combinatorial system • Topography, morphology, aetiology, function • Plus diseases cross referenced to ICD9 • Specialty Systems • Mostly similar hierarchical systems • ACRNEMA/SDM - Radiology • NANDA, ICNP… - Nursing • …
History (3) • Early computer systems • Read I (4 digit Read) • Aimed at saving space on early computers • 1-5 Mbyte / 10,000 patients • Hierarchical modelled on ICD9 • Detailed signs and symptoms for primary care • Purchased by UK government in 1990 • Single use • Morbidity indexing • Medical Entities Dictionary (MED) • Jim Cimino
History (4) • Aspirations for electronic patient records (EPRs) • Weed’s Problem Oriented Medical Record • Direct entry by health care professionals • Aspirations for decision support • Ted Shortliffe (MYCIN), Clem McDonald (Computer based reminders), Perry Miller (Critiquing),.. • Aspirations for re-use • Patient centred information • Needed common multi-use multi-purpose terminology • None worked
Motivations and Business Models • Remuneration • ICD9/10-CM in US for insurance and Medicare for diseases • Current Procedural Terminology (CPT) for surgical procedures • Public Health Reporting • ICD9/10 • Clinical Recording • Read 1-3, SNOMED-RT/CT • ICPC – International Classification of Primary Care • Indexing publications • MeSH – Medical Subject Headings - Basis of indexing MedLine/PubMed • EMTree – basis of indexing EMBASE • Support for applications and decision support • GALEN
Summary of Changes at end of 1st Generation • From terminologies for people to terminologies for machines • From paper to software • From single use to multiple re-use for patient centred systems • From entry by coding clerks to direct entry by health care professionals • From pre-defined reporting for statistics to reliable indexing for decision support
Changes at end of first generation • From models of USE to models of MEANING • But tended to lose the model of use • The goal of “useful and usable systems” lost
Problems with ‘First Generation’ Enumerated Systems in coping with these changes
Problems (1) • Scaling !!! • More detail and more specialities required scaling up, but... • The combinatorial explosion • Example: Burns: • 100 sites x 3 depths → 404 codes • 5 subsites/site x chemical or thermal → 7272 • x 3 extents x 3 durations → 116,352 • ‘The Persian chessboard’ • 2^64 ≈ 10^19 • 10^19 grains of rice ≈ 100 billion tonnes of rice • 10^19 nanoseconds ≈ 300 years • Read II grew from 20,000 to 250,000 terms in ~100 staff-years • still too small to be useful • but too big to use
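The burns arithmetic can be checked with a short sketch. The slide's totals (404, 7272, 116,352) are reproduced exactly if each axis is counted with one extra "unspecified" value; that reading is an assumption, not something the slide states. The axis names and the comparison with a compositional scheme (each axis value listed once and combined by rule) are likewise illustrative.

```python
from math import prod

# Axis sizes as read from the slide. Counting one extra "unspecified"
# value per axis is an ASSUMPTION that happens to reproduce its totals.
AXES = [
    ("site", 100), ("depth", 3),
    ("subsite", 5), ("agent", 2),   # agent: chemical or thermal
    ("extent", 3), ("duration", 3),
]

def enumerated_codes(axes):
    """Codes needed when every combination is listed in advance."""
    return prod(size + 1 for _, size in axes)

def compositional_entries(axes):
    """Entries needed when each axis value is listed once and combined by rule."""
    return sum(size + 1 for _, size in axes)

for n in (2, 4, 6):
    print(n, "axes:", enumerated_codes(AXES[:n]), "enumerated vs",
          compositional_entries(AXES[:n]), "compositional")

chessboard = 2 ** 64   # the 'Persian chessboard' figure quoted on the slide
```

The enumerated count multiplies with every new axis while the compositional count only adds, which is the "phrase book" versus "dictionary + grammar" contrast in numbers.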
Benefits • Avoid the “Exploding Bicycle” • From “phrase book” to “dictionary + grammar” • Tame combinatorial explosions • 1980 - ICD-9 (E826): 8 codes • 1990 - READ-2 (T30..): 81 • 1995 - READ-3: 87 • 1996 - ICD-10 (V10-19): 587 • V31.22 Occupant of three-wheeled motor vehicle injured in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income • and meanwhile elsewhere in ICD-10 • W65.40 Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity • X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities
Problems (2) • Information implicit in the rubrics • “Hypertension excluding pregnancy” • Computers can’t read! • Invisible to software • No explicit information except the hierarchy • Minimal support for software • No opportunity to use software to help • Language and concepts confused • Synonyms • Preferred terms • Homonyms • Only simple look up and spelling correction
Problems (3) • Mixed Organisation • ‘Heart diseases’ in 13 of 19 chapters of ICD • Tumours, infections, congenital abnormalities, toxic, … • ‘Steroids’ in five chapters of standard drug classifications • Anti-inflammatories, anti-asthmatics, … • Unreliable for indexing or Abstractions • How to say something about ‘all heart diseases’? • Fixed organisation • Single hierarchy - Single use • Where to put ‘gout’ - arthritis or metabolic disease? • Back and forth in each edition of ICD • No re-use
Problems (3b): Thesauri rather than Classifications • [Figure: a mixed hierarchy contrasted with a correct kind-of (subsumption) hierarchy]
Problems (4) • ‘Semantic identifiers’ • Codes really paths - moving a concept meant changing its code • 3 Cardiovascular disorders …3.4 Disorders of Artery... ...3.4.2 Disorders of coronary artery... …3.4.2.3 Coronary thrombosis … • Easy to process but... • Reorganisation requires changing codes • Codes cannot be permanent
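A minimal sketch of why path-style codes are "easy to process" yet cannot be permanent. The first four rubrics are the ones on the slide; the chapter `3.9` and the `move` helper are hypothetical and exist only to show a reorganisation.

```python
# Toy rubric table using the path-style codes shown on the slide.
rubrics = {
    "3": "Cardiovascular disorders",
    "3.4": "Disorders of artery",
    "3.4.2": "Disorders of coronary artery",
    "3.4.2.3": "Coronary thrombosis",
    "3.9": "Thrombotic disorders",   # hypothetical chapter, for illustration only
}

def is_a(code, ancestor):
    """Easy to process: subsumption is just a prefix test on the code."""
    return code == ancestor or code.startswith(ancestor + ".")

def move(table, code, new_parent, position):
    """Reorganise: the concept keeps its rubric but must take a new code."""
    new_code = f"{new_parent}.{position}"
    moved = dict(table)
    moved[new_code] = moved.pop(code)
    return moved, new_code

old_record = {"patient": "p1", "code": "3.4.2.3"}   # data captured before the move
revised, new_code = move(rubrics, "3.4.2.3", "3.9", 1)

print(is_a("3.4.2.3", "3.4"))          # hierarchy read straight off the code
print(old_record["code"] in revised)   # the stored code no longer exists
```

After the move, the old record points at a code that the revised table no longer contains, and the concept is no longer found under "Disorders of artery" by the prefix test.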
Problems (5) • Maintenance • 20 Years from ICD9 to ICD10 • ~100 person-years from Read 1 to Read 3 • Mega francs/guilders/crowns/marks on European coding schemes • Thousands of unpaid hours of committee time • Impossible / meaningless decisions take longest • You can search forever for something that is not there • Multiple uses compete - • Must choose one use • Most successful were clear about their purpose - ICD, ICPC, MeSH • Codes change meaning with version changes • Old data misleading!
Problems (6) • Version specific artefacts • “Not otherwise specified” (NOS) • Used to move a general concept ‘down’ • Not elsewhere classified (NEC) • Catch all - Nowhere else in coding system e.g. ‘Tumour not elsewhere classified’ • dependent on version • “Other” • Catch all - Not listed below, e.g. “Other diseases of the cardiovascular system” • dependent on version • Not used consistently
Language/Concepts are slippery • Human cognition makes it look easy • Logic fails to capture it • Classification is easy until you try to do it • Trying since Aristotle in the West and Ancient Chinese in the East • Words/Concepts mean what a community decides they mean • Does a chimpanzee have four hands? • Is a prion alive? • Is surgery on the ovary a kind of ‘Endocrine surgery’? • Easier to agree on the concrete than the abstract • Easy to agree on useful abstractions and generalisations • Harder to agree on how to name them
Problems (8) • There is no re-use - there is no standard • The ‘grand challenge’: A common controlled vocabulary for medicine • But re-use requires multiple different views • People’s needs differ / People do and find different things • By profession • Doctors and specialties, nurses, physiotherapists, dentists… • By situation • Inpatient, outpatient, primary care, community… • By task • Diagnosis, management, prescribing, • patient care, public health, quality assurance, management, planning • By country and community • US, UK, France, Germany, Japan, Korea, ...
Summary of Problems: 1st Generation Enumerated Systems • Enumerated Single Hierarchies • List all possibilities in advance • Cannot cope with fractal knowledge • Most knowledge implicit • Invisible to software • Can’t agree on common concepts and classification • Unreliable for indexing • Difficult to use for healthcare professionals • No support for user interface • Can’t build and maintain big classifications • Language and concepts don’t translate easily to logic and software
Cimino’s Desiderata (1) • Concept orientation • Separate language (terms) and concepts (codes) • Concept permanence • Never re-use a code (‘retire’ it) • Nonsemantic concept identifiers • Separate the code from the path • Polyhierarchy • Allow one concept to be classified in multiple ways • Gout can be both a metabolic disease and an arthritis
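The four desiderata on this slide can be sketched as a small data structure. Everything here is illustrative: the `Concept` and `Terminology` classes, the `X00001`-style identifier format and the "Dropsy"/"Oedema" retirement example are assumptions, not part of any real coding system.

```python
from dataclasses import dataclass, field
from itertools import count

@dataclass
class Concept:
    cid: str                                     # opaque: says nothing about position
    terms: list                                  # language kept apart from the concept
    parents: set = field(default_factory=set)    # polyhierarchy: any number of parents
    retired: bool = False

class Terminology:
    def __init__(self):
        self.concepts = {}
        self._serial = count(1)                  # identifiers are issued once, in order

    def add(self, preferred_term, parents=()):
        cid = f"X{next(self._serial):05d}"       # hypothetical identifier format
        self.concepts[cid] = Concept(cid, [preferred_term], set(parents))
        return cid

    def retire(self, cid):
        self.concepts[cid].retired = True        # kept for old data, never reissued

    def ancestors(self, cid):
        seen, stack = set(), list(self.concepts[cid].parents)
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.add(parent)
                stack.extend(self.concepts[parent].parents)
        return seen

t = Terminology()
metabolic = t.add("Metabolic disease")
arthritis = t.add("Arthritis")
gout = t.add("Gout", parents=[metabolic, arthritis])

dropsy = t.add("Dropsy")
t.retire(dropsy)                                 # concept permanence: retire, don't delete
oedema = t.add("Oedema")                         # gets a fresh identifier
```

Because the identifier carries no path, gout can sit under both parents, and changing its parents later would leave `gout` itself unchanged.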
Cimino’s Desiderata (2) • Formal Definitions • i.e. ‘Be compositional’ • Reject ‘Not elsewhere classified’ • concept permanence and NEC • Multiple granularities • Organ, tissue, cellular, molecular • Grades, types, classes of diseases • Special clinical criteria • Multiple consistent views • Allow different organisations • e.g. functional, anatomical, pathological
Cimino’s Desiderata (3) • Represent context • Family history, risk, source of information • Evolve gracefully • Allow controlled changes • Recognise redundancy (equivalence) • ‘Carcinoma’ + ‘Lung’ ?=? ‘Carcinoma of the lung’ • How would we know? • How could a machine know?
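One way a machine could answer the "?=?" question is to give every pre-coordinated name a formal definition and compare canonical forms. The sketch below assumes a role called `site` and a tiny, hypothetical definitions table; it is not how any particular terminology does it.

```python
def compose(base, **roles):
    """Canonical form of an expression: a base concept plus role-value pairs."""
    return (base, frozenset(roles.items()))

# Formal definitions for pre-coordinated names (hypothetical content).
DEFINITIONS = {
    "Carcinoma of the lung": compose("Carcinoma", site="Lung"),
}

def normalise(expr):
    """Replace a defined name by its definition; pass compositions through."""
    if isinstance(expr, str):
        return DEFINITIONS.get(expr, compose(expr))
    return expr

def equivalent(a, b):
    """Two expressions are redundant if their canonical forms are identical."""
    return normalise(a) == normalise(b)

print(equivalent("Carcinoma of the lung", compose("Carcinoma", site="Lung")))
print(equivalent("Carcinoma of bronchus", compose("Carcinoma", site="Lung")))
```

The second comparison fails only because "Carcinoma of bronchus" has no definition in the table: without formal definitions, the machine has nothing to compare.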
Solution 0: You are worrying about the wrong problem • International Classification of Primary Care (ICPC) • Focus on repeatability and quality across languages for a small (<2000) number of codes
Solution Generation 1: Megaterm + Crossmapping = UMLS • [Diagram; labels: Coding & Classification; source vocabularies MeSH, SNOMED Axes, ICPC, ACRNEMA, READ, OPCS, ICD-9, ICD-10; MEGA-TERM / UMLS; Clinical Applications: Decision support, Medical Records, Data entry]
The UMLS Knowledge Sources • Metathesaurus • Cross mappings • Language resources • NORM – stemming and term recognition • UMLS Semantic Net • 170 types attached to categorise concepts • Disease, anatomical part, micro-organism, etc.
Solution 1 Cross-mapping & UMLS • Unified Medical Language System (UMLS) from US National Library of Medicine • De facto common registry for vocabularies • Concept Unique Identifiers (CUIs) and Lexical Unique Identifiers (LUIs) are de facto the common nomenclature • NB must use a CUI + LUI to get unique identification • Licence terms • Class I – free for use • Class III – heavily restricted • (Class II – almost nonexistent)
Solution 1 Cross-mapping & UMLS • An invaluable resource, but... • No better than the vocabularies which are mapped • Limited detail for patient care • Unreliable for indexing or abstraction of knowledge • Best for relating everything to MeSH for indexing literature • Still limited by combinatorial explosion • Still can’t cope with fractal knowledge • Not extensible - no help in building or extending terminologies • No help in reorganising existing terminologies to re-use for new purposes • Top down • Information still implicit • Minimal help with software • No help with data capture, user interfaces
Solution IIa: Build what you need as you need it • LOINC – dominant coding system for laboratory systems (“Logical Observation Identifiers Names and Codes”) http://www.loinc.org/ • Clinical LOINC contains increasing amounts of clinical references • Fully Class I included in UMLS • Closely linked to HL7 and HL7 vocabulary committee
Build and Control what you need only • HL7 Messaging standard • Controls the codes that hold messages together • Uses codes from elsewhere as ‘payload’ • See www.hl7.org • (Possibly the world’s worst web site) • Some material members only
Solutions Generations 2-3: Compositional Systems • Beat the combinatorial explosion • Build concepts out of pieces - Lego • Dictionary and grammar rather than phrasebook • But hard
Solution Generation 1.5: Faceted • Faceted systems: SNOMED International • Inflammation + Lung + Infection + Pneumococcus → Pneumococcal pneumonia • Limit combinatorial explosion, but… • Rigid - a limited number of axes / facets / chapters • Each facet has the problems of a first generation enumerated system • Much knowledge still implicit • No way to know how identifiers relate • No explicit relations, only ‘+’ • No way to recognise redundancy / equivalence • No help with data capture or user interface / No way to recognise nonsense • Carcinoma + Hair + Donkey + Emotional ???? • Still can’t cope with fractal knowledge • Limited extensibility: limited help with building, extending or reorganising • Still Top Down
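A toy illustration of why a bare ‘+’ cannot recognise nonsense or redundancy. The facet assignments below are assumptions made for the example, not SNOMED International content; the only check a purely faceted scheme can make is that each term exists in some facet.

```python
# Facet membership for a handful of terms (axis names are ASSUMED).
FACET_OF = {
    "Inflammation": "morphology", "Carcinoma": "morphology",
    "Lung": "topography", "Hair": "topography",
    "Infection": "aetiology",
    "Pneumococcus": "living organism", "Donkey": "living organism",
    "Emotional": "function",
}

def faceted(*terms):
    """Combine terms with '+': an unordered bag, with no relations between members."""
    unknown = [t for t in terms if t not in FACET_OF]
    if unknown:
        raise KeyError(f"not in any facet: {unknown}")
    return frozenset(terms)

pneumonia = faceted("Inflammation", "Lung", "Infection", "Pneumococcus")
nonsense = faceted("Carcinoma", "Hair", "Donkey", "Emotional")   # accepted without complaint
shorter = faceted("Inflammation", "Lung", "Pneumococcus")        # same meaning? no way to tell

print(len(pneumonia), len(nonsense), shorter == pneumonia)
```

Both combinations pass the same membership check, and the two ways of writing pneumococcal pneumonia compare as different, because nothing says which term is the site, the cause or the process.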
Generation 2: Enumerated Compositional • Read III with qualifiers • Inflammation: site: lung, cause: pneumococcus → Pneumococcal pneumonia • More semantics but… • Limited qualifiers - limited views - limited re-use • Limited help with data capture - User interface difficult • Much information still implicit - limited software support • No way to recognise redundancy / equivalence / errors • Organisation still mixed - indexing better but still unreliable • Limited separation of language and concepts • Still can’t cope with fractal knowledge • Limited extensibility; limited help with building and reorganising terminologies • Top down
Logic Based Ontologies: The basics • [Figure; labels by panel] • Primitive skeleton: Thing, Structure, Feature, Heart, MitralValve, Encrustation, pathological, red • Descriptions: MitralValve * ALWAYS partOf: Heart; Encrustation * ALWAYS feature: pathological • Definitions: Thing + feature: pathological; Structure + feature: pathological + involves: Heart; Encrustation + involves: MitralValve • Reasoning • Validating: red + partOf: Heart; red + partOf: Heart + (feature: pathological)
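The figure's pieces (primitive skeleton, descriptions, definitions, reasoning, validating) can be sketched as a toy classifier. This is a simplified illustration, not GALEN's formalism: the placement of the primitives in the skeleton, the rule that involving a part counts as involving the whole, and the `DOMAIN` sanction used for validation are all assumptions made to keep the example small.

```python
# Primitive skeleton (child -> parent). Placing Heart, MitralValve and
# Encrustation under Structure is an ASSUMED reading of the figure.
PARENT = {
    "Structure": "Thing", "Feature": "Thing",
    "Heart": "Structure", "MitralValve": "Structure", "Encrustation": "Structure",
    "pathological": "Feature", "red": "Feature",
}
# Descriptions: what is ALWAYS true of a primitive.
ALWAYS = {
    "MitralValve": {("partOf", "Heart")},
    "Encrustation": {("feature", "pathological")},
}
# ASSUMED sanction used for validation: only structures take these roles.
DOMAIN = {"partOf": "Structure", "involves": "Structure"}

def is_kind_of(a, b):
    """Walk up the primitive skeleton from a looking for b."""
    while a is not None:
        if a == b:
            return True
        a = PARENT.get(a)
    return False

def always(base):
    """Descriptions inherited from a primitive and its ancestors."""
    facts = set()
    while base is not None:
        facts |= ALWAYS.get(base, set())
        base = PARENT.get(base)
    return facts

def fills(role, value, wanted):
    if is_kind_of(value, wanted):
        return True
    # ASSUMED rule: involving a part counts as involving the whole.
    if role == "involves":
        return any(fills(role, whole, wanted)
                   for r, whole in always(value) if r == "partOf")
    return False

def subsumes(general, specific):
    """Reasoning: does every criterion of `general` hold for `specific`?"""
    (g_base, g_criteria), (s_base, s_criteria) = general, specific
    if not is_kind_of(s_base, g_base):
        return False
    known = set(s_criteria) | always(s_base)
    return all(any(r == role and fills(role, v, wanted) for r, v in known)
               for role, wanted in g_criteria)

def valid(expr):
    """Validating: each role must be sanctioned for the base concept."""
    base, criteria = expr
    return all(is_kind_of(base, DOMAIN.get(role, "Thing")) for role, _ in criteria)

pathological_thing = ("Thing", {("feature", "pathological")})
heart_pathology = ("Structure", {("feature", "pathological"), ("involves", "Heart")})
mitral_encrustation = ("Encrustation", {("involves", "MitralValve")})

print(subsumes(heart_pathology, mitral_encrustation))   # classified by inference
print(valid(("red", {("partOf", "Heart")})))            # rejected as nonsense
```

Nobody stated that an encrustation involving the mitral valve is a pathological structure involving the heart; the classification follows from the two descriptions plus the definitions.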
CT Vocabulary • “Reference Terminology” vs “Interface Terminologies” • Reference terminology = enumerated hierarchy of formally defined terms • Interface terminology = navigation structure for user interface • Explicitly excluded from SNOMED-RT • “Terming”, “Coding”, and “Grouping” • Terming - finding the lexical string • Coding - finding the correct unique code (concept) • Grouping - putting codes into groupers for epidemiological or other purposes
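The three steps named on this slide can be read as a small pipeline. The sketch below uses made-up identifiers (`t1`, `c10`) and a made-up grouper; none of them come from SNOMED or any other real vocabulary.

```python
# All identifiers below are hypothetical, for illustration only.
LEXICON = {                                   # lexical string -> term id
    "heart attack": "t1",
    "myocardial infarction": "t2",
}
CONCEPT_OF = {"t1": "c10", "t2": "c10"}       # term id -> unique concept code
GROUPERS = {"Ischaemic heart disease": {"c10", "c11"}}

def terming(text):
    """Terming: find the lexical string (case and spacing normalised)."""
    return LEXICON.get(" ".join(text.lower().split()))

def coding(term_id):
    """Coding: find the unique code (concept) behind a term."""
    return CONCEPT_OF[term_id]

def grouping(concept):
    """Grouping: list the groupers that contain this concept."""
    return sorted(name for name, members in GROUPERS.items() if concept in members)

code = coding(terming("Heart  Attack"))
print(code, grouping(code))
```

Two different strings reach the same concept code, which is the separation of language from concepts that the reference terminology depends on.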
Generation 2.5: Pre-coordinated Formal Compositions • SNOMED-CT • Formal collaboration between College of American Pathologists (CAP/SNOMED) and NHS • Formal logical model for classifying a fixed list of definitions • Simple fixed ontology (7 links) • Now officially adopted and probably available for both NHS and related academic uses • GALEN derived terminologies • UK Drug Ontology • Procedure classifications
Generation III • Fully compositional post coordinated • Not yet in use or fully available • GALEN-like • Will probably arrive with Semantic Web
Other Key Resources • Anatomy • Digital Anatomist Foundational Model of Anatomy • University of Washington (http://sig.biostr.washington.edu/projects/da/) • Comprehensive model of STRUCTURAL anatomy • Transformed into formal representation in Freiburg • Feasibility rather than production • Mouse • The Edinburgh Mouse Atlas Project (http://genex.hgu.mrc.ac.uk/) • Bioinformatics • GO - The Gene Ontology • MGED – Microarray Gene Expression Data • OMIM – Online Mendelian Inheritance in Man • Drugs • Proprietary databases – First Databank, Micromed • UK Drug Dictionary (UKCPRS) • National Cancer Institute CaCore Ontologies