Requirements for natural language understanding in referent-tracking based electronic patient records. CS seminar, Bolzano, Dec 5, 2005. Dr. W. Ceusters European Centre for Ontological Research Saarland University, Saarbrücken - Germany. Presentation overview. ECOR and me
Requirements for natural language understanding in referent-tracking based electronic patient records. CS seminar, Bolzano, Dec 5, 2005. Dr. W. Ceusters, European Centre for Ontological Research, Saarland University, Saarbrücken - Germany
Presentation overview • ECOR and me • The Electronic Health Record (EHR) • Problems with terminologies and their use in the EHR • Realist ontology • Referent Tracking • Opportunities for natural language understanding
1977 1959 - 2005 2004 1989 2002 1992 1998 Short personal history
Electronic Health Record • ISO/TS 18308:2003 • Electronic Health Record (EHR): • A repository of information regarding the health of a subject of care, in computer processable form. • EHR system: • the set of components that form the mechanism by which electronic health records are created, used, stored, and retrieved. It includes people, data, rules and procedures, processing and storage devices, and communication and support facilities. • More common meaning of EHR system: • only the “software being executed”
A replacement for This and that
The Medical Informatics dogma To structure or NOT to be • Fact: computers can only deal with a structured representation of reality: • structured data: • relational databases, spread sheets • structured information: • XML simulates context • structured knowledge: • rule-based knowledge systems • Conclusion: a need for structured data entry (???)
Example of data entry form www.comchart.com
Structured EHR data entry • Current technical solutions: • Data entry forms • provide the structure • various paradigms: • Rigid, pre-fixed • Adaptable to user-preferences, but fixed when used • Dynamically adapting to entered data in context • Terminologies, coding and classification systems: • provide the language to be used • Exchange of information preserving meaning • Statistics and epidemiology
The International Classification of diseases (WHO). • ... • Chapter II: Neoplasms (C00-D48) • Chapter III: Diseases of the Blood and Blood-forming organs and certain disorders involving the immune mechanism (D50-D89) • Excludes : auto-immune disease (systemic) NOS (M35.9) • .... • Nutritional Anemias (D50-D53) • D50 Iron deficiency anaemia • Includes: ... • D50.0 Iron deficiency anaemia secondary to blood loss (chronic) • Excludes : ... • D50.1 ... • D51 Vit B12 deficiency anaemia • Haemolytic Anemias (D55-D59) • ... • Chapter IV: ...
Main problems • Internal and external consistency of terminologies. • What do the terms in a terminology stand for ?
Lack of face value Agrammatical constructions Shift in ontological category (or ambiguous meaning) Problems with terminologies (1)
Problems with terminologies (2) ‘ventricle’ used in 2 different meanings
Problems with terminologies (3) • Mixing of differentiae • Ontological nonsense
Problems with terminologies (4) Incomplete classification
wisdom (- representation) knowledge - representation information - representation • Questions not often enough asked: • What part of our data corresponds with something out there in reality ? • What part of reality is not captured by our data, but should because it is relevant ? data - representation Reality What is there on the side of the patient Current mainstream thinking
PtID Date ObsCode Narrative 5572 5572 298 5572 5572 5572 298 2309 47804 5572 5572 12/07/1990 01/04/1997 22/08/1993 22/08/1993 01/04/1997 12/07/1990 21/03/1992 03/04/1993 04/07/1990 17/05/1993 04/07/1990 26442006 2909872 9001224 26442006 9001224 58298795 26442006 9001224 79001 79001 81134009 Essential hypertension closed fracture of shaft of femur Fracture, closed, spiral closed fracture of shaft of femur Accident in public building (supermarket) Other lesion on other specified region closed fracture of shaft of femur Essential hypertension Accident in public building (supermarket) Closed fracture of radial head Accident in public building (supermarket) 5572 04/07/1990 79001 Essential hypertension 0939 24/12/1991 255174002 benign polyp of biliary tract 2309 21/03/1992 26442006 closed fracture of shaft of femur 0939 20/12/1998 255087006 malignant polyp of biliary tract A look at the database: Use of SNOMED codes for ‘unambiguous’ understanding How many numerically different disorders are listed here ? * How many different types of disorders are listed here ? * How many disorders have patients 5572, 2309 and 298 each had thus far in their lifetime ? * * cause, not disorder
PtID Date ObsCode Narrative 2309 298 5572 298 5572 5572 5572 47804 5572 5572 5572 01/04/1997 04/07/1990 04/07/1990 22/08/1993 12/07/1990 22/08/1993 12/07/1990 01/04/1997 21/03/1992 03/04/1993 17/05/1993 79001 26442006 9001224 81134009 26442006 26442006 58298795 2909872 9001224 79001 9001224 Accident in public building (supermarket) Closed fracture of radial head closed fracture of shaft of femur Fracture, closed, spiral closed fracture of shaft of femur closed fracture of shaft of femur Other lesion on other specified region Accident in public building (supermarket) Essential hypertension Accident in public building (supermarket) Essential hypertension Would it be easier if you could see the code labels ? 5572 04/07/1990 79001 Essential hypertension 0939 24/12/1991 255174002 benign polyp of biliary tract 2309 21/03/1992 26442006 closed fracture of shaft of femur 0939 20/12/1998 255087006 malignant polyp of biliary tract
Different patients. Same supermarket? Maybe the same (irrelevant ?) freezer section ? Or different supermarkets, but always in the freezer sections ? PtID Date ObsCode Narrative Same patient, same hypertension code: Same (numerically identical) hypertension ? 5572 5572 5572 2309 47804 5572 298 298 5572 5572 5572 21/03/1992 12/07/1990 22/08/1993 17/05/1993 01/04/1997 22/08/1993 01/04/1997 04/07/1990 03/04/1993 04/07/1990 12/07/1990 26442006 9001224 58298795 2909872 26442006 26442006 81134009 79001 9001224 79001 9001224 Accident in public building (supermarket) Closed fracture of radial head closed fracture of shaft of femur Essential hypertension Other lesion on other specified region closed fracture of shaft of femur Fracture, closed, spiral Essential hypertension closed fracture of shaft of femur Accident in public building (supermarket) Accident in public building (supermarket) 5572 04/07/1990 79001 Essential hypertension 0939 24/12/1991 255174002 benign polyp of biliary tract Same patient, different dates, same fracture codes: same (numerically identical) fracture ? 2309 21/03/1992 26442006 closed fracture of shaft of femur Same patient, same date, 2 different fracture codes: same (numerically identical) fracture ? Same patient, different dates, Different codes. Same (numerically identical) polyp ? Different patients, same fracture codes: Same (numerically identical) fracture ? 0939 20/12/1998 255087006 malignant polyp of biliary tract A look at the problems ...
Main problem areas for current EHRs • Statements refer only very implicitly to the concrete entities about which they give information. • Idiosyncrasies of concept-based terminologies • tell us only that some instance of the class the codes refer to is referred to in the statement, but not precisely which instance. • Are usually confused about classes and individuals. • “Country” and “Belgium”. • Mixing up the act of observation and the thing observed. • Mixing up statements and the entities these statements refer to.
Consequences • Very difficult to: • Count the number of (numerically) different diseases • Bad statistics on incidence, prevalence, ... • Bad basis for health cost containment • Relate (numerically same or different) causal factors to disorders: • Dangerous public places (specific work floors, swimming pools), • dogs with rabies, • HIV contaminated blood from donors, • food from unhygienic source, ... • Hampers prevention • ...
Proposed solution: Referent Tracking • Foundation: Realist ontology
Ontology • ‘Ontology’: the study of being as a science • ‘An ontology’ is a representation of some pre-existing domain of reality which • (1) reflects the properties of the objects within its domain in such a way that there obtains a systematic correlation between reality and the representation itself, • (2) is intelligible to a domain expert • (3) is formalized in a way that allows it to support automatic information processing • ‘ontological’ (as adjective): • Within an ontology. • Derived by applying the methodology of ontology • ...
Proposed solution: Referent Tracking • Purpose: • explicit reference to the concrete individual entities relevant to the accurate description of each patient’s condition, therapies, outcomes, ... • Method: • Introduce an Instance Unique Identifier (IUI) for each relevant individual (= particular, = instance). • Distinguish between • IUI assignment: for instances that do exist • IUI reservation: for entities expected to come into existence in the future
Universals EHR system City hospital’s EHR system HC City hospital Freezer section The freezer section of Jane’s favourite supermarket Jane Smith Person Dr. Peters Dr. Longley Femur Jane’s left femur Jane’s left femur Fracture Jane’s left femur fracture Image Jane’s fracture’s image Jane’s falling occurrents t Jane’s femur breaking Dr. Peter’s examination of Jane’s fracture Dr. Peter’s diagnosis making Dr. Peter’s ordering of an X-ray Jane’s fracture’s healing Dr. Longley’s examination of Jane’ s fracture Shooting the pictures of Jane’s leg Freezer section dismantled Jane dies An ontological analysis continuants
Essentials of Referent Tracking • Generation of universally unique identifiers; • deciding what particulars should receive an IUI; • finding out whether or not a particular has already been assigned an IUI (each particular should receive maximally one IUI); • using IUIs in the EHR, i.e. issues concerning the syntax and semantics of statements containing IUIs; • determining the truth values of statements in which IUIs are used; • correcting errors in the assignment of IUIs.
IUI assignment • = an act carried out by the first ‘cognitive agent’ feeling the need to acknowledge the existence of a particular it has information about by labelling it with a UUID. • ‘cognitive agent’: • A person; • An organisation; • A device or software agent, e.g. • Bank note printer, • Image analysis software.
Criteria for IUI assignment (1) • The particular’s existence must be determined: • Easy for persons in front of you, body parts, ... • Easy for ‘planned acts’: they do not exist before the plan is executed ! • Only the plan exists and possibly the statements made about the future execution of the plan • More difficult: subjective symptoms • But the statements the patient makes about them do exist ! • However: • no need to know what the particular exactly is, i.e. which universal it instantiates • No need to be able to point to it precisely • One bee out of a particular swarm that stung the patient, one pain out of a series of pain attacks that made the patient worried • But: this is not a matter of choice, not ‘any’ out of ...
Criteria for IUI assignment (2) • The particular’s existence ‘may not already have been determined as the existence of something else’: • Morning star and evening star • Himalaya • Multiple sclerosis • May not have already been assigned an IUI. • It must be relevant to do so: • Personal decision, (scientific) community guideline, ... • Possibilities offered by the EHR system • If an IUI has been assigned by somebody, everybody else making statements about the particular should use it
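The assignment and reservation rules on the last few slides can be sketched in a few lines of Python. This is a minimal illustration, not the talk's implementation: the names `IUIRegistry`, `IUIRecord`, `assign` and `reserve` are hypothetical, and looking a particular up by a plain text key is a deliberate simplification. Deciding whether two descriptions pick out the same particular (morning star and evening star) is the hard part and is not solved here.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Status(Enum):
    ASSIGNED = "assigned"  # the particular is known to exist
    RESERVED = "reserved"  # the particular is expected to come into existence


@dataclass
class IUIRecord:
    iui: str        # the identifier itself (a UUID string in this sketch)
    status: Status
    agent: str      # cognitive agent that first acknowledged the particular
    key: str        # naive lookup key (assumption: one key per particular)


@dataclass
class IUIRegistry:
    records: Dict[str, IUIRecord] = field(default_factory=dict)

    def find(self, key: str) -> Optional[IUIRecord]:
        return self.records.get(key)

    def _create(self, key: str, agent: str, status: Status) -> IUIRecord:
        record = IUIRecord(str(uuid.uuid4()), status, agent, key)
        self.records[key] = record
        return record

    def reserve(self, key: str, agent: str) -> IUIRecord:
        # Reservation: for entities that do not exist yet, e.g. a planned act.
        return self.find(key) or self._create(key, agent, Status.RESERVED)

    def assign(self, key: str, agent: str) -> IUIRecord:
        # At most one IUI per particular: reuse an existing record if any.
        record = self.find(key)
        if record is None:
            return self._create(key, agent, Status.ASSIGNED)
        if record.status is Status.RESERVED:
            record.status = Status.ASSIGNED  # the reserved entity now exists
        return record
```

A second agent who calls `assign` for a particular that already has an IUI gets the existing record back, which mirrors the rule that everybody else making statements about the particular should use the IUI already assigned.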
Representation in the EHR • Relevant particulars referred to using IUIs • Relationships that obtain between particulars at time t expressed using relations from an ontology (type OBO) • Statements describing for each particular, at time t: • Of what universal from an ontology it is an instance • AND/OR (if one insists): • By means of what concept from a concept-based system it can sensibly be described
A shift in mind set • Not: • ‘this patient has a fracture of the left tibia’ (concepts from a terminology) • But: • #12, #234, #876 • #234 is_located_in #876 • #876 is_part_of #12 • #876 is_instance_of left_tibia • ... (with relationships and universals from a realist ontology)
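The IUI statements on this slide can be held as plain subject-relation-object triples. The sketch below is only an illustration: the `Statement` type and the `about` helper are hypothetical names, and the slide does not spell out which particular each number denotes (a natural reading is #12 for the patient, #876 for the tibia and #234 for the fracture, but that is an assumption).

```python
from typing import List, NamedTuple


class Statement(NamedTuple):
    subject: str   # IUI of a particular
    relation: str  # relation taken from a realist ontology
    obj: str       # IUI of a particular, or the name of a universal


statements: List[Statement] = [
    Statement("#234", "is_located_in", "#876"),
    Statement("#876", "is_part_of", "#12"),
    Statement("#876", "is_instance_of", "left_tibia"),
]


def about(iui: str, stmts: List[Statement]) -> List[Statement]:
    """Return every statement that mentions the given particular."""
    return [s for s in stmts if iui in (s.subject, s.obj)]


for s in about("#876", statements):
    print(s.subject, s.relation, s.obj)
```

Unlike the sentence ‘this patient has a fracture of the left tibia’, every statement here names the individual entities involved, so later statements can be attached to exactly the same tibia or the same fracture.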
Pragmatics of IUIs in EHRs • IUI assignment requires an additional effort • In principle no more effort (or just a little bit more) than directly using codes from concept-based systems • A search for concept-codes is replaced by a search for the appropriate IUI using exactly the same mechanisms • Browsing • Code-finder software • Auto-coding software (CLEF NLP software Andrea Setzer) • With that IUI comes a wealth of already registered information • If for the same patient different IUIs apply, the user must decide which one is under scrutiny, or whether it is again a new instance • A transfer or reference mechanism makes the statements visible through the RTDB
PtID Date ObsCode Narrative IUI-001 5572 5572 2309 5572 5572 5572 298 5572 298 5572 47804 03/04/1993 01/04/1997 04/07/1990 04/07/1990 12/07/1990 12/07/1990 21/03/1992 01/04/1997 17/05/1993 22/08/1993 22/08/1993 26442006 81134009 26442006 9001224 9001224 79001 58298795 79001 2909872 26442006 9001224 Accident in public building (supermarket) closed fracture of shaft of femur Other lesion on other specified region closed fracture of shaft of femur Essential hypertension Accident in public building (supermarket) Closed fracture of radial head Essential hypertension closed fracture of shaft of femur Fracture, closed, spiral Accident in public building (supermarket) IUI-001 IUI-001 IUI-007 5572 04/07/1990 79001 IUI-005 Essential hypertension 0939 24/12/1991 255174002 IUI-004 benign polyp of biliary tract 2309 21/03/1992 26442006 IUI-002 closed fracture of shaft of femur IUI-007 IUI-005 IUI-007 IUI-012 IUI-005 0939 20/12/1998 255087006 IUI-004 malignant polyp of biliary tract Advantage: betterreality representation IUI-003
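Why the extra IUI column gives a better picture of reality can be shown with a small count. The sketch below uses only the four rows that are legible as complete rows on this slide; the tuple layout and the helper name `count_for_patient` are illustrative assumptions.

```python
from typing import List, Tuple

# (PtID, Date, ObsCode, IUI, Narrative): the four rows legible on the slide
Row = Tuple[str, str, str, str, str]

rows: List[Row] = [
    ("5572", "04/07/1990", "79001", "IUI-005", "Essential hypertension"),
    ("0939", "24/12/1991", "255174002", "IUI-004", "benign polyp of biliary tract"),
    ("2309", "21/03/1992", "26442006", "IUI-002", "closed fracture of shaft of femur"),
    ("0939", "20/12/1998", "255087006", "IUI-004", "malignant polyp of biliary tract"),
]


def count_for_patient(pt_id: str, data: List[Row]) -> Tuple[int, int]:
    """Return (distinct codes, distinct IUIs) recorded for one patient."""
    mine = [r for r in data if r[0] == pt_id]
    return len({r[2] for r in mine}), len({r[3] for r in mine})


codes, particulars = count_for_patient("0939", rows)
print(f"patient 0939: {codes} codes, {particulars} particular(s)")
```

For patient 0939 the code column suggests two disorders, while the IUI column records that both entries are about one and the same polyp, first described as benign and later as malignant.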
Other Advantages • mapping as by-product of tracking • Descriptions about the same particular using different ontologies/concept-based systems • Quality control of ontologies and concept-based systems • Systematic “inconsistent” descriptions in or cross terminologies may indicate poor definition of the respective terms
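Mapping as a by-product of tracking can be sketched as follows: whenever terms from different systems are used to describe the same IUI, each cross-system pair becomes a candidate mapping. The IUIs and term assignments below are made-up illustration data (the system names CS1 to CS3 are borrowed from later slides), and `candidate_mappings` is a hypothetical helper.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Term = Tuple[str, str]  # (coding system, term)

# (IUI, coding system, term): hypothetical descriptions of two particulars
descriptions: List[Tuple[str, str, str]] = [
    ("IUI-A", "CS1", "chest-pain"),
    ("IUI-A", "CS2", "pain"),
    ("IUI-A", "CS3", "chest pain"),
    ("IUI-B", "CS1", "coughing"),
    ("IUI-B", "CS3", "coughing"),
]


def candidate_mappings(rows: List[Tuple[str, str, str]]) -> Set[Tuple[Term, Term]]:
    """Pair up terms from different systems that describe the same particular."""
    by_iui: Dict[str, Set[Term]] = defaultdict(set)
    for iui, system, term in rows:
        by_iui[iui].add((system, term))
    pairs: Set[Tuple[Term, Term]] = set()
    for terms in by_iui.values():
        for a, b in combinations(sorted(terms), 2):
            if a[0] != b[0]:  # only pairs that cross system boundaries
                pairs.add((a, b))
    return pairs


for left, right in sorted(candidate_mappings(descriptions)):
    print(left, "<->", right)
```

A candidate pair is not an equivalence: CS2 ‘pain’ is broader than CS3 ‘chest pain’. Pairs that recur over many particulars are mapping evidence, while systematic clashes point at the poorly defined terms mentioned on the slide.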
How to make this practical for the text-based parts of an EHR ? Referent tracking in the linguistic sense !
The problem summarised • natural language is the only medium that is able to communicate clinical information about individual patients without loss of necessary detail; • (virtual) structured data repositories are required to make subsequent analyses possible; • any transformation from free language to coding and classification systems results in information loss that is unacceptable for individual patient care, but on the other hand is a conditio sine qua non for population based studies; • today’s graphical user interfaces can deal reasonably well with pick lists built around controlled vocabularies that fulfil a bridging function from free language towards coding and classification systems but are incompatible with referent tracking
Natural Language Understanding Technology The ultimate scenario Ontology continuant disorder person CAG repeat EHR Juvenile HD Referent Tracking Database #IUI-1 ‘affects’ #IUI-2 #IUI-3 ‘affects’ #IUI-2 #IUI-1 ‘causes’ #IUI-3
Jim Cimino’s Woods Hole case First sentence: Jane Smith is a 50 year old, Native American female who presents to the emergency room with the chief complaint of cough and chest pain.
Step 1: identify the phrases referring to particulars Jane Smith is a 50 year old, Native American female who presents to the emergency room with the chief complaint of cough and chest pain.
Jane Smith is a 50 year old, Native American female who presents to the emergency room with the chief complaint of cough and chest pain. Jane Smith Jane Smith’s age Jane Smith’s race Jane Smith’s gender Jane Smith Jane Smith’s showing up at ... A specific emergency room of health facility XYZ A specific pain experienced by Jane Smith Jane Smith’s complaining primarily about ... A temporal part of Jane Smith’s life marked by happenings of coughs Jane Smith’s chest Step 2: identify to what particulars these phrases refer
Jane Smith is a 50 year old, Native American female who presents to the emergency room with the chief complaint of cough and chest pain. “Jane Smith” CS1-age CS1-female- gender CS2-woman CS1-native-american CS1-emergency room CS1-chief-complaint CS2-chest CS1-coughing CS1-chest-pain CS2-pain Compare with simple clinical coding in juxtaposition
Has-Age “Jane Smith” CS3-50 years old Is-A Has-Sayer CS3-woman Is-A CS3-native american Has-participant Has-Saying CS3-chest pain Has-happening-during Has-Saying CS3-coughing CS3-consultation CS3-Em.Room Has-Loc Compare with the output of the NAIVE !!! semantic analyser we all would dream of Compare with the output of the perfect semantic analyser we all would dream of CS3-complaining
“chest-pain” CS3-complaining Has-Saying Has-’referent’ CS3-chest pain Has-Saying “coughing” CS3-coughing Has-’referent’ What it (more or less) should be with traditional coding
CS3-complaining Has-code Has-Saying J.S.’ complaining at t1 “chest-pain” Has-referent Has-Saying J.S.’ chest pain at t-1 “coughing” Has-code Has-referent CS3-chest pain J.S.’ coughing at t-1 CS3-coughing Has-code What it (more or less) should be with referent tracking
Most important difference: Use of generic terms Use of concrete particulars
Step 3: are relevant and necessary particulars missing ? • Referred to: • Jane Smith • Jane Smith’s age • Jane Smith’s race • Jane Smith’s gender • Jane Smith’s showing up at ... • The specific emergency room in the health facility • Jane Smith’s primarily complaining ... • The temporal part ... coughs • Jane Smith’s chest • Jane Smith’s particular pain • Missing: • The health facility • The healthcare worker she consulted • The particular coughs (under the condition she tells the objective truth) • The underlying disorder (under whatever state of affairs)
Step 4: IUI assignment • Assumptions: • the RTS contains already: • IUI-1 Jane Smith Coi = <IUIa, ta, CS3, IUI-1, woman, tr> • IUI-1.1 Ri = <IUIa, ta, depends-on, BFO, {IUI-1.1, IUI-1}, tr> Coi = <IUIa, ta, CS1, IUI-1.1, age, tr> • IUI-1.2 Coi = <IUIa, ta, CS1, IUI-1.2, cherokee, tr> Ri = <IUIa, ta, depends-on, BFO, {IUI-1.2, IUI-1}, tr> • IUI-1.3 Coi = <IUIa, ta, CS3, IUI-1.3, chestpain, tr> Ri = <IUIa, ta, is-located-in, BFO, {IUI-1.3, IUI-1}, tr> • All dates in the statements are 2 years earlier than now • What to do with: • Jane Smith • Jane Smith’s race (CS1: native American) • Jane Smith’s gender (CS1: female) • Jane Smith’s chest pain (CS3: chest pain) • Jane Smith’s age (50)
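The Coi and Ri tuples assumed to be in the RTS can be written down as typed records, which makes it easy to check what is already known about a particular before a new IUI is considered. This is a sketch only: the slide does not define the tuple positions, so the field names below (author, assertion time, coding system or ontology, validity time) are an interpretation, and `Co`, `R`, `dependents_of` and `codes_for` are hypothetical names. The placeholders IUIa, ta and tr are kept as literal strings.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Co:
    """Coi = <IUIa, ta, system, IUIp, code, tr>: a particular described by a code."""
    author: str
    asserted_at: str
    system: str
    particular: str
    code: str
    valid_at: str


@dataclass(frozen=True)
class R:
    """Ri = <IUIa, ta, relation, ontology, {IUIs}, tr>: a relation between particulars."""
    author: str
    asserted_at: str
    relation: str
    ontology: str
    particulars: Tuple[str, ...]
    valid_at: str


Entry = Union[Co, R]

rts: List[Entry] = [
    Co("IUIa", "ta", "CS3", "IUI-1", "woman", "tr"),
    R("IUIa", "ta", "depends-on", "BFO", ("IUI-1.1", "IUI-1"), "tr"),
    Co("IUIa", "ta", "CS1", "IUI-1.1", "age", "tr"),
    Co("IUIa", "ta", "CS1", "IUI-1.2", "cherokee", "tr"),
    R("IUIa", "ta", "depends-on", "BFO", ("IUI-1.2", "IUI-1"), "tr"),
    Co("IUIa", "ta", "CS3", "IUI-1.3", "chestpain", "tr"),
    R("IUIa", "ta", "is-located-in", "BFO", ("IUI-1.3", "IUI-1"), "tr"),
]


def dependents_of(iui: str, store: List[Entry]) -> List[str]:
    """IUIs that stand in some recorded relation to the given particular."""
    return sorted(e.particulars[0] for e in store
                  if isinstance(e, R) and e.particulars[1] == iui)


def codes_for(iui: str, store: List[Entry]) -> List[Tuple[str, str]]:
    """(system, code) pairs already used to describe the given particular."""
    return [(e.system, e.code) for e in store
            if isinstance(e, Co) and e.particular == iui]


print(dependents_of("IUI-1", rts))
print(codes_for("IUI-1.3", rts))
```

Such a lookup only shows what is already registered. Whether the chest pain reported today is the same particular as IUI-1.3, recorded two years earlier, remains the judgement call the slide is asking about.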
Conclusion • Referent tracking can solve a number of problems in an elegant way. • Existing (or emerging) technologies can be used for the implementation. • Old technologies (concept-based systems) can play an interesting role. • Big Brother feeling is to be expected but with adequate measures easy to fight. • The proof of the pudding is in the eating • A pilot is going to be set up • Collaboration sought for dealing with NLU