Experience with Using the UMLS Semantic Network to Coordinate Controlled Terminologies for a Large Clinical Data Repository. James J. Cimino Department of Biomedical Informatics Columbia University College of Physicians and Surgeons National Library of Medicine, April 8, 2005. Overview.
Experience with Using the UMLS Semantic Network to Coordinate Controlled Terminologies for a Large Clinical Data Repository James J. Cimino Department of Biomedical Informatics Columbia University College of Physicians and Surgeons National Library of Medicine, April 8, 2005
Overview • Background • History • General principles • Empiric observations: Semantic Network in the Medical Entities Dictionary • Lessons to be learned
Clinical Data Architecture • Central repository to collect data from myriad sources • Myriad users of data - some not yet imagined
New York Presbyterian Hospital Clinical Information Systems Architecture [Diagram labels: Medical Logic Modules, Clinical Database, Alerts & Reminders, Database Monitor, Results Review, Database Interface, Administrative, Medical Entities Dictionary, Research, Reformatter (one per source system), Radiology, Discharge Summaries, Laboratory, ...]
Clinical Data Architecture • Central repository to collect data from myriad sources • Myriad users of data - some not yet imagined • Patient-oriented, not visit oriented, database • Relational, not hierarchical, model • Entity-attribute-value model
Clinical Data Architecture • Central repository to collect data from myriad sources • Myriad users of data - some not yet imagined • Patient-oriented, not visit oriented, database • Relational, not hierarchical, model • Entity-attribute-value model • Coded data wherever possible • Unify terminology
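The patient-oriented, entity-attribute-value layout named above can be illustrated with a minimal sketch. It assumes an in-memory SQLite table; the table name, column names, patient IDs, and attribute codes are all hypothetical and are not the actual repository schema.

```python
import sqlite3

# Hypothetical EAV table: one row per observation, keyed by patient (not by visit).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE observation (
        patient_id  INTEGER NOT NULL,  -- entity: the patient
        attr_code   INTEGER NOT NULL,  -- attribute: a coded dictionary term
        value       TEXT    NOT NULL,  -- value: kept as text in this sketch
        observed_at TEXT    NOT NULL   -- ISO date, so text ordering is date ordering
    )
    """
)

# Made-up rows; code 501 stands in for some coded laboratory test.
rows = [
    (1001, 501, "4.2", "2005-01-03"),
    (1001, 501, "3.3", "2005-02-10"),
    (1002, 501, "3.2", "2005-01-15"),
    (1001, 777, "clear", "2005-01-03"),
]
conn.executemany("INSERT INTO observation VALUES (?, ?, ?, ?)", rows)


def patient_values(patient_id, attr_code):
    """Return all values of one coded attribute for one patient, oldest first."""
    cur = conn.execute(
        "SELECT value FROM observation "
        "WHERE patient_id = ? AND attr_code = ? ORDER BY observed_at",
        (patient_id, attr_code),
    )
    return [row[0] for row in cur.fetchall()]
```

New kinds of data need only a new attribute code rather than a new column, which is one reason such a layout suits users "not yet imagined".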
Medical Entities Dictionary: A Central Terminology Repository
MED Structure [Diagram labels. Classes: Medical Entity, Substance, Laboratory Specimen, Event, Chemical, Anatomic Substance, Plasma Specimen, Diagnostic Procedure, Plasma, Laboratory Test, Laboratory Procedure, Carbohydrate, Bioactive Substance, Glucose, CHEM-7, Plasma Glucose Test. Relations: Substance Sampled, Has Specimen, Part of, Substance Measured]
Communicating Terminology Changes [Diagram labels: results K#1 = 4.2, K#1 = 3.3, K#2 = 3.2, K#1 = 3.0, K#3 = 2.6; terms K#1, K#2, K#3]
Solution: Hierarchical Integration [Diagram labels: results K#1 = 4.2, K#1 = 3.3, K#2 = 3.2, K#1 = 3.0, K#3 = 2.6; terms K, K#1, K#2, K#3]
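One way to read the hierarchical integration slide: if the successive test terms K#1, K#2, and K#3 are all placed under a shared class K, a query against K keeps finding results no matter which term was in use when each was stored. The sketch below assumes exactly that arrangement; the parent table and function names are hypothetical, and only the result values come from the slide.

```python
# Assumed hierarchy: each successive test term is a child of the class "K".
PARENT = {"K#1": "K", "K#2": "K", "K#3": "K"}

# Results in the order shown on the slide, each stored under the term in use at the time.
RESULTS = [("K#1", 4.2), ("K#1", 3.3), ("K#2", 3.2), ("K#1", 3.0), ("K#3", 2.6)]


def is_a(code, ancestor):
    """True if `code` equals `ancestor` or sits below it in the hierarchy."""
    while code is not None:
        if code == ancestor:
            return True
        code = PARENT.get(code)
    return False


def results_for(cls):
    """Collect every result whose term is `cls` or one of its descendants."""
    return [value for code, value in RESULTS if is_a(code, cls)]
```

Querying by a single term such as K#1 returns only part of the series, while querying by the class K returns all five values.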
Use of the UMLS in Patient Care James J. Cimino, M.D. Center for Medical Informatics Columbia University Mont Pelerin, Switzerland 1994
UMLS Semantic Network • Strict hierarchy • Semantic types: 132 (135) • Semantic relations: 46 (53) • Inheritance of relations: 6233 (6700)
UMLS Metathesaurus • Terms from 22 (100+) controlled vocabularies • Total source terms: 311,046 • Total strings: 279,237 (5,000,000) • Total concepts: 152,444 (1,000,000) • Relationships: 1,484,994 (16,000,000)
Medical Entities Dictionary • Semantic Network • Sources: 5 • Strings: 108,492 • Concepts: 35,281 • Semantic relations: 23 pairs • Semantic Links: 145,672
Comparisons - Methods • CPMC Entities vs. UMLS Semantic Types • MED Classes vs. UMLS Semantic Types • MED Semantic Links vs. UMLS Semantic Relations • MED Concepts vs. Metathesaurus Concepts • MED Semantic Links vs. Meta Relations
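Comparisons - Results • CPMC DB Entities vs. UMLS Semantic Types: ++++ • MED Classes vs. UMLS Semantic Types: +++ • MED Semantic Links vs. UMLS Semantic Relations: ++ • MED Concepts vs. Metathesaurus Concepts: +++ • MED Semantic Links vs. Meta Links: +/-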
Summary • Semantic Types provide good coverage • Concepts provide good coverage in certain domains • No technical reason why UMLS could not incorporate clinical vocabulary
Where We Are Today - Repository • Patients: 2.6 million • Visits: >10 million since 1996, with archives going back to 1979 • Visit diagnoses, locations, procedures, providers, insurance • Lab procedures: 16 million, with 130 million results (to 1989) • Radiology procedure reports: 5.7 million • Pathology: 1.4 million • Cardiology procedures: 1.5 million • Resident signout notes: 760,000 • Operative Notes: 426,000 • Clinical Notes: 400,000 • Discharge Summaries: 420,000 • Medication orders: >60 million • ObGyn Procedure Reports: 241,000 • GI Procedure Reports: 101,000 • Neurology Procedure Reports: 54,000 • Ideatel BP’s: 215,000 • Ideatel Glucose: 650,000 • Consult Events: 18,000 • HEENT Events: 13,000 • Hospitalist Notes: 30,000 • PFT: 25,000 • Provider profiles: 11,000 • IDX 1.4 million East Campus
Where We Are Today - MED • Domains: 7++ (5) • HP lab terms • Misys lab terms • Cerner lab terms • Misys Radiology • Digimedix drugs • Cerner Drugs • ICD9-based problem list terms • Other applications • Knowledge terms • Size: • Concept-based: 95,641 (35,281) • Multiple hierarchy: 141,306 • Synonyms: 239,581 (108,492) • Translations: 141,717 • Semantic link pairs: 52 (23) • Semantic links: 225,698 (145,672) • Attributes: 210,456
What does this have to do with the SN? • MED was initially based on UMLS design (creationism) • UMLS SN was the “starter set” • MED is “local UMLS” for CPMC • General principles were established • MED has developed without further conscious attention to the SN (evolution) • MED content represents real-world terminology • What follows are empiric observations, open to criticism; perhaps indefensible
General Principles • Everything is a class • Multiple hierarchy • Some relations are definitional • At most, one part of relation pair is definitional • Properties introduced at single points
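A minimal sketch of two of these principles, multiple hierarchy and properties introduced at single points, follows. The class and property names are borrowed from elsewhere in this deck, but the placement of Glucose under both Chemical and Measureable Entity is a hypothetical example, and the data structures are illustrative rather than the MED's actual representation.

```python
# Multiple hierarchy: a class may list more than one parent.
PARENTS = {
    "Medical Entity": [],
    "Chemical": ["Medical Entity"],
    "Measureable Entity": ["Medical Entity"],
    "Glucose": ["Chemical", "Measureable Entity"],  # hypothetical placement
}

# Each property is introduced at exactly one class and inherited below it.
INTRODUCED = {
    "Medical Entity": ["NAME", "SYNONYMS"],
    "Chemical": ["PHARMACEUTIC-COMPONENT-OF"],
    "Measureable Entity": ["MEASURED-BY-PROCEDURE"],
}


def ancestors(cls):
    """Return `cls` and every class reachable by following parent links."""
    seen = []
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(PARENTS[current])
    return seen


def properties(cls):
    """Union of the properties introduced at `cls` or any of its ancestors."""
    props = set()
    for ancestor in ancestors(cls):
        props.update(INTRODUCED.get(ancestor, []))
    return props
```

Because Glucose has two parents in this sketch, it picks up the properties introduced at both, plus those introduced at the root.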
Observations on the SN in the MED • Arrangement of SN in MED • Multiple hierarchy of STs • Size of ST classes in MED (vs Meta?) • STs as introduction points • Intersections
UMLS Semantic Net in the MED
A: T071: Medical Entity [94729]
. A1: T072: Physical Object [5618]
. +*A1.2: T017: Anatomical Structure [577]
. A2: T077: Conceptual Entity [77861]
. *B: T051: Event [55450]
Key: “A1.2”: UMLS Tree address • “T071”: Semantic type ID • “Event”: MED Name • “+”: Multiple locations • “*”: Discontinuous tree address • “[577]”: Number of MED concepts
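One possible reading of the “*” marker is that a type is flagged when its UMLS tree address is not a direct extension of the address of the class it sits under in the MED. The sketch below implements only that assumed reading; the function names are hypothetical, and the slides do not state the rule explicitly.

```python
def umls_parent_address(address):
    """Return the tree address one level up, or None for a root such as "A" or "B"."""
    if "." in address:
        return address.rsplit(".", 1)[0]  # "A1.2.3" -> "A1.2"
    if len(address) > 1:
        return address[0]                 # "A1" -> "A"
    return None                           # "A" and "B" are roots


def is_discontinuous(child_address, med_parent_address):
    """True if the MED parent is not the parent implied by the child's tree address."""
    return umls_parent_address(child_address) != med_parent_address
```

Under this reading, A1 placed under A is continuous, while A1.2 or B placed directly under A is flagged, which agrees with the markers on this slide.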
UMLS Semantic Net in the MED
A: T071: Medical Entity [94729]
. A1: T072: Physical Object [5618]
. . A1.1: T001: Organism [3153]
. . . A1.1.1: T002: Plant [1]
. . . . A1.1.1.1: T003: Alga [0]
. . . A1.1.2: T004: Fungus [273]
. . . A1.1.3: T005: Virus [169]
. . . A1.1.4: T006: Rickettsia or Chlamydia [5]
. . . A1.1.5: T007: Bacterium [992]
. . . A1.1.6: T194: Archaeon [0]
. . . A1.1.7: T008: Animal [93]
. . . . A1.1.7.1: T009: Invertebrate [85]
. . . . A1.1.7.2: T010: Vertebrate [6]
. . . . . A1.1.7.2.1: T011: Amphibian [0]
. . . . . A1.1.7.2.2: T012: Bird [0]
. . . . . A1.1.7.2.3: T013: Fish [0]
. . . . . A1.1.7.2.4: T014: Reptile [0]
. . . . . A1.1.7.2.5: T015: Mammal [1]
. . . . . . A1.1.7.2.5.1: T016: Human [0]
UMLS Semantic Net in the MED
A: T071: Medical Entity [94729]
. +*A1.2: T017: Anatomical Structure [577]
. . A1.2.3: T021: Fully Formed Anatomical Structure [230]
. . . A1.2.3.1: T023: Body Part, Organ, or Organ Component [204]
. . . *A1.2.1: T018: Embryonic Structure [2]
. . . *A1.2.2: T190: Anatomical Abnormality [20]
. . . . A1.2.2.1: T019: Congenital Abnormality [0]
. . . . A1.2.2.2: T020: Acquired Abnormality [18]
. . *A1.2.3.2: T024: Tissue [66]
. . *A1.2.3.3: T025: Cell [61]
. . *A1.2.3.4: T026: Cell Component [11]
. . *A1.2.3.5: T028: Gene or Genome [0]
. . *A1.4.2: T031: Body Substance [56]
. . +*A2.1.4.1: T022: Body System [65]
. . +*A2.1.5.1: T030: Body Space or Junction [43]
. . +*A2.1.5.2: T029: Body Location or Region [117]
. . *A1.3: T073: Manufactured Object [16]
. . . A1.3.1: T074: Medical Device [6]
. . . A1.3.2: T075: Research Device [0]
. . . A1.3.3: T200: Clinical Drug [0]
. . A1.4: T167: Substance [???]
. . . A1.4.1: T103: Chemical [1942]
. . . . A1.4.1.1: T120: Chemical Viewed Functionally [1828]
. . . . . A1.4.1.1.1: T121: Pharmacologic Substance [1468]
. . . . . . +*A1.4.1.1.3.4: T127: Vitamin [20]
. . . . . . A1.4.1.1.1.1: T195: Antibiotic [130]
. . . . . A1.4.1.1.3: T123: Biologically Active Substance [530]
. . . . . . +A1.4.1.1.3.4: T127: Vitamin [20]
Property Introduction Points 1: Medical Entity [T071] MED-CODE UMLS-CODE NAME SUBCLASS-OF -> SUBCLASS (1: Medical Entity [T071]) SUBCLASS -> SUBCLASS-OF (1: Medical Entity [T071]) SYNONYMS PRINT-NAME HAS-PARTS -> PART-OF (1: Medical Entity [T071]) PART-OF -> HAS-PARTS (1: Medical Entity [T071]) DEFINITION MAIN-MESH SUPPLEMENTARY-MESH NAME-TOKEN DEFAULT-SHORT-DISPLAY-NAME DEFAULT-DISPLAY-NAME SPEECH-SYNONYM SPEECH-SYNTHESIS-NAME ENTITY-(HAS-RELATED)-PAGER-NUMBER ENTITY-(HAS)-MEDLEE-TARGET-TERM HIERARCHY-SELECTOR
Medical Properties 7: Body System [T022] ACTION-SITE-OF -> ACTION-SITE (98: Health Care Activity (Procedure) [T058]) 14: Anatomical Structure [T017] SITE-OF-PROBLEM -> HAS-PROBLEM-SITE (30007: Patient Problem) OBSERVATION-SITE-OF -> OBSERVATION-SITE (94: Diagnostic Procedure [T060]) 43: Chemical [T103] PHARMACEUTIC-COMPONENT-OF -> PHARMACEUTIC-COMPONENT (28103: Pharmacy Items (Drugs and Nondrugs)) 50: Measureable Entity MEASURED-BY-PROCEDURE -> ENTITY-MEASURED (64964: Assessment Procedures) LOINC-ANALYTE-NAME 76: Disease or Syndrome [C0391828] ETIOLOGY -> CAUSES-DISEASES (135: Etiologic Agent) IS-HISTORIC-DISEASE-FOR -> HISTORIC-DISEASE (56164: Factors Related to Past Disease Influencing Health Status)
Medical Properties 83: Laboratory Finding or Test Result [T034] RESULT-TYPE-->TESTS -> TEST-->RESULT-TYPE (94: Diagnostic Procedure [T060]) 86: Finding [T033] FINDING-(REFERS-TO)->ORGANISM 93: Laboratory Diagnostic Procedure COLLECTED-BY -> COLLECTED-FOR (33023: Specimen Collection [C0200345]) 94: Diagnostic Procedure [T060] UNITS TEST-->RESULT-TYPE -> RESULT-TYPE-->TESTS (83: Laboratory Finding or Test Result [T034]) OBSERVATION-SITE -> OBSERVATION-SITE-OF (14: Anatomical Structure [T017]) TEST-(HAS)-ABNORMAL-FLAG -> ABNORMAL-FLAG-(FOR)-TEST (77746: Abnormal Flag Value) 98: Health Care Activity (Procedure) [T058] PROCEDURE-(INDICATES)->PT-PROBLEM -> PT-PROBLEM-(INDICATED-BY)->PROCEDURE (30007: Patient Problem) ACTION-SITE -> ACTION-SITE-OF (7: Body System [T022])
Medical Properties 135: Etiologic Agent CAUSES-DISEASES -> ETIOLOGY (76: Disease or Syndrome [C0391828]) 1181: Antibiotic Sensitivity Tests SENSITIVITY-ANALYTE -> SENSITIVITY-ANALYTE-OF (44440: Antibiotic or Bacterial Enzyme Inhibitor) 32291: Sampleable Entity SAMPLED-BY -> SYSTEM-SAMPLED (64970: Sample Entity) LOINC-SYSTEM-CODE 44440: Antibiotic or Bacterial Enzyme Inhibitor SENSITIVITY-ANALYTE-OF -> SENSITIVITY-ANALYTE (1181: Antibiotic Sensitivity Tests)
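The listings above pair each semantic relation with its reciprocal, for example ETIOLOGY on Disease or Syndrome and CAUSES-DISEASES on Etiologic Agent. A minimal sketch of keeping such pairs consistent follows; the storage layout, function names, and the two example concepts are hypothetical, and this is not a description of how the MED actually maintains its links.

```python
INVERSE = {}   # relation name -> name of its reciprocal
LINKS = set()  # (source concept, relation, target concept)


def register_pair(forward, backward):
    """Declare two relation names as reciprocals of each other."""
    INVERSE[forward] = backward
    INVERSE[backward] = forward


def add_link(source, relation, target):
    """Store a link together with its reciprocal so both directions always exist."""
    LINKS.add((source, relation, target))
    LINKS.add((target, INVERSE[relation], source))


# Relation pair taken from the slides; the two concepts are made-up examples.
register_pair("ETIOLOGY", "CAUSES-DISEASES")
add_link("Pneumococcal Pneumonia", "ETIOLOGY", "Streptococcus pneumoniae")
```

Adding the link in one direction yields the reverse link automatically, so a query from either the disease or the agent finds the other.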
Data Dictionary Properties 59511: Clinical Repository Table TABLE-HAS-COLUMN -> COLUMN-IS-IN-TABLE (59512: Clinical Repository Column) 59512: Clinical Repository Column COLUMN-IS-IN-TABLE -> TABLE-HAS-COLUMN (59511: Clinical Repository Table) 59528: Generic Column COLUMN-HAS-PERMITTED-VALUES -> IS-PERMITTED-VALUE-FOR-COLUMN (67164: Verification Concept for Generic Column) 59729: Data Entry Form Component REPEAT-TYPE(DATA-ENTRY-COMPONENT) NUMBER-REPEATS(DATA-ENTRY-COMPONENT) REPEAT-LAYOUT-TYPE(DATA-ENTRY-COMPONENT) 59732: Form Field Allowable Values ALLOWABLE-VALUE-(FOR)->DATA-ENTRY-FIELD -> DATA-ENTRY-FIELD-(HAS)->ALLOWABLE-VALUE (42646: Data Entry Form Field)
Controlled Terminology Properties 21762: ICD9 Element ICD9-CODE ICD9-ENTRY-CODE OLD-ICD9-CODE ICD9-NAME 23147: American Hospital Formulary Service Class AHFS-CLASS-CODE 28104: Drug Enforcement Administration (DEA) Controlled Substance Category DEA-CODE
Data Modeling Properties 1178: Number or String Result EVENT-ID-OF -> EVENT-ID (9876: CPMC Event) EVENT-PATIENT-ID-OF -> EVENT-PATIENT-ID (9876: CPMC Event) EVENT-ORGANIZATION-OF -> EVENT-ORGANIZATION (9876: CPMC Event) EVENT-LOCATION-OF -> EVENT-LOCATION (9876: CPMC Event) PARTICIPANT-ID-OF -> PARTICIPANT-ID (30352: Medical Event Participant) 9876: CPMC Event EVENT-ID -> EVENT-ID-OF (1178: Number or String Result) EVENT-DATE -> EVENT-DATE-OF (30349: Date Result) EVENT-PATIENT-ID -> EVENT-PATIENT-ID-OF (1178: Number or String Result) EVENT-PARTICIPANT -> PARTICIPANT-OF (30352: Medical Event Participant) EVENT-ORGANIZATION -> EVENT-ORGANIZATION-OF (1178: Number or String Result) EVENT-LOCATION -> EVENT-LOCATION-OF (1178: Number or String Result) EVENT-STATUS -> STATUS-OF (30355: CPMC Status Term) EVENT-(HAS)-ORGANIZATION -> ORGANIZATION-(FOR)-EVENT (81475: CPMC Coded Organizations) 30344: CPMC Order ORDER-QUANTITY -> ORDER-QUANTITY-OF (30350: Quantity Result) ORDER-FREQUENCY -> ORDER-FREQUENCY-OF (32504: Order Frequency) ORDER-START-DATE -> ORDER-START-DATE-OF (30349: Date Result) ORDER-STOP-DATE -> ORDER-STOP-DATE-OF (30349: Date Result) 30352: Medical Event Participant PARTICIPANT-OF -> EVENT-PARTICIPANT (9876: CPMC Event) PARTICIPANT-ID -> PARTICIPANT-ID-OF (1178: Number or String Result) PARTICIPANT-NAME -> PARTICIPANT-NAME-OF (32653: ID Number Plus Text Result)
Application Properties 40441: Display Information [C0010996] DEFAULT-DISPLAY-FOR -> HAS-DEFAULT-DISPLAYS (94: Diagnostic Procedure [T060]) DISPLAYS-ELEMENTS-OF -> ELEMENTS-DISPLAYED-BY (94: Diagnostic Procedure [T060]) HAS-DISPLAY-PARAMETERS -> IS-DISPLAY-PARAMETER-OF (94: Diagnostic Procedure [T060]) DISPLAY-PARAMETER-ORDER
Document Properties 42645: Data Entry Form FORM-(IS-PART-OF)->FORMSET -> FORMSET-(CONTAINS)->FORM (66436: Data Entry Form Sets) 42646: Data Entry Form Field DATA-ENTRY-FIELD-(HAS)->ALLOWABLE-VALUE -> ALLOWABLE-VALUE-(FOR)->DATA-ENTRY-FIELD (59732: Form Field Allowable Values) FORM-FIELD-(HAS)->FIELD-TYPE -> FIELD-TYPE-(FOR)->FORM-FIELD (66295: Data Entry Field Type) FORM-FIELD-(OBEYS)->PREFILL-RULE -> PREFILL-RULE-(FOR)->FORM-FIELD (66311: Prefill Rules) FORM-FIELD-MAXIMUM-VALUE FORM-FIELD-MINIMUM-VALUE FORM-FIELD-MAXIMUM-CHARACTER-COUNT 59732: Form Field Allowable Values ALLOWABLE-VALUE-(FOR)->DATA-ENTRY-FIELD -> DATA-ENTRY-FIELD-(HAS)->ALLOWABLE-VALUE (42646: Data Entry Form Field) 66295: Data Entry Field Type FIELD-TYPE-(FOR)->FORM-FIELD -> FORM-FIELD-(HAS)->FIELD-TYPE (42646: Data Entry Form Field) 66308: Layout Type LAYOUT-TYPE-(FOR)->FORM-STRUCTURE -> FORM-STRUCTURE-(HAS)->LAYOUT-TYPE (66405: Data Entry Form Structure)
207 Intersection Classes Chemical [T103] Measureable Entity Etiologic Agent 1780 cases. Measureable Entity Laboratory Finding or Test Result [T034] Finding [T033] Etiologic Agent Microbiology Result Patient Problem Laboratory Results 1399 cases. Laboratory Finding or Test Result [T034] Finding [T033] Patient Problem Laboratory Results 3309 cases. Laboratory Finding or Test Result [T034] Finding [T033] Patient Problem Laboratory Results New York Hospital (NYH) Laboratory Nomenclature Term 1601 cases.
207 Intersection Classes Laboratory Finding or Test Result [T034] Finding [T033] Patient Problem New York Hospital (NYH) Laboratory Nomenclature Term 2906 cases. Laboratory Diagnostic Procedure Diagnostic Procedure [T060] Health Care Activity (Procedure) [T058] Event [T051] Laboratory Diagnostic Batteries Single-Result Laboratory Test New York Hospital (NYH) Laboratory Concept Assessment Procedures 1197 cases. Laboratory Diagnostic Procedure Diagnostic Procedure [T060] Health Care Activity (Procedure) [T058] Event [T051] Laboratory Diagnostic Batteries New York Hospital (NYH) Laboratory Concept Assessment Procedures 1822 cases.
207 Intersection Classes Laboratory Diagnostic Procedure Diagnostic Procedure [T060] Health Care Activity (Procedure) [T058] Event [T051] Single-Result Laboratory Test New York Hospital (NYH) Laboratory Concept Assessment Procedures 3200 cases. Laboratory Diagnostic Procedure Diagnostic Procedure [T060] Health Care Activity (Procedure) [T058] Event [T051] Single-Result Laboratory Test CPMC Single-Result Laboratory Test Assessment Procedures 3197 cases. Health Care Activity (Procedure) [T058] Event [T051] ICD9 Element Verification Concept for Generic Column 10048 cases.
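The slides do not define how the intersection classes were derived. One plausible reading, assumed here, is that each distinct combination of classes shared by a group of concepts counts as one intersection, and "cases" is the number of concepts with that combination. The sketch below tallies combinations under that assumption; the concept names are invented, and only the class names come from the slides.

```python
from collections import Counter

# Hypothetical concepts, each mapped to the set of classes it belongs to.
CONCEPT_CLASSES = {
    "Test A": {"Laboratory Diagnostic Procedure", "Single-Result Laboratory Test"},
    "Test B": {"Laboratory Diagnostic Procedure", "Single-Result Laboratory Test"},
    "Panel C": {"Laboratory Diagnostic Procedure", "Laboratory Diagnostic Batteries"},
}


def intersection_counts(concept_classes):
    """Count how many concepts share each distinct combination of classes."""
    return Counter(frozenset(classes) for classes in concept_classes.values())


counts = intersection_counts(CONCEPT_CLASSES)
```

Here `len(counts)` plays the role of the number of intersection classes, and each count plays the role of a "cases" figure.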
Revisiting Recommendations - General • Make “Event” a temporal concept • Conceptual vs. Physical polarization • Directed Acyclic Graph • Merge Network and Metathesaurus
Revisiting Recommendations - Specific • Tests have Specimens • Tests have Parts • Separate Medications from Chemicals • Liberalize assignment of Relations
Revisiting Summary • Semantic Types provide good coverage • Concepts provide good coverage in certain domains • No technical reason why UMLS could not incorporate clinical vocabulary
Lessons to be Learned • The MED is representative of clinical care • MED classes work well as introduction points • Multiple hierarchy works • Semantic Network is largely intact • Unifying organization for anatomy needed • Further study of MED will suggest additional types and relations