
Experience with Using the UMLS Semantic Network to Coordinate Controlled Terminologies for a Large Clinical Data Repository

Experience with Using the UMLS Semantic Network to Coordinate Controlled Terminologies for a Large Clinical Data Repository. James J. Cimino Department of Biomedical Informatics Columbia University College of Physicians and Surgeons National Library of Medicine, April 8, 2005. Overview.




  1. Experience with Using the UMLS Semantic Network to Coordinate Controlled Terminologies for a Large Clinical Data Repository. James J. Cimino, Department of Biomedical Informatics, Columbia University College of Physicians and Surgeons. National Library of Medicine, April 8, 2005

  2. Overview • Background • History • General principles • Empiric observations: Semantic Network in the Medical Entities Dictionary • Lessons to be learned

  3. Clinical Data Architecture • Central repository to collect data from myriad sources • Myriad users of data - some not yet imagined

  4. New York Presbyterian Hospital Clinical Information Systems Architecture [diagram; labels: Medical Logic Modules, Clinical Database, Alerts & Reminders, Database Monitor, Results Review, Database Interface, Administrative, Medical Entities Dictionary, Research, Reformatter (three instances), Radiology, Discharge Summaries, Laboratory]

  5. Clinical Data Architecture • Central repository to collect data from myriad sources • Myriad users of data - some not yet imagined • Patient-oriented, not visit-oriented, database • Relational, not hierarchical, model • Entity-attribute-value model

  6. Entity-Attribute-Value Clinical Data Repository
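A minimal sketch of what an entity-attribute-value layout can look like, assuming one row per observation with a patient identifier, a coded attribute, and a value. The patient IDs, attribute codes, and values below are hypothetical and are not taken from the repository described in the talk.

```python
from collections import namedtuple

# One row per observation: which patient (entity), what was observed
# (attribute, stored as a dictionary code), and the observed value.
Observation = namedtuple("Observation", ["patient_id", "attribute_code", "value"])

# Hypothetical codes and values, for illustration only.
rows = [
    Observation("P001", "GLUCOSE_TEST", 95),
    Observation("P001", "POTASSIUM_TEST", 4.2),
    Observation("P002", "GLUCOSE_TEST", 110),
]


def values_for(rows, patient_id, attribute_code):
    """Return all values recorded for one patient and one coded attribute."""
    return [
        r.value
        for r in rows
        if r.patient_id == patient_id and r.attribute_code == attribute_code
    ]


glucose_p001 = values_for(rows, "P001", "GLUCOSE_TEST")
```

Because every observation shares the same three columns, a new kind of data only needs a new attribute code, not a new table.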

  7. Clinical Data Architecture • Central repository to collect data from myriad sources • Myriad users of data - some not yet imagined • Patient-oriented, not visit-oriented, database • Relational, not hierarchical, model • Entity-attribute-value model • Coded data wherever possible • Unify terminology

  8. Medical Entities Dictionary: A Central Terminology Repository

  9. MED Structure [diagram; labels: Medical Entity, Substance, Laboratory Specimen, Event, Chemical, Anatomic Substance, Plasma Specimen, Diagnostic Procedure, Laboratory Procedure, Laboratory Test, Carbohydrate, Bioactive Substance, Plasma, Glucose, CHEM-7, Plasma Glucose Test, Substance Sampled, Has Specimen, Part of, Substance Measured]

  10. Communicating Terminology Changes [diagram; results shown: K#1 = 4.2, K#1 = 3.3, K#2 = 3.2, K#1 = 3.0, K#3 = 2.6; codes shown: K#1, K#2, K#3]

  11. Solution: Hierarchical Integration [diagram; results shown: K#1 = 4.2, K#1 = 3.3, K#2 = 3.2, K#1 = 3.0, K#3 = 2.6; codes shown: K, K#1, K#2, K#3]
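The idea behind hierarchical integration can be sketched as follows, assuming that K is a class grouping the three codes K#1, K#2, and K#3 (the slide shows these labels but the parent links here are an assumption). A query against the class then retrieves results regardless of which code was in use when each result was stored.

```python
# Assumed dictionary fragment: each code points to its parent class.
PARENT = {"K#1": "K", "K#2": "K", "K#3": "K", "K": None}

# Results stored under whichever code was current at the time
# (values as shown on the slide).
results = [("K#1", 4.2), ("K#1", 3.3), ("K#2", 3.2), ("K#1", 3.0), ("K#3", 2.6)]


def is_a(code, ancestor):
    """True if code equals ancestor or descends from it via PARENT links."""
    while code is not None:
        if code == ancestor:
            return True
        code = PARENT.get(code)
    return False


def results_for_class(results, cls):
    """Return the values whose code is cls or a descendant of cls."""
    return [value for code, value in results if is_a(code, cls)]


all_k = results_for_class(results, "K")
only_k1 = results_for_class(results, "K#1")
```

Querying by `"K#1"` alone misses two of the five results, while querying by the class `"K"` returns all of them.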

  12. Use of the UMLS in Patient Care. James J. Cimino, M.D., Center for Medical Informatics, Columbia University. Mont Pelerin, Switzerland, 1994

  13. UMLS Semantic Network • Strict hierarchy • Semantic types: 132 (135) • Semantic relations: 46 (53) • Inheritance of relations: 6233 (6700)

  14. UMLS Metathesaurus • Terms from 22 (100+) controlled vocabularies • Total source terms: 311,046 • Total strings: 279,237 (5,000,000) • Total concepts: 152,444 (1,000,000) • Relationships: 1,484,994 (16,000,000)

  15. Medical Entities Dictionary • Semantic Network • Sources: 5 • Strings: 108,492 • Concepts: 35,281 • Semantic relations: 23 pairs • Semantic Links: 145,672

  16. Comparisons - Methods • CPMC Entities vs. UMLS Semantic Types • MED Classes vs. UMLS Semantic Types • MED Semantic Links vs. UMLS Semantic Relations • MED Concepts vs. Metathesaurus Concepts • MED Semantic Links vs. Meta Relations

  17. Comparisons - Results • CPMC DB Entities vs. UMLS Types: ++++ • MED Classes vs. UMLS Types: +++ • MED Links vs. UMLS Relations: ++ • MED Concepts vs. UMLS Concepts: +++ • MED Links vs. Meta Links: +/-

  18. Summary • Semantic Types provide good coverage • Concepts provide good coverage in certain domains • No technical reason why UMLS could not incorporate clinical vocabulary

  19. Where We Are Today - Repository • Patients: 2.6 million • Visits: >10 million since 1996, with archives going back to 1979 • Visit diagnoses, locations, procedures, providers, insurance • Lab procedures: 16 million with 130 million results (to 1989) • Radiology procedure reports: 5.7 million • Pathology: 1.4 million • Cardiology procedures: 1.5 million • Resident signout notes: 760,000 • Operative Notes: 426,000 • Clinical Notes: 400,000 • Discharge Summaries: 420,000 • Medication orders: >60 million • ObGyn Procedure Reports: 241,000 • GI Procedure Reports: 101,000 • Neurology Procedure Reports: 54,000 • Ideatel BP’s: 215,000 • Ideatel Glucose: 650,000 • Consult Events: 18,000 • HEENT Events: 13,000 • Hospitalist Notes: 30,000 • PFT: 25,000 • Provider profiles: 11,000 • IDX 1.4 million East Campus

  20. Where We Are Today - MED • Domains: 7++ (5) • HP lab terms • Misys lab terms • Cerner lab terms • Misys Radiology • Digimedix drugs • Cerner Drugs • ICD9-based problem list terms • Other applications • Knowledge terms • Size: • Concept-based: 95,641 (35,281) • Multiple hierarchy: 141,306 • Synonyms: 239,581 (108,492) • Translations: 141,717 • Semantic link pairs: 52 (23) • Semantic links: 225,698 (145,672) • Attributes: 210,456

  21. What does this have to do with the SN? • MED was initially based on UMLS design (creationism) • UMLS SN was the “starter set” • MED is “local UMLS” for CPMC • General principles were established • MED has developed without further conscious attention to the SN (evolution) • MED content represents real-world terminology • What follows are empiric observations, open to criticism; perhaps indefensible

  22. General Principles • Everything is a class • Multiple hierarchy • Some relations are definitional • At most, one part of relation pair is definitional • Properties introduced at single points
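Two of these principles, multiple hierarchy and the introduction of each property at a single point, can be sketched as a small directed acyclic graph in which a class inherits every property introduced by any of its ancestors. The class and property names are borrowed from later slides, but the placement of Glucose under both Chemical and Measureable Entity is a hypothetical example, and `MedClass` is an illustrative name rather than part of the MED.

```python
class MedClass:
    """A class with any number of parents and the properties it introduces."""

    def __init__(self, name, parents=(), introduces=()):
        self.name = name
        self.parents = list(parents)
        self.introduces = set(introduces)

    def ancestors(self):
        """All classes reachable through parent links, including self."""
        seen = {}
        stack = [self]
        while stack:
            node = stack.pop()
            if node.name not in seen:
                seen[node.name] = node
                stack.extend(node.parents)
        return list(seen.values())

    def properties(self):
        """Union of the properties introduced by self and every ancestor."""
        props = set()
        for cls in self.ancestors():
            props |= cls.introduces
        return props


# Each property is introduced at exactly one class.
medical_entity = MedClass("Medical Entity", introduces={"NAME"})
chemical = MedClass("Chemical", [medical_entity], {"PHARMACEUTIC-COMPONENT-OF"})
measureable = MedClass("Measureable Entity", [medical_entity], {"MEASURED-BY-PROCEDURE"})

# Hypothetical placement: a class with two parents (multiple hierarchy).
glucose = MedClass("Glucose", [chemical, measureable])
```

Here `glucose.properties()` collects `NAME` once even though Medical Entity is reachable through both parents.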

  23. Observations on the SN in the MED • Arrangement of SN in MED • Multiple hierarchy of STs • Size of ST classes in MED (vs Meta?) • STs as introduction points • Intersections

  24. UMLS Semantic Net in the MED
      A: T071: Medical Entity [94729]
      . A1: T072: Physical Object [5618]
      . +*A1.2: T017: Anatomical Structure [577]
      . A2: T077: Conceptual Entity [77861]
      . *B: T051: Event [55450]
      Key: “A1.2”: UMLS Tree address; “T071”: Semantic type ID; “Event”: MED Name; “+”: Multiple locations; “*”: Discontinuous tree address; “[577]”: Number of MED concepts
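The listing notation explained in the key is regular enough to parse mechanically. The following is a sketch only: the regular expression and the `parse_line` helper are illustrative assumptions about the layout (leading ". " depth markers, optional "+" and "*" flags, tree address, type ID, name, bracketed count), not a tool used by the author.

```python
import re

# depth markers, flags, tree address, semantic type ID, name, concept count
LINE = re.compile(
    r"^(?P<depth>(?:\. )*)"
    r"(?P<flags>[+*]*)"
    r"(?P<address>[A-Z][0-9.]*): "
    r"(?P<type_id>T\d+): "
    r"(?P<name>.+) "
    r"\[(?P<count>\d+|\?+)\]$"
)


def parse_line(line):
    """Split one listing line into its parts, following the slide's key."""
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError("unrecognized line: " + line)
    count = m.group("count")
    return {
        "depth": len(m.group("depth")) // 2,           # number of ". " markers
        "multiple_locations": "+" in m.group("flags"),
        "discontinuous": "*" in m.group("flags"),
        "address": m.group("address"),
        "type_id": m.group("type_id"),
        "name": m.group("name"),
        # A count shown as question marks is treated as unknown.
        "count": int(count) if count.isdigit() else None,
    }


anatomy = parse_line(". +*A1.2: T017: Anatomical Structure [577]")
event = parse_line(". *B: T051: Event [55450]")
```

Parsed this way, types whose tree address does not extend their parent's address (the "*" entries) can be listed or counted directly.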

  25. UMLS Semantic Net in the MED
      A: T071: Medical Entity [94729]
      . A1: T072: Physical Object [5618]
      . . A1.1: T001: Organism [3153]
      . . . A1.1.1: T002: Plant [1]
      . . . . A1.1.1.1: T003: Alga [0]
      . . . A1.1.2: T004: Fungus [273]
      . . . A1.1.3: T005: Virus [169]
      . . . A1.1.4: T006: Rickettsia or Chlamydia [5]
      . . . A1.1.5: T007: Bacterium [992]
      . . . A1.1.6: T194: Archaeon [0]
      . . . A1.1.7: T008: Animal [93]
      . . . . A1.1.7.1: T009: Invertebrate [85]
      . . . . A1.1.7.2: T010: Vertebrate [6]
      . . . . . A1.1.7.2.1: T011: Amphibian [0]
      . . . . . A1.1.7.2.2: T012: Bird [0]
      . . . . . A1.1.7.2.3: T013: Fish [0]
      . . . . . A1.1.7.2.4: T014: Reptile [0]
      . . . . . A1.1.7.2.5: T015: Mammal [1]
      . . . . . . A1.1.7.2.5.1: T016: Human [0]
      Key: “A1.2”: UMLS Tree address; “T071”: Semantic type ID; “Event”: MED Name; “+”: Multiple locations; “*”: Discontinuous tree address; “[577]”: Number of MED concepts

  26. UMLS Semantic Net in the MED
      A: T071: Medical Entity [94729]
      . +*A1.2: T017: Anatomical Structure [577]
      . . A1.2.3: T021: Fully Formed Anatomical Structure [230]
      . . . A1.2.3.1: T023: Body Part, Organ, or Organ Component [204]
      . . . *A1.2.1: T018: Embryonic Structure [2]
      . . . *A1.2.2: T190: Anatomical Abnormality [20]
      . . . . A1.2.2.1: T019: Congenital Abnormality [0]
      . . . . A1.2.2.2: T020: Acquired Abnormality [18]
      . . *A1.2.3.2: T024: Tissue [66]
      . . *A1.2.3.3: T025: Cell [61]
      . . *A1.2.3.4: T026: Cell Component [11]
      . . *A1.2.3.5: T028: Gene or Genome [0]
      . . *A1.4.2: T031: Body Substance [56]
      . . +*A2.1.4.1: T022: Body System [65]
      . . +*A2.1.5.1: T030: Body Space or Junction [43]
      . . +*A2.1.5.2: T029: Body Location or Region [117]
      . . *A1.3: T073: Manufactured Object [16]
      . . . A1.3.1: T074: Medical Device [6]
      . . . A1.3.2: T075: Research Device [0]
      . . . A1.3.3: T200: Clinical Drug [0]
      . . A1.4: T167: Substance [???]
      . . . A1.4.1: T103: Chemical [1942]
      . . . . A1.4.1.1: T120: Chemical Viewed Functionally [1828]
      . . . . . A1.4.1.1.1: T121: Pharmacologic Substance [1468]
      . . . . . . +*A1.4.1.1.3.4: T127: Vitamin [20]
      . . . . . . A1.4.1.1.1.1: T195: Antibiotic [130]
      . . . . . A1.4.1.1.3: T123: Biologically Active Substance [530]
      . . . . . . +A1.4.1.1.3.4: T127: Vitamin [20]
      Key: “A1.2”: UMLS Tree address; “T071”: Semantic type ID; “Event”: MED Name; “+”: Multiple locations; “*”: Discontinuous tree address; “[577]”: Number of MED concepts

  27. Property Introduction Points
      1: Medical Entity [T071]
          MED-CODE
          UMLS-CODE
          NAME
          SUBCLASS-OF -> SUBCLASS (1: Medical Entity [T071])
          SUBCLASS -> SUBCLASS-OF (1: Medical Entity [T071])
          SYNONYMS
          PRINT-NAME
          HAS-PARTS -> PART-OF (1: Medical Entity [T071])
          PART-OF -> HAS-PARTS (1: Medical Entity [T071])
          DEFINITION
          MAIN-MESH
          SUPPLEMENTARY-MESH
          NAME-TOKEN
          DEFAULT-SHORT-DISPLAY-NAME
          DEFAULT-DISPLAY-NAME
          SPEECH-SYNONYM
          SPEECH-SYNTHESIS-NAME
          ENTITY-(HAS-RELATED)-PAGER-NUMBER
          ENTITY-(HAS)-MEDLEE-TARGET-TERM
          HIERARCHY-SELECTOR
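Several of the properties above come in reciprocal pairs (SUBCLASS-OF with SUBCLASS, PART-OF with HAS-PARTS). One way such pairs might be kept consistent is to store the inverse link automatically whenever a link is added. The `LinkStore` class below is a hypothetical sketch of that idea, not the MED's actual storage, and the example link between Plasma Glucose Test and CHEM-7 is illustrative.

```python
from collections import defaultdict


class LinkStore:
    """Stores semantic links and keeps each reciprocal pair in step."""

    def __init__(self):
        self.inverse = {}
        self.links = defaultdict(set)  # (source, relation) -> set of targets

    def declare_pair(self, relation, inverse):
        """Register two relation names as inverses of each other."""
        self.inverse[relation] = inverse
        self.inverse[inverse] = relation

    def add(self, source, relation, target):
        """Add a link and, in the same step, its reciprocal link."""
        self.links[(source, relation)].add(target)
        self.links[(target, self.inverse[relation])].add(source)

    def targets(self, source, relation):
        """Return the sorted targets of one relation from one source."""
        return sorted(self.links.get((source, relation), set()))


store = LinkStore()
store.declare_pair("SUBCLASS-OF", "SUBCLASS")
store.declare_pair("PART-OF", "HAS-PARTS")

# Illustrative link only.
store.add("Plasma Glucose Test", "PART-OF", "CHEM-7")
```

After the single `add` call, the link can be followed from either end: `store.targets("CHEM-7", "HAS-PARTS")` finds the test that was linked with `PART-OF`.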

  28. Medical Properties
      7: Body System [T022]
          ACTION-SITE-OF -> ACTION-SITE (98: Health Care Activity (Procedure) [T058])
      14: Anatomical Structure [T017]
          SITE-OF-PROBLEM -> HAS-PROBLEM-SITE (30007: Patient Problem)
          OBSERVATION-SITE-OF -> OBSERVATION-SITE (94: Diagnostic Procedure [T060])
      43: Chemical [T103]
          PHARMACEUTIC-COMPONENT-OF -> PHARMACEUTIC-COMPONENT (28103: Pharmacy Items (Drugs and Nondrugs))
      50: Measureable Entity
          MEASURED-BY-PROCEDURE -> ENTITY-MEASURED (64964: Assessment Procedures)
          LOINC-ANALYTE-NAME
      76: Disease or Syndrome [C0391828]
          ETIOLOGY -> CAUSES-DISEASES (135: Etiologic Agent)
          IS-HISTORIC-DISEASE-FOR -> HISTORIC-DISEASE (56164: Factors Related to Past Disease Influencing Health Status)

  29. Medical Properties
      83: Laboratory Finding or Test Result [T034]
          RESULT-TYPE-->TESTS -> TEST-->RESULT-TYPE (94: Diagnostic Procedure [T060])
      86: Finding [T033]
          FINDING-(REFERS-TO)->ORGANISM
      93: Laboratory Diagnostic Procedure
          COLLECTED-BY -> COLLECTED-FOR (33023: Specimen Collection [C0200345])
      94: Diagnostic Procedure [T060]
          UNITS
          TEST-->RESULT-TYPE -> RESULT-TYPE-->TESTS (83: Laboratory Finding or Test Result [T034])
          OBSERVATION-SITE -> OBSERVATION-SITE-OF (14: Anatomical Structure [T017])
          TEST-(HAS)-ABNORMAL-FLAG -> ABNORMAL-FLAG-(FOR)-TEST (77746: Abnormal Flag Value)
      98: Health Care Activity (Procedure) [T058]
          PROCEDURE-(INDICATES)->PT-PROBLEM -> PT-PROBLEM-(INDICATED-BY)->PROCEDURE (30007: Patient Problem)
          ACTION-SITE -> ACTION-SITE-OF (7: Body System [T022])

  30. Medical Properties
      135: Etiologic Agent
          CAUSES-DISEASES -> ETIOLOGY (76: Disease or Syndrome [C0391828])
      1181: Antibiotic Sensitivity Tests
          SENSITIVITY-ANALYTE -> SENSITIVITY-ANALYTE-OF (44440: Antibiotic or Bacterial Enzyme Inhibitor)
      32291: Sampleable Entity
          SAMPLED-BY -> SYSTEM-SAMPLED (64970: Sample Entity)
          LOINC-SYSTEM-CODE
      44440: Antibiotic or Bacterial Enzyme Inhibitor
          SENSITIVITY-ANALYTE-OF -> SENSITIVITY-ANALYTE (1181: Antibiotic Sensitivity Tests)

  31. Data Dictionary Properties
      59511: Clinical Repository Table
          TABLE-HAS-COLUMN -> COLUMN-IS-IN-TABLE (59512: Clinical Repository Column)
      59512: Clinical Repository Column
          COLUMN-IS-IN-TABLE -> TABLE-HAS-COLUMN (59511: Clinical Repository Table)
      59528: Generic Column
          COLUMN-HAS-PERMITTED-VALUES -> IS-PERMITTED-VALUE-FOR-COLUMN (67164: Verification Concept for Generic Column)
      59729: Data Entry Form Component
          REPEAT-TYPE(DATA-ENTRY-COMPONENT)
          NUMBER-REPEATS(DATA-ENTRY-COMPONENT)
          REPEAT-LAYOUT-TYPE(DATA-ENTRY-COMPONENT)
      59732: Form Field Allowable Values
          ALLOWABLE-VALUE-(FOR)->DATA-ENTRY-FIELD -> DATA-ENTRY-FIELD-(HAS)->ALLOWABLE-VALUE (42646: Data Entry Form Field)

  32. Controlled Terminology Properties
      21762: ICD9 Element
          ICD9-CODE
          ICD9-ENTRY-CODE
          OLD-ICD9-CODE
          ICD9-NAME
      23147: American Hospital Formulary Service Class
          AHFS-CLASS-CODE
      28104: Drug Enforcement Administration (DEA) Controlled Substance Category
          DEA-CODE

  33. Data Modeling Properties
      1178: Number or String Result
          EVENT-ID-OF -> EVENT-ID (9876: CPMC Event)
          EVENT-PATIENT-ID-OF -> EVENT-PATIENT-ID (9876: CPMC Event)
          EVENT-ORGANIZATION-OF -> EVENT-ORGANIZATION (9876: CPMC Event)
          EVENT-LOCATION-OF -> EVENT-LOCATION (9876: CPMC Event)
          PARTICIPANT-ID-OF -> PARTICIPANT-ID (30352: Medical Event Participant)
      9876: CPMC Event
          EVENT-ID -> EVENT-ID-OF (1178: Number or String Result)
          EVENT-DATE -> EVENT-DATE-OF (30349: Date Result)
          EVENT-PATIENT-ID -> EVENT-PATIENT-ID-OF (1178: Number or String Result)
          EVENT-PARTICIPANT -> PARTICIPANT-OF (30352: Medical Event Participant)
          EVENT-ORGANIZATION -> EVENT-ORGANIZATION-OF (1178: Number or String Result)
          EVENT-LOCATION -> EVENT-LOCATION-OF (1178: Number or String Result)
          EVENT-STATUS -> STATUS-OF (30355: CPMC Status Term)
          EVENT-(HAS)-ORGANIZATION -> ORGANIZATION-(FOR)-EVENT (81475: CPMC Coded Organizations)
      30344: CPMC Order
          ORDER-QUANTITY -> ORDER-QUANTITY-OF (30350: Quantity Result)
          ORDER-FREQUENCY -> ORDER-FREQUENCY-OF (32504: Order Frequency)
          ORDER-START-DATE -> ORDER-START-DATE-OF (30349: Date Result)
          ORDER-STOP-DATE -> ORDER-STOP-DATE-OF (30349: Date Result)
      30352: Medical Event Participant
          PARTICIPANT-OF -> EVENT-PARTICIPANT (9876: CPMC Event)
          PARTICIPANT-ID -> PARTICIPANT-ID-OF (1178: Number or String Result)
          PARTICIPANT-NAME -> PARTICIPANT-NAME-OF (32653: ID Number Plus Text Result)

  34. Application Properties
      40441: Display Information [C0010996]
          DEFAULT-DISPLAY-FOR -> HAS-DEFAULT-DISPLAYS (94: Diagnostic Procedure [T060])
          DISPLAYS-ELEMENTS-OF -> ELEMENTS-DISPLAYED-BY (94: Diagnostic Procedure [T060])
          HAS-DISPLAY-PARAMETERS -> IS-DISPLAY-PARAMETER-OF (94: Diagnostic Procedure [T060])
          DISPLAY-PARAMETER-ORDER

  35. Document Properties
      42645: Data Entry Form
          FORM-(IS-PART-OF)->FORMSET -> FORMSET-(CONTAINS)->FORM (66436: Data Entry Form Sets)
      42646: Data Entry Form Field
          DATA-ENTRY-FIELD-(HAS)->ALLOWABLE-VALUE -> ALLOWABLE-VALUE-(FOR)->DATA-ENTRY-FIELD (59732: Form Field Allowable Values)
          FORM-FIELD-(HAS)->FIELD-TYPE -> FIELD-TYPE-(FOR)->FORM-FIELD (66295: Data Entry Field Type)
          FORM-FIELD-(OBEYS)->PREFILL-RULE -> PREFILL-RULE-(FOR)->FORM-FIELD (66311: Prefill Rules)
          FORM-FIELD-MAXIMUM-VALUE
          FORM-FIELD-MINIMUM-VALUE
          FORM-FIELD-MAXIMUM-CHARACTER-COUNT
      59732: Form Field Allowable Values
          ALLOWABLE-VALUE-(FOR)->DATA-ENTRY-FIELD -> DATA-ENTRY-FIELD-(HAS)->ALLOWABLE-VALUE (42646: Data Entry Form Field)
      66295: Data Entry Field Type
          FIELD-TYPE-(FOR)->FORM-FIELD -> FORM-FIELD-(HAS)->FIELD-TYPE (42646: Data Entry Form Field)
      66308: Layout Type
          LAYOUT-TYPE-(FOR)->FORM-STRUCTURE -> FORM-STRUCTURE-(HAS)->LAYOUT-TYPE (66405: Data Entry Form Structure)

  36. 207 Intersection Classes • Chemical [T103]; Measureable Entity; Etiologic Agent: 1780 cases • Measureable Entity; Laboratory Finding or Test Result [T034]; Finding [T033]; Etiologic Agent; Microbiology Result; Patient Problem; Laboratory Results: 1399 cases • Laboratory Finding or Test Result [T034]; Finding [T033]; Patient Problem; Laboratory Results: 3309 cases • Laboratory Finding or Test Result [T034]; Finding [T033]; Patient Problem; Laboratory Results; New York Hospital (NYH) Laboratory Nomenclature Term: 1601 cases

  37. 207 Intersection Classes • Laboratory Finding or Test Result [T034]; Finding [T033]; Patient Problem; New York Hospital (NYH) Laboratory Nomenclature Term: 2906 cases • Laboratory Diagnostic Procedure; Diagnostic Procedure [T060]; Health Care Activity (Procedure) [T058]; Event [T051]; Laboratory Diagnostic Batteries; Single-Result Laboratory Test; New York Hospital (NYH) Laboratory Concept; Assessment Procedures: 1197 cases • Laboratory Diagnostic Procedure; Diagnostic Procedure [T060]; Health Care Activity (Procedure) [T058]; Event [T051]; Laboratory Diagnostic Batteries; New York Hospital (NYH) Laboratory Concept; Assessment Procedures: 1822 cases

  38. 207 Intersection Classes • Laboratory Diagnostic Procedure; Diagnostic Procedure [T060]; Health Care Activity (Procedure) [T058]; Event [T051]; Single-Result Laboratory Test; New York Hospital (NYH) Laboratory Concept; Assessment Procedures: 3200 cases • Laboratory Diagnostic Procedure; Diagnostic Procedure [T060]; Health Care Activity (Procedure) [T058]; Event [T051]; Single-Result Laboratory Test; CPMC Single-Result Laboratory Test; Assessment Procedures: 3197 cases • Health Care Activity (Procedure) [T058]; Event [T051]; ICD9 Element; Verification Concept for Generic Column: 10048 cases
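Counts like those on the intersection slides could be produced by grouping concepts according to the exact combination of classes each one belongs to. The sketch below assumes a mapping from concept to its set of classes; the concept names and memberships are hypothetical and are not MED data, and how the talk's own counts were derived is not stated in the slides.

```python
from collections import Counter


def intersection_counts(memberships):
    """Count concepts per distinct combination of classes.

    memberships maps a concept name to the collection of classes that
    carry it; each distinct combination is one intersection class.
    """
    return Counter(frozenset(classes) for classes in memberships.values())


# Hypothetical concepts and memberships, for illustration only.
memberships = {
    "concept-1": {"Chemical", "Measureable Entity", "Etiologic Agent"},
    "concept-2": {"Chemical", "Measureable Entity", "Etiologic Agent"},
    "concept-3": {"Chemical", "Measureable Entity"},
}

counts = intersection_counts(memberships)
three_way = counts[frozenset({"Chemical", "Measureable Entity", "Etiologic Agent"})]
```

Using `frozenset` as the key makes the combination independent of the order in which a concept's classes are listed.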

  39. Revisiting Recommendations - General • Make “Event” a temporal concept • Conceptual vs. Physical polarization • Directed Acyclic Graph • Merge Network and Metathesaurus

  40. Revisiting Recommendations - Specific • Tests have Specimens • Tests have Parts • Separate Medications from Chemicals • Liberalize assignment of Relations

  41. Revisiting Summary • Semantic Types provide good coverage • Concepts provide good coverage in certain domains • No technical reason why UMLS could not incorporate clinical vocabulary

  42. Lessons to be Learned • The MED is representative of clinical care • MED classes work well as introduction points • Multiple hierarchy works • Semantic Network is largely intact • Unifying organization for anatomy needed • Further study of MED will suggest additional types and relations
