CENDI Staff Workshop
Knowledge Organization Systems: Current and Future Uses
September 16, 2004
National Library of Medicine
Betsy L. Humphreys, Associate Director for Library Operations, NLM, NIH, HHS
blh@nlm.nih.gov
NLM “Knowledge Organization Systems”
• Name and Series/Journal Authority Files
• Library Materials Classification
• Individual Controlled Vocabularies
  • MeSH, MedlinePlus Health Topics, NCBI Taxonomy, RxNorm clinical drug vocabulary
• Unified Medical Language System (UMLS) Knowledge Sources
  • Metathesaurus – many vocabularies in a common, integrated format
  • Semantic Network
  • Lexicon
  • Associated tools
NLM “Knowledge Organization Systems”
• Common Characteristics
  • Searchable on the Web, often interlinked with other NLM resources
  • Distributed in one or more electronic formats
  • Used within NLM for:
    • Information retrieval and display
    • Data creation
    • Natural language interpretation
  • Heavily used outside NLM for a wide range of applications
  • Most built and maintained with custom systems
Medical Subject Headings (MeSH)
• Structure of MeSH upgraded in 2000
  • Descriptor Class – closely related concepts grouped to enhance retrieval
  • Concept – distinct meaning
  • Term – concept name
http://www.nlm.nih.gov/mesh/meshrels.html
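The three levels named on this slide can be pictured as nested records. The following is only a minimal sketch of that idea: the class names mirror the slide's vocabulary, while the field names, the helper method, and the sample values are assumptions made for illustration and are not taken from a MeSH record or from NLM's own data model.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A name for a concept."""
    name: str


@dataclass
class Concept:
    """A distinct meaning, expressed by one or more terms."""
    terms: list[Term] = field(default_factory=list)


@dataclass
class Descriptor:
    """A class of closely related concepts grouped to enhance retrieval."""
    heading: str
    concepts: list[Concept] = field(default_factory=list)

    def all_term_names(self) -> list[str]:
        """Every term under the descriptor, in concept order."""
        return [t.name for c in self.concepts for t in c.terms]


# Illustrative values only; not copied from a MeSH record.
headache = Descriptor(
    heading="Headache",
    concepts=[
        Concept(terms=[Term("Headache"), Term("Cephalgia")]),
        Concept(terms=[Term("Head Pain")]),
    ],
)

print(headache.all_term_names())
```

A search on any of the listed names can then be resolved to the one descriptor, which is the retrieval benefit the slide attributes to grouping.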
Known Translations of MeSH
• In UMLS
  • Dutch, Finnish, French, German, Italian, Japanese, Portuguese, Russian, Spanish, Swedish
• Other Complete Translations
  • Arabic, Chinese, Czech, Greek, Thai, Turkish
• In Progress or Planned or Hoped For
  • Korean, Slovenian, Vietnamese, Lithuanian, Polish, Slovakian, Norwegian, Kiswahili
Coordinating Translations
How?
• Single Database - Web Interface
• Add Language as a Term Property
• Translated Terms added to Concept
• Non-English Concepts added to Descriptor
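One way to read the approach on this slide is that a translation is an ordinary term carrying a language property, stored alongside the English terms of the same concept. The sketch below illustrates that reading only; the class and method names, the lowercase language codes, and the sample translations are all hypothetical and do not describe the actual translation database.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    name: str
    lang: str = "eng"  # hypothetical code; language is a property of the term


@dataclass
class Concept:
    terms: list[Term] = field(default_factory=list)

    def add_translation(self, name: str, lang: str) -> None:
        """Translated terms are added to the existing concept."""
        self.terms.append(Term(name, lang))

    def names_in(self, lang: str) -> list[str]:
        return [t.name for t in self.terms if t.lang == lang]


@dataclass
class Descriptor:
    concepts: list[Concept] = field(default_factory=list)

    def languages(self) -> set[str]:
        return {t.lang for c in self.concepts for t in c.terms}


# Illustrative values only.
main = Concept([Term("Headache")])
main.add_translation("Kopfschmerz", "ger")
main.add_translation("Céphalée", "fre")

# A concept that exists only in a non-English language is attached
# directly to the descriptor (placeholder name, not real data).
local_only = Concept([Term("<German-only concept name>", "ger")])

descriptor = Descriptor([main, local_only])
print(main.names_in("ger"))
print(sorted(descriptor.languages()))
```

Keeping every language in one structure means a single record per descriptor can serve all translation groups, which matches the slide's "Single Database" point.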
Status of Use
• Current Active Groups
  • German, French, Italian, Vietnamese
• Groups Beginning Work with MTMS
  • Dutch, Finnish, Japanese, Polish, Slovakian
• Groups Starting Soon
  • Czech, Portuguese, Korean, Norwegian, Russian, Spanish
The UMLS in practice
• Database
  • Series of relational files
• Interfaces
  • Web interface: Knowledge Source Server (UMLSKS)
  • Application programming interfaces (Java and XML-based)
• Applications
  • lvg (lexical programs)
  • MetamorphoSys (installation and customization)
  • SOON: Metathesaurus browser
The UMLS is not an end-user application
UMLS: 3 components
• Metathesaurus
  • Concepts
  • Inter-concept relationships
• Semantic Network
  • Semantic types
  • Semantic network relationships
• Lexical resources
  • SPECIALIST Lexicon
  • Lexical tools
Metathesaurus Source Vocabularies (2004AB)
• 134 source vocabularies
  • 126 contributing concept names
• 73 families of vocabularies
  • multiple translations (e.g., MeSH, ICPC, ICD-10)
  • variants (American-English equivalents, Australian extension/adaptation)
  • subsequent editions usually considered distinct families (ICD: 9-10; DSM: IIIR-IV)
• Broad coverage of biomedicine
• Common presentation
Metathesaurus Concepts (2004AB)
• Concept (> 1M) CUI
  • Set of synonymous concept names
• Term (> 3.8M) LUI
  • Set of normalized names
• String (> 4.3M) SUI
  • Distinct concept name
• Atom (> 5.1M) AUI
  • Concept name in a given source
Example (from the slide's diagram):
C0000001
  L0000001
    S0000001
      A0000001 headache (source 1)
      A0000002 headache (source 2)
    S0000002
      A0000003 Headache (source 1)
      A0000004 Headache (source 2)
  L0000002
    S0000003
      A0000005 Cephalgia (source 1)
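The four identifier levels on this slide nest inside one another: atoms (AUI) share a string (SUI), strings share a term (LUI), and terms share a concept (CUI). The sketch below groups the slide's headache example by those identifiers. The row layout and the function name are assumptions made for illustration, and the assignment of atoms to S0000001 and S0000002 follows the most natural reading of the diagram.

```python
from collections import defaultdict

# (CUI, LUI, SUI, AUI, string, source) rows transcribed from the slide's example.
ATOMS = [
    ("C0000001", "L0000001", "S0000001", "A0000001", "headache", "source 1"),
    ("C0000001", "L0000001", "S0000001", "A0000002", "headache", "source 2"),
    ("C0000001", "L0000001", "S0000002", "A0000003", "Headache", "source 1"),
    ("C0000001", "L0000001", "S0000002", "A0000004", "Headache", "source 2"),
    ("C0000001", "L0000002", "S0000003", "A0000005", "Cephalgia", "source 1"),
]


def build_tree(atoms):
    """Nest atoms as concept -> term -> string -> list of (AUI, name, source)."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for cui, lui, sui, aui, string, source in atoms:
        tree[cui][lui][sui].append((aui, string, source))
    return tree


tree = build_tree(ATOMS)

# Print the hierarchy with one indentation level per identifier type.
for cui, terms in tree.items():
    print(cui)
    for lui, strings in terms.items():
        print("  " + lui)
        for sui, atom_list in strings.items():
            print("    " + sui)
            for aui, string, source in atom_list:
                print(f"      {aui} {string} ({source})")
```

The grouping shows why the counts on the slide grow from concepts to atoms: one concept here holds two terms, three strings, and five atoms.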
Metathesaurus Relationships
• Symbolic relations: ~9 M pairs of concepts
• Statistical relations: ~7 M pairs of concepts (co-occurring concepts)
• Mapping relations: 100,000 pairs of concepts
• Categorization: Relationships between concepts and semantic types from the Semantic Network
Why you might care about the UMLS
• Content with applicability outside of biomedicine
• Tools generally useful in NLP, datamining
• New Metathesaurus Rich Release Format
  • Potentially useful as format for distribution of any set of vocabularies/ontologies and for robust purpose-specific mappings between such systems
  • May well lead to development of a variety of tools that can output or ingest the format
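As a rough idea of what a tool that ingests the Rich Release Format might look like, the sketch below parses pipe-delimited rows into dictionaries. Nothing here comes from the slides beyond the format's name: the column list follows the MRCONSO.RRF layout as publicly documented for later UMLS releases and should be checked against the release in use, the function name is hypothetical, and the two sample rows are invented for illustration.

```python
# Column names assumed from the public MRCONSO.RRF documentation;
# verify against the documentation of the release you actually load.
MRCONSO_COLUMNS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]


def parse_rrf_line(line, columns):
    """Split one pipe-delimited row into a dict keyed by column name."""
    fields = line.rstrip("\r\n").split("|")
    # Each row ends with a trailing "|", which leaves one empty final field.
    if fields and fields[-1] == "":
        fields.pop()
    if len(fields) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
    return dict(zip(columns, fields))


# Invented sample rows (hypothetical source "SRC1" and codes), not real data.
SAMPLE_LINES = [
    "C0000001|ENG|P|L0000001|PF|S0000001|Y|A0000001||||SRC1|PT|X1|headache|0|N||\n",
    "C0000001|ENG|S|L0000002|PF|S0000003|Y|A0000005||||SRC1|SY|X2|Cephalgia|0|N||\n",
]

rows = [parse_rrf_line(line, MRCONSO_COLUMNS) for line in SAMPLE_LINES]
for row in rows:
    print(row["CUI"], row["AUI"], row["STR"])
```

Because the parser takes the column list as a parameter, the same function could in principle read any other file distributed in the same delimited style, which is the kind of reuse the slide anticipates.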