
The UMLS and the Semantic Web

W3C Semantic Web Health Care and Life Sciences Interest Group BioRDF Teleconference September 22, 2008. The UMLS and the Semantic Web. Olivier Bodenreider Lister Hill National Center for Biomedical Communications Bethesda, Maryland - USA. Outline. The UMLS (in a nutshell)


Presentation Transcript


  1. W3C Semantic Web Health Care and Life Sciences Interest Group BioRDF Teleconference September 22, 2008 The UMLS and the Semantic Web Olivier Bodenreider Lister Hill National Center for Biomedical Communications Bethesda, Maryland - USA

  2. Outline • The UMLS (in a nutshell) • Lexical resources • Metathesaurus • Semantic Network • Why is the UMLS relevant to the Semantic Web? • Issues and challenges

  3. Unified Medical Language System (UMLS)

  4. UMLS: 3 components • SPECIALIST Lexicon (lexical resources) • 200,000 lexical items • Part of speech and variant information • Metathesaurus (terminological resources) • 5M names from over 100 terminologies • 1M concepts • 16M relations • Semantic Network (ontological resources) • 135 high-level categories • 7000 relations among them

  5. UMLS Characteristics (1) • Current version: 2008AA (2-3 annual releases) • Type: Terminology integration system • Domain: Biomedicine • Developer: NLM • Funding: NLM (intramural) • Availability • Publicly available: Yes* (cost-free license required) • Repositories: UMLS • URL: http://umlsks.nlm.nih.gov/

  6. UMLS Characteristics (2) • Number of • Concepts: 1.5M (2008AA) • Terms: ~6M • Major organizing principles (Metathesaurus): • Concept orientation • Source transparency • Multi-lingual through translation • Formalism: Proprietary format (RRF)

  7. Integrating subdomains [Figure: the UMLS at the center, connecting vocabularies from several subdomains: SNOMED CT (clinical repositories), OMIM (genetic knowledge bases), MeSH (biomedical literature), NCBI Taxonomy (model organisms), GO (genome annotations), FMA (anatomy), and other subdomains (…)]

  8. Trans-namespace integration [Figure: same subdomain diagram as slide 7, highlighting one concept across namespaces] • SNOMED CT (clinical repositories): Addison's disease (363732003) • MeSH (biomedical literature): Addison Disease (D000224) • UMLS: C0001403
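
The trans-namespace integration on slide 8 can be pictured as a lookup from (source vocabulary, source code) pairs to a shared Metathesaurus concept identifier. The following is a minimal sketch in Python, assuming only the three identifiers given on the slide; the table `SOURCE_TO_CUI` and the functions `to_cui` and `same_concept` are hypothetical names for illustration and are not part of any UMLS API.

```python
# Hypothetical in-memory mapping built only from the identifiers on slide 8.
# A real mapping would be derived from the Metathesaurus release files.
SOURCE_TO_CUI = {
    ("SNOMED CT", "363732003"): "C0001403",  # Addison's disease
    ("MeSH", "D000224"): "C0001403",         # Addison Disease
}


def to_cui(source, code):
    """Return the UMLS concept identifier for a source code, or None if unknown."""
    return SOURCE_TO_CUI.get((source, code))


def same_concept(a, b):
    """True when two (source, code) pairs resolve to the same known concept."""
    cui_a, cui_b = to_cui(*a), to_cui(*b)
    return cui_a is not None and cui_a == cui_b


if __name__ == "__main__":
    # The two source codes share no identifier, yet resolve to one concept.
    print(same_concept(("SNOMED CT", "363732003"), ("MeSH", "D000224")))
```

The point of the sketch is that two datasets annotated with different vocabularies can be joined through the shared concept identifier even though their own codes never match.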

  9. [Figure: Semantic Network and Metathesaurus] • Semantic types (Semantic Network): Anatomical Structure; Fully Formed Anatomical Structure; Embryonic Structure; Body Part, Organ or Organ Component; Disease or Syndrome; Pharmacologic Substance; Population Group • Concepts (Metathesaurus): Mediastinum; Saccular Viscus; Angina Pectoris; Esophagus; Heart; Cardiotonic Agents; Left Phrenic Nerve; Tissue Donors; Heart Valves; Fetal Heart

  10. Why is the UMLS relevant to the Semantic Web?

  11. Relevance to the SW Metathesaurus • Terminology integration system • Trans-namespace integration • Integration beyond shared identifiers • Repository of biomedical terminologies/ontologies • Many UMLS vocabularies used for the annotation of datasets (including clinical records)

  12. Relevance to the SW Metathesaurus • Broad coverage of biomedicine • Large user base • Tooling available • E.g., visualization, named entity recognition, etc.

  13. Relevance to the SW Semantic Network • Top-level ontology of the biomedical domain • Broad biomedical categories • Helps partition biomedical concepts • Semantic relations

  14. Issues and Challenges

  15. Issues and challenges • Availability • Mandatory license agreement • Discoverability • No metadata • Formalism • No easy conversion to SKOS/RDF(S)/OWL • Identifiers • Steep learning curve

  16. Availability • Some source vocabularies have intellectual property restrictions • E.g., most drug vocabularies • Complex agreement for SNOMED CT: available at no cost for member countries of the IHTSDO • Mandatory license agreement • No cost for research • May require negotiation with the vocabulary developer for production applications • MetamorphoSys helps extract selected sources from the UMLS

  17. Discoverability • Discoverability of individual concepts • UMLSKS web services • Search all UMLS source vocabularies at the same time • Named entity recognition/normalization (e.g., MetaMap) • Discoverability of terminologies/ontologies • No comprehensive registries • No rich registries • With rich metadata supporting the discoverability of terminologies/ontologies

  18. Formalism • UMLS: Proprietary format • Rich Release Format (RRF) • All terminologies/ontologies represented in the same format • No easy conversion to SKOS/RDF(S)/OWL • Underspecified semantics • Child/parent ≠ subClassOf • Complex semantics • Descriptors / concepts / terms • Rich attribute set
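
To make the RRF point on slide 18 concrete, here is a rough Python sketch of reading pipe-delimited rows from the Metathesaurus names file (MRCONSO.RRF) and grouping names by concept. The column order follows the RRF documentation as I understand it; the sample rows are illustrative, built only from the identifiers on the slides, with every other column left empty. The source abbreviations `MSH` and `SNOMEDCT` and the helper names `make_row`, `parse_rrf_line` and `names_by_concept` are assumptions for this example.

```python
# Column order of MRCONSO.RRF (assumed from the RRF documentation).
MRCONSO_FIELDS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]


def make_row(**known):
    """Build an illustrative RRF line; columns not supplied are left empty."""
    return "|".join(known.get(f, "") for f in MRCONSO_FIELDS) + "|"


def parse_rrf_line(line, fields=MRCONSO_FIELDS):
    """Split one pipe-delimited RRF line into a field-name -> value dict."""
    values = line.rstrip("\n").split("|")
    # Each RRF line ends with a trailing pipe, which yields one extra empty value.
    if values and values[-1] == "":
        values = values[:-1]
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    return dict(zip(fields, values))


def names_by_concept(lines):
    """Group (source, code, name) triples under their concept identifier."""
    concepts = {}
    for line in lines:
        row = parse_rrf_line(line)
        concepts.setdefault(row["CUI"], []).append(
            (row["SAB"], row["CODE"], row["STR"])
        )
    return concepts


# Illustrative rows only: values not shown on the slides are left blank.
SAMPLE = [
    make_row(CUI="C0001403", LAT="ENG", SAB="MSH",
             CODE="D000224", STR="Addison Disease"),
    make_row(CUI="C0001403", LAT="ENG", SAB="SNOMEDCT",
             CODE="363732003", STR="Addison's disease"),
]

if __name__ == "__main__":
    for cui, names in names_by_concept(SAMPLE).items():
        print(cui, names)
```

Reading the rows is the easy part. The slide's concern is what comes next: mapping these rows naively onto, say, SKOS labels would flatten the descriptor/concept/term distinctions and the attribute set that the format carries.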

  19. Identifiers for biomedical entities • What is identified? • Entity vs. resource about the entity • Which identifier to pick? • E.g., Addison’s disease • 363732003 (SNOMED CT) • D000224 (MeSH) • C0001403 (UMLS Metathesaurus) • Which format? • URI vs. LSID • Which authoritative source for minting URIs? • Ontology developers vs. (e.g.) Bio2RDF
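
The identifier questions on slide 19 can be illustrated with a small Python sketch that renders the same entity under two candidate formats, an HTTP URI and an LSID (which follows the general pattern `urn:lsid:authority:namespace:object`). The authority `example.org`, the namespace labels and the functions `http_uri` and `lsid` are hypothetical; no real provider mints the identifiers printed here.

```python
from urllib.parse import quote

# Hypothetical authority used purely for illustration.
AUTHORITY = "example.org"


def http_uri(namespace, local_id):
    """Render an identifier as an HTTP URI under the hypothetical authority."""
    return f"http://{AUTHORITY}/{quote(namespace, safe='')}/{quote(local_id, safe='')}"


def lsid(namespace, local_id):
    """Render the same identifier as an LSID (urn:lsid:authority:namespace:object)."""
    return f"urn:lsid:{AUTHORITY}:{namespace}:{local_id}"


# The three identifiers for Addison's disease listed on the slide.
ADDISON = [
    ("snomedct", "363732003"),
    ("mesh", "D000224"),
    ("umls", "C0001403"),
]

if __name__ == "__main__":
    for namespace, local_id in ADDISON:
        print(http_uri(namespace, local_id), lsid(namespace, local_id))
```

Three namespaces times two formats already gives six strings for one disease, before deciding who is entitled to mint them; that multiplicity is the problem the slide raises.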

  20. Steep learning curve • Large resource • 1.5M concepts • 6M terms • Over 20M relations • Complex structure • Metathesaurus • Semantic Network • Rich set of attributes • Rich set of relations • Terminological • Semantic • Statistical • Mapping • Multiple languages • Complex domain

  21. Conclusions

  22. Conclusions • UMLS as a terminology integration system • Helps bridge across namespaces • Helps integrate information sources • Beyond shared identifiers • UMLS as a repository of terminologies/ontologies • Single source, single format for 143 vocabularies • Issues with availability, discoverability and formalism • Identifiers for biomedical entities

  23. References • UMLS: umlsinfo.nlm.nih.gov • UMLS browsers (free, but UMLS license required) • Knowledge Source Server: umlsks.nlm.nih.gov • Semantic Navigator: http://mor.nlm.nih.gov/perl/semnav.pl • RRF browser (standalone application distributed with the UMLS)

  24. References • Recent overviews • Bodenreider O. (2004). The Unified Medical Language System (UMLS): Integrating biomedical terminology. Nucleic Acids Research; D267-D270. • Bodenreider O. From terminology integration to information integration: Unified Medical Language System (UMLS). BioRDF Teleconference, W3C Semantic Web Health Care and Life Sciences Interest Group, June 5, 2006. http://mor.nlm.nih.gov/pubs/pres/060605-BioRDF.pdf

  25. Medical Ontology Research Olivier Bodenreider Lister Hill National Center for Biomedical Communications Bethesda, Maryland - USA Contact: olivier@nlm.nih.gov Web: mor.nlm.nih.gov
