
Frank Hartel, PhD Enterprise Vocabulary Services National Cancer Institute

NCI Enterprise Vocabulary Services (EVS) and Semantic Integration at NCI - An Overview. Frank Hartel, PhD, Enterprise Vocabulary Services, National Cancer Institute. Outline: terminology management and semantic integration at NCI; NCI Enterprise Vocabulary Services.


Presentation Transcript


  1. NCI Enterprise Vocabulary Services (EVS) and Semantic Integration at NCI - An Overview - Frank Hartel, PhD Enterprise Vocabulary Services National Cancer Institute

  2. Outline: • Terminology management and semantic integration at NCI • NCI Enterprise Vocabulary Services • NCI Thesaurus (NCIt) • NCI Metathesaurus (NCI Meta) • Collaborations

  3. NCI biomedical informatics • Goal: A virtual web of interconnected data, individuals, and organizations redefines how research is conducted, care is provided, and patients/participants interact with the biomedical research enterprise

  4. Interoperability (Courtesy: Charlie Mead) • in·ter·op·er·a·bil·i·ty: ability of a system...to use the parts or equipment of another system. Source: Merriam-Webster web site • interoperability: ability of two or more systems or components to exchange information and to use the information that has been exchanged. Source: IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries, IEEE, 1990 • Syntactic interoperability • Semantic interoperability

  5. No Controlled Terminology? No Interoperability • Systems cannot exchange or use information if they use incompatible codes or tokens to signify meaning • Terminology services provide tokens and codes • Proper use of them assures consistent meaning across the enterprise
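A minimal sketch of the point on this slide, assuming two hypothetical systems that record the same finding under different local codes. The system names, the local codes, and the helpers `to_shared_code` and `same_meaning` are illustrative assumptions; only the shared code C19151 (Metastasis) appears in this presentation.

```python
from typing import Optional

# Hypothetical local code tables (illustrative values only).
# Each system maps its own token to a shared terminology code.
LOCAL_TO_SHARED = {
    "system_a": {"MET-01": "C19151"},  # C19151 = Metastasis (slide 25)
    "system_b": {"9931": "C19151"},
}

def to_shared_code(system: str, local_code: str) -> Optional[str]:
    """Translate a system-specific code into the shared terminology code."""
    return LOCAL_TO_SHARED.get(system, {}).get(local_code)

def same_meaning(sys1: str, code1: str, sys2: str, code2: str) -> bool:
    """Two local codes agree only if both resolve to the same shared code."""
    a = to_shared_code(sys1, code1)
    b = to_shared_code(sys2, code2)
    return a is not None and a == b
```

Compared directly, "MET-01" and "9931" look unrelated; routed through the shared code, the two systems can recognize that they mean the same thing.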

  6. Can it be done? caCORE - An Example • Public APIs • Domain object metadata • Common data elements (CDEs) • Dictionary, thesaurus, ontology services via caBIO API • Vocabulary for CDE specification via downloads

  7. cancer Common Ontologic Representation Environment (caCORE) • Information integration • Cross-discipline reasoning [Diagram labels: biomedical objects, common data elements, controlled vocabulary]

  8. Common Data Elements • Structured data reporting elements • Precisely defining the questions and answers • What question are you asking, exactly? • What are the possible answers, and what do they mean?
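The idea of a common data element can be sketched as a question paired with an explicit set of permissible answers, each with a stated meaning. The class name, field names, and the sample question and values below are hypothetical and are not taken from the caDSR.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class CommonDataElement:
    """A precisely defined question plus its permissible answers."""
    question: str
    permissible_values: Dict[str, str]  # recorded value -> what it means

    def meaning_of(self, answer: str) -> str:
        """Return the meaning of an answer, rejecting anything not permitted."""
        if answer not in self.permissible_values:
            raise ValueError(f"{answer!r} is not a permissible value")
        return self.permissible_values[answer]

# Hypothetical example element (illustrative content only).
metastasis_present = CommonDataElement(
    question="Is metastasis present at diagnosis?",
    permissible_values={
        "Y": "Metastasis is present",
        "N": "Metastasis is not present",
        "U": "Unknown or not assessed",
    },
)
```

Rejecting values outside the permissible set is what keeps answers comparable across the systems that collect them.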

  9. Biomedical Information Objects • Data service infrastructure developed using OMG’s Model Driven Architecture approach • Object models expressed in UML represent actual biomedical research entities such as genes, sequences, chromosomes, cellular pathways, ontologies, clinical protocols, etc. • The object models form the basis for uniform APIs (Java, SOAP, HTTP-XML, Perl) that provide an abstraction layer and interfaces for developers to access information without worrying about the back-end data stores

  10. Binding Data, Metadata to Terminology - caCORE SDK • UML Modeling Tool (provided by user) • Information model that will define data classes, attributes and relationships • Semantic Connector • Annotate UML model with ontology concepts: bridges the world of databases to that of structured semantics. • UML Loader (run by NCI staff) • Loads model into the caDSR metadata registry • Model and associated semantics are available at runtime • Code Generator • Model and a code template are inputs into generator • Creates the ‘caCORE-like’ n-tier software system with Java and Web Services APIs

  11. caCORE SDK

  12. Extending Interoperability Beyond the Enterprise • cancer Biomedical Informatics Grid (caBIG) • Common, widely distributed infrastructure permits cancer research community to focus on innovation • Shared vocabulary, data elements, data models facilitate information exchange • Collection of interoperable applications developed to common standard • Raw cancer research data is available for mining and integration

  13. caBIG - facilitate sharing of infrastructure, applications, and data

  14. [Diagram: caGrid linking NCI, Cancer Centers, other caBIG service providers, and other toolkits]

  15. caGrid Service-Oriented Architecture [Diagram: OGSA-compliant service-oriented architecture. Labels: Functions, Quality of Service, Semantic Service, ID Resolution, caCORE, Globus, DQP, Business Process, Workflow, Security, Resource Management, Service Registry, Service, OGSA-DAI, GRAM, myProxy, Service Description, Globus Toolkit, Grid Communication Protocol, GSI, Transport, CAS, Mobius]

  16. Enterprise Vocabulary • NCI Metathesaurus (Cross-map standard vocabularies/ontologies, e.g. SNOMED, MedDRA, ICD) • Semantic integration, inter-vocabulary mapping • UMLS Metathesaurus extended with cancer-oriented vocabularies • 930,000 Concepts, 2,200,000 terms and phrases • Mappings among over 50 vocabularies • NCI Thesaurus • Description logic-based • 48,000 “Concepts” • Concept is the semantic unit • Terms are Concept labels – synonymy • Semantic relationships between Concepts • Other standard terminologies • MedDRA, MGED, SNOMED, GO, etc.

  17. NCI builds on EVS via caCORE Infrastructure

  18. Production EVS Servers in caCORE

  19. Enterprise Vocabulary Services • Services and resources that address NCI's needs for controlled vocabulary http://www.nci.nih.gov/EVS • A collaboration • NCI Office of Communications • Physician Data Query (PDQ), Cancer Information Service and the NCI web portal www.cancer.gov • NCI Center for Bioinformatics • Bioinformatics Core Infrastructure (caCORE), including metadata repository (caDSR) and object models built using EVS terminology for core semantics

  20. NCI EVS Goal – Integration by Meaning • Clinical, translational, and basic research terminologies have overlapping but specialized needs, so EVS helps to: • Integrate different conceptual frameworks • Create terminological and taxonomic conventions across systems • Vocabulary Products • NCI Thesaurus – an ontology-like terminology • NCI Metathesaurus – maps vocabularies • External vocabularies maintained and served: MedDRA, HL7, NDF-RT, LOINC, etc.

  21. Terminology Development Guidelines • Develop a content model • Leverage existing sources where appropriate • (VA NDF-RT, RxNorm, LOINC, etc.) • Develop unique content where needed • (Cancer genes and diagnoses, drugs and therapies, molecular abnormalities, clinical trial standard terminology, etc.) • Link to other information sources and standards using URLs where possible • (GO, Swissprot, drug formularies, trial protocols) • Federate, merge or map with other standard terminology for semantic integration

  22. NCI Thesaurus (NCIt) • Reference Terminology for NCI, Partners • A Federal Standard Terminology • Broad coverage of the cancer research and clinical domain including prevention and treatment trials • Neoplastic and other Diseases • Findings and Abnormalities • Anatomy, Tissues, Subcellular Structures • Agents, Drugs, Chemicals • Genes, Gene Products, Biological Processes • Animal Models – Mouse, other • Research techniques and management, apparatus, clinical and lab, radiology, imagery

  23. NCI Thesaurus (2) • Published Monthly • Public domain, open content license • Available on-line and by download (OWL, Ontylog XML, flat files) • 48,000+ “Concepts” hierarchically organized • Description-logic based • “Roles” establish machine readable semantic relationships between Concepts, e.g.: “Carcinoma” Clinically_associated_with “Lytic Bone Lesions,” “TP53” Gene_associated_with_Disease “Breast Carcinoma”
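One way to picture machine-readable roles is as (source concept, role, target concept) triples that software can query. The sketch below uses the two role examples from this slide; the triple layout and the helper names `targets` and `sources` are assumptions for illustration, not the NCI Thesaurus data format or the caCORE API.

```python
# Role assertions as (source concept, role name, target concept) triples.
# The two triples are the examples quoted on the slide; the storage
# layout itself is an assumption made for this sketch.
ROLES = [
    ("Carcinoma", "Clinically_associated_with", "Lytic Bone Lesions"),
    ("TP53", "Gene_associated_with_Disease", "Breast Carcinoma"),
]

def targets(concept, role, roles=ROLES):
    """Concepts reached from `concept` by following `role` forward."""
    return [t for s, r, t in roles if s == concept and r == role]

def sources(concept, role, roles=ROLES):
    """Concepts that point at `concept` through `role`."""
    return [s for s, r, t in roles if t == concept and r == role]
```

For example, `sources("Breast Carcinoma", "Gene_associated_with_Disease")` answers "which genes are asserted to be associated with this disease?" without any text matching.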

  24. NCI Thesaurus is Deployed: http://nciterms.nci.nih.gov http://www.nci.nih.gov/EVS (full documentation) • API: caCORE public access • Fulfills NCI and collaborators’ needs for controlled vocabulary • Public domain, open content license

  25. Example Concept Details
      URI: http://nciterms.nci.nih.gov:80/NCIBrowser/ConceptReport.jsp?dictionary=NCI_Thesaurus&code=C19151
      Version: August 2005 (05.09e)
      Concept: Metastasis
      Identifiers:
        name: Metastasis
        code: C19151
      Relationships to other concepts:
        Biological_Process_Has_Result_Biological_Process: Tumor Expansion
        Biological_Process_Has_Initiator_Process: Pathologic Process
      Information about this concept:
        Synonym: MET
        Synonym: metastasis
        Synonym: Tumor Cell Migration
        Synonym with source data: Metastasis|PT|CADSR
        Synonym with source data: MET|AB|CADSR
        Synonym with source data: Tumor Cell Migration|SY|NCI
        Synonym with source data: Metastasis|PT|NCI
        Synonym with source data: metastasis|SY|NCI-GLOSS|CDR0000046710
        NCI_META_CUI: CL001192
        Semantic_Type: Phenomenon or Process
        Related_Lash_Concept: metastasis
        Preferred_Name: Metastasis
        DEFINITION: NCI|Metastasis is the spread or migration of cancer cells from one part of the body (the organ in which it first appeared) to another. The secondary tumor contains cells that are like those in the original (primary) tumor. For example, breast cancer cells may spread (metastasize) to the lungs and cause the growth of a new tumor. When this happens, the disease is called metastatic breast cancer. (NCI)
        Synonym: Metastasis
        DEFINITION: NCI-GLOSS|(meh-TAS-ta-sis) The spread of cancer from one part of the body to another. A tumor formed from cells that have spread is called a secondary tumor, a metastatic tumor, or a metastasis. The secondary tumor contains cells that are like those in the original (primary) tumor. The plural form of metastasis is metastases (meh-TAS-ta-seez).
      Superconcepts: Cancer Progression
      Subconcepts: Distant Metastasis, Intravascular Metastasis
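The "Synonym with source data" values on this slide look like pipe-delimited records: a term, a short term-type tag, a source, and sometimes a fourth field. A minimal parsing sketch follows; the field names (`term`, `term_type`, `source`, `source_code`) are guesses based on the examples shown, not a documented schema.

```python
from typing import NamedTuple, Optional

class SourcedSynonym(NamedTuple):
    term: str
    term_type: str              # tag as shown on the slide, e.g. PT, AB, SY
    source: str                 # e.g. NCI, CADSR, NCI-GLOSS
    source_code: Optional[str]  # optional fourth field, if present

def parse_sourced_synonym(raw: str) -> SourcedSynonym:
    """Split 'term|type|source[|code]' into named fields."""
    parts = raw.split("|")
    if len(parts) not in (3, 4):
        raise ValueError(f"expected 3 or 4 pipe-delimited fields: {raw!r}")
    term, term_type, source = parts[:3]
    source_code = parts[3] if len(parts) == 4 else None
    return SourcedSynonym(term, term_type, source, source_code)
```

Applied to `metastasis|SY|NCI-GLOSS|CDR0000046710`, the parser yields the term, its tag, the contributing source, and the extra identifier as separate fields, which makes it easy to filter synonyms by source.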

  26. Other Examples: • Use URI to view details of a drug concept: http://nciterms.nci.nih.gov:80/NCIBrowser/ConceptReport.jsp?dictionary=NCI_Thesaurus&code=C620 • Use GUI to search for and view hierarchy: http://nciterms.nci.nih.gov [Example shown: Fluvastatin Sodium]
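The concept-report links on slides 25 and 26 follow a visible pattern: a fixed base address plus `dictionary` and `code` query parameters. The helper below rebuilds that pattern; the function name is hypothetical, and the address is the one shown in this 2005 presentation, so it may no longer resolve.

```python
from urllib.parse import urlencode

# Base address exactly as it appears on the slides.
CONCEPT_REPORT_BASE = "http://nciterms.nci.nih.gov:80/NCIBrowser/ConceptReport.jsp"

def concept_report_url(code: str, dictionary: str = "NCI_Thesaurus") -> str:
    """Build a concept-report link from a concept code."""
    query = urlencode({"dictionary": dictionary, "code": code})
    return f"{CONCEPT_REPORT_BASE}?{query}"
```

Calling `concept_report_url("C19151")` reproduces the Metastasis link from slide 25, and `concept_report_url("C620")` reproduces the drug-concept link above.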

  27. NCI Metathesaurus: • Filtered UMLS Metathesaurus extended with additional required vocabularies • 930,000+ concepts, 2,200,000 terms and phrases with definitions • Mappings among over 50 vocabularies • Extensive synonymy: Over 40,000 terms for neoplasms mapped to 7,000 concepts • Used as online dictionary and thesaurus, for mapping and document indexing
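Cross-vocabulary mapping through a metathesaurus can be sketched as grouping terms from different sources under one concept identifier, then translating a term by looking up its concept and listing that concept's terms in the target source. The rows below reuse the terms, sources, and identifier shown on slide 25; treating them as rows keyed by that identifier, and the row layout itself, are assumptions and not the actual NCI Metathesaurus file format.

```python
from collections import defaultdict

# (concept id, source vocabulary, term) rows, assembled from slide 25.
ROWS = [
    ("CL001192", "NCI", "Metastasis"),
    ("CL001192", "NCI", "Tumor Cell Migration"),
    ("CL001192", "CADSR", "MET"),
    ("CL001192", "NCI-GLOSS", "metastasis"),
]

def build_indexes(rows):
    """Index rows by concept id and by (source, lower-cased term)."""
    by_concept = defaultdict(list)
    by_term = {}
    for concept_id, source, term in rows:
        by_concept[concept_id].append((source, term))
        by_term[(source, term.lower())] = concept_id
    return by_concept, by_term

def map_term(term, from_source, to_source, by_concept, by_term):
    """Terms in `to_source` that share a concept with `term` in `from_source`."""
    concept_id = by_term.get((from_source, term.lower()))
    if concept_id is None:
        return []
    return [t for s, t in by_concept[concept_id] if s == to_source]

BY_CONCEPT, BY_TERM = build_indexes(ROWS)
```

With these rows, `map_term("MET", "CADSR", "NCI", BY_CONCEPT, BY_TERM)` returns the NCI terms that share the concept, which is the same mechanism that lets many neoplasm terms collapse onto far fewer concepts.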

  28. NCI Metathesaurus (2) • Minor releases monthly, major releases twice a year • Provides a mapped overlap and partial inter-relation of current versions of NCI and partner required vocabularies, e.g. the ICDs, MedDRA, SNOMED, MeSH (NLM Medical Subject Headings), HCPCS (procedures), LOINC (lab values), drug terminologies (VA NDF-RT, AOD, RxNorm, Multum, NCI Thesaurus drugs, etc.)

  29. EVS Products & Services Are Open • NCI Thesaurus is Open Content: ftp://ftp1.nci.nih.gov/pub/cacore/EVS/ThesaurusTermsofUse.htm • NCI Metathesaurus is Mostly Open Source; see each source’s license: http://ncimeta.nci.nih.gov/MetaServlet/GenerateSourcesServlet • NCI EVS Servers Are Freely Accessible • On the Web: http://nciterms.nci.nih.gov and http://ncimeta.nci.nih.gov • Via API: http://ncicb.nci.nih.gov/core/caBIO • All Software Developed by NCI EVS is Public Open Source and Free for the Asking: http://ncicb.nci.nih.gov/core

  30. EVS Collaborations • Many Active Collaborations • Federal: FDA, VA, CDC, and Various NIH Institutes such as NHLBI, NIDCR • Major Standards Organizations: HL7, CDISC, W3C, FHA • Cancer Centers and Cancer Cooperative Groups (caBIG, caGRID) • Numerous Research collaborators such as the Microarray Gene Expression Data Society (MGED Ontology, FuGO)

  31. Areas of Collaboration • FDA (Terminology for Drugs, Devices, and Clinical Trial Terminology Initiatives) • VA (Drugs, Common Clinical Trials Semantics, Terminology Operations) • CDC (Cancer Incidence and Prevention, Terminology Operations) • Cancer Centers (Clinical Trials, Experimental Organism Terminology, Micronutrients, Open Terminology Servers, other (caBIG)) • CDISC/HL7 RCRIM (Clinical Research Data Standards)

  32. Contact: Frank Hartel, PhD • NCI Center for Bioinformatics • hartel@mail.nih.gov
