NCI Enterprise Vocabulary Services (EVS) and Semantic Integration at NCI - An Overview - Frank Hartel, PhD Enterprise Vocabulary Services National Cancer Institute
Outline: • Terminology management and semantic integration at NCI • NCI Enterprise Vocabulary Services • NCI Thesaurus (NCIt) • NCI Metathesaurus (NCI Meta) • Collaborations
NCI biomedical informatics • Goal: A virtual web of interconnected data, individuals, and organizations redefines how research is conducted, care is provided, and patients/participants interact with the biomedical research enterprise
Interoperability (courtesy: Charlie Mead) • in·ter·op·er·a·bil·i·ty: ability of a system...to use the parts or equipment of another system (Source: Merriam-Webster web site) • interoperability: ability of two or more systems or components to exchange information and to use the information that has been exchanged (Source: IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries, IEEE, 1990) • Syntactic interoperability • Semantic interoperability
No Controlled Terminology? No Interoperability • Systems cannot exchange or use information if they use incompatible codes or tokens to signify meaning • Terminology services provide tokens and codes • Proper use of them assures consistent meaning across the enterprise
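The point about incompatible codes can be illustrated with a minimal sketch. It assumes two hypothetical systems whose local codes are invented for illustration; only the shared concept code C19151 (Metastasis) comes from the Concept Details slide in this deck. The function names and tables are not part of any EVS API.

```python
# Hypothetical local code tables for two systems. The local codes on the
# left are invented; "C19151" is the NCI Thesaurus code for Metastasis
# quoted elsewhere in this deck.
SYSTEM_A_TO_CONCEPT = {"MET": "C19151"}
SYSTEM_B_TO_CONCEPT = {"tumor-cell-migration": "C19151"}


def same_meaning(code_a: str, code_b: str) -> bool:
    """Return True when both local codes resolve to the same shared concept."""
    concept_a = SYSTEM_A_TO_CONCEPT.get(code_a)
    concept_b = SYSTEM_B_TO_CONCEPT.get(code_b)
    # An unmapped code has no agreed meaning, so it never matches.
    return concept_a is not None and concept_a == concept_b
```

Without the shared concept code, the two local strings have nothing in common and the systems cannot tell that they mean the same thing.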
Can it be done? caCORE - An Example • Public APIs • Domain object metadata • Common data elements (CDEs) • Dictionary, thesaurus, ontology services via caBIO API • Vocabulary for CDE specification via downloads
cancer Common Ontologic Representation Environment (caCORE) • Information integration • Cross-discipline reasoning • Components: biomedical objects, common data elements, controlled vocabulary
Common Data Elements • Structured data reporting elements • Precisely defining the questions and answers • What question are you asking, exactly? • What are the possible answers, and what do they mean?
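As a rough illustration of "the question, the possible answers, and what they mean", a common data element can be sketched as a question paired with a closed set of permissible values. The class name, field names, and the example question and values below are all hypothetical; they are not taken from the caDSR.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CommonDataElement:
    """A question plus its permissible answers and their meanings (hypothetical model)."""

    question: str
    permissible_values: Dict[str, str]  # answer value -> what the value means

    def validate(self, answer: str) -> str:
        """Return the meaning of a permissible answer, or raise ValueError."""
        if answer not in self.permissible_values:
            allowed = ", ".join(sorted(self.permissible_values))
            raise ValueError(f"{answer!r} is not permitted; expected one of: {allowed}")
        return self.permissible_values[answer]


# Invented example element, for illustration only.
METASTASIS_PRESENT = CommonDataElement(
    question="Has the tumor metastasized?",
    permissible_values={"Y": "Metastasis present", "N": "No metastasis", "U": "Unknown"},
)
```

Calling `METASTASIS_PRESENT.validate("Y")` returns the meaning of the answer, while a free-text answer such as `"maybe"` is rejected.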
Biomedical Information Objects • Data service infrastructure developed using OMG’s Model Driven Architecture approach • Object models expressed in UML represent actual biomedical research entities such as genes, sequences, chromosomes, cellular pathways, ontologies, clinical protocols, etc. • The object models form the basis for uniform APIs (Java, SOAP, HTTP-XML, Perl) that provide an abstraction layer and interfaces for developers to access information without worrying about the back-end data stores
Binding Data, Metadata to Terminology - caCORE SDK • UML Modeling Tool (provided by user) • Information model that will define data classes, attributes and relationships • Semantic Connector • Annotate UML model with ontology concepts: bridges the world of databases to that of structured semantics. • UML Loader (run by NCI staff) • Loads model into the caDSR metadata registry • Model and associated semantics are available at runtime • Code Generator • Model and a code template are inputs into generator • Creates the ‘caCORE-like’ n-tier software system with Java and Web Services APIs
Extending Interoperability Beyond the Enterprise • cancer Biomedical Informatics Grid (caBIG) • Common, widely distributed infrastructure permits cancer research community to focus on innovation • Shared vocabulary, data elements, data models facilitate information exchange • Collection of interoperable applications developed to common standard • Raw cancer research data is available for mining and integration
caBIG - facilitate sharing of infrastructure, applications, and data
[Figure: caGrid connecting NCI, Cancer Centers, other caBIG service providers, and other toolkits]
caGrid Service-Oriented Architecture (Globus, OGSA compliant) • [Figure: layered diagram of grid functions and the components that provide them. Function labels: Quality of Service, Semantic Service, ID Resolution, Business Process, Workflow, Security, Resource Management, Service Registry, Service Description, Grid Communication Protocol, Transport. Component labels: caCORE, Globus, DQP, OGSA-DAI, GRAM, myProxy, Globus Toolkit, GSI, CAS, Mobius]
Enterprise Vocabulary • NCI Metathesaurus (Cross-map standard vocabularies/ontologies, e.g. SNOMED, MedDRA, ICD) • Semantic integration, inter-vocabulary mapping • UMLS Metathesaurus extended with cancer-oriented vocabularies • 930,000 Concepts, 2,200,000 terms and phrases • Mappings among over 50 vocabularies • NCI Thesaurus • Description logic-based • 48,000 “Concepts” • Concept is the semantic unit • Terms are Concept labels – synonymy • Semantic relationships between Concepts • Other standard terminologies • MedDRA, MGED, SNOMED, GO, etc.
Enterprise Vocabulary Services • Services and resources that address NCI's needs for controlled vocabulary http://www.nci.nih.gov/EVS • A collaboration • NCI Office of Communications • Physician Data Query (PDQ), Cancer Information Service and the NCI web portal www.cancer.gov • NCI Center for Bioinformatics • Bioinformatics Core Infrastructure (caCORE), including metadata repository (caDSR) and object models built using EVS terminology for core semantics
NCI EVS Goal – Integration by Meaning • Clinical, translational, and basic research terminologies have overlapping but specialized needs, so EVS helps to: • Integrate different conceptual frameworks • Create terminological and taxonomic conventions across systems • Vocabulary Products • NCI Thesaurus – an ontology-like terminology • NCI Metathesaurus – maps vocabularies • External vocabularies maintained and served: MedDRA, HL7, NDF-RT, LOINC, etc.
Terminology Development Guidelines • Develop a content model • Leverage existing sources where appropriate • (VA NDF-RT, RxNorm, LOINC, etc.) • Develop unique content where needed • (Cancer genes and diagnoses, drugs and therapies, molecular abnormalities, clinical trial standard terminology, etc.) • Link to other information sources and standards using URLs where possible • (GO, Swissprot, drug formularies, trial protocols) • Federate, merge or map with other standard terminology for semantic integration
NCI Thesaurus (NCIt) • Reference Terminology for NCI, Partners • A Federal Standard Terminology • Broad coverage of the cancer research and clinical domain including prevention and treatment trials • Neoplastic and other Diseases • Findings and Abnormalities • Anatomy, Tissues, Subcellular Structures • Agents, Drugs, Chemicals • Genes, Gene Products, Biological Processes • Animal Models – Mouse, other • Research techniques and management, apparatus, clinical and lab, radiology, imagery
NCI Thesaurus (2) • Published Monthly • Public domain, open content license • Available on-line and by download (OWL, Ontylog XML, flat files) • 48,000+ “Concepts” hierarchically organized • Description-logic based • “Roles” establish machine-readable semantic relationships between Concepts, e.g. “Carcinoma” Clinically_associated_with “Lytic Bone Lesions” and “TP53” Gene_associated_with_Disease “Breast Carcinoma”
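One simple way to picture machine-readable roles is as (source concept, role, target concept) triples. The sketch below uses the two role examples from this slide; the triple layout and the helper functions are assumptions for illustration and do not reflect how NCIt is stored or served.

```python
from collections import defaultdict

# The two role assertions quoted on the slide, written as triples.
ROLES = [
    ("Carcinoma", "Clinically_associated_with", "Lytic Bone Lesions"),
    ("TP53", "Gene_associated_with_Disease", "Breast Carcinoma"),
]


def index_roles(triples):
    """Group role targets by (source concept, role name) for direct lookup."""
    index = defaultdict(list)
    for source, role, target in triples:
        index[(source, role)].append(target)
    return index


def targets(index, source, role):
    """Return the concepts that `source` points to through `role` (empty if none)."""
    return list(index.get((source, role), []))
```

With the index built once, a program can ask which diseases a gene is associated with, e.g. `targets(index_roles(ROLES), "TP53", "Gene_associated_with_Disease")`.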
NCI Thesaurus is Deployed: http://nciterms.nci.nih.gov http://www.nci.nih.gov/EVS (full documentation) • API: caCORE public access • Fulfills NCI and collaborators’ needs for controlled vocabulary • Public domain, open content license
Example Concept Details
URI: http://nciterms.nci.nih.gov:80/NCIBrowser/ConceptReport.jsp?dictionary=NCI_Thesaurus&code=C19151
Version: August 2005 (05.09e)
Concept: Metastasis
Identifiers:
• name: Metastasis
• code: C19151
Relationships to other concepts:
• Biological_Process_Has_Result_Biological_Process: Tumor Expansion
• Biological_Process_Has_Initiator_Process: Pathologic Process
Information about this concept:
• Synonym: MET
• Synonym: metastasis
• Synonym: Tumor Cell Migration
• Synonym with source data: Metastasis|PT|CADSR
• Synonym with source data: MET|AB|CADSR
• Synonym with source data: Tumor Cell Migration|SY|NCI
• Synonym with source data: Metastasis|PT|NCI
• Synonym with source data: metastasis|SY|NCI-GLOSS|CDR0000046710
• NCI_META_CUI: CL001192
• Semantic_Type: Phenomenon or Process
• Related_Lash_Concept: metastasis
• Preferred_Name: Metastasis
• DEFINITION: NCI|Metastasis is the spread or migration of cancer cells from one part of the body (the organ in which it first appeared) to another. The secondary tumor contains cells that are like those in the original (primary) tumor. For example, breast cancer cells may spread (metastasize) to the lungs and cause the growth of a new tumor. When this happens, the disease is called metastatic breast cancer. (NCI)
• Synonym: Metastasis
• DEFINITION: NCI-GLOSS|(meh-TAS-ta-sis) The spread of cancer from one part of the body to another. A tumor formed from cells that have spread is called a secondary tumor, a metastatic tumor, or a metastasis. The secondary tumor contains cells that are like those in the original (primary) tumor. The plural form of metastasis is metastases (meh-TAS-ta-seez).
Superconcepts: Cancer Progression
Subconcepts: Distant Metastasis, Intravascular Metastasis
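The concept report URI and the pipe-delimited "Synonym with source data" entries above both follow a visible pattern, which the following sketch reproduces. It only builds and parses strings and makes no network call; the URL dates from the 2005 deployment and may no longer resolve. The field names (`term_type`, `source_code`) are guesses at what the columns mean, not documented EVS names.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlencode

# Base address copied from the example URI on this slide.
CONCEPT_REPORT_BASE = "http://nciterms.nci.nih.gov:80/NCIBrowser/ConceptReport.jsp"


def concept_report_uri(code: str, dictionary: str = "NCI_Thesaurus") -> str:
    """Build a concept report URI in the same shape as the slide's example."""
    return CONCEPT_REPORT_BASE + "?" + urlencode({"dictionary": dictionary, "code": code})


class SourcedSynonym(NamedTuple):
    term: str
    term_type: str              # e.g. PT, SY, AB (interpretation assumed)
    source: str                 # e.g. NCI, CADSR, NCI-GLOSS
    source_code: Optional[str]  # present only on some entries


def parse_sourced_synonym(entry: str) -> SourcedSynonym:
    """Split an entry such as 'Metastasis|PT|CADSR' into its fields."""
    parts = entry.split("|")
    if len(parts) not in (3, 4):
        raise ValueError(f"expected 3 or 4 pipe-separated fields, got {len(parts)}")
    term, term_type, source = parts[:3]
    source_code = parts[3] if len(parts) == 4 else None
    return SourcedSynonym(term, term_type, source, source_code)
```

For example, `concept_report_uri("C19151")` reproduces the URI shown for Metastasis, and `parse_sourced_synonym("metastasis|SY|NCI-GLOSS|CDR0000046710")` keeps the trailing source identifier as `source_code`.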
Other Examples: • Use URI to view details of a drug concept: http://nciterms.nci.nih.gov:80/NCIBrowser/ConceptReport.jsp?dictionary=NCI_Thesaurus&code=C620 • Use GUI to search for and view hierarchy: http://nciterms.nci.nih.gov (example shown: Fluvastatin Sodium)
NCI Metathesaurus: • Filtered UMLS Metathesaurus extended with additional required vocabularies • 930,000+ concepts, 2,200,000 terms and phrases with definitions • Mappings among over 50 vocabularies • Extensive synonymy: Over 40,000 terms for neoplasms mapped to 7,000 concepts • Used as online dictionary and thesaurus, for mapping and document indexing
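Synonymy is what makes document indexing possible: many surface terms resolve to one concept. The sketch below is a toy illustration under stated assumptions. It uses only two synonyms of Metastasis (C19151) from the Concept Details slide, matches whole words case-insensitively, and does not represent how NCI Metathesaurus indexing actually works.

```python
import re

# Tiny synonym table drawn from the Metastasis example in this deck.
SYNONYMS = {
    "C19151": ["Metastasis", "Tumor Cell Migration"],
}


def build_term_index(synonyms):
    """Invert concept -> terms into lowercased term -> concept code."""
    return {term.lower(): code for code, terms in synonyms.items() for term in terms}


def index_document(text, term_index):
    """Return the set of concept codes whose terms occur in `text` as whole words."""
    found = set()
    for term, code in term_index.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(code)
    return found
```

Two documents that use different wording ("metastasis" versus "tumor cell migration") end up indexed under the same concept code, which is the practical benefit of mapping terms to concepts.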
NCI Metathesaurus (2) • Minor releases monthly, major releases twice a year • Provides a mapped overlap and partial inter-relation of current versions of the vocabularies required by NCI and its partners, e.g. the ICDs, MedDRA, SNOMED, MeSH (NLM Medical Subject Headings), HCPCS (procedures), LOINC (lab values), drug terminologies (VA NDF-RT, AOD, RxNORM, Multum, NCI Thesaurus drugs, etc.)
EVS Products & Services Are Open • NCI Thesaurus is Open Content: ftp://ftp1.nci.nih.gov/pub/cacore/EVS/ThesaurusTermsofUse.htm • NCI Metathesaurus is Mostly Open Source (See Each Source’s License): http://ncimeta.nci.nih.gov/MetaServlet/GenerateSourcesServlet • NCI EVS Servers Are Freely Accessible • On the Web: http://nciterms.nci.nih.gov and http://ncimeta.nci.nih.gov • Via API: http://ncicb.nci.nih.gov/core/caBIO • All Software Developed by NCI EVS is Public Open Source and Free for the Asking: http://ncicb.nci.nih.gov/core
EVS Collaborations • Many Active Collaborations • Federal: FDA, VA, CDC, and Various NIH Institutes such as NHLBI, NIDCR • Major Standards Organizations: HL7, CDISC, W3C, FHA • Cancer Centers and Cancer Cooperative Groups (caBIG, caGRID) • Numerous Research collaborators such as the Microarray Gene Expression Data Society (MGED Ontology, FuGO)
Areas of Collaboration • FDA (Terminology for Drugs, Devices, and Clinical Trial Terminology Initiatives) • VA (Drugs, Common Clinical Trials Semantics, Terminology Operations) • CDC (Cancer Incidence and Prevention, Terminology Operations) • Cancer Centers (Clinical Trials, Experimental Organism Terminology, Micronutrients, Open Terminology Servers, other (caBIG)) • CDISC/HL7 RCRIM (Clinical Research Data Standards)
Contact: Frank Hartel, PhD, NCI Center for Bioinformatics, hartel@mail.nih.gov