The NCI Thesaurus: A Controlled Vocabulary Of NCI Functions • Gilberto Fragoso • Center for Bioinformatics • National Cancer Institute • National Institutes of Health (U.S.)
Overview • Center for Bioinformatics (NCICB) • Mission • Supported NCI Activities • http://ncicb.nci.nih.gov • NCI Thesaurus • Challenges & Issues
NCI Center for Bioinformatics • Support NCI Bioinformatics/Research Activities • Cancer Genome Anatomy Project (CGAP) • Clinical Trials • Cancer Molecular Analysis Project (CMAP) • Molecular Analysis of Cancer (Director's Challenge) • Mouse Models of Human Cancer Consortium (MMHCC) • Cancer Core Infrastructure (caCORE) • Cancer Data Standards Repository (caDSR) • Cancer Bioinformatics Infrastructure Objects (caBIO) • Enterprise Vocabulary Services (EVS)
NCI Center for Bioinformatics caBIO • Model of cancer research domain • Sample caBIO classes • Gene • Protein • Sequence • SNP • Chromosome • Clone • Library • Taxon • Agent • Pathway • Tissue • Organ • Disease • ClinicalTrialProtocol • Applicable to other biomedical domains • Released as part of caCORE version 1 • http://ncicb.nci.nih.gov, follow Infrastructure
NCI Center for Bioinformatics Enterprise Vocabulary Services • Collection of services and resources that address NCI's needs for controlled vocabulary • Main vocabulary products • NCI Thesaurus • NCI Metathesaurus • http://ncimeta.nci.nih.gov • Specialty vocabularies • MMHCC (Mouse Models of Human Cancer Consortium) • CTRM (Core Terminology Reference Model)
NCI Thesaurus • Reference biomedical vocabulary for the NCI • Contains all the codes, keywords, and special purpose vocabulary used in the Institute • Features • NCI-only vocabulary sources • No licensing restrictions • Description logic-based • Consistency checks • Auto-classification • Primary purpose is to support coding and retrieval applications
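The auto-classification idea on this slide can be illustrated with a toy example. The sketch below is not the classifier used for the NCI Thesaurus; it is a simplified structural check, assuming that a "defined" concept subsumes any concept that has all of its stated parents and all of its stated roles. The concept names, the `CONCEPTS` table, and the helper functions are hypothetical; only the role name comes from the Disease Roles slide.

```python
from typing import Dict, Set

# Toy data: concept names and structure are illustrative assumptions.
CONCEPTS: Dict[str, dict] = {
    "Neoplasm": {"defined": False, "parents": set(), "roles": set()},
    "Lung": {"defined": False, "parents": set(), "roles": set()},
    "Lung Neoplasm": {
        "defined": True,
        "parents": {"Neoplasm"},
        "roles": {("Disease has Associated Anatomy", "Lung")},
    },
    "Example Tumor": {
        "defined": False,  # primitive: asserted by an editor, never inferred as a parent
        "parents": {"Neoplasm"},
        "roles": {("Disease has Associated Anatomy", "Lung")},
    },
}


def ancestors(name: str, concepts: Dict[str, dict]) -> Set[str]:
    """Collect every concept reachable by following parent links upward."""
    seen: Set[str] = set()
    stack = list(concepts[name]["parents"])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(concepts[parent]["parents"])
    return seen


def all_roles(name: str, concepts: Dict[str, dict]) -> Set[tuple]:
    """Roles stated on the concept plus roles inherited from its ancestors."""
    roles = set(concepts[name]["roles"])
    for ancestor in ancestors(name, concepts):
        roles |= concepts[ancestor]["roles"]
    return roles


def inferred_parents(name: str, concepts: Dict[str, dict]) -> Set[str]:
    """Defined concepts whose stated parents and roles are all satisfied by `name`."""
    lineage = ancestors(name, concepts) | {name}
    roles = all_roles(name, concepts)
    found: Set[str] = set()
    for other, record in concepts.items():
        if other in lineage or not record["defined"]:
            continue
        if record["parents"] <= lineage and record["roles"] <= roles:
            found.add(other)
    return found


print(sorted(inferred_parents("Example Tumor", CONCEPTS)))  # ['Lung Neoplasm']
```

In this toy setup the editor only asserts "Neoplasm" as a parent of "Example Tumor"; the more specific parent is found from the role assertion. A real description logic classifier also reasons over role values and detects inconsistencies, which this sketch leaves out.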
NCI Thesaurus • Freely available • Online access via browser application and API • API extensions provided by caBIO • Flat file format • Terminology and hierarchies only • FTP site will be advertised on our Web site • DAML+OIL (Future)
Elements of the NCI Thesaurus
Concepts • Identifiers (Name, Code, ID) • Kind (Group) • Defined / Primitive tag • Parent Concept • Roles • Properties
Properties • Preferred Name • Synonyms • CUI (UMLS) • Semantic Type • Definition • External DB ID (OMIM, LocusID, GenBank, SwissProt) • Comments (Editor's Notes, Design Notes)
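The elements listed above map naturally onto a small record type. The following is a minimal sketch, assuming an in-memory representation; the `Concept` class, its field names, and the codes used in the example are hypothetical and do not reflect the actual EVS or caBIO API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Concept:
    """Hypothetical in-memory record for one thesaurus concept."""
    name: str
    code: str
    concept_id: int
    kind: str
    defined: bool = False  # False means the concept is tagged "primitive"
    parents: List[str] = field(default_factory=list)  # parent concept codes
    roles: List[Tuple[str, str]] = field(default_factory=list)  # (role name, target code)
    properties: Dict[str, List[str]] = field(default_factory=dict)

    def add_property(self, prop: str, value: str) -> None:
        """Properties may repeat (for example, several synonyms)."""
        self.properties.setdefault(prop, []).append(value)

    def synonyms(self) -> List[str]:
        return self.properties.get("Synonym", [])


# All codes and names below are made up for illustration.
gene = Concept(name="Example Gene", code="X0001", concept_id=1, kind="Gene",
               parents=["X0000"])
gene.add_property("Preferred Name", "Example Gene")
gene.add_property("Synonym", "EXG")
gene.add_property("Synonym", "Example gene 1")
gene.roles.append(("Gene Found in Organism", "X0100"))

print(gene.synonyms())  # ['EXG', 'Example gene 1']
```

Keeping roles (links to other concepts) separate from properties (literal values attached to one concept) mirrors the distinction the slide draws between the two.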
"Kinds" in the NCI Thesaurus • Organism • Anatomy • Clinical Or Research Activity • Chemicals And Drugs • Findings And Disorders • Cancer Science • Occupation Or Discipline • Protein • Diagnostic And Prognostic Factors • Properties Or Attributes • Gene • NCI • Biological Process • Equipment • Technique
Top Level Concepts in Various Kinds
NCI Kind • Business Rules • Conceptual Entities • Funding • Patient or Public Education • Social Concepts • Training and Education
Anatomy Kind • Anatomic Structures and Systems • Anatomic Sites • Anatomic Structure • Body Part, Organ, or Tissue • Body Region • Cell Structure • Embryonic Structures • Extracellular Structure • Macromolecular Structure • Miscellaneous Anatomy Terms • Organ Systems
Top Level Concepts in Various Kinds
Chemicals And Drugs Kind • Drugs and Chemicals • Drugs and Chemicals, Structural Classification • Inorganic Chemicals • Organic Chemicals • Drugs and Chemicals, Functional Classification • Chemical Modifiers • Drug of Abuse • Foods and Food Products • Immunologics • Industrial Products • Pharmacologic Substances • Physiology - Regulatory Factors • Reagents
Top Level Concepts in Various Kinds
Gene Kind • Genes • Cancer Gene • Oncogene • Oncogene, G-Protein • Oncogene, Growth Factor • Oncogene, Transcription Factors • Proto-Oncogene • Susceptibility / Resistance Gene • Tumor Promoter Induced Gene • Tumor Suppressor Gene • Candidate Disease Gene • Reporter Gene
Top Level Concepts in Various Kinds
Findings And Disorders Kind • Diseases, Disorders, and Findings • Findings • Diseases and Disorders • Familial Neoplastic Syndrome • Lymphoproliferative Disorder • Molecular Disease • Neoplasm • Neoplasm by Morphology • Neoplasm by Site • Neoplasm by Special Category • Neoplasm by Disease NEC • Non-Neoplastic Disease, Syndrome, or Condition • Precancerous Condition
Top Level Concepts in Various Kinds
Biological Process Kind • Biological Processes • Cell Processes • Intercellular Processes • Metabolic Processes • Organismal Processes • Pathologic Processes • Physiologic Processes • Population Processes • Viral Functions and Activities
"Roles" in the NCI Thesaurus • Semantic relations between concepts are expressed via roles • We define a small number of roles for the various kinds • Support classification • Support specific use cases • Reproducible usage by domain experts • A concept is tagged as "defined" when a specific set of roles is asserted
Roles in the Anatomy Kind
Roles • Anatomic Structure is Physical Part of • Anatomic Structure Has Location
Kinds shown in the diagram • Anatomy • Exp Organism
Disease Roles
Roles • Disease has Associated Anatomy • Disease has Modifier • Disease has Associated Cell Type • Disease Metastatic to Site • Disease has Associated Exp Org Anatomy
Kinds shown in the diagram • Findings And Disorders • Properties Or Attributes • Anatomy • Exp Org
Roles in the Chemicals & Drugs Kind
Roles • CD has Biochemical Class or Structure • CD has Target Anatomy • CD is Part of CD • CD FDA Approved for Disease • CD has Target Protein • CD has Target Organism • CD Plays Role in Biological Process • CD has Source
Kinds shown in the diagram • Chemicals And Drugs • Anatomy • Findings And Disorders • Protein • Biological Process • Organism
Roles in the Gene Kind
Roles • Gene Found in Organism • Gene Associated With Disease • Gene in Chromosomal Location • Gene is Biomarker Type • Gene Has Function
Kinds shown in the diagram • Gene • Organism • Findings And Disorders • Anatomy • Diagnostic And Prognostic Factors • Biological Process
Roles in the Protein Kind
Roles • Protein is Physical Part Of • Protein Plays Role in Biological Process • Protein Has Chemical Classification • Protein Has Biochemical Function • Protein Encoded by Gene • Protein Has Organism Source • Protein is Biomarker Type • Protein Has Structural Domain or Motif • Protein Has Associated Anatomy • Protein Expressed in Tissue • Protein Malfunction Associated with Disease
Kinds shown in the diagram • Protein • Biological Process • Gene • Organism • Diagnostic And Prognostic Factors • Anatomy • Findings And Disorders
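Each role on the preceding slides links concepts of one kind to concepts of another kind. A minimal sketch of how such a constraint might be checked is shown below. The `RoleDef` type, the `ROLE_DEFS` table, and `check_role` are hypothetical names, and the two source/target pairings are assumptions read off the role names themselves rather than taken from the Thesaurus.

```python
from typing import Dict, NamedTuple


class RoleDef(NamedTuple):
    """Hypothetical role definition: the kind it starts from and the kind it points to."""
    domain_kind: str
    range_kind: str


# Assumed pairings, inferred from the wording of the role names.
ROLE_DEFS: Dict[str, RoleDef] = {
    "Gene Found in Organism": RoleDef("Gene", "Organism"),
    "Protein Encoded by Gene": RoleDef("Protein", "Gene"),
}


def check_role(role: str, source_kind: str, target_kind: str,
               role_defs: Dict[str, RoleDef] = ROLE_DEFS) -> bool:
    """Return True when a role assertion respects the declared domain and range kinds."""
    definition = role_defs.get(role)
    if definition is None:
        return False  # unknown roles are rejected rather than guessed at
    return (definition.domain_kind == source_kind
            and definition.range_kind == target_kind)


print(check_role("Gene Found in Organism", "Gene", "Organism"))     # True
print(check_role("Gene Found in Organism", "Protein", "Organism"))  # False
```

A check of this sort is one way to keep role usage reproducible across editors, since an assertion that starts from or points to the wrong kind is flagged immediately.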
Issues • Increasing number of non-cancer genes • Hierarchies for multiple organisms • Unified, or one subtype per organism? • Tangled hierarchies
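A tangled hierarchy is one in which a concept has more than one parent, so it can be reached along several paths from the root. The sketch below, assuming a simple child-to-parents mapping, lists the multi-parent concepts and enumerates their paths to the root. The hierarchy shown is hypothetical: the concept names are borrowed from the Gene Kind slide, but their nesting and the "Example Gene" entry are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical child -> parents mapping; the nesting is assumed, not sourced.
PARENTS: Dict[str, List[str]] = {
    "Genes": [],
    "Cancer Gene": ["Genes"],
    "Tumor Suppressor Gene": ["Cancer Gene"],
    "Susceptibility / Resistance Gene": ["Cancer Gene"],
    "Example Gene": ["Tumor Suppressor Gene", "Susceptibility / Resistance Gene"],
}


def tangled(parents: Dict[str, List[str]]) -> List[str]:
    """Concepts with more than one asserted parent."""
    return sorted(c for c, ps in parents.items() if len(ps) > 1)


def paths_to_root(concept: str, parents: Dict[str, List[str]]) -> List[List[str]]:
    """Every path from the concept up to a root (assumes the hierarchy has no cycles)."""
    direct = parents.get(concept, [])
    if not direct:
        return [[concept]]
    return [[concept] + path
            for parent in direct
            for path in paths_to_root(parent, parents)]


print(tangled(PARENTS))  # ['Example Gene']
for path in paths_to_root("Example Gene", PARENTS):
    print(" > ".join(path))
```

Listing the paths makes the cost of tangling visible: the same concept appears in several branches, which matters for browsing, for flat file export, and for the question of whether to keep one unified subtype or one subtype per organism.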
Acknowledgements
NCICB • Frank Hartel • Sherri de Coronado • James Oberthaler • Gilberto Fragoso • Peter Covitz • Ken Buetow
Office of Communications • Larry Wright • Margaret Haber
Contractors • Kevric, Inc • Apelon, Inc