The UMLS Semantic Network Support for semantic integration and reasoning. Workshop UMLS Semantic Network NLM, NIH, Bethesda, 7-8 Apr 2005 Anita Burgun. Overview. Semantic integration Role of the SN Integration of resources Integration of data Reasoning Reasoning with hierarchies
The UMLS Semantic Network: Support for semantic integration and reasoning. Workshop UMLS Semantic Network, NLM, NIH, Bethesda, 7-8 Apr 2005. Anita Burgun
Overview • Semantic integration • Role of the SN • Integration of resources • Integration of data • Reasoning • Reasoning with hierarchies • Reasoning with associative relations • Perspectives • Illustration • Genes, gene products, diseases • Findings, signs, diseases
Semantic integration 1- Role of ontologies
[Figure] Mediation system • Ontologies • Integration • Data Warehouse (DWH): gene instances • Local resources: micro-array data, patient files, ….. • External resources: GenBank, Swiss-Prot, MEDLINE, GOA
Integrating data in the domain of organ failure and transplantation [Figure labels: EfG (transplantation); REIN (end-stage renal failure); dialysis; local information systems; EfG terminology server]
Mapping [Figure labels: onto-term mapping; term-term mapping between terms T1, T2, T3; Metathesaurus; Semantic Network]
Semantic integration 2- Resource Integration
[Figure] Mediation system • Ontologies • Integration • Data Warehouse (DWH): gene instances • Local resources: micro-array data, patient files, ….. • External resources: GenBank, Swiss-Prot, MEDLINE, GOA
Introduction • Increasing need for physicians and biologists to access information on the Internet • Biomedical sources • Scattered • Multiple heterogeneity • Rapid evolution and frequent creation • Integration
[Figure: organism record as it appears in two source formats]
<OrgName>
  <OrgName_name>
    <OrgName_name_binomial>
      <BinomialOrgName>
        <BinomialOrgName_genus>Homo</BinomialOrgName_genus>
        <BinomialOrgName_species>sapiens</BinomialOrgName_species>
      </BinomialOrgName>
    </OrgName_name_binomial>
  </OrgName_name>
  <OrgName_lineage>Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo</OrgName_lineage>
  <OrgName_gcode>1</OrgName_gcode>
  <OrgName_mgcode>2</OrgName_mgcode>
  <OrgName_div>PRI</OrgName_div>
</OrgName>
<ORGANISM>Homo sapiens</ORGANISM>
<TAXONOMY>Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.</TAXONOMY>
Objectives • Overall: creating a system • Global access • Homogeneous and up-to-date information • Specific: acquiring source schemas • As automatically as possible • Dealing with updates, adding new resources • Generating different paths to access information
Source schemas • Rarely available or hard to exploit • No existing standard • Identifying the schema of each source by exploiting its contents • Informs on the type of information present in the source • Extraction from its Web site
Use of UMLS • Heterogeneity of schemas • Need for a common vocabulary: the UMLS • Example: finding the site of expression of a gene starting from a gene symbol
Results • 279 distinct terms extracted from 11 sources • 232 found in the UMLS, corresponding to 495 MTH concepts • 318 were correct • 177 were not • 47 not found • Of the 318 MTH concepts, 60 concepts are common to at least 2 distinct extracted terms (158 are specific)
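The counts on the Results slide can be cross-checked with a few lines of arithmetic. This is only a sketch: the ratio names (`term_coverage`, `concept_precision`, `concepts_per_term`) are labels chosen here and do not appear in the slides.

```python
# Counts copied from the Results slide.
extracted_terms = 279
terms_found = 232
terms_not_found = 47
concepts_total = 495
concepts_correct = 318
concepts_incorrect = 177

# The two partitions stated on the slide are internally consistent.
assert terms_found + terms_not_found == extracted_terms
assert concepts_correct + concepts_incorrect == concepts_total

# Derived ratios (names are ours, not the author's).
term_coverage = terms_found / extracted_terms          # share of terms found in the UMLS
concept_precision = concepts_correct / concepts_total  # share of retrieved concepts judged correct
concepts_per_term = concepts_total / terms_found       # average number of concepts per found term

print(f"terms found in UMLS: {term_coverage:.1%}")
print(f"correct concepts:    {concept_precision:.1%}")
print(f"concepts per term:   {concepts_per_term:.2f}")
```

The last bullet of the slide (60 common plus 158 specific concepts) is left out of the check because the slide does not say how those two figures relate to the 318 correct concepts.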
Mapping ULT to the UMLS • General concepts • Citation -> Organism attribute • Description -> Research activity • Symbol -> Idea or Concept • Name -> Intellectual product • History -> Finding • Matches -> Manufactured object • Link -> Chemical Viewed Structurally • Association -> Mental Process/ Social Behavior
General concepts [Figure labels: general classes / « metaterms » (WordNet??); Upper Level Ontology; General Ontology; Domain Ontology; Idea or Concept; Intellectual Product; Attributes; Functional/Spatial/Temporal Concept]
Semantic integration 3- Data Integration
Functional genomics • Post genomics • Gene expression, protein function, biological process, disease • van de Vijver MJ et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002 Dec 19; 347(25): 1999-2009. • Objective : provide « medical » annotation of genes (BioMeKe) • GeneTraces (Cantor, Lussier)
Gene, gene product, disease • HUGO: manage heterogeneity of data • Superoxide dismutase 1, soluble / amyotrophic lateral sclerosis 1 (adult) • C1420306 SOD1 gene (symbol): Gene or Genome • C0669516 SOD1 gene product (symbol): Amino acid, protein • C0002736 ALS (previous symbol): Disease or Syndrome • No relation in MTH
Gene, gene product, disease • HUGO • Aconitase 1, soluble • C1412126 ACO1 gene (symbol): Gene or Genome • C0378502 ACO1 protein (symbol) / IRP 1 protein (alias): Amino acid, protein • OR relation between the two concepts in MTH
Gene, gene product, disease • HUGO synonymous terms [Figure labels: terms T1, T2, T3; concepts C1, C2, C3; semantic types ST1, ST2, ST3]
Gene, gene product, disease [Figure: SN relations among Gene or Genome, AA/protein, and Disease or Syndrome; relation labels: produces, location_of, affects, causes]
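When the Metathesaurus holds no relation between two concepts (as in the SOD1 / ALS case), the relations defined between their semantic types can be used to propose candidates. The sketch below illustrates that lookup. The concept identifiers and type labels come from the slides, but the endpoints and directions of the edges are one reading of the flattened diagram and are an assumption, not a transcription of the SN files.

```python
# Semantic types of the three concepts, as labelled on the SOD1 slide.
CONCEPT_TYPE = {
    "C1420306": "Gene or Genome",       # SOD1 gene
    "C0669516": "Amino acid, protein",  # SOD1 gene product
    "C0002736": "Disease or Syndrome",  # ALS
}

# ASSUMPTION: which relation links which pair of types, and in which
# direction, is our reading of the diagram; verify against the SN itself.
SN_EDGES = {
    ("Gene or Genome", "Amino acid, protein"): ["produces"],
    ("Gene or Genome", "Disease or Syndrome"): ["location_of"],
    ("Amino acid, protein", "Disease or Syndrome"): ["affects", "causes"],
}


def candidate_relations(cui_source, cui_target):
    """Return SN relations allowed between the semantic types of two concepts."""
    key = (CONCEPT_TYPE[cui_source], CONCEPT_TYPE[cui_target])
    return list(SN_EDGES.get(key, []))


# Candidate links for a pair that has no relation in the Metathesaurus.
print(candidate_relations("C1420306", "C0669516"))
print(candidate_relations("C0669516", "C0002736"))
```

The candidates are only type-level possibilities; whether a given relation actually holds between two specific concepts still has to be confirmed from data or literature.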
Reasoning SN relations categorization
Reasoning : relations 1- The hierarchy and the economy principle
The economy principle • R1. Ad hoc precision • The intent is to establish a set of semantic types, which will be useful for a variety of tasks without introducing undue complexity. The most specific semantic type in the semantic type hierarchy is assigned to the concept. • R2. No hybrid types • Instead of creating a lattice structure, with hybrid types inheriting from two supertypes, the SN has a single inheritance tree structure. As a consequence, a Metathesaurus concept inheriting from two STs is assigned to both types. • R3. No category “other” • Rather than proliferating the number of semantic types to encompass multiple additional subcategories, concepts that cannot be categorized by any sibling Semantic Type are simply assigned their common supertype.
The economy principle and the theory • Intensions and extensions • Taxonomies (isa) are systems in which categories (intensions) are related to one another by means of subordination, or, in class parlance (extensions), systems in which classes are related to one another by means of class inclusion. • Categories and classes • When a category K has subcategories K1, K2, …, Kn, its extension, the class CK, is the union of the classes of its subcategories, i.e., CK1, CK2, …, CKn.
Categories • Manufactured Object: physical object made by human beings • Medical Device: manufactured object used primarily in the diagnosis, treatment, or prevention of physiologic or anatomic disorders • Research Device: manufactured object used primarily in carrying out scientific research or experimentation • Clinical Drug: pharmaceutical preparation as produced by the manufacturer • Classes [Figure: classes CMD, CRD, CCD drawn inside CMO; examples shown in CMO: 45 inch calibre bullet, magnetic tape, matches, corridor]
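The tension between the "union of subclasses" view and rule R3 (no category "other") can be made concrete with sets. In this sketch the four concepts assigned directly to Manufactured Object are taken from the slide; the three concepts placed under the subtypes are invented for illustration and have not been checked against the Metathesaurus.

```python
# Single-inheritance tree: child semantic type -> parent semantic type.
PARENT = {
    "Medical Device": "Manufactured Object",
    "Research Device": "Manufactured Object",
    "Clinical Drug": "Manufactured Object",
}

# Concept -> most specific semantic type assigned to it.
ASSIGNED = {
    # From the slide: no sibling type fits, so the supertype is assigned (R3).
    "45 inch calibre bullet": "Manufactured Object",
    "magnetic tape": "Manufactured Object",
    "matches": "Manufactured Object",
    "corridor": "Manufactured Object",
    # HYPOTHETICAL examples, not from the slides.
    "cardiac pacemaker": "Medical Device",
    "laboratory centrifuge": "Research Device",
    "aspirin 500 mg tablet": "Clinical Drug",
}


def descendants(semantic_type):
    """All semantic types below the given one in the tree."""
    found, stack = set(), [semantic_type]
    while stack:
        current = stack.pop()
        for child, parent in PARENT.items():
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def extension(semantic_type):
    """Concepts assigned to the type or to any of its descendants."""
    wanted = descendants(semantic_type) | {semantic_type}
    return {concept for concept, st in ASSIGNED.items() if st in wanted}


def union_of_subclasses(semantic_type):
    """Union of the extensions of all descendant types."""
    result = set()
    for child in descendants(semantic_type):
        result |= extension(child)
    return result


def residual(semantic_type):
    """Concepts in the class that belong to none of its subclasses."""
    return extension(semantic_type) - union_of_subclasses(semantic_type)


print(sorted(residual("Manufactured Object")))
```

A non-empty residual means the class is strictly larger than the union of its subclasses, which is what the economy principle produces in practice.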
Reasoning : relations 2- Associative relations
Diseases and Findings [Figure: SN hierarchy. Entity > Conceptual entity > Finding > Sign or Symptom; Event > Natural phenomenon or process > Pathologic function > Disease or syndrome]
Diseases and Findings: SN [Figure: Sign or Symptom is_a Finding; relations drawn between Finding / Sign or Symptom and Disease or syndrome: associated_with, evaluation_of, manifestation_of, diagnoses]
Relations SN • Disease or Syndrome affects Disease or Syndrome • Disease or Syndrome associated_with Disease or Syndrome • Disease or Syndrome co-occurs_with Disease or Syndrome • Disease or Syndrome complicates Disease or Syndrome • Disease or Syndrome degree_of Disease or Syndrome • Disease or Syndrome manifestation_of Disease or Syndrome • Disease or Syndrome occurs_in Disease or Syndrome • Disease or Syndrome precedes Disease or Syndrome • Disease or Syndrome process_of Disease or Syndrome • Disease or Syndrome result_of Disease or Syndrome
Relations in SNOMED CT vs SN • Class A_SNCT = SNCT concepts assigned to the Semantic Type A • Class DISEASES_SNCT = SNCT concepts assigned to ‘Disease or Syndrome’ [Figure labels: classes A, B, C within the MTH restricted to SNCT]
Relations in SNOMED CT • MTH restricted to SNOMED CT • Relations whose SAB = SNOMED CT • 2,220,144 relations • 1,392,380 associative relations (including inverse relations) • 113 associative relationships (all have an inverse except associated_with) • 18 relationships have fewer than 100 instances • Has_time_aspect_of: 1 • Has_property: 77 • The most frequent: • Has_onset: 114,173 • has_finding_site: 99,156 • has_method: 70,682
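Counts like those above can be obtained by filtering the Metathesaurus relation file on its source abbreviation and tallying the relationship attribute. The sketch below assumes the pipe-delimited MRREL.RRF column order from the UMLS documentation, assumes the source abbreviation `SNOMEDCT`, and treats `isa` / `inverse_isa` as the non-associative relationships to exclude; the slides do not state how associative relations were separated out, and the sample rows are invented.

```python
from collections import Counter

# Column order of MRREL.RRF as documented for the UMLS Metathesaurus.
MRREL_COLUMNS = [
    "CUI1", "AUI1", "STYPE1", "REL", "CUI2", "AUI2", "STYPE2", "RELA",
    "RUI", "SRUI", "SAB", "SL", "RG", "DIR", "SUPPRESS", "CVF",
]


def make_row(cui1, rel, cui2, rela, sab):
    """Build one pipe-delimited test row (RRF rows end with a trailing pipe)."""
    values = dict.fromkeys(MRREL_COLUMNS, "")
    values.update(CUI1=cui1, REL=rel, CUI2=cui2, RELA=rela, SAB=sab)
    return "|".join(values[column] for column in MRREL_COLUMNS) + "|"


def count_rela(lines, sab, exclude=("isa", "inverse_isa")):
    """Tally RELA values for one source, skipping the excluded relationships."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("|")
        row = dict(zip(MRREL_COLUMNS, fields))
        if row["SAB"] == sab and row["RELA"] and row["RELA"] not in exclude:
            counts[row["RELA"]] += 1
    return counts


# HYPOTHETICAL rows: identifiers and relations are invented for the example.
sample = [
    make_row("C9000001", "RO", "C9000002", "has_finding_site", "SNOMEDCT"),
    make_row("C9000003", "RO", "C9000004", "has_finding_site", "SNOMEDCT"),
    make_row("C9000005", "RO", "C9000006", "has_method", "SNOMEDCT"),
    make_row("C9000001", "CHD", "C9000007", "isa", "SNOMEDCT"),
    make_row("C9000008", "RO", "C9000009", "has_finding_site", "MSH"),
]

counts = count_rela(sample, sab="SNOMEDCT")
for rela, n in counts.most_common():
    print(rela, n)
```

On a real file the same function can be fed an open file handle instead of the sample list, since it only iterates over lines.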
Relations in SNOMED CT • Focus on Diseases and Findings • Class DISEASES_SNCT = SNCT concepts assigned to ‘Disease or Syndrome’ • Class FINDINGS_SNCT = SNCT concepts assigned to {‘Finding’ + ‘Sign or Symptom’} [Figure labels: Finding, Sign or Symptom, Disease or Syndrome within the MTH restricted to SNCT]
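The two classes can be built from the semantic type assignments, and relations can then be grouped by the classes of their endpoints. The following is a minimal sketch with invented concept identifiers and relation triples; only the class definitions follow the slide.

```python
DISEASE_TYPES = {"Disease or Syndrome"}
FINDING_TYPES = {"Finding", "Sign or Symptom"}

# HYPOTHETICAL concept -> semantic types (a concept may carry several types).
CONCEPT_TYPES = {
    "C9000001": {"Disease or Syndrome"},
    "C9000002": {"Finding"},
    "C9000003": {"Sign or Symptom"},
    "C9000004": {"Disease or Syndrome", "Finding"},
    "C9000005": {"Body Part, Organ, or Organ Component"},
}

# HYPOTHETICAL (source concept, relationship, target concept) triples.
RELATIONS = [
    ("C9000001", "due_to", "C9000004"),
    ("C9000002", "has_finding_site", "C9000005"),
    ("C9000003", "definitional_manifestation_of", "C9000001"),
]


def build_class(concept_types, wanted_types):
    """Concepts assigned to at least one of the wanted semantic types."""
    return {cui for cui, types in concept_types.items() if types & wanted_types}


def relations_between(relations, source_class, target_class):
    """Distinct relationships whose endpoints fall in the two classes."""
    return sorted({rela for source, rela, target in relations
                   if source in source_class and target in target_class})


diseases = build_class(CONCEPT_TYPES, DISEASE_TYPES)
findings = build_class(CONCEPT_TYPES, FINDING_TYPES)

print(relations_between(RELATIONS, diseases, diseases))
print(relations_between(RELATIONS, findings, diseases))
```

Because a concept with two semantic types lands in both classes, the same relation instance can be counted under more than one pair of classes.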
Diseases-Diseases relations SNCT • due_to • definitional_manifestation_of • associated_with • occurs_before • mapped_to • has_finding_site • has_associated_finding • interprets • has_associated_morphology
Diseases-Diseases relations SNCT/SN • SNCT: due_to, definitional_manifestation_of, associated_with, occurs_before, mapped_to, has_finding_site, has_associated_finding, interprets, has_associated_morphology • SN: result_of, manifestation_of, associated_with, precedes, occurs_in, complicates?, co-occurs_with, degree_of, process_of, affects
Findings-Diseases relations SNCT • has_associated_finding / associated_finding_of • has_definitional_manifestation / definitional_manifestation_of • interprets / is_interpreted_by / has_interpretation • occurs_after / occurs_before • mapped_to / mapped_from • has_associated_morphology / associated_morphology_of • due_to / cause_of • focus_of • has_finding_site • isa / inverse_isa
Diseases-Findings relations SNCT/SN • SNCT: has_associated_finding / associated_finding_of; has_definitional_manifestation / definitional_manifestation_of; interprets / is_interpreted_by / has_interpretation; occurs_after / occurs_before; mapped_to / mapped_from; has_associated_morphology / associated_morphology_of; due_to / cause_of; focus_of; has_finding_site; isa / inverse_isa • SN: associated_with, manifestation_of, diagnoses, evaluation_of
Diseases and Findings [Figure: Finding, Sign or Symptom (is_a Finding), Disease or syndrome; SN relations: associated_with, evaluation_of, manifestation_of, diagnoses; additional is_a link labelled "5,592 instances"]
Diseases and Findings Finding Disease or syndrome Sign or Symptom C1300028 Disorder characterized by pain is_a C0000727 Abdomen, acute
Diseases and Findings C0008767 Scar Finding has_finding_site C1300028 Endometriosis in scar of skin Sign or Symptom Disease or syndrome
Formal properties • Guarino, Welty • Rigidity • A property that is essential to all its instances. Person (+R). Physician (not R). • Identity • There is a property that is both necessary and sufficient for identifying an instance. Person (+I). • Unity • Instances are intrinsic wholes. Person (+U). • Dependence • For all the instances x, necessarily some instance of Z must exist, which is not a part of x, nor a constituent of x (+D). Food (+D).
Formal properties: rules • Rules • (not U) cannot subsume (+U), e.g., Substance cannot subsume Physical Object • […] • Distinction between roles and sortal types • Roles: (Not Rigid) (+Dependent) • Sortal types: (+Rigid) (Not Dependent)
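The subsumption rule and the role / sortal distinction stated above can be checked mechanically once each category carries its meta-properties. This is a minimal sketch, not Guarino and Welty's full methodology: only the one rule quoted on the slide is encoded, and the values marked ASSUMPTION are not given in the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Category:
    """A category with meta-properties; None means 'not stated'."""
    name: str
    rigid: Optional[bool] = None
    unity: Optional[bool] = None
    dependent: Optional[bool] = None


def can_subsume(parent, child):
    """Slide rule: a (not U) category cannot subsume a (+U) category."""
    return not (parent.unity is False and child.unity is True)


def kind(category):
    """Classify as role or sortal type using the slide's two definitions."""
    if category.rigid is False and category.dependent is True:
        return "role"
    if category.rigid is True and category.dependent is False:
        return "sortal type"
    return "unclassified"


# Values taken from the slides.
substance = Category("Substance", unity=False)
physical_object = Category("Physical Object", unity=True)
# The slides call Signs or Symptoms roles, hence not rigid and dependent.
sign_or_symptom = Category("Sign or Symptom", rigid=False, dependent=True)
# ASSUMPTION: the slides give Person as +R and +U but do not state dependence.
person = Category("Person", rigid=True, unity=True, dependent=False)

print(can_subsume(substance, physical_object))
print(kind(sign_or_symptom), "/", kind(person))
```

Leaving unstated meta-properties as `None` keeps the check conservative: a category is only flagged or classified when the required values are explicitly known.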
Formal properties: signs • Signs or Symptoms are Roles • Metathesaurus concepts that are assigned only to roles, with no sortal Semantic Type, form a large set of entities • About 90% of the MTH concepts assigned to Finding or Sign or Symptom are not assigned to any other Semantic Type.