Computational Intelligence in Biomedical and Health Care Informatics. HCA 590 (Topics in Health Sciences). Rohit Kate. Knowledge Representation: Description Logics
Reading • "An Introduction to Description Logics" by D. Nardi and R. J. Brachman, Chapter 1 in F. Baader et al. (Eds.), The Description Logic Handbook. Cambridge: Cambridge University Press, 2002. (Skip Sections 1.6 & 1.7) • Additional reading (for more formal definitions): F. Baader and W. Nutt, Section 2.2.1 of Chapter 2 in the above book.
What is Knowledge Representation (KR)? • Intelligence is impossible without knowledge • Computational intelligence is impossible without knowledge encoded in computer processable form • Knowledge Representation is representing knowledge of a domain, e.g. medical domain, in a computer processable form to enable computational intelligence • Automated reasoning • Automated discovery
Motivating Example [From Trotter & Uhlman, 2011] • Query: “Find records of all patients infected with Gram Positive Cocci” • Suppose such patients are at increased risk of developing a kidney infection if they have been treated with a certain class of antibiotic • If a patient record has “Pneumococcal Pneumonia” as the disease name, then a keyword-based search will not find this patient • But if the disease name is recorded as, or converted into, a SNOMED CT expression, then a “subsumption” search will find the patient record
Motivating Example [From Trotter & Uhlman, 2011] • [Figure: SNOMED CT fragment connecting Pneumococcal Pneumonia via Is_a to Pneumococcal infectious disease, via Causative agent to Streptococcus (which Is_a Gram Positive Coccus), via Finding site to Lung structure, and via Associated Morphology to Inflammation; the connection to Gram Positive Coccus is obtained by inference] • SNOMED CT has been developed in the description logic formalism [Baader et al. 2003] and hence is highly amenable to automated reasoning.
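A minimal sketch of why the subsumption search succeeds where a keyword search fails. It assumes a tiny hand-written fragment loosely modelled on the figure: the concept names, edges and patient records are illustrative, not actual SNOMED CT content, and `caused_by` is a hypothetical helper. A causative agent stated on a concept is inherited by all of its Is_a descendants.

```python
# Illustrative fragment loosely following the figure; NOT real SNOMED CT data.
IS_A = {
    "Pneumococcal Pneumonia": {"Pneumococcal infectious disease"},
    "Streptococcus": {"Gram Positive Coccus"},
}
CAUSATIVE_AGENT = {
    "Pneumococcal infectious disease": {"Streptococcus"},
}


def ancestors(concept):
    """The concept itself plus everything reachable via Is_a (reflexive-transitive closure)."""
    seen, stack = {concept}, [concept]
    while stack:
        for parent in IS_A.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def caused_by(disease, organism_class):
    """Hypothetical helper: does some (possibly inherited) causative agent fall under organism_class?"""
    agents = {a for c in ancestors(disease) for a in CAUSATIVE_AGENT.get(c, ())}
    return any(organism_class in ancestors(a) for a in agents)


records = [("P1", "Pneumococcal Pneumonia"), ("P2", "Viral meningitis")]

# Keyword search only matches the literal text of the disease name.
keyword_hits = [pid for pid, dx in records if "Gram Positive Cocc" in dx]
# Subsumption search follows Is_a and Causative agent links.
subsumption_hits = [pid for pid, dx in records if caused_by(dx, "Gram Positive Coccus")]

print(keyword_hits)      # []
print(subsumption_hits)  # ['P1']
```

A real terminology server would compute these closures with a DL reasoner over the full ontology rather than with a hand-written graph walk.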
What are different formalisms of KR? • Propositional Logic (Propositional Calculus) • First-order Logic (Predicate Logic, Predicate Calculus) • Description Logics • Semantic Networks and Ontologies • Often based on Description Logics
Description Logics (DL) • First-order logic is often a more powerful and more expressive framework than is needed in many domains • If one only needs to encode categories, objects and relations in a domain, then something simpler will suffice and will be more efficient • Description logics (DLs) are one such very widely used family of formalisms, especially for encoding medical knowledge • “Describe” things in a domain • More expressive than propositional logic, but less expressive than first-order logic • Some things are easily expressible in description logics but are difficult or awkward to express in first-order logic • E.g., all the patients who visited the clinic at least twice but not more than five times
Concepts • Description Logics work in terms of concepts and roles • Concepts: Represent classes, i.e. sets of entities, for example, Disease, Person, Female, Male, Mother, etc. • An instance of Disease could be COLD • Represented as Disease(COLD) • It means COLD is a Disease
Concept Constructors • Complex concepts can be built from simpler concepts using concept constructors • Description languages differ in which concept constructors they provide • Hence the plural “description logics” • Which constructors are present affects the computational complexity of reasoning in a description language
Concept Constructors • Intersection (⊓) represents the intersection of concepts, e.g. Person ⊓ Female • An instance: (Female ⊓ Person)(ANNA) • ANNA is a Female and a Person • Union (⊔) represents the union of concepts, e.g. Female ⊔ Male • Negation (¬) represents all those individuals that are not in the given concept, e.g. ¬Female
Atomic Concepts • Complex concepts can be given a name and defined with the ≡ symbol • Woman ≡ Person ⊓ Female • Atomic concepts: Concepts that cannot be represented using other concepts • Woman is not an atomic concept here
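Since concepts denote sets of individuals, the constructors behave like set operations. The sketch below evaluates them in one fixed, finite interpretation with hypothetical individuals (ROVER stands for a non-person, e.g. a dog); note that negation is taken relative to the whole domain, so ¬Female also contains ROVER. A DL reasoner must reason about every possible interpretation, not just one like this.

```python
# Hypothetical toy interpretation: a finite domain and the extension of each atomic concept.
DOMAIN = {"ANNA", "BOB", "MARY", "ROVER"}
PERSON = {"ANNA", "BOB", "MARY"}
FEMALE = {"ANNA", "MARY"}
MALE = {"BOB"}


def conj(c, d):
    """C ⊓ D: individuals in both concepts."""
    return c & d


def disj(c, d):
    """C ⊔ D: individuals in either concept."""
    return c | d


def neg(c):
    """¬C: individuals of the domain that are not in C."""
    return DOMAIN - c


# Woman ≡ Person ⊓ Female: a named, non-atomic concept
WOMAN = conj(PERSON, FEMALE)

print(sorted(WOMAN))                    # ['ANNA', 'MARY']
print(sorted(disj(FEMALE, MALE)))       # ['ANNA', 'BOB', 'MARY']
print(sorted(neg(FEMALE)))              # ['BOB', 'ROVER']
print("ANNA" in conj(FEMALE, PERSON))   # True, i.e. (Female ⊓ Person)(ANNA)
```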
Description Logics • Roles: Represent relations between pairs of instances, e.g. hasChild • An instance: hasChild(ANNA, JACOPO) • ANNA has a child JACOPO • Roles can also be used to represent concepts • For example, ∃hasChild represents all those who have a child
Description Logics • Roles can be used with quantification to represent concepts • Existential (∃) • ∃hasChild.Female represents all those who have a female child • Universal (∀) • ∀hasChild.Female represents individuals all of whose children are female (including, vacuously, those with no children) • Number restrictions • (≥ 3 hasChild) ⊓ (≤ 2 hasFemaleRelative) represents all individuals who have at least three children and at most two female relatives • (≥ 2 clinicVisits) ⊓ (≤ 5 clinicVisits) represents all patients who visited the clinic at least twice but not more than five times
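To make the quantifiers concrete, the sketch below evaluates ∃, ∀ and number restrictions in one fixed, finite interpretation. The individuals, visit identifiers and helper names are hypothetical, and roles are stored as sets of (subject, filler) pairs. This is model checking in a single closed world; a DL reasoner instead asks what holds in every model of the knowledge base.

```python
def fillers(role, x):
    """All y such that (x, y) is in the role's extension."""
    return {y for (a, y) in role if a == x}


def exists(domain, role, concept):
    """∃R.C: individuals with at least one R-filler in C."""
    return {x for x in domain if fillers(role, x) & concept}


def forall(domain, role, concept):
    """∀R.C: individuals all of whose R-fillers are in C (true if there are none)."""
    return {x for x in domain if fillers(role, x) <= concept}


def at_least(domain, n, role):
    """(≥ n R): individuals with at least n distinct R-fillers."""
    return {x for x in domain if len(fillers(role, x)) >= n}


def at_most(domain, n, role):
    """(≤ n R): individuals with at most n distinct R-fillers."""
    return {x for x in domain if len(fillers(role, x)) <= n}


# Hypothetical family data
domain = {"ANNA", "BOB", "DAN", "MARY", "LUCY", "TOM"}
female = {"ANNA", "MARY", "LUCY"}
has_child = {("ANNA", "MARY"), ("ANNA", "LUCY"), ("BOB", "MARY"), ("BOB", "TOM")}

print(sorted(exists(domain, has_child, female)))  # ['ANNA', 'BOB']
print(sorted(forall(domain, has_child, female)))  # ['ANNA', 'DAN', 'LUCY', 'MARY', 'TOM']

# Hypothetical clinic data: P1 has 1 visit, P2 has 3, P3 has 6
patients = {"P1", "P2", "P3"}
clinic_visits = {("P1", "V1"),
                 ("P2", "V2"), ("P2", "V3"), ("P2", "V4"),
                 ("P3", "V5"), ("P3", "V6"), ("P3", "V7"),
                 ("P3", "V8"), ("P3", "V9"), ("P3", "V10")}
# (≥ 2 clinicVisits) ⊓ (≤ 5 clinicVisits)
two_to_five = at_least(patients, 2, clinic_visits) & at_most(patients, 5, clinic_visits)
print(sorted(two_to_five))  # ['P2']
```

Note how ∀hasChild.Female includes DAN and the childless individuals: the universal restriction holds vacuously when there are no fillers.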
An Example • [Figure: concept hierarchy over Person, (≥ 1 hasChild), Female, Parent, Woman and Mother] • Woman ≡ Person ⊓ Female • Parent ≡ Person ⊓ (≥ 1 hasChild) • Mother ≡ Female ⊓ Parent • Is every Mother a Woman? This is not stated explicitly, but it is implied. • Reasoning: Every Mother is a Female and a Parent. Since every Parent is a Person, every Mother is a Female and a Person, which is exactly the definition of a Woman. • A knowledge representation system should be able to determine such relations automatically. This can be a complex task in some domains.
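A minimal sketch of how a system could answer "Is every Mother a Woman?" automatically, assuming (as in this example) that every definition is a plain conjunction and the TBox is acyclic. Each defined name is unfolded into the set of primitive conjuncts it stands for; then C ⊑ D holds when every conjunct of D also appears in C. This structural approach is only complete for such simple languages: (≥ 1 hasChild) is treated as an opaque atom, which would stop being correct if, say, (≥ 2 hasChild) also appeared. The dictionary `TBOX` and the helper names are hypothetical.

```python
# Hypothetical encoding of the example TBox: each defined name maps to the
# conjuncts on the right-hand side of its ≡ definition. Other names are primitive.
TBOX = {
    "Woman": ["Person", "Female"],
    "Parent": ["Person", ">=1 hasChild"],
    "Mother": ["Female", "Parent"],
}


def unfold(concept):
    """Recursively replace defined names by their definitions (acyclic TBox assumed)."""
    if concept not in TBOX:
        return {concept}
    atoms = set()
    for conjunct in TBOX[concept]:
        atoms |= unfold(conjunct)
    return atoms


def subsumed_by(c, d):
    """C ⊑ D for pure conjunctions: every atom of D must also be an atom of C."""
    return unfold(d) <= unfold(c)


print(sorted(unfold("Mother")))        # ['>=1 hasChild', 'Female', 'Person']
print(subsumed_by("Mother", "Woman"))  # True
print(subsumed_by("Woman", "Mother"))  # False
```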
Reasoning in DL • The basic inference on concept expressions in DL is subsumption, written as C ⊑ D, meaning every individual in C is also in D • The basic query in DL is whether a concept C is subsumed by another concept D, e.g. is every Mother a Woman? • In DL, there are subsumption algorithms that are: • Sound (when the algorithm answers “yes”, the subsumption really holds) • Complete (when the algorithm answers “no”, the subsumption really does not hold) • Efficient (for the description logics used in practice, these algorithms run fast, although worst-case cost grows with expressiveness) • Because of these theoretical results, DL is very widely used in practice • There are several description logics, which usually differ in the operators and quantifiers they define over concepts and roles; accordingly, their reasoning efficiency varies
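For reference, the standard set-theoretic semantics behind subsumption and the constructors above can be summarized as follows, in the usual notation of the DL literature (as in the additional reading by Baader and Nutt) rather than the slides' own. An interpretation 𝓘 has a nonempty domain Δ^𝓘 and maps every concept to a subset of Δ^𝓘 and every role to a binary relation over Δ^𝓘; 𝓣 is the TBox.

```latex
\begin{aligned}
(C \sqcap D)^{\mathcal{I}} &= C^{\mathcal{I}} \cap D^{\mathcal{I}} \\
(C \sqcup D)^{\mathcal{I}} &= C^{\mathcal{I}} \cup D^{\mathcal{I}} \\
(\neg C)^{\mathcal{I}} &= \Delta^{\mathcal{I}} \setminus C^{\mathcal{I}} \\
(\exists R.C)^{\mathcal{I}} &= \{\, x \mid \exists y.\ (x,y) \in R^{\mathcal{I}} \wedge y \in C^{\mathcal{I}} \,\} \\
(\forall R.C)^{\mathcal{I}} &= \{\, x \mid \forall y.\ (x,y) \in R^{\mathcal{I}} \rightarrow y \in C^{\mathcal{I}} \,\} \\
({\geq}\, n\, R)^{\mathcal{I}} &= \{\, x \mid |\{\, y \mid (x,y) \in R^{\mathcal{I}} \,\}| \geq n \,\} \\
({\leq}\, n\, R)^{\mathcal{I}} &= \{\, x \mid |\{\, y \mid (x,y) \in R^{\mathcal{I}} \,\}| \leq n \,\} \\[4pt]
C \sqsubseteq_{\mathcal{T}} D \;&\iff\; C^{\mathcal{I}} \subseteq D^{\mathcal{I}} \text{ for every model } \mathcal{I} \text{ of } \mathcal{T}
\end{aligned}
```

Under this definition, "every Mother is a Woman" means Mother^𝓘 ⊆ Woman^𝓘 in every interpretation that satisfies the TBox definitions, not merely in one particular database.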
A Well-Known Trade-off in Knowledge Representation • There is a trade-off between the expressiveness of a representation language and the difficulty of reasoning over it [Brachman and Levesque, 1984] • The more expressive a language is, the more computationally difficult reasoning over it becomes • Description logic languages are a good compromise, and their expressiveness suffices for many applications • The same trade-off holds among the different description logic languages
DL Knowledge Base • A DL knowledge base consists of two parts: TBox and ABox • TBox (terminology box): Describes general properties of concepts, e.g. Person, Female, Woman etc. • ABox (assertion box): Specifies individuals of the domain, e.g.: • (Female ⊓ Person)(ANNA) • ANNA is a Female and a Person • hasChild(ANNA, JACOPO) • ANNA has a child JACOPO
Basic Reasoning in DL • TBox • Subsumption • Whether a concept subsumes another • Classification • Where to put a new concept in the hierarchy of concepts • Determined using subsumption: place it below the most specific concepts that subsume it and above the most general concepts that it subsumes • There are algorithms to do the above automatically
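A minimal sketch of classification, assuming every concept has already been unfolded into a set of primitive atoms (as in the Mother/Woman example), so that subsumption reduces to a subset test. `HIERARCHY`, `subsumes` and `classify` are hypothetical names, and the sketch assumes no two stored concepts are equivalent. Here the new concept Person ⊓ Female (i.e. Woman) is placed into a hierarchy that does not yet contain it.

```python
# Hypothetical existing hierarchy: concept name -> unfolded set of primitive atoms.
HIERARCHY = {
    "Person": {"Person"},
    "Female": {"Female"},
    "Parent": {"Person", ">=1 hasChild"},
    "Mother": {"Person", "Female", ">=1 hasChild"},
}


def subsumes(general, specific):
    """specific ⊑ general iff every atom of general is also an atom of specific."""
    return general <= specific


def classify(new_atoms, hierarchy):
    """Return (parents, children) of a new concept in the hierarchy."""
    ups = [n for n, a in hierarchy.items() if subsumes(a, new_atoms)]
    downs = [n for n, a in hierarchy.items() if subsumes(new_atoms, a)]
    # Parents: subsumers with no other subsumer strictly below them (most specific).
    parents = [n for n in ups
               if not any(m != n and subsumes(hierarchy[n], hierarchy[m]) for m in ups)]
    # Children: subsumees with no other subsumee strictly above them (most general).
    children = [n for n in downs
                if not any(m != n and subsumes(hierarchy[m], hierarchy[n]) for m in downs)]
    return sorted(parents), sorted(children)


parents, children = classify({"Person", "Female"}, HIERARCHY)  # Woman
print(parents, children)  # ['Female', 'Person'] ['Mother']
```

Production reasoners avoid testing every pair of concepts and use the partially built hierarchy to prune subsumption tests, but the placement criterion is the one stated on the slide.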
Basic Reasoning in DL • ABox • Instance checking • Whether an individual is an instance of a concept • Knowledge base consistency • Whether every concept admits at least one individual • Realization • Find the most specific concept an individual object is an instance of • Retrieval • Find the individuals that are instances of a concept The last three can be accomplished through instance checking. • There are algorithms to do all the above automatically
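The following sketch, under the same simplifying assumptions (hypothetical names, definitions pre-unfolded into atom sets, no negation), implements instance checking and then builds retrieval and realization on top of it. Because this toy language has no negation, every knowledge base in it is consistent, so no consistency check is shown. A result of False means the membership is not entailed, not that it is false (open-world assumption).

```python
# Hypothetical TBox: each defined name maps to its unfolded set of primitive atoms.
TBOX = {
    "Woman": {"Person", "Female"},
    "Parent": {"Person", ">=1 hasChild"},
    "Mother": {"Person", "Female", ">=1 hasChild"},
}
PRIMITIVES = ["Person", "Female"]

# Hypothetical ABox: concept assertions C(a) and role assertions R(a, b).
CONCEPT_ASSERTIONS = {("ANNA", "Female"), ("ANNA", "Person"), ("JACOPO", "Person")}
ROLE_ASSERTIONS = {("hasChild", "ANNA", "JACOPO")}
INDIVIDUALS = {"ANNA", "JACOPO"}


def atoms(concept):
    """Primitive atoms a concept name stands for (primitives stand for themselves)."""
    return TBOX.get(concept, {concept})


def entailed_atoms(ind):
    """Atoms the ABox entails for an individual in this negation-free toy language."""
    result = set()
    for who, concept in CONCEPT_ASSERTIONS:
        if who == ind:
            result |= atoms(concept)
    for role, subj, _obj in ROLE_ASSERTIONS:
        if subj == ind:
            result.add(">=1 " + role)  # one asserted filler entails (>= 1 role)
    return result


def instance_of(ind, concept):
    """Instance checking: is concept(ind) entailed?"""
    return atoms(concept) <= entailed_atoms(ind)


def retrieve(concept):
    """Retrieval: all individuals that are instances of the concept."""
    return sorted(i for i in INDIVIDUALS if instance_of(i, concept))


def realize(ind):
    """Realization: the most specific named concepts the individual belongs to."""
    hits = [c for c in list(TBOX) + PRIMITIVES if instance_of(ind, c)]
    return sorted(c for c in hits if not any(atoms(c) < atoms(d) for d in hits))


print(instance_of("ANNA", "Mother"))  # True
print(retrieve("Woman"))              # ['ANNA']
print(realize("ANNA"))                # ['Mother']
```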
DL in Medicine • Many large-scale knowledge bases (hundreds of thousands of concepts) are common in medicine • GALEN (Generalized Architecture for Languages, Encyclopedias, and Nomenclatures in Medicine) [Rector et al. 1993] is a terminology resource for clinical systems built using a Description Logic • SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms) is a comprehensive biomedical terminology also developed in a Description Logic
GALEN • GALEN: Generalized Architecture for Languages, Encyclopedias and Nomenclatures in Medicine, to represent “all and only sensible medical concepts” • Developed as a European Union project (1992-99) • Uses a specialized description logic language (GRAIL: GALEN Representation and Integration Language), also available in OWL (Web Ontology Language) • OpenGALEN, the publicly accessible version, has about 25,000 concepts
SNOMED CT • SNOMED: Systematized Nomenclature of Medicine (SNOMED CT: SNOMED Clinical Terms) • Developed by the College of American Pathologists • The most comprehensive biomedical ontology • Contains about 269,864 classes and 407,510 names • Available for free as part of UMLS • Uses a description logic formalism
SNOMED CT • A concept is described in terms of roles and other concepts • [Figure: the concept Viral meningitis (synonyms: Abacterial meningitis; Aseptic meningitis, viral; Unique ID: 58170007), which Is_a Viral infections of the central nervous system and Is_a Infective meningitis. Its roles and their values: Causative agent: Virus; Associated morphology: Inflammation; Finding site: Meninges structure; Onset: Sudden onset, Gradual onset; Course: Courses; Episodicity: Episodicities; Severity: Severities]
Conclusions • Knowledge representation formalisms enable automated reasoning and query answering • Once knowledge is properly represented, reasoning over it can be done through symbol manipulation and can hence be automated using a computer • Different formalisms have varying expressive power and computational complexity • Knowledge in any biomedical domain is typically large (huge terminologies) and complex, so its systematic organization and automated reasoning over it are indispensable