Understanding the Message: Linking Aristotelian Realism to Linguistic Functionalism
W. Ceusters *, B. Smith **, M. Dos Santos *, J. Simon **, M. O’Donnell *, M. Fielding *,***
* Language & Computing nv, Zonnegem, Belgium
** Institute for Formal Ontology and Medical Information Science (IFOMIS), Leipzig, Germany
*** Catholic University of Leuven, Leuven, Belgium
Werner Ceusters CTO www.landc.be

Presentation overview
• Problem description: patient eligibility for clinical trial
• Meaning theories
• Required technology for natural language understanding
• Implementation of a realist ontology for medical natural language understanding
• Conclusions
• If enough time: a guided tour of LinkFactory

The Medical Informatics Dogma: everything should be structured
• Fact: computers can only deal with structured representations of reality:
  • structured data: relational databases, spreadsheets
  • structured information: XML simulates context
  • structured knowledge: rule-based knowledge systems
• Typical conclusion (Dogma?):
  • there is a need for structured data, hence …
  • … there is a need for structured data entry

Structured data entry
• Current technical solutions:
  • rigid data entry forms
  • coding and classification systems
• But:
  • the description of biological variability requires the flexibility of natural language and it is generally desirable not to interfere with the traditional manner of medical recording (Wiederhold, 1980)
  • Initiatives to facilitate the entry of narrative data have focused on the control rather than the ease of data entry (Tanghe, 1997)

Drawbacks of structured data entry
• Loss of information
  • qualitatively
    • limited expressiveness and inherent defects of coding and classification systems, controlled vocabularies, and “traditional” medical terminologies
    • use of purpose-oriented systems: do not use data for a purpose other than the one originally foreseen (J VDL)
  • quantitatively
    • too time-consuming to code all information manually
• Speech recognition and forms for structured data entry are not best friends

Areas for application of medical natural language understanding
• Coding patient data
• Structured information extraction from unstructured clinical notes
• Clinical protocols and guidelines
• Assessing patient eligibility for clinical trial entry
• Triggering and alerts
• Linking case descriptions to scientific literature
• Easy access to content
• ... towards a medical semantic web

Clinical history description
• Mr. Kovács is an 83-year-old man with a past medical history of hypertension, congestive heart failure, atrial fibrillation, hypercholesterolemia, and a history of CVA who presented himself to Budapest Emergency Room on April 25 with primary complaint of right-sided chest pain since April 24. The patient was in his usual state of health until April 24 when he experienced right-sided chest pain after 10 minutes of bicycling exercise at the YMCA. He described the chest pain as a dull ache in the right side of his chest radiating posteriorly to the right scapular area. He rated the intensity as 7 out of 10. The chest pain lasted about 3 minutes and resolved with rest. That same night, the patient once again experienced right-sided chest pain while lying in bed just before he went to sleep. He describes the pain as right-sided chest pain with same radiation to posterior at an intensity of 6-7 out of 10. The chest pain lasted about 10 minutes and resolved spontaneously.

Inclusion criteria of the INVEST study
• 1. Male or female
• 2. Age 50 to no upper limit
• 3. a) Hypertension documented according to the 6th report of the Joint National Committee on Detection and Evaluation of the treatment of high BP (JNC VI), and b) the need for drug therapy (previously documented hypertension in patients currently taking antihypertensive agents is acceptable)
• 4. Documented CAD (e.g., classic angina pectoris; stable angina pectoris; Heberden angina pectoris), myocardial infarction three or more months ago, abnormal coronary angiography, or concordant abnormalities on two different types of stress tests
• 5. Willingness to sign informed consent

Do they match?
• Mr. Kovács is … an 83-year-old man with past medical history of hypertension, congestive heart failure, atrial fibrillation, hypercholesterolemia, history of CVA who presented to Budapest Emergency Room on April 25 with chief complaint of right-sided chest pain since April 24. The patient was in his usual state of health until April 24 when he experienced right-sided chest pain after 10 minutes of bicycling exercise at YMCA. He described the chest pain as a dull ache in the right side of his chest radiating posteriorly to the right scapular area. He rated the intensity as 7 out of 10. The chest pain lasted about 3 minutes and resolved with rest. That same night, the patient once again experienced right-sided chest pain while lying in bed right before he went to sleep. He describes the pain as right-sided chest pain with same radiation to posterior at an intensity of 6-7 out of 10. The chest pain lasted about 10 minutes and resolved spontaneously.
• 1. Male or female
• 2. Age 50 to no upper limit
• 3. Hypertension documented according to the 6th report of the Joint National Committee on Detection and Evaluation of the treatment of high BP (JNC VI) and the need for drug therapy (previously documented hypertension in patients currently taking antihypertensive agents is acceptable)
• 4. Documented CAD (e.g., classic angina pectoris (stable angina pectoris; Heberden angina pectoris), myocardial infarction three or more months ago, abnormal coronary angiography, or concordant abnormalities on two different types of stress tests)
• 5. Willingness to sign informed consent

If the computer is to make this deduction ...
• 1. Male or female
• 2. Age 50 to no upper limit
• 3. Hypertension documented according to the 6th report of the Joint National Committee on Detection and Evaluation of the treatment of high BP (JNC VI) and the need for drug therapy (previously documented hypertension in patients currently taking antihypertensive agents is acceptable)
• 4. Documented CAD (e.g., classic angina pectoris (stable angina pectoris; Heberden angina pectoris), myocardial infarction three or more months ago, abnormal coronary angiography, or concordant abnormalities on two different types of stress tests)
• 5. Willingness to sign informed consent
• Mr. Kovács is … an 83-year-old man with past medical history of hypertension, congestive heart failure, atrial fibrillation, hypercholesterolemia, history of CVA who presented to Budapest Emergency Room on April 25 with chief complaint of right-sided chest pain since April 24. The patient was in his usual state of health until April 24 when he experienced right-sided chest pain after 10 minutes of bicycling exercise at YMCA. He described the chest pain as a dull ache in the right side of his chest radiating posteriorly to the right scapular area. He rated the intensity as 7 out of 10. The chest pain lasted about 3 minutes and resolved with rest. That same night, the patient once again experienced right-sided chest pain while lying in bed right before he went to sleep. He describes the pain as right-sided chest pain with same radiation to posterior at an intensity of 6-7 out of 10. The chest pain lasted about 10 minutes and resolved spontaneously.
... it must be able to understand!

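The deduction asked for on this slide can be sketched once the narrative has been turned into structured facts. The sketch below is not the authors' system: `PatientFacts`, `both` and `check_criteria` are hypothetical names, the record is assumed to be the output of some NLU step, and the criteria are a simplified reading of the INVEST list. Anything the text does not settle is reported as unknown (`None`) rather than guessed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PatientFacts:
    """Hypothetical output of an NLU step; None means 'not stated in the text'."""
    age: Optional[int] = None
    history_of_hypertension: Optional[bool] = None
    needs_drug_therapy: Optional[bool] = None
    documented_cad: Optional[bool] = None
    consent_given: Optional[bool] = None


def both(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """Three-valued AND: False dominates, then unknown (None), then True."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def check_criteria(p: PatientFacts) -> Dict[str, Optional[bool]]:
    """Return True, False or None (unknown) for each simplified criterion."""
    return {
        "1 male or female": True,  # satisfied by every patient
        "2 age 50 or over": None if p.age is None else p.age >= 50,
        "3 hypertension needing drug therapy": both(
            p.history_of_hypertension, p.needs_drug_therapy
        ),
        "4 documented CAD": p.documented_cad,
        "5 informed consent": p.consent_given,
    }


# Only the facts stated outright in the history of Mr. Kovács are filled in.
kovacs = PatientFacts(age=83, history_of_hypertension=True)
for criterion, verdict in check_criteria(kovacs).items():
    print(criterion, "->", verdict)
```

With only the explicitly stated facts filled in, criteria 3 to 5 stay unknown. Deciding, for instance, whether the described chest pain amounts to documented CAD is the step that needs understanding rather than string matching.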
What is understanding?
• To understand something is to know what its significance is.
• What 'knowing significance' amounts to may be very different in different contexts: thus understanding a piece of music requires different things of us than understanding a sentence in a language we are learning, for instance. It would be useful, then, for theorists to look at the different kinds of understanding that there are, and examine them in detail and without prejudice, rather than looking for the essence of understanding. (Tim Crane, philosopher of mind)
• The significance of a single sentence, too, can vary from context to context.

The etymology of “understanding”
• “understanding”: cf. Latin “substare”, literally “to stand under”
• Webster's Dictionary (1961): understanding = the power to render experience intelligible by bringing perceived particulars under appropriate concepts.
• “particulars” = what is NOT SAID of a subject (Aristotle)
  • substances: this patient, that tumor, ...
  • qualities: the red of that patient’s skin, his body temperature, blood pressure, ...
  • processes: that incision made by that surgeon, the rise of that patient’s temperature, ...
• “concepts”: may be taken in the above definition as Aristotle’s “universals” = what is SAID OF a subject
  • Substantial concepts: patient, tumor, ...
  • Quality concepts: white, temperature
  • ...

What is natural language understanding?
• NLU is the construction of meaning from “written” language. The degree of understanding depends on a multifaceted meaning-making process that draws on knowledge about language and knowledge about the world (cf. “reading comprehension” by humans).
• But then: what is “meaning”?

Dyadic models of “meaning”
• Saussure (language philosopher):
  • signifiant / signifié (sign/concept)
• Ron Stamper (information scientist):
  • thing-A STANDS-FOR thing-B
• Major drawback:
  • excludes the “referent” from the model, i.e. that which the sign/symbol/word/... denotes

Triadic models of meaning: the semiotic/semantic triangle
[Diagram: triangle with three vertices]
• Reference: concept / sense / model / view / partition
• Sign: language / term / symbol
• Referent: reality / object

Aristotle’s triadic meaning model
Words spoken are signs or symbols (symbola) of affections or impressions (pathemata) of the soul (psyche); written words (graphomena) are the signs of words spoken (phoné). As writing (grammata), so also is speech not the same for all races of men. But the mental affections themselves, of which these words are primarily signs (semeia), are the same for the whole of mankind, as are also the objects (pragmata) of which those affections are representations or likenesses, images, copies (homoiomata).
Aristotle, 'On Interpretation', 1.16.a.4-9, translated by Cooke & Tredennick, Loeb Classical Library, William Heinemann, London, UK, 1938.
[Diagram labels: pathema, semeia, gramma/phoné, pragma]

Richards’ semantic triangle
• Reference (“concept”): “indicates the realm of memory where recollections of past experiences and contexts occur”.
• Hence: as with Aristotle, the reference is “mind-related”: thought.
• But: not “the same for all”; rather, individual and mind-related.
[Diagram labels: my understanding, your understanding; reference, symbol, referent]

Don’t confuse with homonymy!
[Diagram: the symbol “mole” with three referents (R1, R2, R3): mole “skin lesion”, mole “unit”, mole “animal”]

Homonymy
[Diagram, two panels. “One concept”: understanding of x, understanding of y, referent, symbol. “Different thoughts”: the symbol “mole” with referents R1, R2, R3: mole “skin lesion”, mole “unit”, mole “animal”]

And by the way, synonymy...
[Diagram, two panels, each with the symbols “sweat” and “perspiration”: the Aristotelian view; Richards’ view]

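The difference between homonymy and synonymy can be made concrete with a small term-to-referent table. This is an illustrative sketch only: the referent identifiers (`R_lesion`, `R_unit`, `R_animal`, `R_sweat`) and the function names are hypothetical, and the table follows the Aristotelian reading in which “sweat” and “perspiration” share a single referent.

```python
from collections import defaultdict

# Hypothetical lexicon: each term maps to the set of referents it can denote.
LEXICON = {
    "mole": {"R_lesion", "R_unit", "R_animal"},
    "sweat": {"R_sweat"},
    "perspiration": {"R_sweat"},
}


def homonyms(lexicon):
    """Terms that can denote more than one referent."""
    return sorted(term for term, refs in lexicon.items() if len(refs) > 1)


def synonym_groups(lexicon):
    """Groups of distinct terms that share at least one referent."""
    terms_by_referent = defaultdict(set)
    for term, refs in lexicon.items():
        for ref in refs:
            terms_by_referent[ref].add(term)
    return sorted(
        sorted(terms) for terms in terms_by_referent.values() if len(terms) > 1
    )


print(homonyms(LEXICON))
print(synonym_groups(LEXICON))
```

Both notions are defined here purely through referents, without any “concept” node in between, which is the point the realist reading of the triangle is making.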
Frege’s view
• “sense” is an objective feature of how words are used and not a thought or concept in somebody’s head
• 2 names with the same reference can have different senses
• 2 names with the same sense have the same reference (synonyms)
• a name with a sense does not need to have a reference (“Beethoven’s 10th symphony”)
[Diagram labels: sense, name, reference (= referent)]

Tetrahedral extensions
[Diagram, two tetrahedra]
• CEN/TC251 ENV 12264: concept, definition, term, referent
• FRISCO model (information science): conception, actor, representation, referent

Requirements for NLU
• 1. Knowledge about terms and how they are used in valid constructions within natural language;
• 2. Knowledge about the world, i.e. how the referents denoted by the terms interrelate in reality and in given types of context;
• 3. An algorithm that:
  • a) is able to calculate a language user’s representation of that part of the world described in the utterances that are the subject of the analysis;
  • b) can track the ways in which people express what does NOT represent anything in reality (e.g. for medico-legal reasons).
Only a realist ontology (and not an ontology that deals with “alternative realities”) permits correct disambiguation between 3a and 3b.

As such ...
• The “things at the top of all the triangles seen so far” are dynamic algorithmic things, and not fixed structures.
• Additional support comes from:
  • P. F. Strawson: “we must stop trying to locate meaning in some invariant relation in which phrases and words stand to the world”
• But what then?
  • “meaning is use” (Quine, Wittgenstein, Davidson, McDowell, ...)
  • A theory of meaning taking context and referents into account
• Conclusion: whoever concentrates on the “concepts” is doomed to fail.
To be or not to be structured

Why are concepts not enough?
• Why must our theory address also the referents in reality?
  • Because referents are observable fixed points in relation to which we can work out how the concepts used by different communities relate to each other;
  • Because only by looking at referents can we establish the degree to which concepts are good for their purpose.

But why then this fixation on normative “concepts” in Medical Informatics (standards)?
• CEN/TC251 ENV 12264:
  • This ENV is applicable to the description of the categorial structure of systems of concepts supporting computer-based terminological systems, including coding systems, for health-care.
  • concept: “unit of thought constituted through abstraction on the basis of properties common to a set of one or more referents”
  BUT THEY NEVER IN FACT LOOK AT THE REFERENTS AT ALL!
• ISO/TC215/N142: Health informatics - Vocabulary of terminology
  • The purpose of this International Standard is to define a set of basic concepts required to describe and discuss formal representation of concepts and characteristics, for use especially in formal computer based concept representation systems.
  • concept: “unit of knowledge created by a unique combination of characteristics”
  THEY ARE ALREADY TWO LEVELS REMOVED FROM THE REFERENT!

What is part of the real world, and what isn’t?
• CEN/TC251 ENV 12264:
  • “In this model, the elements in the base plane of the pyramid are all concrete or abstract phenomena of the real world (or so established as items of knowledge that they are considered real, e.g. "perpetuum mobile"), or expressions in a language. These phenomena are called objects or, better, referents. Concepts situated at the top of the pyramid are mental constructs. To talk about them, we have to use established expressions, such as names, terms in special subject fields, or longer verbal definitions”.
[Diagram label: the real world]

“Ontology”
• In Information Science:
  • “An ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents.” (Tom Gruber)
• In Philosophy:
  • “Ontology is the science of what is, of the kinds and structures of objects, properties, events, processes and relations in every area of reality.” (Barry Smith)

“Where there is the sound of a blow, there is respect” (Pashtun proverb)
• “I repeatedly get confused by the (in my opinion structurally confusing) terminology of those people (like John Sowa) who try to do ontology but end up just studying concepts.” (Barry Smith, pers. comm.)

From buzz-word to the “O-word”
• “An ontology is a classification methodology for formalizing a subject's knowledge or belief system in a structured way. Dictionaries and encyclopedias are examples of ontologies.” (X1)
• “A terminology (or classification) is a kind of ontology by definition and it should preserve (and "understand") the relationships between the 1,000s of terms in it or else it would become a mere dictionary (or at best a thesaurus).” (X2)
• “Ontologies are Web pages that contain a mystical unifying force that gives differing labels common meaning.” (X3)

Why existing “ontologies” don’t match OUR needs for a “core” ontology
• MeSH: inconsistency in hierarchical relationships
• MedDRA: no difference between concepts and terms
• UMLS: integrates various source terminologies without taking different meanings of terms, different structures, different purposes, etc. into account
• SNOMED: formal system, but lacks sufficient depth of the ontology
• GALEN: very detailed ontology for some parts of healthcare but very poor coverage over healthcare as a whole. The ontology is independent from language as medium of communication (the ontology does not accept language as part of reality)
• ...
Most important: all of them deal with alternative realities or possible worlds and none is focused on the referents in THIS world!

Another known problem: intentionality in the semiotic triangle
• “The physician wanted to give the patient an injection”
• The physician gave the injection (= referent), and because of that, the patient died from a side-effect.
• Hence: “giving the injection” = “killing the patient” (= two references)
• Hence???
  • “the physician wanted to kill the patient”

Our approach
[Diagram, two panels set against “the real world”: “the standard view” and “our view”. Labels: medical+linguistic ontology (data + algorithms); concept system; language; definitions; referents]

Halliday’s systemic functional grammar; Aristotelian realism
• The structures of language are partially determined by our conceptualisation of the world. (Halliday)
• No mental representation without language (Fodor)
• Meaning is located in the interaction between living beings and the environment (James J. Gibson, Ecological Realism in Psychology)
• “Baboons and humans have different cut-off points for discerning "same" objects because our verbal expression for "same" makes the idea of "same" more restrictive.” (Fagot and Wasserman, Centre for Research in Cognitive Neuroscience in Marseille)
• Exploit the relationships along the vertices
[Diagram labels: concept, referents, language]

The current picture
[Diagram labels: medical+linguistic ontology; linguistic ontologies (per language); normative concept system(s); realist ontology]

The possible final picture
[Diagram labels: BFO/MedO; “validates”]

Why BFO might match our needs
• BFO = Basic Formal Ontology (Barry Smith, draft 0.0005 (3.7.02))
• BFO should provide a theory of formal categories for entities of all types, including substances, qualities/roles/functions/dispositions, and processes
• then for each of a series of what we might think of as flat domains, starting with medicine (chemistry, genetics… = MedO), BFO should provide the basis for a theory of the categories of entities in those domains
• for non-flat domains like language we will need three components:
  • 1. BFO applied to language itself
  • 2. BFO applied to the world (the referents)
  • 3. an ontological theory of the relations between 1. and 2.

BFO/SNAP: Entities existing in toto at a time
SPAN: Entities extended in time (1)
SPAN: Entities extended in time (2)

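The SNAP/SPAN split can be sketched as a two-way classification of the categories of particulars listed earlier in the talk (substances, qualities, processes). The enum, the dictionary and the function name below are hypothetical and are not taken from any BFO specification. The sketch assumes that substances and qualities are SNAP entities (existing in toto at a time) and that processes are SPAN entities (extended in time).

```python
from enum import Enum


class Perspective(Enum):
    SNAP = "exists in toto at a time"
    SPAN = "extended in time"


# Assumed mapping from category of particular to perspective.
CATEGORY_PERSPECTIVE = {
    "substance": Perspective.SNAP,  # e.g. this patient, that tumor
    "quality": Perspective.SNAP,    # e.g. his body temperature
    "process": Perspective.SPAN,    # e.g. the rise of that patient's temperature
}


def perspective_of(category: str) -> Perspective:
    """Look up the perspective of a category; raises KeyError if it is unknown."""
    return CATEGORY_PERSPECTIVE[category]


for category in ("substance", "quality", "process"):
    print(category, "->", perspective_of(category).name)
```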
An integrated approach
• Linguistic application components: NLU enabling tools for knowledge supported data-entry and -retrieval
• LinC Base: medical and linguistic knowledge required for language understanding
• LinC Factory: data structure and function library for language understanding

LinC Base: medico-linguistic ontology
[Diagram labels: Formal Domain Ontology; Cassandra Linguistic Ontology; Language A (Lexicon, Grammar); Language B (Lexicon, Grammar); terminologies: ICPC, SNOMED, ICD, MEDDRA, proprietary terminologies, others ...]

Based on formal ontology
[Diagram of spatial relations: HAS-SPATIAL-POINT-REFERENCE, HAS-CONNECTING-REGION, HAS-OVERLAPPING-REGION, IS-SPATIAL-PART-OF, HAS-DISCRETED-REGION, HAS-SPATIAL-PART, HAS-DISCONNECTED-REGION, IS-PROPER-SPAT.-PART-OF, IS-INSIDE-CONVEX-HULL-OF, HAS-PROPER-SPATIAL-PART, IS-PARTLY-IN-CONVEX-HULL-OF, IS-OUTSIDE-CONVEX-HULL-OF, HAS-EXTERNAL-CONNECTING-REGION, IS-NON-TANG.-SPAT.-PART-OF, IS-TANG.-SPAT.-PART-OF, IS-TOPO-INSIDE-OF, IS-GEO-INSIDE-OF, IS-SPAT.-EQUIV.-OF, HAS-NON-TANG.-SPAT.-PART, HAS-TANG.-SPAT.-PART, HAS-PARTIAL-SPATIAL-OVERLAP]

Example: joint anatomy
• joint HAS-HOLE joint space
• joint capsule IS-OUTER-LAYER-OF joint
• meniscus
  • IS-INCOMPLETE-FILLER-OF joint space
  • IS-TOPO-INSIDE joint capsule
  • IS-NON-TANGENTIAL-MATERIAL-PART-OF joint
• joint
  • IS-CONNECTOR-OF bone X
  • IS-CONNECTOR-OF bone Y
• synovia
  • IS-INCOMPLETE-FILLER-OF joint space
• synovial membrane IS-BONAFIDE-BOUNDARY-OF joint space

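The joint-anatomy statements above can be held as plain subject-relation-object triples and queried directly. This is a minimal sketch of that idea, not a description of how LinC Base stores them; the helper names `objects_of` and `subjects_of` are hypothetical, and the relation names are copied from the slide as written.

```python
# Each statement from the slide as a (subject, relation, object) triple.
TRIPLES = [
    ("joint", "HAS-HOLE", "joint space"),
    ("joint capsule", "IS-OUTER-LAYER-OF", "joint"),
    ("meniscus", "IS-INCOMPLETE-FILLER-OF", "joint space"),
    ("meniscus", "IS-TOPO-INSIDE", "joint capsule"),
    ("meniscus", "IS-NON-TANGENTIAL-MATERIAL-PART-OF", "joint"),
    ("joint", "IS-CONNECTOR-OF", "bone X"),
    ("joint", "IS-CONNECTOR-OF", "bone Y"),
    ("synovia", "IS-INCOMPLETE-FILLER-OF", "joint space"),
    ("synovial membrane", "IS-BONAFIDE-BOUNDARY-OF", "joint space"),
]


def objects_of(subject, relation, triples=TRIPLES):
    """All objects o such that (subject, relation, o) is asserted."""
    return [o for s, r, o in triples if s == subject and r == relation]


def subjects_of(relation, obj, triples=TRIPLES):
    """All subjects s such that (s, relation, obj) is asserted."""
    return [s for s, r, o in triples if r == relation and o == obj]


print(objects_of("joint", "IS-CONNECTOR-OF"))
print(subjects_of("IS-INCOMPLETE-FILLER-OF", "joint space"))
```

A flat triple list only answers what was asserted. Inferences such as inverse relations or transitivity of parthood would need the formal semantics of each relation, which the slide does not spell out.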
Linking external ontologies
[Diagram: external terms linked to L&C concepts]
• External terms: Snomed-RT “Convulsion” and Snomed-RT “Seizure”; MESH-2001 “Seizures” and MESH-2001 “Convulsions”; relations shown between them: ISA, IS-narrower-than
• L&C concepts: Health crisis, Seizure, Convulsion, Epileptic convulsion, connected by IS-A links
• Four Has-CCC links connect the external terms to the L&C concepts

Linguistic and domain ontologies
[Diagram labels: Generalised Possession (Has-possessor, Has-possessed); Human; Healthcare phenomenon; Having a healthcare phenomenon; Patient (Is-possessor-of, Has-Healthcare-phenomenon); Malignant neoplasm; Cancer patient; lung carcinoma; IS-A links numbered 1, 2 and 3]
Example sentence: Mr. Kovács has a pulmonary carcinoma

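IS-A links of the kind drawn on this slide support simple subsumption reasoning. The sketch below is illustrative only. The diagram was flattened in extraction, so the child-to-parent pairs are an assumed reading of it, not a transcription, and `ancestors` and `subsumed_by` are hypothetical helper names.

```python
# ASSUMED reading of the flattened diagram: child concept -> parent concept.
IS_A = {
    "Cancer patient": "Patient",
    "Patient": "Human",
    "lung carcinoma": "Malignant neoplasm",
    "Malignant neoplasm": "Healthcare phenomenon",
    "Having a healthcare phenomenon": "Generalised Possession",
}


def ancestors(concept, links=IS_A):
    """Follow IS-A links upwards and return the chain of parents, nearest first."""
    chain = []
    while concept in links:
        concept = links[concept]
        chain.append(concept)
    return chain


def subsumed_by(child, parent, links=IS_A):
    """True if parent is reachable from child through one or more IS-A links."""
    return parent in ancestors(child, links)


print(ancestors("lung carcinoma"))
print(subsumed_by("Cancer patient", "Human"))
```

Under this reading, a statement such as “Mr. Kovács has a pulmonary carcinoma” could be recognised as an instance of the more general pattern in which a Human possesses a Healthcare phenomenon, once both the possessor and the possessed have been placed in the hierarchy.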
Halliday’s systemic functional grammar
• A “complete” theory for NLU
• constructivistic basis: “language construes human experience”
  • English: It is raining
  • Chinese: The sky drops water
  • hence: natural languages are instances of generic schemes
• macro-structure of documents
  • derive a “structural formula”
• micro-structure of documents
  • lexical cohesion
  • in-conjunction analysis