GALEN and Simple-top-Bio

Upper Ontologies: An information systems & applications perspective (and how it led to Simple Top Bio) • Alan Rector, School of Computer Science / Northwest Institute of Bio-Health Informatics, rector@cs.man.ac.uk • with special acknowledgement to Jeremy Rogers • www.co-ode.org • www.clinical-escience.org • www.opengalen.org

GALEN and Simple-top-Bio • GALEN used “Conceptual Models” / “Models of meaning” to drive user interfaces and model the meaning of terminologies • Originated with PEN&PAD User Centred Design of clinical system for UK GPs • Scalable fractally tailorable representation of 10Ks of “forms” • Evolved from the bottom up to meet needs of • collaboration, information management and software engineering • support the required inferences while still being usable • Emphasis on relations ( “attributes” / “properties” ) as much as classes • Emphasis on “upper domain” rather than “top” • View from a point in time - time indexing external in medical record • Basic structure little altered since 1993

...which was about when Gruber started to talk about “ontologies” • Gruber and others borrowed the word “ontologies” to relabel the tools in use to describe and manage information • GALEN never used the word “ontology” • Arguably, a misnomer for the models to build sound scalable information systems • Simple Top Bio is a reformulation in OWL and modern language with some recent insights added • Make GALEN ideas available and give vocabulary for discussing them • Provide a vocabulary to discuss the consequences / inferences of upper ontologies • Help students with examples for teaching • Help people avoid “blank sheet of paper” syndrome • Meet the challenge: “Can you do it by ‘20 questions’?” • Released early on request; not finished

Plan of the Talk • Introduction • History and examples from GALEN (and elsewhere) • Goals & criteria for Simple Top Bio and overview • Summary

Where we started: “Models of meaning” express patterns for ... • User interfaces & Terminology • “All and only what it is sensible to say” • “Post coordination” & automatic classification • when are two expressions equivalent? • Reproducibility - high inter-rater reliability • Separation of language and concept - multi-lingual systems • Software engineering - Maintainability and evolvability • Parsimony • Avoidance of undetected unexpected side effects - each change in only one place • Consistent application of principles • Beating the combinatorial explosion • A “Terminology compiler” • Fractal knowledge & Fractal tailoring • Indefinitely extensible & tailorable • Avoiding the “PROMIS trap” • Re-use and complex data models - better schemas for • Complex dependencies • Highly variable structure

How to argue is as important as the conclusions. What counted as evidence: • Expressivity • Could we say what needed to be said • Inferences • Implications • Correct computation of classification, equivalence, inconsistency • Constraints • What it “did NOT make sense” to say (GALEN “grammatical” level) • at least, avoid gross category errors - “Green dream” • Understandability / usability • Could people understand it and use it reproducibly • Compatibility with existing usage • Avoidance of “Analysis paralysis” • Computability & Software engineering • Tractability of inference, classification and inconsistency checking • Clean inheritance • All changes in one, and only one place.

We needed patterns, e.g.... • To represent diseases, procedures, ...e.g. “Heart disease”, “Heart Operation”, ... • To capture “A disease of the part is a disease of the whole” • Mixing part-of and kind-of a major source of inconsistency & error • To ensure that parts and wholes were coordinated with modifiers • “Right Hand” could only be part of the “Right upper extremity” • Decide when to reify relations • “right hand” vs “to the right at 45°” • To coordinate • “Findings” - e.g. “cough”, “diabetes”, “hypertension” - • are present/absent • “Observables” - e.g. “blood pressure”, “diabetic control”, • have values, e.g. “130/80”, “good_control” • To manage defaults and exceptions • To manage conflicting variants

Example: We needed to manage arguments over “terms” • Labels and meanings • Does “Neoplasm” imply malignancy or merely “New growth”? • 100% agreement that we needed two meanings: • “New growth, benign or malignant” • “Malignant new growth” • Near agreement on relabeling • “New Growth” to “Proliferation” • to take in leukaemias and lymphomas • No agreement on what should be labelled by “Neoplasm” • In the end, “Neoplasm” and its cognates labelled different meanings for different national and clinical communities. • Internally used “Proliferation” and “Malignant Proliferation” • “Better not to be understood than misunderstood”

Does a chimp have two hands or four? What counts as evidence? To convey what information? To whom? To

Example: “Disorders” / “Conditions”: Start from what inferences should be supported: What questions should be answered • Normal and abnormal / pathological and “physiological” • Some things are noteworthy but do not require treatment • e.g. Complete situs inversus, old injuries, etc... • “abnormal but physiological” • Some things are “pathological” - require management • e.g. Pneumonia, cancers, diabetes, ... • Some things may be abnormal or pathological in certain cases • e.g. A heart with a patent ductus arteriosus • Some things are always abnormal or pathological • e.g. Malignant tumours, pneumonia, ... • Settled on has_pathological_status... • Normal / NonNormal • NonPathological / Pathological • IntrinsicallyPathological / IntrinsicallyNonNormal • Expressed so as to be classifiable by a reasoner

Normal / NonNormal / Pathological ...as organised by formal classification Captures the required inferences and distinctions - easy with logic, hard otherwise

Those distinctions let us answer: What is a “disease” / “disorder” / “condition”? • Anything pathological • What does it make sense to describe as pathological? • Anatomical structures • either intrinsically or when distorted in some way • Lesions • Structures that are always pathological - e.g. malignant tumours • Processes • either intrinsically or when altered • Qualities - either intrinsically or when altered • Therefore the domain of “has_pathological_status” includes • At least the disjunction of the above • For lack of a better label, GALEN labeled the disjunction “Phenomenon” • Note that it crosses traditional ontological boundaries • Independent / Dependent • Continuant / Occurrent • But it is needed to represent the information • “What is / might be wrong with this patient?”

Then we can answer “What is ‘Heart disease’?” • “A pathological phenomenon involving the heart” • How to represent “involving”? • What are the ways the heart may be involved? • is_quality_of • has_object (GALEN: “actsOn”) • has_actor (Linguists: has_agent) • is_located_in • Must interact with is_part_of so that “disorders of the part are diseases of the whole” • GALEN labelled it hasLocation; Simple Top Bio has_locus • GALEN used role inclusion to say • X has_locus (Y THAT is_part_of SOME Z) --> X has_locus SOME Z, i.e. has_locus o is_part_of --> has_locus • Simple Top Bio gets the same result by rewriting: “Heart disease” means “Disorder of Heart or any part of the heart” • More easily expressed in OWL • And more convenient for saying “Diseases of the whole” than role inclusion • (SEP Triples and variants - Schulz)
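The role inclusion has_locus o is_part_of --> has_locus can be illustrated with a small sketch. This is not GALEN's or Simple Top Bio's implementation; the anatomy and disorder names below are hypothetical toy data, and a plain dictionary lookup stands in for a description logic reasoner.

```python
# Hypothetical toy data: structure and disorder names are illustrative only.
IS_PART_OF = {
    "Mitral_valve": "Heart",
    "Left_ventricle": "Heart",
    "Heart": "Cardiovascular_system",
}

# Asserted (direct) locus of each disorder.
HAS_LOCUS = {
    "Mitral_stenosis": "Mitral_valve",
    "Ventricular_hypertrophy": "Left_ventricle",
    "Pneumonia": "Lung",
}


def wholes_of(part):
    """Yield every structure that `part` is, directly or indirectly, a part of."""
    seen = set()
    while part in IS_PART_OF and part not in seen:
        seen.add(part)  # guard against accidental cycles in the toy data
        part = IS_PART_OF[part]
        yield part


def loci(disorder):
    """Apply has_locus o is_part_of --> has_locus repeatedly to the asserted locus."""
    direct = HAS_LOCUS[disorder]
    return {direct, *wholes_of(direct)}


def disorders_of(structure):
    """All disorders whose inferred loci include `structure`."""
    return sorted(d for d in HAS_LOCUS if structure in loci(d))


print(disorders_of("Heart"))
```

Querying for the heart returns the disorders asserted only on its parts, which is the "disorder of the part is a disorder of the whole" inference. The Simple Top Bio rewriting ("Disorder of Heart or any part of the heart") would move the same walk up the partonomy into the query instead of the property axiom.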

Other Questions: Which relations to reify? Qualities (“Features”) & “Selectors” • Needed to be able to say “Body temperature that was elevated, 38°C & rising, but less than yesterday’s” • Needed to represent “Body Temperature” so we could talk about it • Similarly for many but not all “Features” • So for consistency represented all such features by a consistent pattern • has_quality SOME (Quality THAT has_state SOME State) • plus cardinality constraints • ... but • Not true for “selectors” • “Right/Left” as in “right hand”; “upper/middle/lower” as in lobe of lung • Therefore has_left_right_selector not reified • Right_hand == Hand THAT has_left_right_selector VALUE right_laterality

... but... • Users found reification verbose and difficult • So provided syntactic sugar (“Intermediate representation”) • Or can define summary qualities • Severe == Severity THAT has_relative_state SOME Severe_state • Then can express “severe” in simple cases by • has_quality SOME Severe. • (GALEN’s tools made rewriting easier. OWL makes definitions easier.)

An example of logically equivalent forms • Rewrites between logically equivalent forms are just “coordinate transformations” • Provided they are fully specified • In this case must include correct cardinality & disjoint constraints • THING has_quality MAX 1 Severity • Severity = Mild OR Moderately_severe OR Severe • DISJOINT Mild, Moderately_severe, Severe • The choice then hinges on • Usability • Computability • Expressive adequacy
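A minimal sketch of such a "coordinate transformation", assuming the summary and reified forms of severity described above. The function names and the state names other than Severe_state are hypothetical; an Enum stands in for the covering and disjointness axioms, and a simple count stands in for the MAX 1 constraint.

```python
from enum import Enum


class Severity(Enum):
    """Severity = Mild OR Moderately_severe OR Severe, with DISJOINT members."""
    MILD = "Mild"
    MODERATELY_SEVERE = "Moderately_severe"
    SEVERE = "Severe"


STATE_SUFFIX = "_state"  # assumed naming convention, generalised from Severe_state


def expand(summary):
    """Summary form (e.g. 'Severe') -> reified form (quality, property, state)."""
    value = Severity(summary)  # raises ValueError outside the covering set
    return ("Severity", "has_relative_state", value.value + STATE_SUFFIX)


def summarise(reified):
    """Reified form -> summary form; the inverse of expand()."""
    quality, prop, state = reified
    if (quality, prop) != ("Severity", "has_relative_state"):
        raise ValueError("not a reified severity")
    if not state.endswith(STATE_SUFFIX):
        raise ValueError("unrecognised state name")
    return Severity(state[: -len(STATE_SUFFIX)]).value


def satisfies_max_one(qualities):
    """THING has_quality MAX 1 Severity: at most one severity among the qualities."""
    names = {member.value for member in Severity}
    return sum(1 for q in qualities if q in names) <= 1


print(expand("Severe"))
print(summarise(expand("Moderately_severe")))
```

The round trip is lossless only because the value set is closed and its members are distinct. Without those constraints the two forms would not be equivalent, which is the point of "provided they are fully specified".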

Similar arguments for the perennial problem: “Findings” and “Observables” • “has SOME Diabetes” conveys information • “has SOME Body_temperature” conveys no information • All animals have a body temperature (though it may be ambient) • Animal --> has_quality EXACTLY 1 Body_temperature • (At a given time, place and observer) • The value or state of Body_temperature conveys information • has SOME (Body_temperature THAT has_quantitative_state VALUE 39°C AND has_expected_state SOME Elevated) • “Having diabetes” and “Having a body temperature that is elevated” are analogous • Meta criterion • “Observables” with specified states act as “findings”
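The meta criterion can be sketched as a simple check. This is an illustration only: the `Has` class, the `conveys_information` function and the set of universal qualities are hypothetical names introduced here, not part of GALEN or Simple Top Bio.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption for illustration: qualities that every animal has exactly one of,
# following "Animal --> has_quality EXACTLY 1 Body_temperature".
UNIVERSAL_QUALITIES = frozenset({"Body_temperature"})


@dataclass(frozen=True)
class Has:
    """A 'has SOME <filler>' statement, optionally with a specified state."""
    filler: str
    state: Optional[str] = None


def conveys_information(statement):
    """An observable acts as a finding only once its state is specified."""
    if statement.filler in UNIVERSAL_QUALITIES:
        return statement.state is not None
    return True  # a finding such as Diabetes is informative by its presence


print(conveys_information(Has("Diabetes")))
print(conveys_information(Has("Body_temperature")))
print(conveys_information(Has("Body_temperature", state="Elevated")))
```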

... and similarly, inferences and information for “clay and statues” or tissues and organs • “The statue is made of clay” or “The statue is an amount of clay” • The “liver parenchyma” is made of “liver parenchymal tissue” • Different things to be said/inferred about clay and statues • The granularity and density of the clay; the shape of the statue • The tissue has arrangement, distortion, distribution of cell types, ...; the liver has size and shape • Either - or both - have mass • and if the statue is entirely made of clay, the same mass

Lessons for upper “ontologies” for Information Systems: Consider the consequences; Motivate from the bottom up • Ontology Layers: What’s it for? • The Meta Ontology is to enable… • Cooperation on the Upper Ontologies to enable… • Cooperation on Top Domain Ontologies to enable… • Cooperation on the Domain Content Ontologies to enable… • Cooperation on Information systems & resources

Principles for Simple Top Bio • Support inference of classification of top level domain ontology • The goal is to help domain experts create their starting points and patterns • Just enough • No distinction without a difference! • Properties are as important as Classes/Entities/Concepts • If an upper level category does not act as a domain or range constraint or have some other engineering effect, why represent it? • Exclude things that will be dealt with by other means or given • “Concrete domains” - strings, numbers, etc. but not quantities • Limit representation to just what is needed for • Time and place • Non_physical – e.g. agency • Causation – except in sense of “aetiology” • Understandable and reproducible • “Twenty questions approach” • For each entity a property, a paraphrase, examples, and questions. • Implementable in OWL/DL with QCRs (Now OWL 1.1) • Potentially support a large ontology • Actually implemented and testable

Tour from the top down... • The very top • Domain_entity • Always good practice to provide your own top • You may want to create ‘probes’ or do other nasty workarounds. • The real ontology is under Domain_entity • NB: owl:Thing has different semantics in OWL-Full and OWL-DL • Insulate yourself from standards oddities

Basic distinctions • Self-standing vs Refining • Self-standing • e.g. “Person”, “Computer”, “Idea”… • Word chosen to allow discussion of whether it is really the same as “independent” • If you use the same word, it is hard to discuss • “Better not to be understood than to be misunderstood” • Refining • e.g. “size”, “big”, “serious”, … • Self_standing_entity is_refined_by Refining_entity • Establishes the domain & range of a top property distinction • has_quality is a child of “refines” • Question: Does it make sense on its own? Must the list stay open? • If so, self_standing.

Within Self Standing • Continuant vs Occurrent • Self_standing_entity participates_in Occurrent_entity • Physical vs Non_physical • Non_physical is_manifested_by Physical • Only physical can be material • Material defines non_material (things define holes) • Discrete vs Mass • Discrete_entity is_constituted_of Mass_entity • Complex – all collections, relations, groups, etc. • No opposite – all arguments deferred • Complex has_member Self_standing_entity • (Biological – Non-biological) • Biological is domain of many relations • Take them one at a time.

Continuant vs Occurrent • “Processes happen to things” • Continuants participate_in Occurrents • Occurrents can also participate in other Occurrents • But only occurrents can be participated in • Occurrent is domain for has_participant • Continuants (“endurants”) • Things that retain their form over time • People, books, desks, water, ideas, universities, … • Occurrents (“perdurants”) • Things that occur during time • Living, writing a book, sitting at a desk, the flow of water, thinking, building the university, ... • Question: Do things happen to it? Then Continuant. Does it happen or occur? Then Occurrent.

Properties for Occurrents: Processes act on things (& other processes) and have actors • One kind of participation is acting on (having an “object”) • Occurrent acts_on Self_standing_entity • Linguists call it “agency” but that label gets muddled up with legal agency and responsibility • Another kind of participation is being an actor (“agency”) • Occurrent has_actor Self_standing_entity • Can one occurrent be the actor for another? • Defer the choice

and Processes have outcomes • One form of acting-on something is having it as an outcome • Represented in the property hierarchy • has_participant > acts_on > has_outcome (each property a sub-property of the one before) • Occurrent has_outcome Self_standing_entity • Outcomes can be either Continuants or Occurrents • But only Occurrents have outcomes • Check the Domain and Range of has_participant
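The effect of placing has_outcome under acts_on under has_participant can be sketched as follows. This is a toy illustration rather than a reasoner: the triples are hypothetical, and has_actor is placed under has_participant on the assumption that being an actor is a kind of participation, as the preceding slide states.

```python
# Each property maps to its direct super-property.
SUPER_PROPERTY = {
    "has_outcome": "acts_on",
    "acts_on": "has_participant",
    "has_actor": "has_participant",
}


def with_super_properties(prop):
    """Return `prop` followed by all of its ancestors in the property hierarchy."""
    chain = [prop]
    while chain[-1] in SUPER_PROPERTY:
        chain.append(SUPER_PROPERTY[chain[-1]])
    return chain


def entailed(assertions):
    """Close a set of (subject, property, object) triples under sub-property axioms."""
    return {
        (subject, inherited, obj)
        for (subject, prop, obj) in assertions
        for inherited in with_super_properties(prop)
    }


# Hypothetical assertions for illustration.
asserted = {
    ("Ulceration", "has_outcome", "Ulcer"),
    ("Surgery", "has_actor", "Surgeon"),
}

for triple in sorted(entailed(asserted)):
    print(triple)
```

Because every outcome is thereby also a participant, any domain or range constraint stated once on has_participant applies to has_outcome as well, which is why the slide suggests checking the domain and range of has_participant.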

Which gives rise to the problem of Process-Outcome “duals” • e.g. “Ulcer” and “Ulceration”, “Erosion (lesion)” and “Erosion (process)”, etc. • How to avoid duplication of service • Ulcer == Lesion THAT is_outcome_of Ulceration OR • Ulceration == Process THAT has_outcome Ulcer has_potential_outcome • Requirements - to infer: • All ulcers must have been caused by ulceration • It is not the case that all ulceration results in ulcers (but never contradictory) • Ulceration has duration, etc. Ulcers have diameter, depth, etc. • Good approximation (post GALEN rather than GALEN) • Lesion --> is_outcome_of SOME Process • Ulceration == Process THAT has_outcome SOME Ulcer • has o has_outcome --> has • THEN: has SOME Ulceration subsumes has SOME Ulcer as required

Physical vs non-Physical • Physical entities manifest non-physical patterns; Physical entities embody non-physical agents • Physical entities have energy or mass and occupy space or time • bodies, electricity, water, buildings, burning, cavities, planes and lines formed by the intersection of physical things… • Nonphysical things • Describe “Patterns” • Forms, styles, ‘oeuvres’, … • Describe “psycho-social phenomena” • Organisations, agents, institutions, ideas • Question: Does it have mass or energy? Does it occupy space at some time? Then it is (probably) physical.

Material vs Non-material Physical things • Within Physical_entities • The problem of holes • Material things define non-material things • The room defines the interior of the room • The glass defines the space in the glass • The donut defines the hole in the donut • The intersection of the walls defines the corner • Good example of where upper ontologies provide ready answers to otherwise puzzling questions • A very useful insight from ontology • GALEN was rather less elegant

Discrete vs Mass • Things are made of Stuff: Discrete_entities are constituted of Mass_entities • The statue vs the clay of which the statue is made • The liver vs the tissue that makes up the liver • The table top vs the wood that constitutes the table top • Discrete things can be counted; Mass things can only be measured • Guarino calls them “Amount of matter” • An instance of a mass stuff is an amount of that stuff • Questions: Can I count it? Then it is probably discrete. • Other hints: If I make a plural, is it odd or something different (e.g. “waters”, “papers”, “thinkings”), or do plurals mean different kinds (e.g. “paints”, “tissues”)? Do I say pieces/drops/lumps of it? Then it is probably mass.
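The “twenty questions” approach can be sketched as a small decision procedure built from the questions posed on the slides so far. The question keys and the `classify` function are hypothetical names introduced for illustration, and the answers are supplied by a person, not inferred. Discrete vs Mass is only asked for continuants here, since the talk defers that distinction for processes.

```python
def classify(answers):
    """Map yes/no answers (a dict of bools) to the categories they suggest."""
    # "Does it make sense on its own?" If not, it refines something else.
    if not answers["makes_sense_on_its_own"]:
        return ["Refining_entity"]
    categories = ["Self_standing_entity"]

    # "Does it happen or occur?" versus "Do things happen to it?"
    is_occurrent = answers["happens_or_occurs"]
    categories.append("Occurrent_entity" if is_occurrent else "Continuant")

    # "Does it have mass or energy? Does it occupy space at some time?"
    categories.append("Physical" if answers["has_mass_or_energy"] else "Non_physical")

    # "Can I count it?" Skipped for occurrents and when no answer is given.
    if not is_occurrent and "can_be_counted" in answers:
        categories.append("Discrete_entity" if answers["can_be_counted"] else "Mass_entity")
    return categories


liver = {
    "makes_sense_on_its_own": True,
    "happens_or_occurs": False,
    "has_mass_or_energy": True,
    "can_be_counted": True,
}
print(classify(liver))
```

The answers are only hints ("probably discrete", "probably physical" in the slides' wording), so the output is a starting point for a domain expert, not a definitive classification.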

Mass / Discrete also seems to make sense for processes • “Walking” vs “A Walk”, “Lecturing” vs “A Lecture”, etc. • But has always proved not to be reproducible: “Digestion” vs “Digestion of a meal” • GALEN dropped it; Lenat and Guha dropped it for Cyc; DOLCE does not support it (doesn’t fit BFO at all) • Decision deferred in Simple Top Bio • Nothing prevents the notion of a “discrete occurrent” or “mass occurrent”, but not named

Basic Distinctions

A better way to explore an ontology - Pick something and look at it from bottom up: A Cell - before reasoning / classification

And its classification: Cell - after reasoning / classification

Property Hierarchy • Motivates the class hierarchy • e.g. Domains and Ranges • Provides inference • e.g. Flavours of is_part_of • Maintains distinctions • e.g. Membership and containment vs partonomy • Less well supported by most tools

Summary and Lessons • Upper “ontologies” express patterns for information • Motivation begins with the inferences to be drawn • Properties as important as classes • Distinctions without a difference are useless • To be useful & meaningful, an ontology must constrain - i.e. define errors - otherwise • Errors go undetected • Arguments cannot be settled • For re-use, defer commitment • Application distinctions may not follow ontological lines • Human factors critical • Distinctions must be understandable and reproducible • Formal, testable representations empower users • reduce need for gurus • Ultimately models are mathematical theories • Tested by their consequences / inferences • Effects in software
