How to build an ontology 2 • Barry Smith • http://ontology.buffalo.edu/smith
The 3-level Distinction • Level 1: • everything that exists (things, processes, data …); • Level 2: • ideas in people’s minds (diagnoses, thoughts, images in your head, expectations, beliefs, fears …) • Level 3: • publicly available (published, written down, drawn, recorded, saved) versions of level 2 entities (ontologies, databases, journal articles, newspaper reports, diaries …)
The 3-level Distinction • Level 1: • #120: an incident that happened; • Level 2: • #213: the interpretation by some cognitive agent that #120 is a security breach; • #31: the expectation by some cognitive agent that similar incidents might happen in the future; • Level 3: • #402: an entry in an information system concerning #120; • #1503: an entry in some other information system about #31 for mitigation or prevention purposes.
How do we know which general terms designate universals? • Roughly: terms used by scientists to designate entities about which we have a plurality of different kinds of testable proposition • (cell, electron ...)
More precisely: terms which designate universals are: • General • Used in current scientific textbooks to express laws of nature • Logically non-compound (‘non-rabbit’, ‘rabbit or violin’ do not designate universals) • Free of parts designating particulars (‘cat in Leipzig’, ‘Finnish spy’ do not designate universals)
Class =def • a maximal collection of particulars determined by a general term • (‘cell’, ‘electron’, but also: ‘restaurant in Palo Alto’, ‘Italian’) • the class A • = the collection of all particulars x for which ‘x is A’ is true
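The definition of a class can be sketched in Python. This is only an illustration, assuming that particulars are modelled as plain records and a general term as a predicate over them; `Particular`, `class_of` and the sample data are hypothetical names invented here, not part of the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Particular:
    name: str
    kind: str
    city: str


def class_of(term, particulars):
    """Return the maximal collection of particulars x for which 'x is A' is true."""
    return frozenset(x for x in particulars if term(x))


# Hypothetical particulars, invented for illustration only.
world = [
    Particular("r1", "restaurant", "Palo Alto"),
    Particular("r2", "restaurant", "Leipzig"),
    Particular("c1", "cell", "Palo Alto"),
]


def is_restaurant_in_palo_alto(x):
    # A general term that determines a class but, on the slides' criteria,
    # does not designate a universal (it mentions a particular place).
    return x.kind == "restaurant" and x.city == "Palo Alto"


restaurants_in_palo_alto = class_of(is_restaurant_in_palo_alto, world)
print(sorted(p.name for p in restaurants_in_palo_alto))
```

The class is "maximal" in the sense that every particular satisfying the term is included; nothing in the code decides whether the term also designates a universal.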
universals vs. their extensions • universals • {a,b,c,...} collections of particulars
Extension =def • The extension of a universal A is the class: instance of the universal A • (it is the class of A’s instances) • (the class of all entities to which the term ‘A’ applies)
Problem • The same general term can be used to refer both to universals and to collections of particulars. Consider: • HIV is an infectious retrovirus • HIV is spreading very rapidly through Asia
universals vs. classes • universals • {c,d,e,...} classes
universals vs. classes • universals • defined classes
universals vs. classes • universals • populations, ...
Defined class =def • a class defined by a general term which does not designate a universal • the class of all diabetic patients in Leipzig on 4 June 1952
OWL is a good representation of defined classes • sibling of Finnish spy • member of Abba aged > 50 years
Terminology =def. • a representational artifact whose representational units are natural language terms (with IDs, synonyms, comments, etc.) which are intended to designate universals together with defined classes.
? universals, classes, concepts • universals • defined classes • ‘concepts’
universals < defined classes < ‘concepts’ • ‘concepts’ which do not correspond to defined classes: • ‘Surgical or other procedure not carried out because of patient's decision’ • ‘Congenital absent nipple’ • because they do not correspond to anything
(Scientific) Ontology =def. • a representational artifact whose representational units (which may be drawn from a natural or from some formalized language) are intended to represent • 1. universals in reality • 2. those relations between these universals which obtain universally (= for all instances) • lung is_a anatomical structure • lobe of lung part_of lung
How to build an ontology • work with scientists to create an initial top-level classification • find ~50 most commonly used terms corresponding to universals in reality • arrange these terms into an informal is_a hierarchy according to this Universality principle: • A is_a B =def every instance of A is an instance of B • fill in missing terms to give a complete hierarchy • (leave it to domain scientists to populate the lower levels of the hierarchy)
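The Universality principle for is_a can be checked mechanically once some instance data is available. The following is a minimal sketch, assuming each term is mapped to a set of instance identifiers; the function name and the sample instances are hypothetical.

```python
def violates_is_a(instances_of, child, parent):
    """Return the instances of `child` that are not instances of `parent`.

    An empty result means the known data is consistent with `child is_a parent`.
    """
    return instances_of[child] - instances_of[parent]


# Hypothetical instance data, invented for illustration only.
instances_of = {
    "lung": {"lung#1", "lung#2"},
    "anatomical structure": {"lung#1", "lung#2", "heart#1"},
}

print(violates_is_a(instances_of, "lung", "anatomical structure"))
print(violates_is_a(instances_of, "anatomical structure", "lung"))
```

A finite data set can only refute an is_a assertion, never prove it; the assertion itself is a claim about all instances.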
Principle of Low Hanging Fruit • Include even absolutely trivial assertions (assertions you know to be universally true) • pneumococcal virus is_a virus • Computers need to be led by the hand
Goal: Each term in an ontology represents exactly one universal • there are universals also of collectivities: • population • complex of cells
the use-mention confusion • swimming is healthy and has eight letters
Principle • Avoid confusing words with things • Avoid confusing concepts in our minds with entities in reality • Recommendation: avoid the word ‘concept’ entirely
Principle • For the sake of interoperability with other ontologies, do not give special meanings to terms with established general meanings • (Don’t use ‘cell’ when you mean ‘plant cell’)
Principle • Supply definitions wherever possible • (both human-understandable natural language definitions, and equivalent formal definitions)
Principle • Each term should have at most one definition • which may have both natural-language and formal versions
The Problem of Circularity • A Person = def. A person with an identity document • cell = def. plant cell, consisting of protoplast and cell wall; ...
Principle • Avoid circular definitions • (The term defined should not appear in its own definition)
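A rough check for the direct circularity described above can be sketched as follows. It assumes definitions are plain strings and only looks for the defined term as a whole word or phrase; the function name is hypothetical, and indirect circles (A defined via B, B via A) or inflected forms are not detected.

```python
import re


def is_circular(term, definition):
    """True if the defined term occurs as a whole word or phrase in its own definition."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, definition.lower()) is not None


# The two circular examples from the slides, plus one non-circular definition.
print(is_circular("person", "a person with an identity document"))
print(is_circular("cell", "plant cell, consisting of protoplast and cell wall"))
print(is_circular("human being", "an animal which is rational"))
```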
Principle • A definition should use terms which are easier to understand than the term defined
Principle • Use Aristotelian definitions • An A is a B which C’s. • A human being is an animal which is rational
Principle • Do not seek to define everything
In every ontology • some terms and some relations are primitive = they cannot be defined (on pain of infinite regress) • Examples of primitive relations: • identity • instance_of
Rules for formatting terms • Avoid abbreviations even when it is clear in context what they mean (‘breast’ for ‘breast tumor’) • Avoid acronyms • Avoid mass terms (‘tissue’, ‘brain mapping’, ‘clinical research’ ...) • Treat each term ‘A’ in an ontology as shorthand for a term of the form ‘the universal A’
Univocity • Terms should have the same meanings on every occasion of use. • (= They should refer to the same universals) • Basic ontological relations such as is_a and part_of should be used in the same way by all ontologies
Universality • Ontologies are made of relational assertions • They should include only those which hold universally • pneumococcal virus causes pneumonia
Universality • Often, order will matter: • We can assert • adult transformation_of child • but not • child transforms_into adult
Universality • viral pneumonia caused by virus • but not • virus causes pneumonia • pneumococcal virus causes pneumonia
Universality • results analysis later_than protocol-design • BUT NOT • protocol-design earlier_than results analysis
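Why the direction of an assertion matters can be illustrated with the caused-by example. The sketch below assumes one particular reading of "holds universally", namely that every instance of the first term stands in the relation to at least one instance of the second; that reading, the function name, and the instance data are assumptions made for illustration.

```python
def holds_universally(instances_a, instances_b, related):
    """Check that every instance of A is related to at least one instance of B."""
    return all(
        any((a, b) in related for b in instances_b)
        for a in instances_a
    )


# Hypothetical instance data, invented for illustration only.
viruses = {"v1", "v2"}
viral_pneumonias = {"p1"}
caused_by = {("p1", "v1")}                 # pneumonia p1 was caused by virus v1
causes = {(v, p) for (p, v) in caused_by}  # the same facts read in the other direction

# viral pneumonia caused_by virus: every case has some virus as its cause.
print(holds_universally(viral_pneumonias, viruses, caused_by))
# virus causes pneumonia: fails, because v2 causes no pneumonia.
print(holds_universally(viruses, viral_pneumonias, causes))
```

The instance-level facts are identical in both calls; only the class-level assertion built on top of them changes.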
Positivity • Complements of universals are not themselves universals. • Terms such as • non-mammal • non-membrane • other metalworker in New Zealand • do not designate universals in reality
Positivity • What about non-smoker?
Objectivity • Which universals exist in reality is not a function of our knowledge. • Terms such as • unknown • unclassified • unlocalized • arthropathies not otherwise specified • do not designate universals in reality.
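A simple lint for the Objectivity principle might look like the following. It is a sketch that assumes the marker words listed on the slide are the only ones checked; the constant and function names are hypothetical, and a real terminology would need a longer, curated list.

```python
# Marker phrases taken from the slide's examples; the list is not exhaustive.
EPISTEMIC_MARKERS = (
    "unknown",
    "unclassified",
    "unlocalized",
    "not otherwise specified",
)


def epistemic_markers_in(term):
    """Return the epistemic marker phrases that occur in a term label."""
    lowered = term.lower()
    return [marker for marker in EPISTEMIC_MARKERS if marker in lowered]


for label in ["arthropathies not otherwise specified", "lung", "Unclassified tumor"]:
    print(label, "->", epistemic_markers_in(label))
```

Any term that returns a non-empty list is a candidate for review, since it may describe the state of our knowledge and not a universal in reality.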
Keep Epistemology Separate from Ontology • If you want to say that • We do not know where A’s are located • do not invent a new class of • A’s with unknown locations • (A well-constructed ontology should grow linearly; it should not need to delete classes or relations because of increases in knowledge)
Keep Sentences Separate from Terms • If you want to say • I surmise that this is a case of pneumonia • do not invent a new class of surmised pneumonias • Confusion of ‘findings’ in medical terminologies
Single Inheritance • No kind in a classificatory hierarchy should have more than one is_a parent on the immediate higher level
Multiple Inheritance • thing • car • blue thing • blue car is_a car • blue car is_a blue thing
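The single-inheritance rule can be checked on a list of is_a assertions. The sketch below assumes the hierarchy is given as (child, parent) pairs; the function name is hypothetical, and the edges from car and blue thing up to thing are an assumed reading of the diagram.

```python
from collections import defaultdict


def multiple_parents(is_a_edges):
    """Return each term that has more than one direct is_a parent, with those parents."""
    parents = defaultdict(set)
    for child, parent in is_a_edges:
        parents[child].add(parent)
    return {child: sorted(ps) for child, ps in parents.items() if len(ps) > 1}


# The blue car example; the two edges to "thing" are assumed, not stated on the slide.
edges = [
    ("car", "thing"),
    ("blue thing", "thing"),
    ("blue car", "car"),
    ("blue car", "blue thing"),
]

print(multiple_parents(edges))
```

An empty result means every term has at most one direct is_a parent in the given assertions.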
Multiple Inheritance • is a source of errors • encourages laziness • serves as obstacle to integration with neighboring ontologies • hampers use of Aristotelian methodology for defining terms • hampers use of statistical search tools
Multiple Inheritance • thing • blue thing • car • is_a1 • is_a2 • blue car
is_a Overloading • The success of ontology alignment demands that ontological relations (is_a, part_of, ...) have the same meanings in the different ontologies to be aligned.