Robust Ontology Design. Tutors: Aldo Gangemi, Valentina Presutti Lecture@SSSW2011 STLab, ISTC-CNR, Rome, Italy { aldo.gangemi,valentina.presutti}@cnr.it. 1. Outline. Ontologies and quality Ontology design The ODP programme for best practice collection Knowledge-independent patterns
Robust Ontology Design • Tutors: Aldo Gangemi, Valentina Presutti • Lecture@SSSW2011 • STLab, ISTC-CNR, Rome, Italy • {aldo.gangemi,valentina.presutti}@cnr.it
Outline • Ontologies and quality • Ontology design • The ODP programme for best practice collection • Knowledge-independent patterns • Knowledge patterns • Experimental findings • Knowledge pattern extraction
The mothers of • Formal interpretation gives us a precise way to establish what we are talking about, and therefore to provide reliable automated inferences when needed • Natural language is able to describe very different types of facts with similar structures, for example ...
... different types of facts ... • Wile buys from ACME [ground fact] • ACME has been reported for abusive discharge [reported fact] • Wile is blonde [attributive fact] • Wile is a coyote [classification fact] • To discharge is to release someone from a job [meaning fact] • To discharge is a transitive verb [terminological fact] • Discharge is nine characters long [information fact] • Discharge is a class [formal fact] • Discharge can be abusive in Italy [contextual fact] • Discharge represents a failure [interpretive fact]
We live in a wild world • Neat’s empire (cf. Jim’s DL tower) • The scruffy web (cf. Jim’s Web tower) • Legacy information systems • Cultural dynamics within organizations and the social world • Long history of attempts to get rid of ambiguity • From Aristotle to Leibniz, Neopositivism, and eventually Logical AI • Long history of attempts to take advantage of ambiguity • Cf. E. Bencivenga’s Hegel's Dialectical Logic • In different contexts you may need different levels of ambiguity resolution • Inconsistency and incompleteness are not always important • It is very important to deal with many-flavored linguistic and common-sense semantic phenomena (cf. Watson and the Turing test) • Meaning = contextual relevance?
Pythagorean categories SSSW09
Computational ontologies • “Ontology” used to be a philosophical notion • originally a philosophical name (Lorhardus) • … but also “semantic web” (Michel Foucault) • Computational Ontologies • software components • expressed and managed in formal languages • E.g. standard W3C languages like RDF, OWL, RIF, SPARQL • Ontology design is the core aspect of semantic technologies • Quality is associated with good design
What we can do with OWL • ... (usually) we can check consistency/coherence, classify, and query knowledge • a lot of modelling practices embedded into it • definitions, restrictions, booleans, nominals, chains, punning, etc. • this is great, but ...
Logical constructs alone do not always help • e.g. owl:sameAs can be wrongly used and still we have consistency (but we could also get inconsistency: cf. Hayes and Halpin’s experiment on LOD) • Why is logic not enough?
What logical primitive? • E.g. OWL gives us constructs from a logical language, but does not give us any guidelines on how to use them in order to solve our tasks • E.g. modeling something (even the same sense) as an individual, a class, or an object property can be quite arbitrary • Modelling “styles”, typically because of specific reasoning requirements
Heterogeneous requirements • cf. Semantic Web Interest Group post May 27th, 2008 by Zille Huma: • "I have been wondering for sometime now that why isn't it a popular trend to store standard activities of a domain in the ontology and not only the concepts, e.g., for the tourism domain, ontologies normally contain concepts like Tourist, Resort, etc. but I have not so far come across an ontology that also contains the standard activities like searchResort, bookHotel, etc. Why is it so? What support is provided in the ontology langauges to model the standard activities of the domain as well?" • (1) “searching resorts is a type of functionality required for this kind of services” • owl:Class(searchResort) rdfs:subClassOf(Functionality) • (2) “a functionality for searching resorts is implemented in our web service” • owl:Individual(searchResort) rdf:type(Functionality) • (3) “who has been searching for what resorts in our web service?” • owl:ObjectProperty(searchResort) rdfs:domain(Customer) rdfs:range(Resort) • (4) “how many users have been using our resort searching functionality?” • owl:DatatypeProperty(searchResort) rdfs:domain(Customer) rdfs:range(xsd:boolean)
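The four readings of searchResort above can be made concrete by writing each one as a small set of triples. This is only a sketch: it uses plain Python tuples instead of an RDF library, the prefixed names are ordinary strings, and the helper `kind_of` is hypothetical, introduced here for illustration.

```python
# Each modelling choice for searchResort, written as (subject, predicate, object)
# triples. Plain strings stand in for IRIs; no RDF library is assumed.
MODELLINGS = {
    "class": {
        ("searchResort", "rdf:type", "owl:Class"),
        ("searchResort", "rdfs:subClassOf", "Functionality"),
    },
    "individual": {
        ("searchResort", "rdf:type", "Functionality"),
    },
    "object_property": {
        ("searchResort", "rdf:type", "owl:ObjectProperty"),
        ("searchResort", "rdfs:domain", "Customer"),
        ("searchResort", "rdfs:range", "Resort"),
    },
    "datatype_property": {
        ("searchResort", "rdf:type", "owl:DatatypeProperty"),
        ("searchResort", "rdfs:domain", "Customer"),
        ("searchResort", "rdfs:range", "xsd:boolean"),
    },
}

OWL_KINDS = {"owl:Class", "owl:ObjectProperty", "owl:DatatypeProperty"}


def kind_of(term, triples):
    """Return the OWL kind declared for term, or 'individual' if none is declared."""
    for s, p, o in triples:
        if s == term and p == "rdf:type" and o in OWL_KINDS:
            return o
    return "individual"


for name, triples in MODELLINGS.items():
    print(name, "->", kind_of("searchResort", triples))
```

The same name ends up with four different logical types, one per requirement, which is why the choice cannot be made from the language alone.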
Quality • STLab people research from 2004-5: “A formal framework for ontology evaluation and selection” • Three quality dimensions: Structural-Content-Sustainability • Content is the primary dimension • Content compliance spans Coverage-Task-SelfExplanation • Task is the immediately measurable aspect • Quality is not maximal and abstract, but bound to context • Partial orders of problems and reusable solutions • Good practices (history) • Empirical methods for evaluation (measurability)
Examples of checking tools • Graph measures • Reasoners: HermiT, Pellet, etc. • LINTs: Pellet, OPPL (custom tests) • agghiai-2:pellet-2.2.2 agghiai$ sh pellet.sh lint -v /Users/agghiai/Workspaces/AllPatterns/dul/DUL.owl • No RDF lints found. • No OWL 2 DL violations found for ontology <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl> • OWL Lints found for ontology <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl>: • [EquivalentAndSubclassAxiomPattern: A named concept appears in equivalent axiom(s) and on the left-hand side of a subclass axiom] • <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Agent> <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#InformationEntity> <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#InformationRealization> <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#LocalConcept> <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Object> <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person> <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#PlanExecution> <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#SocialObject> <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#WorkflowExecution> • [ExistentialExplosionPattern (MaxTreeSize = 10000): Concepts/Individuals are involved in a large some/min/exact value restrictions tree/loop - maximum recommended number of generated nodes is 10000] • - [3.87E10] <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Goal> <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Description> <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Entity> <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Object> <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#PhysicalAgent> <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#SocialObject> ... and 5 more. • XD Analyzer • rule-based tests: typical sources of errors such as domain intersection
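The EquivalentAndSubclassAxiomPattern lint reported above can be illustrated with a few lines of Python. This is not Pellet's implementation, only a sketch of the rule as the log states it; the axiom encoding, the function name, and the example classes (Parent, Child, Person) are assumptions made for the example.

```python
def equivalent_and_subclass_lint(axioms):
    """Find named classes that appear in an equivalence axiom and also on the
    left-hand side of a subclass axiom.

    axioms: iterable of (kind, lhs, rhs). Named classes are strings; complex
    class expressions are tuples (an encoding assumed for this sketch).
    """
    in_equivalent = set()
    on_subclass_lhs = set()
    for kind, lhs, rhs in axioms:
        if kind == "EquivalentClasses":
            in_equivalent.update(x for x in (lhs, rhs) if isinstance(x, str))
        elif kind == "SubClassOf" and isinstance(lhs, str):
            on_subclass_lhs.add(lhs)
    return sorted(in_equivalent & on_subclass_lhs)


# Hypothetical axioms, not taken from DUL.
axioms = [
    ("EquivalentClasses", "Parent", ("some", "hasChild", "Person")),
    ("SubClassOf", "Parent", "Person"),
    ("SubClassOf", "Child", "Person"),
]
print(equivalent_and_subclass_lint(axioms))  # ['Parent']
```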
Task-oriented ontology design • Ontologies must match both domain and task • Allow the description of the entities (“domain”) whose attributes and relations are of concern for some purpose • social events and agents as entities that are considered in a legal case • research topics as entities that are dealt with by a project, worked on by academic staff, and can be topics of documents • Serve a purpose (“task”) • finding entities that are considered in the same legal case • finding people who work on the same topic • matching project topics to staff competencies, time left, available funds, etc.
Ontology Design Patterns • An ontology design pattern is a reusable successful solution to a recurrent modeling problem
Design patterns • Christopher Alexander, then Gang of Four • Architecture ... Software engineering ... Knowledge engineering • Focus on natural evolution of good practices • Rich vocabulary to talk about problem-solving dynamics in actual people doing applied stuff • Quite strict limits on the size of the problem-solution pair
Knowledge patterns • Peter Clark & Bruce Porter (2000) • Address one aspect (i.e. schematic) of the knowledge engineering domain • Reusable axiom schemata, signatures can be morphed while keeping the logical and reasoning properties • Implemented in a language (KM) that needs (at least) a full-fledged DL+rule language to be reengineered for the SW • Related to competency questions (Grüninger & Fox)
Ontology patterns • Staab, Svatek, Gangemi, Rector, ... • Put together the schematic and pragmatic approaches to design patterns • Good practices, competency questions, unit tests • Several types of design patterns, addressing most aspects of ontology design • Good news: users seem to like patterns; quality improves
Catalogues of ODP 1/2 • ODPs are collected and described in catalogues and comply with a common presentation template • The ontologydesignpatterns.org initiative maintains a repository of ODPs and a semantic wiki for their description, discussion, evaluation, certification, etc.
Logical vs. Knowledge patterns • A Logical ODP describes a formal expression that can be exemplified, morphed, instantiated, and expressed in order to solve a domain modelling problem • owl:Class:_:x rdfs:subClassOf owl:Restriction:_:y • Inflammation rdfs:subClassOf (localizedIn some BodyPart) • Colitis rdfs:subClassOf (localizedIn some Colon) • John’s_colitis isLocalizedIn John’s_colon • “John’s colon is inflamed”, “John has got colitis”, “Colitis is the inflammation of colon” • [Figure: Linguistic Pattern, Logical Pattern (MBox), Generic Knowledge Pattern (TBox), Specific Knowledge Pattern (TBox), and Data Pattern (ABox), linked by expressedAs, morphedAs, exemplifiedAs, and instantiatedAs; labels: Logic, Meaning, Reference, Expression, Abstraction]
Logical macros • Logical macros provide a shortcut to model a recurrent intuitive logical expression • Example: • C subClassOf (R some Thing) and (R only D) • C that R at least one thing that is only D • Carnivore subClassOf (eat some Thing) and (eat only Animal) • Carnivores eat at least one thing, that is only an animal
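A logical macro of this kind is essentially a template with three slots. A minimal sketch of the expansion, assuming the informal syntax used on the slide (the function name `expand_macro` is hypothetical):

```python
def expand_macro(cls, prop, filler):
    """Expand the macro 'cls that prop at least one thing that is only filler'
    into an existential restriction plus a universal restriction."""
    return f"{cls} subClassOf ({prop} some Thing) and ({prop} only {filler})"


print(expand_macro("Carnivore", "eat", "Animal"))
# Carnivore subClassOf (eat some Thing) and (eat only Animal)
```

The existential part matters: the universal restriction alone would also be satisfied by something that eats nothing at all.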
N-ary relation / Situation • Concrete scenario: Chad Smith was the drum player of Red Hot Chili Peppers when they recorded their album Stadium Arcadium from September 2004 to December 2005. • Abstracted scenario: A person plays a certain role in a band during an album recording, taking place during a certain time interval • FOL formalization: PlaySituation(person, musicianrole, band, album, timeinterval) • Quinary relation, needs adaptation to OWL • Methods: reification, reuse of a generic knowledge pattern, binary projections, identification constraint
N-ary relation - Situation pattern • (Intensional) reification: PlaySituation ∈ owl:Class • Knowledge pattern specialization: PlaySituation ⊑ sit:Situation • Binary projections: personPlaying ⊑ (PlaySituation ⨉ Person) • playsRole ⊑ (PlaySituation ⨉ MusicianRole) • inBand ⊑ (PlaySituation ⨉ Band) • forAlbum ⊑ (PlaySituation ⨉ Album) • recordingTime ⊑ (PlaySituation ⨉ tim:TimeInterval) • Identification constraint: PlaySituation hasKey[playsRole, forAlbum, inBand, recordingTime, personPlaying]
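The reification steps above can be sketched in Python: one record per situation, one triple per binary projection, and the key used to identify situations. The dataclass layout, the helper names, and the identifier strings are assumptions for illustration; only the property names and the scenario come from the slides.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PlaySituation:
    """Reified quinary relation; each field is one binary projection."""
    personPlaying: str
    playsRole: str
    inBand: str
    forAlbum: str
    recordingTime: tuple  # (start, end), encoded here as strings


def binary_projections(situation_id, situation):
    """Emit one triple per projection, plus the typing triple."""
    triples = {(situation_id, "rdf:type", "PlaySituation")}
    for f in fields(situation):
        triples.add((situation_id, f.name, getattr(situation, f.name)))
    return triples


def group_by_key(situations):
    """Group situation ids whose five key values coincide.

    Under the hasKey constraint, ids in the same group denote the same situation.
    """
    groups = {}
    for sid, sit in situations.items():
        groups.setdefault(sit, set()).add(sid)
    return groups


chad = PlaySituation("ChadSmith", "DrumPlayer", "RedHotChiliPeppers",
                     "StadiumArcadium", ("2004-09", "2005-12"))
same = PlaySituation("ChadSmith", "DrumPlayer", "RedHotChiliPeppers",
                     "StadiumArcadium", ("2004-09", "2005-12"))

print(len(binary_projections("sit1", chad)))        # 6
print(group_by_key({"sit1": chad, "sit2": same}))   # one group holding both ids
```

A frozen dataclass compares and hashes by field values, so it plays the role of the key directly.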
Situation • A general vocabulary for n-ary relations • Situation abstracts from reified n-ary relations, by defining a top-level relation for all binary projections of the n-ary relation • A way to conceive a state of affairs, a set of things, a fact • All time indexed (and place indexed) patterns we have seen so far are (in principle) specializations of Situation
Reasoning patterns • Materialize and query • Classify, fire rule, and materialize • Assert constructs, check consistency, and classify • Learn disjointness and check consistency • Run NER, populate, and classify
Examples of Reasoning ODPs • Precise • Classification • Subsumption • Inheritance • Materialization • Rule firing • Constructive query • ... • Approximate • Fuzzy classification • Information extraction (NER, RE) • Similarity induction (e.g. alignment) • Taxonomy induction • Relevance detection • Latent semantic indexing • ... • or some workflow including them, e.g. • Import ontologies and data • Transform non-RDF data • Materialize inverse relations and property chains • Run classifier+subsumer • Assert results (or merge asserted and inference graphs) • Run constructive query • Assert result graph • Run similarity engine • Transform results • Assert result graph
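The step "Materialize inverse relations and property chains" in the workflow above can be sketched over a set of triples. This is a single-pass illustration, not a complete reasoner (no fixpoint iteration); the function names and the example data are assumptions.

```python
def materialize_inverses(triples, inverse_of):
    """Add (o, q, s) for every (s, p, o) where q is declared the inverse of p."""
    out = set(triples)
    for s, p, o in triples:
        if p in inverse_of:
            out.add((o, inverse_of[p], s))
    return out


def materialize_chain(triples, first, second, result):
    """Add (s, result, z) whenever (s, first, y) and (y, second, z) both hold."""
    out = set(triples)
    by_subject = {}
    for s, p, o in triples:
        if p == second:
            by_subject.setdefault(s, set()).add(o)
    for s, p, o in triples:
        if p == first:
            for target in by_subject.get(o, ()):
                out.add((s, result, target))
    return out


# Illustrative data only.
asserted = {("STLab", "locatedIn", "Rome"), ("Rome", "partOf", "Italy")}
step1 = materialize_inverses(asserted, {"partOf": "hasPart"})
step2 = materialize_chain(step1, "locatedIn", "partOf", "locatedIn")

for triple in sorted(step2):
    print(triple)
```

After both steps the graph can be queried directly for `("STLab", "locatedIn", "Italy")`, which is the point of the "materialize and query" pattern.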
Reengineering patterns • Thesaurus to SKOS • Relational DB to RDF • WordNet RDB to OWL • XML to RDF • FrameNet XML to RDF • NER entities to ABox • Microformat to RDF
Correspondence patterns • Class to Property+RangeRestriction • DatatypeProperty+DatatypeRestriction to ObjectProperty+RangeRestriction • Class to Individual
Alignment pattern example: FOAF-VCard example by François Scharffe
Anti-patterns (1/2) • Partonomies or subject classifications as subsumption hierarchies • *City subClassOf Country • City subClassOf (partOf some Country) • *City subClassOf Geography • City narrower Geography (e.g. in SKOS) • Linguistic disjunction as class disjointness • Dead or alive • *Dead or Alive • Dead disjointWith Alive • Linguistic conjunction as class disjunction • Pen and paper • *Pen and Paper • Pen or Paper | Collection subClassOf (hasMember some Paper ; some Pen)
Anti-patterns (2/2) • Causality as entailment • Kaupthing bank behavior caused Iceland crisis • *KaupthingBankBehavior subClassOf IcelandCrisis • Expressions as instances of the class representing their meaning • *dog(word) rdf:type Dog • dog(word) expresses Dog (with punning) • Multiple domains or ranges of properties as intersection • *hasInflammation rdfs:domain Epithelium ; Endothelium • hasInflammation rdfs:domain (Epithelium or Endothelium)
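The last anti-pattern is easy to reproduce. In RDFS, every declared domain of a property applies to every subject of that property, so two domain axioms behave as an intersection. The sketch below applies that rule to plain tuples; the disjointness between Epithelium and Endothelium, the individual names, and the named union class are assumptions made for the example.

```python
def types_from_domains(triples, domains):
    """(p rdfs:domain C) and (s p o) entail (s rdf:type C), for every declared C."""
    inferred = set()
    for s, p, _ in triples:
        for cls in domains.get(p, ()):
            inferred.add((s, "rdf:type", cls))
    return inferred


def clashes(type_triples, disjoint_pairs):
    """Return individuals typed with both members of some disjoint pair."""
    types = {}
    for s, _, cls in type_triples:
        types.setdefault(s, set()).add(cls)
    return sorted(
        s for s, classes in types.items()
        if any(a in classes and b in classes for a, b in disjoint_pairs)
    )


data = {("tissue1", "hasInflammation", "inflammation1")}
disjoint = [("Epithelium", "Endothelium")]  # assumed for illustration

# Anti-pattern: two separate domain axioms.
wrong = {"hasInflammation": ["Epithelium", "Endothelium"]}
# Repair: a single domain that is the union of the two classes.
right = {"hasInflammation": ["EpitheliumOrEndothelium"]}

print(clashes(types_from_domains(data, wrong), disjoint))  # ['tissue1']
print(clashes(types_from_domains(data, right), disjoint))  # []
```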
Knowledge (aka content, domain, conceptual, semantic) patterns
Patterns in general • “Invariances across observed data or objects” • They exist in natural, social, cognitive, or abstract worlds • Mathematical pattern science is about symbols, i.e. non-interpreted information objects • Objects of knowledge engineering are interpreted (either formally or cognitively) • Mutual support/dependencies
Expertise patterns • Evidence that units of expertise are larger than what we have from average linked data, or worse, ontology learning • “Blinking” effects in reacting to events, in evaluating the actions and theories of others, in understanding context, in interpreting news and ads, etc. • Competency questions try to convey these units as requirements • Which objects take part in a certain event? • Which tasks should be executed in order to achieve a certain goal? • What’s the function of that artifact? • Does this behaviour conform to a certain rule? • What norms are applicable to a certain case? • Which norm is superordinate among these? • Sometimes exception conditions should be added • Task-based ontology evaluation can be performed with unit tests against ontologies trying to satisfy competency questions
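A competency question can be turned into a unit test by pairing a query with sample data and an expected answer. The sketch below does this for "Which objects take part in a certain event?"; the property name `hasParticipant`, the sample triples, and the function names are assumptions, and in practice the query would usually be written in SPARQL against the ontology under test.

```python
def objects_taking_part_in(event, triples):
    """Answer the competency question for one event over a set of triples."""
    return sorted(o for s, p, o in triples if s == event and p == "hasParticipant")


def test_cq_participants():
    """Unit test: the sample data must yield the expected participants."""
    sample = {
        ("recording1", "rdf:type", "Event"),
        ("recording1", "hasParticipant", "ChadSmith"),
        ("recording1", "hasParticipant", "RedHotChiliPeppers"),
        ("concert1", "hasParticipant", "ChadSmith"),
    }
    expected = ["ChadSmith", "RedHotChiliPeppers"]
    assert objects_taking_part_in("recording1", sample) == expected


test_cq_participants()
print("competency question satisfied")
```

If the ontology cannot express the sample data or the query, the test fails, which is the signal that the requirement is not yet covered.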
Cognitive foundations • A search for the relevant units of meaning (not the primitives) • Agents’ understanding is based on [patterns] abstracted from previously occurred situations involving those agents, which are adapted and recombined on-the-fly in novel situations • Bartlett’s experiments on schemata (1932), Neisser (1967) • Piaget’s experiments on schemata (1954) • Fillmore’s frame semantics (1968) • Minsky’s frames (1974) • Schank’s scripts (1977) • Gibson’s affordances (1977) • Biederman’s experiments on scene recognition (1982) • Barsalou’s ad-hoc goal-oriented categories (1983) • Lakoff’s conceptual metaphors (1987), Langacker’s compositional paths (1987) • Barsalou’s simulators (1999) • Bar’s associative-analogical network of frames for anticipation (2007) • Rizzolatti, Iacoboni, Gallese, etc. results and plausibility of mirror-neuron-system-based frames (2008)
Evidence of knowledge patterns • In linguistic resources • Sentences • Sub-categorization frames • Lexico-syntactic patterns • Lexical frames • Question patterns • (Bounded sets of) selectional preferences • In data • Data patterns • Data models (xsd, rdb) • Query types and views • Microformats • Infoboxes • In interaction • Interaction patterns • Lenses • HTML templates • In semantic resources • Competency questions • n-ary relations • OWL/RDFS classes with (locally complete?) sets of restrictions or properties • KM Component Library • Content ontology design patterns (CPs) • Knowledge patterns discovered from datasets
Generic Content ODPs • Classification • Roles of objects • Containment • Part-whole relationships • Membership • Information and its realizations • Time and Places • Situations • Actions, moving, processes, transitions • Descriptions
Roles of objects • Objects can play different roles in different situations • Depending on the constraints given by the requirements, modeling of objects and their roles can be addressed differently • Do we want to represent properties of roles? • Do we want to classify objects based on their roles? • Do we want to assert facts about roles?
Roles of objects • A beer mug used as vase • Books used as table’s legs • A sax player (person) • A song writer (person)
Roles as classes • An object and its roles are related through the rdf:type property • rdf:type relations can be either asserted or inferred through classification • In order to automatically classify individuals in a certain class the ontology has to define appropriate axioms
Roles as classes • Consequences • Low expressivity • Roles are described at TBox level • Class taxonomy is bigger - a class for each role • Class taxonomy is entangled - multi-typing • ABox is smaller – same individual, several (role) types • Automatic classification of individuals through rdfs:subClassOf inheritance – with proper axioms • Roles cannot be indexed in terms of space and time • Facts about roles cannot be expressed e.g. “Roles in UniBo can be student, professor, researcher”, “Valentina is teacher for KMDM course” • Queries: ?x a SongWriter • General CQs • What objects have a (role) type?
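With roles as classes, the query `?x a SongWriter` reduces to a type lookup plus rdfs:subClassOf inheritance. A minimal sketch, assuming a hypothetical taxonomy (SongWriter and SaxPlayer under Musician, Musician under Person) and hypothetical individuals:

```python
# Hypothetical TBox: one class per role.
SUBCLASS_OF = {
    "SongWriter": {"Musician"},
    "SaxPlayer": {"Musician"},
    "Musician": {"Person"},
}

# Hypothetical ABox: the same individual can carry several role types.
TYPES = {
    "john": {"SongWriter", "SaxPlayer"},
    "mary": {"SaxPlayer"},
}


def all_superclasses(cls, subclass_of):
    """Transitive closure of rdfs:subClassOf starting from cls."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def instances_of(cls, types, subclass_of):
    """Answer '?x a cls' using asserted types plus inherited ones."""
    result = set()
    for individual, asserted in types.items():
        inferred = set(asserted)
        for t in asserted:
            inferred |= all_superclasses(t, subclass_of)
        if cls in inferred:
            result.add(individual)
    return sorted(result)


print(instances_of("SongWriter", TYPES, SUBCLASS_OF))  # ['john']
print(instances_of("Musician", TYPES, SUBCLASS_OF))    # ['john', 'mary']
```

Nothing in this representation records when or where john is a song writer, which is the limitation listed above: roles as classes cannot be indexed in space and time.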