Querying Large-Scale Ontologies

Querying Large-Scale Ontologies Ralf Möller Hamburg University of Technology Institute for Software Systems (STS) Joint work with Sebastian Wandelt, Michael Wessel

Ontologies • Ontology • In principle: Conceptualization + KB • Formally: Signature + KB • KB = Tbox + Abox • Axioms + facts • Recently: • Distinction between axioms and facts blurred (nominals)

Aboxes: Where it all started query Q (GCQ) answers to Q DL-System • ALNHf: CLASSIC (90-96) • SHIQ: RacerPro (99-today), Pellet (04-today) • Sytems (constantly improved) • Pellet 2.0 (SROIQ/OWL 2) • RacerPro 2.0 (SHIQ(D)-)

Query answering for expressive DLs • Grounded conjunctive queries • ans(X) :- P1(X1), … Pn(Xn), variables bound to named objectsi (no UNA) • RacerPro, Pellet, KAON2 • Extended grounded conjunctive queries • ans(X) :- Atoms, vars bound to named objects • Atoms ::= Atom, … , Atom • Atom ::= P(X) | ∏(Y):Atoms | \ Atom • ans(X) :- mother(X), \ ∏(X) : has-child(X,Y) • RacerPro [Wessel et al. 06/07][Calvanese et al. 07]

Query answering for expressive DLs • (Unions of) conjunctive queries • ans(X) :- P1(X1), … Pn(Xn) v … v …, vars bound to any object • Variables in Xi not occurring in X are existentially quantified • QuOnto (for DL-Lite+UNA, see below) • Decidability results for SHIQ/SHOQ [Glimm et al. 07/08/09] • Discouraging complexity results • Decidability w.r.t. SROIQ and transitive (or non-simple) roles in query is an open problem

Commercial Interest • Boost in activities for providing query engines • Provide implementations fast (reduced expressivity, maybe incomplete, hopefully not unsound, not considered here) • Exploitation of complexity results (reduced expressivity) • Provide scalability for typical-case inputs even for expressive languages

Goal of this presentation • Provide a feeling about state of the art • Introduce main ideas of current approaches • Give pointers to the literature • Pinpoint opportunities for further research • Assume familiarity with DLs

DLs on a slide Fragment of first-order logic Open-world assumption No unique name assumption Axiomsare noconstraints Aboxes are no DBs

Query answering Schema (Tbox) matters for query answering Scalability is an issue Lots of research efforts (also) due to commercial relevance

My hypothesis • We can be expressive while retaining typical-case scalability • History is instructive • From CLASSIC to FaCT, Pellet, RacerPro • All results matter • Databases / query answering • Reasoning algorithms for expressive languages • Tractability results / retrieval algorithms

DL-Lite • DL-Lite family of description logics • Univ. Roma DIS, Univ. Bolzano, Univ. London [2004-today] • Expressive enough for large parts of UML • DL-LiteCore/F/A/R/(RN): UNA, No disjunction, No qualified existentials • Largest DL fragment for which UCQ query answering is in LOGSPACE/AC0, i.e. tractable [Calvanese et al. 07] • Systems: Quonto [Acciarri et al. 05] • Complexity results: [Artale et al. 09]

Quonto: The picture http://www.dis.uniroma1.it/quonto/ Taken from a presentation by Riccardo Rosati

Example Taken from a presentation by Riccardo Rosati

Example (contnd) Taken from a presentation by Riccardo Rosati

Query rewriting algorithm for DL-Lite (simplified) Taken from a presentation by Riccardo Rosati

Problems Relies on UNA Query rewriting might cause exponential blowup Relies on SQL query optimizer

DB Mappings (MASTRO) Taken from a presentation by Riccardo Rosati

Data Integration (MASTRO-I) Taken from a presentation by Riccardo Rosati

Ok… … so reducing the expressiveness is the key? Probably not in all cases Expressive Tboxes required for problem solving A lot of shouting ermerged around 2005: “Current reasoners are assumed not to scale w.r.t. Abox reasoning…”

Some measurements with Optimizations in RacerPro 2.0 LUBM-Lite [Haarslev and Möller 08]

LUBM Characteristics LUBM [Heflin et al. 05]

… with disjunctions LUBM [Haarslev and Möller 08]

… as well as disjunctions and same-as (UOB) UOBM [Ma et al 06] [Haarslev and Möller 08]

Disjunctive Datalog/KAON2 Karlsruhe & Manchester[Motik and Sattler 06] Diagram taken from a presentation by Markus Krötzsch

Problems Exponential blowups possible in transformation steps (in particular for number restrictions) Query-neutral transformation but magic-set transformation required No domains such as strings, reals, etc. supported

New ideas query Q (GCQ) answers to Q DL-System Summarizer Store Abox in database/triple store Provide “excerpt” that fits into main memory, and provides an index to the right individuals Role condensates [Wandelt 07] Summary Abox [Fokoue et al. 06]

Summary Abox [Fokoue et al. 06] Aggregate individuals of the same type as long as Abox remains consistent: yields summary Abox’ Instance test on summary Abox’ gives non-instances Refinement step to make Abox’ more precise using justifications gives an instance test

Example Legend: C – Course P - Person M - Man W – Woman H - Hobby Summary Original ABox C’{C1, C2} C2 C’ C1 isTaughtBy isTaughtBy isTaughtBy isTaughtBy isTaughtBy isTaughtBy M2 P2 M1 M’ P’ P1 likes likes likes TBox: Functional (isTaughtBy) Disjoint (Man, Woman) H2 H’ H1 Taken from a presentation by A. Fokoue, IBM

Resolving inconsistencies Legend: C – Course P - Person M - Man W – Woman H - Hobby Original ABox Summary Summary is inconsistent C’{C1, C2, C3} C’ C2 C3 C1 isTaughtBy isTaughtBy isTaughtBy isTaughtBy isTaughtBy isTaughtBy isTaughtBy isTaughtBy isTaughtBy M’ P’ W’ M2 P2 M1 P1 P3 W1 likes likes likes TBox: Functional (isTaughtBy) Disjoint (Man, Woman) H1 H2 H’ Taken from a presentation by A. Fokoue, IBM

Refinement & Justifications After 1st Refinement After 2nd Refinement – Consistent Summary Cx’ Not(Q) Cy’ Not(Q) Cx’ Cy’ Cx’{C1, C2} Cy’{C3} isTaughtBy isTaughtBy isTaughtBy isTaughtBy isTaughtBy isTaughtBy isTaughtBy Px’{P1, P2} Py’{P3} M’ W’ Px’ Px’ M’ P’ W’ Py’ Not(Q) Not(Q) likes likes Summary still inconsistent! Solns: P1, P2 Sample Q: PeopleWithHobby? H’ H’ Not(Q) PeopleWithHobby = (some likes Hobby) Taken from a presentation by A. Fokoue, IBM

SHER [Fokoue 07/08] Taken from a presentation by A. Fokoue, IBM Based on Abox summaries Refinement strategy is based on identifying justifications for an inconsistency in the summary

Problems with summaries • UNA? • Datatypes? • LUBM/UOBM: What if all persons have a name, and only one name? • Summaries will be too large • Updates? • Approach patented?

Idea: Try to define Abox modules • Due to Tbox, individuals in Aboxes • are potentially merged • are “annotated” with additional concepts • due to value restrictions triggered by role assertions • due to axioms • What if we assume the worst case? • If sth that will potentially be propagated to an individual is already known, the role corresponding assertion can be “removed”

Island reasoning [Wandelt 08] • Static analysis of Tbox • Role info structure • Interesting also for DL-Lite-based approaches (simpler joins)

Example ontology

ABox assertions (small excerpt) Department • Prof(ralf) • Prof(sibylle) • headOf(sybille, sts) • Department(sts) • worksFor(rainer, sts) • takesCourse(amy, se) • Course(se) • teaches(ralf, se) • hasFriend(amy, luis) • hasAuthor(pub1, ralf) • hasAuthor(pub1, sybille) • Publication(pub1) • Person(luis) sts Prof Prof sybille ralf pub1 Course db Publication amy luis Person

Instance Checking Optimization • Assume a query: KB|= Chair(sybille) • Looking at our ontology, the important Tbox axiom is • which is ”equivalent” to a disjunction: Informally speaking, the headOf-role can be used to propagate ”negative” Department-information to successor-nodes of sybille. Important to note: all the explicit headOf-successors of sybille are Departments already! Thus we will only propagate ”obvious” information for this particular assertion.

Instance Checking Optimization • Taking all axioms in the TBox into account, we obtain the following subgraph of the ABox relevant for reasoning about sybille: Department sts Prof Prof syb. ralf pub1 This subset of our ABox suffices to show KB |= Chair(sybille) db Course Publication amy luis Person

Instance Checking Optimization • To sum up [Wandelt 08]: • Axioms can be analyzed offline • Depending on these forall-constraints we can determine on the fly a (usually small) subset of ABox assertions relevant for a given individual • Partition structure can be incrementally (re)computed[Wandelt 09]

Analysis (LUBM)

Query answering • Example: Find all named individuals X, such that KB|= Chair(X) • The naive approach is to perform instance checkings for all named individuals in the ABox • => We have 7 named individuals, only sybille turns out to be a Chair • The question is: Given an ABox instance query, can we apply some preselection techniques to filter obvious solutions and obvious non-solutions to reduce the number of instance checkings? The answer is: sometimes

Instance Retrieval Optimization • Try to find an approximzation of the ontology in a ”weaker” DL-language, which enables unsound + complete + more efficient reasoning • Approximation has been dealt with before(e.g., [Hitzler et al. 08], [Pan et al. 07/09]) • Here: conversion to DL-Lite

Instance Retrieval Optimization • The only critical (impossible to rewrite in an equivalence preserving way) axiom is • We could replace it by: • What happens?

Instance Retrieval Optimization • The resulting DL-Lite ontology is still consistent (which is important for reasoning) • Now all Professors become instances of Chair • An instance retrieval query on the DL-Lite ontology yields the candidate individuals ralf and sybille • Only two (more costly) instance checking tests left to remove unsound answers (here: ralf)

The complete picture… (candidates) ‘ Compressor CandidateEliminator Approximation Partitioning fewer candidates answers to Q DL-System [Wandelt, Wessel et al 08] Partitions

Obstacles Approximation can make the Tbox unsatisfiable There can be more than one approximation: Which one to select? Partitions can be too large to fit in main memory Approximation can lead to too many candidates

Optimizations Abox summary or role condensates to determine obvious non-instances

Role condensates [Wandelt 2007] Goal: Support worst-case analyses in main memory KB |= R(a,b) and KB |= R(a,c) then merge b and c Extension: consider role hierarchies

Example: Before condensation

Example: After condensation Condensation is fast and can be made incremental

Querying Large-Scale Ontologies

Querying Large-Scale Ontologies

Presentation Transcript

Large Scale Weather

Large Scale Structure

Towards Portable Controlled Natural Languages for Querying Ontologies

LARGE - SCALE ASSESSMENTS

large scale Refactoring

Large scale control

Large-scale matching

LARGE SCALE

Large- scale Organisations

LARGE SCALE ORGANISATIONS

Large scale

Large-Scale Systems

Large Scale Organisations.

Large Scale Sharing

Large Scale Sharing

Large Scale Projects

Large Scale Operations

Large Scale Applications

Large Scale Pilot

HAO/SCD: VO, metadata, catalogs, ontologies, querying

Large Scale Drupal

Large-scale solar