
Querying Large-Scale Ontologies



Presentation Transcript


  1. Querying Large-Scale Ontologies Ralf Möller Hamburg University of Technology Institute for Software Systems (STS) Joint work with Sebastian Wandelt, Michael Wessel

  2. Ontologies • Ontology • In principle: Conceptualization + KB • Formally: Signature + KB • KB = Tbox + Abox • Axioms + facts • Recently: • Distinction between axioms and facts blurred (nominals)

  3. Aboxes: Where it all started [Diagram: query Q (GCQ), DL-System, answers to Q] • ALNHf: CLASSIC (90-96) • SHIQ: RacerPro (99-today), Pellet (04-today) • Systems (constantly improved) • Pellet 2.0 (SROIQ/OWL 2) • RacerPro 2.0 (SHIQ(D)-)

  4. Query answering for expressive DLs • Grounded conjunctive queries • ans(X) :- P1(X1), …, Pn(Xn), variables bound to named objects (no UNA) • RacerPro, Pellet, KAON2 • Extended grounded conjunctive queries • ans(X) :- Atoms, vars bound to named objects • Atoms ::= Atom, … , Atom • Atom ::= P(X) | ∏(Y):Atoms | \ Atom • ans(X) :- mother(X), \ ∏(X) : has-child(X,Y) • RacerPro [Wessel et al. 06/07][Calvanese et al. 07]
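
A minimal sketch of what "variables bound to named objects" means for a grounded conjunctive query. It is not how RacerPro, Pellet, or KAON2 work: it only matches explicitly asserted facts, whereas a DL system answers with respect to entailment. The function names `holds` and `answer` and the tuple encoding of atoms are assumptions made for illustration; the facts are a subset of the Abox excerpt on slide 36.

```python
from itertools import product

# Asserted facts only (illustrative subset of the slide-36 excerpt).
concept_assertions = {("Prof", "ralf"), ("Prof", "sybille"),
                      ("Course", "se"), ("Department", "sts")}
role_assertions = {("teaches", "ralf", "se"), ("headOf", "sybille", "sts")}

# The named individuals: the only values a query variable may take.
individuals = sorted({ind for _, ind in concept_assertions}
                     | {x for _, s, o in role_assertions for x in (s, o)})


def holds(atom, binding):
    """Check one atom against the asserted facts under a variable binding."""
    pred, *args = atom
    values = tuple(binding[v] for v in args)
    if len(values) == 1:
        return (pred, values[0]) in concept_assertions
    return (pred, values[0], values[1]) in role_assertions


def answer(head_vars, atoms):
    """Try every assignment of named individuals to the query variables."""
    variables = sorted({v for _, *args in atoms for v in args})
    results = set()
    for combo in product(individuals, repeat=len(variables)):
        binding = dict(zip(variables, combo))
        if all(holds(atom, binding) for atom in atoms):
            results.add(tuple(binding[v] for v in head_vars))
    return results


# ans(X) :- Prof(X), teaches(X, Y)
result = answer(["X"], [("Prof", "X"), ("teaches", "X", "Y")])
```

The brute-force enumeration is exponential in the number of variables; it is only meant to make the semantics concrete, not to suggest an evaluation strategy.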

  5. Query answering for expressive DLs • (Unions of) conjunctive queries • ans(X) :- P1(X1), … Pn(Xn) v … v …, vars bound to any object • Variables in Xi not occurring in X are existentially quantified • QuOnto (for DL-Lite+UNA, see below) • Decidability results for SHIQ/SHOQ [Glimm et al. 07/08/09] • Discouraging complexity results • Decidability w.r.t. SROIQ and transitive (or non-simple) roles in query is an open problem

  6. Commercial Interest • Boost in activities for providing query engines • Provide implementations fast (reduced expressivity, maybe incomplete, hopefully not unsound, not considered here) • Exploitation of complexity results (reduced expressivity) • Provide scalability for typical-case inputs even for expressive languages

  7. Goal of this presentation • Provide a feeling for the state of the art • Introduce main ideas of current approaches • Give pointers to the literature • Pinpoint opportunities for further research • Assume familiarity with DLs

  8. DLs on a slide • Fragment of first-order logic • Open-world assumption • No unique name assumption • Axioms are not constraints • Aboxes are not DBs

  9. Query answering • Schema (Tbox) matters for query answering • Scalability is an issue • Lots of research efforts (also) due to commercial relevance

  10. My hypothesis • We can be expressive while retaining typical-case scalability • History is instructive • From CLASSIC to FaCT, Pellet, RacerPro • All results matter • Databases / query answering • Reasoning algorithms for expressive languages • Tractability results / retrieval algorithms

  11. DL-Lite • DL-Lite family of description logics • Univ. Roma DIS, Univ. Bolzano, Univ. London [2004-today] • Expressive enough for large parts of UML • DL-LiteCore/F/A/R/(RN): UNA, no disjunction, no qualified existentials • Largest DL fragment for which UCQ query answering is in LOGSPACE/AC0, i.e. tractable [Calvanese et al. 07] • Systems: QuOnto [Acciarri et al. 05] • Complexity results: [Artale et al. 09]

  12. QuOnto: The picture http://www.dis.uniroma1.it/quonto/ Taken from a presentation by Riccardo Rosati

  13. Example Taken from a presentation by Riccardo Rosati

  14. Example (continued) Taken from a presentation by Riccardo Rosati

  15. Query rewriting algorithm for DL-Lite (simplified) Taken from a presentation by Riccardo Rosati

  16. Problems • Relies on UNA • Query rewriting might cause exponential blowup • Relies on SQL query optimizer

  17. DB Mappings (MASTRO) Taken from a presentation by Riccardo Rosati

  18. Data Integration (MASTRO-I) Taken from a presentation by Riccardo Rosati

  19. Ok… … so reducing the expressiveness is the key? • Probably not in all cases • Expressive Tboxes required for problem solving • A lot of shouting emerged around 2005: “Current reasoners are assumed not to scale w.r.t. Abox reasoning…”

  20. Some measurements with Optimizations in RacerPro 2.0 LUBM-Lite [Haarslev and Möller 08]

  21. LUBM Characteristics LUBM [Heflin et al. 05]

  22. … with disjunctions LUBM [Haarslev and Möller 08]

  23. … as well as disjunctions and same-as (UOBM) UOBM [Ma et al. 06] [Haarslev and Möller 08]

  24. Disjunctive Datalog/KAON2 Karlsruhe & Manchester [Motik and Sattler 06] Diagram taken from a presentation by Markus Krötzsch

  25. Problems • Exponential blowups possible in transformation steps (in particular for number restrictions) • Query-neutral transformation, but magic-set transformation required • No domains such as strings, reals, etc. supported

  26. New ideas [Diagram: query Q (GCQ), Summarizer, DL-System, answers to Q] • Store Abox in database/triple store • Provide “excerpt” that fits into main memory and provides an index to the right individuals • Role condensates [Wandelt 07] • Summary Abox [Fokoue et al. 06]

  27. Summary Abox [Fokoue et al. 06] • Aggregate individuals of the same type as long as the Abox remains consistent: yields summary Abox’ • Instance test on summary Abox’ gives non-instances • Refinement step to make Abox’ more precise using justifications gives an instance test
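
The aggregation step can be sketched as follows, under strong simplifying assumptions: "same type" is taken to mean "same set of asserted concept names", and the consistency check and justification-based refinement are left out because they need a reasoner. The function name `summarize` and the sample data are hypothetical and only loosely follow the example on the next slide; this is not the SHER implementation.

```python
from collections import defaultdict


def summarize(concepts, roles):
    """Group individuals by their asserted concept set and lift role edges.

    concepts: dict mapping individual -> set of asserted concept names
    roles:    set of (role, subject, object) assertions
    Returns (summary nodes with their members, summary role assertions).
    """
    groups = defaultdict(set)
    for individual, concept_set in concepts.items():
        groups[frozenset(concept_set)].add(individual)

    # Each individual is represented by the summary node of its concept set.
    node_of = {ind: key for key, members in groups.items() for ind in members}
    summary_roles = {(r, node_of[s], node_of[o]) for r, s, o in roles}
    return dict(groups), summary_roles


# Hypothetical data: two courses, each taught by a different man.
concepts = {"C1": {"Course"}, "C2": {"Course"}, "M1": {"Man"}, "M2": {"Man"}}
roles = {("isTaughtBy", "C1", "M1"), ("isTaughtBy", "C2", "M2")}

nodes, summary_roles = summarize(concepts, roles)
```

Here four individuals and two role assertions collapse into two summary nodes and one role assertion, which is the size reduction the approach relies on.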

  28. Example [Figure: original ABox (C1, C2, M1, M2, P1, P2, H1, H2) next to its summary (C’, M’, P’, H’), with C’ = {C1, C2}; edges are labelled isTaughtBy and likes. Legend: C – Course, P – Person, M – Man, W – Woman, H – Hobby. TBox: Functional (isTaughtBy), Disjoint (Man, Woman)] Taken from a presentation by A. Fokoue, IBM

  29. Resolving inconsistencies [Figure: original ABox (C1, C2, C3, M1, M2, P1, P2, P3, W1, H1, H2) next to its summary (C’, M’, P’, W’, H’), with C’ = {C1, C2, C3}; edges are labelled isTaughtBy and likes. The summary is inconsistent. Legend: C – Course, P – Person, M – Man, W – Woman, H – Hobby. TBox: Functional (isTaughtBy), Disjoint (Man, Woman)] Taken from a presentation by A. Fokoue, IBM

  30. Refinement & Justifications [Figure: two panels. After 1st Refinement: summary still inconsistent! After 2nd Refinement: consistent summary, with Cx’ = {C1, C2}, Cy’ = {C3}, Px’ = {P1, P2}, Py’ = {P3}; nodes are annotated with Not(Q), edges are labelled isTaughtBy and likes.] Sample Q: PeopleWithHobby? where PeopleWithHobby = (some likes Hobby). Solns: P1, P2 Taken from a presentation by A. Fokoue, IBM

  31. SHER [Fokoue 07/08] • Based on Abox summaries • Refinement strategy is based on identifying justifications for an inconsistency in the summary Taken from a presentation by A. Fokoue, IBM

  32. Problems with summaries • UNA? • Datatypes? • LUBM/UOBM: What if all persons have a name, and only one name? • Summaries will be too large • Updates? • Approach patented?

  33. Idea: Try to define Abox modules • Due to the Tbox, individuals in Aboxes • are potentially merged • are “annotated” with additional concepts • due to value restrictions triggered by role assertions • due to axioms • What if we assume the worst case? • If something that will potentially be propagated to an individual is already known, the corresponding role assertion can be “removed”

  34. Island reasoning [Wandelt 08] • Static analysis of Tbox • Role info structure • Interesting also for DL-Lite-based approaches (simpler joins)

  35. Example ontology

  36. ABox assertions (small excerpt) • Prof(ralf) • Prof(sybille) • headOf(sybille, sts) • Department(sts) • worksFor(rainer, sts) • takesCourse(amy, se) • Course(se) • teaches(ralf, se) • hasFriend(amy, luis) • hasAuthor(pub1, ralf) • hasAuthor(pub1, sybille) • Publication(pub1) • Person(luis) [Figure: ABox graph with nodes sts (Department), sybille (Prof), ralf (Prof), pub1 (Publication), db (Course), amy, luis (Person)]

  37. Instance Checking Optimization • Assume a query: KB |= Chair(sybille) • Looking at our ontology, the important Tbox axiom is [axiom shown on the slide, not reproduced in the transcript] • which is “equivalent” to a disjunction: [formula shown on the slide, not reproduced in the transcript] • Informally speaking, the headOf role can be used to propagate “negative” Department information to successor nodes of sybille. • Important to note: all the explicit headOf-successors of sybille are Departments already! Thus we will only propagate “obvious” information for this particular assertion.

  38. Instance Checking Optimization • Taking all axioms in the TBox into account, we obtain the following subgraph of the ABox relevant for reasoning about sybille: [Figure: the ABox graph from slide 36 (sts, sybille, ralf, pub1, db, amy, luis) with the relevant subgraph highlighted] • This subset of our ABox suffices to show KB |= Chair(sybille)

  39. Instance Checking Optimization • To sum up [Wandelt 08]: • Axioms can be analyzed offline • Depending on these forall-constraints we can determine on the fly a (usually small) subset of ABox assertions relevant for a given individual • Partition structure can be incrementally (re)computed [Wandelt 09]

  40. Analysis (LUBM)

  41. Query answering • Example: Find all named individuals X such that KB |= Chair(X) • The naive approach is to perform an instance check for every named individual in the ABox • => We have 7 named individuals, only sybille turns out to be a Chair • The question is: Given an ABox instance query, can we apply some preselection techniques to filter obvious solutions and obvious non-solutions to reduce the number of instance checks? The answer is: sometimes

  42. Instance Retrieval Optimization • Try to find an approximation of the ontology in a “weaker” DL language, which enables unsound + complete + more efficient reasoning • Approximation has been dealt with before (e.g., [Hitzler et al. 08], [Pan et al. 07/09]) • Here: conversion to DL-Lite

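  43. Instance Retrieval Optimization • The only critical (impossible to rewrite in an equivalence-preserving way) axiom is [axiom shown on the slide, not reproduced in the transcript] • We could replace it by: [axiom shown on the slide, not reproduced in the transcript] • What happens?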

  44. Instance Retrieval Optimization • The resulting DL-Lite ontology is still consistent (which is important for reasoning) • Now all Professors become instances of Chair • An instance retrieval query on the DL-Lite ontology yields the candidate individuals ralf and sybille • Only two (more costly) instance checks are left to remove unsound answers (here: ralf)
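
The candidate-then-verify pattern described here can be sketched as below. The cheap step mirrors the slide (every Prof becomes a Chair candidate). The expensive step is a hypothetical stand-in for a reasoner call: it assumes that a Chair is a Prof who heads a Department, which is a guess at the axiom that the transcript does not reproduce. All function names are made up for this illustration.

```python
# Asserted facts, a subset of the slide-36 excerpt.
profs = {"ralf", "sybille"}
departments = {"sts"}
head_of = {("sybille", "sts")}


def approximate_candidates():
    """Cheap, complete but unsound step: all Profs are Chair candidates."""
    return sorted(profs)


def expensive_instance_check(individual):
    """Hypothetical stand-in for a sound and complete DL instance check.

    Assumption (not from the transcript): a Chair is a Prof who is
    headOf some Department.
    """
    heads_department = any(s == individual and o in departments
                           for s, o in head_of)
    return individual in profs and heads_department


def retrieve_chairs():
    """Run the costly check only on candidates, counting how often it runs."""
    checks = 0
    answers = []
    for candidate in approximate_candidates():
        checks += 1
        if expensive_instance_check(candidate):
            answers.append(candidate)
    return answers, checks


answers, checks = retrieve_chairs()
```

With the slide's numbers, the costly check runs twice instead of once per named individual, and ralf is discarded as an unsound candidate.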

  45. The complete picture… [Diagram components: Approximation, (candidates), Compressor, CandidateEliminator, Partitioning, Partitions, fewer candidates, DL-System, answers to Q] [Wandelt, Wessel et al. 08]

  46. Obstacles • Approximation can make the Tbox unsatisfiable • There can be more than one approximation: Which one to select? • Partitions can be too large to fit in main memory • Approximation can lead to too many candidates

  47. Optimizations Abox summary or role condensates to determine obvious non-instances

  48. Role condensates [Wandelt 2007] • Goal: Support worst-case analyses in main memory • If KB |= R(a,b) and KB |= R(a,c), then merge b and c • Extension: consider role hierarchies
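
A rough sketch of the merge rule, assuming that only explicitly asserted role assertions are considered (the slide speaks of entailed ones) and ignoring role hierarchies. The union-find bookkeeping and the function name `condense` are choices made for this illustration, not the published algorithm.

```python
def condense(role_assertions):
    """Merge all R-successors of the same individual until nothing changes.

    role_assertions: set of (role, subject, object) triples
    Returns (condensed assertions, mapping individual -> representative).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx == ry:
            return False
        # Keep the lexicographically smaller name as representative.
        parent[max(rx, ry)] = min(rx, ry)
        return True

    changed = True
    while changed:  # merging successors can enable further merges
        changed = False
        successor = {}
        for role, s, o in sorted(role_assertions):
            key = (role, find(s))
            if key in successor:
                if union(successor[key], o):
                    changed = True
            else:
                successor[key] = o

    condensed = {(r, find(s), find(o)) for r, s, o in role_assertions}
    return condensed, {x: find(x) for x in parent}


assertions = {("R", "a", "b"), ("R", "a", "c"),
              ("R", "b", "d"), ("R", "c", "e")}
condensed, representative = condense(assertions)
```

In the sample, b and c are merged because both are R-successors of a; that in turn makes d and e successors of the same node, so they are merged as well, leaving two role assertions instead of four.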

  49. Example: Before condensation

  50. Example: After condensation • Condensation is fast and can be made incremental
