OWL Datatypes: Design and Implementation. Boris Motik and Ian Horrocks University of Oxford. Contents. Introduction The Datatype System of OWL 2 The Datatypes of OWL 2 A Modular Datatype Checker Conclusion. Problems with Datatypes in OWL 1.
OWL Datatypes: Design and Implementation Boris Motik and Ian Horrocks University of Oxford
Contents • Introduction • The Datatype System of OWL 2 • The Datatypes of OWL 2 • A Modular Datatype Checker • Conclusion
Problems with Datatypes in OWL 1 • Datatypes of OWL 1 are based on XML Schema (XSD) • Problems with OWL 1 datatypes: • too few normative ones • no user-defined datatypes (e.g., intervals) • reasoning with some XSD datatypes is difficult • some XSD datatypes have an inappropriate semantics • there are datatype-less constants • certain semantic aspects are unclear • reasoning algorithms are unclear
Motivation • OWL 2: a new version of OWL • considerably improves the datatype system of OWL • Our results ensure that… • …the datatype system of OWL 2 is extensible • …certain language extensions are correctly defined • …OWL 2 supports datatypes that are practically feasible • …we know how to implement the datatypes of OWL 2 • Make datatypes in OWL 2 better • Provide guidance for implementors
Datatype Map • Each datatype d is described by: • a URI – gives the name of the datatype • a set of constants N_C(d) • a set of facet pairs N_F(d) • a value space (d)^D • a data value (c)^D ∈ (d)^D for each constant c • a facet value (f)^D ⊆ (d)^D for each facet f • Example: real • facets: <x, >x, ≤x, ≥x, int • Example: str • facets: ⟨minLength n⟩, ⟨maxLength n⟩, ⟨length n⟩, ⟨pattern "regExp"⟩
Data Ranges • Facet expression: Boolean formula over facets • e.g., ≥5 ∧ ≤10 • Datatype restriction: d[φ] • d is a datatype and φ is a facet expression for d • e.g., real[ int ∧ ≥5 ∧ ≤10 ] • OWL 2 Syntax: DatatypeRestriction( xsd:integer xsd:minInclusive "5"^^xsd:integer xsd:maxInclusive "10"^^xsd:integer ) • Data range: ⊤_D, d[φ], { v1, …, vn }, ¬dr • will be extended in OWL 2 to all Boolean connectives
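A facet expression can be read as a predicate over the value space of its datatype. The following is a minimal sketch of that reading for real[ int ∧ ≥5 ∧ ≤10 ]; the helper names (ge, le, is_int, conj) are hypothetical, and exact rationals stand in for the reals.

```python
from fractions import Fraction

# Each facet is modelled as a predicate over values of the datatype "real".
# Assumption: Fraction values approximate the real numbers for illustration.

def ge(bound):
    """Facet ≥bound."""
    return lambda v: v >= bound

def le(bound):
    """Facet ≤bound."""
    return lambda v: v <= bound

def is_int(v):
    """Facet int: the value has no fractional part."""
    return Fraction(v).denominator == 1

def conj(*facets):
    """Conjunction (∧) of facets."""
    return lambda v: all(f(v) for f in facets)

# real[ int ∧ ≥5 ∧ ≤10 ]
five_to_ten = conj(is_int, ge(5), le(10))

print(five_to_ten(Fraction(7)))      # True
print(five_to_ten(Fraction(15, 2)))  # False: 7.5 is not an integer
print(five_to_ten(Fraction(11)))     # False: above the upper bound
```

Disjunction and negation of facets could be added in the same way as further combinators.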
Using Data Ranges in Restrictions • New datatype constructs: • qualified number restrictions • disjoint data properties • Semantics is defined w.r.t. a datatype domain MD
Openness of the Datatype Domain • MD is usually fixed in DL reasoning • datatype groups: MD is exactly the union of all value spaces • Problem: adding new datatypes can change the meaning of certain axioms • Example: ⊤ ⊑ ∀U.<5 ⊔ ∃U.real • if real is the only datatype, then this axiom is a tautology • if we have both real and str, it is not a tautology • We do not fix MD in OWL 2 • an ontology is satisfiable iff an MD exists that contains at least the value spaces of all datatypes and for which all axioms are satisfied • Proposition: consequences of OWL 2 ontologies are independent of the supported set of datatypes
Naming Data Ranges • Teens ≡ real[ int ∧ >12 ∧ <20 ] • semantics: (Teens)^D = (real[ int ∧ >12 ∧ <20 ])^D • use Teens as a shortcut • e.g., Teenager ≡ ∃hasAge.Teens • Problem: we can write axioms about datatypes • A ≡ real and A ≡ ⊤_D • fixes MD to (real)^D, which prevents us from extending the set of datatypes • Make such axioms acyclic • each data range name can be defined only once and its definition cannot refer to itself • allows for simple unfolding of data range names
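Unfolding acyclic definitions amounts to substituting each name by its definition until no defined name remains. The sketch below assumes a hypothetical encoding of expressions as nested tuples of strings; storing definitions in a dictionary means each name has at most one definition, and a self-reference is reported as an error.

```python
def unfold(expr, definitions, _active=()):
    """Replace every defined data range name in expr by its definition.

    expr is a string (a name) or a tuple of sub-expressions.
    definitions maps each defined name to exactly one expression.
    """
    if isinstance(expr, str):
        if expr not in definitions:
            return expr  # built-in datatype, facet, or property name
        if expr in _active:
            raise ValueError(f"cyclic definition of {expr!r}")
        return unfold(definitions[expr], definitions, _active + (expr,))
    return tuple(unfold(part, definitions, _active) for part in expr)

# Teens ≡ real[ int ∧ >12 ∧ <20 ]
definitions = {"Teens": ("restrict", "real", ("and", "int", ">12", "<20"))}

# Teenager ≡ ∃hasAge.Teens, right-hand side only
print(unfold(("some", "hasAge", "Teens"), definitions))
```

A definition such as A ≡ ¬A is rejected by the cycle check instead of looping forever.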
Datatype Reasoning • Datatype checker decides satisfiability of conjunctions over assertions dr(t) and t1 ≉ t2 • each t(i) is a variable or a constant • example: { 5 }(x1) ∧ int[ >4 ∧ <6 ](x2) ∧ x1 ≉ x2 • Datatype checker can be integrated with a (hyper)tableau algorithm as usual • Proposition: datatype checking is NP-hard • uses data property disjointness • seems like an innocuous feature! • even small additions to the language add complexity
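For small finite data ranges, the satisfiability question can be illustrated by exhaustive search. This is only a brute-force sketch, not the checker described in the talk: it assumes that each variable already has one finite set of admissible values and that the second kind of assertion is an inequality between two variables.

```python
from itertools import product

def satisfiable(ranges, inequalities):
    """ranges: variable name -> finite set of admissible values.
    inequalities: pairs (x, y) of variables that must take different values."""
    variables = sorted(ranges)
    for values in product(*(sorted(ranges[v]) for v in variables)):
        guess = dict(zip(variables, values))
        if all(guess[x] != guess[y] for x, y in inequalities):
            return True
    return False

# { 5 }(x1) ∧ int[ >4 ∧ <6 ](x2) ∧ x1 ≉ x2
ranges = {
    "x1": {5},
    "x2": {n for n in range(-100, 100) if 4 < n < 6},  # only 5 qualifies
}
print(satisfiable(ranges, [("x1", "x2")]))  # False: both variables must be 5
```

The search is exponential in the number of variables, which is consistent with the NP-hardness result stated above.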
Numeric Datatypes • The following ontology is unsatisfiable: • ⊤ ⊑ ∀hasWeight.xsd:double • hasWeight(Paul, "76"^^xsd:integer) • in XSD, the integer 76 is not contained in xsd:double • no notion of typecasts in OWL • XML Schema does not have real numbers • OWL 2 redefines XSD numeric datatypes • owl:realPlus = owl:real ∪ { -0, +inf, -inf, NaN } • owl:real is the set of all real numbers • all XSD numeric datatypes are subsets of owl:real • facets: • minExclusive, maxExclusive, minInclusive, maxInclusive
String Datatypes • Plain RDF literals with a language tag do not belong to any XSD datatype • "datatype"@en vs. "Datentyp"@de • OWL 2 uses a new rdf:text datatype • value space contains pairs ⟨string, languageTag⟩ • will be used in RIF as well • xsd:string was retrofitted to rdf:text • value space contains pairs ⟨string, ""⟩ • The set of characters is assumed to be infinite • E.g., with a finite set of characters, ≥ n U.(str[ length 1 ])(a) is satisfiable iff n ≤ m, where m is the number of characters • m will change in the future, which could change the meaning of this axiom
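The pair-based value space can be pictured with a small data structure. The names TextValue and plain_string below are hypothetical, and the sketch makes no attempt to model language tag normalization.

```python
from typing import NamedTuple

class TextValue(NamedTuple):
    """One element of the rdf:text value space: a (string, language tag) pair."""
    string: str
    language_tag: str  # "" plays the role of "no language tag"

def plain_string(s: str) -> TextValue:
    """Hypothetical helper: view an xsd:string value as an rdf:text pair."""
    return TextValue(s, "")

english = TextValue("datatype", "en")   # "datatype"@en
german = TextValue("Datentyp", "de")    # "Datentyp"@de
untagged = plain_string("datatype")     # xsd:string value

print(english == untagged)  # False: same characters, different tag
```

Under this reading, xsd:string is simply the subset of pairs whose tag is empty.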
Other Datatypes • Date/time: • many XSD date/time datatypes are difficult to reason with • e.g., xsd:gMonthDay represents a recurring point in time, but recurrences are irregular due to leap seconds and leap years • XSD supports dates without time zones • OWL 2 supports only xsd:dateTime with a required time zone • facets: minExclusive, maxExclusive, minInclusive, maxInclusive • xsd:boolean • xsd:hexBinary and xsd:base64Binary • xsd:anyURI • disjoint with xsd:string
Modular Datatype Checking • We assume that all datatypes are disjoint • xsd:integer is understood as a facet of owl:real • provides us with a natural modularization boundary • Each datatype d needs a datatype handler: • minc_d(d[φ], n) • true iff (d[φ])^D contains at least n elements • enu_d(d[φ]) • defined only if (d[φ])^D is finite • enumerates the extension of d[φ] • in_d(c, d[φ]) • true iff c^D ∈ (d[φ])^D • eq_d(c1, c2) • true iff c1^D = c2^D
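The four handler operations can be sketched for one simple case. The code below is an illustration under simplifying assumptions, not HermiT's interface: it handles only integers restricted to a closed interval (whereas the talk treats xsd:integer as a facet of owl:real), and the names IntRange and IntHandler are hypothetical. The operation "in" is called contains because in is a reserved word in Python.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class IntRange:
    """Normalized restriction low <= v <= high; None means unbounded."""
    low: Optional[int] = None
    high: Optional[int] = None

    def is_finite(self) -> bool:
        return self.low is not None and self.high is not None

class IntHandler:
    def minc(self, r: IntRange, n: int) -> bool:
        """True iff the range contains at least n elements."""
        if not r.is_finite():
            return True  # infinitely many integers
        return max(0, r.high - r.low + 1) >= n

    def enu(self, r: IntRange) -> List[int]:
        """Enumerate the extension; defined only for finite ranges."""
        if not r.is_finite():
            raise ValueError("enu is defined only for finite data ranges")
        return list(range(r.low, r.high + 1))

    def contains(self, c: str, r: IntRange) -> bool:
        """True iff the value denoted by the lexical form c lies in the range."""
        v = int(c)
        return (r.low is None or v >= r.low) and (r.high is None or v <= r.high)

    def eq(self, c1: str, c2: str) -> bool:
        """True iff two lexical forms denote the same value."""
        return int(c1) == int(c2)

h = IntHandler()
print(h.minc(IntRange(5, 10), 6), h.enu(IntRange(5, 7)), h.eq("76", "+76"))
```

Comparing values rather than lexical forms is what makes "76" and "+76" equal in eq.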
The Algorithm • Input: a conjunction of assertions • Output: true iff the conjunction is satisfiable • Normalize the conjunction such that each variable x in it occurs in exactly one assertion d[φ](x) • Simplify the conjunction • delete from it the assertions containing certain variables • in all remaining assertions of the form d[φ](x), the data range d[φ] is finite • Replace d[φ](x) with D(x) for D = enu_d(d[φ]) • Guess values for all variables • Check whether the guess satisfies the conjunction • Can be reduced to SAT
The Simplification Step • If the conjunction contains a variable x such that • x occurs in exactly one assertion d[φ](x), • x occurs in m assertions of the form x ≉ x', • x occurs in n assertions of the form x ≉ c, and • minc_d(d[φ], m+n+1) = true • then delete all assertions containing x • If | (d[φ])^D | ≥ m+n+1, then we can satisfy x for any choice of values for x' • the constraints on x are irrelevant for the satisfiability of the conjunction • Key to practical reasoning: • data ranges in practice are likely to be large (even infinite)
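The deletion rule can be sketched as a fixpoint loop, reading the assertions between x and x' or c as inequalities. The representation is an assumption made for illustration: ranges maps each variable name to one data range, inequalities is a list of pairs whose members are variable names or constants, and minc is passed in as a function.

```python
def simplify(ranges, inequalities, minc):
    """Repeatedly delete variables whose data range is large enough.

    A variable x that occurs in k inequalities can always be given a value
    different from its k neighbours if its range has at least k + 1 elements.
    """
    ranges = dict(ranges)
    inequalities = list(inequalities)
    changed = True
    while changed:
        changed = False
        for x in list(ranges):
            touching = [p for p in inequalities if x in p]
            if any(s == t for s, t in touching):
                continue  # x ≉ x is never satisfiable; keep it for the final check
            if minc(ranges[x], len(touching) + 1):
                del ranges[x]
                inequalities = [p for p in inequalities if x not in p]
                changed = True
    return ranges, inequalities

# Assumption: None stands for an infinite range, a Python range for a finite one.
def minc(r, n):
    return r is None or len(r) >= n

ranges = {"x": None, "y": range(5, 6)}   # x: any integer, y: { 5 }
inequalities = [("x", "y"), ("y", 5)]    # x ≉ y, y ≉ 5
print(simplify(ranges, inequalities, minc))
```

Here x is deleted because its range is infinite, while y stays: its range has one element but two are needed, so the remaining assertions go to the enumeration and guessing steps.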
Handling Numbers and Strings • Numbers: • represent facets as intervals of the form dt(low, high) • facet expressions can be normalized using a suitable interval algebra • Strings: • represent facets as regular languages • facet expressions can be normalized using standard results for Boolean operations with regular languages • caveat: the underlying alphabet is infinite • need to adapt Boolean operations on regular languages • In both cases, datatype handlers are easily implemented for normalized expressions
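For the numeric case, normalizing a conjunction of bound facets reduces to intersecting intervals. The sketch below covers only conjunction; negation and disjunction would need unions of intervals, which are omitted. The Interval class and its field names are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional

def _tighter(b1, open1, b2, open2, lower):
    """Pick the more restrictive of two bounds (None means unbounded)."""
    if b1 is None:
        return b2, open2
    if b2 is None:
        return b1, open1
    if b1 == b2:
        return b1, open1 or open2
    first_wins = b1 > b2 if lower else b1 < b2
    return (b1, open1) if first_wins else (b2, open2)

@dataclass(frozen=True)
class Interval:
    low: Optional[Fraction] = None   # None = unbounded below
    low_open: bool = False           # True for >, False for ≥
    high: Optional[Fraction] = None  # None = unbounded above
    high_open: bool = False          # True for <, False for ≤

    def intersect(self, other: "Interval") -> "Interval":
        low, low_open = _tighter(self.low, self.low_open,
                                 other.low, other.low_open, lower=True)
        high, high_open = _tighter(self.high, self.high_open,
                                   other.high, other.high_open, lower=False)
        return Interval(low, low_open, high, high_open)

    def is_empty(self) -> bool:
        if self.low is None or self.high is None:
            return False
        if self.low > self.high:
            return True
        return self.low == self.high and (self.low_open or self.high_open)

# >12 ∧ <20
teens = Interval(low=Fraction(12), low_open=True).intersect(
    Interval(high=Fraction(20), high_open=True))
print(teens)

# ≥5 ∧ ≤10 ∧ >10 has no solutions
print(Interval(low=Fraction(5), high=Fraction(10)).intersect(
    Interval(low=Fraction(10), low_open=True)).is_empty())  # True
```

Once a facet expression is in this form, the handler operations such as counting or enumerating elements only need to inspect the two bounds.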
Conclusion • The algorithm has been implemented in the HermiT reasoner • a new OWL 2 reasoner based on hypertableau • http://www.hermit-reasoner.com/ • No formal evaluation yet, but… • Supporting datatypes did not noticeably change classification times • data ranges used in practice are often “large enough”