Querying the Web of Data: a Formal Approach
• Paolo Bouquet¹, Chiara Ghidini² & Luciano Serafini²
• ¹ University of Trento (Trento, Italy) – ² FBK-IRST (Trento, Italy)
• ASWC – Shanghai (China) – 7th December 2009
An unending set of databases
The Semantic Web is about two things. It is about common formats for integration and combination of data drawn from diverse sources, where on the original Web mainly concentrated on the interchange of documents. It is also about a language for recording how the data relates to real world objects. That allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing [From http://www.w3.org/2001/sw/, old web site]
Goals of the paper
Answering two questions:
• What is an appropriate formal model for this unending database (the Web of Data)? A general model, called Graph Space, based on the framework of Distributed First Order Logic
• How can we use this model to formalize different ways of querying the Graph Space? A formal definition of three different ways of querying the Web of Data as a Graph Space
The standard RDF semantics
• Interpretation: $m = \langle \Delta, I, E \rangle$, where:
• $\Delta$ is a set of objects (the domain)
• $I$ is a function from symbols to individuals of $\Delta$
• $E$ is a function from individuals to sets of pairs of individuals (properties), i.e. $E: \Delta \to 2^{\Delta \times \Delta}$
• An interpretation $m$ is a model of a graph $g$ iff $m$ satisfies all triples in $g$
• A triple $(a, b, c)$ is a logical consequence of $g$ iff for every $m$ such that $m \models g$ there is an assignment $\alpha$ (of the blank nodes) such that $m \models (a, b, c)[\alpha]$
• A query answer on $g$ is a set of triples that are logical consequences of $g$ and conform to a given query pattern
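The model-checking part of these definitions can be sketched in Python. This is a minimal illustration under our own encoding assumptions, not code from the paper: triples are tuples of strings, terms starting with `_:` are blank nodes, and an interpretation is a tuple `(domain, I, E)` where `I` is a dict from URIs to objects and `E` is a dict from a property object to its set of pairs. The sketch only checks whether one finite interpretation is a model of a graph; logical consequence quantifies over all models and is not computed here.

```python
from itertools import product

# Hypothetical encoding (our assumption, not the paper's):
#   interp = (domain, I, E)
#   domain: set of objects
#   I: dict URI -> object in domain
#   E: dict property object -> set of (subject object, object object) pairs
# Blank nodes ("_:..." terms) are not interpreted by I; an assignment maps them to objects.

def is_blank(term):
    return term.startswith("_:")

def value(term, I, assignment):
    """Interpret a term: blank nodes via the assignment, URIs via I."""
    return assignment[term] if is_blank(term) else I[term]

def satisfies(interp, triples, assignment):
    """True if every triple (a, b, c) has (value(a), value(c)) in E(value(b))."""
    _, I, E = interp
    return all(
        (value(a, I, assignment), value(c, I, assignment))
        in E.get(value(b, I, assignment), set())
        for a, b, c in triples
    )

def is_model(interp, graph):
    """m is a model of g if some assignment to g's blank nodes satisfies all triples."""
    domain = interp[0]
    blanks = sorted({t for triple in graph for t in triple if is_blank(t)})
    # Brute force over assignments: fine for toy domains, exponential in general.
    for values in product(sorted(domain), repeat=len(blanks)):
        if satisfies(interp, graph, dict(zip(blanks, values))):
            return True
    return False

# Usage: a small toy interpretation.
m = (
    {"tim", "w3c", "worksAt"},
    {"ex:tim": "tim", "ex:w3c": "w3c", "ex:worksAt": "worksAt"},
    {"worksAt": {("tim", "w3c")}},
)
```

With this `m`, the ground triple `("ex:tim", "ex:worksAt", "ex:w3c")` holds, and so does `("_:b", "ex:worksAt", "ex:w3c")` because the blank node can be assigned `"tim"`.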
Extension to multiple RDF graphs
• Graphs are merged by renaming blank nodes (to avoid clashes) [see the definition of merging in RDF Semantics]
• The concepts above are extended to the merged graph
• Queries are defined as before, on the resulting merged graph
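The merge step can be sketched as follows, assuming (our encoding, not the paper's) that graphs are Python sets of string triples and that blank nodes are terms starting with `_:`. Blank nodes are standardized apart by prefixing them with the index of the graph they come from, while nodes named by the same URI coincide automatically in the union.

```python
def merge(*graphs):
    """Merge RDF graphs given as sets of (s, p, o) string triples.

    Blank nodes ("_:b") of the k-th graph are renamed "_:g{k}_b", so blank
    nodes from different graphs never clash; URIs are kept, so nodes named by
    the same URI collapse in the union.
    """
    merged = set()
    for k, graph in enumerate(graphs):
        def rename(term, k=k):
            return f"_:g{k}_{term[2:]}" if term.startswith("_:") else term
        merged |= {tuple(rename(t) for t in triple) for triple in graph}
    return merged

g1 = {("_:b", "ex:knows", "ex:tim")}
g2 = {("_:b", "ex:knows", "ex:dan")}
merged = merge(g1, g2)
```

Both input graphs use the label `_:b`, but after merging they denote two distinct blank nodes, `_:g0_b` and `_:g1_b`, exactly the clash the renaming is meant to avoid.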
Problems with this approach
• It does not scale to the global Web of Data
• Issues in handling inconsistent datasets
• Information about context/viewpoint is lost
• No clear management of provenance
• No clear theory of identity between objects across graphs (e.g. the global effect of owl:sameAs statements)
Our proposal: the Graph Space
• A graph space $G$ is composed of a family of RDF graphs $g_1, \ldots, g_n$
• Formally:
• Let $I$ be a set of URIs and, for each $i \in I$, let $g_i$ be an RDF graph
• A graph space $G$ on $I$ is $G = \{g_i\}_{i \in I}$
• [Note: in our model (unlike Named Graphs), the URI of a graph is used only to access it and does not refer to the graph itself]
[Figure: a graph space made of three graphs g1, g2, g3]
The Graph Space - II
• Idea: the semantics of $G = \{g_1, \ldots, g_n\}$ is given in terms of suitable compositions of the (standard) semantics of the component graphs $g_1, \ldots, g_n$
[Figure: graphs g1, g2, g3 with their local models m1, m2, m3; a domain relation r12 goes from the local model of g1 to the local model of g2]
The role of identity in integration
• Graphs are merged by collapsing nodes and arcs that name the same resource (by its URI)
• So identity is the glue of the Web of Data, in two forms:
• storing explicit identity statements in RDF graphs (typically through owl:sameAs). Ex: URI1 owl:sameAs URI2
• reusing (mentioning) a URI i:u in a graph with a different namespace j. Ex: using http://www.w3.org/People/Berners-Lee/card#i as the URI for Tim Berners-Lee in my FOAF profile at http://dit.unitn.it/~bouquet/foaf.rdf
• In both cases, a graph names a URI from another graph, which creates a link between the two graphs
• How do we model these identity-based links?
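The two forms of identity-based link can be extracted from a graph with a short sketch. We assume, for illustration only, that every URI is written `ns:local` and that the namespace `ns` identifies the graph that publishes it; real URIs would need proper namespace handling.

```python
# Assumption for this sketch: every URI is written "ns:local", and the
# namespace "ns" identifies the graph that publishes it.

def namespace(term):
    """Namespace part of a term written as 'ns:local'."""
    return term.split(":", 1)[0]

def identity_links(i, graph):
    """Return (same_as, mentioned) for graph i.

    same_as:   explicit identity statements (x, y) made with owl:sameAs
    mentioned: namespaces j != i whose URIs are reused as subjects or objects in graph i
    """
    same_as = {(s, o) for s, p, o in graph if p == "owl:sameAs"}
    mentioned = {
        namespace(t)
        for s, _, o in graph
        for t in (s, o)
        if not t.startswith("_:") and namespace(t) != i
    }
    return same_as, mentioned

g1 = {
    ("g1:x", "owl:sameAs", "g2:y"),   # explicit identity statement
    ("g1:x", "foaf:knows", "g3:z"),   # reuse of a URI minted in g3
}
same_as, mentioned = identity_links("g1", g1)
```

Only subject and object positions are inspected, so vocabulary namespaces used as predicates (such as `owl` or `foaf` here) are not counted as links to other graphs.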
Modeling relations across graphs via identity
• Local interpretation of owl:sameAs statements
• Links as domain relations
• Domain relation: in the model, $(d, d') \in r_{ij}$ means that $d$ and $d'$ represent the same object from $j$'s point of view
• Asymmetry: $(d, d') \in r_{ij}$ does not entail $(d', d) \in r_{ji}$
[Figure: graph g1 contains x owl:sameAs g2:y, where y is a node of g2; the domain relation r21 links an object d of the local model m2 of g2 to an object d' of the local model m1 of g1]
The model for Graph Spaces
• Interpretation $M$ of a Graph Space $G$: $M = \langle \{m_i\}_{i \in I}, \{r_{ij}\}_{i, j \in I, i \neq j} \rangle$, where:
• $m_i$ is an RDF interpretation of graph $g_i$
• $r_{ij}$ ($i \neq j$) is a domain relation mapping objects from $g_i$'s domain to $g_j$'s domain
• An interpretation $M$ is a model of a graph space $G$ ($M \models G$) if $m_i \models g_i$ for every $i \in I$
• A triple $i : (a, b, c)$ is a logical consequence of $G$ iff for every model $M$ of $G$ it is the case that $m_i \models (a, b, c)$
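A minimal sketch of the check $M \models G$ for ground graphs (no blank nodes), under a hypothetical encoding: `local[i]` is the interpretation $m_i$ as `(domain, I, E)` and `rel[(i, j)]` is $r_{ij}$ as a set of pairs. Besides checking every local model, it also verifies that each $r_{ij}$ only relates objects of $m_i$ to objects of $m_j$, which follows from its description as a mapping from $g_i$'s domain to $g_j$'s domain.

```python
# Hypothetical encoding (not from the paper), ground graphs only:
#   local[i] = (domain_i, I_i, E_i)   RDF interpretation m_i of graph g_i
#   rel[(i, j)] = set of pairs (d, d') with d in domain_i and d' in domain_j   (r_ij)

def local_model(interp, graph):
    """m_i |= g_i: every triple (a, b, c) has (I(a), I(c)) in E(I(b))."""
    _, I, E = interp
    return all((I[a], I[c]) in E.get(I[b], set()) for a, b, c in graph)

def is_graph_space_model(local, rel, G):
    """M |= G iff each m_i |= g_i; also check each r_ij relates domain_i to domain_j."""
    well_typed = all(
        d in local[i][0] and e in local[j][0]
        for (i, j), pairs in rel.items()
        for d, e in pairs
    )
    return well_typed and all(local_model(local[i], g) for i, g in G.items())

G = {
    "g1": {("g1:a", "g1:p", "g1:b")},
    "g2": {("g2:c", "g2:q", "g2:c")},
}
local = {
    "g1": ({"a", "b", "p"}, {"g1:a": "a", "g1:b": "b", "g1:p": "p"}, {"p": {("a", "b")}}),
    "g2": ({"c", "q"}, {"g2:c": "c", "g2:q": "q"}, {"q": {("c", "c")}}),
}
rel = {("g1", "g2"): {("a", "c")}}   # r_12: a in m_1 and c in m_2 are the same from g_2's view
```

Logical consequence would require checking $m_i \models (a, b, c)$ in every model of $G$, which this single-model check does not attempt.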
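Propagation of information in Graph Spaces
• In a Graph Space, properties stated in a graph propagate to other graphs via the domain relation: if $(d, d'), (e, e'), (f, f') \in r_{ij}$ and $(d, e) \in E_i(f)$, then $(d', e') \in E_j(f')$, where $E_i$ and $E_j$ are the functions of $m_i$ and $m_j$ that map each property to its extension (a set of pairs)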
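The propagation rule can be read as a step that computes which pairs a model is forced to put into $E_j$. This sketch assumes property extensions are dicts from a property object to a set of pairs and the domain relation is a set of pairs (our encoding); it returns only the forced pairs and does not check them against an existing $E_j$.

```python
from collections import defaultdict

def propagate(E_i, r_ij):
    """Pairs that E_j must contain because of r_ij.

    If (d, d'), (e, e'), (f, f') are in r_ij and (d, e) is in E_i[f],
    then (d', e') must be in E_j[f'].
    """
    image = defaultdict(set)            # d -> {d' | (d, d') in r_ij}
    for d, d2 in r_ij:
        image[d].add(d2)
    E_j = defaultdict(set)
    for f, pairs in E_i.items():
        for f2 in image[f]:
            for d, e in pairs:
                for d2 in image[d]:
                    for e2 in image[e]:
                        E_j[f2].add((d2, e2))
    return dict(E_j)

E_1 = {"knows": {("tim", "dan")}}
r_12 = {("tim", "t"), ("dan", "d"), ("knows", "k")}
forced = propagate(E_1, r_12)
```

Here `forced` is `{"k": {("t", "d")}}`. If `dan` had no counterpart in `r_12`, nothing would propagate, since all three of $d$, $e$ and $f$ must be related.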
Part II: Querying Graph Spaces
Three Modes for Querying a Graph Space
We identified three procedural strategies for answering a query on a Graph Space:
• The Bounded Mode: pre-define the RDF datasets that will be used to answer the query (e.g. using FROM in a SPARQL query)
• The Navigational Mode: navigate the links that connect the initial graph with the other graphs in the Graph Space, and use the reachable graphs as RDF datasets to answer the query (e.g. Linked Data)
• The Direct Access Mode: get the relevant graphs from an oracle and use them to answer the query (search-based)
For each of them, we define the corresponding formal model as a restriction of the generic model
Bounded mode: intuition
To answer a bounded query:
• Select the set J of graphs named in the query
• Merge them
• Query the resulting merged graph
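The three steps above can be sketched with a single triple pattern instead of full SPARQL. We assume (for illustration) that `space` is a dict from graph URI to a set of string triples, that variables start with `?`, and that the graphs contain no blank nodes, so merging reduces to a plain union.

```python
def match(graph, pattern):
    """All bindings of one triple pattern against a graph; variables start with '?'."""
    results = []
    for triple in graph:
        binding = {}
        for p, t in zip(pattern, triple):
            if p.startswith("?"):
                if binding.setdefault(p, t) != t:   # repeated variable must bind consistently
                    break
            elif p != t:
                break
        else:
            results.append(binding)
    return results

def bounded_query(space, J, pattern):
    """Select the graphs named in J, merge them (plain union: no blank nodes), query."""
    merged = set().union(*(space[j] for j in J))
    return match(merged, pattern)

space = {
    "g1": {("ex:tim", "ex:knows", "ex:dan")},
    "g2": {("ex:dan", "ex:knows", "ex:ivan")},
    "g3": {("ex:ivan", "ex:knows", "ex:tim")},
}
answers = bounded_query(space, {"g1", "g2"}, ("?x", "ex:knows", "?y"))
```

The triple in `g3` is never seen, because `g3` is not in `J`; this mirrors a SPARQL query whose FROM clauses name only `g1` and `g2`.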
A model for the bounded mode
Bounded model. For any set $J \subseteq I$, $M$ is a $J$-bounded model if it is a model for $G$ and, for all $i, j \in I$:
• if $i, j \in J$, then in $M$ the same URIs denote the same real-world objects (via the domain relation)
• if $j \in J$ and $i \notin J$, then $r_{ij} = r_{ji} = \emptyset$
[Figure: graphs g_i, g_j, g_h each mentioning k:x; their local models m_i, m_j, m_h with objects d, d', d''; the domain relation r_ji; the set J]
Navigational mode: intuition
To answer a navigational query from a graph i:
• Select the set i* of all graphs "reachable" from i
• Merge them
• Query the resulting merged graph
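A sketch of the navigational strategy, in which dereferencing is simulated by a dict lookup: we assume that the namespace of a URI written `ns:local` is the key of the graph that publishes it, so following a URI means opening that graph. Reachability ($i^*$) is computed by breadth-first search, and the reachable graphs are merged by plain union (no blank nodes) and queried with one triple pattern.

```python
from collections import deque

def namespace(term):
    """Namespace part of a term written as 'ns:local' (our simplifying assumption)."""
    return term.split(":", 1)[0]

def reachable(space, i):
    """i*: graphs reachable from i, following URIs whose namespace names a graph in space."""
    seen, queue = {i}, deque([i])
    while queue:
        g = queue.popleft()
        for triple in space[g]:
            for t in triple:
                j = namespace(t)
                if j in space and j not in seen:
                    seen.add(j)
                    queue.append(j)
    return seen

def match(graph, pattern):
    """All bindings of one triple pattern against a graph; variables start with '?'."""
    results = []
    for triple in graph:
        binding = {}
        for p, t in zip(pattern, triple):
            if p.startswith("?"):
                if binding.setdefault(p, t) != t:
                    break
            elif p != t:
                break
        else:
            results.append(binding)
    return results

def navigational_query(space, i, pattern):
    merged = set().union(*(space[j] for j in reachable(space, i)))
    return match(merged, pattern)

space = {
    "g1": {("g1:a", "foaf:knows", "g2:b")},
    "g2": {("g2:b", "foaf:knows", "g3:c")},
    "g3": {("g3:c", "foaf:knows", "g3:d")},
    "g4": {("g4:e", "foaf:knows", "g4:f")},
}
```

Starting from `g1`, the search reaches `g2` and then `g3`, but never `g4`, since no reachable graph mentions a `g4:` URI.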
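A model for the navigational mode
Navigational model. $M$ is a navigational model if it is a model for $G$ and, for all $j \in i^*$: $((j:x)^{I_j}, (j:x)^{I_i}) \in r_{ji}$, where $I_i$ is the interpretation function of $m_i$
[If $j$ was reached from $i$, then (the interpretation of) $j:x$ in the two graphs must be the same from $i$'s viewpoint]
[Figure: graphs g_i, g_j, g_h, where j:x occurs in g_i and g_j and k:x in g_h; local models m_i, m_j, m_h with objects d, d', d''; r_ji relates the interpretations of j:x; i* is the set of graphs reachable from i]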
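The navigational condition can be checked on a finite interpretation as sketched below. Assumptions (ours): `interp[k]` encodes the interpretation function $I_k$ as a dict from URIs to objects, `rel[(j, i)]` is $r_{ji}$, and "all $j:x$" is read as every URI in $j$'s own namespace that both $m_j$ and $m_i$ interpret. Whether $M$ is a model of $G$ is not re-checked.

```python
def is_navigational_model(interp, rel, i, reach):
    """Check the navigational condition for a query started at graph i.

    interp[k]: dict URI -> object of m_k (the interpretation function I_k)
    rel[(j, i)]: the domain relation r_ji (absent key = empty)
    reach: the set i* of graphs reachable from i
    """
    for j in reach - {i}:
        r_ji = rel.get((j, i), set())
        for uri, obj in interp[j].items():
            # Only URIs of j's own namespace that m_i also interprets (our reading).
            if uri.split(":", 1)[0] == j and uri in interp[i]:
                if (obj, interp[i][uri]) not in r_ji:
                    return False
    return True

interp = {"g1": {"g2:b": "b1"}, "g2": {"g2:b": "b2"}}
```

With `r_21 = {("b2", "b1")}` the condition holds: from `g1`'s viewpoint, the URI `g2:b` it reused denotes the same object as in `g2`.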
Direct access mode: intuition
To answer a direct access query:
• "Search" for the relevant graphs
• Merge them
• Query the resulting merged graph
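The oracle is not specified on the slide, so this sketch swaps in a toy one: an inverted index from each term to the graphs that contain it. A query retrieves every graph that mentions one of the pattern's constants, merges them by plain union (assuming no blank nodes), and matches one triple pattern. The index and function names are illustrative, not part of the paper.

```python
from collections import defaultdict

def match(graph, pattern):
    """All bindings of one triple pattern against a graph; variables start with '?'."""
    results = []
    for triple in graph:
        binding = {}
        for p, t in zip(pattern, triple):
            if p.startswith("?"):
                if binding.setdefault(p, t) != t:
                    break
            elif p != t:
                break
        else:
            results.append(binding)
    return results

def build_index(space):
    """Toy 'oracle': an inverted index from each term to the graphs containing it."""
    index = defaultdict(set)
    for name, graph in space.items():
        for triple in graph:
            for t in triple:
                index[t].add(name)
    return index

def direct_access_query(space, index, pattern):
    constants = [t for t in pattern if not t.startswith("?")]
    if constants:
        relevant = set().union(*(index.get(c, set()) for c in constants))
    else:
        relevant = set(space)          # no constants: every graph is relevant
    merged = set().union(*(space[g] for g in relevant))
    return match(merged, pattern)

space = {
    "g1": {("ex:tim", "ex:knows", "ex:dan")},
    "g2": {("ex:tim", "ex:worksAt", "ex:w3c")},
    "g3": {("ex:ivan", "ex:knows", "ex:dan")},
}
answers = direct_access_query(space, build_index(space), ("ex:tim", "?p", "?o"))
```

Unlike the navigational sketch, no links are followed: `g1` and `g2` are found only because they mention `ex:tim`, which is why this mode works best when publishers reuse the same URI for the same thing.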
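A model for the direct access mode
Direct access model. $M$ is a direct access model if it is a model for $G$ and, for all $i \in I$ and all URIs $j:x$: $((j:x)^{I_j}, (j:x)^{I_i}) \in r_{ji}$, where $I_i$ is the interpretation function of $m_i$
[Figure: graphs g_i, g_j, g_h, where j:x occurs in g_i and g_j and k:x in g_h; local models m_i, m_j, m_h with objects d, d', d''; r_ji relates the interpretations of j:x, for every graph in I]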
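The direct access condition differs from the navigational one only in its scope: it must hold for every pair of graphs, not just for graphs reachable from a starting point. This sketch uses the same hypothetical encoding (`interp[k]` for $I_k$, `rel[(j, i)]` for $r_{ji}$) and the same reading of "all $j:x$" as URIs in $j$'s namespace interpreted by both local models.

```python
def is_direct_access_model(interp, rel):
    """For all i and all URIs j:x interpreted by m_j and m_i: ((j:x)^I_j, (j:x)^I_i) in r_ji."""
    for i in interp:
        for j in interp:
            if i == j:
                continue
            r_ji = rel.get((j, i), set())
            for uri, obj in interp[j].items():
                # URIs of j's own namespace that m_i also interprets (our reading).
                if uri.split(":", 1)[0] == j and uri in interp[i]:
                    if (obj, interp[i][uri]) not in r_ji:
                        return False
    return True

interp = {
    "g1": {"g1:a": "a1", "g2:b": "b1"},
    "g2": {"g1:a": "a2", "g2:b": "b2"},
}
rel = {("g2", "g1"): {("b2", "b1")}, ("g1", "g2"): {("a1", "a2")}}
```

Because the quantification covers all graphs, both directions are needed here: dropping the `("g1", "g2")` entry breaks the condition even though `g1` is never "reached" from anywhere.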
Discussion & Future Work
There seems to be a tension between navigation and access:
• Navigation requires multiplying URIs and creating a massive number of links to connect content
• Direct access would be much easier if content providers used the same URI for the same thing in every dataset
Implementing queries on the Web of Data seems to require two types of URIs:
• URIs for accessing information about an object (by dereferencing them)
• URIs as pure identifiers that refer directly to real-world objects
The idea of OKKAM ids and the ENS
The OKKAM project is providing an open, large-scale infrastructure (called the Entity Name System, or ENS for short) that provides interfaces (APIs) for:
• Creating and retrieving permanent, neutral URIs for direct reference to objects (description-less)
• Storing mappings to standard RDF URIs for the same object
[Figure: an Okkam Id and three URIs (URI1, URI2, URI3) connected by refersTo and sameAs links]
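To make the two ENS functions concrete, here is a toy in-memory stand-in. It is entirely hypothetical: it is not the OKKAM ENS API, and the class name, method names and the `urn:example-ens:` identifier scheme are invented for illustration. It mints description-less identifiers and stores mappings from standard RDF URIs to them, so that URIs mapped to the same identifier can be treated as candidate owl:sameAs links.

```python
import uuid

class ToyEntityNameSystem:
    """Hypothetical in-memory stand-in for an ENS (NOT the real OKKAM API)."""

    def __init__(self):
        self._uris_of = {}   # entity id -> set of RDF URIs that refer to that entity
        self._id_of = {}     # RDF URI -> entity id

    def create_id(self):
        """Mint a new, description-less identifier (hypothetical URN scheme)."""
        eid = f"urn:example-ens:{uuid.uuid4()}"
        self._uris_of[eid] = set()
        return eid

    def add_mapping(self, eid, uri):
        """Record that a standard RDF URI refers to the entity eid."""
        self._uris_of[eid].add(uri)
        self._id_of[uri] = eid

    def lookup(self, uri):
        """Retrieve the entity id for a URI, or None if unknown."""
        return self._id_of.get(uri)

    def aliases(self, uri):
        """Other URIs mapped to the same entity (candidate owl:sameAs links)."""
        eid = self._id_of.get(uri)
        return self._uris_of[eid] - {uri} if eid else set()

ens = ToyEntityNameSystem()
tim = ens.create_id()
ens.add_mapping(tim, "http://www.w3.org/People/Berners-Lee/card#i")
ens.add_mapping(tim, "http://example.org/people#timbl")
```

The design keeps the identifier free of any description, matching the slide's "description-less" URIs: all information about the object stays in the RDF graphs reached through the mapped URIs.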