SPARQL Query Rewriting for Implementing Data Integration over Linked Data Gianluca Correndo, Manuel Salvadores, Ian Millard, Hugh Glaser, Nigel Shadbolt
Linked Data access
• Retrieving RDF content via HTTP requests
• Instance based vs. schema based access
• Accessing SPARQL endpoints
• Schema based vs. instance based access
[Figure: SPARQL + HTTP]
Linked Data – Schema based integration (SPARQL)
[Figure: query, ontology, co-reference and data set layers for a source and a target]
OA = <SO, TO, TD, EA>
SO: Source Ontologies
TO: Target Ontologies
TD: Target Dataset
EA: Entity Alignments
• Datasets can use more than one ontology to describe their data
• More than one dataset can use the same set of ontologies coherently (e.g. RKB)
• More than one ontology can be used to define a SPARQL query
• Ontologies contain many entities to be aligned
Query Rewriting Architecture
[Figure: a <source> SPARQL query enters the SPARQL query rewriter, which uses alignments and voiD descriptions to produce <target> SPARQL queries, e.g. a <KISTI> SPARQL query and a <dbpedia> SPARQL query]
Ontology Alignment
• DL primitives are used to describe concept alignments (e.g. Equivalent, Subsume)
• An implementation of the underlying ontological mediation is usually not provided, or relies on reasoners
• Ontological mediation is usually applied to data, not to queries
• Rule systems exploit alignments to translate data
• [Euzenat] SPARQL for integrating data:
  CONSTRUCT { ?x rdf:type vc:VCard } WHERE { ?x rdf:type foaf:Person }
How to write such queries?
Anatomy of a SPARQL query
• Query type: SELECT, DESCRIBE, CONSTRUCT, ASK
• Basic Graph Pattern (BGP): the graph pattern that resulting triples must satisfy
• Filter section: additional constraints over variables present in the BGP

PREFIX id: <http://southampton.rkbexplorer.com/id/>
PREFIX akt: <http://www.aktors.org/ontology/portal#>
SELECT DISTINCT ?a WHERE {
  ?paper akt:has-author id:person-02686 .
  ?paper akt:has-author ?a .
}
SPARQL BGP

PREFIX id: <http://southampton.rkbexplorer.com/id/>
PREFIX akt: <http://www.aktors.org/ontology/portal#>
SELECT DISTINCT ?a WHERE {
  ?paper akt:has-author id:person-02686 , ?a .
}

[Figure: BGP graph in which the node ?paper has two akt:has-author edges, one to id:person-02686 and one to ?a]
• "DISTINCT ?a" is not represented in this graph
• Constraints over nodes can be represented either in the graph or within the FILTER section
Entity Alignment as Graph Rewriting
• Query rewriting based on BGP graph rewriting
• Entity Alignment EA = <LHS, RHS, FD>
• LHS: triple to match (open variables to bind)
• RHS: set of triples to instantiate (depending on the bindings of the open variables)
• FD: functional dependencies (between variables)
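The EA = <LHS, RHS, FD> triple can be pictured as a small record. The following is a minimal sketch, not the authors' Jena-based implementation: the class name `EntityAlignment`, the use of plain string tuples for triples, and the shape of the `fd` mapping are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A triple is a (subject, predicate, object) tuple of strings.
# Terms starting with "_:" play the role of open variables in an alignment.
Triple = Tuple[str, str, str]


@dataclass
class EntityAlignment:
    """Hypothetical container for one entity alignment EA = <LHS, RHS, FD>."""
    lhs: Triple        # single triple pattern to match in the source query
    rhs: List[Triple]  # triples to instantiate in the target query
    # fd maps a dependent variable to (function, variable it depends on)
    fd: Dict[str, Tuple[Callable, str]] = field(default_factory=dict)


# Class equivalence: source:User corresponds to target:Agent.
class_eq = EntityAlignment(
    lhs=("_:1", "rdf:type", "source:User"),
    rhs=[("_:1", "rdf:type", "target:Agent")],
)

# One source property expands to a two-step path in the target ontology.
author_path = EntityAlignment(
    lhs=("_:1", "akt:has-author", "_:2"),
    rhs=[("_:1", "kisti:CreatorInfo", "_:3"),
         ("_:3", "kisti:hasCreator", "_:2")],
)
```

Keeping the LHS to a single triple while the RHS is a list mirrors the definition on the slide: one triple is matched, and a set of triples replaces it.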
Entity Alignment as Graph Rewriting
• Using the graph rewriting formalism, we can rewrite queries defined for one dataset (or ontology) to integrate results from other datasets
• We can also generate CONSTRUCT queries to integrate entire datasets
SPARQL Rewriting
• Each triple of the BGP is matched against the LHSs (generating variable bindings in the process)
• Any functional dependencies are resolved (enriching the bindings with new associations)
• The corresponding RHS is instantiated with the resulting bindings and replaces the original triple
• Unbound variables generate new variables
SPARQL Rewriting
• Example:
• LHS1 = <_:1, rdf:type, source:A>
• RHS1 = {<_:1, rdf:type, target:B>}
• FD1 = {}
• <?p, rdf:type, source:A> = LHS1[_:1/?p]
• RHS1[_:1/?p] = <?p, rdf:type, target:B>
• _:1 is the RDF notation for a blank node; within a graph, blank nodes are treated as existentially quantified variables.
Triple(v1, rdf:type, source:A)
Triple(v1, rdf:type, target:B)
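The example above can be traced with a few lines of code. This is a minimal sketch of matching one query triple against an LHS and instantiating the RHS with the resulting bindings; the helper names `is_var`, `match` and `instantiate` are hypothetical, and triples are plain string tuples rather than Jena objects.

```python
def is_var(term):
    """Alignment variables are written as blank nodes, e.g. '_:1'."""
    return term.startswith("_:")


def match(lhs, triple):
    """Return the bindings that make lhs equal to triple, or None."""
    bindings = {}
    for pattern_term, query_term in zip(lhs, triple):
        if is_var(pattern_term):
            # A variable that occurs twice must bind to the same term.
            if bindings.setdefault(pattern_term, query_term) != query_term:
                return None
        elif pattern_term != query_term:
            return None  # constants must be identical
    return bindings


def instantiate(rhs, bindings):
    """Substitute the bindings into every triple of the RHS."""
    return [tuple(bindings.get(t, t) for t in triple) for triple in rhs]


lhs1 = ("_:1", "rdf:type", "source:A")
rhs1 = [("_:1", "rdf:type", "target:B")]

bindings = match(lhs1, ("?p", "rdf:type", "source:A"))
print(bindings)                     # {'_:1': '?p'}
print(instantiate(rhs1, bindings))  # [('?p', 'rdf:type', 'target:B')]
```

A triple whose constants differ from the LHS, such as `("?p", "rdf:type", "source:Z")`, yields `None` and would be left unchanged by a rewriter built on these helpers.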
Ontology Alignments – Class Eq.
• Source query: SELECT * WHERE { ?s a source:User . … }
  LHS: <_:1, rdf:type, source:User>
• Target query: SELECT * WHERE { ?s a target:Agent . … }
  RHS: <_:1, rdf:type, target:Agent>
[Figure: _:1 rdf:type source:User on the source side; _:1 rdf:type target:Agent on the target side]
Ontology Alignments – Class Partition
• Source query: SELECT * WHERE { ?s a source:WhiteWine . … }
  LHS: <_:1, rdf:type, source:WhiteWine>
• Target query: SELECT * WHERE { ?s a target:Vin ; target:has-color "blanc"@fr … }
  RHS: <_:1, rdf:type, target:Vin>, <_:1, target:has-color, "blanc"@fr>
[Figure: _:1 rdf:type source:WhiteWine on the source side; _:1 rdf:type target:Vin and _:1 target:has-color "blanc"@fr on the target side]
Ontology Alignments – Property Eq.
• Source query: SELECT * WHERE { ?s source:has-name ?n . … }
  LHS: <_:1, source:has-name, _:2>
• Target query: SELECT * WHERE { ?s target:fullName ?n . … }
  RHS: <_:1, target:fullName, _:2>
[Figure: _:1 source:has-name _:2 on the source side; _:1 target:fullName _:2 on the target side]
Ontology Alignments – Property Eq.
• Source query: SELECT * WHERE { ?p akt:has-author ?a . … }
  LHS: <_:1, akt:has-author, _:2>
• Target query: SELECT * WHERE { ?p kisti:CreatorInfo ?i . ?i kisti:hasCreator ?a … }
  RHS: <_:1, kisti:CreatorInfo, _:3>, <_:3, kisti:hasCreator, _:2>
[Figure: _:1 akt:has-author _:2 on the source side; _:1 kisti:CreatorInfo _:3 and _:3 kisti:hasCreator _:2 on the target side]
Ontology Alignments – Property Eq.
• Source query: SELECT * WHERE { ?p source:temp "10"^^C . … }
  LHS: <_:1, source:temp, _:2>
• Target query: SELECT * WHERE { ?p target:farenheit "50"^^F … }
  RHS: <_:1, target:farenheit, _:2>
[Figure: _:1 source:temp _:2 on the source side; _:1 target:farenheit on the target side, with a celsius2farenheit link between _:2 and _:3]
Binding Celsius values directly to Fahrenheit is wrong; the two values are linked by a functional dependency.
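A functional dependency of this kind can be sketched as a function attached to a dependent variable. The code below is an illustration under stated assumptions, not the authors' implementation: it assumes the target object is a separate variable `_:3` computed from `_:2`, represents literal values as Python floats and query variables as strings, and uses the hypothetical names `celsius_to_fahrenheit` and `resolve_fds`.

```python
def celsius_to_fahrenheit(value):
    """Standard conversion: F = C * 9/5 + 32."""
    return value * 9 / 5 + 32


# Assumed form of the alignment: the RHS object _:3 depends on _:2.
lhs = ("_:1", "source:temp", "_:2")
rhs = [("_:1", "target:farenheit", "_:3")]
fd = {"_:3": (celsius_to_fahrenheit, "_:2")}


def resolve_fds(bindings, fds):
    """Extend the bindings with every dependent value that can be computed."""
    resolved = dict(bindings)
    for target, (func, source) in fds.items():
        value = resolved.get(source)
        # Only literal values (floats here) can be converted; if the source
        # is bound to a query variable, the dependent variable stays unbound.
        if isinstance(value, float):
            resolved[target] = func(value)
    return resolved


# Matching <?p, source:temp, 10.0> against the LHS gives these bindings.
bindings = {"_:1": "?p", "_:2": 10.0}
print(resolve_fds(bindings, fd)["_:3"])  # 50.0
```

When the source triple carries a variable instead of a literal, `_:3` is left unbound, which under the rewriting rules on the earlier slide would lead to a new variable in the target query.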
SPARQL Rewriting

PREFIX id: <http://southampton.rkbexplorer.com/id/>
PREFIX akt: <http://www.aktors.org/ontology/portal#>
SELECT DISTINCT ?a WHERE {
  ?paper akt:has-author id:person-02686 .
  ?paper akt:has-author ?a .
}

[Figure: the query graph (?paper with akt:has-author edges to ?a and id:person-02686) next to the alignment <_:1, akt:has-author, _:2> → <_:1, kisti:CreatorInfo, _:3>, <_:3, kisti:hasCreator, _:2>]
SPARQL Rewriting
[Figure: step-by-step rewriting of the query graph; each akt:has-author edge from ?paper is replaced by a kisti:CreatorInfo / kisti:hasCreator path through a new variable (?new1, ?new2), ending at ?a and id:person-02686]
Problem: in the KISTI dataset, <http://southampton.rkbexplorer.com/id/person-02686> is unknown.
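The rewriting of the example query, including the generation of new variables for unbound RHS variables, can be sketched as a loop over the BGP. This is an illustrative sketch only; `rewrite_bgp` and the `?newN` naming scheme are assumptions, and alignments are given as plain (LHS, RHS) pairs without functional dependencies.

```python
import itertools


def is_var(term):
    return term.startswith("_:")


def match(lhs, triple):
    """Return the bindings that make lhs equal to triple, or None."""
    bindings = {}
    for pattern_term, query_term in zip(lhs, triple):
        if is_var(pattern_term):
            if bindings.setdefault(pattern_term, query_term) != query_term:
                return None
        elif pattern_term != query_term:
            return None
    return bindings


def rewrite_bgp(bgp, alignments):
    """Replace each triple by the RHS of the first alignment that matches."""
    counter = itertools.count(1)
    rewritten = []
    for triple in bgp:
        for lhs, rhs in alignments:
            bindings = match(lhs, triple)
            if bindings is None:
                continue
            # Variables that occur only in the RHS get a fresh query variable.
            for rhs_triple in rhs:
                for term in rhs_triple:
                    if is_var(term) and term not in bindings:
                        bindings[term] = f"?new{next(counter)}"
            rewritten.extend(
                tuple(bindings.get(t, t) for t in rhs_triple)
                for rhs_triple in rhs
            )
            break
        else:
            rewritten.append(triple)  # no alignment applies: keep the triple
    return rewritten


alignments = [(
    ("_:1", "akt:has-author", "_:2"),
    [("_:1", "kisti:CreatorInfo", "_:3"), ("_:3", "kisti:hasCreator", "_:2")],
)]
bgp = [
    ("?paper", "akt:has-author", "id:person-02686"),
    ("?paper", "akt:has-author", "?a"),
]
for triple in rewrite_bgp(bgp, alignments):
    print(" ".join(triple), ".")
```

Each matched triple gets its own fresh variable, so the two akt:has-author triples produce two independent kisti:CreatorInfo nodes. The constant id:person-02686 is carried over unchanged, which is exactly the problem noted above for the KISTI dataset.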
Co-reference integration
• Constants in the query (such as URIs) must be translated in order to retrieve correct results
• URI equivalences are maintained by co-reference services such as http://sameas.org, accessible via a REST interface
• Co-reference is modeled as a functional dependency between variables
• The function returns the equivalent URI that satisfies a regex pattern
• Datasets maintain URIs that are recognizable by a common pattern (at least a prefix, e.g. http://dbpedia.org/resource/*)
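Co-reference integration
[Figure: the akt:has-author → kisti:CreatorInfo / kisti:hasCreator alignment extended with sameas functional dependencies (_:11 → _:12, _:21 → _:22) constrained by the pattern http://kisti.rkbexplorer.com/id/\S*; for example, id:person-02686 is mapped to kisti:PER_000000000105047]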
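The co-reference function can be sketched as a lookup that returns the first equivalent URI matching a dataset-specific pattern. The sketch below does not call sameas.org; it uses a hypothetical in-memory table `EQUIVALENTS`, and the expanded KISTI URI is an assumption derived from the pattern shown on the slide.

```python
import re

# Hypothetical co-reference bundle. In practice the equivalences would come
# from a co-reference service; the full KISTI URI below is an assumed
# expansion of kisti:PER_000000000105047.
EQUIVALENTS = {
    "http://southampton.rkbexplorer.com/id/person-02686": [
        "http://southampton.rkbexplorer.com/id/person-02686",
        "http://kisti.rkbexplorer.com/id/PER_000000000105047",
    ],
}


def sameas(uri, pattern, equivalents=EQUIVALENTS):
    """Return an equivalent of `uri` that fully matches `pattern`, or None."""
    regex = re.compile(pattern)
    for candidate in equivalents.get(uri, []):
        if regex.fullmatch(candidate):
            return candidate
    return None


KISTI_PATTERN = r"http://kisti\.rkbexplorer\.com/id/\S*"
print(sameas("http://southampton.rkbexplorer.com/id/person-02686", KISTI_PATTERN))
```

Used as a functional dependency, this function would translate the constant in the rewritten query before it is sent to the target endpoint. Returning `None` for an unknown URI leaves the decision of how to handle a missing equivalence to the caller.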
Implementation
• Java package based on the Jena API for SPARQL query rewriting
• Code not released yet (planning to integrate it with the INRIA ontology Alignment API)
Progress report
• Contact with François Scharffe and Jérôme Euzenat
• Partial mapping to the EDOAL ontology alignment specification (work in progress)
• SPARQL query rewriter to be implemented in the Alignment API (partially done)
EDOAL - Expressive and Declarative Ontology Alignment Language
• Construction of entities from other entities can be expressed through algebraic operators
• Restrictions can be expressed on entities in order to narrow their scope
• Transformations of property values can be specified; property values using different encodings or units can be aligned using transformations
EDOAL - Example

<http://oms.omwg.org/wine-vin/MappingRule_3>
    :entity1 wine:Bordeaux ;
    :entity2 [
        edoal:and (
            vin:Vin
            [ a edoal:AttributeValueRestriction ;
              edoal:comparator xsd:equals ;
              edoal:onAttribute [
                  edoal:compose ( vin:hasTerroir proton:locatedIn ) ;
                  a edoal:Relation
              ] ;
              edoal:value vin:Aquitaine
            ]
        ) ;
        a edoal:Class
    ] ;
    :measure "1."^^xsd:float ;
    :relation "SubsumedBy" ;
    a :Cell .
Internal Representation
[Figure: graph form of the mapping rule]
LHS: <_:6, rdf:type, wine:Bordeaux>
RHS: <_:6, rdf:type, vin:Vin>, <_:6, vin:hasTerroir, _:9>, <_:9, proton:locatedIn, vin:Aquitaine>
Progress report
• Graph pattern rewriting can also be used to create CONSTRUCT queries that translate RDF graphs between different ontologies.

CONSTRUCT {
  ?9 <http://proton.semanticweb.org/locatedIn> <http://ontology.deri.org/vin#Aquitaine> .
  ?6 <http://ontology.deri.org/vin#hasTerroir> ?9 .
  ?6 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ontology.deri.org/vin#Vin> .
} WHERE {
  ?6 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/TR/2003/CR-owl-guide-20030818/wine#Bordeaux> .
}
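Generating such a CONSTRUCT query from an alignment amounts to printing the RHS as the template and the LHS as the WHERE pattern. The following is a minimal sketch with hypothetical helper names (`to_term`, `format_triple`, `construct_query`), not the authors' generator. It makes one deliberate design choice that differs from the query above: variables that occur only in the RHS are emitted as blank nodes, because SPARQL leaves out template triples that contain an unbound variable.

```python
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def to_term(term, bound):
    """Render an alignment term as SPARQL text."""
    if term.startswith("_:"):
        name = term[2:]
        # Variables matched in WHERE become query variables; RHS-only
        # variables become blank nodes, created afresh for every solution.
        return f"?v{name}" if term in bound else f"_:b{name}"
    return term


def format_triple(triple, bound):
    return "  " + " ".join(to_term(t, bound) for t in triple) + " ."


def construct_query(lhs, rhs):
    """Build a CONSTRUCT query translating LHS matches into RHS triples."""
    bound = {t for t in lhs if t.startswith("_:")}
    template = "\n".join(format_triple(tr, bound) for tr in rhs)
    where = format_triple(lhs, bound)
    return f"CONSTRUCT {{\n{template}\n}} WHERE {{\n{where}\n}}"


lhs = ("_:6", RDF_TYPE,
       "<http://www.w3.org/TR/2003/CR-owl-guide-20030818/wine#Bordeaux>")
rhs = [
    ("_:6", RDF_TYPE, "<http://ontology.deri.org/vin#Vin>"),
    ("_:6", "<http://ontology.deri.org/vin#hasTerroir>", "_:9"),
    ("_:9", "<http://proton.semanticweb.org/locatedIn>",
     "<http://ontology.deri.org/vin#Aquitaine>"),
]
print(construct_query(lhs, rhs))
```

Full IRIs in angle brackets are used so that the generated query needs no PREFIX declarations.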
Thanks Questions?
Outline
• Linked Data
• Data topology
• Data access
• Query Rewriting
• Ontology Alignment
• Entity Alignment
• SPARQL rewriting
Linked Data topology
• Foreign URIs for referring to external entities
• Co-references for referring to instance "equivalence"