
On Explicit Provenance Management in RDF/S Graphs

Panagiotis Pediaditis, Giorgos Flouris, Irini Fundulaki, Vassilis Christophides ({pped, fgeo, fundul, christop}@ics.forth.gr). Institute of Computer Science, Foundation for Research and Technology – Hellas, Heraklion, Greece.


Presentation Transcript


  1. On Explicit Provenance Management in RDF/S Graphs
     Panagiotis Pediaditis, Giorgos Flouris, Irini Fundulaki, Vassilis Christophides
     {pped, fgeo, fundul, christop}@ics.forth.gr
     Institute of Computer Science, Foundation for Research and Technology – Hellas, Heraklion, Greece
     (Slide footer on every slide: Giorgos Flouris)

  2. Provenance Management in RDF/S
     • Provenance management problem
       • Mostly addressed in the database context
     • We are dealing with why provenance in RDF/S graphs
       • Why provenance: identifying the source data that had some influence on the existence of the target data
     • Three main characteristics (peculiarities of RDF/S)
       • Triple-based representation
         • Use quadruples to talk about triples' provenance
       • Inference
         • Assign provenance information to implicit data
       • Coherence semantics (in updates)
         • Implicit data is a first-class citizen and should be retained during change, along with its provenance information

  3. Characteristic #1: Triple-based Representation

  4. RDF Graphs
     • Define classes
       [Paper rdf:type rdfs:Class]
       [PaperTAPP rdf:type rdfs:Class]
       [Person rdf:type rdfs:Class]
       [Author rdf:type rdfs:Class]
     • Define properties
       [writes rdf:type rdf:Property]
       [writes rdfs:domain Author]
       [writes rdfs:range Paper]
     • Instantiate (and define) individuals
       [Paper10 rdf:type PaperTAPP]
       [Giorgos rdf:type Author]
       [Giorgos writes Paper10]
     • Define hierarchies
       [PaperTAPP rdfs:subClassOf Paper]
       [Author rdfs:subClassOf Person]
     • And other stuff…
     • RDF graph = set of RDF triples
     [Figure: example RDF graph with nodes Person, Paper, Author, PaperTAPP, Giorgos, Paper10 and the property writes. Legend: "instance" edges are rdf:type, "subclass" edges are rdfs:subClassOf.]
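The statement "RDF graph = set of RDF triples" can be made concrete with a minimal Python sketch. It is an illustration only: plain strings stand in for URIs, and the helper `instances_of` is a hypothetical name, not part of any RDF library or of the authors' system.

```python
# An RDF graph modelled as a plain set of (subject, predicate, object) tuples.
# Assumption: strings stand in for URIs to keep the example short.
Triple = tuple[str, str, str]

graph: set[Triple] = {
    # classes
    ("Paper", "rdf:type", "rdfs:Class"),
    ("PaperTAPP", "rdf:type", "rdfs:Class"),
    ("Person", "rdf:type", "rdfs:Class"),
    ("Author", "rdf:type", "rdfs:Class"),
    # properties
    ("writes", "rdf:type", "rdf:Property"),
    ("writes", "rdfs:domain", "Author"),
    ("writes", "rdfs:range", "Paper"),
    # individuals
    ("Paper10", "rdf:type", "PaperTAPP"),
    ("Giorgos", "rdf:type", "Author"),
    ("Giorgos", "writes", "Paper10"),
    # hierarchies
    ("PaperTAPP", "rdfs:subClassOf", "Paper"),
    ("Author", "rdfs:subClassOf", "Person"),
}


def instances_of(g: set[Triple], cls: str) -> set[str]:
    """Return the subjects explicitly typed with the given class."""
    return {s for (s, p, o) in g if p == "rdf:type" and o == cls}


print(instances_of(graph, "Author"))  # {'Giorgos'}
```

Because the graph is a set, adding the same triple twice has no effect, which matches the set semantics stated on the slide.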

  5. Provenance in RDF Graphs
     PUB: [Paper rdf:type rdfs:Class]
     TAPP: [PaperTAPP rdf:type rdfs:Class]
     PUB: [Person rdf:type rdfs:Class]
     PUB: [Author rdf:type rdfs:Class]
     PUB: [writes rdf:type rdf:Property]
     PUB: [writes rdfs:domain Author]
     PUB: [writes rdfs:range Paper]
     TAPP: [Paper10 rdf:type PaperTAPP]
     TAPP: [Giorgos rdf:type Author]
     TAPP: [Giorgos writes Paper10]
     TAPP: [PaperTAPP rdfs:subClassOf Paper]
     PUB: [Author rdfs:subClassOf Person]
     [Figure: the example RDF graph with each part marked as belonging to the Publications Graph (PUB) or the TAPP Graph (TAPP).]

  6. Named Graphs and Provenance
     • Create two named graphs and assign an ID (URI) to each
       • Publications graph (URI: PUB)
       • TAPP graph (URI: TAPP)
     • Each named graph corresponds to a different source
     • Need some method to associate named graphs with triples
       • Triples become quadruples
       • Fourth element is the URI of the named graph (origin)

  7. Quadruples for Provenance
     • [Paper rdf:type rdfs:Class PUB]
     • [PaperTAPP rdf:type rdfs:Class TAPP]
     • [Person rdf:type rdfs:Class PUB]
     • [Author rdf:type rdfs:Class PUB]
     • [writes rdf:type rdf:Property PUB]
     • [writes rdfs:domain Author PUB]
     • [writes rdfs:range Paper PUB]
     • [Paper10 rdf:type PaperTAPP TAPP]
     • [Giorgos rdf:type Author TAPP]
     • [Giorgos writes Paper10 TAPP]
     • [PaperTAPP rdfs:subClassOf Paper TAPP]
     • [Author rdfs:subClassOf Person PUB]
     • All quadruples of the form [s p o PUB] originate from named graph PUB (Publications graph)
     • All quadruples of the form [s p o TAPP] originate from named graph TAPP (TAPP graph)
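A short sketch of the quadruple representation, assuming Python tuples of four strings whose last element is the named-graph URI. The function name `triples_by_graph` is hypothetical; it only shows how the fourth element recovers the triples of each source.

```python
from collections import defaultdict

# Quadruples (subject, predicate, object, named graph) from the slide above.
quads = {
    ("Paper", "rdf:type", "rdfs:Class", "PUB"),
    ("PaperTAPP", "rdf:type", "rdfs:Class", "TAPP"),
    ("Person", "rdf:type", "rdfs:Class", "PUB"),
    ("Author", "rdf:type", "rdfs:Class", "PUB"),
    ("writes", "rdf:type", "rdf:Property", "PUB"),
    ("writes", "rdfs:domain", "Author", "PUB"),
    ("writes", "rdfs:range", "Paper", "PUB"),
    ("Paper10", "rdf:type", "PaperTAPP", "TAPP"),
    ("Giorgos", "rdf:type", "Author", "TAPP"),
    ("Giorgos", "writes", "Paper10", "TAPP"),
    ("PaperTAPP", "rdfs:subClassOf", "Paper", "TAPP"),
    ("Author", "rdfs:subClassOf", "Person", "PUB"),
}


def triples_by_graph(qs):
    """Group triples by the named graph (origin) recorded in each quadruple."""
    grouped = defaultdict(set)
    for s, p, o, g in qs:
        grouped[g].add((s, p, o))
    return dict(grouped)


named = triples_by_graph(quads)
print(sorted(named))       # ['PUB', 'TAPP']
print(len(named["PUB"]))   # 7
print(len(named["TAPP"]))  # 5
```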

  8. Properties of Named Graphs
     • The named graph URI can be used to refer to the named graph
       • Can be used for assignment of metadata: [TAPP hasAuthor JamesCheney G]
     • Granularity of provenance
       • A triple is the smallest bit of information
       • The granularity of provenance achieved by named graphs is at the triple level
     • Flexible
       • A named graph can contain 0, 1, or many triples
       • A triple can belong to 0, 1, or many named graphs

  9. Characteristic #2: Inference

  10. RDF/S Graphs
     • RDF Schema: add-on to RDF
     • RDFS adds inference semantics
       • Transitivity of subclass/subproperty
       • Implicit instantiations
     • Example
       • [Giorgos rdf:type Author]
       • [Author rdfs:subClassOf Person]
       • Inference: [Giorgos rdf:type Person]
     • Inferred knowledge is implicit
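The inference example above can be sketched as a small fixpoint computation. This is a simplification under stated assumptions: only two RDFS rules are covered (subclass transitivity and instance inheritance along subclass edges), subproperties are left out, and `rdfs_closure` is a hypothetical helper name.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def rdfs_closure(graph):
    """Return the explicit triples plus those derived by two RDFS rules.

    Rule 1: (a subClassOf b), (b subClassOf d)  =>  (a subClassOf d)
    Rule 2: (a type b),       (b subClassOf d)  =>  (a type d)
    """
    closed = set(graph)
    while True:
        new = {
            (a, p1, d)
            for (a, p1, b) in closed
            for (c, p2, d) in closed
            if b == c and p2 == SUBCLASS and p1 in (TYPE, SUBCLASS)
        }
        if new <= closed:  # nothing new was derived: fixpoint reached
            return closed
        closed |= new


explicit = {
    ("Giorgos", TYPE, "Author"),
    ("Author", SUBCLASS, "Person"),
}
implicit = rdfs_closure(explicit) - explicit
print(implicit)  # {('Giorgos', 'rdf:type', 'Person')}
```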

  11. Provenance and Inference
     • Quadruples:
       • [Giorgos rdf:type Author TAPP]
       • [Author rdfs:subClassOf Person PUB]
       • [Giorgos rdf:type Person ???]
     • Needs:
       • Shared ownership
       • A more sophisticated, compound structure
       • Keeping the connection with the components
     • Composition operator (PT = PUB ● TAPP)
       • [Giorgos rdf:type Person PT]
     • Ok, but see characteristic #3
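One way to picture the composition operator is to record each origin as a set of named-graph URIs and to take the union when two quadruples jointly imply a third. The sketch below is an assumption made for illustration: it models ● as set union over Python `frozenset` values and covers the same two RDFS rules only; the slides do not define the operator beyond PT = PUB ● TAPP. Source assignments follow the quadruple listing on slide 7.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def infer_with_provenance(quads):
    """Derive implicit quadruples; a derived quad's origin is the union
    of the origins of the two quads that imply it."""
    derived = set(quads)
    while True:
        new = {
            (a, p1, d, g1 | g2)
            for (a, p1, b, g1) in derived
            for (c, p2, d, g2) in derived
            if b == c and p2 == SUBCLASS and p1 in (TYPE, SUBCLASS)
        }
        if new <= derived:
            return derived
        derived |= new


PUB, TAPP = frozenset({"PUB"}), frozenset({"TAPP"})
explicit = {
    ("Giorgos", TYPE, "Author", TAPP),
    ("Author", SUBCLASS, "Person", PUB),
}
for quad in infer_with_provenance(explicit) - explicit:
    s, p, o, origin = quad
    print(s, p, o, sorted(origin))  # Giorgos rdf:type Person ['PUB', 'TAPP']
```

Keeping the origin as a set retains the connection with the components, which is the requirement listed under "Needs" above.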

  12. Characteristic #3: Coherence Semantics (in Updates)

  13. Foundational Semantics
     • Foundational viewpoint (pyramid):
       • Knowledge consists of the explicitly represented knowledge
       • Only explicit knowledge can be changed
       • Implicit knowledge is affected indirectly, through the changes in the explicit knowledge (so that the resulting "pyramid" is "stable")
       • Explicit knowledge is more important than implicit knowledge
     [Figure: pyramid with the labels Supported Knowledge, Implicit Knowledge, Explicit Knowledge, Basic Knowledge.]

  14. Coherence Semantics
     • Coherence viewpoint (raft):
       • No discrimination between explicit and implicit knowledge
       • Both explicit and implicit knowledge can be changed
       • Changes should be made coherently in order for the resulting knowledge to make sense (so that the "raft" is "stable")
       • Explicit and implicit knowledge are of the same value
     [Figure: raft labelled Knowledge (includes both implicit and explicit knowledge).]

  15. Deletes
     • Under coherence semantics
       • Inferred knowledge needs to be made explicit (when in danger of being lost)
       • Explicit assignment of shared origin to triples
     • Explicit shared origin assignment
       • Cannot use any composition operator
       • Must be a first-class construct (autonomous)
       • Retain the connection with its constituents
       • A need, but also a useful feature

  16. RDF/S Graphsets
     • Graphsets are like named graphs
       • Have IDs (URIs)
       • Used in quadruples
         • Association of triples with graphsets: [Giorgos rdf:type Person PT]
       • Can be referred to (metadata): [PT rdf:type Confidential G]
     • Encode origin or shared origin
       • [Giorgos rdf:type Person PT]
     • URI association (via skolem function)
       • PT is the URI of {PUB, TAPP}
       • PUB is the URI of {PUB}
     • A named graph is a graphset
       • PUB corresponds to {PUB}
     [Figure: the example RDF graph with the inferred instance edge from Giorgos to Person labelled PT.]
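The URI association can be sketched as a deterministic function from a set of named-graph URIs to a single identifier. The slides do not say how the skolem function builds its URIs, so the naming scheme below (`gs:` prefix, members joined with `+`) and the names `graphset_uri` and `registry` are purely hypothetical. What the sketch keeps from the slides is that a singleton graphset maps to the named graph's own URI and that the constituents stay recoverable.

```python
# Hypothetical registry: graphset URI -> the named graphs it stands for.
registry: dict[str, frozenset[str]] = {}


def graphset_uri(named_graphs):
    """Map a collection of named-graph URIs to one graphset URI.

    Pass an iterable of URIs (e.g. a set or list), not a bare string.
    """
    members = frozenset(named_graphs)
    if not members:
        raise ValueError("a graphset needs at least one named graph")
    if len(members) == 1:
        uri = next(iter(members))             # {PUB} is identified by PUB
    else:
        uri = "gs:" + "+".join(sorted(members))  # assumed naming scheme
    registry[uri] = members                   # keep the link to constituents
    return uri


print(graphset_uri({"PUB"}))           # PUB
print(graphset_uri(["TAPP", "PUB"]))   # gs:PUB+TAPP
```

Sorting the members makes the result independent of the order in which the constituents are given, so the same shared origin always receives the same URI.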

  17. Querying With RDF/S Graphsets
     • Standard queries (original RQL)
       • Give me the Persons: [Giorgos]
     • Provenance queries (extended RQL)
       • Give me the Persons per {PUB}: [ ]
       • Give me the Persons per {TAPP, PUB}: [Giorgos]
       • Give me the sources per which Author is a subclass of Person: [{PUB}]
       • Give me all the individual sources: [{TAPP}, {PUB}]
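The four example queries can be mimicked in plain Python over quadruples whose fourth element is a graphset. This is not RQL and does not reflect the extended RQL syntax, which the slides do not show; the function names are hypothetical and the inferred quadruple is written out by hand so that the example stays short.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"
PUB, TAPP = frozenset({"PUB"}), frozenset({"TAPP"})
PT = PUB | TAPP  # shared origin {PUB, TAPP}

quads = {
    ("Giorgos", TYPE, "Author", TAPP),
    ("Author", SUBCLASS, "Person", PUB),
    ("Giorgos", TYPE, "Person", PT),  # inferred, listed explicitly here
}


def instances_of(qs, cls):
    """Standard query: instances of a class, ignoring provenance."""
    return {s for (s, p, o, g) in qs if p == TYPE and o == cls}


def instances_per(qs, cls, graphset):
    """Provenance query: instances of a class per one exact graphset."""
    wanted = frozenset(graphset)
    return {s for (s, p, o, g) in qs if p == TYPE and o == cls and g == wanted}


def sources_of(qs, s, p, o):
    """Provenance query: graphsets per which the given triple holds."""
    return {g for (s2, p2, o2, g) in qs if (s2, p2, o2) == (s, p, o)}


def individual_sources(qs):
    """All singleton graphsets that occur as constituents."""
    return {frozenset({name}) for (_, _, _, g) in qs for name in g}


print(instances_of(quads, "Person"))                     # {'Giorgos'}
print(instances_per(quads, "Person", {"PUB"}))           # set()
print(instances_per(quads, "Person", {"TAPP", "PUB"}))   # {'Giorgos'}
print(sources_of(quads, "Author", SUBCLASS, "Person") == {PUB})  # True
print(individual_sources(quads) == {TAPP, PUB})          # True
```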

  18. Validity and Redundancy Elimination
     • Two invariants for RDF/S graphs
       • Valid (per some validity rules)
       • Redundancy-free (space considerations)
     • The invariants allow optimized execution of queries
     • These invariants are imposed during change
       • Improve query speed, but make updates more difficult
       • Trade-off between having query overhead or update overhead

  19. Updating With RDF/S Graphsets
     • Updates supported through an extended version of RUL
       • INSERT and DELETE
       • Only for data (class and property instances)
       • Implicit or explicit knowledge
       • Take into account and update graphset (provenance) information
     • Main considerations
       • Apply the change (INSERT or DELETE)
       • Respect invariants
         • Non-redundancy (INSERT) and validity (DELETE)
       • Make minimal changes (under coherence viewpoint)
         • No unnecessary loss of information
       • Take into account and preserve graphset (provenance) information
         • Applicable upon quadruples

  20. Conclusion
     • Objective: assign provenance information to RDF/S graphs to capture why provenance
       • Triple-based representation
         • Turned triples into quadruples and used named graphs to record the origin
       • Inference (per RDFS)
         • Composed named graphs
       • Coherence semantics in updates (deletes)
         • Used graphsets for composed named graphs (cannot use an operator)
     • Proposed query and update languages for graphsets
       • Based on RQL, RUL
       • Can be used to query/update provenance information
       • Provided syntax and semantics, as well as an implementation
       • Demo at: http://139.91.183.30:3026/RULdemo/named_graph_demo/

  21. Thank You

  22. EXTRA SLIDES

  23. RDF/S Graphset Properties
     • Three types of triples in a graphset:
       • Explicitly assigned triples
       • Implicitly assigned triples (from the constituent named graphs)
       • Implications of the above (per RDFS)
     [Figure: the example RDF graph with edges belonging to graphset PT labelled PT.]

  24. Inserts and Deletes: General Process
     • INSERT
       • Validity respected
       • Must verify non-redundancy
       • Process
         • If INSERT is redundant, ignore it
         • Remove all redundant information (after insert)
     • DELETE
       • Must verify validity
       • Non-redundancy respected
       • Issues with inference and the coherence viewpoint
       • Process
         • If DELETE is void, ignore it
         • Make explicit all originally redundant information that will be lost otherwise
         • Restore validity by removing property instances if necessary
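The general process can be sketched in Python for a restricted case. The sketch rests on several assumptions that are not in the slides: origins are modelled as `frozenset` values combined by union, only two RDFS rules are applied, the class hierarchy is acyclic, deletion of purely implicit quadruples is not handled, and the validity-restoration step is left out because the validity rules are not given. All function names are hypothetical.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def closure(quads):
    """Explicit quads plus everything the two RDFS rules derive from them."""
    derived = set(quads)
    while True:
        new = {
            (a, p1, d, g1 | g2)
            for (a, p1, b, g1) in derived
            for (c, p2, d, g2) in derived
            if b == c and p2 == SUBCLASS and p1 in (TYPE, SUBCLASS)
        }
        if new <= derived:
            return derived
        derived |= new


def remove_redundant(quads):
    """Drop every quad that the remaining quads already imply."""
    result = set(quads)
    for quad in list(result):
        if quad in closure(result - {quad}):
            result.discard(quad)
    return result


def insert(quads, quad):
    if quad in closure(quads):          # redundant INSERT: ignore it
        return set(quads)
    return remove_redundant(set(quads) | {quad})


def delete(quads, quad):
    before = closure(quads)
    if quad not in before:              # void DELETE: ignore it
        return set(quads)
    if quad not in quads:
        raise NotImplementedError("deleting implicit quads is outside this sketch")
    remaining = set(quads) - {quad}
    # Coherence viewpoint: make explicit what would otherwise be lost.
    lost = before - closure(remaining) - {quad}
    return remove_redundant(remaining | lost)


PUB, TAPP = frozenset({"PUB"}), frozenset({"TAPP"})
PT = PUB | TAPP
start = {("Giorgos", TYPE, "Author", TAPP), ("Author", SUBCLASS, "Person", PUB)}

after_delete = delete(start, ("Giorgos", TYPE, "Author", TAPP))
print(("Giorgos", TYPE, "Person", PT) in after_delete)  # True
```

In this example the inferred quadruple [Giorgos rdf:type Person PT] survives the deletion because it is made explicit first. Re-inserting [Giorgos rdf:type Author TAPP] afterwards makes that quadruple redundant again, so `insert` removes it and the original two quadruples are restored.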
