A Semantic Web Approach for the Third Provenance Challenge
Tetherless World Constellation @ Rensselaer Polytechnic Institute
James Michaelis, Li Ding, Rui Huang, Zhenning Shangguan, Deborah L. McGuinness
Introduction
• Our approach to the Third Provenance Challenge (called TetherlessPC3) is designed to leverage Semantic Web technologies
• It supports two things that are useful for answering the provided queries:
  • Declarative inference: SPARQL + OWL syntax
  • Augmenting provenance data derived from the workflow execution with supplementary information: SPARQL
TetherlessPC3 Approach
[Figure: architecture diagram with three components: (1) Provenance Generator, (2) Query Front-End, (3) Import/Export Component. Box labels: Run TW’s Workflow code; PC3OPM (OWL); Trace (OWL); Query (English); Translation (English-SPARQL); Query (SPARQL); Run Query (Pellet/Jena); Results (Text); Run other team’s Workflow code; Trace (OPM’); Normalization (OPM’-OPM); Trace (OPM); Translation (OPM-PC3OPM); Translation (PC3OPM-PML); Trace (PML)]
TetherlessPC3 Approach: Provenance Generator
• Produces provenance traces in Web Ontology Language (OWL) format, using Jena, a Java-based Semantic Web framework
• The traces are structured according to the PC3OPM ontology at http://www.cs.rpi.edu/~michaj6/Provenance/PC3OPM.owl
• PC3OPM is designed to be compatible with the OPM Specification v1.01
TetherlessPC3 Approach: Provenance Generator
• To obtain the provenance, a workflow execution service is used
• It is designed to run a modified version of the workflow emulation code provided by Yogesh Simmhan (Microsoft Research)
• The modified version contains injected code (in the section that executes the high-level workflow) to record provenance information based on PC3OPM
Three properties of PC3OPM
• Provides direct mappings to OPM concepts
  • Example: PC3OPM:Artifact maps to the OPM concept “Artifact”
• Reifies OPM relations
  • Example: for the relationship (Process1, WasTriggeredBy, Process2), declare an instance of the class PC3OPM:WasTriggeredBy
• Extends the definitions in OPM through new concepts
  • Domain dependent: some terminology specific to the Third Provenance Challenge workflow
    • Example: CSVFileEntry (subclass of OPM Artifact)
  • Domain independent: terminology from the Proof Markup Language (PML)
    • We added a new concept “Function” based on (pmlp:inferenceRule), where an OPM process is an execution of a “Function”
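The reification idea can be illustrated with a minimal Python sketch that writes one relation as a set of plain triples. The `wgbSource`/`wgbTarget` and `usdSource`/`usdTarget` properties appear in the queries later in this deck; the analogous `wtbSource`/`wtbTarget` names, the `example.org` trace namespace, and the `reify` helper are assumptions made only for illustration, not taken from the PC3OPM ontology.

```python
# Namespace taken from the PC3OPM prefix used in the deck's SPARQL queries.
PC3OPM = "http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/trace#"  # hypothetical namespace for trace instances


def reify(relation_id, relation_class, source, target, prefix):
    """Return the triples that describe one reified relation instance.

    `prefix` is the property-name prefix. "wtb" is an assumed abbreviation
    for WasTriggeredBy, modeled on the wgb*/usd* properties in the queries.
    """
    node = EX + relation_id
    return [
        (node, RDF_TYPE, PC3OPM + relation_class),
        (node, PC3OPM + prefix + "Source", EX + source),
        (node, PC3OPM + prefix + "Target", EX + target),
    ]


# (Process1, WasTriggeredBy, Process2) becomes a node with a type and two links.
triples = reify("wtb_0", "WasTriggeredBy", "Process1", "Process2", "wtb")
for s, p, o in triples:
    print(s, p, o)
```

Making the relation a node of its own leaves room to attach further statements to it, which a single subject-predicate-object triple cannot carry.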
Proof Markup Language (PML)
What is it?
• A provenance interlingua designed for representing and sharing explanations generated by various intelligent systems
• Originally designed to explain the activity of theorem proof generators
• Part of the Inference Web framework (which includes tools for browsing and validating PML)
Three parts
• Justification: provides structure for describing how a conclusion was derived
• Provenance: metadata on information referenced in the Justification
• Trust: metadata on trust for information referenced in the Justification
TetherlessPC3 Approach: Query Front-End
What we have done:
1. Review the given English-based queries and form corresponding SPARQL queries
2. Update the PC3OPM ontology to assist with (1) and re-generate the provenance trace
3. Run the queries and get back results
TetherlessPC3 Approach: Query Front-End
Technologies used:
• SPARQL: RDF query language
• Pellet: an open source OWL reasoner
Query Answering Example
• Provenance Challenge core question 3:
  • “Which operation executions were strictly necessary for the Image table to contain a particular (non-computed) value?”
• Our interpretation:
  • Find the process X which generated the Image table
  • Look for the processes XT that (directly or indirectly) triggered X
  • Return X and XT as query results
• Handling this query:
  • Rather than write a recursive program, we use OWL-based transitive properties in the answer
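To make concrete what a transitive property yields for this query, the following Python sketch computes the same closure by hand over direct "was triggered by" edges. It is an illustration only, not the deck's method: in TetherlessPC3 the reasoner derives these pairs from the ontology. The process names `X`, `P1`, and `P2` are hypothetical.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Return all (x, y) pairs where y is reachable from x via one or more edges."""
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)

    closure = set()
    for start in list(graph):
        stack = list(graph[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue  # already expanded; also guards against cycles
            seen.add(node)
            closure.add((start, node))
            stack.extend(graph.get(node, ()))
    return closure


# Hypothetical direct edges: X was triggered by P1, P1 was triggered by P2.
triggered_by = [("X", "P1"), ("P1", "P2")]
closure = transitive_closure(triggered_by)

# Answer in the shape of the interpretation above: X plus every process XT
# that directly or indirectly triggered X.
answer = {"X"} | {y for x, y in closure if x == "X"}
print(sorted(answer))
```

Declaring the property transitive in OWL moves this traversal into the reasoner, so the query itself only needs a single triple pattern.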
Enhancing Provenance Trace
• To keep our provenance trace simple and concise, we don’t put in transitive properties, since most of the queries don’t need them
• To insert them when necessary, we create additional RDF data through a SPARQL CONSTRUCT query
SPARQL SELECT Query
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/OurTrace.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
SELECT ?fxn1 ?fxn2
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
FROM <http://www.cs.rpi.edu/~michaj6/provenance/OurTrace.owl#>
FROM <http://onto.rpi.edu/sw4j/sparql?queryURL=http://tw.rpi.edu/proj/portal.wiki/images/3/36/MakeMoreTriples.sparql>
WHERE {
  # ?fxn1 is the process that generated the given artifact
  ?wgb PC3OPM:wgbSource PC3:provVarDbEntryP2ImageMeta_0 .
  ?wgb PC3OPM:wgbTarget ?fxn1 .
  # ?fxn2 is a process that triggered ?fxn1, if any
  OPTIONAL { ?fxn1 PC3OPM:opWasTriggeredBy ?fxn2 . }
}
SPARQL CONSTRUCT Query
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX PC3: <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl#>
PREFIX PC3OPM: <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl#>
CONSTRUCT { ?FXN PC3OPM:opWasTriggeredBy ?FXN2 }
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3.owl>
FROM <http://www.cs.rpi.edu/~michaj6/provenance/PC3OPM.owl>
WHERE {
  ?USD PC3OPM:usdSource ?FXN .
  ?USD PC3OPM:usdTarget ?VAR .
  ?WGB PC3OPM:wgbSource ?VAR .
  ?WGB PC3OPM:wgbTarget ?FXN2
}
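The join expressed by the CONSTRUCT query can be paraphrased in plain Python as a sketch: if a process used an artifact, and that artifact was generated by another process, then the first process was triggered by the second. The function name, the tuple representation, and the sample identifiers below are assumptions for illustration; the actual query operates on the reified `usd*`/`wgb*` instances in the RDF trace.

```python
def derive_triggered_by(used, was_generated_by):
    """Derive (consumer, producer) pairs from Used and WasGeneratedBy edges.

    used:             iterable of (process, artifact) pairs
    was_generated_by: iterable of (artifact, process) pairs
    """
    # Index the producers of each artifact, mirroring the ?VAR join variable.
    producers = {}
    for artifact, process in was_generated_by:
        producers.setdefault(artifact, set()).add(process)

    return {
        (consumer, producer)
        for consumer, artifact in used
        for producer in producers.get(artifact, ())
    }


# Hypothetical trace: P2 used artifact a1, and a1 was generated by P1.
used = {("P2", "a1"), ("P3", "a2")}
was_generated_by = {("a1", "P1")}
derived = derive_triggered_by(used, was_generated_by)
print(derived)
```

In this sample, `P3` contributes no derived pair because nothing in the trace generated `a2`, just as the CONSTRUCT query emits a triple only when both patterns match.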
TetherlessPC3 Approach: Import/Export Component
• Can import: OPM graphs
• Can export: OPM graphs, PML proofs
• The import/export protocols for OPM are handled through the OPM API
• Likewise, the import/export protocols for PML are handled through a PML API developed by our lab
Discussion: Importing From Other Teams
• Some OPM graphs generated by other teams could not be parsed by the OPM API, so normalization was needed
• Our SPARQL queries (used on our provenance trace) needed only slight modification to handle imported provenance (changing the URIs of artifacts)
• Some information loss was observed for many teams when dumping provenance traces to OPM
• Control flow traces were not captured by some teams
Comparing with Other Teams: Answering Core Query 3
[Figure: panels labeled Our Team, Green Team, and Blue Team]
Conclusions
• Semantic Web technologies are useful for handling provenance data
• Provenance generation: RDF/OWL helps clarify the domain-specific concepts and entities in provenance metadata
• Provenance query: supported by SPARQL + OWL inference
  • We can capture both control flow and data flow
  • Using transitive inference rules, we don’t need to write a program to implement a recursive query
• Provenance integration: an RDF/OWL syntax for OPM (with references to domain terminology) will help avoid information loss when exporting OPM data
References
• OWL: http://www.w3.org/TR/owl-features/
• SPARQL: http://www.w3.org/TR/rdf-sparql-query/
• Pellet: http://clarkparsia.com/pellet/
• Jena: http://jena.sourceforge.net/
• PML API: http://inference-web.org/wiki/Tools_%26_Demos
• OPM API: http://openprovenance.org/java/maven-snapshots/org/openprovenance/