Query Language Constructs for Provenance

Murali Mani, Mohamad Alawa, Arunlal Kalyanasundaram, University of Michigan, Flint. Presented at IDEAS 2011.

Presentation Transcript


  1. Query Language Constructs for Provenance. Murali Mani, Mohamad Alawa, Arunlal Kalyanasundaram, University of Michigan, Flint. Presented at IDEAS 2011.

  2. Provenance Metadata
     • Data about the origins of data
     • Applications:
        • Checking whether a data item is valid – health records
        • Deciding how much to trust an inference/observation – scientific computation
        • Audit trails – manufacturing/shipping/trading
     • The database community has found provenance useful for:
        • updating views
        • maintaining materialized views
        • interpreting query results
        • querying probabilistic/uncertain data
     • In short, numerous applications …

  3. OPM (Open Provenance Model), http://openprovenance.org/
     • Developed by several researchers active in provenance research
     • Describes a logical representation of provenance information for a wide variety of applications
     • Provenance information is represented as a directed graph consisting of:
        • Nodes: each node is an artifact, a process, or an agent
        • Edges (dependencies), of five types:
           • used: a process used an artifact
           • wasGeneratedBy: an artifact was generated by a process
           • wasControlledBy: a process was controlled by an agent
           • wasTriggeredBy: a process was triggered by another process
           • wasDerivedFrom: an artifact was derived from another artifact
     • Nodes and edges carry annotations (attribute-value pairs)
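To make the graph model concrete, here is a minimal Python sketch of an OPM graph that checks every edge against the node kinds its type connects. It assumes OPM's convention that edges point from the effect to the cause (so a used edge goes from the process to the artifact). The names OPMGraph, add_node and add_edge are hypothetical and not taken from any OPM toolkit.

```python
from dataclasses import dataclass, field

NODE_KINDS = {"artifact", "process", "agent"}

# The five OPM edge types, each with the (source kind, target kind) it connects.
# Edges point from the effect to the cause: a process "used" an artifact, etc.
EDGE_TYPES = {
    "used": ("process", "artifact"),
    "wasGeneratedBy": ("artifact", "process"),
    "wasControlledBy": ("process", "agent"),
    "wasTriggeredBy": ("process", "process"),
    "wasDerivedFrom": ("artifact", "artifact"),
}


@dataclass
class Node:
    id: str
    kind: str
    annotations: dict = field(default_factory=dict)  # attribute-value pairs


@dataclass
class Edge:
    type: str
    source: str  # id of the effect node
    target: str  # id of the cause node
    annotations: dict = field(default_factory=dict)  # attribute-value pairs


class OPMGraph:
    def __init__(self):
        self.nodes = {}  # node id -> Node
        self.edges = []  # list of Edge

    def add_node(self, node_id, kind, **annotations):
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[node_id] = Node(node_id, kind, annotations)

    def add_edge(self, edge_type, source, target, **annotations):
        if edge_type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {edge_type}")
        # Reject edges whose endpoint kinds do not match the edge type.
        actual = (self.nodes[source].kind, self.nodes[target].kind)
        if actual != EDGE_TYPES[edge_type]:
            raise ValueError(f"{edge_type} connects {EDGE_TYPES[edge_type]}, got {actual}")
        self.edges.append(Edge(edge_type, source, target, annotations))


g = OPMGraph()
g.add_node("A1", "artifact")
g.add_node("P", "process", type="division")
g.add_edge("used", "P", "A1", role="dividend")  # ok: process -> artifact
```

Calling g.add_edge("used", "A1", "P") would raise ValueError, because a used edge must start at a process.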

  4. OPM: A Simple Example
     • A1 and A2 are artifacts.
     • P is a process performing division (A1/A2); note the used edges between P and A1, A2.
     • A3 and A4 are artifacts generated by P (the quotient and the remainder); note the wasGeneratedBy edges between P and A3, A4.
     [Figure: P (type=division) with used(dividend) to A1, used(divisor) to A2, and wasGeneratedBy(quotient) from A3, wasGeneratedBy(remainder) from A4]
     • Example taken from http://openprovenance.org/tutorial/
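The division example can be written down as a small list of edges, with the roles from the figure (dividend, divisor, quotient, remainder) stored on each edge. The sketch below, whose helper names are hypothetical, answers a two-step question: which artifacts does A3 depend on, through the process that generated it?

```python
# Edges of the division example as (edge type, source, target, role).
# OPM edges point from effect to cause; P is annotated type=division.
EDGES = [
    ("used", "P", "A1", "dividend"),
    ("used", "P", "A2", "divisor"),
    ("wasGeneratedBy", "A3", "P", "quotient"),
    ("wasGeneratedBy", "A4", "P", "remainder"),
]


def inputs_of(process):
    """Artifacts the process used, keyed by their role."""
    return {role: dst for (kind, src, dst, role) in EDGES if kind == "used" and src == process}


def generators_of(artifact):
    """Processes that generated the artifact."""
    return [dst for (kind, src, dst, _) in EDGES if kind == "wasGeneratedBy" and src == artifact]


def depends_on(artifact):
    """Two-step inference: artifacts used by any process that generated `artifact`."""
    return sorted(a for p in generators_of(artifact) for a in inputs_of(p).values())


print(inputs_of("P"))    # {'dividend': 'A1', 'divisor': 'A2'}
print(depends_on("A3"))  # ['A1', 'A2']
```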

  5. Queries for OPM
     • We can write complex “multi-step inference” queries in Datalog/SQL based on the different OPM edge types.
     • Example: find the artifacts derived, directly or indirectly, from a given artifact (a recursive query over wasDerivedFrom edges).
     • But is that sufficient? We may also need to express:
        • Sub-graph isomorphism: given a graph query pattern, check whether the pattern appears in a provenance graph
           • Studied in graph query languages ([Graph-QL], [OPQL], …)
        • Shortest path queries (under some notion of distance)
           • Typically not studied in graph query languages
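The recursive wasDerivedFrom query can be written as a recursive SQL common table expression. The sketch below uses Python's built-in sqlite3 module; the table layout and the sample artifacts (A, B, C, D, E, X, Y) are assumptions for illustration, and the equivalent Datalog rules appear in a comment.

```python
import sqlite3

# Datalog equivalent of the query below:
#   derived(X, Root) :- wasDerivedFrom(X, Root).
#   derived(X, Root) :- wasDerivedFrom(X, Y), derived(Y, Root).

conn = sqlite3.connect(":memory:")
# One row per wasDerivedFrom edge: `effect` was derived from `cause`.
conn.execute("CREATE TABLE wasDerivedFrom (effect TEXT, cause TEXT)")
conn.executemany(
    "INSERT INTO wasDerivedFrom VALUES (?, ?)",
    [("B", "A"), ("C", "B"), ("D", "C"), ("E", "A"), ("X", "Y")],
)

# UNION (rather than UNION ALL) drops rows already produced,
# so the recursion also terminates if the graph contains a cycle.
DERIVED_FROM = """
WITH RECURSIVE descendants(artifact) AS (
    SELECT effect FROM wasDerivedFrom WHERE cause = :root
    UNION
    SELECT d.effect
    FROM wasDerivedFrom AS d JOIN descendants AS s ON d.cause = s.artifact
)
SELECT artifact FROM descendants ORDER BY artifact
"""


def derived_from(root):
    """Artifacts derived, directly or indirectly, from `root`."""
    return [row[0] for row in conn.execute(DERIVED_FROM, {"root": root})]


print(derived_from("A"))  # ['B', 'C', 'D', 'E']
```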

  6. Our Approach
     • Two sets of constructs:
        • Constructs for querying content
           • Select nodes and edges based on the annotations (attribute values) associated with them
           • Operators include the usual relational algebra operators, such as select, project, and union
        • Constructs for querying structure
           • Six basic functions:
              • from(e) / to(e): the node where edge e starts / ends
              • from⁻¹(n) / to⁻¹(n): the edges that start / end at node n
              • next(n): the nodes to which there is an edge from n
              • prev(n): the nodes from which there is an edge to n
           • A generalized selection operator on a graph G = (N, E), specified by two conditions:
              • a node condition, specifying which nodes in G must appear in the result
              • an edge condition, specifying which edges in G must appear in the result
              • Result: a graph G' = (N', E') that is a sub-graph of G (i.e., N' ⊆ N and E' ⊆ E)
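A minimal sketch of the six structural functions and the generalized selection operator, assuming a graph stored as a dict of node annotations plus a dict mapping each edge id to (source, target, annotations). The names Graph, from_, from_inv, to_inv and select are hypothetical; from_ carries a trailing underscore only because from is a Python keyword. Each condition receives the graph itself, so it can test annotations (content) or call the structural functions (structure).

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # node id -> annotations
    edges: dict = field(default_factory=dict)  # edge id -> (source, target, annotations)

    def from_(self, e):
        """from(e): the node where edge e starts."""
        return self.edges[e][0]

    def to(self, e):
        """to(e): the node where edge e ends."""
        return self.edges[e][1]

    def from_inv(self, n):
        """from^-1(n): the edges that start at node n."""
        return {e for e, (src, _, _) in self.edges.items() if src == n}

    def to_inv(self, n):
        """to^-1(n): the edges that end at node n."""
        return {e for e, (_, dst, _) in self.edges.items() if dst == n}

    def next(self, n):
        """next(n): the nodes to which there is an edge from n."""
        return {self.to(e) for e in self.from_inv(n)}

    def prev(self, n):
        """prev(n): the nodes from which there is an edge to n."""
        return {self.from_(e) for e in self.to_inv(n)}

    def select(self, node_cond, edge_cond):
        """Generalized selection: keep the nodes and edges that satisfy the conditions."""
        kept_nodes = {n: a for n, a in self.nodes.items() if node_cond(self, n)}
        kept_edges = {e: v for e, v in self.edges.items() if edge_cond(self, e)}
        # The result must be a sub-graph: every kept edge needs both endpoints kept.
        for src, dst, _ in kept_edges.values():
            if src not in kept_nodes or dst not in kept_nodes:
                raise ValueError("selected edge has an endpoint that was not selected")
        return Graph(kept_nodes, kept_edges)


g = Graph(
    nodes={
        "A1": {"kind": "artifact"},
        "A2": {"kind": "artifact"},
        "P": {"kind": "process", "type": "division"},
    },
    edges={
        "e1": ("P", "A1", {"type": "used", "role": "dividend"}),
        "e2": ("P", "A2", {"type": "used", "role": "divisor"}),
    },
)
division = {n for n, a in g.nodes.items() if a.get("type") == "division"}  # content
inputs = {m for p in division for m in g.next(p)}                          # structure
sub = g.select(lambda gr, n: n in (division | inputs),
               lambda gr, e: gr.from_(e) in division)
print(sorted(sub.nodes), sorted(sub.edges))  # ['A1', 'A2', 'P'] ['e1', 'e2']
```

Raising an error for an edge whose endpoint was not selected is a design choice of this sketch; it keeps the result a genuine sub-graph rather than silently dropping edges.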

  7. Examples of the Generalized Selection Operator
     • Descendant graph of a set of nodes S:
        • node condition: the nodes n such that there is a path from some s ∈ S to n
        • edge condition: the edges between the nodes selected by the node condition
     • Shortest path graph between s and t:
        • edge condition: the edges on the shortest path between s and t
        • node condition: the nodes adjacent to an edge selected by the edge condition
     • Note: the constructs for querying content and for querying structure can be integrated into a single query model that can express a wide range of queries.
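Both examples can be sketched as instances of generalized selection. This sketch assumes that the distance for the shortest path is the number of edges (hop count), found by breadth-first search, and returns one shortest path when several tie. It also counts each node of S as its own descendant (a zero-length path) so that edges leaving S are kept. All function names and the sample graph are hypothetical.

```python
from collections import deque


def select(nodes, edges, node_cond, edge_cond):
    """Generalized selection over nodes (a set of ids) and edges (id -> (source, target))."""
    return {n for n in nodes if node_cond(n)}, {e: st for e, st in edges.items() if edge_cond(e)}


def descendants(edges, sources):
    """Nodes n with a path from some s in sources to n; sources count via a zero-length path."""
    succ = {}
    for src, dst in edges.values():
        succ.setdefault(src, []).append(dst)
    seen, queue = set(sources), deque(sources)
    while queue:
        for m in succ.get(queue.popleft(), []):
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return seen


def descendant_graph(nodes, edges, S):
    desc = descendants(edges, S)
    return select(nodes, edges,
                  node_cond=lambda n: n in desc,
                  edge_cond=lambda e: edges[e][0] in desc and edges[e][1] in desc)


def shortest_path_edges(edges, s, t):
    """Edge ids on one fewest-hop path from s to t (BFS); empty set if t is unreachable."""
    out = {}
    for e, (src, dst) in edges.items():
        out.setdefault(src, []).append((e, dst))
    parent = {s: None}  # node -> edge id by which BFS first reached it
    queue = deque([s])
    while queue and t not in parent:
        for e, m in out.get(queue.popleft(), []):
            if m not in parent:
                parent[m] = e
                queue.append(m)
    path, n = set(), t
    while parent.get(n) is not None:  # walk back from t to s
        path.add(parent[n])
        n = edges[parent[n]][0]
    return path


def shortest_path_graph(nodes, edges, s, t):
    on_path = shortest_path_edges(edges, s, t)
    ends = {x for e in on_path for x in edges[e]}  # nodes adjacent to a selected edge
    return select(nodes, edges,
                  node_cond=lambda n: n in ends,
                  edge_cond=lambda e: e in on_path)


nodes = {"a", "b", "c", "d", "x"}
edges = {"e1": ("a", "b"), "e2": ("b", "c"), "e3": ("a", "c"), "e4": ("c", "d"), "e5": ("x", "a")}
dn, de = descendant_graph(nodes, edges, {"b"})
print(sorted(dn), sorted(de))  # ['b', 'c', 'd'] ['e2', 'e4']
sn, se = shortest_path_graph(nodes, edges, "a", "d")
print(sorted(sn), sorted(se))  # ['a', 'c', 'd'] ['e3', 'e4']
```

A weighted notion of distance would replace the breadth-first search with Dijkstra's algorithm over an edge-weight annotation; the selection step stays the same.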

  8. Conclusions and Future Work
     • Observation: a provenance query language should not be restricted to Datalog/SQL.
     • We developed a query model that provides constructs for querying structure and for querying content.
     • Using our query model, we can express a wide range of queries, including shortest path queries (not expressible using SQL/Datalog).

  9. References
     • [Graph-QL] He, H., and Singh, A. K. 2008. Graphs-at-a-time: Query Language and Access Methods for Graph Databases. ACM SIGMOD 2008.
     • [OPQL] Lim, C., Lu, S., Chebotko, A., and Fotouhi, F. 2011. OPQL: A First OPM-Level Query Language for Scientific Workflow Provenance. IEEE SCC 2011.
     • [OPM] The Open Provenance Model (OPM). http://openprovenance.org/
