Feedback on OPM
Yogesh Simmhan, Microsoft Research
Synthesis of pairwise conversations with: Roger Barga and Satya Sahoo (Microsoft Research); Beth Plale and Abhijit Borude (Indiana University)
Roles • The use of “role” annotations in OPM is not well defined… • In the RDF model, named relationships are first-class objects • Roles affect the way inferences are made • Are roles semantically meaningful or not?
[Figure: two versions of an OPM graph for a baking example. Node labels: John, Flour, Eggs, Bake, Cake. Edge labels: used(flour), used(eggs), wasGeneratedBy(cake), wasGeneratedBy(unused). The first version labels the egg nodes Eggs(1) and Eggs(3); the second labels them Eggs(A) and Eggs(A, B, C).]
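The question raised on the Roles slide can be made concrete with a minimal sketch. It contrasts a role kept as an annotation on a generic edge with a role folded into a named relationship, RDF-style. The `Edge` type, the predicate names (`usedFlour`, `usedEggs`, `wasGeneratedByAsCake`) and the `SUBPROPERTY_OF` table are hypothetical illustrations, not part of OPM or of the original slides.

```python
from typing import NamedTuple, Optional


class Edge(NamedTuple):
    effect: str                 # node the edge starts from (process or artifact)
    relation: str               # generic edge type, e.g. "used"
    cause: str                  # node the edge points to
    role: Optional[str] = None  # free-form role annotation


# Encoding 1: the role is an annotation on a generic edge.
annotated = [
    Edge("Bake", "used", "Flour", role="flour"),
    Edge("Bake", "used", "Eggs", role="eggs"),
    Edge("Cake", "wasGeneratedBy", "Bake", role="cake"),
]

# Encoding 2: the role becomes part of a named relationship (hypothetical names).
triples = [
    ("Bake", "usedFlour", "Flour"),
    ("Bake", "usedEggs", "Eggs"),
    ("Cake", "wasGeneratedByAsCake", "Bake"),
]

# Hypothetical sub-property table linking named relationships to generic ones.
SUBPROPERTY_OF = {
    "usedFlour": "used",
    "usedEggs": "used",
    "wasGeneratedByAsCake": "wasGeneratedBy",
}


def used_by(process, edges):
    """Artifacts used by a process; the role annotation is simply ignored."""
    return sorted(e.cause for e in edges
                  if e.effect == process and e.relation == "used")


def used_by_triples(process, graph, with_subproperties):
    """Same query over named relationships, optionally resolving sub-properties."""
    found = []
    for subject, predicate, obj in graph:
        generic = SUBPROPERTY_OF.get(predicate, predicate) if with_subproperties else predicate
        if subject == process and generic == "used":
            found.append(obj)
    return sorted(found)
```

In this sketch the generic query over named relationships only finds the inputs of `Bake` when the sub-property table is consulted, which is one way in which the choice of encoding affects inference.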
Accounts • Composite processes identified in OPM • Different granularity? • Different “view” (client vs. service)? • Service/workflow composition using alternate accounts? • Should we specify composition more explicitly in edges, as edge types? Subclasses?
[Figure: baking example with accounts. Labels: Baker, Customer A, Customer B, Observer, Observers, Baking.]
Data Collections • OPM does not seem to support the idea of granularity for data products • Alternate accounts are better suited to process granularity than to data granularity • Process types for data decomposition and composition? Subclasses?
Annotations • Causality is not the only relationship between provenance entities • Relevant domain-specific relationships are needed to answer a scientist's query • Are subclasses a stronger form of annotation, or something different? • Subclasses are part of the model • Are annotations dependent on the representation? Extensibility mechanisms?
Representation/Serialization • OPM maps exactly to the Resource Description Framework (RDF), the W3C-recommended standard for representing metadata • An OPM graph is an RDF graph under a different name • XML, RDF, CSV…
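As a rough illustration of the mapping discussed on this slide, the sketch below writes the same small set of OPM-style edges once as N-Triples-like lines and once as CSV. The namespace `http://example.org/opm#`, the column names and both helper functions are assumptions made for the example; this is not an official OPM serialization.

```python
import csv
import io

# Hypothetical namespace, used only for this illustration.
EX = "http://example.org/opm#"

# Each edge is (effect, relation, cause), i.e. (subject, predicate, object).
edges = [
    ("Bake", "used", "Flour"),
    ("Cake", "wasGeneratedBy", "Bake"),
]


def to_ntriples(graph):
    """Render every edge as one N-Triples-style statement."""
    return "\n".join(f"<{EX}{s}> <{EX}{p}> <{EX}{o}> ." for s, p, o in graph)


def to_csv(graph):
    """Render the same edges as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["effect", "relation", "cause"])
    writer.writerows(graph)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_ntriples(edges))
    print(to_csv(edges))
```

Both outputs carry the same three-part statements, which is the sense in which the graph content is independent of the chosen serialization.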
Time • OPM's approach to time in provenance, which uses a time interval to represent an instantaneous event, is not well defined • Query results will vary with the granularity of the <t> values • The accuracy of timestamps affects inference • Logical timestamps? • Do we need a time range? • Long-running processes (provenance is “past”, not “current”)…
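The point about granularity can be shown with a small sketch: the same two observations give different answers to a "happened strictly before" query depending on how coarsely the timestamps were recorded. The timestamps, the `truncate` helper and the granularity names are hypothetical and chosen only for illustration.

```python
from datetime import datetime

# Hypothetical observations: an artifact is generated, then used 25 seconds later.
generated_at = datetime(2008, 6, 1, 10, 0, 20)
used_at = datetime(2008, 6, 1, 10, 0, 45)


def truncate(t, granularity):
    """Drop the precision a recorder with the given granularity would not keep."""
    if granularity == "second":
        return t.replace(microsecond=0)
    if granularity == "minute":
        return t.replace(second=0, microsecond=0)
    raise ValueError(f"unknown granularity: {granularity}")


def strictly_before(a, b, granularity):
    """True if a precedes b once both are truncated to the given granularity."""
    return truncate(a, granularity) < truncate(b, granularity)


if __name__ == "__main__":
    for g in ("second", "minute"):
        print(g, strictly_before(generated_at, used_at, g))
```

At second granularity the generation precedes the use; at minute granularity the two timestamps collapse to the same value and the ordering can no longer be inferred.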
Agent • Loose form of control flow? • Workflow engine? • Command line invoking the workflow engine? • Researcher who starts the command line? • Previous component that triggers the next component? • Where do we have TriggeredBy and where do we have ControlledBy?
[Figure: three alternative OPM graphs for a workflow run. Labels: WF Engine, User, Client, WF document, Service, Input data, Output data. Several edges and nodes are marked “?”.]
Vagueness in Inferences • Edge count limits? • Weak and strong semantics • P1 used A1: P1 MUST have used A1, or P1 MAY have used A1? • P1 used A1; A2 wasGeneratedBy P1: A2 MUST have been derived from A1, or A2 MAY have been derived from A1? • Weak is the lowest common denominator • mayHaveBeenUsed <= mustHaveBeenUsed…subclass?
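One possible reading of the weak/strong distinction is sketched below. Each edge carries a strength, a MUST edge also counts as a MAY edge (the "subclass" idea), and an inferred derivation is no stronger than its weakest supporting edge. This weakest-link policy, the `Strength` enum and the `cap` parameter are assumptions made for illustration; the slides leave open whether two MUST edges should yield a MUST derivation at all.

```python
from enum import IntEnum


class Strength(IntEnum):
    MAY = 1   # weak semantics
    MUST = 2  # strong semantics


# Hypothetical graph: P1 MUST have used A1; A2 MAY have been generated by P1.
used = {("P1", "A1"): Strength.MUST}
generated_by = {("A2", "P1"): Strength.MAY}


def holds(strength, at_least):
    """A MUST edge also satisfies a MAY query, but not the other way round."""
    return strength >= at_least


def derived_from(used_edges, generated_edges, cap=Strength.MUST):
    """Infer derivations with the strength of the weakest supporting edge.

    Passing cap=Strength.MAY models the stricter view that a derivation
    inferred from used + wasGeneratedBy is never more than a MAY.
    """
    inferred = {}
    for (artifact_out, process), s_gen in generated_edges.items():
        for (proc, artifact_in), s_use in used_edges.items():
            if proc == process:
                inferred[(artifact_out, artifact_in)] = min(s_gen, s_use, cap)
    return inferred
```

With the graph above, `derived_from(used, generated_by)` marks the derivation of A2 from A1 as MAY, since one of the two supporting edges is only MAY.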