Explore the importance of provenance in workflows with practical examples and research insights from Bill Howe at the eScience Institute.
Workflow Provenance Bill Howe
Comparison Bill Howe, eScience Institute
What is Provenance? src: David Holland
Example src: David Holland
An Example (diagram: Transplant Unit Interface Agent, OTM Donor Data Collector Agent, Test Lab. Interface Agent, EHCR Hospital A, EHCR Hospital B, and a Provenance Store)
Messages: TU.1 Data Collection request • TU.2 Serology Test request • TU.3 Brain Death Notification + report • TU.4 Decision request • TU.5 Decision + report • OTM.1 Donor Data request • OTM.2 Donor Data • OTM.3 Serology test request • OTM.4 Serology test result + report • HC.1 Patient Data request • HC.2 Patient Data
1. Agent messages are recorded as interactions, either by the agents or by the agent platform.
2. Agents record the internal relationships between inputs and outputs, plus extra meaningful information.
Provenance graph for the example (diagram): nodes include Data Collection Request, Donor Data Request, Donor Data, Patient Data Request, Patient Data, Brain Death Notification, Brain Death report, Serology Test Request, Serology Test Result, Serology report, Decision Request, Decision report, and Donation Decision; edges include “response to”, “caused by”, “contains parts of”, “based on”, “justified by”, “authored by” (Authors A, B, C), and “is logged in” (Users W, X, Y, Z).
• What is the basis for donation decision D?
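The closing question is a lineage query: start at the decision and walk the graph to everything it depends on. A minimal Python sketch, assuming a hand-picked, simplified set of edges (our own reading of the diagram, not an exact transcription) stored as (effect, relation, cause) triples; the name basis_of is hypothetical.

```python
from collections import deque

# Illustrative edges only: a simplified reading of the slide's graph,
# not an exact transcription. Each triple is (effect, relation, cause).
EDGES = [
    ("Donation Decision", "response to", "Decision Request"),
    ("Donation Decision", "justified by", "Decision report"),
    ("Decision report", "based on", "Serology Test Result"),
    ("Decision report", "based on", "Brain Death Notification"),
    ("Brain Death Notification", "justified by", "Brain Death report"),
    ("Brain Death report", "authored by", "Author B"),
    ("Serology Test Result", "response to", "Serology Test Request"),
    ("Serology Test Result", "justified by", "Serology report"),
    ("Serology report", "authored by", "Author C"),
    ("Patient Data", "response to", "Patient Data Request"),
]


def basis_of(item, edges, follow=None):
    """Return every node that `item` transitively depends on.

    Edges point from effect to cause, so a breadth-first walk along
    outgoing edges collects the full basis. If `follow` is given,
    only relations in that set are traversed.
    """
    causes = {}
    for effect, relation, cause in edges:
        if follow is None or relation in follow:
            causes.setdefault(effect, []).append(cause)
    seen = set()
    queue = deque([item])
    while queue:
        node = queue.popleft()
        for cause in causes.get(node, []):
            if cause not in seen:
                seen.add(cause)
                queue.append(cause)
    return seen
```

Restricting follow to {"justified by", "based on"} returns only the documentary evidence (reports, results, notification) and leaves out the authors and the request messages.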
Use cases • Data Quality • Audit Trail • Replication Recipes • Attribution • Informational/Communication • What else?
Research Questions
Provenance Taxonomy
Types of Provenance, Redux • Data Provenance: Metadata + History of a Data Object • Workflow Provenance: Metadata + History of the workflow itself, akin to source control
COMAD • Collection-Oriented Modeling and Design • Susan Davidson, UPenn • Workflows may exhibit assembly-line semantics • open and close interleaved “read scopes” and “write scopes”
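The assembly-line idea can be illustrated with a toy stream in which collections are delimited by open/close tokens and each actor touches only its own scope. This is a simplified sketch under our own assumptions (the token encoding and scoped_actor are hypothetical), not COMAD's actual interface; here an actor's write scope is simply the collection it read.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical token encoding: ("open", label), ("data", value), ("close", label).
Token = Tuple[str, object]


def scoped_actor(read_scope: str, fn: Callable[[List[object]], object]):
    """Build a toy assembly-line actor.

    The actor reads only data tokens that sit directly inside a collection
    labelled `read_scope`. When that collection closes, it writes fn(values)
    as a new data token inside the same collection (its write scope) and
    passes every original token downstream unchanged.
    """
    def run(stream: Iterable[Token]) -> Iterator[Token]:
        stack: List[Tuple[object, List[object]]] = []  # open collections: (label, values)
        for kind, payload in stream:
            if kind == "open":
                stack.append((payload, []))
            elif kind == "data" and stack and stack[-1][0] == read_scope:
                stack[-1][1].append(payload)
            elif kind == "close":
                label, values = stack.pop()
                if label == read_scope:
                    yield ("data", fn(values))  # result lands just before the close token
            yield (kind, payload)
    return run
```

Chaining scoped_actor("Sample", len) after scoped_actor("Sample", lambda xs: sum(xs) / len(xs)) shows the assembly-line effect: the downstream actor also counts the mean that the upstream actor inserted, while tokens outside any Sample collection flow through untouched.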
Provenance-Aware Storage System (PASS) • David Holland, Harvard
PASS Architecture (diagram: provenance and storage layer)
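PASS captures provenance automatically inside the operating system, by observing what processes read and write rather than relying on applications to report it. The sketch below mimics only that bookkeeping, in user space, over hypothetical (pid, op, path) events; the class and event format are illustrative and are not PASS's interfaces.

```python
from collections import defaultdict


class ToyProvenanceObserver:
    """Toy, user-space imitation of system-level provenance capture.

    Events are hypothetical (pid, op, path) tuples with op in {"read", "write"}.
    When a process writes a file, the file's provenance records the process
    and every file that process has read so far.
    """

    def __init__(self):
        self.reads = defaultdict(set)       # pid -> files read so far
        self.provenance = defaultdict(set)  # file -> {("process", pid), ("file", path), ...}

    def observe(self, pid, op, path):
        if op == "read":
            self.reads[pid].add(path)
        elif op == "write":
            self.provenance[path].add(("process", pid))
            self.provenance[path].update(("file", src) for src in self.reads[pid])
        else:
            raise ValueError(f"unsupported op: {op}")

    def ancestors(self, path):
        """All files that `path` transitively depends on."""
        result, stack = set(), [path]
        while stack:
            for kind, ref in self.provenance.get(stack.pop(), ()):
                if kind == "file" and ref not in result:
                    result.add(ref)
                    stack.append(ref)
        return result
```

This rule over-approximates: a written file is linked to every input its process has read, whether or not that input influenced the bytes written.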
VisTrails • demo
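VisTrails records workflow provenance as a version tree: each version is its parent plus one change, so any workflow ever explored can be rebuilt by replaying changes from the root. A minimal sketch, assuming a hypothetical action vocabulary (add_module, delete_module, set_param); VisTrails' own actions and storage format differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Version:
    parent: Optional[int]
    action: Tuple  # hypothetical action, e.g. ("add_module", "Reader")
    tag: str = ""


def _root() -> Dict[int, Version]:
    return {0: Version(None, ("root",))}


@dataclass
class VersionTree:
    """Workflow evolution as a tree: each version is parent + one action."""
    versions: Dict[int, Version] = field(default_factory=_root)

    def add(self, parent: int, action: Tuple, tag: str = "") -> int:
        if parent not in self.versions:
            raise KeyError(parent)
        vid = len(self.versions)
        self.versions[vid] = Version(parent, action, tag)
        return vid

    def materialize(self, vid: int) -> dict:
        """Rebuild the workflow at `vid` by replaying actions from the root."""
        chain: List[Tuple] = []
        node: Optional[int] = vid
        while node is not None:
            chain.append(self.versions[node].action)
            node = self.versions[node].parent
        modules: set = set()
        params: Dict[Tuple[str, str], object] = {}
        for action in reversed(chain):
            if action[0] == "add_module":
                modules.add(action[1])
            elif action[0] == "delete_module":
                modules.discard(action[1])
                params = {k: v for k, v in params.items() if k[0] != action[1]}
            elif action[0] == "set_param":
                _, module, name, value = action
                params[(module, name)] = value
        return {"modules": modules, "params": params}
```

Two set_param versions with the same parent form a branch, which is how exploratory parameter changes coexist without overwriting each other.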
Other Provenance Systems • Pegasus/Wings • ZOOM • ES3 • SDG • Karma • JP • Mindswap • Redux • RWS • NCSCI • USC/ISI • OPA • VDL • MyGrid
Open Provenance Challenge • 2006, First: compare the expressiveness of provenance systems • 2007, Second: interoperability and exchange • 2008, Third: evaluation of the Open Provenance Model • 2010, Fourth and last: apply the Open Provenance Model to a broad end-to-end scenario and demonstrate novel functionality that can only be achieved with an interoperable solution for provenance
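The later challenges revolve around the Open Provenance Model (OPM), which describes provenance as a graph of three node kinds (artifacts, processes, agents) connected by five dependency edges (used, wasGeneratedBy, wasControlledBy, wasTriggeredBy, wasDerivedFrom), each pointing from effect to cause. The sketch below uses a tuple encoding of our own, not one of OPM's official serializations, and checks that every edge connects the node kinds OPM allows.

```python
from dataclasses import dataclass

# Allowed (effect kind, cause kind) for each OPM dependency edge.
EDGE_TYPES = {
    "used": ("Process", "Artifact"),
    "wasGeneratedBy": ("Artifact", "Process"),
    "wasControlledBy": ("Process", "Agent"),
    "wasTriggeredBy": ("Process", "Process"),
    "wasDerivedFrom": ("Artifact", "Artifact"),
}


@dataclass
class OPMGraph:
    nodes: dict  # node id -> "Artifact" | "Process" | "Agent"
    edges: list  # (effect id, edge kind, cause id, role or None)

    def check(self):
        """Return the edges whose kind is unknown or whose endpoints have the wrong node kinds."""
        bad = []
        for effect, kind, cause, _role in self.edges:
            expected = EDGE_TYPES.get(kind)
            actual = (self.nodes.get(effect), self.nodes.get(cause))
            if expected != actual:
                bad.append((effect, kind, cause))
        return bad
```

For example, a graph in which a process "align" used "input.img" (role "in") and "aligned.img" wasGeneratedBy "align" (role "out") passes the check, while an edge claiming that an artifact used a process is reported.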
First Open Provenance Challenge
Challenge Workflow
Challenge Queries
Challenge Queries (2)
Categorization of Provenance Systems • Execution Environment • Representation Technology: SQL, RDF, etc. • Query Language • Research Emphasis: Execution, Recording, Storing, Querying
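Representation technology and query language go hand in hand: in a relational store, a lineage question becomes a recursive query. A sketch using Python's built-in sqlite3 module, assuming a hypothetical derived(effect, cause) table and hypothetical helper names; an RDF-based system would typically ask the same question in SPARQL instead.

```python
import sqlite3


def make_store(edges):
    """Create an in-memory store with a hypothetical derived(effect, cause) table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE derived (effect TEXT NOT NULL, cause TEXT NOT NULL)")
    conn.executemany("INSERT INTO derived VALUES (?, ?)", edges)
    return conn


def lineage(conn, item):
    """Return every item that `item` was (transitively) derived from."""
    rows = conn.execute(
        """
        WITH RECURSIVE ancestors(id) AS (
            SELECT cause FROM derived WHERE effect = ?
            UNION
            SELECT d.cause FROM derived AS d JOIN ancestors AS a ON d.effect = a.id
        )
        SELECT id FROM ancestors
        """,
        (item,),
    ).fetchall()
    return {row[0] for row in rows}
```

UNION (rather than UNION ALL) discards rows already produced, so the recursion also terminates on cyclic data.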
Categorization (2) • Includes Workflow Representation • Data Derivation vs. Causal Events (“Nouns” or “Verbs”) • Annotations • Time • Naming • Tracked Data, Granularity: files, collections, bytes, tuples • Abstraction Mechanisms: functions, etc.
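The “nouns or verbs” distinction is easiest to see by converting one into the other. A sketch, assuming a hypothetical Event record for each executed step (the verb view); projecting the events onto (output, input) pairs gives the data-derivation (noun) view.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Event:
    """A 'verb': one executed step with its inputs and outputs (hypothetical record)."""
    step: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


def to_derivations(events: List[Event]) -> Set[Tuple[str, str]]:
    """Project a causal-event log onto 'noun' edges (output, input).

    Coarse assumption: every output of a step depends on every input of
    that step; finer-grained systems would track which parts mattered.
    """
    return {(out, inp) for e in events for out in e.outputs for inp in e.inputs}
```

The projection drops step names, and it treats every output as depending on every input, which is exactly where tracked-data granularity matters.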
Results