
Workflow Provenance

Explore the importance of provenance in workflows with practical examples and research insights from Bill Howe at the eScience Institute.


Presentation Transcript


  1. Workflow Provenance Bill Howe

  2. Comparison Bill Howe, eScience Institute

  3. What is Provenance? (src: David Holland)

  4. Example (src: David Holland)

  5. An Example
     (1) Agent messages are recorded as interactions, either by the agents or by the agent platform.
     (2) Agents record the internal relationships between inputs and outputs, plus extra meaningful information.
     Diagram participants: Provenance Store; EHCR Hospital A; EHCR Hospital B; Transplant Unit Interface Agent; OTM Donor Data Collector Agent; Test Lab. Interface Agent
     Diagram messages:
     • TU.1 Data Collection request
     • OTM.1 Donor Data request
     • HC.1 Patient Data request
     • TU.2 Serology Test request
     • OTM.2 Donor Data
     • HC.2 Patient Data
     • TU.3 Brain Death Notification + report
     • OTM.3 Serology test request
     • TU.4 Decision request
     • OTM.4 Serology test result + report
     • TU.5 Decision + report
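The two recording steps on this slide can be sketched as a small in-memory store. This is only an illustration under stated assumptions: the names `ProvenanceStore`, `Interaction`, and `Relationship` are hypothetical and do not come from the slides, and the "caused by" link between OTM.1 and TU.1 is assumed for the example. Only the message IDs and labels are taken from the slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Interaction:
    """A message exchanged between agents (first point on the slide)."""
    message_id: str  # e.g. "TU.1", taken from the slide
    label: str       # e.g. "Data Collection request"


@dataclass(frozen=True)
class Relationship:
    """A link an agent asserts between an output and an input (second point)."""
    output_id: str
    relation: str    # e.g. "caused by"
    input_id: str


@dataclass
class ProvenanceStore:
    """Hypothetical in-memory store; the real store's API is not shown in the slides."""
    interactions: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def record_interaction(self, message_id, label):
        self.interactions[message_id] = Interaction(message_id, label)

    def record_relationship(self, output_id, relation, input_id):
        # Only accept links between messages that were already recorded.
        for mid in (output_id, input_id):
            if mid not in self.interactions:
                raise KeyError(f"unknown message: {mid}")
        self.relationships.append(Relationship(output_id, relation, input_id))


store = ProvenanceStore()
store.record_interaction("TU.1", "Data Collection request")
store.record_interaction("OTM.1", "Donor Data request")
# Assumed link, for illustration only: the donor data request was triggered by TU.1.
store.record_relationship("OTM.1", "caused by", "TU.1")
```

Keeping interactions and relationships as separate records mirrors the slide's split between what the platform can observe (messages) and what only the agent knows (how its outputs depend on its inputs).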

  6. [Diagram: provenance graph for the example. Node labels: Patient Data Request, Patient Data, Hospital B, Data Collection Request, Donor Data Request, Donor Data, Brain Death Notification, Brain Death report, Decision Request, Donation Decision, Decision report, Serology Test Request, Serology Test Result, Serology report; message IDs HC.1, HC.2, OTM.1 to OTM.4, TU.1 to TU.5; Users W, X, Y, Z; Authors A, B, C. Edge labels: response to, caused by, contains parts of, based on, justified by, authored by, is logged in.]
     • Which is the basis for donation decision D?
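A question such as "Which is the basis for donation decision D?" amounts to a reachability query over the provenance graph. The sketch below shows one way to answer it with a breadth-first traversal. The edge list is hypothetical: the diagram's actual edges cannot be recovered from this transcript, so the links are invented for illustration using the slide's node and relation names, and the function name `basis_of` is likewise an assumption.

```python
from collections import deque

# Hypothetical edges (artifact, relation, artifact), for illustration only.
EDGES = [
    ("Donation Decision", "justified by", "Decision report"),
    ("Decision report", "based on", "Serology Test Result"),
    ("Decision report", "based on", "Brain Death report"),
    ("Serology Test Result", "justified by", "Serology report"),
    ("Donor Data", "based on", "Patient Data"),
]


def basis_of(artifact, edges, relations=("based on", "justified by")):
    """Return every artifact reachable from `artifact` via the given relations."""
    # Index outgoing edges once so the traversal stays linear in the graph size.
    outgoing = {}
    for src, rel, dst in edges:
        if rel in relations:
            outgoing.setdefault(src, []).append(dst)

    found = []
    visited = {artifact}
    queue = deque([artifact])
    while queue:
        current = queue.popleft()
        for nxt in outgoing.get(current, []):
            if nxt not in visited:  # guard against cycles and repeats
                visited.add(nxt)
                found.append(nxt)
                queue.append(nxt)
    return found


basis = basis_of("Donation Decision", EDGES)
```

With the assumed edges, `basis` lists the decision report first, then the reports and test result it rests on. Restricting `relations` changes the question being asked, for example following only "authored by" edges would support attribution rather than justification.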

  7. Use cases
     • Data Quality
     • Audit Trail
     • Replication Recipes
     • Attribution
     • Informational/Communication
     • What else?

  8. Research Questions

  9. Provenance Taxonomy

  10. Types of Provenance, Redux
      • Data Provenance
        • Metadata + History of a Data Object
      • Workflow Provenance
        • Metadata + History of the workflow itself
        • Source control

  11. COMAD
      • Collection-oriented Modeling and Design
      • Susan Davidson, UPenn
      • Workflows may exhibit assembly line semantics
      • Open and close interleaved “read scopes” and “write scopes”

  12. Provenance-Aware Storage System
      • David Holland, Harvard

  13. PASS Architecture
      [Diagram label: Prov. and Storage Layer]

  14. VisTrails
      • demo

  15. Other Provenance Systems
      Pegasus/Wings, ZOOM, ES3, SDG, Karma, JP, Mindswap, Redux, RWS, NCSCI, USC/ISI, OPA, VDL, MyGrid

  16. Open Provenance Challenge
      • 2006, First: Compare expressiveness of provenance systems
      • 2007, Second: Interoperability and exchange
      • 2008, Third: Evaluation of the Open Provenance Model
      • 2010, Fourth and last: Apply the Open Provenance Model to a broad end-to-end scenario, and demonstrate novel functionality that can only be achieved by the presence of an interoperable solution for provenance

  17. First Open Provenance Challenge

  18. Challenge Workflow

  19. Challenge Queries

  20. Challenge Queries (2)

  21. Categorization of Provenance Systems
      • Execution Environment
      • Representation Technology
        • SQL, RDF, etc.
      • Query Language
      • Research Emphasis
        • Execution, Recording, Storing, Querying

  22. Categorization (2)
      • Includes WF Representation
      • Data Derivation vs. Causal Events
        • “Nouns” or “Verbs”
      • Annotations
      • Time
      • Naming
      • Tracked Data, Granularity
        • Files, collections, bytes, tuples
      • Abstraction Mechanisms
        • functions, etc.

  23. Results

  24. Results

  25. Results
