Recording Actor Provenance in Scientific Workflows

Ian Wootten, Shrija Rajbhandari, Omer Rana (I.M.Wootten@cs.cf.ac.uk), Cardiff University, UK


Presentation Transcript


  1. Recording Actor Provenance in Scientific Workflows
     Ian Wootten, Shrija Rajbhandari, Omer Rana (I.M.Wootten@cs.cf.ac.uk)
     Cardiff University, UK

  2. What?
     • Provenance is concerned with process
       • This may or may not be documented
     • Data Provenance: the process which leads to a particular piece of data
     • Actor Provenance: the process which leads to a particular actor state
       • How an actor (client or service) arrived at a particular state during an interaction (for stateless actors)

  3. What? Actor Provenance
     • Actor State Assertions: asserting the state of an actor at a particular time during an interaction.
     • Interaction Assertions: asserting the contents of a message by an actor sending or receiving it.
     [Diagram labels: A1, A2, B1, B2; Service, Service, Enactment Engine]
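The two assertion types on slide 3 can be pictured as two record shapes. The sketch below is illustrative only and is not taken from the presentation: the class names, field names, and the idea of linking records through a shared `interaction_id` are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionAssertion:
    """An actor asserts the contents of a message it sent or received."""
    interaction_id: str  # assumed linking key, not from the slides
    asserter: str
    role: str            # "sender" or "receiver"
    message: str


@dataclass(frozen=True)
class StateAssertion:
    """An actor asserts its own state at a particular time in an interaction."""
    interaction_id: str  # assumed linking key, not from the slides
    asserter: str
    time: float
    state: dict


def in_context(interaction_id, interactions, states):
    """Collect the messages and the time-ordered states for one interaction."""
    return {
        "messages": [a for a in interactions
                     if a.interaction_id == interaction_id],
        "states": sorted((s for s in states
                          if s.interaction_id == interaction_id),
                         key=lambda s: s.time),
    }


# Hypothetical records for one interaction between an engine and a service.
interactions = [
    InteractionAssertion("i1", "engine", "sender", "<request/>"),
    InteractionAssertion("i1", "service", "receiver", "<request/>"),
    InteractionAssertion("i2", "engine", "sender", "<other/>"),
]
states = [
    StateAssertion("i1", "service", 2.0, {"label": "B2"}),
    StateAssertion("i1", "service", 1.0, {"label": "B1"}),
]
context = in_context("i1", interactions, states)
```

Grouping by interaction is one way to read the later "putting assertions into context" idea: the state records explain what the actor was doing around a given message exchange.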

  4. Metrics for Actor State Assertion
     • Static
       • No variation in value over actor lifetime
       • Per Node: Node identity, Operating system
       • Per Actor: Actor identity, Name, Owner, Version
     • Dynamic
       • Variation in value over actor lifetime
       • Per Node: Memory usage, Network traffic
       • Per Actor: Execution Time, Availability
     • Instrumented
       • Actor is 'instrumented' at key points in its execution
       • Description of internal data flow
       • E.g. German Aerospace Center (DLR)
         • Completion states for action events and file transfers
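The metric taxonomy on slide 4 (static, dynamic, instrumented; per node or per actor) maps naturally onto a small tagged record. The following is a minimal sketch under assumptions: the type names, the `timed` helper, and the example actor name and version value are hypothetical, and only standard-library calls are used to obtain real values.

```python
import platform
import time
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    STATIC = "static"              # no variation over the actor lifetime
    DYNAMIC = "dynamic"            # varies over the actor lifetime
    INSTRUMENTED = "instrumented"  # emitted at key points inside the actor


class Scope(Enum):
    NODE = "node"
    ACTOR = "actor"


@dataclass(frozen=True)
class Metric:
    name: str
    kind: Kind
    scope: Scope
    value: object


@dataclass
class ActorStateAssertion:
    actor_id: str
    timestamp: float = field(default_factory=time.time)
    metrics: list = field(default_factory=list)

    def add(self, name, kind, scope, value):
        self.metrics.append(Metric(name, kind, scope, value))

    def of_kind(self, kind):
        return [m for m in self.metrics if m.kind is kind]


def timed(assertion, func, *args):
    """Run func and record its execution time as a dynamic per-actor metric."""
    start = time.perf_counter()
    result = func(*args)
    assertion.add("execution_time_s", Kind.DYNAMIC, Scope.ACTOR,
                  time.perf_counter() - start)
    return result


assertion = ActorStateAssertion(actor_id="example-service")  # hypothetical name
assertion.add("node_identity", Kind.STATIC, Scope.NODE, platform.node())
assertion.add("operating_system", Kind.STATIC, Scope.NODE, platform.system())
assertion.add("version", Kind.STATIC, Scope.ACTOR, "0.1")  # illustrative value
total = timed(assertion, sum, range(1000))  # stand-in for real service work
```

Per-node dynamic metrics such as memory usage or network traffic are left out here because they would normally come from a monitoring source rather than from the actor itself.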

  5. How? Actor Provenance
     • Instrumented Actor: service information obtained from instrumented points within an actor.
     • Monitoring Sources: service information derived from the hosting platform via monitoring sources (e.g. Ganglia).
     [Diagram labels: Instrumented Output, Monitor Output; B1, B2, M1, M2; Service, Service, Enactment Engine]

  6. Why? Standalone and Combined Value
     • Standalone State Assertion Value
       • Actor Selection
         • Performance
         • Evaluation of Past / Prediction of Future
       • Resource Allocation
         • Actor administrator allocates resources according to performance metrics
     • Combined Value: Putting Assertions into Context
       • Interaction, through Actor State Assertions
         • Determining the likely cause of error / results
         • Understanding what an actor is doing
       • Actor, through Interaction Assertions
         • Understanding performance pattern observations
         • Understanding instrumented metric observations

  7. How? Actor Provenance Registry
     • Attempt to provide a mechanism to specify and record actor state assertions for any application
     • Generic Mechanism Problems
       • No Knowledge of Potential Resources
         • Monitoring sources, containers
       • No Direct Knowledge of Implementation
         • Instrumented Data Capture

  8. How? Actor Provenance Registry
     • Resource and Rule Registration
       • Resource: Monitoring Tool
       • Rule: User defined instructions
     • Indirectly from Resources
       • Coordinator polls resources for information
       • Times of interest: Service Invocation, Request
     • Directly from actor
       • Collection of Instrumented data
       • Representation?
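The registration and polling flow on slide 8 can be sketched as a small in-memory registry. This is a hedged illustration, not the authors' implementation: the class and method names are invented, rules are plain Python callables standing in for the user-defined instructions, and `fake_monitor` returns fixed numbers in place of a real monitoring tool.

```python
from typing import Callable, Dict, List, Tuple

Snapshot = Dict[str, float]


class ActorProvenanceRegistry:
    """Hypothetical registry: resources are polled, rules interpret the result."""

    def __init__(self) -> None:
        self._resources: Dict[str, Callable[[], Snapshot]] = {}
        self._rules: List[Tuple[str, str, Callable[[Snapshot], object]]] = []
        self.recorded: List[dict] = []

    def register_resource(self, name: str, poll: Callable[[], Snapshot]) -> None:
        self._resources[name] = poll

    def register_rule(self, name: str, resource: str,
                      rule: Callable[[Snapshot], object]) -> None:
        if resource not in self._resources:
            raise KeyError(f"unknown resource: {resource}")
        self._rules.append((name, resource, rule))

    def on_event(self, actor_id: str, event: str) -> None:
        """Indirect capture: poll every resource once, then apply each rule."""
        snapshots = {n: poll() for n, poll in self._resources.items()}
        for rule_name, resource, rule in self._rules:
            self.recorded.append({
                "actor": actor_id, "event": event, "source": resource,
                "rule": rule_name, "value": rule(snapshots[resource]),
            })

    def record_instrumented(self, actor_id: str, point: str, data: dict) -> None:
        """Direct capture: the actor pushes data from an instrumented point."""
        self.recorded.append({
            "actor": actor_id, "event": point,
            "source": "instrumented", "value": data,
        })


def fake_monitor() -> Snapshot:
    # Fixed values standing in for a real monitoring source.
    return {"mem_used_mb": 512.0, "load_one": 0.25}


registry = ActorProvenanceRegistry()
registry.register_resource("monitor", fake_monitor)
registry.register_rule("memory", "monitor", lambda s: s["mem_used_mb"])
registry.register_rule("busy", "monitor", lambda s: s["load_one"] > 1.0)
registry.on_event("example-service", "invocation")  # a time of interest
registry.record_instrumented("example-service", "file_transfer",
                             {"status": "complete"})
```

Polling each resource once per event and sharing the snapshot across rules keeps the per-invocation cost tied to the number of resources rather than the number of rules, which is one possible design choice among several.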

  9. How? Actor Provenance Registry • Integration with PReP [Groth et al.]

  10. Data Mining Prototype
     • Record assertions using the registry during invocation of a data modelling service
     • Service takes incoming data sets and generates a model based upon them
     • Uses Quantitative Structure-Activity Relationship (QSAR) to attempt to correlate biological activity with a chemical compound
     • Larger data set = longer run time

  11. Performance Evaluation
     [Chart series: 5 rules, 1 rule, No rules]

  12. Conclusions / Future Work
     • Actor Provenance data is important
       • Without it, we don't get the full picture
     • Prototype shows that it can be done
     • Room for improvement
       • Interface to Monitoring System
       • Caching of results
       • No inclusion of 'instrumented' actor capture
     • Requires service provider adoption to work

  13. Prototype Configuration
     • Single machine holding client, service and registry
     • Rules executed on invocation of service
       • XQuery
     • Invocations performed 100 times on datasets between 30KB and 340KB in size
     • Coordinator records rule results to a local file store
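The configuration above suggests a simple measurement loop: invoke the service 100 times with zero, one, or five rules, and write each rule result to a local file. The harness below is a rough sketch under stated assumptions: `fake_service` is a trivial stand-in for the data modelling service, the rules are Python callables rather than XQuery, the monitoring snapshot is a fixed value, and the JSON-lines file layout is invented for illustration. No timings are claimed.

```python
import json
import tempfile
import time
from pathlib import Path


def fake_service(data: bytes) -> int:
    """Stand-in workload; cost grows with the size of the data set."""
    return sum(data) % 251


def make_rules(n):
    """Build n trivial rules; each reads one value from a snapshot."""
    return [lambda snapshot, i=i: {"rule": i, "load": snapshot["load"]}
            for i in range(n)]


def run(n_rules, payload, store, repeats=100):
    """Invoke the service `repeats` times, recording rule results to `store`."""
    rules = make_rules(n_rules)
    start = time.perf_counter()
    with store.open("a", encoding="utf-8") as out:
        for invocation in range(repeats):
            snapshot = {"load": 0.5}  # fixed stand-in for a polled resource
            for rule in rules:
                record = {"invocation": invocation, **rule(snapshot)}
                out.write(json.dumps(record) + "\n")
            fake_service(payload)
    return time.perf_counter() - start


payload = bytes(30 * 1024)  # 30KB, the smallest data set size mentioned
timings = {}
line_counts = {}
with tempfile.TemporaryDirectory() as tmp:
    for n in (0, 1, 5):
        store = Path(tmp) / f"rules_{n}.jsonl"
        timings[n] = run(n, payload, store)
        line_counts[n] = len(store.read_text(encoding="utf-8").splitlines())
```

Comparing `timings[0]` against `timings[1]` and `timings[5]` gives a crude view of recording overhead per rule; repeating the loop with larger payloads would mirror the range of data set sizes listed on the slide.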