Recording Actor Provenance in Scientific Workflows
Ian Wootten, Shrija Rajbhandari, Omer Rana
I.M.Wootten@cs.cf.ac.uk
Cardiff University, UK

What?
• Provenance is concerned with process
  • This may or may not be documented
• Data Provenance - The process which leads to a particular piece of data
• Actor Provenance - The process which leads to a particular actor state
  • How an actor (client or service) arrived at a particular state during an interaction (for stateless actors)

What? Actor Provenance
• Actor State Assertions: Asserting the state of an actor at a particular time during an interaction.
• Interaction Assertions: Asserting the contents of a message by the actor sending or receiving it.
[Diagram: an enactment engine and two services, with elements labelled A1, A2, B1, and B2.]
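
The two assertion types above can be sketched as plain records. This is a minimal illustration, not the authors' representation: the class names, every field name, and the `assertions_for` helper are assumptions made for the example. It shows how both kinds of assertion could share an interaction identifier so they can later be read together.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Union


@dataclass
class InteractionAssertion:
    """An actor's claim about the contents of a message it sent or received."""
    asserter: str        # actor making the assertion (hypothetical field)
    interaction_id: str  # identifies the message exchange (hypothetical field)
    role: str            # "sender" or "receiver"
    message: str         # message contents as seen by the asserter


@dataclass
class ActorStateAssertion:
    """An actor's claim about its own state at one time during an interaction."""
    asserter: str
    interaction_id: str
    timestamp: datetime
    state: Dict[str, Any] = field(default_factory=dict)


Assertion = Union[InteractionAssertion, ActorStateAssertion]


def assertions_for(interaction_id: str, store: List[Assertion]) -> List[Assertion]:
    """Return every assertion, of either kind, recorded for one interaction."""
    return [a for a in store if a.interaction_id == interaction_id]


# Illustrative values only; none of them come from the presentation.
store: List[Assertion] = [
    InteractionAssertion("engine", "i-1", "sender", "<request/>"),
    ActorStateAssertion(
        "service", "i-1",
        datetime(2000, 1, 1, tzinfo=timezone.utc),
        {"memory_mb": 512},
    ),
    InteractionAssertion("engine", "i-2", "sender", "<request/>"),
]
```

Calling `assertions_for("i-1", store)` returns the message assertion together with the state assertion made during the same interaction.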

Metrics for Actor State Assertion
• Static
  • No variation in value over actor lifetime
  • Per Node - Node identity, Operating system
  • Per Actor - Actor identity, Name, Owner, Version
• Dynamic
  • Variation in value over actor lifetime
  • Per Node - Memory usage, Network traffic
  • Per Actor - Execution Time, Availability
• Instrumented
  • Actor is ‘Instrumented’ at Key Points in its Execution
  • Description of internal data flow
  • E.g. German Aerospace Center (DLR)
    • Completion states for action events and file transfers
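
The classification above can be written down as a small lookup table. The sketch below is an assumption-laden illustration: the enum and function names are invented for the example, and placing the instrumented metric at actor scope is a guess, since the slide gives no scope for it. The `record_once` helper encodes one inference from the slide: a static metric does not vary over the actor lifetime, so a single recording should suffice.

```python
from enum import Enum
from typing import List, NamedTuple, Optional


class Kind(Enum):
    STATIC = "static"              # no variation over the actor lifetime
    DYNAMIC = "dynamic"            # value varies over the actor lifetime
    INSTRUMENTED = "instrumented"  # captured at key points in execution


class Scope(Enum):
    NODE = "node"
    ACTOR = "actor"


class Metric(NamedTuple):
    name: str
    kind: Kind
    scope: Scope


# Example metrics taken from the slide; the table layout is hypothetical.
METRICS: List[Metric] = [
    Metric("node identity", Kind.STATIC, Scope.NODE),
    Metric("operating system", Kind.STATIC, Scope.NODE),
    Metric("actor identity", Kind.STATIC, Scope.ACTOR),
    Metric("name", Kind.STATIC, Scope.ACTOR),
    Metric("owner", Kind.STATIC, Scope.ACTOR),
    Metric("version", Kind.STATIC, Scope.ACTOR),
    Metric("memory usage", Kind.DYNAMIC, Scope.NODE),
    Metric("network traffic", Kind.DYNAMIC, Scope.NODE),
    Metric("execution time", Kind.DYNAMIC, Scope.ACTOR),
    Metric("availability", Kind.DYNAMIC, Scope.ACTOR),
    # Scope assumed: the slide does not say where instrumented data belongs.
    Metric("internal data flow", Kind.INSTRUMENTED, Scope.ACTOR),
]


def select(kind: Kind, scope: Optional[Scope] = None) -> List[str]:
    """Names of metrics of one kind, optionally restricted to one scope."""
    return [
        m.name for m in METRICS
        if m.kind is kind and (scope is None or m.scope is scope)
    ]


def record_once(metric: Metric) -> bool:
    """Static values never change, so one recording per lifetime is enough."""
    return metric.kind is Kind.STATIC
```

For example, `select(Kind.DYNAMIC, Scope.NODE)` lists the per-node values that would need to be sampled again on each interaction.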

How? Actor Provenance
• Instrumented Actor: Service information obtained from instrumented points within an actor.
• Monitoring Sources: Service information derived from the hosting platform via monitoring sources (e.g. Ganglia).
[Diagram: an enactment engine and two services, with elements labelled Instrumented Output, Monitor Output, B1, B2, M1, and M2.]

Why? Standalone and Combined Value
• Standalone State Assertion Value
  • Actor Selection
  • Performance
    • Evaluation of Past / Prediction of Future
  • Resource Allocation
    • Actor administrator allocates resources according to performance metrics
• Combined Value - Putting Assertions into Context
  • Interaction - Through Actor State Assertions
    • Determining the likely cause of error / results
    • Understanding what an actor is doing
  • Actor - Through Interaction Assertions
    • Understanding performance pattern observations
    • Understanding instrumented metric observations

How? Actor Provenance Registry
• Attempt to provide a mechanism to specify and record actor state assertions for any application
• Generic Mechanism Problems
  • No Knowledge of Potential Resources
    • Monitoring sources, containers
  • No Direct Knowledge of Implementation
    • Instrumented Data Capture

How? Actor Provenance Registry
• Resource and Rule Registration
  • Resource - Monitoring Tool
  • Rule - User defined instructions
• Indirectly from Resources
  • Coordinator polls resources for information
  • Times of interest - Service Invocation, Request
• Directly from actor
  • Collection of Instrumented data
  • Representation?
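
The registration and polling steps above can be sketched as follows. This is not the authors' registry: the class, its method names, and the result layout are all hypothetical, and rules are modelled as Python callables purely for illustration, whereas the prototype described in this presentation executes its rules with XQuery. A resource stands in for a monitoring tool and is modelled as a callable that returns its current readings.

```python
from typing import Any, Callable, Dict, List, Tuple

Resource = Callable[[], Dict[str, Any]]  # monitoring tool: returns readings
Rule = Callable[[Dict[str, Any]], Any]   # user-defined instruction on readings


class ActorProvenanceRegistry:
    """Hypothetical registry holding resources, rules, and recorded results."""

    def __init__(self) -> None:
        self.resources: Dict[str, Resource] = {}
        self.rules: Dict[str, Tuple[str, Rule]] = {}
        self.assertions: List[Dict[str, Any]] = []

    def register_resource(self, name: str, resource: Resource) -> None:
        self.resources[name] = resource

    def register_rule(self, name: str, resource_name: str, rule: Rule) -> None:
        # A rule can only refer to a resource that is already registered.
        if resource_name not in self.resources:
            raise KeyError(f"unknown resource: {resource_name}")
        self.rules[name] = (resource_name, rule)

    def poll(self, event: str) -> List[Dict[str, Any]]:
        """Coordinator step, run at a time of interest such as an invocation."""
        readings: Dict[str, Dict[str, Any]] = {}
        results: List[Dict[str, Any]] = []
        for rule_name, (resource_name, rule) in self.rules.items():
            # Query each resource at most once per poll.
            if resource_name not in readings:
                readings[resource_name] = self.resources[resource_name]()
            results.append({
                "event": event,
                "rule": rule_name,
                "value": rule(readings[resource_name]),
            })
        self.assertions.extend(results)
        return results

    def submit_instrumented(self, actor: str, data: Dict[str, Any]) -> None:
        """Direct route: an instrumented actor hands over its own data.

        The slide leaves the representation open, so the data is stored raw.
        """
        self.assertions.append({"event": "instrumented", "actor": actor, "value": data})
```

A coordinator would call `poll("invocation")` or `poll("request")` at the times of interest named above, and every registered rule would then contribute one recorded result.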

How? Actor Provenance Registry
• Integration with PReP [Groth et al.]

Data Mining Prototype
• Record assertions using the registry during invocation of a data modelling service
• Service takes incoming data sets and generates a model based upon them
• Uses Quantitative Structure-Activity Relationship (QSAR) to attempt to correlate biological activity with a chemical compound
• Larger data set = longer run time

Performance Evaluation
[Figure: performance plot with series for 5 rules, 1 rule, and no rules.]

Conclusions / Future Work
• Actor Provenance data is important
  • Without it, we don’t get the full picture
• Prototype shows that it can be done
• Room for improvement
  • Interface to Monitoring System
  • Caching of results
  • No inclusion of ‘instrumented’ actor capture
• Requires service provider adoption to work

Prototype Configuration
• Single machine holding the client, service, and registry
• Rules executed on invocation of service
  • XQuery
• Invocations performed 100 times on datasets between 30KB and 340KB in size
• Coordinator records rule results to a local file store
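
A timing harness in the spirit of this configuration might look like the sketch below. Everything in it is assumed for illustration: the `run_trials` function, the JSON-lines file layout, the stand-in service, and the stand-in rule are not taken from the prototype, whose rules are XQuery and whose service builds a QSAR model. It only shows the shape of the experiment: run the rules on each invocation, time the invocation, and append the rule results to a local file.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Any, Callable, Dict, List


def run_trials(
    service: Callable[[bytes], Any],
    payload: bytes,
    rules: Dict[str, Callable[[], Any]],
    store: Path,
    runs: int = 100,
) -> List[float]:
    """Invoke the service `runs` times and return per-invocation seconds.

    Rules execute on every invocation; their results are appended to
    `store` as one JSON object per line (a hypothetical file layout).
    """
    timings: List[float] = []
    with store.open("a", encoding="utf-8") as out:
        for run in range(runs):
            start = time.perf_counter()
            results = {name: rule() for name, rule in rules.items()}
            service(payload)
            timings.append(time.perf_counter() - start)
            out.write(json.dumps({"run": run, "rules": results}) + "\n")
    return timings


# Stand-ins for the real service, data set, and rule (all hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "rule_results.jsonl"
    timings = run_trials(
        service=lambda data: len(data),
        payload=b"x" * 30_000,               # roughly the smallest data set size
        rules={"payload_kb": lambda: 30},
        store=store,
        runs=100,
    )
    recorded = store.read_text(encoding="utf-8").splitlines()

print(f"mean seconds per invocation: {sum(timings) / len(timings):.6f}")
```

Running the same harness with an empty `rules` dictionary, one rule, and five rules would give the three series compared in the performance evaluation.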