A Proof of Concept: Provenance in a Service Oriented Architecture. Liming Chen, Victor Tan, Fenglian Xu, Alexis Biller, Paul Groth, Simon Miles, John Ibbotson, Michael Luck and Luc Moreau.
Purpose • Asking questions about the provenance of something, i.e. the process by which it came to be as it is, is essential in many domains • We are working with bioinformaticians, medics, aerospace engineers and physicists, and have found a wide range of questions they wish to ask • A simple example application can: • Clarify the requirements on software to aid answering those questions • Be used to explain the issues involved to non-domain experts • Be extended in controlled ways to explore issues that arise in ‘real’ applications
EU Provenance and PASOA • Recent work of the EU Provenance project: • Developed a logical architecture for software to aid answering provenance-related questions, along with other research on security, scalability and user tool support. • Now being applied to two project applications: organ transport management (UPC, Spain) and aerospace engineering (DLR, Germany) • The logical architecture document should be released next week: keep an eye on www.gridprovenance.org • Recent work of the PASOA project: • Has focused on e-Science applications and has gathered requirements, developed protocols and software • EU Provenance used PASOA software for the work described in this talk • PASOA will be discussed in the following two presentations
Outline • The example application • Asking provenance-related questions • The example as a service-oriented process • Recording documentation of a process • What does the example show us? • What are the limits of the example? • Conclusions
Baking a Victoria Sponge • INGREDIENTS • 110g (4oz) Butter • 110g (4oz) Caster Sugar • 110g (4oz) Self-raising Flour • 2 Eggs • Vanilla Essence or 1 tsp Grated Lemon Rind • RECIPE • Preheat oven to 190°C / 375°F / Gas 5. Whisk together the butter and sugar until light and creamy. Add the beaten eggs gradually with a little of the flour. Fold in the remaining sieved flour and add the flavouring. Divide equally between two 15cm (6 inch) sandwich tins. Bake for 20 - 25 minutes. Turn out on to a wire rack to cool. • This is not so contrived an example! www.thefoody.com
Step 1 • Whisk 20g sugar and 20g butter together to get mixture 1
Step 2 • Beat 2 eggs for 2 minutes • Mix the beaten eggs with mixture 1 to obtain mixture 2
Step 3 • Fold 100g flour together with mixture 2 to obtain mixture 3
Step 4 • Set the baking temperature to 180˚C • Set the baking time to 30min • Put mixture 3 into the oven to obtain a cake
After Baking • Some questions can be asked after baking a cake • Answers to the questions can be found if we record details of the baking process during its execution • The details of the baking process are what we call the provenance of a cake
“What went wrong?” Questions • Did we follow the recipe accurately? • Did we use the correct ingredients at the right time? • Did we provide the correct quantities? Correct units? • Did we perform actions for the right duration? • We need to keep a record of all actions performed with all their parameters (such as the number of eggs used) • Organ transplant example: Did the medics follow the correct procedure? • Bioinformatics example: Did I analyse an amino acid sequence using tools that actually only apply to nucleotide sequences?
“What went wrong?” Questions • Other factors can affect the baking process: • The amount of flour required varies with altitude • The oven may be broken and bake at a different temperature from the one requested • We need to know the “internal state” of the different entities participating in the baking process (such as actual oven temperature or oven altitude) • Organ transplant example: By what criteria did a team decide to accept or reject an organ? • Bioinformatics example: What script was used by the services to perform each stage of the experiment?
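A "what went wrong?" check of this kind can be sketched as a comparison between a recipe and what was recorded during a run. This is only an illustration under stated assumptions: the record layout, the field names (`params`, `actor_state`, `actual_temperature_c`) and the tolerance are hypothetical, and the recorded values are invented for the example. The recipe values are those of the four step slides.

```python
# Hypothetical recipe, using the quantities from the step slides.
RECIPE = {
    "whisk": {"sugar_g": 20, "butter_g": 20},
    "beat_and_mix": {"eggs": 2, "beating_time_min": 2},
    "fold": {"flour_g": 100},
    "oven_bake": {"temperature_c": 180, "baking_time_min": 30},
}

# Invented documentation of one run: the parameters each step was invoked
# with, plus any internal state the actor reported about itself.
RECORDED = {
    "whisk": {"params": {"sugar_g": 20, "butter_g": 20}, "actor_state": {}},
    "beat_and_mix": {"params": {"eggs": 3, "beating_time_min": 2}, "actor_state": {}},
    "fold": {"params": {"flour_g": 100}, "actor_state": {}},
    "oven_bake": {
        "params": {"temperature_c": 180, "baking_time_min": 30},
        "actor_state": {"actual_temperature_c": 150},
    },
}


def find_deviations(recipe, recorded, temperature_tolerance_c=5):
    """List every way the recorded run differs from the recipe."""
    deviations = []
    for step, expected in recipe.items():
        if step not in recorded:
            deviations.append(f"{step}: step was never recorded")
            continue
        params = recorded[step]["params"]
        # Were the right parameters (quantities, durations) supplied?
        for name, want in expected.items():
            got = params.get(name)
            if got != want:
                deviations.append(f"{step}: {name} was {got}, recipe says {want}")
        # Did the actor's internal state match what was requested of it?
        actual = recorded[step]["actor_state"].get("actual_temperature_c")
        requested = params.get("temperature_c")
        if actual is not None and requested is not None:
            if abs(actual - requested) > temperature_tolerance_c:
                deviations.append(
                    f"{step}: oven reported {actual} C, {requested} C was requested"
                )
    return deviations


deviations = find_deviations(RECIPE, RECORDED)
for line in deviations:
    print(line)
```

The second check only works because the oven documented its own state; the invocation parameters alone would show a correct request of 180˚C.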
“Process Analysis” Questions • Did we use the same amount of ingredients for baking cake 1 and cake 2? Or the same proportions? • What was the longest step in the execution of a recipe? • Why did we not finish the process? Where did we stop? • The process that led to a given cake should be delimited and analysable • Organ transplant example: Which patient’s death led to the organ now being transplanted? • Bioinformatics example: What samples led to the final analysis result?
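The process-analysis questions above could be answered by small queries over recorded steps, as in the following sketch. The timings, the second cake and all names are hypothetical; only the quantities for cake 1 come from the step slides.

```python
from fractions import Fraction

# Hypothetical (step, start, end) timings in seconds for one recorded run.
CAKE_1_STEPS = [
    ("whisk", 0, 180),
    ("beat_and_mix", 180, 300),
    ("fold", 300, 360),
    ("oven_bake", 360, 2160),
]

# Ingredient amounts in grams; cake 2 is invented for the comparison.
CAKE_1 = {"sugar_g": 20, "butter_g": 20, "flour_g": 100}
CAKE_2 = {"sugar_g": 40, "butter_g": 40, "flour_g": 200}

EXPECTED_ORDER = ["whisk", "beat_and_mix", "fold", "oven_bake"]


def longest_step(steps):
    """Name of the step with the largest end - start duration."""
    return max(steps, key=lambda s: s[2] - s[1])[0]


def proportions(ingredients):
    """Each ingredient as an exact fraction of the total mass."""
    total = sum(ingredients.values())
    return {name: Fraction(amount, total) for name, amount in ingredients.items()}


def first_missing_step(expected_order, completed):
    """Where an unfinished process stopped: the first step with no record."""
    for step in expected_order:
        if step not in completed:
            return step
    return None


same_amounts = CAKE_1 == CAKE_2
same_proportions = proportions(CAKE_1) == proportions(CAKE_2)
stopped_at = first_missing_step(EXPECTED_ORDER, {"whisk", "beat_and_mix"})
```

Exact fractions are used so that "same proportion" is not affected by floating-point rounding.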
“What Did Parties Do?” Questions • Did the baker follow the user’s instructions (regardless of any claim from the baker)? • Did each step of the baking process follow the user’s instructions? Did they receive the correct instructions? • Did they follow the received instructions? • All entities should document their view of a process because it may vary • Organ transplant example: Were there differing opinions on the suitability of an organ for transplant? • Bioinformatics example: I claim I used a database in my experiments whose license allows me to patent my results: does the database owner confirm this?
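Comparing the parties' views can be sketched as grouping independently recorded assertions by the message exchange they describe and reporting the exchanges on which the views differ. The `Assertion` type, the interaction identifiers and the values below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    asserter: str      # who recorded this view
    interaction: str   # identifier shared by both ends of a message exchange
    content: tuple     # sorted (name, value) pairs describing the message


def make(asserter, interaction, **content):
    """Build an assertion with a canonical, comparable content tuple."""
    return Assertion(asserter, interaction, tuple(sorted(content.items())))


def discrepancies(assertions):
    """Map each interaction whose views disagree to {asserter: view}."""
    by_interaction = {}
    for a in assertions:
        by_interaction.setdefault(a.interaction, []).append(a)
    result = {}
    for interaction, views in by_interaction.items():
        if len({v.content for v in views}) > 1:
            result[interaction] = {v.asserter: dict(v.content) for v in views}
    return result


# Invented documentation: user and baker agree, baker and oven do not.
store = [
    make("user", "user->baker#1", temperature_c=180, baking_time_min=30),
    make("baker", "user->baker#1", temperature_c=180, baking_time_min=30),
    make("baker", "baker->oven#1", temperature_c=180, baking_time_min=30),
    make("oven", "baker->oven#1", temperature_c=160, baking_time_min=30),
]

found = discrepancies(store)
```

Here the user's and baker's views of the first exchange match, while the oven documents receiving a different temperature from the one the baker claims to have sent.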
Implementation • We implemented the application as a set of Web Services, and then implemented clients that answered the provenance-related questions by querying the provenance store • This involved mapping the scenario onto a service-oriented architecture
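Service-Oriented Process • [Diagram: message flow between User, Baker, Whisk, Beat & Mix, Fold and Oven Bake] • User to Baker: Sugar + Flour + Beating Time + Temperature • Whisk: Butter + Sugar, returns Mixture 1 • Beat & Mix: Mixture 1 + Eggs + Beating Time, returns Mixture 2 • Fold: Flour + Mixture 2, returns Mixture 3 • Oven Bake: Mixture 3 + Temperature + Baking Time, returns Cake • Baker to User: Cake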
Recording • After baking, the provenance store contains a trace of the different activities that were involved in the production of a cake. • The provenance of a cake is the documentation of the process that led to that cake • [Diagram: User, Baker, Whisk, Beat & Mix, Fold and Oven Bake, with a Provenance Store holding the recorded trace] • Baker (Sugar, Flour, Beating Time, Temperature) • Whisk (Butter, Sugar) • WhiskReturn (Mixture 1) • Beat&Mix (Mixture 1, Eggs, Beating Time) • Beat&MixReturn (Mixture 2) • Fold (Flour, Mixture 2) • FoldReturn (Mixture 3) • OvenBake (Mixture 3, Temperature, Baking Time) • OvenBakeReturn (Cake) • BakerReturn (Cake)
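The recording pattern on this slide can be sketched with an in-memory stand-in for the provenance store. This is not the project's software: the actual application used Web Services, and the `ProvenanceStore` class, the `invoke` helper and the toy step functions are hypothetical. Only the message names and their order follow the trace listed on the slide.

```python
class ProvenanceStore:
    """In-memory stand-in for a provenance store (hypothetical API)."""

    def __init__(self):
        self.records = []

    def record(self, message, *payload):
        self.records.append((message, payload))


def invoke(store, name, step, *inputs):
    """Record the invocation message, run the step, record the return message."""
    store.record(name, *inputs)
    output = step(*inputs)
    store.record(name + "Return", output)
    return output


# Toy steps that only name what they produce.
def whisk(butter, sugar):
    return "Mixture 1"


def beat_and_mix(mixture, eggs, beating_time):
    return "Mixture 2"


def fold(flour, mixture):
    return "Mixture 3"


def oven_bake(mixture, temperature, baking_time):
    return "Cake"


def bake_cake(store, sugar, flour, beating_time, temperature):
    def baker(sugar, flour, beating_time, temperature):
        # Butter, Eggs and Baking Time are supplied by the baker here,
        # which is an assumption made to match the slide's message contents.
        m1 = invoke(store, "Whisk", whisk, "Butter", sugar)
        m2 = invoke(store, "Beat&Mix", beat_and_mix, m1, "Eggs", beating_time)
        m3 = invoke(store, "Fold", fold, flour, m2)
        return invoke(store, "OvenBake", oven_bake, m3, temperature, "Baking Time")

    return invoke(store, "Baker", baker, sugar, flour, beating_time, temperature)


store = ProvenanceStore()
cake = bake_cake(store, "Sugar", "Flour", "Beating Time", "Temperature")
for message, payload in store.records:
    print(message, payload)
```

Because every call goes through `invoke`, the nested structure of the process shows up in the trace: the Baker invocation is recorded first and BakerReturn last, with the four sub-steps in between.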
Process Documentation and Provenance • We distinguish • process documentation (the documentation recorded into a provenance store about a process) • provenance (the information retrieved from a provenance store about a process) • This is because we have found there to be different requirements on each • [Diagram labels: Process documentation, Provenance, Processing]
Process documentation • Should allow questions about the provenance of entities to be answered • Should follow a consistent, application-independent structure so that independent parties can record documentation that is easily combined • e.g. oven may be owned by someone other than the user, but their documentation is combined to answer whether the requested temperature was used • Should state exactly what those recording it know to have happened, not confuse it with what they guessed or inferred had happened • e.g. baker states that it put the cake in the oven, not that the cake was successfully baked, because the oven may have been broken
Provenance • Should give the client asking for the provenance of something control over the scope of the answer • e.g. whether the process that produced the flour is included in the provenance of the cake • Should provide the information relevant to answering a client’s questions (not swamp them with detail) • e.g. report how much flour was used rather than the XML structure sent between application components • May (in order to achieve the above) include inferred information • e.g. infer from baker putting mixture in oven and getting cake out that the cake was successfully baked from the mixture
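Scope control and inference in a provenance query can be sketched as follows. The derivation table, the upstream "Mill" step for the flour, the `stop_at` parameter and the event names are hypothetical; the sketch only illustrates that the client decides how far back the answer reaches, and that inferred statements are marked as such.

```python
# Hypothetical derivation table: item -> (step that produced it, its inputs).
DERIVED_FROM = {
    "Cake": ("OvenBake", ["Mixture 3"]),
    "Mixture 3": ("Fold", ["Flour", "Mixture 2"]),
    "Mixture 2": ("Beat&Mix", ["Mixture 1", "Eggs"]),
    "Mixture 1": ("Whisk", ["Butter", "Sugar"]),
    "Flour": ("Mill", ["Wheat"]),  # invented upstream process
}


def provenance(item, stop_at=frozenset()):
    """Steps that led to `item`, without looking behind items in `stop_at`."""
    if item in stop_at or item not in DERIVED_FROM:
        return []
    step, inputs = DERIVED_FROM[item]
    steps = []
    for source in inputs:
        for earlier in provenance(source, stop_at):
            if earlier not in steps:
                steps.append(earlier)
    steps.append(step)
    return steps


def infer_baked_from(documented):
    """Derive, and flag as inferred, that the cake was baked from the mixture."""
    put = [d for d in documented if d[0] == "put_in_oven"]
    got = [d for d in documented if d[0] == "taken_out_of_oven"]
    if put and got:
        return {
            "statement": f"{got[0][1]} was baked from {put[0][1]}",
            "inferred": True,
        }
    return None


full = provenance("Cake")
without_flour_history = provenance("Cake", stop_at={"Flour"})
inference = infer_baked_from(
    [("put_in_oven", "Mixture 3"), ("taken_out_of_oven", "Cake")]
)
```

The inference lives in the query side only: the documented events remain what the baker actually observed, and the derived statement carries an explicit `inferred` flag.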
Provenance architectures • Should allow different parties to record independent documentation if they want to • e.g. user and baker can record independently, allowing discrepancies to be noticed • Should have no dependence on any one workflow engine/language, and no requirement for (explicit) workflows to be used at all • e.g. our example application was written in Java, and baking in reality follows a plan in someone’s head • Should be independent of any one product of a process: it should not be necessary to store process documentation with any one result of a process • e.g. the provenance of the cake, the provenance of the ingredients and the provenance of the intermediate mixtures overlap, so we cannot claim the documentation ‘belongs’ to any one of them
Limitations and Strengths • The current example has limitations: • Physical world treated as if it mapped directly to the electronic world: how does a baker record documentation in a provenance store Web Service? Through a GUI? What if the GUI goes wrong or they use the GUI wrongly: do we still have sound process documentation? • None of the objects in the process have constituent parts whose provenance we may want to find independently • Assumes a single provenance store that every service happily submits documentation to • …but the strength of the example is that it can be simply extended to remove these limitations
Conclusions • The simple example allows us to determine the requirements on software to record process documentation and make it available to users • We have used it as a testbed, extending it to explore other aspects of provenance (along with other applications) • It is rich enough to continue extending to mirror, in a controlled way, issues discovered in the future
EU Provenance Partners • IBM United Kingdom Limited • University of Southampton • University of Wales, Cardiff • Deutsches Zentrum fur Luft- und Raumfahrt e.V. • Universitat Politecnica de Catalunya • Magyar Tudomanyos Akademia Szamitastechnikai es Automatizalasi Kutato Intezet