Luc Moreau University of Southampton L.Moreau@ecs.soton.ac.uk Provenance Challenges and Technologies for Grids
Contents • Provenance: problem definition • Use cases of provenance in grids • Architectural vision for provenance • First experimentation, current work • Research agenda • Provenance projects (EU, UK) • Conclusion
Provenance: definition • Main Entry: prov·e·nance • Pronunciation: 'präv-n&n(t)s, 'prä-v&-"nän(t)s • Function: noun • Etymology: French, from provenir to come forth, originate, from Latin provenire, from pro- forth + venire to come -- more at PRO-, COME • Date: 1785 • 1: ORIGIN, SOURCE • 2: the history of ownership of a valued object or work of art or literature (Merriam-Webster Online)
The Grid and Virtual Organisations • The Grid problem is defined as coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organisations [FKT01]. • Effort is required to allow users to place their trust in the data produced by such virtual organisations
Provenance and Virtual Organisations • Given a set of services in an open grid environment that decide to form a virtual organisation with the aim of producing a given result, how can we determine the process that generated the result, especially after the virtual organisation has been disbanded?
Provenance and Workflows • Workflow enactment has become popular in the Grid and Web Services communities • Workflow enactment can be seen as a scripted form of virtual organisation • The problem is similar: how can we determine the origin of enactment results?
Use cases • Bioinformatics • Aerospace Engineering • Organ transplant management • Chemistry • Physics
Provenance in Bioinformatics • Provenance in the drug discovery process • Drug companies are required to keep a record of the provenance of a drug's discovery for as long as the drug is in use (sometimes up to 50 years). • www.mygrid.org.uk
Provenance in Aerospace Engineering • Provenance requirement: to maintain a historical record of the inputs and outputs of each sub-system involved in simulations. • Aircraft provenance data need to be kept for up to 99 years when the aircraft are sold to some countries. • Currently, little direct support is available for this.
Provenance in Organ Transplant Management • Decision support systems for organ and tissue transplant rely on a wide range of data sources, patient data, and doctors’ and surgeons’ knowledge • Heavily regulated domain: European, national, regional and site-specific rules govern how decisions are made. • Application of these rules must be ensured and auditable, and the rules may change over time • Provenance allows previous decisions to be tracked: crucial for maximising the efficiency of matching and the recovery rate of patients
Provenance in Chemistry • A PhD student’s supervisor may wish to check the student’s experiment • Automatically generate papers describing how an experiment was carried out. • Intellectual property rights. • www.combechem.org
Physics • CMS • ATLAS
What is the problem? • Provenance recording should be part of the infrastructure, so that users can elect to enable it when they execute their complex tasks over the Grid or in Web Services environments. • Currently, the Web Services protocol stack and the Open Grid Services Architecture do not provide any support for recording provenance. • Methods are generally ad hoc and do not interoperate.
Architectural Vision Typical workflow enactment in service oriented architecture …
Architectural Vision … with provenance support
Sequence Diagram/Data Model • Must support recording of all information necessary to replay execution • Must support all complex forms of workflows (recursion, iterations, parallel execution).
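The two requirements above can be illustrated with a minimal Python sketch. It is not the data model used in the talk: the names `InvocationRecord`, `Recorder` and `replay` are hypothetical, and the sketch assumes deterministic services. Each record keeps the service name, its inputs, its output and the enclosing invocation, which is enough to re-run an invocation and to capture nesting and iteration. Parallel execution would need more than the single call stack used here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class InvocationRecord:
    record_id: int
    service: str               # name of the invoked service
    inputs: tuple              # arguments passed to the service
    output: Any                # result returned by the service
    parent_id: Optional[int]   # enclosing invocation, if any (nesting)


class Recorder:
    """Hypothetical in-memory recorder; not an API from the talk."""

    def __init__(self) -> None:
        self.records: List[InvocationRecord] = []
        self._stack: List[int] = []  # ids of invocations currently running

    def call(self, name: str, fn: Callable[..., Any], *args: Any) -> Any:
        record_id = len(self.records)
        parent_id = self._stack[-1] if self._stack else None
        record = InvocationRecord(record_id, name, args, None, parent_id)
        self.records.append(record)
        self._stack.append(record_id)
        try:
            record.output = fn(*args)
        finally:
            self._stack.pop()
        return record.output


def replay(records: List[InvocationRecord],
           services: Dict[str, Callable[..., Any]]) -> List[int]:
    """Re-run every record; return ids whose replayed output differs."""
    return [r.record_id for r in records
            if services[r.service](*r.inputs) != r.output]


recorder = Recorder()


def double(x: int) -> int:
    return 2 * x


def sum_of_doubles(xs: tuple) -> int:
    # Iteration with nested invocations: each inner call is recorded
    # with this invocation as its parent.
    return sum(recorder.call("double", double, x) for x in xs)


total = recorder.call("sum_of_doubles", sum_of_doubles, (1, 2, 3))

# Replay uses plain functions so that replaying does not record again.
pure_services = {
    "double": double,
    "sum_of_doubles": lambda xs: sum(2 * x for x in xs),
}
mismatches = replay(list(recorder.records), pure_services)
```

An empty `mismatches` list means every recorded invocation reproduced its recorded output, which is the sense of "replay" assumed in this sketch.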
PReP: Provenance Recording Protocol • [Sequence diagram] Participants: client, service, Provenance Service • Messages: negotiate configuration; invocation; result; invocation and result notify (shown twice)
Threesomes: a good idea on the Grid • [Diagram: several client/service pairs, each exchanging invocation and result messages and sending "invocation and result notify" messages to a Provenance Service] • Provenance services may be shared or different
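The message flow named on this slide can be sketched in a few lines of Python. This is an illustration under assumptions, not the PReP specification: the class and method names are hypothetical, the message contents are invented, and the sketch assumes that the two "invocation and result notify" messages come from the client and from the service respectively.

```python
import itertools
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class ProvenanceService:
    """Hypothetical store; the real PReP message formats are not shown here."""
    config: Dict[str, Any] = field(default_factory=dict)
    # invocation id -> list of (sender, invocation, result) notifications
    store: Dict[int, List[Tuple[str, Any, Any]]] = field(default_factory=dict)

    def negotiate(self, requested: Dict[str, Any]) -> Dict[str, Any]:
        # "negotiate configuration": this sketch accepts whatever is requested.
        self.config = dict(requested)
        return self.config

    def notify(self, sender: str, invocation_id: int,
               invocation: Any, result: Any) -> None:
        # "invocation and result notify"
        self.store.setdefault(invocation_id, []).append(
            (sender, invocation, result))

    def views_agree(self, invocation_id: int) -> bool:
        # True when both parties reported the same invocation and result.
        views = self.store.get(invocation_id, [])
        senders = {sender for sender, _, _ in views}
        payloads = {(repr(inv), repr(res)) for _, inv, res in views}
        return senders == {"client", "service"} and len(payloads) == 1


class Service:
    def __init__(self, provenance: ProvenanceService) -> None:
        self.provenance = provenance

    def invoke(self, invocation_id: int, payload: str) -> str:
        result = payload.upper()  # stand-in for real work
        self.provenance.notify("service", invocation_id, payload, result)
        return result


class Client:
    _ids = itertools.count(1)

    def __init__(self, service: Service, provenance: ProvenanceService) -> None:
        self.service = service
        self.provenance = provenance

    def call(self, payload: str) -> Tuple[int, str]:
        invocation_id = next(self._ids)
        result = self.service.invoke(invocation_id, payload)  # invocation, result
        self.provenance.notify("client", invocation_id, payload, result)
        return invocation_id, result


provenance = ProvenanceService()
provenance.negotiate({"record": "full"})  # hypothetical configuration key
client = Client(Service(provenance), provenance)
invocation_id, result = client.call("hello")
```

Keeping both views in the store lets a later query check that the client and the service agree on what happened, even after the two parties are no longer available.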
PReP Formalisation • Abstract machines • Properties • Termination • Liveness • Safety • Foundation for adding necessary cryptographic techniques
Research Agenda (1) • In order for provenance data to be useful, we expect such a protocol to support some “classical” properties of distributed algorithms. • Using mutual authentication, an invoked service can ensure that it submits data to a specific provenance server, and vice-versa, a provenance server can ensure that it receives data from a given service. • With non-repudiation, we can retain evidence of the fact that a service has committed to executing a particular invocation and has produced a given result. • We anticipate that cryptographic techniques will be useful to ensure such properties
Research Agenda (2) • Access control • Medical applications: organ transplant, IXI, e-Diamond • Scalability • DC2: 10^7 files; CERN envisions 10^12 files • How to infer domain-level provenance from execution-level provenance.
Research Agenda (3) Using provenance of data, trust metrics of the data can be derived from: • Trust the user places in invoked services • Trust the user places in the input data • Trust the user places in the enacted workflow • Trust the user places in the enactor • Trust the user places in the provenance service.
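The slide lists the inputs to a trust metric but gives no formula. Purely as an illustration, the sketch below assumes that each kind of trust is a number between 0 and 1 and combines the five values by multiplication, so that low trust in any one component lowers trust in the data. The names `TrustInputs` and `data_trust` and the product rule are assumptions, not part of the talk.

```python
from dataclasses import dataclass
from math import prod
from typing import Tuple


@dataclass(frozen=True)
class TrustInputs:
    """Trust the user places in each component, assumed to lie in [0, 1]."""
    services: float
    input_data: float
    workflow: float
    enactor: float
    provenance_service: float

    def values(self) -> Tuple[float, ...]:
        return (self.services, self.input_data, self.workflow,
                self.enactor, self.provenance_service)


def data_trust(trust: TrustInputs) -> float:
    # Assumed combination rule: the product of the five trust values.
    values = trust.values()
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("trust values must lie in [0, 1]")
    return prod(values)


fully_trusted = data_trust(TrustInputs(1.0, 1.0, 1.0, 1.0, 1.0))
doubtful_input = data_trust(TrustInputs(1.0, 0.5, 1.0, 1.0, 1.0))
```

Other rules, such as taking the minimum of the five values, would fit the slide equally well; choosing among them is part of the research question.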
The purpose of the PASOA project is to investigate provenance in Grid architectures • Funded by EPSRC under the “fundamental computer science for e-Science” call • In collaboration with Cardiff • www.pasoa.org
EU Provenance STREP: Enabling and Supporting Provenance in Grids for Complex Problems • Partners • IBM United Kingdom Ltd • University of Southampton • German Aerospace Centre • University of Wales, Cardiff • Universitat Politecnica de Catalunya • MTA SZTAKI • To design, conceive and implement an industrial-strength open provenance architecture for Grid computing, and to deploy and evaluate it in complex grid applications (aerospace engineering and organ transplant management) • www.gridprovenance.org
Provenance Workplan • [Workplan diagram] Labels: Requirements; Architecture 1; Architecture 2; Pre Prototype; Functional Prototype; Final Prototype; Standardisation (Interfaces) Proposal (Strawman); Scalability specification; Security Specification; Tools; Domain Specific Specification 1; Application 1; Domain Specific Specification 2; Application 2
Conclusion • Provenance is a largely unexplored domain • Strategic for bringing trust to open environments • A secure, scalable and configurable architecture is needed, capable of supporting multiple requirements from very different application domains • Need to further investigate the algorithmic foundations of provenance, which will lead to scalable and secure industrial solutions • Deployment in real applications
Acknowledgements • myGrid • Simon Miles, Juri Papay, Ananth Krishna, Michael Luck, David De Roure, Terry Payne, Mark Greenwood, Carole Goble, Martin Szomszor • Combechem • Gareth Hughes, Hugo Mills, monica schraeffel • PASOA • Omer Rana, Paul Groth, Simon Miles, Ben Caroll • EU-Provenance • Syd Chapman, John Ibbotson, Laszlo Varga, Steve Willmott, Ulises Cortes, Andreas Schreiber, Rolf Hempel