Provenance Aware Service Oriented Architecture (1 year on)
www.pasoa.org
Professor Luc Moreau
University of Southampton
L.Moreau@ecs.soton.ac.uk
The PASOA Team
• PASOA Southampton
  • Simon Miles, Paul Groth, Miguel Branco, Luc Moreau
• PASOA Cardiff
  • Ian Wootten, Shrija Rajbhandari, Omer Rana, David Walker
Provenance Definition
• Merriam-Webster Online dictionary:
  • the origin, source;
  • the history of ownership of a valued object or work of art or literature
• The provenance of a piece of data is the process that led to the data
• Our aim is to conceive a computer-based representation of provenance that allows us to perform useful analysis and reasoning to support our use cases
Provenance Use Cases (1)
• Bioinformatics: verification of “experiment validity”
• High Energy Physics: tracking, analysing, verifying data sets in the ATLAS Experiment of the Large Hadron Collider (CERN)
Provenance Use Cases (2)
• Aerospace engineering: maintaining a historical record of design processes, for up to 99 years
• Organ transplant management: tracking previous decisions, crucial to maximising the efficiency of matching and the recovery rate of patients
The Provenance Problem
• Given a set of services in an open grid environment that are composed in order to produce a given result, how can we determine the process that generated the result?
• This matters especially after their composition (i.e., the virtual organisation) has been disbanded
Provenance “Lifecycle”
[Figure labels: Application; Results; Provenance Store. Core Interfaces to Provenance Store: Record Documentation of Execution; Query Provenance of Data; Manage Store and its contents]
Logical Architecture
• Adopted by EU Provenance as strawman [Miles et al. 05]
Recording & Querying
• PReP [Groth et al. 04]
  • Protocol adopted by application components
  • Allows for multiple provenance stores (scalability)
• Query Interface [Miles et al. 05]
  • Purpose
    • Obtain the provenance of some specific data
    • Allow for “navigation” of the data structure representing provenance
  • Abstract interface
    • Allows us to view the provenance store as if containing XML data structures
    • Based on XPath and XQuery
[Figure labels: client; service; invocation; result; invocation and result recording (shown twice, each leading to a Provenance Store)]
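The recording and querying ideas on this slide can be illustrated with a minimal Python sketch. It is not the PReP message format or the PReServ API: the `ProvenanceStore` class, the XML element names, the actor names, and the stand-in `compress_service` function are all assumptions made for illustration. The sketch uses a single in-memory store, and the query uses the limited XPath subset of Python's standard `xml.etree.ElementTree`, not full XPath or XQuery.

```python
import xml.etree.ElementTree as ET


class ProvenanceStore:
    """Toy in-memory store holding assertions as an XML tree (hypothetical layout)."""

    def __init__(self):
        self.root = ET.Element("provenanceStore")

    def record(self, interaction_id, asserter, kind, content):
        # Group everything asserted about one interaction under one element.
        node = self.root.find(f"interaction[@id='{interaction_id}']")
        if node is None:
            node = ET.SubElement(self.root, "interaction", id=interaction_id)
        assertion = ET.SubElement(node, "assertion", asserter=asserter, kind=kind)
        assertion.text = content

    def query(self, path):
        # ElementTree understands only a small XPath subset.
        return self.root.findall(path)


def compress_service(sequence):
    """Stand-in for a remote service: returns the input length as a string."""
    return str(len(sequence))


def invoke(store, interaction_id, argument):
    # Client and service each record their own view of invocation and result.
    store.record(interaction_id, "client", "invocation", argument)
    store.record(interaction_id, "service", "invocation", argument)
    result = compress_service(argument)
    store.record(interaction_id, "service", "result", result)
    store.record(interaction_id, "client", "result", result)
    return result


store = ProvenanceStore()
invoke(store, "i1", "MKVLA")

# Navigate the store as if it were an XML document.
results = store.query("interaction[@id='i1']/assertion[@kind='result']")
print([(a.get("asserter"), a.text) for a in results])
```

Keeping both the client's and the service's assertions about the same interaction lets a later query compare the two views.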
Assertions about Performance and Availability [Wootten, Rana 05]
• A taxonomy of gathered information about performance
  • Recorded (invocation start/end time and counts)
  • Derived from recorded information (averages)
  • Queried against other actor-owned metrics
• Compilation of assertions into a measure of trust (both from service and client perspective)
Trust is a subjective probability that an actor will perform a particular action [Gambetta] [Rajbhandari, Rana 05]
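The first two levels of the taxonomy (recorded and derived information) can be sketched in a few lines of Python. The `InvocationLog` class and its method names are hypothetical and are not taken from the cited work; times are plain floats in seconds.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class InvocationLog:
    """Recorded information: one (start, end) pair per invocation, in seconds."""
    timings: list = field(default_factory=list)

    def record(self, start, end):
        if end < start:
            raise ValueError("end time precedes start time")
        self.timings.append((start, end))

    @property
    def count(self):
        # Recorded: number of invocations seen so far.
        return len(self.timings)

    def average_duration(self):
        # Derived: computed from the recorded start/end times.
        if not self.timings:
            return None
        return mean(end - start for start, end in self.timings)


log = InvocationLog()
log.record(0.0, 2.0)
log.record(10.0, 14.0)
print(log.count, log.average_duration())
```

Derived values are recomputed from the recorded timings on demand, so the recorded data stays the single source for later assertions.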
PReServ [Groth et al. 05]
• Implementation of PReP protocol and Query Interface
• Provenance store implemented as a Web Service
• Client side libraries for using Provenance Store
• Axis Handler for automatically recording communication between Axis-based Web Services
[Figure labels: WS Client and Web Service, each with an Axis Handler and a PS Client Side Library; Query Actor with a PS Client Side Library; Provenance Service (WS); WS Calls; Java Calls; Backend Store Interface; Backend Stores: In-Memory Store, File System Store, …]
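The idea behind the handler, recording communication automatically without changing the service code, can be shown by analogy with a Python decorator. This is only an analogy and not the Axis handler mechanism or PReServ code: the `records_provenance` decorator, the `recorded` list standing in for a provenance store, and the `reverse` service are all hypothetical.

```python
import functools

recorded = []  # stand-in for a provenance store


def records_provenance(actor):
    """Wrap a callable so each call and its result are recorded automatically."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Record the incoming invocation before the service logic runs.
            recorded.append((actor, "invocation", func.__name__, args))
            result = func(*args, **kwargs)
            # Record the outgoing result before returning it to the caller.
            recorded.append((actor, "result", func.__name__, result))
            return result
        return wrapper
    return decorator


@records_provenance("service")
def reverse(sequence):
    # The service body contains no recording code of its own.
    return sequence[::-1]


output = reverse("ACGT")
print(output, recorded)
```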
Bioinformatics Application [HPDC’05]
• Bioinformatics workflow studying compressibility of biological sequences
• Implemented as a VDT workflow, scheduled by Condor
• Each service, script, command records provenance
Bioinformatics Application (2)
• Use Cases
  • Algorithm verification
    • A bioinformatician, A, downloads a protein sequence from the RefSeq database and runs the compressibility experiment.
    • A later performs the same experiment on the same sequence data, again downloaded from RefSeq.
    • A compares the two experiment results and notices a difference.
    • A determines whether the difference was caused by the algorithms used to process the sequence data having been changed.
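The last step of the algorithm verification use case can be sketched as a comparison of what was recorded for the two runs. This is a minimal Python illustration under assumptions: each run is reduced to a mapping from workflow step to the algorithm version recorded for it, and the step names and version labels below are invented for the example.

```python
def changed_steps(run_a, run_b):
    """Return the steps whose recorded algorithm version differs between runs."""
    steps = sorted(set(run_a) | set(run_b))
    # A step missing from one run also counts as a difference.
    return [step for step in steps if run_a.get(step) != run_b.get(step)]


# Hypothetical records for the two experiment runs.
first_run = {"encode": "v1", "compress": "v1", "measure": "v1"}
second_run = {"encode": "v1", "compress": "v2", "measure": "v1"}

print(changed_steps(first_run, second_run))
```

An empty list would suggest that the recorded algorithms were the same in both runs, so the difference in results would have to come from elsewhere, for example from the input data.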
Bioinformatics Application (3)
[Figures: Recording Scalability; Querying Scalability]
Other Applications
• EU Provenance project: pre-prototype about baking cakes
• e-Demand: detect sharing of services in workflow execution to offer more resilient execution
[Townend et al. 05] [Xu et al. 05]
Conclusions
• Mostly unexplored area that is crucial to developing trusted systems
• Current work:
  • System and protocol design, architecture specification, generic support for use cases
  • Pursuing deployment in concrete applications and performance evaluation
• Download our software from www.pasoa.org
• Tell us about your use cases: we are keen to find new collaborations in this space!
Publications
• Paul Groth, Simon Miles, Weijian Fang, Sylvia C. Wong, Klaus-Peter Zauner, and Luc Moreau. Recording and Using Provenance in a Protein Compressibility Experiment. In Proceedings of the 14th IEEE International Symposium on High Performance Distributed Computing (HPDC'05), July 2005.
• Paul T. Groth. Recording Provenance in Service-Oriented Architectures. 9 Month Report, University of Southampton; Faculty of Engineering, Science and Mathematics; School of Electronics and Computer Science, 2004.
• Paul Groth, Michael Luck, and Luc Moreau. A protocol for recording provenance in service-oriented Grids. In Proceedings of the 8th International Conference on Principles of Distributed Systems (OPODIS'04), Grenoble, France, December 2004.
• Paul Groth, Michael Luck, and Luc Moreau. Formalising a protocol for recording provenance in Grids. In Proceedings of the UK OST e-Science second All Hands Meeting 2004 (AHM'04), Nottingham, UK, September 2004.
• Simon Miles, Paul Groth, Miguel Branco, and Luc Moreau. The requirements of recording and using provenance in e-Science experiments. Technical report, University of Southampton, 2005.
• Luc Moreau, Syd Chapman, Andreas Schreiber, Rolf Hempel, Omer Rana, Laszlo Varga, Ulises Cortes, and Steven Willmott. Provenance-based Trust for Grid Computing: Position Paper. 2003.
• Paul Townend, Paul Groth, and Jie Xu. A Provenance-Aware Weighted Fault Tolerance Scheme for Service-Based Applications. In Proc. of the 8th IEEE International Symposium on Object-oriented Real-time distributed Computing (ISORC 2005), May 2005.