Provenance in myGrid • Jun Zhao • School of Computer Science, The University of Manchester, U.K. • 21 October, 2004
Outline • myGrid • Motivation • Challenges • myGrid approach • Related work • Conclusions
myGrid Project • http://www.mygrid.co.uk • A pilot e-Science project in the U.K.; • Targeted at biologists and bioinformaticians; • Three bio-test beds; • Provides middleware services in a Grid environment, orchestrated by means of workflows.
e-Science in silico Experiments (workflows) • Automate the process of experiments; • Orchestrate distributed resources and Web/Grid services; • Transparent, seamless access to remote data and computation resources; • Increase collaboration and results sharing across multi-scale communities.
[Figure labels: Forming experiments; Personalisation; Discovering and reusing experiments and resources; Executing and monitoring experiments; Managing lifecycle, provenance and results of experiments; Sharing services & experiments; Soaplab]
Problems when doing in silico experiments • Experiments are performed repeatedly, at different sites, at different times, by different users or groups; • A large repository of zipped records about experiments!! • Frequently updated resources; • Volatile, distributed environment. [Figure label: Scientists]
Problems when doing in silico experiments • Experiments are performed repeatedly, at different sites, at different times, by different users or groups. • Scientists' concerns: verification of data; “recipes” for experiment designs; explanation for the impact of changes; ownership; performance of services; data quality. • PROVENANCE
Provenance Forms • Derivations: a workflow log; linking items, in a directed graph; when, who, how, which, what, where; execution; process-centric. • Annotations: attached to items or collections of items, in a structured, semi-structured or free-text form; annotations on one item or on linking items; why, when, where, who, what, how; data-centric.
[Figure: example derivation graph of parameter sets, with values such as mass = 200; decay = bb, ZZ or WW; stability = 1 or 3; LowPt = 20; HighPt = 10000; event = 8; plot = 1]
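The two provenance forms can be pictured as two small data structures: a directed graph for derivations and a per-item list for annotations. The following is a minimal sketch only; the item names, the `annotate` and `lineage` helpers, and the annotation fields are all hypothetical and are not taken from myGrid.

```python
from collections import defaultdict

# Derivation provenance: item -> items it was directly derived from.
# All identifiers below are made up for illustration.
derived_from = {
    "blast_report": ["dna_sequence", "blast_db_snapshot"],
    "alignment_plot": ["blast_report"],
}

# Annotation provenance: free-form notes attached to individual items.
annotations = defaultdict(list)


def annotate(item, who, note):
    """Attach a (who, note) annotation to a data item."""
    annotations[item].append({"who": who, "note": note})


def lineage(item):
    """Return every item that `item` was directly or indirectly derived from."""
    seen = []
    stack = list(derived_from.get(item, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(derived_from.get(current, []))
    return seen


annotate("blast_report", "alice", "Top hit looks like a contaminant; rerun.")
print(sorted(lineage("alignment_plot")))
print(annotations["blast_report"])
```

Walking the graph answers process-centric questions (how was this produced?), while the annotation list answers data-centric ones (why does this item matter, and who said so?).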
Challenges • Cross-referencing across runs and within an experiment; • Provenance of *good* metadata annotation • Bridging provenance islands • Moreover….
Challenges: Complex cross-referencing information • Complex control flow • Iterative data and process flow • Repetitive running producing cross-referencing information • Human interaction activities vs. service invocations • Service failure and experiment re-composition
[Figure labels: Iterative service; State controls; Experiment design file; Experiment run with interactions; Experiment run with failures; Revised experiment]
Challenges • Annotations: • Mandatory / automatic • Who did that • How much should be trusted • Security control • Authenticity validation • Quality • Cross-referencing • Versioning
Challenges: provenance islands
[Figure labels: Service 1; Service 2; Workflow 1; Experimental Investigation 1; Data 1; Diverse information; Diverse metadata of information]
Moreover • Intellectual property • Preservation • Archiving • Query and access • Integration • Investigation • Impact analysis • ……
myGrid Approach • Taverna workflow workbench • Provenance plug-in; • mIR (myGrid Information Repository) plug-in; • myGrid information model • Based on the CCLRC scientific metadata model • Provides a shared model for service and component interactions • Semantic Web technologies • RDF (Resource Description Framework) • Ontologies • LSIDs and URNs • http://taverna.sourceforge.net • http://freefluo.sourceforge.net • Reference: B. Matthews and S. Sufi: The CLRC Scientific Metadata Model, version 1, DL TR02001, CLRC, February 2001
RDF in a Nutshell • Resource Description Framework • Common model for metadata • A graph of triples • <subject, predicate, object> • RDQL, repositories, integration tools, presentation tools • Jena, Haystack • http://www.w3.org/RDF/
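To make the "graph of triples" idea concrete, here is a minimal sketch in plain Python rather than in Jena or RDQL. It stores <subject, predicate, object> tuples in a set and answers pattern queries in which `None` is a wildcard. The `http://example.org/` URIs, the property names, and the `match` helper are assumptions made for illustration.

```python
# A tiny in-memory triple store; None acts as a wildcard in a pattern.
# The URIs and property names are hypothetical, not taken from myGrid.
EX = "http://example.org/"

triples = {
    (EX + "run42", EX + "runBy", EX + "alice"),
    (EX + "run42", EX + "hasOutput", EX + "report7"),
    (EX + "run43", EX + "runBy", EX + "bob"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples in `store` that agree with every non-None field."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# "Who ran what?": every triple whose predicate is runBy.
for s, p, o in match(triples, predicate=EX + "runBy"):
    print(s, "was run by", o)
```

Because every statement has the same three-part shape, metadata from different sources can be merged by taking the union of the sets, which is part of what makes a common model attractive for integration.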
[Figure: provenance model diagram] • Levels: Organisation level provenance; Process level provenance; Data/knowledge level provenance • Entities shown: Person, Organisation, Project, Experiment design, Workflow design, Workflow run, Process (e.g. web service invocation of BLAST @ NCBI), Service (e.g. BLAST @ NCBI), Event (e.g. completion of a web service invocation at 12.04pm), Data, Blast Result, DNA sequence • Relations shown: runBy, run for, componentProcess, componentEvent, partOf, instanceOf, subClass, hasInput, hasOutput • Data derivation, e.g. output data derived from input data • Knowledge statements, e.g. "similar protein sequence to" • Users can add templates to each workflow process to determine knowledge links between data items.
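One way to read the link between process-level and data-level provenance is that derivation statements can be computed from the inputs and outputs recorded for each process. The sketch below assumes, as a simplification, that every output of a process is derived from every input of that process. The `ProcessRun` class, its field names, and the data item names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessRun:
    """One process-level record: a service invocation with its data links."""
    name: str
    service: str
    has_input: list = field(default_factory=list)
    has_output: list = field(default_factory=list)


def derivations(process_runs):
    """Yield (output, input) pairs, assuming each output of a process
    is derived from every input of that same process."""
    for run in process_runs:
        for out in run.has_output:
            for inp in run.has_input:
                yield (out, inp)


# Hypothetical workflow run with a single BLAST invocation.
workflow_run = [
    ProcessRun(
        name="blast_step",
        service="BLAST @ NCBI",
        has_input=["dna_sequence_1"],
        has_output=["blast_result_1"],
    )
]

for out, inp in derivations(workflow_run):
    print(out, "derived from", inp)
```

Richer knowledge links, such as "similar protein sequence to", cannot be inferred this way; they would need to come from something like the per-process templates mentioned on the slide.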
Representing links • Identify the link type • Again use a URI • Allows us to use RDF infrastructure • Repositories • Ontologies
[Figure: two data items, urn:lsid:taverna.sf.net:datathing:45fg6 and urn:lsid:taverna.sf.net:datathing:23ty3, joined by a link typed http://www.mygrid.org.uk/ontology#derived_from]
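As an illustration of why URI-typed links fit RDF infrastructure, the sketch below splits an LSID into its standard parts (authority, namespace, object, optional revision) and writes the slide's link as one N-Triples line. The slide does not say which of the two data items is the derived one, so the direction chosen here is an assumption; `parse_lsid` and `link_as_ntriple` are hypothetical helper names.

```python
DERIVED_FROM = "http://www.mygrid.org.uk/ontology#derived_from"


def parse_lsid(lsid):
    """Split an LSID into its authority, namespace, object and optional revision."""
    parts = lsid.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {lsid!r}")
    return {
        "authority": parts[2],
        "namespace": parts[3],
        "object": parts[4],
        "revision": parts[5] if len(parts) == 6 else None,
    }


def link_as_ntriple(subject, predicate, obj):
    """Serialise one URI-to-URI link as a single N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


# Assumption: 45fg6 is the derived item and 23ty3 is its source.
derived = "urn:lsid:taverna.sf.net:datathing:45fg6"
source = "urn:lsid:taverna.sf.net:datathing:23ty3"

print(parse_lsid(derived))
print(link_as_ntriple(derived, DERIVED_FROM, source))
```

Since the link type is itself a URI, an ontology can describe it (for example, as a kind of derivation) without changing how the link is stored.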
[Screenshots: LSID for GenBank Data; Provenance Web; Personalization view]
Reflection • First attempt • Bridging the islands • Provenance modelling: relational + schema-less model • Provenance collection • Moreover: • Provenance slicing • Security control • Authenticity validation • Provenance versioning and (long-term) preservation
Related Work • Chimera: • Provenance cross-referencing • www.griphyn.org/chimera/ • CombeChem: • www.combechem.org/ • PASOA (Provenance Aware Service Oriented Architecture) • http://twiki.pasoa.ecs.soton.ac.uk/bin/view/PASOA/WebHome • CMCS (Collaboratory for Multi-Scale Chemical Science) • http://cmcs.ca.sandia.gov/index.php • ESSW (Earth System Science Workbench) • http://essw.bren.ucsb.edu/
Acknowledgement • myGrid team: • esp. Carole Goble, Robert Stevens, Chris Wroe, Mark Greenwood, Phil Lord • IBM: • Dennis Quan • Williams Group • Esp. Hannah Tipney