
Provenance in myGrid (Jun Zhao)

Provenance in myGrid. Jun Zhao, School of Computer Science, The University of Manchester, U.K. 21 October, 2004. Outline: myGrid; Motivation; Challenges; myGrid approach; Related work; Conclusions. myGrid Project: http://www.mygrid.co.uk. A pilot e-Science project in the U.K.


Presentation Transcript


  1. Provenance in myGrid • Jun Zhao • School of Computer Science, The University of Manchester, U.K. • 21 October, 2004

  2. Outline • myGrid • Motivation • Challenges • myGrid approach • Related work • Conclusions

  3. myGrid Project • http://www.mygrid.co.uk • A pilot e-Science project in the U.K.; • Targeted at biologists and bioinformaticians; • Three bio-test beds: • Providing middleware services in a Grid environment, which are orchestrated through workflows;

  4. e-Science in silico Experiments (workflows) • Automate the process of experiments; • Orchestrate distributed resources and Web/Grid services; • Transparent, seamless access to remote data and computation resources; • Increase collaboration and results sharing across multi-scale communities. [Figure: experiment lifecycle. Labels: Forming experiments; Personalisation; Discovering and reusing experiments and resources; Executing and monitoring experiments; Managing lifecycle, provenance and results of experiments; Sharing services & experiments; Soaplab]

  5. Problems when doing in silico experiments • Experiments being performed repeatedly, at different sites, different times, by different users or groups; • A large repository of zipped records about experiments!! • Frequently updated resources; • Volatile, distributed environment. [Figure label: Scientists]

  6. Problems when doing in silico experiments • Experiments being performed repeatedly, at different sites, different times, by different users or groups. • PROVENANCE: • verification of data; • “recipes” for experiment designs; • explanation for the impact of changes; • ownership; • performance of services; • data quality. [Figure label: Scientists]

  7. Provenance Forms • Derivations: • A workflow log. • Linking items, in a directed graph. • when, who, how, which, what, where. • Execution, process-centric. • Annotations: • Attached to items or collections of items, in a structured, semi-structured or free text form. • Annotations on one item or linking items. • why, when, where, who, what, how. • Data-centric. [Figure: derivation graph of parameter sets, rooted at mass = 200 and branching on decay (bb, ZZ, WW), stability, event, plot, LowPt and HighPt]
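
The two provenance forms on the slide above can be pictured as two small record types. The following is a minimal sketch, not the myGrid implementation: the class names, field names, and sample items are all hypothetical, and it assumes each item is produced by at most one recorded derivation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    """One edge of the derivation graph: `output` was produced from `inputs`."""
    output: str
    inputs: tuple
    how: str   # process or service that ran (hypothetical field)
    who: str
    when: str


@dataclass(frozen=True)
class Annotation:
    """A free-text note attached to one item (data-centric provenance)."""
    item: str
    text: str
    who: str


def lineage(item, derivations):
    """Return every item that `item` was directly or indirectly derived from."""
    by_output = {d.output: d for d in derivations}
    seen, stack = set(), [item]
    while stack:
        d = by_output.get(stack.pop())
        if d is None:
            continue  # a raw input: nothing recorded upstream
        for source in d.inputs:
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen


# Hypothetical sample data, for illustration only.
derivations = [
    Derivation("blast_report", ("dna_seq",), "BLAST", "alice", "2004-10-21T12:04"),
    Derivation("alignment", ("blast_report", "reference_db"), "align", "bob", "2004-10-21T12:30"),
]
notes = [Annotation("blast_report", "rerun after database update", "alice")]

print(sorted(lineage("alignment", derivations)))
```

Walking the derivation edges answers "where did this come from", while the annotations carry the "why" that a workflow log alone does not record.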

  8. Challenges • Cross-referencing across runs and within an experiment; • Provenance of *good* metadata annotation; • Bridging provenance islands; • Moreover…

  9. Challenges: Complex cross-referencing information • Complex control flow • Iterative data and process flow • Repetitive running producing cross-referencing information • Human interaction activities vs. service invocations • Service failure and experiment re-composition. [Figure labels: Iterative service; State controls; Experiment design file; Experiment run with interactions; Experiment run with failures; Revised experiment]

  10. Challenges • Annotations: • Mandatory / automatic • Who did that • How much should be trusted • Security control • Authenticity validation • Quality • Cross-referencing • Versioning

  11. Challenges: provenance islands [Figure labels: Service 1; Service 2; Workflow 1; Experimental Investigation 1; Data 1; Diverse information; Diverse metadata of information]

  12. Moreover • Intellectual property • Preservation • Archiving • Query and access • Integration • Investigation • Impact analysis • ……

  13. myGrid Approach • Taverna workflow workbench • Provenance plug-in; • mIR (myGrid Information Repository) plug-in; • myGrid information model • Based on CCLRC scientific metadata model • Providing shared model for services and components interactions • Semantic Web technologies • RDF (Resource Description Framework) • Ontologies • LSIDs and URNs. Links: http://taverna.sourceforge.net; http://freefluo.sourceforge.net. Reference: B. Matthews and S. Sufi, The CLRC Scientific Metadata Model, version 1, DL TR02001, CLRC, February 2001.

  14. RDF in a Nutshell • Resource Description Framework • Common model for metadata • A graph of triples • <subject, predicate, object> • RDQL, repositories, integration tools, presentation tools • Jena, Haystack • http://www.w3.org/RDF/
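
To make the "graph of triples" idea on the slide above concrete, here is a minimal sketch of a triple set with wildcard pattern matching, which is roughly the kind of question a query language such as RDQL answers. It uses plain Python tuples rather than Jena or any RDF library, and the `ex:` names are invented for illustration.

```python
def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Each statement is a (subject, predicate, object) tuple; all names are hypothetical.
graph = {
    ("ex:blast_report_1", "ex:derivedFrom", "ex:dna_sequence_1"),
    ("ex:blast_report_1", "ex:createdBy", "ex:alice"),
    ("ex:dna_sequence_1", "ex:createdBy", "ex:bob"),
}

# "Who created what?" is a pattern with only the predicate fixed.
for subject, _, creator in match(graph, p="ex:createdBy"):
    print(subject, "was created by", creator)
```

Because every statement has the same three-part shape, metadata from different tools can be merged by taking the union of their triple sets.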

  15. [Figure: provenance information model in three levels] • Organisation level provenance: project; Experiment design. • Process level provenance: Workflow design; Workflow run; Process (e.g. web service invocation of BLAST @ NCBI); Service (e.g. BLAST @ NCBI); Event (e.g. completion of a web service invocation at 12.04pm). • Data/knowledge level provenance: data derivation (e.g. output data derived from input data); knowledge statements (e.g. similar protein sequence to). • Relation labels: runBy; componentProcess; componentEvent; partOf; instanceOf; hasInput; hasOutput; run for; subClass. • Other entity labels: Data; Person; Organisation; Blast Result; DNA sequence. • User can add templates to each workflow process to determine knowledge links between data items.
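
As a rough illustration of how one process-level record from the model above might be flattened into statements, the sketch below emits triples using the relation labels that appear on the slide (partOf, runBy, hasInput, hasOutput). The direction of each relation and the record fields are assumptions, since the original diagram is not recoverable from the transcript; the identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessRun:
    """One service invocation inside a workflow run (hypothetical record shape)."""
    process_id: str
    workflow_run: str
    service: str
    inputs: tuple
    outputs: tuple


def to_triples(run):
    """Flatten a ProcessRun into (subject, predicate, object) statements.

    Assumption: the process is the subject of every relation.
    """
    triples = [
        (run.process_id, "partOf", run.workflow_run),
        (run.process_id, "runBy", run.service),
    ]
    triples += [(run.process_id, "hasInput", item) for item in run.inputs]
    triples += [(run.process_id, "hasOutput", item) for item in run.outputs]
    return triples


# Hypothetical identifiers, for illustration only.
blast_call = ProcessRun(
    process_id="process:42",
    workflow_run="run:7",
    service="BLAST @ NCBI",
    inputs=("data:dna_sequence",),
    outputs=("data:blast_result",),
)

for triple in to_triples(blast_call):
    print(triple)
```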

  16. Representing links • Identify link type • Again use URI • Allows us to use RDF infrastructure • Repositories • Ontologies • Example link type: http://www.mygrid.org.uk/ontology#derived_from • Example linked items: urn:lsid:taverna.sf.net:datathing:45fg6 and urn:lsid:taverna.sf.net:datathing:23ty3
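
The link on the slide above can be sketched as a single triple whose predicate is the link-type URI and whose endpoints are LSIDs. The snippet assumes the general LSID layout `urn:lsid:authority:namespace:object[:revision]`; the helper names are hypothetical, and which of the two datathings was derived from the other is not stated in the transcript, so the direction below is arbitrary.

```python
DERIVED_FROM = "http://www.mygrid.org.uk/ontology#derived_from"


def parse_lsid(lsid):
    """Split an LSID URN into its named parts, or raise ValueError."""
    parts = lsid.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    keys = ("authority", "namespace", "object", "revision")
    return dict(zip(keys, parts[2:]))  # revision is present only if given


def derived_from(derived, source):
    """Build a (subject, predicate, object) link after validating both LSIDs."""
    parse_lsid(derived)
    parse_lsid(source)
    return (derived, DERIVED_FROM, source)


# Direction chosen arbitrarily for illustration.
link = derived_from(
    "urn:lsid:taverna.sf.net:datathing:45fg6",
    "urn:lsid:taverna.sf.net:datathing:23ty3",
)
print(link)
print(parse_lsid(link[0]))
```

Using a URI for the link type as well as for the endpoints means the link is an ordinary RDF statement, so existing repositories and ontologies can store and describe it.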

  17. [Screenshots: LSID for GenBank Data; Provenance Web; Personalization view]

  18. Reflection • First attempt • Bridging the islands • Provenance modelling: relational + schema-less model • Provenance collection • Moreover: • Provenance slicing • Security control • Authenticity validation • Provenance versioning and (long-term) preservation

  19. Related Work • Chimera: • Provenance cross-referencing • www.griphyn.org/chimera/ • CombeChem: • www.combechem.org/ • PASOA (Provenance Aware Service Oriented Architecture) • http://twiki.pasoa.ecs.soton.ac.uk/bin/view/PASOA/WebHome • CMCS(Collaboratory for Multi-Scale Chemical Science) • http://cmcs.ca.sandia.gov/index.php • ESSW (Earth System Science Workbench) • http://essw.bren.ucsb.edu/

  20. Acknowledgement • myGrid team: • esp. Carole Goble, Robert Stevens, Chris Wroe, Mark Greenwood, Phil Lord • IBM: • Dennis Quan • Williams Group • Esp. Hannah Tipney
