
myGrid/Taverna Provenance

Daniele Turi, University of Manchester. OMII f2f Meeting, London, 19-20/4/06. Components covered: identifiers (LSIDs), data (JDBC data store), metadata (RDF Provenance Plugin), browsing (Provenance Browser Plugin), and security (under development).

Presentation Transcript


  1. myGrid/Taverna Provenance Daniele Turi University of Manchester OMII f2f Meeting, London, 19-20/4/06

  2. Components • Identifiers • LSIDs • Data • JDBC data store • Metadata • RDF Provenance Plugin • Browsing • Provenance Browser Plugin • Security • Under development

  3. LSID

  4. LSID: Life Science Identifier • URN specification in progress • 5-part identifier (with optional version id) • urn:lsid:www.mygrid.org.uk:lsdocument:X1234 • urn:lsid:ncbi.nlm.nih.gov.lsid.biopathways.org:genbank_gi:7717376 • protocol for retrieving data and metadata about an object • commitment by the provider to always return the same data for an ID
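
The five-part structure on this slide can be illustrated with a small parser. This is a minimal sketch, not code from myGrid or Taverna: the `LSID` tuple and `parse_lsid` function are hypothetical names, and the field names (authority, namespace, object id, revision) are assumed from the general LSID layout.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Parts that follow the fixed 'urn:lsid' prefix (hypothetical type)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None  # the optional version id


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"missing 'urn:lsid' prefix: {text!r}")
    if not all(parts[2:]):
        raise ValueError(f"empty component in: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


lsid = parse_lsid("urn:lsid:www.mygrid.org.uk:lsdocument:X1234")
print(lsid.authority, lsid.namespace, lsid.object_id, lsid.revision)
```

The parser only checks the shape of the identifier; resolving it to data or metadata is the job of the authority and resolver described on the next slides.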

  5. LSID (ctd) • Issue • LSID Authorities • Resolution • LSID Resolvers • Examples • myGrid • Long Term Ecological Research Network • BioPathways Consortium

  6. LSID (ctd 2) • abstraction • lightweight • independent from actual storage implementation • database • file system • application • both for private and public data sources
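
The idea that an LSID hides the storage implementation can be sketched as a resolver that dispatches on the authority part. Everything here is an assumption made for illustration: the `LsidResolver` class, the `example.org` authorities and the two toy backends are hypothetical and do not reflect the real LSID resolution protocol or any myGrid code.

```python
from typing import Callable, Dict, Tuple

# A backend is a pair of callables: one returns data, one returns metadata.
Backend = Tuple[Callable[[str], bytes], Callable[[str], dict]]


class LsidResolver:
    """Routes an LSID to whichever backend its authority registered."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, authority: str, get_data, get_metadata) -> None:
        self._backends[authority] = (get_data, get_metadata)

    def _backend(self, lsid: str) -> Backend:
        authority = lsid.split(":")[2]  # urn:lsid:<authority>:...
        return self._backends[authority]

    def get_data(self, lsid: str) -> bytes:
        return self._backend(lsid)[0](lsid)

    def get_metadata(self, lsid: str) -> dict:
        return self._backend(lsid)[1](lsid)


# Backend 1: an in-memory dict standing in for a database table.
rows = {"urn:lsid:db.example.org:seq:1": b"ATGC"}

resolver = LsidResolver()
resolver.register("db.example.org", rows.__getitem__,
                  lambda lsid: {"source": "database"})
# Backend 2: an application that computes its answer on demand.
resolver.register("app.example.org",
                  lambda lsid: lsid.rsplit(":", 1)[1].encode(),
                  lambda lsid: {"source": "application"})

print(resolver.get_data("urn:lsid:db.example.org:seq:1"))
print(resolver.get_data("urn:lsid:app.example.org:echo:hello"))
```

Callers see the same two operations regardless of whether a database, a file system or an application sits behind the authority.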

  7. Data

  8. Data Storage (current) • Taverna can persist inputs, outputs and intermediate results in an SQL database via JDBC • Optional and can be done by configuring a Baclava Data Store • Allows the LSIDs of data items to be resolved against the actual data
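
A rough picture of what persisting data items under their LSIDs looks like is sketched below. It uses Python's built-in `sqlite3` as a stand-in for an SQL database reached via JDBC; the `data_item` table, its columns and the `datathing` namespace are hypothetical and are not the schema of the Baclava Data Store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per data item, keyed by its LSID.
conn.execute(
    "CREATE TABLE data_item ("
    " lsid TEXT PRIMARY KEY,"
    " role TEXT NOT NULL,"      # 'input', 'output' or 'intermediate'
    " content BLOB NOT NULL)"
)


def persist(conn, lsid, role, content):
    """Store one workflow data item under its LSID."""
    conn.execute("INSERT INTO data_item VALUES (?, ?, ?)",
                 (lsid, role, content))
    conn.commit()


def resolve(conn, lsid):
    """Return the stored bytes for an LSID, or None if it is unknown."""
    row = conn.execute("SELECT content FROM data_item WHERE lsid = ?",
                       (lsid,)).fetchone()
    return None if row is None else row[0]


persist(conn, "urn:lsid:www.mygrid.org.uk:datathing:1", "input", b"ATGC")
print(resolve(conn, "urn:lsid:www.mygrid.org.uk:datathing:1"))
```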

  9. Data Storage (future) • Domain-specific databases • use outside myGrid • Develop: • taverna processor for JDBC/OGSA-DAI • associated interface (cf BioMart) • Users will be able to study the contents of an existing database and: • write queries that extract data from the database, where the query may be parameterised with values passed in from the workflow; • write requests that insert data from the workflow into a named table in the database.
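
The two planned operations (a query parameterised by workflow values, and an insert into a named table) might look roughly like the sketch below. It uses `sqlite3` in place of JDBC/OGSA-DAI, and the `gene` table, `run_query` and `insert_row` are hypothetical names chosen for the example. Because a table name cannot be bound as a query parameter, the sketch checks it against the tables that actually exist before building the statement.

```python
import sqlite3


def run_query(conn, sql, params):
    """Run a query whose placeholders are filled from workflow values."""
    return conn.execute(sql, params).fetchall()


def insert_row(conn, table, row):
    """Insert a dict of workflow values into a named, existing table."""
    tables = {r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in tables:
        raise ValueError(f"unknown table: {table!r}")
    # Column names are also checked, since they are spliced into the SQL.
    columns = {r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')}
    unknown = set(row) - columns
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    names = ", ".join(f'"{c}"' for c in row)
    marks = ", ".join("?" for _ in row)
    conn.execute(f'INSERT INTO "{table}" ({names}) VALUES ({marks})',
                 tuple(row.values()))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (symbol TEXT, organism TEXT)")
insert_row(conn, "gene", {"symbol": "BRCA1", "organism": "human"})
insert_row(conn, "gene", {"symbol": "Trp53", "organism": "mouse"})
hits = run_query(conn, "SELECT symbol FROM gene WHERE organism = ?", ("human",))
print(hits)
```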

  10. Metadata

  11. Metadata Generation • Taverna Provenance Plugin • Listen to Taverna Events • WorkflowEventListener • Faithfully record them as ontological instance data • RDF graphs (one for each Taverna run)
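
The listen-and-record pattern can be sketched as a listener that turns each event into triples and files them under the run they belong to. This is not the Taverna `WorkflowEventListener` interface: the method names, the event arguments and the `example.org` identifiers are all hypothetical, and the direction of each property (which node is subject, which is object) is an assumption.

```python
from collections import defaultdict


class ProvenanceRecorder:
    """Hypothetical listener: one set of triples (one graph) per run."""

    def __init__(self):
        self.graphs = defaultdict(set)  # run id -> {(subject, predicate, object)}

    def workflow_started(self, run, workflow, user):
        self.graphs[run].add((run, "runs", workflow))
        self.graphs[run].add((run, "launchedBy", user))

    def process_completed(self, run, process_run):
        self.graphs[run].add((run, "executed", process_run))


recorder = ProvenanceRecorder()
run = "urn:lsid:example.org:wfInstance:8"
recorder.workflow_started(run,
                          "urn:lsid:example.org:workflow:6",
                          "urn:lsid:example.org:person:4")
recorder.process_completed(run, "urn:lsid:example.org:processRun:84")

for triple in sorted(recorder.graphs[run]):
    print(triple)
```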

  12. Metadata • Representation • Ontology (Schema) • Storage • Query • Browsing

  13. Representation • RDF • triples • subject, predicate, object • URIs (hence easy data integration) • semantic web language • XML serialization • flexible, powerful • sets of triples give rise to graphs
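
Two of the points on this slide, that sets of triples form graphs and that shared URIs make integration easy, can be shown with plain tuples. The `ex:` identifiers and the two "lab" data sets are hypothetical; the edge directions are an assumption.

```python
# Two independently produced sets of (subject, predicate, object) triples.
lab_a = {
    ("ex:run8", "ex:runs", "ex:workflow6"),
    ("ex:run8", "ex:launchedBy", "ex:person4"),
}
lab_b = {
    ("ex:person4", "ex:belongsTo", "ex:orgHY7"),
}

# Integration is plain set union: the shared URI ex:person4 joins the graphs.
graph = lab_a | lab_b


def reachable(triples, start):
    """Every node reachable from `start` by following edges forwards."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        if node in seen:
            continue
        seen.add(node)
        frontier.extend(o for s, _, o in triples if s == node)
    return seen


print(sorted(reachable(graph, "ex:run8")))
```

Before the union, the organization is not reachable from the run; afterwards it is, with no schema mapping needed beyond agreeing on the URI.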

  14. Workflow Run [Figure: RDF graph of a single workflow run. Nodes: urn:lsid:…:workflow:6, urn:lsid:…:org:HY7, urn:lsid:..:wfInstance:8, urn:lsid:…:person:4, urn:lsid:…:processRun:84, urn:lsid:…:processRun:51. Edge labels: runs, belongsTo, launchedBy, executed (twice).]

  15. Schema • Ontology • RDF schema • Taxonomic inferences • also available as OWL • opens it up to complex reasoning
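
A taxonomic inference is the kind of conclusion a subclass hierarchy licenses: an instance of a class is also an instance of every superclass. The sketch below computes that closure by hand. `ProcessRun` appears in the provenance ontology on the slides, but `FailedProcessRun`, `Activity` and the subclass links between them are hypothetical, added only to have something to infer.

```python
# Hypothetical subclass links (child -> direct parents).
SUBCLASS_OF = {
    "FailedProcessRun": {"ProcessRun"},
    "ProcessRun": {"Activity"},
}


def superclasses(cls, subclass_of):
    """All direct and indirect superclasses of `cls`."""
    found, stack = set(), [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def all_types(asserted_type, subclass_of):
    """The asserted type plus every type that follows from the taxonomy."""
    return {asserted_type} | superclasses(asserted_type, subclass_of)


print(sorted(all_types("FailedProcessRun", SUBCLASS_OF)))
```

A query for all `ProcessRun` instances would therefore also return nodes that were only ever asserted to be `FailedProcessRun`.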

  16. Typed Workflow Run [Figure: the workflow-run graph of slide 14 typed against the Provenance Ontology. Ontology classes: WorkflowRun, Workflow, ProcessRun, Experimenter, Organization. Properties: launchedBy, executed, belongsTo, runs. Instance nodes: urn:lsid:…:workflow:6, urn:lsid:…:org:HY7, urn:lsid:..:wfInstance:8, urn:lsid:…:person:4, urn:lsid:…:processRun:84, urn:lsid:…:processRun:51.]

  17. Storage • Named RDF graphs • retrieve whole graphs (eg workflows) • implementation in • NG4J (Jena + MySQL) • scalability issues • Sesame2 native store • scalable • Java 5
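
What naming a graph buys you is that a whole graph can be fetched or located by name. The sketch below is an in-memory toy, not NG4J or Sesame; `NamedGraphStore` and the graph names are hypothetical.

```python
from collections import defaultdict


class NamedGraphStore:
    """Toy store: each graph name maps to its own set of triples."""

    def __init__(self):
        self._graphs = defaultdict(set)

    def add(self, graph_name, s, p, o):
        self._graphs[graph_name].add((s, p, o))

    def graph(self, graph_name):
        """Retrieve one whole graph, for example everything about one run."""
        return set(self._graphs.get(graph_name, ()))

    def graphs_matching(self, s=None, p=None, o=None):
        """Names of graphs containing a triple that fits the pattern."""
        def fits(triple):
            return all(want is None or want == got
                       for want, got in zip((s, p, o), triple))
        return {name for name, triples in self._graphs.items()
                if any(fits(t) for t in triples)}


store = NamedGraphStore()
store.add("graph:run8", "run:8", "runs", "workflow:6")
store.add("graph:run8", "run:8", "executed", "processRun:84")
store.add("graph:run9", "run:9", "runs", "workflow:6")

print(store.graph("graph:run8"))
print(store.graphs_matching(p="runs", o="workflow:6"))
```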

  18. Query • RDF query languages • TriQL, SeRQL, SPARQL • query languages for named RDF graphs • Ontology inspection/reasoning • Canned Queries • workflows with failed processes • input/output of past process runs • workflows with data changed by user
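
The first canned query, workflows with failed processes, amounts to a three-step join over the triples. The sketch below writes that join in plain Python rather than TriQL, SeRQL or SPARQL. The `status` predicate and its `failed` value are hypothetical, as are the short identifiers; `runs` and `executed` follow the property names on the slides with an assumed direction.

```python
def workflows_with_failed_processes(triples):
    """Workflows that have a run which executed a failed process run."""
    failed = {s for s, p, o in triples if p == "status" and o == "failed"}
    bad_runs = {s for s, p, o in triples if p == "executed" and o in failed}
    return sorted(o for s, p, o in triples if p == "runs" and s in bad_runs)


triples = {
    ("run:8", "runs", "workflow:6"),
    ("run:8", "executed", "processRun:84"),
    ("run:8", "executed", "processRun:51"),
    ("processRun:84", "status", "completed"),
    ("processRun:51", "status", "failed"),
    ("run:9", "runs", "workflow:7"),
    ("run:9", "executed", "processRun:90"),
    ("processRun:90", "status", "completed"),
}

print(workflows_with_failed_processes(triples))
```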

  19. Browsing

  20. Provenance Browsing • Provenance Browser Plugin • reusing Taverna GUI components • Matthew Gamble

  21. Analysis

  22. Provenance Analysis • Comparison • Aggregation • etc • see work by Jun Zhao

  23. Security

  24. User sends LSID ref and credentials to the Access Point • Access Point returns data and metadata or denies access as follows: • credentials are passed to a User Directory • User Directory passes the corresponding user to the Authorization Authority • Authorization Authority returns the user attributes in the form of a (possibly signed) SAML assertion • this assertion, together with the LSID and its corresponding metadata, is passed to the Policy Enforcement Point (PEP) • PEP uses these three inputs to form an XACML request that is passed to a Policy Decision Point (PDP) that is preloaded with an XACML Policy Set • PDP evaluates the request against its policy set and returns an XACML response to the PEP • PEP decodes the response and either allows data/metadata to be returned to the user or denies access.
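
The sequence of hand-offs on this slide can be traced in a few lines. This sketch models only the control flow: plain dicts and callables stand in for the User Directory, the Authorization Authority and the PDP, a small dataclass stands in for the XACML request, and no real SAML or XACML is produced. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Stand-in for the XACML request the PEP builds from its three inputs."""
    subject: dict    # user attributes (the SAML assertion in the real flow)
    lsid: str
    metadata: dict


def access_point(lsid, credentials, *, user_directory, authz_authority,
                 metadata_store, data_store, pdp):
    """Return (data, metadata) if access is permitted, else None."""
    user = user_directory(credentials)            # credentials -> user
    metadata = metadata_store.get(lsid)
    if user is None or metadata is None:
        return None
    attributes = authz_authority(user)            # user -> attributes
    request = Request(attributes, lsid, metadata)  # PEP forms the request
    if pdp(request) != "Permit":                  # PDP evaluates its policy
        return None
    return data_store[lsid], metadata             # PEP releases the data


WORKFLOW = "urn:lsid:example.org:workflow:6"
users = {"secret-token": "alice"}
attrs = {"alice": {"role": "supervisor"}}
metadata_store = {WORKFLOW: {"owner": "bob"}}
data_store = {WORKFLOW: b"<workflow/>"}


def supervisors_only(request):
    return "Permit" if request.subject.get("role") == "supervisor" else "Deny"


common = dict(user_directory=users.get,
              authz_authority=lambda user: attrs.get(user, {}),
              metadata_store=metadata_store, data_store=data_store,
              pdp=supervisors_only)
granted = access_point(WORKFLOW, "secret-token", **common)
refused = access_point(WORKFLOW, "wrong-token", **common)
print(granted, refused)
```

Passing the components in as arguments keeps the enforcement logic separate from the decision logic, which mirrors the PEP/PDP split.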

  25. myGrid XACML Policy • Scenario • supervisors can access all workflows in the organization • students can access only their own workflows • blacklisted users cannot access anything • See policySet.xml on myGrid wiki
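
The three rules of the scenario can be written out as a decision function. This is only a sketch of the logic, not a rendering of policySet.xml: the attribute names (`role`, `org`, `id`, `owner`, `blacklisted`) are hypothetical, and the fallback to "Deny" when no rule matches is an assumption.

```python
def decide(subject, workflow):
    """Return 'Permit' or 'Deny' for a subject requesting a workflow."""
    if subject.get("blacklisted"):
        return "Deny"                  # blacklisted users get nothing
    role = subject.get("role")
    if role == "supervisor" and subject.get("org") == workflow.get("org"):
        return "Permit"                # any workflow in the organization
    if role == "student" and subject.get("id") == workflow.get("owner"):
        return "Permit"                # only the student's own workflows
    return "Deny"                      # assumed default


workflow = {"owner": "sam", "org": "HY7"}
supervisor = {"id": "pat", "role": "supervisor", "org": "HY7"}
owner = {"id": "sam", "role": "student", "org": "HY7"}
other_student = {"id": "kim", "role": "student", "org": "HY7"}
banned = {"id": "pat", "role": "supervisor", "org": "HY7", "blacklisted": True}

for who in (supervisor, owner, other_student, banned):
    print(who["id"], who["role"], decide(who, workflow))
```

Checking the blacklist first means it overrides the role rules.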
