myGrid/Taverna Provenance
Daniele Turi, University of Manchester
OMII f2f Meeting, London, 19-20/4/06
Components
• Identifiers: LSIDs
• Data: JDBC data store
• Metadata: RDF Provenance Plugin
• Browsing: Provenance Browser Plugin
• Security: under development
LSID: Life Science Identifier
• URN specification in progress
• 5-part identifier (plus an optional version id): urn:lsid:authority:namespace:object[:version], e.g.
  • urn:lsid:www.mygrid.org.uk:lsdocument:X1234
  • urn:lsid:ncbi.nlm.nih.gov.lsid.biopathways.org:genbank_gi:7717376
• protocol for retrieving data and metadata about an object
• commitment by the provider to always return the same data for a given ID
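To make the five-part structure concrete, here is a minimal Python sketch that splits an LSID string into authority, namespace, object id and optional version, and rebuilds it. The class and function names (`LSID`, `parse_lsid`) are hypothetical; this is not an official LSID client library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LSID:
    """The parts of urn:lsid:<authority>:<namespace>:<object>[:<version>]."""
    authority: str
    namespace: str
    object_id: str
    version: Optional[str] = None

    def __str__(self) -> str:
        parts = ["urn", "lsid", self.authority, self.namespace, self.object_id]
        if self.version is not None:
            parts.append(self.version)
        return ":".join(parts)


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its parts, rejecting malformed input."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts, got {len(parts)}")
    # The prefix is compared case-insensitively here (a choice made for this sketch).
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("an LSID must start with 'urn:lsid:'")
    if not all(parts[2:]):
        raise ValueError("LSID components must be non-empty")
    version = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], version)
```

For example, `parse_lsid("urn:lsid:www.mygrid.org.uk:lsdocument:X1234")` yields authority `www.mygrid.org.uk`, namespace `lsdocument` and object id `X1234`, with no version. Note that the authority may itself contain dots, as in the NCBI/BioPathways example, but never colons.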
LSID (ctd)
• Issuing: LSID Authorities
• Resolution: LSID Resolvers
• Examples of use
  • myGrid
  • Long Term Ecological Research Network
  • BioPathways Consortium
LSID (ctd 2)
• an abstraction
  • lightweight
  • independent of the actual storage implementation (database, file system, application)
• suitable for both private and public data sources
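The storage independence can be illustrated with a small Python sketch: a resolver looks only at an LSID's authority and delegates to whatever backend serves it, whether a dict standing in for a database or a function standing in for an application that computes data on demand. Everything here (`Backend`, `DictBackend`, `ComputedBackend`, `Resolver`) is hypothetical; real LSID resolution locates authorities over the network as defined in the LSID specification, which this in-process registry deliberately skips.

```python
from typing import Callable, Dict, Protocol, Tuple


class Backend(Protocol):
    """Hypothetical storage interface; the resolver never sees what is behind it."""
    def get_data(self, namespace: str, object_id: str) -> bytes: ...
    def get_metadata(self, namespace: str, object_id: str) -> str: ...


class DictBackend:
    """Stands in for a database or file system."""

    def __init__(self) -> None:
        self._items: Dict[Tuple[str, str], Tuple[bytes, str]] = {}

    def issue(self, namespace: str, object_id: str, data: bytes, metadata: str) -> None:
        key = (namespace, object_id)
        if key in self._items:
            # The provider commits to always returning the same data for an ID.
            raise ValueError(f"{namespace}:{object_id} has already been issued")
        self._items[key] = (data, metadata)

    def get_data(self, namespace: str, object_id: str) -> bytes:
        return self._items[(namespace, object_id)][0]

    def get_metadata(self, namespace: str, object_id: str) -> str:
        return self._items[(namespace, object_id)][1]


class ComputedBackend:
    """Stands in for an application that derives the data on request."""

    def __init__(self, compute: Callable[[str, str], bytes]) -> None:
        self._compute = compute

    def get_data(self, namespace: str, object_id: str) -> bytes:
        return self._compute(namespace, object_id)

    def get_metadata(self, namespace: str, object_id: str) -> str:
        return f"computed {namespace} object {object_id}"


class Resolver:
    """Maps an LSID's authority to whichever backend serves it."""

    def __init__(self) -> None:
        self._authorities: Dict[str, Backend] = {}

    def register(self, authority: str, backend: Backend) -> None:
        self._authorities[authority] = backend

    def _locate(self, lsid: str) -> Tuple[Backend, str, str]:
        parts = lsid.split(":")
        if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
            raise ValueError(f"not an LSID: {lsid}")
        # The optional version part is ignored in this sketch.
        authority, namespace, object_id = parts[2], parts[3], parts[4]
        if authority not in self._authorities:
            raise LookupError(f"no backend registered for authority {authority}")
        return self._authorities[authority], namespace, object_id

    def resolve(self, lsid: str) -> bytes:
        backend, namespace, object_id = self._locate(lsid)
        return backend.get_data(namespace, object_id)

    def resolve_metadata(self, lsid: str) -> str:
        backend, namespace, object_id = self._locate(lsid)
        return backend.get_metadata(namespace, object_id)
```

Swapping a `DictBackend` for a `ComputedBackend` changes nothing for the caller, which is the point of keeping the identifier separate from the storage.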
Data Storage (current)
• Taverna can persist inputs, outputs and intermediate results in an SQL database via JDBC
• This is optional and is enabled by configuring a Baclava Data Store
• It allows the LSIDs of data items to be resolved to the actual data
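As a rough illustration of LSID-keyed persistence, the Python sketch below uses the standard `sqlite3` module in place of a JDBC database: each stored item gets a fresh LSID, and resolving that LSID returns the stored bytes. The table name, columns, role values and `AUTHORITY` are hypothetical and do not reproduce the Baclava Data Store schema.

```python
import sqlite3
import uuid

# Hypothetical authority; a real deployment would use its own LSID authority.
AUTHORITY = "www.example.org"


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a single table keyed by LSID."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data_item ("
        " lsid TEXT PRIMARY KEY,"
        " role TEXT NOT NULL CHECK (role IN ('input', 'output', 'intermediate')),"
        " value BLOB NOT NULL)"
    )
    return conn


def store_item(conn: sqlite3.Connection, role: str, value: bytes) -> str:
    """Persist one data item and return the fresh LSID that names it."""
    lsid = f"urn:lsid:{AUTHORITY}:dataitem:{uuid.uuid4().hex}"
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO data_item (lsid, role, value) VALUES (?, ?, ?)",
            (lsid, role, value),
        )
    return lsid


def resolve(conn: sqlite3.Connection, lsid: str) -> bytes:
    """Resolve an LSID back to the stored bytes."""
    row = conn.execute("SELECT value FROM data_item WHERE lsid = ?", (lsid,)).fetchone()
    if row is None:
        raise KeyError(lsid)
    return row[0]
```

Because the LSID is the primary key, resolution is a single indexed lookup, and the same LSID can appear in provenance metadata without copying the data itself.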
Data Storage (future)
• Domain-specific databases
  • including databases in use outside myGrid
• To develop:
  • a Taverna processor for JDBC/OGSA-DAI
  • an associated interface (cf. BioMart)
• Users will be able to study the contents of an existing database and:
  • write queries that extract data from the database, where the query may be parameterised with values passed in from the workflow;
  • write requests that insert data from the workflow into a named table in the database.
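The two planned operations, parameterised extraction and insertion into a named table, might look like this in a minimal Python sketch using `sqlite3` as a stand-in database. The function names and the `genes` table in the usage note are hypothetical. One design point worth noting: values from the workflow can be bound as SQL parameters, but a table name cannot, so the sketch validates identifiers instead of interpolating them blindly.

```python
import re
import sqlite3
from typing import Any, Dict, Iterable, List, Tuple

# Table and column names cannot be bound as SQL parameters, so they are checked instead.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check_identifier(name: str) -> str:
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return name


def run_query(conn: sqlite3.Connection, sql: str, params: Dict[str, Any]) -> List[Tuple]:
    """Run a user-written query; :name placeholders are bound to workflow values."""
    return conn.execute(sql, params).fetchall()


def insert_rows(conn: sqlite3.Connection, table: str, rows: Iterable[Dict[str, Any]]) -> int:
    """Insert workflow results into a named table; returns the number of rows written."""
    _check_identifier(table)
    count = 0
    with conn:  # one transaction for the whole batch
        for row in rows:
            columns = [_check_identifier(c) for c in row]
            sql = "INSERT INTO {} ({}) VALUES ({})".format(
                table, ", ".join(columns), ", ".join("?" * len(columns))
            )
            conn.execute(sql, [row[c] for c in columns])
            count += 1
    return count
```

For example, after `insert_rows(conn, "genes", [{"symbol": "BRCA1", "organism": "human"}])`, a workflow could call `run_query(conn, "SELECT symbol FROM genes WHERE organism = :org", {"org": "human"})`.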
Metadata Generation
• Taverna Provenance Plugin
  • listens to Taverna events (WorkflowEventListener)
  • faithfully records them as ontological instance data
  • as RDF graphs, one for each Taverna run
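The listener pattern can be sketched in Python: each event callback appends triples to the graph of the run it belongs to, so every run ends up with its own graph. The callback names (`workflow_started`, `process_completed`, `process_failed`) and the predicates (`processor`, `output`, `status`) are hypothetical illustrations, not the actual methods of Taverna's WorkflowEventListener or the terms of the myGrid provenance ontology.

```python
import itertools
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


class ProvenanceListener:
    """Hypothetical listener: turns workflow events into per-run lists of triples."""

    def __init__(self) -> None:
        # One graph (here simply a list of triples) per workflow run.
        self.graphs: Dict[str, List[Triple]] = defaultdict(list)
        self._process_counter = itertools.count(1)

    def workflow_started(self, run: str, workflow: str, user: str) -> None:
        self.graphs[run] += [
            (run, "rdf:type", "WorkflowRun"),
            (run, "runs", workflow),
            (run, "launchedBy", user),
        ]

    def _new_process_run(self, run: str, processor: str) -> str:
        process_run = f"{run}/processRun:{next(self._process_counter)}"
        self.graphs[run] += [
            (run, "executed", process_run),
            (process_run, "rdf:type", "ProcessRun"),
            (process_run, "processor", processor),
        ]
        return process_run

    def process_completed(self, run: str, processor: str, outputs: List[str]) -> str:
        process_run = self._new_process_run(run, processor)
        self.graphs[run] += [(process_run, "status", "completed")]
        self.graphs[run] += [(process_run, "output", item) for item in outputs]
        return process_run

    def process_failed(self, run: str, processor: str) -> str:
        process_run = self._new_process_run(run, processor)
        self.graphs[run] += [(process_run, "status", "failed")]
        return process_run
```

Recording the raw events faithfully, rather than summarising them, keeps later queries (such as finding failed processes) possible without re-running anything.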
Metadata
• Representation
• Ontology (Schema)
• Storage
• Query
• Browsing
Representation
• RDF
  • triples: subject–predicate–object
  • URIs as identifiers (hence easy data integration)
  • a Semantic Web language with an XML serialization
  • flexible and powerful
  • sets of triples give rise to graphs
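Workflow Run (example RDF graph)
• urn:lsid:..:wfInstance:8 runs urn:lsid:…:workflow:6
• urn:lsid:..:wfInstance:8 launchedBy urn:lsid:…:person:4
• urn:lsid:…:person:4 belongsTo urn:lsid:…:org:HY7
• urn:lsid:..:wfInstance:8 executed urn:lsid:…:processRun:84
• urn:lsid:..:wfInstance:8 executed urn:lsid:…:processRun:51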
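A short Python sketch shows how a handful of triples becomes a graph that can be walked. It stores the Workflow Run example as (subject, predicate, object) tuples and indexes them by subject. The identifiers are abbreviated because the slide elides the authority part of each LSID, and the edge directions (for instance, that the person belongsTo the organization) are a reading of the diagram, not something the slide states explicitly.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# The Workflow Run example; identifiers abbreviated, edge directions assumed.
TRIPLES: List[Triple] = [
    ("wfInstance:8", "runs", "workflow:6"),
    ("wfInstance:8", "launchedBy", "person:4"),
    ("person:4", "belongsTo", "org:HY7"),
    ("wfInstance:8", "executed", "processRun:84"),
    ("wfInstance:8", "executed", "processRun:51"),
]


def build_graph(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Index triples as a directed, edge-labelled graph: subject -> [(predicate, object)]."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return graph


def objects(graph: Dict[str, List[Tuple[str, str]]], subject: str, predicate: str) -> List[str]:
    """All nodes reachable from `subject` along one edge labelled `predicate`."""
    return [o for p, o in graph.get(subject, []) if p == predicate]


def follow(graph: Dict[str, List[Tuple[str, str]]], start: str, path: List[str]) -> List[str]:
    """Follow a chain of predicates, e.g. launchedBy then belongsTo."""
    nodes = [start]
    for predicate in path:
        nodes = [o for n in nodes for o in objects(graph, n, predicate)]
    return nodes
```

With this index, `follow(build_graph(TRIPLES), "wfInstance:8", ["launchedBy", "belongsTo"])` answers "which organization launched this run?" by chaining two edges, something no single triple records.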
Schema
• Ontology
  • RDF Schema
    • supports taxonomic inferences
  • also available as OWL
    • opens it up to more complex reasoning
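"Taxonomic inferences" in RDF Schema mostly come from two entailment rules: subclass transitivity (rdfs11) and propagating an instance's type up to its superclasses (rdfs9). The Python sketch below applies both until nothing new is derived. The class hierarchy in the usage note (Experimenter under Person under Agent) is hypothetical and is not taken from the myGrid ontology.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples: Set[Triple]) -> Set[Triple]:
    """Apply two RDFS entailment rules until a fixpoint is reached:
    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A),       (A subClassOf B) => (x type B)
    """
    inferred = set(triples)
    while True:
        sub = {(s, o) for s, p, o in inferred if p == SUBCLASS}
        new: Set[Triple] = set()
        for a, b in sub:
            for b2, c in sub:
                if b == b2:
                    new.add((a, SUBCLASS, c))
        for x, p, a in inferred:
            if p == TYPE:
                for a2, b in sub:
                    if a == a2:
                        new.add((x, TYPE, b))
        if new <= inferred:  # nothing new: fixpoint reached
            return inferred
        inferred |= new
```

Given `("Experimenter", SUBCLASS, "Person")`, `("Person", SUBCLASS, "Agent")` and `("person:4", TYPE, "Experimenter")`, the closure also contains `("person:4", TYPE, "Agent")`, so a query for all agents finds person:4 without that fact ever being recorded. OWL reasoners go well beyond these two rules.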
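Typed Workflow Run (the Workflow Run graph, typed with classes from the Provenance Ontology)
• Classes: WorkflowRun, Workflow, ProcessRun, Experimenter, Organization
• Properties: runs, launchedBy, belongsTo, executed
• urn:lsid:..:wfInstance:8 (a WorkflowRun) runs urn:lsid:…:workflow:6 (a Workflow)
• it was launchedBy urn:lsid:…:person:4 (an Experimenter), who belongsTo urn:lsid:…:org:HY7 (an Organization)
• it executed urn:lsid:…:processRun:84 and urn:lsid:…:processRun:51 (both ProcessRuns)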
Storage
• Named RDF graphs
  • whole graphs (e.g. workflows) can be retrieved at once
• Implementations
  • NG4J (Jena + MySQL): scalability issues
  • Sesame2 native store: scalable; Java 5
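The named-graph idea amounts to storing quads instead of triples: every triple carries the name of the graph it belongs to, so a whole run can be fetched or dropped in one call. The Python class below is a hypothetical in-memory toy to show that structure; it does not mirror the APIs of NG4J or Sesame2, which provide persistent stores.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]  # (graph, subject, predicate, object)


class NamedGraphStore:
    """Toy in-memory quad store keyed by graph name."""

    def __init__(self) -> None:
        self._graphs: Dict[str, Set[Triple]] = defaultdict(set)

    def add(self, graph: str, s: str, p: str, o: str) -> None:
        self._graphs[graph].add((s, p, o))

    def graph(self, name: str) -> Set[Triple]:
        """Retrieve a whole graph (e.g. one workflow run) as a copy."""
        return set(self._graphs.get(name, set()))

    def graph_names(self) -> List[str]:
        return sorted(self._graphs)

    def remove_graph(self, name: str) -> None:
        self._graphs.pop(name, None)

    def quads(self) -> Iterator[Quad]:
        for name, triples in self._graphs.items():
            for s, p, o in triples:
                yield (name, s, p, o)
```

Keeping the graph name as the fourth element means per-run retrieval needs no query at all, while cross-run queries can still scan `quads()`.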
Query
• RDF query languages
  • TriQL, SeRQL, SPARQL
  • query languages for named RDF graphs
• Ontology inspection/reasoning
• Canned queries
  • workflows with failed processes
  • inputs/outputs of past process runs
  • workflows with data changed by the user
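Two of the canned queries can be sketched in Python over (graph, subject, predicate, object) quads, one graph per run. In SPARQL the same join would be written with a `GRAPH ?g { ... }` pattern. The predicates `executed`, `processor`, `status`, `input` and `output` and the value `"failed"` are hypothetical vocabulary chosen for this sketch, not the terms of the myGrid ontology.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Quad = Tuple[str, str, str, str]  # (graph, subject, predicate, object)


def runs_with_failed_processes(quads: Iterable[Quad]) -> List[str]:
    """Workflow runs that executed at least one process run marked as failed."""
    quads = list(quads)
    failed = {(g, s) for g, s, p, o in quads if p == "status" and o == "failed"}
    # Join within the same named graph: run --executed--> failed process run.
    return sorted({s for g, s, p, o in quads if p == "executed" and (g, o) in failed})


def io_of_process_runs(quads: Iterable[Quad], processor: str) -> Dict[str, Dict[str, List[str]]]:
    """Inputs and outputs recorded for every past run of one processor."""
    quads = list(quads)
    runs = {(g, s) for g, s, p, o in quads if p == "processor" and o == processor}
    result: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: {"input": [], "output": []})
    for g, s, p, o in quads:
        if (g, s) in runs and p in ("input", "output"):
            result[s][p].append(o)
    # Process runs with no recorded I/O still appear with empty lists.
    for _, s in runs:
        result[s]
    return dict(result)
```

Joining on the graph name as well as the node keeps facts from different runs from being mixed up, which is the practical benefit of named graphs for provenance.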
Provenance Browsing
• Provenance Browser Plugin
  • reuses Taverna GUI components
  • Matthew Gamble
Provenance Analysis
• Comparison
• Aggregation
• etc.
• see work by Jun Zhao
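As one very simple form of comparison, two runs' graphs can be diffed as sets of triples once each run's own identifier is replaced by a shared placeholder, so that runs of the same workflow line up. This Python sketch is an illustration chosen here, not the comparison method from Jun Zhao's work; it also ignores the harder problem of matching process-run identifiers across runs.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def normalise(triples: Iterable[Triple], run_id: str) -> Set[Triple]:
    """Replace the run's own identifier so that two runs become comparable."""
    return {tuple("?run" if term == run_id else term for term in t) for t in triples}


def compare_runs(run_a: str, triples_a: Iterable[Triple],
                 run_b: str, triples_b: Iterable[Triple]) -> Dict[str, Set[Triple]]:
    """Split the two runs' statements into shared and run-specific parts."""
    a = normalise(triples_a, run_a)
    b = normalise(triples_b, run_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}
```

Here two runs of workflow:6 that produced the same output but were launched by different people would share the `runs` and `output` statements and differ only in `launchedBy`.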
Security (access control)
• The user sends an LSID reference and credentials to the Access Point
• The Access Point returns the data and metadata, or denies access, as follows:
  • the credentials are passed to a User Directory
  • the User Directory passes the corresponding user to the Authorization Authority
  • the Authorization Authority returns the user's attributes in the form of a (possibly signed) SAML assertion
  • this assertion, together with the LSID and its corresponding metadata, is passed to the Policy Enforcement Point (PEP)
  • the PEP uses these three inputs to form an XACML request, which it passes to a Policy Decision Point (PDP) preloaded with an XACML Policy Set
  • the PDP evaluates the request against its policy set and returns an XACML response to the PEP
  • the PEP decodes the response and either allows the data/metadata to be returned to the user or denies access
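The message flow can be traced in a compact Python sketch where each component is reduced to a lookup or a function: dicts stand in for the User Directory, the Authorization Authority and the data store, a plain dict of attributes stands in for the SAML assertion, and a small dataclass stands in for the XACML request. All names and data shapes are hypothetical; no SAML or XACML library is used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Request:
    """Plain stand-in for an XACML request: subject attributes, resource and action."""
    subject: Dict[str, str]
    lsid: str
    metadata: Dict[str, str]
    action: str = "read"


class AccessPoint:
    def __init__(
        self,
        user_directory: Dict[str, str],                       # credentials -> user
        authorization_authority: Dict[str, Dict[str, str]],  # user -> attributes
        store: Dict[str, Tuple[bytes, Dict[str, str]]],       # LSID -> (data, metadata)
        pdp: Callable[[Request], str],                        # request -> decision
    ) -> None:
        self.user_directory = user_directory
        self.authorization_authority = authorization_authority
        self.store = store
        self.pdp = pdp

    def fetch(self, lsid: str, credentials: str) -> Tuple[bytes, Dict[str, str]]:
        user = self.user_directory.get(credentials)
        if user is None:
            raise PermissionError("unknown credentials")
        # In the described design this would be a (possibly signed) SAML assertion.
        attributes = self.authorization_authority[user]
        data, metadata = self.store[lsid]
        # PEP: build a request from the three inputs and ask the PDP for a decision.
        decision = self.pdp(Request(attributes, lsid, metadata))
        # PEP: anything other than an explicit Permit is treated as a denial.
        if decision != "Permit":
            raise PermissionError(f"access to {lsid} denied ({decision})")
        return data, metadata
```

Treating every non-Permit answer as a denial is a deliberately conservative choice for the enforcement point; the PDP itself only decides, it never touches the data.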
myGrid XACML Policy
• Scenario
  • supervisors can access all workflows in their organization
  • students can access only their own workflows
  • blacklisted users cannot access anything
• See policySet.xml on the myGrid wiki
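The scenario's three rules can be sketched in Python and combined with a deny-overrides strategy, the XACML combining algorithm under which a single Deny (here, the blacklist) wins over any Permit. The attribute names (`role`, `org`, `id`, `owner`, `blacklisted`) are hypothetical, the real rules live in policySet.xml, and real XACML also has an Indeterminate outcome that this sketch leaves out.

```python
from typing import Callable, Dict, List

Attributes = Dict[str, str]
Rule = Callable[[Attributes, Attributes], str]  # returns "Permit", "Deny" or "NotApplicable"


def blacklist_rule(subject: Attributes, resource: Attributes) -> str:
    return "Deny" if subject.get("blacklisted") == "true" else "NotApplicable"


def supervisor_rule(subject: Attributes, resource: Attributes) -> str:
    # Supervisors may access every workflow belonging to their own organization.
    if subject.get("role") == "supervisor" and subject.get("org") == resource.get("org"):
        return "Permit"
    return "NotApplicable"


def student_rule(subject: Attributes, resource: Attributes) -> str:
    # Students may access only workflows they own.
    if subject.get("role") == "student" and subject.get("id") == resource.get("owner"):
        return "Permit"
    return "NotApplicable"


def deny_overrides(rules: List[Rule], subject: Attributes, resource: Attributes) -> str:
    """Any Deny wins; otherwise any Permit wins; otherwise NotApplicable."""
    decisions = [rule(subject, resource) for rule in rules]
    if "Deny" in decisions:
        return "Deny"
    if "Permit" in decisions:
        return "Permit"
    return "NotApplicable"


POLICY: List[Rule] = [blacklist_rule, supervisor_rule, student_rule]
```

A student asking for someone else's workflow gets NotApplicable rather than Deny; an enforcement point that only releases data on an explicit Permit still refuses access in that case.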