Metadata Management in the Taverna Workflow System. Khalid Belhajjame1, Katy Wolstencroft1, Franck Tanoh1, Alan Williams1, Tom Oinn2 and Carole Goble1. 1University of Manchester, Manchester, UK; 2EMBL European Bioinformatics Institute, Hinxton, UK.
Outline • Taverna • Metadata for Describing Workflow Entities • Metadata for Describing Workflow Provenance • Metadata Curation • Applications
Taverna • Taverna allows scientists to access analysis tools hosted on a variety of platforms in a unified fashion • It makes these tools interoperate through a dataflow model • It interoperates between: • Different scales – local machine, web service, grid • Different implementations – EGEE, Naregi, etc. • It hides the mechanics from the end user where possible
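To make "interoperating tools through a dataflow model" concrete, here is a minimal Python sketch, assuming a toy representation in which each processor is a plain function with named input ports, each processor has a single output, and data links say where each port's value comes from. A processor fires as soon as all of its inputs are available. `Processor`, `run_dataflow` and the example processors are hypothetical; this is not Taverna's enactment engine or API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Processor:
    """A hypothetical workflow step: a named function with named input ports."""
    name: str
    inputs: List[str]
    fn: Callable[..., object]


def run_dataflow(processors: List[Processor],
                 links: Dict[Tuple[str, str], str],
                 workflow_inputs: Dict[str, object]) -> Dict[str, object]:
    """Fire each processor as soon as all of its inputs are available.

    `links` maps (processor name, input port) to the source of that port's
    value: either a workflow input name or the name of an upstream processor.
    """
    values: Dict[str, object] = dict(workflow_inputs)
    pending = list(processors)
    while pending:
        ready = [p for p in pending
                 if all(links[(p.name, port)] in values for port in p.inputs)]
        if not ready:
            raise ValueError("cycle or unconnected input in the dataflow graph")
        for p in ready:
            args = {port: values[links[(p.name, port)]] for port in p.inputs}
            values[p.name] = p.fn(**args)  # single output, stored under the processor's name
            pending.remove(p)
    return values


# Example: "length" depends on the output of "upper", so it runs second,
# even though it is listed first.
procs = [
    Processor("length", ["seq"], lambda seq: len(seq)),
    Processor("upper", ["seq"], lambda seq: seq.upper()),
]
links = {("upper", "seq"): "sequence", ("length", "seq"): "upper"}
result = run_dataflow(procs, links, {"sequence": "acgt"})
```

Execution order comes from data availability rather than from the order in which processors are declared, which is the property that lets a dataflow engine hide scheduling mechanics from the user.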
Dispatch logic • Taverna 2 opens up the per-processor dispatch logic. Dispatch layers can ignore, pass unmodified, block, modify or act on any message, and can communicate with adjacent layers. • Each processor contains a single stack of arbitrarily many dispatch layers. Composing dispatch layers allows complex control flow within a given processor. • DispatchLayer is an extensibility point: use it to implement dynamic binding, caching, recursive behaviour and more. • Figure: a single DispatchLayer implementation, receiving job specification messages from the layer above (Job Queue & Service List, Single Job & Service List) and data and error messages from the layer below (Result Message, Fault Message).
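A minimal Python sketch of the layered dispatch idea, assuming synchronous calls instead of asynchronous messages: a job travels down the stack, its result travels back up as a return value, and faults are modelled as exceptions. Each layer can forward a job unchanged, block it by answering it itself, or act on a fault coming up from below. The classes `DispatchLayer`, `CachingLayer`, `RetryLayer` and `InvokeLayer` are hypothetical and only mirror the concepts on the slide; they are not Taverna's actual API.

```python
from typing import Callable, Dict, Optional


class DispatchLayer:
    """Hypothetical base layer: passes jobs down and results up unmodified."""

    def __init__(self, below: Optional["DispatchLayer"] = None):
        self.below = below

    def receive_job(self, job: str) -> str:
        # Default behaviour: forward the job to the layer below and
        # return whatever result (or fault) comes back up.
        if self.below is None:
            raise NotImplementedError("the bottom layer must override receive_job")
        return self.below.receive_job(job)


class CachingLayer(DispatchLayer):
    """Answers repeated jobs itself, so they never reach the lower layers."""

    def __init__(self, below: DispatchLayer):
        super().__init__(below)
        self.cache: Dict[str, str] = {}

    def receive_job(self, job: str) -> str:
        if job not in self.cache:
            self.cache[job] = super().receive_job(job)
        return self.cache[job]


class RetryLayer(DispatchLayer):
    """Resubmits a job when the layer below reports a fault."""

    def __init__(self, below: DispatchLayer, max_attempts: int = 3):
        super().__init__(below)
        self.max_attempts = max_attempts

    def receive_job(self, job: str) -> str:
        last_fault: Optional[RuntimeError] = None
        for _ in range(self.max_attempts):
            try:
                return super().receive_job(job)
            except RuntimeError as fault:  # faults are modelled as exceptions
                last_fault = fault
        raise RuntimeError(f"{job!r} failed after {self.max_attempts} attempts") from last_fault


class InvokeLayer(DispatchLayer):
    """Bottom of the stack: actually calls the service."""

    def __init__(self, service: Callable[[str], str]):
        super().__init__(None)
        self.service = service
        self.calls = 0  # number of invocation attempts, including failed ones

    def receive_job(self, job: str) -> str:
        self.calls += 1
        return self.service(job)


# Example: a hypothetical service that fails on its first call only.
_attempts = {"n": 0}


def flaky_service(job: str) -> str:
    _attempts["n"] += 1
    if _attempts["n"] == 1:
        raise RuntimeError("transient failure")
    return job.upper()


invoker = InvokeLayer(flaky_service)
stack = CachingLayer(RetryLayer(invoker))  # top layer first
first = stack.receive_job("blast")   # fails once, is retried, returns "BLAST"
second = stack.receive_job("blast")  # answered by the cache, no new call
```

Because each behaviour lives in its own layer, retrying and caching are composed by stacking rather than by editing the service call, which is the kind of extensibility the slide attributes to DispatchLayer.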
Metadata • Metadata: “data that describe other data to enhance its usefulness” (Les Carr, Wendy Hall, Sean Bechhofer and Carole A. Goble: Conceptual linking: ontology-based open hypermedia. WWW 2001: 334-342) • Metadata in Taverna • Metadata for describing workflow entities • What value does a given workflow add? • What task does a given service perform? • Which services can be associated with a processor? • Metadata for describing workflow provenance • How did the execution of a given workflow go? • What are the semantics of a data product? • How many invocations of a given service failed?
Workflow Instance • Process/Workflow • Data products
Process • Extend the provenance model to capture the internal behaviour of processors’ enactments
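The slide does not specify a schema for this extension. One way to picture it is a provenance log that records every internal event of a processor enactment (each invocation attempt and its outcome), not just the final result; a question such as "How many invocations of a given service failed?" then becomes a simple query. The `EnactmentEvent` and `ProvenanceLog` types and the run, processor and service names below are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class EnactmentEvent:
    """One internal step of a processor enactment (hypothetical schema)."""
    workflow_run: str
    processor: str
    service: str
    attempt: int
    outcome: str  # "success" or "failure"


@dataclass
class ProvenanceLog:
    events: List[EnactmentEvent] = field(default_factory=list)

    def record(self, event: EnactmentEvent) -> None:
        self.events.append(event)

    def failed_invocations(self, service: str) -> int:
        """Count failed invocation attempts of a service across all runs."""
        return sum(1 for e in self.events
                   if e.service == service and e.outcome == "failure")

    def attempts_per_processor(self, workflow_run: str) -> Counter:
        """Number of invocation attempts made by each processor in one run."""
        return Counter(e.processor for e in self.events
                       if e.workflow_run == workflow_run)


log = ProvenanceLog()
log.record(EnactmentEvent("run1", "homology_search", "blast", 1, "failure"))
log.record(EnactmentEvent("run1", "homology_search", "blast", 2, "success"))
log.record(EnactmentEvent("run2", "homology_search", "blast", 1, "failure"))
log.record(EnactmentEvent("run2", "go_annotation", "go_lookup", 1, "success"))
```

If only final outputs were recorded, the failed first attempt in `run1` would be invisible; capturing internal events is what makes the failure count answerable.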
Data products • Data: there are four kinds of data entities that a workflow processor can take as input or produce as output: literals, data documents composed of a collection of reference schemes, error documents, and lists of data entities.
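The four kinds of data entities form a small recursive type, since a list may contain any entity, including other lists. A minimal Python sketch, assuming hypothetical field names (the slide does not give any) and assuming that an error document records a failure in place of the expected data; `contains_error` shows how a consumer or a provenance query might detect an error nested anywhere in an output.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Literal:
    """A value passed directly, e.g. a short string."""
    value: str


@dataclass(frozen=True)
class ReferenceScheme:
    """One way of locating the actual data (hypothetical fields)."""
    kind: str      # e.g. "http" or "file"
    location: str  # e.g. a URL or a file path


@dataclass(frozen=True)
class DataDocument:
    """Data held by reference: a collection of reference schemes."""
    references: Tuple[ReferenceScheme, ...]


@dataclass(frozen=True)
class ErrorDocument:
    """Assumed to record a failure in place of the expected data."""
    message: str


@dataclass(frozen=True)
class EntityList:
    """A list whose items are themselves data entities (lists may nest)."""
    items: Tuple["DataEntity", ...]


DataEntity = Union[Literal, DataDocument, ErrorDocument, EntityList]


def contains_error(entity: DataEntity) -> bool:
    """True if the entity is, or (for lists) transitively contains, an error document."""
    if isinstance(entity, ErrorDocument):
        return True
    if isinstance(entity, EntityList):
        return any(contains_error(item) for item in entity.items)
    return False


output = EntityList((
    Literal("P12345"),
    DataDocument((ReferenceScheme("http", "http://example.org/seq.fasta"),)),
    EntityList((ErrorDocument("service timed out"),)),
))
```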
Applications • Workflow artefacts • Guiding the design of workflows • Detecting and resolving mismatches • Workflow discovery • Abstracting workflow specifications • Provenance • Semantics-based querying and browsing of workflow executions • Smart storage
Conclusions • Taverna • New extensions • Metadata • For describing workflow entities • For describing provenance • Applications • Example applications that will benefit from collected metadata
Mismatch Detection in Workflows • Automatic detection of mismatches, and support for retrieving the mappings appropriate for correcting them
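The slide does not say how mismatches are detected. A minimal Python sketch, assuming each processor port is annotated with a concept from a small hypothetical hierarchy: a data link is flagged when the producer's output concept is not subsumed by (equal to or below) the consumer's input concept, and a registry of mappings is then consulted for a possible correction. The concept names, `PARENT`, `MAPPINGS` and `fetch_protein_sequence` are all illustrative; a real system would query an ontology reasoner rather than a parent dictionary.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy ontology: concept -> parent concept (None for a root).
PARENT: Dict[str, Optional[str]] = {
    "BiologicalSequence": None,
    "ProteinSequence": "BiologicalSequence",
    "DNASequence": "BiologicalSequence",
    "UniProtAccession": None,
}

# Hypothetical registry of mappings (e.g. shim services) between concepts.
MAPPINGS: Dict[Tuple[str, str], str] = {
    ("UniProtAccession", "ProteinSequence"): "fetch_protein_sequence",
}


def is_subconcept(concept: str, ancestor: str) -> bool:
    """True if `concept` equals `ancestor` or lies below it in the hierarchy."""
    current: Optional[str] = concept
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT[current]
    return False


def check_link(output_concept: str, input_concept: str) -> Tuple[bool, Optional[str]]:
    """Return (is_mismatch, suggested_mapping) for one data link."""
    if is_subconcept(output_concept, input_concept):
        return False, None
    return True, MAPPINGS.get((output_concept, input_concept))


def find_mismatches(links: List[Tuple[str, str]]) -> List[Tuple[str, str, Optional[str]]]:
    """Report every mismatched (output concept, input concept) link with any known mapping."""
    report = []
    for out_c, in_c in links:
        mismatch, mapping = check_link(out_c, in_c)
        if mismatch:
            report.append((out_c, in_c, mapping))
    return report


report = find_mismatches([
    ("ProteinSequence", "BiologicalSequence"),  # compatible: subconcept
    ("UniProtAccession", "ProteinSequence"),    # mismatch, mapping known
    ("DNASequence", "ProteinSequence"),         # mismatch, no mapping
])
```

A `None` mapping in the report marks a mismatch the user must resolve by hand.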
myExperiment.org • A community social network • A marketplace • A gateway to other publishing environments • A platform for launching workflows • Encapsulated myExperiment Objects • Mindful publication
Metadata for Describing Workflow Entities • Workflow entities • Workflow/subworkflow specifications • Processors that compose workflows • Services that perform the tasks • Encoding descriptions • Free-text description or keywords (tags) • Semantic concepts • Example (figure): the description “This workflow augments protein identification results with GO terms”, annotated with the ontology concepts Protein identification, Homology search and Gene ontology term
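A minimal Python sketch of how the three encodings (free text, tags, semantic concepts) might be stored and used for workflow discovery. The `Annotation` structure, the workflow identifiers, the second catalogue entry and the concept identifiers (e.g. `GeneOntologyTerm`) are hypothetical; concept matching here is exact set containment, with no ontology reasoning.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Annotation:
    """Metadata attached to a workflow entity (hypothetical structure)."""
    entity_id: str
    description: str = ""                            # free-text description
    tags: Set[str] = field(default_factory=set)      # keywords
    concepts: Set[str] = field(default_factory=set)  # ontology concept identifiers


def find_by_concepts(annotations: List[Annotation], required: Set[str]) -> List[str]:
    """Entities annotated with every required concept (exact match, no reasoning)."""
    return [a.entity_id for a in annotations if required <= a.concepts]


def find_by_keyword(annotations: List[Annotation], word: str) -> List[str]:
    """Entities whose tags or description mention `word` (case-insensitive)."""
    w = word.lower()
    return [a.entity_id for a in annotations
            if w in {t.lower() for t in a.tags} or w in a.description.lower()]


catalogue = [
    Annotation("wf-go-augment",
               "This workflow augments protein identification results with GO terms",
               tags={"proteomics", "GO"},
               concepts={"ProteinIdentification", "HomologySearch", "GeneOntologyTerm"}),
    Annotation("wf-blast-only",
               "Runs a homology search against a protein database",
               tags={"blast"},
               concepts={"HomologySearch"}),
]
```

In this toy catalogue, `find_by_keyword(catalogue, "homology")` returns only `wf-blast-only`, because the first workflow's description never uses the word, while `find_by_concepts(catalogue, {"HomologySearch"})` returns both. This illustrates why semantic concepts can complement free text and tags for discovery.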