Taverna Adding science to eScience Tom Oinn, tmo@ebi.ac.uk 6th March 2004
What is Taverna? • A collection of Java APIs, XML and RDF Schema, Languages and Java Applications. • A part of the EPSRC myGrid project. • Collectively aimed at facilitating standard scientific procedures in the eScience domain, especially in workflow systems. • Reproducibility, Data Provenance and Process Comprehension and Dissemination
Organisation • Open source (LGPL) and hosted on sourceforge.net. • Just over a year old as a distinct project. • Growing community of both users and developers. • Coordinated by an ad hoc combination of email, face to face meetings, access grid and beer.
Philosophy • We are trying to build something that works now. • Incorporate new technologies only where they are directly useful. • Assume an open world of services, most of which we do not control directly. • Drive development primarily from user requirements and requests. • Release often, try to build a community.
Availability • Website at http://taverna.sf.net • Developer access by SSH+CVS • Anonymous CVS • Regular binary and source releases particularly for MS Windows allowing a ‘download and run’ distribution • Taverna at beta8, Ouzo (of which more later) at beta1
Taverna API • Acts as an intermediate layer between user level applications and workflow enactors such as FreeFluo. • Includes object models using a standard MVC design for both workflow definitions and data objects within a workflow. • Used by the Taverna Workbench, DataThing viewer, workflow portal etc...
XScufl Workflow Language • SCUFL is the Simple Conceptual Unified Flow Language. • myGrid was originally based on WSFL… • …but no editors were available, and editing even a simple workflow by hand was tedious and error prone. • SCUFL provides a much higher level view of workflows, and is therefore simpler to write by hand.
SCUFL features • Simple – relies upon an inherently connected environment to reduce the quantity of information explicitly stated in the workflow definition. • No port definitions in XScufl • Processor metadata intelligently gathered from underlying sources, e.g. WSDL, Soaplab • Allows optional typing information; can specify as little or as much as is available
Conceptual – one Processor in a SCUFL workflow maps, as far as is possible, to one conceptual operation as viewed by a non-expert user • Wraps up stateful service interactions into custom Processor implementations • Lowers the barrier that prevents experts in other domains, such as bioinformatics, from entering or using eScience
Unified Flow Language – SCUFL does not dictate how the workflow is to be enacted; it is inherently declarative in intent. • Can potentially be translated to other workflow languages. • Can be arbitrarily abstract; any given workflow engine may require further definition of the language before it can be enacted.
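The declarative point can be illustrated with a minimal Java sketch. It is not the Taverna API: the class `WorkflowSketch`, its methods and the processor names are hypothetical. The workflow only states which processors exist and how data links connect them; one possible enactor then derives an execution order from those links.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of Taverna or FreeFluo.
final class WorkflowSketch {
    private final Set<String> processors = new LinkedHashSet<>();
    private final Map<String, Set<String>> downstream = new LinkedHashMap<>();

    // Declare a processor by name; no ports or ordering are stated.
    WorkflowSketch processor(String name) {
        processors.add(name);
        downstream.putIfAbsent(name, new LinkedHashSet<>());
        return this;
    }

    // Declare that data flows from one processor to another.
    WorkflowSketch link(String from, String to) {
        processor(from);
        processor(to);
        downstream.get(from).add(to);
        return this;
    }

    // One way an enactor could derive an order: a topological sort of the links.
    List<String> enactmentOrder() {
        Map<String, Integer> pending = new LinkedHashMap<>();
        for (String p : processors) {
            pending.put(p, 0);
        }
        for (Set<String> targets : downstream.values()) {
            for (String t : targets) {
                pending.merge(t, 1, Integer::sum);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String p = ready.remove();
            order.add(p);
            for (String t : downstream.get(p)) {
                if (pending.merge(t, -1, Integer::sum) == 0) {
                    ready.add(t);
                }
            }
        }
        if (order.size() != processors.size()) {
            throw new IllegalStateException("Workflow contains a cycle");
        }
        return order;
    }
}
```

For example, `new WorkflowSketch().link("fetch", "align").link("align", "render").enactmentOrder()` yields the order fetch, align, render, although the definition itself never stated an order.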
Taverna Workbench • In the first iteration, a demonstrator and test bed for the various view components of the Taverna API. • Now in its eighth release, it has become a powerful and at least partially user friendly tool for building or editing workflows. • In use in the wild, with many known users and probably more who haven’t told us!
Taverna Features • Unsurprisingly, Taverna+FreeFluo can enact workflows. Taverna adds further value to the enactor over and above this basic functionality. • Implicit iteration support • Result browsing and data encapsulation • Provenance recording based on semantic web technologies and LSID • Fault tolerance features
Implicit Iteration • A computer scientist would say that putting a String[] into a String doesn’t work. She would, of course, be correct. • Non computer scientists may take a different view, arguing that if something can process a String then it should just run multiple times on a String[]. • Our users are mostly not computer scientists. • Taverna tries to behave the way the non CS person expects, hiding the magic as it does so.
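The behaviour described above can be sketched in a few lines of Java. This is an illustration under assumptions, not Taverna's implementation: `ImplicitIteration` and `invoke` are hypothetical names, and lists stand in for `String[]` so that nested collections are handled too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper; not part of the Taverna API.
final class ImplicitIteration {
    private ImplicitIteration() {}

    // Applies a single-String operation to either one String or a
    // (possibly nested) list of Strings, preserving the list shape.
    static Object invoke(Function<String, String> op, Object input) {
        if (input instanceof String) {
            return op.apply((String) input);
        }
        if (input instanceof List<?>) {
            List<Object> results = new ArrayList<>();
            for (Object item : (List<?>) input) {
                // Run the operation once per item, recursing into nested lists.
                results.add(invoke(op, item));
            }
            return results;
        }
        throw new IllegalArgumentException("Unsupported input: " + input);
    }
}
```

Calling `ImplicitIteration.invoke(String::toUpperCase, "acgt")` returns a single String, while passing a list of Strings returns a list of the same length: the caller never writes the loop.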
Data Encapsulation • Workflow engines need a limited understanding of their data in order to allow features such as implicit iterators. • They do not, however, require any more than this, and should be otherwise agnostic to the data flowing through the workflow. • Taverna includes a DataThing class, which can be tagged with terms from ontologies, free text descriptions and MIME types, and which may contain arbitrary collection structures.
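As a rough sketch of the idea, the Java class below wraps a payload together with its metadata hints. It is not Taverna's DataThing class: the name `TaggedData` and all of its methods are assumptions made for illustration. The only thing the engine inspects is the collection depth, which is the "limited understanding" needed for implicit iteration.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the idea behind DataThing; not Taverna's class.
final class TaggedData {
    private final Object payload; // a single value or a nested List
    private final Set<String> mimeTypes = new LinkedHashSet<>();
    private final Set<String> ontologyTerms = new LinkedHashSet<>();
    private String description = "";

    TaggedData(Object payload) {
        this.payload = payload;
    }

    TaggedData mimeType(String type) { mimeTypes.add(type); return this; }
    TaggedData term(String term) { ontologyTerms.add(term); return this; }
    TaggedData describe(String text) { description = text; return this; }

    Object payload() { return payload; }
    String description() { return description; }
    Set<String> mimeTypes() { return Collections.unmodifiableSet(mimeTypes); }
    Set<String> ontologyTerms() { return Collections.unmodifiableSet(ontologyTerms); }

    // Collection depth: 0 for a single value, 1 for a list, 2 for a list of lists.
    // An empty list counts as depth 1; only the first element is inspected.
    int depth() {
        return depthOf(payload);
    }

    private static int depthOf(Object value) {
        if (value instanceof List<?>) {
            List<?> list = (List<?>) value;
            return 1 + (list.isEmpty() ? 0 : depthOf(list.get(0)));
        }
        return 0;
    }
}
```

Everything other than `depth()` is opaque to the engine and simply travels with the data for use by viewers and provenance tools.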
Data Types, Result Browsing • Using the metadata hints contained within a DataThing object we can locate and launch pluggable view components. • Hybrid typing scheme allows for a ‘best effort’ approach to data typing. • Required because typing life science data completely is intractable with any reasonable effort.
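A 'best effort' viewer lookup might look like the Java sketch below. The registry, its wildcard convention and the viewer names are assumptions for illustration, not Taverna's plugin mechanism: each MIME hint is tried for an exact match, then for a match on its major type, and a default viewer is used when nothing fits.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry; Taverna's real viewer lookup may differ.
final class ViewerRegistry {
    private final Map<String, String> viewers = new LinkedHashMap<>();
    private final String defaultViewer;

    ViewerRegistry(String defaultViewer) {
        this.defaultViewer = defaultViewer;
    }

    // Register a viewer for an exact type ("image/png") or a major type ("text/*").
    ViewerRegistry register(String mimePattern, String viewerName) {
        viewers.put(mimePattern, viewerName);
        return this;
    }

    // Try each hint in order: exact match first, then the major-type wildcard.
    String select(List<String> mimeHints) {
        for (String hint : mimeHints) {
            String exact = viewers.get(hint);
            if (exact != null) {
                return exact;
            }
            int slash = hint.indexOf('/');
            if (slash > 0) {
                String wildcard = viewers.get(hint.substring(0, slash) + "/*");
                if (wildcard != null) {
                    return wildcard;
                }
            }
        }
        // Nothing matched: fall back rather than fail.
        return defaultViewer;
    }
}
```

With hints such as `chemical/x-pdb` followed by `text/plain`, a registry that only knows `text/*` still finds a usable viewer, which is the point of the hybrid scheme.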
Provenance, RDF, LSID • Providing computational access to services creates new challenges, and workflow technology amplifies them further. • Potentially complex result data in terms of derivation. • Scientists need to be able to show how a given result in these data was arrived at. • Metadata about the results is as important as the result values themselves.
Overall Metadata Infrastructure [Architecture diagram. Labels: Workflow server; Clients; DataThing viewer; Taverna; Web browser; Haystack; LSID Launchpad; Ouzo API (client); Ouzo API (server); LSID Authority; mysql; LSID Authority / Data service]
LSID Launchpad (IBM) Launchpad is an application that sits inside MS Windows and allows links to LSIDs to be resolved as if they were local or normal web page type addresses. This mechanism could be used to allow Taverna to email the user once their workflow completes, the email containing such links which would then allow the user to browse the data and associated metadata from their desktop.
LSID and RDF • LSID provides a uniform naming scheme. • This naming system allows us to make unambiguous statements that may then be reasoned over programmatically. • RDF allows us to extend base relations, e.g. ‘is derived from’, ‘created by’, with domain specific ones, e.g. ‘is predicted structure of’. • These additional metadata are expressed as templates attached to processors in the workflow, and could come from a variety of sources.
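To make the naming scheme concrete: an LSID is a URN of the form `urn:lsid:authority:namespace:object`, with an optional trailing revision. The Java sketch below parses that form and models one provenance statement as a subject, predicate and object. The classes `Lsid` and `ProvenanceStatement`, the `example.org` authority and the identifiers are all hypothetical; this is not the Ouzo or Taverna API, and a real system would use an RDF library rather than a hand-rolled triple.

```java
// Hypothetical value class for an LSID of the form
// urn:lsid:<authority>:<namespace>:<object>[:<revision>].
final class Lsid {
    final String authority;
    final String namespace;
    final String object;
    final String revision; // null when no revision is given

    private Lsid(String authority, String namespace, String object, String revision) {
        this.authority = authority;
        this.namespace = namespace;
        this.object = object;
        this.revision = revision;
    }

    static Lsid parse(String text) {
        String[] parts = text.split(":", -1);
        boolean wellFormed = (parts.length == 5 || parts.length == 6)
                && parts[0].equalsIgnoreCase("urn")
                && parts[1].equalsIgnoreCase("lsid");
        if (!wellFormed) {
            throw new IllegalArgumentException("Not an LSID: " + text);
        }
        return new Lsid(parts[2], parts[3], parts[4], parts.length == 6 ? parts[5] : null);
    }

    @Override
    public String toString() {
        String base = "urn:lsid:" + authority + ":" + namespace + ":" + object;
        return revision == null ? base : base + ":" + revision;
    }
}

// A single subject-predicate-object statement about two named results.
final class ProvenanceStatement {
    final Lsid subject;
    final String predicate;
    final Lsid object;

    ProvenanceStatement(Lsid subject, String predicate, Lsid object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    @Override
    public String toString() {
        return "<" + subject + "> " + predicate + " <" + object + ">";
    }
}
```

Because both ends of the statement are globally unique names, a statement such as "result 42 is derived from input 7" stays unambiguous when it is shared between sites.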
Fault Tolerance • In an open service world, we have no control over the majority of analysis services. • Such services may fail or become inaccessible, or their APIs may change without notice. • Taverna allows configurable failure handling, including dynamically rescheduling processors with alternate implementations.
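The retry-then-alternate policy can be sketched as follows. This is a simplified illustration, not Taverna's scheduler: `FaultTolerantInvoker` and its parameters (`maxRetries`, `initialDelayMs`, `backoff`) are assumed names. Each implementation is retried with a growing delay, and the next alternate is tried only once retries are exhausted.

```java
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical sketch of retry, delay, backoff and alternates.
final class FaultTolerantInvoker {
    private FaultTolerantInvoker() {}

    // Tries each implementation in order. Each one gets 1 + maxRetries attempts,
    // waiting initialDelayMs before the first retry and multiplying the wait
    // by backoff before each later retry.
    static <T> T invoke(List<Callable<T>> implementations,
                        int maxRetries, long initialDelayMs, double backoff) {
        Exception last = null;
        for (Callable<T> implementation : implementations) {
            long delay = initialDelayMs;
            for (int attempt = 0; attempt <= maxRetries; attempt++) {
                try {
                    return implementation.call();
                } catch (Exception e) {
                    last = e;
                    if (attempt < maxRetries) {
                        pause(delay);
                        delay = Math.round(delay * backoff);
                    }
                }
            }
            // Retries exhausted: fall through to the next alternate, if any.
        }
        throw new IllegalStateException("All implementations failed", last);
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }
}
```

With `maxRetries = 2`, a primary service that always fails is called three times before the alternate is invoked.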
[Figure: processor state diagram. Labels: scheduled and waiting for data; data ready; types match; can iterate; data mismatch; invoking; constructing iterator; invoking with implicit iteration; adding item to result data set; done iterating; error; timeout; success; retries left; waiting to retry; service failure; alternate available; creating alternate processor; instantiation error; allow partials; complete; aborted]
Fault Tolerance Editing [Screenshot. Labels: Retry, delay and backoff configuration; Alternate Processor]
Summary – Taverna and eScience • Standard workflow language allows peer review and publication of eScience methods. • LSID allows universal access to results for collaboration, as well as for review. • RDF+LSID explains the context of these results and provides guidance for further investigations.