This report evaluates three open-source WfMS (Kepler, Taverna, Triana) against scientific workflow patterns. It also draws on experience reported in the references and on an analysis of the patterns. Control, data, and scientific workflow patterns are discussed, along with additional requirements and an introduction to each of the three tools. The study concludes with the strengths and limitations of each system.
Comparison of Scientific WfMS (Workflow Management Systems) • B. Guillerminet, IM Design Team, CEA IRFM • 8 June 2011
Outline • Introduction • Summary • References • Types of WfMS • Models of Computations • Business WfMS • Scientific WfMS • Comparison criteria • Patterns: Control, Data, Scientific workflow • Additional requirements • Introduction to Kepler • Introduction to Taverna • Introduction to Triana • Results • Workflow control patterns • Workflow data patterns • Scientific workflow patterns • Conclusions
Introduction • Summary • We report an evaluation (ref. [1], Feb 2011) of three well-known open-source WfMS (Kepler, Taverna and Triana) based on scientific workflow patterns. Experience and comments also come from refs. [2]-[7]. • References • [1] “Pattern based evaluation of scientific workflow management systems”, S. Migliorini, M. Gambini, M. La Rosa, A.H.M. ter Hofstede, Feb 2011 • [2] “Workflow control-flow patterns: A Revised View”, N. Russell, A.H.M. ter Hofstede, W.M.P. van der Aalst, N. Mulyar (http://www.workflowpatterns.com/evaluations/index.php) • [3] “Scientific workflow system – can one size fit all?”, V. Curcin, M. Ghanem, IEEE 2008, CIBEC’08 • [4] “Scaling up workflow-based applications”, S. Callaghan et al., Journal of Computer and System Sciences 76 (2010), 428-446 • [5] “Scientific workflow design for mere mortals”, T. McPhillips et al., Future Generation Computer Systems 25 (2009), 541-551 • [6] “Scientific Workflows: Business as Usual?”, B. Ludascher et al. • [7] “Heterogeneous Composition of Models of Computation”, A. Goderis et al., Nov 2007
Types of WfMS • WfMS: • WfMS are not yet standardized => research activity: business, scientific, control • Meaning of this simple workflow? • Different models of computation (MoC): • Control flow: “Automated data processing” use case. Usual “Business” WfMS, DAG, arrow = execution order • Data flow: “Plasma simulation”. Scientific WfMS, loops, parallel execution, arrow = data • Time flow: “Equation solver”. Control WfMS, differential equations, arrow = time • State flow: #phases (init, time step …), fault recovery… Command & control, machine operation, arrow = event/transition
Types of WfMS • Business WfMS • Control flow and shared data, sequential execution, DAG MoC • Staffware, WebSphere MQ, COSA, iPlanet, SAP Workflow, FileNet, FLOWer, BPMN, UML 2.0 Activity Diagrams, EPCs, BPEL4WS, WebSphere BPEL, Oracle BPEL and XPDL • Scientific WfMS • Data flow, data copied, parallel computation, support for GRID/HPC • Major Open Source Scientific WfMS: Kepler, Taverna, Triana
Comparison criteria • Workflow Control patterns • Basic Control Flow patterns • Execution: sequential, parallel split, synchronization • Exclusive choice: “if … then … else …” • Simple merge • Advanced branching & synchronization patterns • Multi merge, Multi choice • Discriminators: Structured, Blocking, Cancelling • Partial Join: Structured, Blocking, Cancelling • Multiple instances: join, … • Use case: launching several different ways of solving the problem and using the fastest path • State-based patterns • Deferred choice (list) • Interleaved parallel routing: each task is executed once and no two tasks can be executed at the same time • Milestone: a task is enabled when the process is in a particular state • Use case: checkpoint • Critical section: only one critical process can be active at any given time (parallel branches, but only one in use)
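The use case "launch several ways of solving the problem and use the fastest path" can be illustrated outside any WfMS. The following is a minimal Python sketch, not taken from Kepler, Taverna or Triana; the helper names `make_solver` and `first_to_finish` and the delays are hypothetical. It also shows why cancellation matters: a branch that is already running keeps running after the winner is known.

```python
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait


def make_solver(name, delay):
    """Hypothetical solver: pretends to work for `delay` seconds."""
    def solve():
        time.sleep(delay)
        return name
    return solve


def first_to_finish(solvers):
    """Start all solvers in parallel and return the first result available."""
    pool = ThreadPoolExecutor(max_workers=len(solvers))
    futures = [pool.submit(solver) for solver in solvers]
    done, pending = wait(futures, return_when=FIRST_COMPLETED)
    for future in pending:
        # cancel() only stops tasks that have not started yet;
        # a branch that is already running cannot be interrupted here.
        future.cancel()
    pool.shutdown(wait=False)
    return next(iter(done)).result()


winner = first_to_finish([make_solver("slow", 0.3), make_solver("fast", 0.01)])
print(winner)
```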
Comparison criteria • Workflow Control patterns (cont’d) • Cancellation and Force Completion patterns • Withdraw an activity • Iteration patterns • Arbitrary cycles: (while …) • Structured loop: (do/for 1..n) • Recursion • Termination patterns • Implicit or explicit termination • Trigger patterns • Transient or persistent trigger: external signal activates a task
Comparison criteria • Workflow Data Patterns • Data visibility patterns: Private, shared data? • Task data: only accessible by the task • Block data, scope data: accessible by several tasks • Multiple instance data: new data for each execution instance • Case data, folder data, workflow data: shared data • Environment data: • Examples: database connector, file identifier • Data interaction patterns: • Task to task, task to sub-workflow • To/from Multiple instance task • Environment to task and task to environment • Data transfer patterns: • By value, by copying, by reference • Data transformation: apply a transformation to the data before or after it is passed to the process • Data-based routing patterns: • Task pre & post condition: execution if data are … • Event-based task trigger: able to trigger a task (external environment) • Data-based task trigger: able to trigger a task (within the workflow) • Data-based routing: associated to a split
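The difference between transfer by reference and transfer by copying can be seen in a few lines. This is a minimal Python sketch, not the mechanism of any of the three tools; the task `normalise` and the sample values are hypothetical.

```python
import copy


def normalise(samples):
    """Hypothetical task that rescales its input in place."""
    peak = max(samples)
    for i, value in enumerate(samples):
        samples[i] = value / peak
    return samples


# By reference: the task works on the upstream data itself.
shared = [1.0, 2.0, 4.0]
by_reference = normalise(shared)

# By copying: the task receives its own copy, upstream data is untouched.
original = [1.0, 2.0, 4.0]
by_copy = normalise(copy.deepcopy(original))

print(shared, original, by_copy)
```

Copying costs memory and time for large data sets, while passing a reference exposes the upstream data to side effects from downstream tasks.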
Comparison criteria • Scientific Workflow Patterns • Dynamic input size: number of input tokens is specified at run time • Use case: the number of input data varies from one set to another • Dynamic Token Replication: number of output tokens is specified at run time • Dynamic Balancing of Input Tokens: • Use case: different input rates (example: temperature T (every hour) and pressure p (every 2 hours) are acquired at different rates => the task is executed with the new value of T and the old p) • Cartesian product of input tokens: build all the possible combinations of inputs • Example: [1,2,3] & [9,8,7] => [(1,9),(1,8),(1,7),(2,9) …] • Use case: cracking your password • Criteria not addressed • Catalogue of components: • Mathematical, Visualization, Database … • GRID, HPC support … • Different MoC and mixing them
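Two of the patterns above, the Cartesian product of input tokens and the dynamic balancing of input tokens, can be sketched in plain Python. This is only an illustration under stated assumptions: the function names `cartesian` and `balance_inputs`, the dictionaries keyed by acquisition hour, and the sample readings are all hypothetical and do not come from any of the evaluated systems.

```python
from itertools import product


def cartesian(first, second):
    """Build every combination of one token from each input."""
    return list(product(first, second))


def balance_inputs(temperature, pressure):
    """Fire once per temperature reading, reusing the latest known pressure.

    Both arguments map an acquisition hour to a value. A temperature
    reading that arrives before any pressure is known does not fire.
    """
    firings = []
    for hour in sorted(temperature):
        known = [h for h in pressure if h <= hour]
        if not known:
            continue
        firings.append((hour, temperature[hour], pressure[max(known)]))
    return firings


pairs = cartesian([1, 2, 3], [9, 8, 7])
print(pairs[:4])

# T acquired every hour, p every 2 hours (hypothetical readings).
T = {0: 20.0, 1: 21.0, 2: 22.0, 3: 23.0}
p = {0: 1.0, 2: 1.1}
print(balance_inputs(T, p))
```

At hours 1 and 3 the task runs with a new T and the pressure from the previous hour, which is the behaviour described in the use case.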
Introduction to Kepler • Summary of Kepler characteristics: • Developers: NSF-funded Kepler/CORE (UC Davis, UC Santa Barbara, and UC San Diego) • Parent project: Ptolemy II • Evaluated release: 1.0.0 • Platforms: Windows, Linux, Mac OS X • Development language: Java • Workflow language: MoML (XML-based) • License: BSD License • Website: http://kepler-project.org/ • Domain of application: Physics, Ecosystems, Bioinformatics, Fusion (CPES, ITM) • Component = actor • Stateful = {init, (pre-fire, fire, post-fire), terminate} • I/O data = ports • Ontology: type checking at pre-init phase • External Models of Computation = Directors {DDF, PN, CT, FSM} • MoC can be mixed, but not every combination is allowed
Introduction to Kepler • Example of Kepler workflow: • Adding tuples
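The actor life cycle {init, (pre-fire, fire, post-fire), terminate} can be pictured with a small stand-alone sketch. It is written in Python and is not the Kepler or Ptolemy II API: the class `RunningSum`, the function `run`, and the loop playing the role of a director are hypothetical simplifications.

```python
class RunningSum:
    """Hypothetical stateful actor: emits the running sum of its inputs."""

    def __init__(self, inputs):
        self._pending = list(inputs)  # tokens waiting on the input port

    def init(self):
        self.total = 0
        self.outputs = []

    def prefire(self):
        # Ready to fire only if an input token is available.
        return bool(self._pending)

    def fire(self):
        self.total += self._pending.pop(0)
        self.outputs.append(self.total)

    def postfire(self):
        # Returning False would ask the director to stop firing this actor.
        return True

    def terminate(self):
        self._pending.clear()


def run(actor):
    """Very simplified stand-in for a director driving a single actor."""
    actor.init()
    while actor.prefire():
        actor.fire()
        if not actor.postfire():
            break
    actor.terminate()
    return actor.outputs


print(run(RunningSum([1, 2, 3])))
```

Because the firing order is decided by the loop and not by the actor, the same actor could in principle be driven by a different scheduling policy, which is the idea behind separating actors from directors.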
Introduction to Taverna • Summary of Taverna characteristics • Developers: myGrid Team (University of Manchester, UK) • Parent project: myGrid • Evaluated release: 2.1 • Platforms: Windows, Linux, Mac OS X • Development language: Java • Workflow language: Scufl • License: LGPL • Website: http://www.taverna.org.uk • Application domains: Biology, Bioinformatics, Cheminformatics, Astronomy, Social Sciences and Music • Component = processor • I/O data = data link • Coordination link for “Control flow” link without data • Internal fault management = {number of retries, time-out, alternative service} • One MoC: DAG
Introduction to Taverna • Example of Taverna workflow • Concatenate 3 strings • Using a “coordination link” to force sequential execution • Black arrows are data flow links
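The fault-management options listed above (number of retries, alternative service) can be sketched as follows. This is a minimal Python illustration and not Taverna's implementation; the function `invoke`, its parameters and the two example services are hypothetical, and the time-out option is left out to keep the sketch short.

```python
def invoke(primary, alternative=None, retries=2):
    """Call `primary`, retry on failure, then fall back to `alternative`."""
    last_error = None
    for _ in range(1 + retries):
        try:
            return primary()
        except Exception as exc:  # any failure counts as a service fault
            last_error = exc
    if alternative is not None:
        return alternative()
    raise last_error


calls = {"count": 0}


def flaky_service():
    """Hypothetical service that fails twice, then answers."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("service unavailable")
    return "primary result"


def broken_service():
    raise RuntimeError("always down")


print(invoke(flaky_service))
print(invoke(broken_service, alternative=lambda: "alternative result"))
```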
Introduction to Triana • Summary of Triana characteristics • Developers: Cardiff University • Parent project: - • Evaluated release: 4.0 • Platforms: Windows, Linux, Mac OS X • Development language: Java • License: Apache open source license version 2 • Website: http://www.trianacode.org/ • Application domains: Bioinformatics • Component = XML description (WSDL), Java code (local), Interface (remote) • One MoC = Data flow but • Trigger message for “Control flow” link without data
Introduction to Triana • Example of Triana workflow • Display the SQRT of a random number • Data flow
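The data-flow idea behind this example (random number, then square root, then display) can be sketched in a few lines of Python. This is not Triana code: the function `run_dataflow`, the task names and the fixed seed are hypothetical. Each task runs once its upstream data is available, and the arrows carry the data.

```python
import math
import random
from graphlib import TopologicalSorter


def run_dataflow(tasks, links):
    """Run each task once its inputs exist.

    tasks: task name -> function
    links: task name -> list of upstream task names (the data arrows)
    """
    results = {}
    for name in TopologicalSorter(links).static_order():
        inputs = [results[upstream] for upstream in links.get(name, [])]
        results[name] = tasks[name](*inputs)
    return results


rng = random.Random(0)  # fixed seed so the sketch is repeatable
tasks = {
    "random": rng.random,
    "sqrt": math.sqrt,
    "display": lambda value: f"{value:.3f}",
}
links = {"sqrt": ["random"], "display": ["sqrt"]}

out = run_dataflow(tasks, links)
print(out["display"])
```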
Workflow control patterns • Results • Basic Control Flow patterns • Ok for all • Advanced branching & synchronization patterns • Severe limitations due to the absence of a mechanism for canceling running activities • State-based patterns: Kepler supports WCP 17 but … • Cancellation and Force Completion patterns: none • Iteration patterns • Triana and Kepler: ok except recursion • Not supported by Taverna • Termination patterns • Supported only by Kepler • Trigger patterns • None. Use case: external signal • Summary for control patterns • Kepler is the most powerful • Triana is close to Kepler • Several control patterns are missing in Taverna
Workflow data patterns • Results • Data visibility patterns: identical • Data interaction patterns: identical • Data transfer patterns: identical • Data-based routing patterns: • Taverna does not support this functionality due to the absence of “exclusive choice” (see WCP) • Summary for data patterns • Kepler & Triana are identical • Taverna is very close
Scientific Workflow patterns • Results • Dynamic input size: only Kepler • Dynamic Token Replication: only Kepler • Dynamic Balancing of Input Tokens: not supported by Kepler, partially supported by Triana and Taverna • Cartesian product of input tokens: only Taverna • Summary for Scientific workflow patterns • Triana is the least powerful • Kepler & Taverna have different specificities
Summary • “Kepler provides more functionalities than the other two systems” • “Taverna is compensated by the ease one can define a new processor” • “definition of a new component in Kepler requires a sophisticated programming skills (state + polymorphic behaviour to adapt to the chosen director)” • Real limitation of WfMS: running parallel activities and waiting for only one of them to complete