The design and implementation of a workflow analysis tool. Vasa Curcin, Department of Computing, Imperial College London
Scientific workflow field • Scientific workflows: a high-level programming language with an explicit graphical representation of the flow of data and/or control • Research into the automation of processes supporting scientific research • Significant role in providing middleware for the UK eScience programme: Taverna, Discovery Net, Triana • Lingua franca of service-oriented computing
Deluge of workflows • Workflow systems and languages: Pegasus, Meandre, Taverna, Discovery Net, Triana, Kepler, Orange, Pentaho, KNIME, Trident, YAWL, LONI, GenePattern, Galaxy, VisTrails, UGENE, Wildfire, BPEL • Application domains: cheminformatics, environmental science, astronomy, sensor informatics, business intelligence, bioinformatics …
Workflow analysis • Formal models are needed to capitalize on the benefits of this infrastructure • Work evaluated on Discovery Net workflows • Concepts applicable to other workflow systems • Aims: minimise the cost of data movement and processing; provide technology for workflow clients and warehouses (indexing, guided construction…) • Tasks: safeness, instance bounds, static workflow optimization, establishing polymorphic type profiles of workflows
Underlying models • Control flow model: process calculus definitions; communication along named channels; fixed for atomic execution, dynamic for streaming; a new instance of the process is launched as soon as the node receives a token; computation tree logic (CTL) modelling execution states • Data flow model: nodes associated with lambda calculus formulas and term graphs; polymorphic type transformations; rewrite rules defined for sets of nodes as term graph transformations • Embedding: a way of combining the control and data semantics
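A minimal sketch of the control-flow idea that a new instance is launched as soon as a node receives a token, assuming Python's `asyncio` queues as stand-ins for named channels. The two-node pipeline, the `node_process`/`instance` helpers and the `END` end-of-stream marker are hypothetical illustrations, not part of Discovery Net or of the author's process-calculus definitions.

```python
import asyncio

# Hypothetical end-of-stream marker; the source does not say how a stream ends.
END = None


async def instance(token, work, outboxes):
    """One execution of a node: process a single token and emit the result."""
    result = work(token)
    for out in outboxes:
        await out.put(result)


async def node_process(name, inbox, outboxes, work, launched):
    """A node as a replicated process on named channels: every token received
    on `inbox` launches a fresh, concurrently running instance."""
    running = []
    while True:
        token = await inbox.get()
        if token is END:
            await asyncio.gather(*running)  # wait for in-flight instances
            for out in outboxes:
                await out.put(END)  # propagate end-of-stream downstream
            return
        launched[name] = launched.get(name, 0) + 1
        running.append(asyncio.create_task(instance(token, work, outboxes)))


async def run_pipeline(inputs):
    # Named channels a -> [A] -> b -> [B] -> c (a hypothetical two-node workflow).
    a, b, c = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    launched = {}
    nodes = [
        asyncio.create_task(node_process("A", a, [b], lambda x: x + 1, launched)),
        asyncio.create_task(node_process("B", b, [c], lambda x: x * 10, launched)),
    ]
    for x in inputs:
        await a.put(x)
    await a.put(END)
    await asyncio.gather(*nodes)
    results = []
    while (item := c.get_nowait()) is not END:
        results.append(item)
    return sorted(results), launched


if __name__ == "__main__":
    print(asyncio.run(run_pipeline([1, 2, 3])))
```

Running it prints `([20, 30, 40], {'A': 3, 'B': 3})`: each of the three tokens launched its own instance of both A and B.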
Workflow analysis tool • Similarity checker: bisimilarity of processes • Process profiler: deadlock/livelock detection, reachability, task bounds • Composability checker: design-time tests, type requirements, polymorphic properties • Equivalence checker: functional equivalence • Optimizer: rewrite rules for transformations
Similarity checker (pipeline: Workflow → Process model → Model checker) • Based purely on the pi-calculus process model • Workflows are translated into the process model as a parallel composition of independent node processes with named channels • Workflows compared in terms of: internal executions (node actions); the set of observable outputs, where only the relevant outputs are defined as observable • Model checker used to test different types of bisimilarity • Node executions conveniently represented as silent actions • Strong bisimulation becomes a strict one-to-one mapping of workflow actions • Weak bisimulation ignores internal actions and communications and focuses on visible outputs
Similarity checker: example • ABC (Another Bisimilarity Checker) used
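To make the strong/weak distinction concrete, here is a small sketch that is not ABC itself: a naive partition-refinement check for strong bisimilarity on a labelled transition system, and weak bisimilarity obtained by running the same check on the τ-saturated system (τ* for silent moves, τ* a τ* for visible ones). The two example workflows and the `out` action are hypothetical.

```python
TAU = "tau"  # silent action: an internal node execution or communication


def states_of(lts):
    """All states mentioned in the LTS, including ones with no outgoing edges."""
    states = set(lts)
    for edges in lts.values():
        states.update(t for _, t in edges)
    return states


def strong_bisimilar(lts, p, q):
    """Naive partition refinement: keep splitting blocks until all states in a
    block can make the same actions into the same blocks."""
    states = states_of(lts)
    block = {s: 0 for s in states}
    n_blocks = 1
    while True:
        signature = {
            s: (block[s], frozenset((a, block[t]) for a, t in lts.get(s, [])))
            for s in states
        }
        ids = {}
        block = {s: ids.setdefault(signature[s], len(ids)) for s in states}
        if len(ids) == n_blocks:  # no block was split: the partition is stable
            return block[p] == block[q]
        n_blocks = len(ids)


def tau_closure(lts, s):
    """States reachable from s by zero or more silent steps."""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for a, t in lts.get(u, []):
            if a == TAU and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def saturate(lts):
    """Weak transition system: s =tau=> t via tau*, s =a=> t via tau* a tau*."""
    states = states_of(lts)
    closure = {s: tau_closure(lts, s) for s in states}
    sat = {}
    for s in states:
        edges = {(TAU, t) for t in closure[s]}
        for u in closure[s]:
            for a, v in lts.get(u, []):
                if a != TAU:
                    edges.update((a, w) for w in closure[v])
        sat[s] = list(edges)
    return sat


def weak_bisimilar(lts, p, q):
    return strong_bisimilar(saturate(lts), p, q)


# Hypothetical workflows: w1 runs two nodes internally before its output,
# w2 runs a single node. Both live in one LTS (a disjoint union).
LTS = {
    "w1": [(TAU, "w1a")], "w1a": [(TAU, "w1b")], "w1b": [("out", "w1end")],
    "w2": [(TAU, "w2a")], "w2a": [("out", "w2end")],
}

if __name__ == "__main__":
    print(strong_bisimilar(LTS, "w1", "w2"))  # False: different internal steps
    print(weak_bisimilar(LTS, "w1", "w2"))    # True: same visible output
```

The two workflows differ in the number of node executions, so no one-to-one action mapping exists, yet once silent steps are abstracted away both just produce `out`.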
Process profiling (pipeline: Workflow → Process model → Kripke frame) • The process algebra representation is translated into a Kripke frame • States enumerate the number of instances of each workflow node • Transitions of the frame are node executions • CTL formulas used for querying • NuSMV model checker employed • Allows questions such as: reachability of a particular state; detection of deadlocks and livelocks; safety (some state always executing); bounds on the number of instances of a node
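The state space described above can be sketched as an explicit breadth-first exploration. This assumes a simplified token semantics (executing a node removes one of its instances and launches one instance of each successor) and a hypothetical diamond workflow A → B, C → D; it is not the NuSMV encoding used by the tool. Because this sketch has no blocking joins, every non-empty state can move, so it illustrates reachability and instance bounds rather than deadlocks.

```python
from collections import deque


def build_frame(nodes, succ, start, max_states=10_000):
    """Explore the Kripke frame of a workflow under an assumed token semantics.

    A state is a tuple of instance counts, one per node in `nodes`. A transition
    is the execution of one instance of node n: n's count drops by one and each
    successor of n gains a new instance (an arriving token launches an instance).
    """
    index = {n: i for i, n in enumerate(nodes)}
    init = tuple(1 if n == start else 0 for n in nodes)
    frame, queue = {}, deque([init])
    while queue:
        state = queue.popleft()
        if state in frame:
            continue
        if len(frame) >= max_states:
            raise RuntimeError("state space too large; a bounded model is needed")
        frame[state] = []
        for n in nodes:
            if state[index[n]] > 0:
                nxt = list(state)
                nxt[index[n]] -= 1
                for m in succ.get(n, []):
                    nxt[index[m]] += 1
                frame[state].append((n, tuple(nxt)))
                queue.append(tuple(nxt))
    return init, frame


def max_instances(frame, nodes, node):
    """Largest number of simultaneous instances of `node` in any reachable state."""
    i = nodes.index(node)
    return max(state[i] for state in frame)


# Hypothetical diamond workflow: A feeds B and C, both of which feed D.
NODES = ["A", "B", "C", "D"]
SUCC = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}

if __name__ == "__main__":
    init, frame = build_frame(NODES, SUCC, "A")
    print(len(frame), max_instances(frame, NODES, "D"))
```

For the diamond, the exploration finds 9 states, a single terminal state with no live instances, and a bound of 2 on D: B and C may both finish before D runs, and each arriving token launches its own instance.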
Process profiling: example • Reachability • $\mathbf{EF}\,(F_\tau = 1)$: is there an execution that achieves one instance of F? • $\mathbf{AF}\,(F_\tau = 1)$: do all executions always achieve one instance of F? • Livelocks • $\mathbf{AG}\,(C_\tau \rightarrow \mathbf{AG}\,\mathbf{AF}\,C_\tau)$: is there always a livelock with C? • $\mathbf{EF}\,(C_\tau \rightarrow \mathbf{AG}\,\mathbf{AF}\,C_\tau)$: can there be a livelock with C? • Instance bounds • $\max\{x \mid \mathbf{EF}\,(A_\tau = x)\}$: what is the maximum number of instances of A?
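The CTL queries above can be evaluated on a small explicit Kripke frame with the standard fixpoint characterisations (EF and AF as least fixpoints, AG as a greatest fixpoint). This is a sketch of the semantics, not of NuSMV's symbolic algorithm; the frames `WITH_EXIT` and `NO_EXIT`, and the state sets for "C is running" and "one instance of D", are hypothetical. As usual for CTL, every state is assumed to have at least one successor, so terminal states carry a self-loop.

```python
def ex(frame, S):
    """EX S: states with some successor in S."""
    return {s for s, succ in frame.items() if any(t in S for t in succ)}


def ax(frame, S):
    """AX S: states whose successors all lie in S (frame assumed total)."""
    return {s for s, succ in frame.items() if all(t in S for t in succ)}


def _lfp(step):
    z = set()
    while (nxt := step(z)) != z:
        z = nxt
    return z


def _gfp(step, universe):
    z = set(universe)
    while (nxt := step(z)) != z:
        z = nxt
    return z


def ef(frame, S):
    return _lfp(lambda z: S | ex(frame, z))  # mu Z. S or EX Z


def af(frame, S):
    return _lfp(lambda z: S | ax(frame, z))  # mu Z. S or AX Z


def ag(frame, S):
    return _gfp(lambda z: S & ax(frame, z), frame)  # nu Z. S and AX Z


def implies(frame, S1, S2):
    return (set(frame) - S1) | S2


def always_livelock(frame, c_running):
    """AG (C -> AG AF C), as on the slide."""
    return ag(frame, implies(frame, c_running, ag(frame, af(frame, c_running))))


# s0: start, s1: C running (may re-execute), s2: one instance of D.
WITH_EXIT = {"s0": ["s1"], "s1": ["s1", "s2"], "s2": ["s2"]}
NO_EXIT = {"s0": ["s1"], "s1": ["s1"]}
C_RUNNING, ONE_D = {"s1"}, {"s2"}

if __name__ == "__main__":
    print("s0" in ef(WITH_EXIT, ONE_D), "s0" in af(WITH_EXIT, ONE_D))
    print("s0" in always_livelock(WITH_EXIT, C_RUNNING),
          "s0" in always_livelock(NO_EXIT, C_RUNNING))
```

In `WITH_EXIT`, D is reachable but not inevitable because C may re-execute forever, and the livelock formula fails at `s0`; in `NO_EXIT`, where C can never stop, it holds.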
Composability checker (pipeline: Workflow → Data model → Type formulas) • Polymorphic type formulas for the workflow components/fragments • When composing: the output and input of each fragment are compared in terms of free and bound type variables; if there are no clashes, free variables are resolved to form the type formula of the composition • Inference engine developed specifically for the tool • Determines whether a workflow fragment can be reused on a new input and finds compatible services in the warehouse
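As a rough illustration of composing type formulas, the sketch below uses textbook first-order unification (with an occurs check) as a stand-in for the tool's own inference engine, which the slides do not describe in detail. Types are either `TVar` variables or tuples `(constructor, arg, ...)`; the `Table`/`Num`/`Str` constructors and the fragment signatures are hypothetical, and each fragment is assumed to use its own variable names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TVar:
    """A free type variable."""
    name: str

# A type is a TVar or a tuple (constructor_name, arg1, arg2, ...).


def resolve(t, subst):
    """Follow variable bindings until reaching an unbound variable or a constructor."""
    while isinstance(t, TVar) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    t = resolve(t, subst)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, subst) for a in t[1:])


def unify(t1, t2, subst=None):
    """Return a substitution making t1 and t2 equal, or None on a clash."""
    subst = dict(subst or {})
    stack = [(t1, t2)]
    while stack:
        a, b = stack.pop()
        a, b = resolve(a, subst), resolve(b, subst)
        if a == b:
            continue
        if isinstance(a, TVar):
            if occurs(a, b, subst):
                return None
            subst[a] = b
        elif isinstance(b, TVar):
            if occurs(b, a, subst):
                return None
            subst[b] = a
        elif a[0] == b[0] and len(a) == len(b):
            stack.extend(zip(a[1:], b[1:]))
        else:
            return None  # constructor clash
    return subst


def substitute(t, subst):
    t = resolve(t, subst)
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(x, subst) for x in t[1:])
    return t


def compose(first, second):
    """Type (input, output) of `second` after `first`, or None if they clash."""
    (in1, out1), (in2, out2) = first, second
    subst = unify(out1, in2)
    if subst is None:
        return None
    return substitute(in1, subst), substitute(out2, subst)


A = TVar("a")
FILTER = (("Table", A), ("Table", A))                  # polymorphic row filter
SCORE = (("Table", ("Num",)), ("Num",))                # needs numeric tables
LABELS = (("Table", ("Str",)), ("Table", ("Str",)))    # string tables only

if __name__ == "__main__":
    print(compose(FILTER, SCORE))
    print(compose(LABELS, SCORE))
```

Composing the polymorphic filter with the numeric scorer resolves `a` to `Num`; composing the string-typed fragment with it hits a constructor clash and is rejected.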
Composability checker: example • Fragment of three nodes, L, M and N • Input q, with required attributes A, B and D • Two outputs, u and v • A is present in both, B in u, D in neither • The two outputs can be joined with O
Equivalence tester/optimizer (pipeline: Workflow → Data model → Node equivalences) • Uses a set of node equivalence rules, defined for each workflow system or node subset • Algorithm applies allowed transformations to reduce two workflows to the same expression • Combined with rewrite heuristics, which are also node-specific • Simple example: the relational model again
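The reduce-to-the-same-expression idea can be sketched as normalisation to a canonical form followed by a syntactic comparison. The two rules here (merging cascaded selections, and ordering join operands since join is commutative) are a hypothetical, deliberately tiny subset of relational equivalences; a real rule set would also need associativity and projection rules. Relation names and predicates are illustrative placeholders, not the GPRD schema.

```python
def normalize(expr):
    """Rewrite a relational expression bottom-up into a canonical form.

    Hypothetical rule set: (1) cascaded selections merge into one selection
    whose predicate list is the sorted union; (2) join is commutative, so its
    operands are put in a fixed order.
    """
    op = expr[0]
    if op == "rel":
        return expr
    if op == "select":
        preds, child = expr[1], normalize(expr[2])
        if child[0] == "select":  # sigma_p(sigma_q(E)) == sigma_{p and q}(E)
            return ("select", tuple(sorted(set(preds) | set(child[1]))), child[2])
        return ("select", tuple(sorted(set(preds))), child)
    if op == "join":  # R join S == S join R
        left, right = sorted((normalize(expr[1]), normalize(expr[2])), key=repr)
        return ("join", left, right)
    raise ValueError(f"unknown operator: {op!r}")


def equivalent(w1, w2):
    """Two workflows are judged equivalent if they share a normal form."""
    return normalize(w1) == normalize(w2)


PATIENTS, PRESCRIPTIONS = ("rel", "patients"), ("rel", "prescriptions")
W1 = ("select", ("age > 65",),
      ("select", ("drug = 'X'",), ("join", PATIENTS, PRESCRIPTIONS)))
W2 = ("select", ("drug = 'X'", "age > 65"), ("join", PRESCRIPTIONS, PATIENTS))
W3 = ("select", ("age > 65",), ("join", PATIENTS, PRESCRIPTIONS))

if __name__ == "__main__":
    print(equivalent(W1, W2), equivalent(W1, W3))
```

Because normalisation only ever applies equivalence-preserving rules, a shared normal form proves equivalence; the converse does not hold for an incomplete rule set, so a `False` here only means "not shown equivalent".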
Equivalence tester/optimizer: example • Relational workflow searching for adverse drug reactions in the GPRD database • Rewrite rules: a set of relational equivalences • Heuristics: early projections/selections; late joins • Easy scenario: a brute-force algorithm works
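The "early selections" heuristic can be sketched as selection pushdown: a selection over a join is split so that each predicate is applied directly to the input relation that supplies its attribute. The schema, relation names and predicates below are hypothetical, and the sketch covers neither projections nor join reordering.

```python
# Hypothetical schema: attributes provided by each base relation.
SCHEMA = {
    "patients": {"patient_id", "age"},
    "prescriptions": {"patient_id", "drug"},
}


def attributes(expr):
    op = expr[0]
    if op == "rel":
        return SCHEMA[expr[1]]
    if op == "select":
        return attributes(expr[2])
    if op == "join":
        return attributes(expr[1]) | attributes(expr[2])
    raise ValueError(f"unknown operator: {op!r}")


def push_selections(expr):
    """Move each (attribute, condition) predicate below joins, next to the
    input that provides its attribute."""
    op = expr[0]
    if op == "rel":
        return expr
    if op == "join":
        return ("join", push_selections(expr[1]), push_selections(expr[2]))
    if op != "select":
        raise ValueError(f"unknown operator: {op!r}")
    preds, child = expr[1], push_selections(expr[2])
    if child[0] != "join":
        return ("select", preds, child)
    left, right = child[1], child[2]
    to_left = tuple(p for p in preds if p[0] in attributes(left))
    to_right = tuple(p for p in preds
                     if p not in to_left and p[0] in attributes(right))
    keep = tuple(p for p in preds if p not in to_left and p not in to_right)
    if to_left:
        left = push_selections(("select", to_left, left))
    if to_right:
        right = push_selections(("select", to_right, right))
    pushed = ("join", left, right)
    return ("select", keep, pushed) if keep else pushed


QUERY = ("select", (("age", "> 65"), ("drug", "= 'X'")),
         ("join", ("rel", "patients"), ("rel", "prescriptions")))

if __name__ == "__main__":
    print(push_selections(QUERY))
```

Filtering each input before the join means the join only sees matching rows, which is the point of applying selections early.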
Related and future work • Data typing: COMAD for Kepler • Workflow process analysis: GWorkflowDL, YAWL • New workflow tools with relational structures: KNIME, Orange, Pentaho • Extensions: streaming (blocking and batching); improved state reduction algorithms for the CTL model; adding more type constructs for polymorphism
Summary • Workflow analysis is needed to improve the take-up and exploitation of workflows • Enterprise environments: profile resource usage, risk of failure and execution time • Support reuse and repurposing • Separating control and data aspects allows the use of existing model checkers and familiar techniques: process algebras, temporal logics, type polymorphisms, term graphs • The current version works on Discovery Net/InforSense • KNIME and Pentaho are very similar and only require extra parsers • A full streaming process model for Taverna is in the works