Workflow design and implementation issues in the VL-e project
Pieter Adriaans, Adam Belloum, Zhiming Zhao, Frank Terpstra, Scott Marshall, Sophia Katrenko, Willem van Hage, Edgar Mey, Machiel Jansen, Jan Top, Diego Faneyte, Nicole Koenderink
University of Amsterdam
Outline
• Background
• Virtual Laboratory for e-Science
• Our approach: use cases, generic template workflows, heuristic tools
• The workflow design problem
• Challenges and research lines
• Activities
Virtual Lab (architecture diagram)
• Application-specific services: Medical Application, Telescience, Bio ASP (application potential)
• Generic services & Virtual Lab. services
• Virtual Lab. rapid prototyping (interactive simulation)
• Virtual Laboratory
• Additional Grid services (OGSA services)
• Grid middleware
• Grid & network services
• Network service (lambda networking), Surfnet
• Environments: VL-E Proof of Concept Environment, VL-E Experimental Environment
VL-e subprograms (diagram)
• Application subprograms: Data intensive science (SP.1.1), Food Informatics (SP.1.2), Medical diagnosis & imaging (SP.1.3), Bio-diversity (SP.1.4), Bio-Informatics (SP.1.5), Telescience (SP.1.6)
• Generic subprograms (SP2.1 through SP2.5, pairing not shown): Adaptive information disclosure; Collaborative information management; Interactive PSE; User interface & virtual reality; Virtual lab. & system integration
• Further components: HPDC & Processor Data co-allocation; Security & generic AAA; Optical networking
Mission
Effectively reuse existing workflow management systems, and provide a generic e-Science framework for different application domains.
A generic framework can
• Improve the reuse of workflow components and workflows across experiments
• Reduce the cost of learning different systems
• Allow users to work in a consistent environment when the underlying infrastructure changes
Two-phase approach
• Short term: recommend suitable workflow systems for different application domains
  • Analyze typical application use cases
  • Define small projects with different application domains
  • Review existing workflow systems
  • Recommend four workflow systems: Triana, Taverna, Kepler, and VLAMG
• Long term: extend VLAMG and develop our own generic workflow framework
Recommendation report: Scientific workflow management in PoC R1, VL-e internal report, Oct 17, 2005.
(Figure labels: 1700 comparisons; 3500 comparisons)
The Workflow design problem I
• A workflow is an inherent part of the problem-solving heuristics
• Induction of optimal workflows is an important research issue
• Manipulating workflows is an important aspect of e-Science
Functional analysis of Solution 2 (diagram)
Diagram boxes: Preprocessing; Select edge pieces; Select pieces on color; Select pieces on shape; Combinatorial Matching (three times)
Application: Solving a jigsaw puzzle (diagram)
• Application workflow boxes: Preprocessing; Selection on feature Edgepiece; Selection on feature Color; Selection on feature Shape; Combinatorial Matching (three times)
• Template workflow built from generic parameterized services: Preprocessing; Selection on feature X; Combinatorial Matching
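
The idea of one generic "Selection on feature X" service instantiated for edge pieces, color, and shape can be sketched in a few lines. This is only an illustration under stated assumptions: the `Piece` fields, the `toy_fits` rule, and the way each selection feeds directly into matching are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Piece:
    """Hypothetical puzzle piece; the fields are assumptions for this sketch."""
    ident: int
    is_edge: bool
    color: str
    shape: str


def select_on_feature(pieces: List[Piece],
                      predicate: Callable[[Piece], bool]) -> List[Piece]:
    """Generic parameterized service: keep pieces whose feature X passes."""
    return [p for p in pieces if predicate(p)]


def combinatorial_matching(pieces: List[Piece],
                           fits: Callable[[Piece, Piece], bool]
                           ) -> Tuple[List[Tuple[int, int]], int]:
    """Compare every pair once; return fitting pairs and the comparison count."""
    pairs, comparisons = [], 0
    for a, b in combinations(pieces, 2):
        comparisons += 1
        if fits(a, b):
            pairs.append((a.ident, b.ident))
    return pairs, comparisons


def template_workflow(pieces, predicate, fits):
    """Template: Selection on feature X, then Combinatorial Matching."""
    return combinatorial_matching(select_on_feature(pieces, predicate), fits)


def toy_fits(a: Piece, b: Piece) -> bool:
    """Assumed stand-in for a real fit test: same color means 'fits'."""
    return a.color == b.color


PIECES = [
    Piece(0, True, "blue", "round"),
    Piece(1, True, "blue", "square"),
    Piece(2, False, "blue", "round"),
    Piece(3, False, "green", "square"),
    Piece(4, True, "green", "round"),
    Piece(5, False, "green", "square"),
]

# Instantiate the template with X = edge piece, then compare with brute force.
edge_pairs, edge_cost = template_workflow(PIECES, lambda p: p.is_edge, toy_fits)
_, brute_cost = combinatorial_matching(PIECES, toy_fits)
```

On this toy data the edge-piece instantiation needs 3 comparisons, while matching all six pieces at once needs 15, which is the kind of saving a selection step is meant to buy.
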
The Workflow design problem II
• Design of generic services + template applications
  • Granularity
  • Generality
  • Parameter structure
  • Readability: communication with end users
Design spaces (Terpstra & Adriaans): SPSD, SPMD, MPSD, MPMD
Design space
(Diagram: nodes labeled 0, 1, 2, 3, 4, 5, 8, 9 and edges labeled a, b, c)
• All different workflows represented in this lattice structure are computationally equivalent.
• Design issues are the only reason to select a certain solution
• A domain for the task of learning workflows
• NFA/DFA induction (HMMs): EDSM
• Temporal learning: nominal event sequences (Anthunes, 2005)
• Timed automata (Verwer 2006)
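
A toy illustration of "computationally equivalent" workflows: three independent filter steps, named a, b, and c here only to echo the edge labels, can be applied in any order with the same result, so the choice between orderings rests on design criteria alone. This is not the lattice from the slide; the steps, the data, and the check-counting cost measure are all assumptions made for the sketch.

```python
from itertools import permutations


def run(steps, items):
    """Apply filter steps in order; return survivors and predicate-check count."""
    checks = 0
    for step in steps:
        checks += len(items)              # one check per item still in play
        items = [x for x in items if step(x)]
    return items, checks


# Hypothetical, mutually independent selection steps.
STEPS = {
    "a": lambda x: x % 2 == 0,
    "b": lambda x: x % 3 == 0,
    "c": lambda x: x > 10,
}
DATA = list(range(1, 31))

# Every ordering of a, b, c is one candidate workflow.
results = {
    order: run([STEPS[name] for name in order], DATA)
    for order in permutations("abc")
}

outputs = {tuple(out) for out, _ in results.values()}
cheapest = min(results, key=lambda order: results[order][1])
```

All six orderings return the same four items, so `outputs` has a single element. They differ only in cost, from 45 checks for the order b, a, c up to 60 for c, a, b.
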
WCFS Case: Typical questions
• Provide all registrations on perception of bitterness for panel members over 30 years of age who evaluated coffee.
• Is there a relationship between custard and nitrogen?
• What factors influence the perception of creaminess of custard?
Basic approach (diagram)
• Flow: Situation; Problem; Research question; Answer / conclusion
• Lab. Exp: in Sample, out Data
• Literature: in Lit, out Report
• Analysis: in Data, out Data
Domain ontology (concise)
Research Question, Project, Activity, LabExperiment, Computational Experiment, Sample, Method, Data Set, Device, Parameter Set, Panel, Hardware, Panel Member
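
To make the link between the ontology concepts and the first typical WCFS question concrete, here is a minimal sketch. Only `PanelMember` and `Sample` correspond to concepts named in the ontology; the `Registration` record, all field names, and the toy data are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PanelMember:
    name: str
    age: int


@dataclass(frozen=True)
class Sample:
    product: str


@dataclass(frozen=True)
class Registration:
    """Hypothetical record of one panel member scoring one attribute."""
    member: PanelMember
    sample: Sample
    attribute: str
    score: float


def registrations_for(regs: List[Registration], attribute: str,
                      product: str, min_age: int) -> List[Registration]:
    """All registrations of `attribute` on `product` by members older than min_age."""
    return [
        r for r in regs
        if r.attribute == attribute
        and r.sample.product == product
        and r.member.age > min_age
    ]


alice = PanelMember("alice", 42)
bob = PanelMember("bob", 25)
REGS = [
    Registration(alice, Sample("coffee"), "bitterness", 6.5),
    Registration(bob, Sample("coffee"), "bitterness", 4.0),
    Registration(alice, Sample("custard"), "creaminess", 7.0),
]

# "Bitterness registrations for panel members over 30 who evaluated coffee."
hits = registrations_for(REGS, "bitterness", "coffee", 30)
```

With this toy data the query returns only the first registration, since the second panel member is under 30 and the third record concerns custard.
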
Adaptive Information Disclosure: Generic Template Workflow
• Workflow steps: Formulate query; Fire query; Search; Construct answer; Display results
• Other diagram labels: User support; Alternatives; Disambiguation; Query Expansion; Filtering; Relevance score; Link to concept tree; Data Selection; Preprocessing; Named Entity Recognition; Relation Recognition; Validation; Version management; Ontology; Domain selection; Ontology Learning; Information Retrieval
  • Advanced
  • Constraint
  • Recognition
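
The chain of steps in this template (formulate query, fire query, search, construct answer, display results) can be sketched as a chain of small functions. Everything concrete here is an assumption for illustration: the in-memory `DOCS`, the `SYNONYMS` table standing in for query expansion, and the term-count relevance score. "Fire query" and "Search" are folded into one `search` function.

```python
from typing import Dict, List, Tuple

# Hypothetical document collection and synonym table.
DOCS = {
    "d1": "custard creaminess depends on fat content",
    "d2": "coffee bitterness perception in panel members",
    "d3": "nitrogen storage of custard",
}
SYNONYMS = {"creamy": ["creaminess"]}


def formulate_query(text: str) -> List[str]:
    """Lowercase, split into terms, and append assumed synonyms (query expansion)."""
    terms = text.lower().split()
    expanded = list(terms)
    for term in terms:
        expanded.extend(SYNONYMS.get(term, []))
    return expanded


def search(terms: List[str], docs: Dict[str, str]) -> Dict[str, int]:
    """Fire the query: score each document by how many query terms it contains."""
    scores = {}
    for doc_id, body in docs.items():
        words = set(body.lower().split())
        score = sum(1 for term in terms if term in words)
        if score:
            scores[doc_id] = score
    return scores


def construct_answer(scores: Dict[str, int]) -> List[Tuple[str, int]]:
    """Rank by descending relevance score, breaking ties by document id."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def display_results(answer: List[Tuple[str, int]]) -> List[str]:
    """Render each ranked hit as a display line."""
    return [f"{doc_id} (relevance {score})" for doc_id, score in answer]


def run_workflow(text: str, docs: Dict[str, str] = DOCS) -> List[str]:
    """Template workflow: formulate, search, construct answer, display."""
    return display_results(construct_answer(search(formulate_query(text), docs)))


lines = run_workflow("creamy custard")
```

Because each stage is a plain function with a simple input and output, a stage such as `search` could be swapped for a different retrieval service without touching the others, which is the point of a template workflow.
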
Computer support for problem solving
• Problem Solving Environment (E. Gallopoulos et al., IEEE CS Eng. 1994):
  • Organizes different software components/tools
  • Allows a user to assemble these tools at a high level of abstraction
  • Controls runtime behavior of experiments
  • Examples: MATLAB, Ptolemy, etc.
• Traditional PSE: organize and execute resources locally!
• Scientific Workflow Management: organize and execute on grid-enabled resources!
• Diagram labels: Distributed data sharing & dissemination; Distributed resources; Distributed parallel computing; Visualization, remote resource invocation
A workflow bus paradigm
(Diagram: Workflow bus)
Z. Zhao et al., “Workflow bus for e-Science”, to appear in IEEE e-Science 2006, Amsterdam
Lessons learned from phase 1
• In the scientific community there are two types of workflow users: end users and application developers.
• The two categories of users have completely different requirements: easy to use, easy for developing new applications, and easy for migrating legacy applications
• How to introduce a new WMS to a domain scientist?
  • Because it has a well-defined architecture?
  • Or because it allows them to keep their current work style?
• How to reuse existing work?
  • Support multiple WMSs or add more options to one WMS?
• How to efficiently include the user in the computing loop?
Z. Zhao et al., “Scientific workflow management: between generality and applicability”, QSIC 2005, Australia
Conclusions
• A workflow is an inherent part of the problem-solving heuristics
• Induction of optimal workflows is an important research issue
• Manipulating workflows is an important aspect of e-Science
• Specific problems of the toolbox approach:
  • Design of generic services
  • Template applications
Research scope and lines
• Focus 1: Interoperability and integration between workflow systems
• Focus 2: Composition of meta workflows
• Focus 3: Provenance at meta workflows
• Focus 4: Enactment and orchestration of meta workflows
• Focus 5: Human-in-the-loop computing in meta workflows
Z. Zhao, A. Belloum, M. Bubak: A research plan of VL-e SP2.5, V0.2, September 9, 2006