Scientific Workflows in e-Science
Dr Zhiming Zhao (zhiming@science.uva.nl)
System and Network Engineering, University of Amsterdam
Virtual Laboratory for e-Science
Outline
• Background
• Scientific workflow management systems
• Virtual Laboratory for e-Science
• Our approach
• Challenges and research lines
• Activities
Problem solving: a typical scenario in scientific research
[Figure: a research cycle linking four stages: define problems, experiments, data analysis, discovery]
• Define problems: analysis, hypothesis, related work
• Experiments: propose experiments, define steps, prototype computing systems, perform experiments, data collection
• Discovery: presentation, dissemination
• Data analysis: visualization, validation, adjust experiment, refine hypothesis
The activities are iterative, dynamic, and human-centered, and they require different levels of resources.
Example scenarios
• In problem analysis
  • Identify domains, search for key problems, find typical methods, and review related work
• In scientific experiments (scientific computing and data processing)
  • Define dependencies between computing and data processing tasks, and schedule their runtime behavior
• In data analysis
  • Visualize results, compare the results obtained with different parameters, keep meaningful configurations, and continue experiments
  • Search related work and compare results
• In dissemination
  • Document experiments, present results, cite, and publish
Computer support for problem solving
[Figure: distributed resources; distributed data sharing & dissemination; distributed parallel computing; visualization; remote resource invocation]
• Problem Solving Environment (PSE) (E. Gallopoulos et al., IEEE CS Eng., 1994):
  • Organizes different software components and tools
  • Allows a user to assemble these tools at a high level of abstraction
  • Controls the runtime behavior of experiments
  • Examples: MATLAB, Ptolemy, etc.
• Traditional PSEs organize and execute resources locally.
• Scientific workflow management systems: a new guise of the PSE!
Inside a scientific workflow management system
In our view, a scientific workflow management system (SWMS) implements at least:
• A model for describing workflows;
• An engine for executing and managing workflows;
• Different levels of support for a user to compose, execute, and control a workflow.
[Figure: inside an SWMS: a workflow (based on a certain model); user support for composition, engine-level control, and resource-level control; the engine; resources]
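To make the three parts concrete, here is a minimal Python sketch of a workflow model, an engine, and a user-level control hook. The names `Task`, `Workflow`, `Engine`, and `before_task` are hypothetical illustrations, not the API of any SWMS discussed here, and the engine simply runs tasks in the order they are listed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Task:
    """Hypothetical model element: a named step and the tasks it consumes."""
    name: str
    func: Callable[..., object]
    inputs: List[str] = field(default_factory=list)

@dataclass
class Workflow:
    tasks: List[Task]

class Engine:
    """Toy engine: runs tasks in list order and checks their inputs exist."""

    def __init__(self, before_task: Callable[[str], bool] = lambda name: True):
        # User-support hook, called before each task; returning False stops
        # the run (a stand-in for engine-level user control).
        self.before_task = before_task

    def run(self, wf: Workflow) -> Dict[str, object]:
        results: Dict[str, object] = {}
        for task in wf.tasks:
            missing = [i for i in task.inputs if i not in results]
            if missing:
                raise ValueError(f"{task.name} is missing inputs: {missing}")
            if not self.before_task(task.name):
                break  # the user halted the experiment
            args = [results[i] for i in task.inputs]
            results[task.name] = task.func(*args)
        return results

if __name__ == "__main__":
    wf = Workflow([
        Task("load", lambda: [3, 1, 2]),
        Task("sort", sorted, ["load"]),
        Task("max", max, ["sort"]),
    ])
    print(Engine().run(wf))  # {'load': [3, 1, 2], 'sort': [1, 2, 3], 'max': 3}
```

Passing `before_task=lambda n: n != "max"` stops the run before the last step, which hints at how user control can sit between the model and the engine.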
Scientific workflows in e-Science
Workflows in e-Science cover both experiment processes and administration, e.g., AAA (authentication, authorization, and accounting), and other issues. Workflows vary in:
• Phases of experiments: design, runtime control, dissemination;
• Abstractions of resources: concrete and abstract;
• Levels of activity detail: computing, data access, search/matching, human activities;
• …
Workflows range from abstract workflows to executable (concrete) workflows.
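One way to picture the step from an abstract to a concrete workflow is a mapping that binds each logical operation to a concrete resource. The sketch below assumes a hypothetical catalog (a dictionary from operation names to host names) and a naive first-match policy; it illustrates the idea only and does not reproduce how any particular system performs this mapping.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class AbstractStep:
    operation: str  # logical operation, independent of where it runs

@dataclass(frozen=True)
class ConcreteStep:
    operation: str
    resource: str  # the concrete service or host chosen for this step

def concretize(abstract: List[AbstractStep],
               catalog: Dict[str, List[str]]) -> List[ConcreteStep]:
    """Bind each abstract step to the first resource the catalog offers.

    The catalog and the first-match policy are illustrative assumptions;
    a real system would consult schedulers, replica catalogs, costs, etc.
    """
    plan = []
    for step in abstract:
        candidates = catalog.get(step.operation)
        if not candidates:
            raise LookupError(f"no resource offers {step.operation!r}")
        plan.append(ConcreteStep(step.operation, candidates[0]))
    return plan

if __name__ == "__main__":
    abstract_wf = [AbstractStep("fetch_data"), AbstractStep("align")]
    catalog = {  # hypothetical hosts
        "fetch_data": ["storage.example.org"],
        "align": ["cluster-a.example.org", "cluster-b.example.org"],
    }
    for step in concretize(abstract_wf, catalog):
        print(step.operation, "->", step.resource)
```

Keeping the abstract workflow separate from the binding step means the same experiment description can be re-bound when the available resources change.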
Diversity in SWMS
• Taverna: Web-services-based language (Scufl); FreeFluo engine; graphical visualization of workflows
• Triana: components; task graph; data/control flow
• Kepler: actors and directors; MoML; execution models
• Pegasus: based on DAGMan; VDL; DAG
• …
• DAGMan: computing tasks; DAG
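Several of these systems describe a workflow as a DAG of computing tasks, where a task may start once all of its parents have finished. The Python sketch below illustrates that rule with the standard-library `graphlib` module (Python 3.9+) by grouping tasks into waves that could run in parallel; it is not DAGMan's submit-file syntax or its scheduler, and the example graph is hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

def execution_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group DAG nodes into waves: every node's parents are in earlier
    waves, so the jobs within one wave could run in parallel."""
    ts = TopologicalSorter(deps)  # deps maps node -> set of parent nodes
    ts.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        waves.append(ready)
        ts.done(*ready)  # assume the whole wave finished successfully
    return waves

if __name__ == "__main__":
    # Hypothetical diamond-shaped workflow: B and C need A; D needs B and C.
    dag = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
    print(execution_waves(dag))  # [['A'], ['B', 'C'], ['D']]
```

A real engine would call `done()` per job as completions arrive rather than per wave, so independent branches need not wait for each other.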
Virtual Laboratory for e-Science
[Figure: three layers: an application layer (Food Informatics, Dutch Telescience, Medical Diagnosis, Biodiversity, Data-Intensive Science, Bioinformatics ASP); a generic e-Science framework layer; a Grid layer]
Mission
Effectively reuse existing workflow management systems, and provide a generic e-Science framework for different application domains. A generic framework can:
• Improve the reuse of workflow components and of workflows across different experiments
• Reduce the cost of learning different systems
• Allow application users to work in a consistent environment when the underlying infrastructure changes
Previous work: the VLAM-G environment
• VLAM-G
  • A Grid-enabled PSE
  • Data-intensive applications
  • Visual interface
  • Two levels of workflow support
  • Human interaction support
VLAM-G: PFT and Study
• Process-Flow Template (PFT)
  • Graphical representation of the data elements and processing steps in an experimental procedure.
• Study
  • Description of the experimental steps, represented as an instance of a PFT with references to experiment topologies.
• Experiment topology
  • Graphical representation of self-contained data processing modules attached to each other in a workflow.
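The relationship between the three concepts can be sketched as a small data model: a Study instantiates a PFT and attaches an experiment topology to each of its steps. The Python classes and field names below are hypothetical and only mirror the definitions above; they are not VLAM-G's actual representation, which is graphical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class ExperimentTopology:
    """Self-contained processing modules and the links between them."""
    modules: List[str]
    links: List[Tuple[str, str]] = field(default_factory=list)  # (source, target)

@dataclass
class ProcessFlowTemplate:
    """Ordered steps of an experimental procedure."""
    steps: List[str]

@dataclass
class Study:
    """An instance of a PFT; each step may reference an experiment topology."""
    template: ProcessFlowTemplate
    topologies: Dict[str, ExperimentTopology] = field(default_factory=dict)

    def bind(self, step: str, topology: ExperimentTopology) -> None:
        # Only steps declared in the template can be bound to a topology,
        # and every link must connect modules that exist in the topology.
        if step not in self.template.steps:
            raise KeyError(f"{step!r} is not a step of the template")
        for source, target in topology.links:
            if source not in topology.modules or target not in topology.modules:
                raise ValueError(f"link {source}->{target} uses an unknown module")
        self.topologies[step] = topology

    def unbound_steps(self) -> List[str]:
        return [s for s in self.template.steps if s not in self.topologies]

if __name__ == "__main__":
    study = Study(ProcessFlowTemplate(["acquire", "analyze", "report"]))
    study.bind("analyze", ExperimentTopology(
        ["reader", "filter", "viewer"], [("reader", "filter"), ("filter", "viewer")]))
    print(study.unbound_steps())  # ['acquire', 'report']
```

Separating the template from its instances lets one procedure be reused across several studies, each wired to different processing modules.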
Lessons learned
• How should a new PSE be introduced to a domain scientist?
  • Because it has a beautiful architecture?
  • Or because it allows scientists to keep their current working style?
• How can existing work be reused?
  • Do scientists need one system, or more options?
• How can the user be included in the computing loop?
  • Dynamic workflows and human-in-the-loop computing are important.
Z. Zhao et al., “Scientific workflow management: between generality and applicability”, QSIC 2005, Australia
Workflow support in VL-e
• Recommend suitable workflow systems for different application domains:
  • Analyze typical application use cases
  • Define small projects with different application domains
  • Review existing workflow systems
  • Recommend four workflow systems: Triana, Taverna, Kepler, and VLAM-G
• In the long term:
  • Extend VLAM-G and develop our own generic workflow framework
A workflow bus paradigm
[Figure: a workflow composed of sub-workflows 1, 2, and 3, executed by Triana, Taverna, and Kepler and connected through a workflow bus]
A workflow bus is a special workflow system for executing meta-workflows, in which sub-workflows are executed by different engines.
Z. Zhao et al., “Workflow bus for e-Science”, in IEEE Int’l Conf. e-Science 2006, Amsterdam
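The bus idea can be sketched as a thin dispatcher: a meta-workflow lists sub-workflows together with the engine that should run each one, and the bus forwards data from one sub-workflow to the next. In this minimal Python sketch the `EngineAdapter` interface and the `FakeEngine` stand-ins are hypothetical; they do not call Taverna, Triana, or Kepler, and the meta-workflow is a simple chain rather than an arbitrary graph.

```python
from typing import Callable, Dict, List, Protocol, Tuple

Data = Dict[str, object]

class EngineAdapter(Protocol):
    """Hypothetical interface the bus expects from every engine wrapper."""
    def execute(self, sub_workflow: str, data: Data) -> Data: ...

class FakeEngine:
    """Stand-in for a real engine: logs the call and applies a local step."""
    def __init__(self, name: str, step: Callable[[Data], Data]) -> None:
        self.name = name
        self.step = step
        self.log: List[str] = []

    def execute(self, sub_workflow: str, data: Data) -> Data:
        self.log.append(sub_workflow)
        return self.step(data)

class WorkflowBus:
    """Runs a meta-workflow by delegating each sub-workflow to an engine."""
    def __init__(self, engines: Dict[str, EngineAdapter]) -> None:
        self.engines = engines

    def run(self, meta_workflow: List[Tuple[str, str]], data: Data) -> Data:
        # meta_workflow is an ordered list of (engine name, sub-workflow id);
        # the output of each sub-workflow becomes the input of the next.
        for engine_name, sub_workflow in meta_workflow:
            data = self.engines[engine_name].execute(sub_workflow, data)
        return data

if __name__ == "__main__":
    bus = WorkflowBus({
        "taverna": FakeEngine("taverna", lambda d: {**d, "seq": str(d["seq"]).upper()}),
        "triana": FakeEngine("triana", lambda d: {**d, "length": len(str(d["seq"]))}),
    })
    print(bus.run([("taverna", "sub1"), ("triana", "sub2")], {"seq": "acgt"}))
    # {'seq': 'ACGT', 'length': 4}
```

Because the bus only depends on the adapter interface, adding another engine means writing one more wrapper rather than changing the meta-workflow logic.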
Applications of the workflow bus
• Use case 1:
  • A user has a workflow in Taverna
  • Some functionality is missing in Taverna but can be provided by Triana
  • The user can develop the workflow across both systems and run it via the workflow bus
• Use case 2:
  • A user wants to execute a Taverna or Triana workflow as multiple instances with different input data
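Use case 2 amounts to a parameter sweep: the same workflow is launched once per input set. A minimal Python sketch using the standard `concurrent.futures` module is shown below; `run_workflow` is a hypothetical callable standing in for whatever actually submits one workflow instance, and threads are used on the assumption that each instance mostly waits on a remote engine.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

def sweep(run_workflow: Callable[[Dict[str, float]], float],
          inputs: List[Dict[str, float]],
          max_workers: int = 4) -> List[float]:
    """Run the same workflow once per input set, concurrently.

    Results are returned in the order of `inputs`: Executor.map preserves
    input order even if the instances finish out of order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_workflow, inputs))

if __name__ == "__main__":
    # Hypothetical workflow; in practice this would hand the inputs to a
    # Taverna or Triana instance (e.g. via the bus) and wait for the result.
    def toy_workflow(params: Dict[str, float]) -> float:
        return params["rate"] * params["time"]

    grid = [{"rate": r, "time": 10.0} for r in (0.1, 0.2, 0.5)]
    print(sweep(toy_workflow, grid))
```

Keeping results in input order makes it straightforward to pair each output with the parameters that produced it when comparing runs.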
Ongoing research
• Web services in data-intensive applications
• Execution models for Grid workflows
• Including PSEs in scientific workflows
• Industrial standards in scientific workflows
Relevance between our research and Elsevier’s work
• Both address the same context: the entire lifecycle of e-Science experiments
• Different focuses:
  • We focus on the runtime behavior of scientific experiments, e.g., Grid computing, data- and computing-intensive applications, and the scheduling of computing tasks
  • Elsevier emphasizes data search and integration over well-structured databases, research preparation, and literature search and management
Relevance between our research and Elsevier’s work (cont.)
• Different workflow characteristics:
  • In our workflows, processing and managing dynamic runtime data is the key pattern
  • In Elsevier’s workflows, storing, replicating, accessing, matching, and integrating static data may be more common
• Similar challenges:
  • Semantics-based data search and integration
  • Workflow provenance
  • Collaborative interaction (workflow development, resource sharing, knowledge transfer)
  • Modeling user profiles
Activities
• Int’l workshop on “Workflow systems in e-Science”, organized by Zhiming Zhao and Adam Belloum, in the context of ICCS 2006, University of Reading, May 28, 2006.
  • The proceedings are published in LNCS, Springer-Verlag.
  • A special issue will be published in the Scientific Programming journal.
  • http://staff.science.uva.nl/~zhiming/iccs-wses
• Workshop on “Scientific workflows and industrial workflow standards in e-Science”, organized by Adam Belloum and Zhiming Zhao, in the context of the IEEE e-Science and Grid Computing conference, Amsterdam, December 2006.
  • Pegasus: Dr. Ewa Deelman (Department of Computer Science, University of Southern California)
  • BPEL: Dr. Dieter König (IBM Research Germany Development Laboratory)
  • Kepler: Dr. Bertram Ludäscher (Department of Computer Science, University of California, Davis)
  • Taverna: Prof. Peter Rice (European Bioinformatics Institute)
  • WS and semantic issues: Dr. Steve Ross-Talbot (CEO and co-founder of Pi4 Technologies)
  • Triana: Dr. Ian J. Taylor (Department of Computer Science, Cardiff University)
  • http://staff.science.uva.nl/~adam/workshop/VL-e-workshop.htm
References
• Virtual Laboratory for e-Science: www.vl-e.nl
• System and Network Engineering, Faculty of Science, University of Amsterdam: http://www.science.uva.nl/research/sne/
• Z. Zhao, A. Belloum, H. Yakali, P.M.A. Sloot, and L.O. Hertzberger: “Dynamic Workflow in a Grid Enabled Problem Solving Environment”, in Proceedings of the 5th International Conference on Computer and Information Technology (CIT 2005), pp. 339-345. IEEE Computer Society Press, Shanghai, China, September 2005.
• Z. Zhao, A. Belloum, A. Wibisono, F. Terpstra, P.T. de Boer, P.M.A. Sloot, and L.O. Hertzberger: “Scientific workflow management: between generality and applicability”, in Proceedings of the International Workshop on Grid and Peer-to-Peer based Workflows, in conjunction with the 5th International Conference on Quality Software, pp. 357-364. IEEE Computer Society Press, Melbourne, Australia, September 19-21, 2005.
• Z. Zhao, A. Belloum, P.M.A. Sloot, and L.O. Hertzberger: “Agent technology and scientific workflow management in an e-Science environment”, in Proceedings of the 17th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2005), pp. 19-23. IEEE Computer Society Press, Hong Kong, China, November 14-16, 2005.