
Scientific Workflows in e-Science

Scientific Workflows in e-Science. Dr Zhiming Zhao (zhiming@science.uva.nl), System and Network Engineering, University of Amsterdam; Virtual Laboratory for e-Science. Outline: background; scientific workflow management system; Virtual Laboratory for e-Science; our approach.



  1. Scientific Workflows in e-Science. Dr Zhiming Zhao (zhiming@science.uva.nl), System and Network Engineering, University of Amsterdam; Virtual Laboratory for e-Science

  2. Outline • Background • Scientific workflow management system • Virtual Laboratory for e-Science • Our approach • Challenges and research lines • Activities

  3. Problem solving: a typical scenario in scientific research. Diagram labels: Data analysis; Define problems; Experiments; Discovery • Analysis • Hypothesis • Related work • Propose experiments • Define steps • Prototype computing systems • Perform experiments • Data collection • Presentation • Dissemination • Visualization • Validation • Adjust experiment • Refine hypothesis • Activities: • Are iterative, dynamic, and human-centered • Require different levels of resources

  4. Example scenarios • In problem analysis • Identify domains, search for key problems, find typical methods, and review related work • In scientific experiments: scientific computing & data processing • Define dependencies between computing and data processing tasks, and schedule their runtime behavior • In data analysis • Visualize results, compare the results of different parameters, keep meaningful configurations, and continue experiments • Search related work, compare results • In dissemination • Document experiments, present results, cite, and publish
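
The "define dependencies between tasks and schedule them" step above can be illustrated with a minimal sketch. It assumes Python 3.9+ and uses the standard-library `graphlib` module; the task names are hypothetical and are not taken from the presentation.

```python
from graphlib import TopologicalSorter

# Hypothetical experiment: each task maps to the set of tasks it depends on.
dependencies = {
    "collect_data": set(),
    "clean_data": {"collect_data"},
    "run_simulation": {"clean_data"},
    "compute_statistics": {"clean_data"},
    "visualize": {"run_simulation", "compute_statistics"},
}


def schedule(deps):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = schedule(dependencies)
print(order)
```

Any order in which each task appears after all of its prerequisites is valid; a real scheduler would additionally decide where and when each task runs.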

  5. Computer support for problem solving • Problem Solving Environment (PSE) (E. Gallopoulos et al., IEEE CS Eng. 1994): • Organizes different software components/tools • Allows a user to assemble these tools at a high level of abstraction • Controls the runtime behavior of experiments • Examples: MATLAB, Ptolemy, etc. • Traditional PSE: organizes and executes resources locally! • Scientific workflow management systems: a new guise of PSE! Diagram labels: Distributed data sharing & dissemination; Distributed resources; Distributed parallel computing; Visualization, remote resource invocation

  6. Inside a Scientific Workflow Management System (SWMS) In our view, an SWMS implements at least: • A model for describing workflows; • An engine for executing/managing workflows; • Different levels of support for a user to compose, execute, and control a workflow. Diagram labels: Workflow (based on a certain model); Composition; An SWMS; User support; Engine-level control; Engine; Resource-level control; Resources
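
The three ingredients listed above (a workflow model, an engine, and user-level control) can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the design of any system named in this presentation: the `Task`, `Workflow`, and `Engine` classes and the `on_step` callback are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Task:
    """Model: one processing step plus the names of the tasks it consumes."""
    name: str
    run: Callable[[Dict[str, object]], object]
    inputs: List[str] = field(default_factory=list)


@dataclass
class Workflow:
    """Model: a named collection of tasks linked by their inputs."""
    tasks: Dict[str, Task] = field(default_factory=dict)

    def add(self, task: Task) -> "Workflow":
        self.tasks[task.name] = task
        return self


class Engine:
    """Engine: runs every task once all of its inputs are available."""

    def __init__(self, on_step: Optional[Callable[[str, object], None]] = None):
        # User support (hypothetical hook): called after each task finishes.
        self.on_step = on_step

    def execute(self, workflow: Workflow) -> Dict[str, object]:
        results: Dict[str, object] = {}
        pending = dict(workflow.tasks)
        while pending:
            ready = [t for t in pending.values()
                     if all(i in results for i in t.inputs)]
            if not ready:
                raise ValueError("cycle or missing input among: "
                                 + ", ".join(sorted(pending)))
            for task in ready:
                results[task.name] = task.run({i: results[i] for i in task.inputs})
                if self.on_step:
                    self.on_step(task.name, results[task.name])
                del pending[task.name]
        return results


log = []
wf = (Workflow()
      .add(Task("load", lambda _: [1, 2, 3, 4]))
      .add(Task("square", lambda r: [x * x for x in r["load"]], ["load"]))
      .add(Task("total", lambda r: sum(r["square"]), ["square"])))
results = Engine(on_step=lambda name, value: log.append(name)).execute(wf)
print(results["total"])
```

The callback stands in for the "different levels of support" a user needs: a monitoring view could log each step, while a human-in-the-loop variant could pause or adjust parameters between steps.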

  7. Scientific Workflows in e-Science • Experiment processes • Workflows for administration, e.g., AAA, and other issues. Workflows vary across different: • Phases of experiments: design, runtime control, dissemination; • Abstractions of resources: concrete and abstract; • Levels of activity detail: computing, data access, search/matching, human activities; • … Diagram labels: Abstract workflows; Executable (concrete) workflows
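
One way to picture the abstract/concrete distinction above: an abstract workflow names what should be done, and a concrete workflow binds each step to a resource that can do it. The sketch below is an assumption-laden illustration; the catalogue, host names, and the "first endpoint wins" policy are all hypothetical.

```python
# Hypothetical resource catalogue: logical step name -> endpoints offering it.
CATALOGUE = {
    "sequence_alignment": ["grid-node-a.example.org", "grid-node-b.example.org"],
    "visualization": ["viz-server.example.org"],
}


def concretize(abstract_steps, catalogue):
    """Bind each abstract step to the first endpoint that offers it."""
    concrete = []
    for step in abstract_steps:
        endpoints = catalogue.get(step)
        if not endpoints:
            raise LookupError(f"no resource provides {step!r}")
        concrete.append((step, endpoints[0]))
    return concrete


plan = concretize(["sequence_alignment", "visualization"], CATALOGUE)
print(plan)
```

A real mapper would choose among endpoints using load, data locality, or policy rather than list position.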

  8. Diversity in SWMS • Taverna: • Web-services-based language: Scufl • Engine: FreeFluo • Graphical visualization of workflows • Triana: • Components • Task graph • Data/control flow • Kepler: • Actor, director • MoML • Execution models • Pegasus: • Based on DAGMan • VDL • DAG … • DAGMan: • Computing tasks • DAG

  9. Virtual Laboratory for e-Science. Diagram labels: application domains (Food informatics; Dutch telescience; Medical diagnosis; Biodiversity; Data-intensive science; Bioinformatics ASP); layers (Application layer; Generic e-Science framework layer; Grid layer)

  10. Mission: effectively reuse existing workflow management systems, and provide a generic e-Science framework for different application domains. A generic framework can • Improve the reuse of workflow components and of workflows across different experiments • Reduce the cost of learning different systems • Allow application users to work in a consistent environment when the underlying infrastructure changes

  11. Previous work: VLAM-G environment • VLAM-G • A Grid-enabled PSE • Data-intensive applications • Visual interface • Two levels of workflow support • Human interaction support

  12. Workflow in VLAM-G

  13. VLAM-G PFT/Study • Process-Flow Template (PFT) • Graphical representation of data elements and processing steps in an experimental procedure. • Study • Descriptions of experimental steps represented as an instance of a PFT with references to experiment topologies. • Experiment Topology • Graphical representation of self-contained data processing modules attached to each other in a workflow.

  14. Lessons learned • How to introduce a new PSE to a domain scientist? • Because it has a beautiful architecture? • Or because it allows scientists to keep their current work style? • How to use existing work? • Do scientists need one system or more options? • How to include the user in the computing loop? • Dynamic workflows and human-in-the-loop computing are important. Z. Zhao et al., “Scientific workflow management: between generality and applicability”, QSIC 2005, Australia

  15. Workflow support in VL-e • Recommend suitable workflow systems for different application domains: • Analyze typical application use cases • Define small projects with different application domains • Review existing workflow systems • Recommend four workflow systems: Triana, Taverna, Kepler, and VLAM-G • In the long term: • Extend VLAM-G and develop our own generic workflow framework

  16. A workflow bus paradigm • A workflow bus is a special workflow system for executing meta workflows, in which sub workflows are executed by different engines. Diagram labels: Workflow; Sub workflow 1, Sub workflow 2, Sub workflow 3; Triana, Taverna, Kepler; Workflow bus. Z. Zhao et al., “Workflow bus for e-Science”, in IEEE Int’l Conf. e-Science 2006, Amsterdam
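
The idea of a meta workflow whose sub workflows run on different engines can be sketched as a dispatcher over engine adapters. This is a hedged toy, not the architecture from the cited paper: `WorkflowBus`, the adapter signature, and the sequential chaining of sub workflows are assumptions, and the adapters are stand-ins that do not call the real Taverna or Triana APIs.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical adapter signature: (sub_workflow_id, input_data) -> output_data
EngineAdapter = Callable[[str, list], list]


class WorkflowBus:
    """Dispatch each sub workflow of a meta workflow to its named engine."""

    def __init__(self) -> None:
        self._engines: Dict[str, EngineAdapter] = {}

    def register(self, engine_name: str, adapter: EngineAdapter) -> None:
        self._engines[engine_name] = adapter

    def run(self, meta_workflow: List[Tuple[str, str]], data: list) -> list:
        # Assumption: sub workflows form a simple chain, each consuming
        # the output of the previous one.
        for engine_name, sub_workflow in meta_workflow:
            if engine_name not in self._engines:
                raise KeyError(f"no adapter registered for {engine_name!r}")
            data = self._engines[engine_name](sub_workflow, data)
        return data


def make_stub(engine_name: str) -> EngineAdapter:
    """Stand-in adapter that only records which engine ran which sub workflow."""
    return lambda sub_workflow, data: data + [f"{engine_name}:{sub_workflow}"]


bus = WorkflowBus()
bus.register("taverna", make_stub("taverna"))
bus.register("triana", make_stub("triana"))

trace = bus.run([("taverna", "fetch_sequences"), ("triana", "signal_filter")], [])
print(trace)
```

In practice each adapter would translate the sub workflow into the target engine's own description format and handle data conversion between engines.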

  17. Applications of the workflow bus • Use case 1: • A user has a workflow in Taverna • Some functionality is missing in Taverna but can be provided by Triana • The user can develop the workflow across the two systems and run it via the workflow bus • Use case 2: • A user wants to execute a Taverna or Triana workflow in multiple instances with different input data
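
Use case 2 (one workflow, many instances, different input data) maps naturally onto a pool of workers. The sketch below uses Python's standard `concurrent.futures`; `run_workflow` is a hypothetical stand-in for submitting one workflow instance to an engine.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflow(input_data):
    """Stand-in for one workflow instance: here, a sum of squares."""
    return sum(x * x for x in input_data)


def run_instances(workflow, inputs, max_workers=4):
    """Run the same workflow once per input set; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(workflow, inputs))


outputs = run_instances(run_workflow, [[1, 2], [3, 4], [5]])
print(outputs)  # [5, 25, 25]
```

Threads suit instances that mostly wait on remote engines; CPU-bound local work would call for processes or Grid job submission instead.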

  18. Ongoing research • Web services in data-intensive applications • Execution models for Grid workflows • Including PSEs in scientific workflows • Industrial standards in scientific workflows

  19. Relevance between our research and Elsevier’s work • Same context: the entire lifecycle of e-Science experiments • Different focuses • We focus on the runtime behavior of scientific experiments, e.g., Grid computing, data/computing-intensive applications, and scheduling of computing tasks • Elsevier highlights data search and integration on well-structured databases, research preparation, and literature search and management

  20. Cont. • Different characteristics in workflows • In our workflows, processing and managing dynamic runtime data are the key patterns • In Elsevier workflows, storing, replicating, accessing, matching, and integrating static data might be more common • Facing similar challenges: • Semantics-based data search and integration • Workflow provenance • Collaborative interaction (workflow development, resource sharing, knowledge transfer) • Modeling user profiles

  21. Activities • Int’l workshop on “Workflow systems in e-Science”, organized by Zhiming Zhao and Adam Belloum, in the context of ICCS06, Reading University, May 28, 2006. • Proceedings are published in LNCS, Springer-Verlag. • A special issue will be published in Scientific Programming Journal. • http://staff.science.uva.nl/~zhiming/iccs-wses • Workshop on “Scientific workflows and industrial workflow standards in e-Science”, organized by Adam Belloum and Zhiming Zhao, in the context of the IEEE e-Science and Grid computing conference in Amsterdam, December 2006. Speakers: • Pegasus, Dr. Ewa Deelman (Department of Computer Science, University of Southern California) • BPEL, Dr. Dieter König (IBM Research Germany Development Laboratory) • Kepler, Dr. Bertram Ludäscher (Department of Computer Science, University of California, Davis) • Taverna, Prof. Peter Rice (European Bioinformatics Institute) • WS and semantic issues, Dr. Steve Ross-Talbot (CEO and co-founder of Pi4 Technologies) • Triana, Dr. Ian J. Taylor (Department of Computer Science, Cardiff University) • http://staff.science.uva.nl/~adam/workshop/VL-e-workshop.htm

  22. References • Virtual Laboratory for e-Science: www.vl-e.nl • System and Network Engineering, Faculty of Science, University of Amsterdam: http://www.science.uva.nl/research/sne/ • Z. Zhao, A. Belloum, H. Yakali, P.M.A. Sloot and L.O. Hertzberger: Dynamic Workflow in a Grid Enabled Problem Solving Environment, in Proceedings of the 5th International Conference on Computer and Information Technology (CIT2005), pp. 339-345. IEEE Computer Society Press, Shanghai, China, September 2005. • Z. Zhao, A. Belloum, A. Wibisono, F. Terpstra, P.T. de Boer, P.M.A. Sloot and L.O. Hertzberger: Scientific workflow management: between generality and applicability, in Proceedings of the International Workshop on Grid and Peer-to-Peer based Workflows in conjunction with the 5th International Conference on Quality Software, pp. 357-364. IEEE Computer Society Press, Melbourne, Australia, September 19th-21st 2005. • Z. Zhao, A. Belloum, P.M.A. Sloot and L.O. Hertzberger: Agent technology and scientific workflow management in an e-Science environment, in Proceedings of the 17th IEEE International Conference on Tools with Artificial Intelligence (ICTAI05), pp. 19-23. IEEE Computer Society Press, Hong Kong, China, November 14th-16th 2005.
