This overview explores the various workflow tools in the EGEE Grid infrastructure, from low-level tools to complete environments and projects. It covers core tools, job management with workflow semantics, pure workflow tools, and integrated environments. The perspectives and future plans for these tools are also discussed.
Workflow Tools in the EGEE Grid Infrastructure: Overview and Perspectives • Vangelis Floros, GRNET/NCSR “Demokritos” • efloros@cern.ch • EGEE’07, Budapest, Hungary, October 1st, 2007
Contents • Introduction • Grid Workflows in context • Survey of workflow tools, from low-level tools, to more advanced tools, to complete environments and projects • Core (gLite WMS) • Job management supporting workflow semantics (GridWay, GANGA/DIANE) • Pure workflow tools (P-GRADE, MOTEUR, +Taverna) • Integrated environments/projects (K-Wf Grid, A-Ware, gCube) • Conclusions • Summary matrix • Perspectives • Sources: development teams and tool/project web sites.
Acknowledgments • Many thanks to: • Alessandro Maraschini, Francesco Giacomini, Simone Pellegrini (CNAF) - gLite WMS • Gergely Sipos, Péter Kacsuk (MTA-SZTAKI) - P-GRADE • Johan Montagnat, T. Glatard, D. Lingrand (CNRS) - MOTEUR • Ladislav Hluchý, Viet Tran (SAS) - K-Wf Grid • Jakub Moscicki (CERN) - GANGA/DIANE • Nicola Venuti (NICE) - A-Ware • Monique Petitdidier (IPSL) and David Weissenbach (IPGP) • Please email me about any errors or omissions you may spot.
Grid Workflows • Grid Workflows = Scientific workflows • Exploit one or more Grid e-Infrastructures • Mostly use the job as the abstraction for the unit of execution • Different requirements per scientific discipline (can one tool satisfy all of them?) • Support for multiple levels of parallelization (within the applications, parallel tasks, concurrent parametric execution) • Data-oriented: stronger data semantics than control-flow semantics • Interactivity • Facilitate rigorous research: reproducibility and verifiability • Monitoring (especially for long-running workflows) • Ease collaboration • Interest for NA4: beneficial for a large number of applications
gLite WMS • Developed by INFN, Datamat, CESNET • Exploits Condor’s DAGMan capabilities • JDL/Classads for workflow description • Accessed from gLite UI command line tools. • Underlying Grid Middleware used: Condor, Globus • Used by various small and large applications from various disciplines that require • Basic workflow support • Parametric job execution • But also for bypassing other restrictions (limited proxy lifetime) • Basis for high-level tools providing abstractions and user friendly functionality.
JDL Workflows • Implemented as Directed Acyclic Graphs (DAGs) • A set of jobs where the input, output or execution of one or more jobs may depend on one or more other jobs • Dependencies represent time constraints: a child cannot start before all of its parents have successfully completed • [Figure: example DAG with a father node and nodes nodeA to nodeI]
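The dependency rule above (a child starts only after all of its parents have completed) can be sketched in a few lines of Python. This is only an illustration of the DAG semantics, not gLite WMS or DAGMan code; the function name `execution_order` and the edges between the nodes are assumptions, since the edges of the slide's figure are not recoverable.

```python
from collections import deque


def execution_order(dependencies):
    """Return one valid start order for a DAG given as {child: [parents]}."""
    # Collect every node that appears as a child or as a parent.
    nodes = set(dependencies)
    for parents in dependencies.values():
        nodes.update(parents)
    # For each node, the set of parents that have not completed yet.
    remaining = {n: set(dependencies.get(n, ())) for n in nodes}
    ready = deque(sorted(n for n, parents in remaining.items() if not parents))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        # Completing a node may release children that waited only for it.
        for child in sorted(remaining):
            if node in remaining[child]:
                remaining[child].discard(node)
                if not remaining[child]:
                    ready.append(child)
    if len(order) != len(nodes):
        raise ValueError("dependency cycle: the graph is not a DAG")
    return order


# Hypothetical edges: nodeD needs both nodeB and nodeC, which both need nodeA.
example_dependencies = {
    "nodeB": ["nodeA"],
    "nodeC": ["nodeA"],
    "nodeD": ["nodeB", "nodeC"],
}
example_order = execution_order(example_dependencies)
```

A cyclic dependency set is rejected, which mirrors the restriction that these workflows must be acyclic.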
gLite WMS - Future plans • Possible “external integration” with existing workflow frameworks • A proposal for a Workflow Management System integrated within WMS is under discussion • Running on top of gLite middleware • Abstract and generic representation of workflows • Internal use of a Petri net model • External translation mechanisms from different language front ends • More info: http://egee-jra1-wm.mi.infn.it/egee-jra1-wm/
GANGA/DIANE • GANGA is a user interface that simplifies job submission and monitoring. It is a convenience layer based on Python that hides the complexity of the underlying system. • DIANE (Distributed Analysis Environment) • A lightweight framework for parallel scientific applications in the master-worker model • Assumes that a job can be split into a number of independent tasks • Takes care of all synchronization, communication and workflow management details on behalf of the application • Developed by CERN/IT.
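The master-worker model described above can be illustrated with Python's standard library. This sketch does not use the GANGA or DIANE APIs; `run_master_worker` and `square` are hypothetical names, and a local thread pool stands in for Grid worker agents.

```python
from concurrent.futures import ThreadPoolExecutor


def run_master_worker(tasks, worker, n_workers=4):
    """Master: hand independent tasks to a pool of workers, collect results in task order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each idle worker pulls the next queued task, so faster workers process more tasks.
        return list(pool.map(worker, tasks))


def square(x):
    """Stand-in for one independent unit of application work."""
    return x * x


results = run_master_worker(range(5), square)
```

Because the tasks are independent, the master needs no knowledge of the application beyond how to split the job and how to merge the results.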
Success Stories • Numerous applications have taken advantage of the framework: • The Avian Flu Drug Search is an example of a comprehensive bio-informatics system • It includes web portals (based on ASGC products) and execution engines (based on Ganga/DIANE). • It is meant to be an environment supporting bio-informatics workflows. http://www.twgrid.org/Application/Bioinformatics/AvainFlu-GAP • Microscopic Image Alignment: xmipp is an example of an iterative application: • Each iteration is a set of independent tasks which are executed in the master/worker (M/W) self-scheduling regime. • The output of the previous iteration is used to generate the input for the next iteration. • The processing is handled by the DIANE master agent, which controls the convergence criteria (stop condition) and task synchronization at the iteration boundary. http://indico.cern.ch/contributionDisplay.py?contribId=8&sessionId=24&confId=7247
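The iterative pattern described for xmipp (independent tasks per iteration, a barrier at the iteration boundary, and a master-side stop condition) might look roughly like the sketch below. It is not xmipp or DIANE code: a Newton step for square roots is used as a hypothetical stand-in for the per-task computation, and the function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def newton_step(args):
    """One independent task: refine the square-root estimate of a positive target."""
    target, guess = args
    return 0.5 * (guess + target / guess)


def iterate_until_converged(targets, tol=1e-12, max_iter=100):
    """Master loop: run one task per target, then test the stop condition."""
    guesses = [1.0 for _ in targets]
    with ThreadPoolExecutor(max_workers=4) as pool:
        for iteration in range(1, max_iter + 1):
            # pool.map returns only when every task is done: the iteration boundary.
            new_guesses = list(pool.map(newton_step, zip(targets, guesses)))
            change = max(abs(a - b) for a, b in zip(new_guesses, guesses))
            # The output of this iteration becomes the input of the next one.
            guesses = new_guesses
            if change < tol:
                return guesses, iteration
    raise RuntimeError("stop condition not reached within max_iter iterations")


roots, iterations_used = iterate_until_converged([4.0, 9.0])
```

The sketch assumes a non-empty list of positive targets; the convergence test runs only in the master, as in the description above.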
Summary • Not a pure workflow solution, but it can be used as a basis for building engines and setting up ad-hoc workflows of applications. • RoboGanga: Ganga Robot is a tool for running a user-defined list of actions within the context of a Ganga session. • GANGA: http://cern.ch/ganga • DIANE: http://cern.ch/diane
GridWay • GridWay Metascheduler: enables large-scale, reliable and efficient sharing of computing resources (clusters, computing farms, servers, supercomputers...) managed by different LRM (Local Resource Management) systems, such as PBS, SGE, LSF, Condor..., within a single organization (enterprise grid) or scattered across several administrative domains. • Developed by the Distributed Systems Architecture Group at Universidad Complutense de Madrid. • GridWay joined the dev.globus incubation process in May 2006 and, in January 2007, became the first project ever to be promoted to a full Globus project.
Functionality • A. Scheduling Capabilities • A.1. Dynamic Scheduling • A.2. Opportunistic Migration • A.3. Performance Slowdown Detection • A.4. Self-adaptive Applications • A.5. Checkpointing Support • A.6. Scheduling Policies • A.7. Scheduling Reporting and Accounting • A.8. Job Dependency • A.9. Single and Multi-user Support • B. Fault Detection & Recovery Capabilities • B.1. Job Cancellation • B.2. Remote System Crash or Outage • B.3. Network Disconnection • B.4. Client Fault Tolerance • C. User Interface Functionality • C.1. Broad Application Scope • C.2. DRM Command Line Interface • C.3. DRMAA Application Programming Interface • D. Installation & Configuration Issues • D.1. Modular Architecture • D.2. Requirements on Core Grid Services • D.3. Supported Remote Services • D.4. Supported Client Platforms • D.5. Decentralized Architecture • D.6. Security • D.7. Meta-scheduling Infrastructure Scenarios
Success stories • Life Sciences • Computational Proteomics at Centro de Astrobiología in Instituto Nacional de Técnica Aeroespacial, associated with the NASA Astrobiology Institute • Dynamic BLAST (presentation at GlobusWorld06) at Collaborative Computing Lab in University of Alabama at Birmingham • Multi-Resolution Docking at Centro de Investigaciones Biológicas in Centro Superior de Investigaciones Científicas • CD-HIT at Centro Nacional de Investigaciones Oncológicas in Instituto de Salud Carlos III • Surface EMG Signal Simulation at Centro di Bioingegneria in Politecnico di Torino • Aerospace • XMM-Newton Data Processing at European Space Astronomy Centre in European Space Agency • Mars Impact Cratering Simulation at Centro de Astrobiología in Instituto Nacional de Técnica Aeroespacial, associated with the NASA Astrobiology Institute • Characterizing the Local Instability of Orbits at Nonlinear Dynamics and Chaos Group in Universidad Rey Juan Carlos • Fusion Physics • Massive Ray Tracing in Fusion Plasmas at Laboratorio Nacional de Fusión in Centro de Investigaciones Energéticas Medioambientales y Tecnológicas • Computational Chemistry • Chemical Reactor Design at Centre for Process and Information Systems Engineering in University of Surrey • GAMESS and GAUSSIAN at Grupo de Química Computacional y Computación de Alto Rendimiento in Universidad de Castilla-La Mancha • Generic • Genetic Algorithms
GridWay in EGEE • GridWay is being successfully used in the NA4 Fusion activities. • GridWay is being tested under ETICS, the eInfrastructure for Testing, Integration and Configuration of Software of the EGEE project.
GridWay: Conclusions • Same level of workflow support as gLite WMS • Added-value functionality with respect to job management capabilities • Commitment to open standards • One of the NA4 recommended applications (RESPECT program) • More information: http://www.gridway.org
P-GRADE • Developed by MTA SZTAKI • Built-in graphical editor to develop Grid workflows and workflow-based parametric study applications • Built-in workflow subsystem based on Condor DAGMan • Executes application components on Grid systems based on LCG-2, gLite, Globus 2, Globus 4 and ARC middleware, transparently to the user
P-GRADE portal in a nutshell • Certificate and proxy management • Grid and Grid resource management • Graphical editor to define workflows and parametric studies • Accessing resources in multiple VOs • Built-in workflow manager and execution visualization • GUI is customizable to certain applications
Elements of a P-GRADE Portal workflow • A directed acyclic graph where • Nodes represent jobs that can be • A batch program submitted from the client side to a Computing Element • A batch program installed on a Computing Element or stored on the Portal server • Ports represent input/output files the jobs require or produce • Arcs represent file transfer operations and dependencies among jobs • Semantics of the workflow: • A job can be executed if all of its input files are available
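The execution rule on this slide (a job can run once all of its input files are available) amounts to data-driven scheduling. The sketch below illustrates that rule in plain Python; it is not P-GRADE Portal code, and the job and file names are hypothetical.

```python
def run_workflow(jobs, initial_files):
    """Fire jobs in the order allowed by file availability.

    jobs maps a job name to (input_files, output_files); the returned list
    is the order in which the jobs became runnable.
    """
    available = set(initial_files)
    pending = dict(jobs)
    fired = []
    while pending:
        runnable = sorted(
            name for name, (inputs, _) in pending.items()
            if set(inputs) <= available
        )
        if not runnable:
            raise ValueError(f"jobs with unavailable inputs: {sorted(pending)}")
        for name in runnable:
            _, outputs = pending.pop(name)
            # Output ports produce files that may enable downstream jobs.
            available.update(outputs)
            fired.append(name)
    return fired


# Hypothetical three-job chain: prepare -> simulate -> visualize.
example_jobs = {
    "visualize": (["field.dat"], ["plot.png"]),
    "simulate": (["grid.dat"], ["field.dat"]),
    "prepare": (["raw.dat"], ["grid.dat"]),
}
firing_order = run_workflow(example_jobs, ["raw.dat"])
```

Jobs listed in the same `runnable` batch have no file dependency on each other, so a real enactor could submit them in parallel.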
Parallel layers of P-GRADE Portal applications • Parallel execution inside a workflow node (MPI job as workflow component): each job can be a parallel program • Parallel execution among workflow nodes (different jobs on different worker nodes): multiple jobs run in parallel • Parameter study execution of the workflow (Single Instruction, Multiple Data): multiple instances of the same workflow process different data files
Applications based on P-GRADE • Ultra-short-range weather forecast (MEANDER): a workflow that integrates 4 meteorological algorithms and one visualizer component • Road traffic simulation: predicts the density of cars on the roads of Manchester; a workflow that integrates 4 simulator components • Minimizing the operational cost of factories and logistic service providers (EMMIL): a parameterized workflow resulting in thousands of short-running jobs • Molecular Dynamics Study of Water Penetration (CHARMM): a parameterized workflow resulting in hundreds of long-running jobs • Studying oscillons and magnetic monopole configurations: a parameterized workflow resulting in hundreds of short-running jobs
Present and future • Benefits • Short learning curve: swift uptake of grid technology • Graphical access: protection against command-line and API changes • High-level, abstract tools: easy to perform complex operations (e.g. file transfer + LFC update) • Lessons learnt • Workflows need parametric study support • Portals must be easily customizable for applications • P-GRADE must be open source • Workflows need loops and if-then-else structures • Job failure rate is sometimes high: failure management layer • More information • P-GRADE Grid Portal and Developer Alliance session, Wednesday 11:00 - 12:30 • P-GRADE Portal Alliance booth • Demo sessions • portal.p-grade.hu • [Slide annotations on the lessons learnt: check marks, “Next release”, “Future work”]
MOTEUR workflow manager • Open source workflow enactor • History • Developed at the I3S CNRS laboratory • With the support of French national projects • AGIR: http://www.agir.org • GWENDIA: http://gwendia.polytech.unice.fr • Targets • Ease of use, flexibility, service-oriented approach • Performance, transparent exploitation of application parallelism • Supports • Scufl language (from myGrid/Taverna): pure data flow approach • Service-based invocation (WS), see also Taverna/Triana/Kepler
Pure data flow • Dynamic data flows • Independent description of processings and data • Dynamic data sets (variable data set sizes, controls conditioned by data availability...) • Loops are possible (as opposed to the DAG approach) • Data composition patterns for rich data workflow semantics • [Figure: a service with inputs A1, A2, A3 and B1, B2, B3 composed one-to-one and all-to-all]
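The two data composition patterns named on this slide can be illustrated with Python's standard library. This is a sketch of the pairing semantics only, not MOTEUR or Scufl code; the function names are assumptions, and the item labels follow the slide's A1..A3 and B1..B3.

```python
from itertools import product


def one_to_one(a_items, b_items):
    """Pair the i-th item of A with the i-th item of B: one invocation per index."""
    return list(zip(a_items, b_items))


def all_to_all(a_items, b_items):
    """Pair every item of A with every item of B: one invocation per combination."""
    return list(product(a_items, b_items))


a_inputs = ["A1", "A2", "A3"]
b_inputs = ["B1", "B2", "B3"]
paired = one_to_one(a_inputs, b_inputs)
combined = all_to_all(a_inputs, b_inputs)
```

With three items on each input, one-to-one yields 3 service invocations and all-to-all yields 9, which is why the choice of pattern strongly affects the number of Grid jobs a workflow generates.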
Application: Bronze standard • ~100 image pairs • ~800 EGEE jobs • [Figure: workflow diagram with inputs A, B and Params; services GetFromEGEE, FormatConv, CrestLines, PFMatchICP, PFRegister, Yasmina, Baladin, WriteResults, MethodToTest and MultiTransfoTest; outputs Accuracy Translation and Accuracy Rotation]
Summary • Service-oriented applications • Flexibility, clear interfaces, independence, composition... • Generic service wrapper: enable legacy code enactment • MOTEUR workflow manager • Data intensive, data composition patterns • Service-based (Web Services, GridRPC) • Handle complex data types • Multi-grid (EGEE, Grid5000) • Rich data composition semantics • Support for data provenance • Research prototype... • More information: http://egee1.unice.fr/MOTEUR (GPL-like licensed code, documentation, tutorials)
K-Wf Grid Project Goals • To enable users to create complex workflows and use grid resources without detailed knowledge of the grid • To construct workflows optimized for the underlying infrastructure, using its advantages and avoiding its bottlenecks • To (semi-)automatically construct workflows based on the user’s requirements, using semantic annotation of services, data, applications and resources • To constantly renew information about the grid by using a complex monitoring network, and so learn from experience • To provide a simple, easy-to-use interface to K-Wf Grid services • K-Wf Grid (Contract No. 511385) is a project funded by the European Commission under the 6th Framework Programme. It started on 1 September 2004 and has a duration of 30 months.
K-Wf Grid: Features • Composition of a workflow from a set of services • The system composes the workflow for you: just tell it what you want to get at the end • The system uses services which are available at the time and which are expected (based on past experience) to provide good results (good = what you want) • Usability • Less grid language, more application domain language • Integrated collaboration interfaces • Reuse of components • K-Wf Grid is based on respected standards • GWorkflowDL language based on Petri net formalization
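Since GWorkflowDL is described as being based on Petri nets, a minimal sketch of the underlying firing rule may help: a transition (an activity) is enabled when each of its input places holds a token, and firing it moves tokens from inputs to outputs. The code below is generic Petri net logic in Python, not GWorkflowDL syntax or K-Wf Grid code; the place and transition names are hypothetical, and each place is assumed to appear at most once per transition.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds at least one token."""
    return all(marking.get(place, 0) >= 1 for place in transition["inputs"])


def fire(marking, transition):
    """Return the new marking after firing; the original marking is left unchanged."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place in transition["inputs"]:
        new_marking[place] -= 1  # consume one token per input place
    for place in transition["outputs"]:
        new_marking[place] = new_marking.get(place, 0) + 1  # produce one token
    return new_marking


# Hypothetical activity: consumes a token for input data, produces one for the result.
simulate = {"inputs": ["input_data"], "outputs": ["result"]}
start_marking = {"input_data": 1}
end_marking = fire(start_marking, simulate)
```

Unlike a DAG description, a net like this can model loops and data-dependent choices, because places can be marked again after they have been consumed.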
Architecture • [Figure: K-Wf Grid architecture diagram]
K-Wf Grid: Pilot applications • Flood Forecasting Simulation Cascade • Chain of simulations, targeting hydraulic simulation of a flooded area • Stateful WSRF services implemented in Java, using Globus Toolkit 4 • Enterprise Resource Planning • Stateless web services in Java (Tomcat/Axis) • Coordinated Traffic Management • Traffic simulation in Genoa, Italy • Developed as stateless web services in Perl
K-Wf Grid - Epilogue • K-Wf Grid is one of the early implementations of SOKU (Service Oriented Knowledge Utilities) concepts • How to adapt infrastructure research to this shift in paradigm? • Application developers & end users need easy access to grid infrastructure • SOKU is the way to achieve this • How to extend gLite towards SOKU? • More information: http://www.kwfgrid.eu
A-WARE Project • Start date: June 1st 2006 • Duration: 2 years • The A-WARE project will develop a stable, supported, commercially exploitable, high-quality technology named A-WARE that gives easy access to Grid resources. • Based on EnginFrame and Unicore/GS • Workflow management of Grid atomic service invocations • Workflow orchestration framework • A Web-based workflow design application • A repository to store workflows and associated metadata • Unicore TSS service for gLite
Architecture • [Figure: A-WARE layered architecture diagram] • Clients: EnginFrame Portal, Gridsphere (JSR 168), other clients (open source) • A-WARE technology components: Workflow Designer Application (WDA), Workflow Repository Service (WRS), Workflow Orchestrator Service (WOS), Workflow Validator / Modeler Service (WVMS), A-WARE Service Bus (ASB) • Technologies labeled in the diagram: BPMN (BPEL), JBI (BPEL), WSRF, OGSA Base Profile • Interfaces: UAS interface, OGSA interface • Middleware: Unicore/GS, GTK4, gLite, other • Non-OGSA middleware: LSF, OS/Storage, other
A-Ware: What’s Next • Now: • first public prototype • Next: • support for more complex BPMN/BPEL workflows • improve both UIs and ASB mediation-tier • improve monitoring and user interaction capabilities • support for gLite middleware • extend Workflow support to other languages/engines (Scufl, XPDL?) • More information: http://www.a-ware-project.eu
gCube PMS • Developed in the context of the DILIGENT project • Follows the SOA paradigm, based on WSRF • Implements a feature-rich Process Management Subsystem • The Process Design & Verification Service • Provides the user with graphical tools for the design and manipulation of process definitions • Verifies that a process obeys certain rules, so that it can be considered safe for execution • The Process Execution & Reliability Service • Responsible for all actions pertaining to the actual execution of compound services, i.e. finding and allocating resources; starting, monitoring or aborting processes; and handling process execution failures in accordance with transactional policies/guarantees • The Process Optimisation Service • User and system processes are optimised prior to their execution by the actual [...] • Supports a rich, flexible set of cost-estimation policies
OSIRIS engine • Distributed process execution engine developed at UMIT and now at UniBasel • Uses an internal language for describing processes • Adopted BPEL 1.0 in the context of DILIGENT and extended it to support optimisation features • Mainly used for the orchestration of search queries on data collections • Provides wrapper services to gLite WMS; can run arbitrary applications that process DL data • More info: http://www.gcube-system.org
Comparison Matrix
Conclusions • Workflows are becoming ubiquitous • Fundamental for non-trivial applications -> boost scientific work • Different tools for different needs • Workflow environments offer abstractions for Grid infrastructures, hiding their complexity • The survey was not exhaustive • There are surely many other solutions out there • Call to the applications -> tell us what you use (Triana, Kepler, etc.)
Applications • How crucial are they for you? • Wish list of missing functionality • Functionality that needs to be reworked? • User friendliness • Fault tolerance • Data management • More advanced workflow semantics? SOKU? • Developing Teams • Towards Service-Oriented Knowledge Utilities (SOKU) • Service-oriented implementations • Higher-level application and data semantics • Ontology-based • Towards collaborative workflow environments • Beyond orchestration (centralized execution) • Grid Workflows / Business Workflows bridging • Common standards (BPEL) • Data-centric business workflows • Bridging the two communities • Can there be a one-size-fits-all solution?