110 likes | 302 Views
LQCD Workflow Management. L. Piccoli August 30, 2006. Grid Workflow Management System Overview. Source: A taxonomy of Workflow Management Systems for Grid Computing (J. Yu and R. Buyya). Work items. Campaign Specification Workflow description User interface Data movement
E N D
LQCD Workflow Management L. Piccoli August 30, 2006
Grid Workflow Management System Overview Source: A taxonomy of Workflow Management Systems for Grid Computing (J. Yu and R. Buyya)
Work items • Campaign Specification • Workflow description • User interface • Data movement • Configuration and output files from central repository (dCache?) • Management of intermediate files • Transfer between worker nodes • Catalog of intermediate files • Provenance • Scheduling • Optimal schedule of multiple campaigns • Balance between file transfers and computation • Priority • Preemption
Work items (2) • Monitoring • Feedback to scheduling system • Provide current status to users • Enactment • Job dispatching • Interface with campaign scheduler
Work Plan • Requirements (Y1Q1) • Guidance from the Use Cases document by Jim • Output: • Complete view of the system • Identification of major components and functionalities • Interface between subsystems
Work Plan (2) • Evaluation of existing systems (Y1Q2) • Most are targeted for Grid and Web Services • What features should we look for? • Extensible: can we use our own tools in the place of the Grid middleware • Use graphical interface for specifying and monitoring the workflow • Specification language and tools • Current systems are designed for a single workflow • Some candidates • Triana, Taverna and YAWL (Yet Another Workflow Language)
Work Plan (3) • Integrate selected system with current environment (Y1Q3) • Workflow specification • Submit to scheduling and execution using current tools (Maui/PBS) • Development of first version of campaign scheduling system (Y1Q4) • Multi-campaign support hooks
Work Plan (4) • Enactment (Y2Q1) • PBS replacement • Integration with health monitoring system (Y2Q2) • Implementation of rescheduling feature • Add advanced features to scheduling system (Y1Q3) • File pre-fetching • Quality of Service
Schedule • Requirements (Y1Q1) • Evaluation of Existing Systems (Y1Q2) • Workflow Specification (Y1Q3) - v1_0 • Simple Scheduler (Y1Q4) - v2_0 • Improved Enactment System (Y2Q1) - v3_0 • Integration with Monitoring (Y2Q2) - v4_0 • Advanced Scheduling Features (Y2Q3/4) - v5_0
Workplan (original) • Gather requirements and describe common usage scenarios for the workflow management system based on experiences with the current LQCD software structure. (Y1Q1) • Develop an overview of the whole system, identifying high-level modules within the scope of this project. Delineate the interfaces between the internal modules as well as modules to be developed outside our scope. (Y1Q1) • Define a workflow specification language to describe the dependencies of an LQCD analysis campaign. The language can be defined as a XML schema and existing parsing tools (e.g. JDOM and dom4j) could be used for validating the workflow specifications. Currently GHS uses XML files to describe the information of resources, files and tasks. (Y1Q2) • Evaluate integration of our work with other workflow systems such as DAGMan, Pegasus, and Karajan and modeling environments, such as Vanderbilt’s GME tool (http://www.isis.vanderbilt.edu/Projects/gme/), to develop a unified workflow specification and visualization management system. (Y1Q2) • Integrate the workflow system with the existing LQCD computing infrastructure, especially the resource management system. Users should be able to describe the campaign workflow through XML files or graphical interfaces and submit the campaign for execution. (Y1Q3) • Deploy a scheduling system capable of interacting with the system performance monitor and workflow system. The scheduling system receives instructions from the workflow system and interacts with the execution system. Upon a fault or abnormal behavior detection the system reschedules tasks. (Y1Q4) • Integrate workflow system with the LQCD system health monitor (developed by VU). (Y2Q1) • Conduct lifetime management of temporary and intermediate data in order to maximize resource utilization (e.g. disk space, network bandwidth, memory, CPU), and adopt a workflow-based data prefetching to overlap computing and communication to improve the computing efficiency. (Y2Q2) • Extend the schedule system to manage multi-campaign executions based on constraints specified in the campaign definitions, such as dead lines and availability of input files. (Y2Q3) • Test the workflow system and carry modification/enhancement necessary to ensure its correctness, effectiveness, and feasibility. (Y2Q4)