A Task Pipelining Framework for e-Science Workflow Management Systems




  1. A Task Pipelining Framework for e-Science Workflow Management Systems
     Hyeong S. Kim (hskim@dcslab.snu.ac.kr), In Soon Cho (ischo@dcslab.snu.ac.kr), Heon Y. Yeom (yeom@snu.ac.kr)
     Dept. of Computer Science and Engineering, Seoul National University
     DCSLab, SNU

  2. Outline
     • Introduction
     • Motivation
     • HVEM Grid
     • Proposed System
     • PIPE File System
     • Conclusion

  3. Introduction
     • Complex Scientific Workflow
       • Input/output data are becoming larger and larger
       • In most scientific workflows, we cannot ignore the time spent on intermediate data movement, which accounts for a large portion of the running time
     • Our Focus
       • Staging is our primary concern
       • We seek a way to pipeline multiple interconnected tasks
       • Applications can benefit if the output of a prior task can be used by the subsequent task as soon as the data is ready
     • In this paper
       • We consider several components needed to enable task pipelining
       • As a reference implementation, we propose the PIPE File System (PFS), which supports various legacy applications without modifying them
       • Our system can also be described in a workflow specification, so a user can construct a task pipelining framework with no effort beyond presenting a workflow specification for the PFS

  4. Motivating Application – HVEM Grid
     • The HVEM (High Voltage Electron Microscope) is financially supported by the Ministry of Science and Technology in Korea
     • The HVEM was installed in October 2003 at the headquarters of the Korea Basic Science Institute (KBSI), a national user facility
     • Its main purpose is to offer leading-edge analytical technology to researchers in diverse scientific fields

  5. HVEM Grid System
     • The High Voltage Electron Microscope (HVEM) Grid system is a tool built on Grid and Web Service concepts. It is designed:
       • To control instruments remotely
       • To manage and control 3-D processing of images
       • To store data automatically

  6. Image Processing (G-Render)
     • Grid-based image processing system
     • 4-step image processing service:
       1) Image preprocessing
       2) Image alignment
       3) Tomogram generation
       4) Segmentation
     • Enables high-performance image processing by utilizing the Grid to acquire unlimited computing power

  7. Grid Workflow Management System
     [Figure: architecture of a Grid workflow management system. Labels, from top to bottom: Grid users; Build Time: Grid Workflow Application Modeling & Definition Tools (Workflow Design & Definition), Grid Workflow Specification; Run Time: Grid Workflow Enactment Service (Workflow Execution & Control: Workflow Scheduling, Data Movement, Fault Management); Grid Information Services (Resource Info Service, Application Info Service); Grid Middleware (Interaction with Grid Resources); Grid Resources.]

  8. Design Considerations
     • Application transparency
       • Supporting various legacy applications
     • Flexibility
       • Providing a general solution
     • Usability

  9. Components Required
     • Workflow engine
       • If a sufficient amount of data is available, run the next task immediately
     • Storage manager
       • Manage storage
       • Logical-to-physical mapping
       • Directory management
       • Advertise data availability to the workflow engine
     • Physical storage
       • Store input/output files
       • Handle read/write operations
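The interaction between the storage manager and the workflow engine described on this slide can be sketched as follows. This is only an illustrative sketch in Python, not the authors' implementation: the class names, the `report_write` and `on_data_available` methods, and the byte-count threshold used as the test for "sufficient data" are all assumptions.

```python
class StorageManager:
    """Hypothetical storage manager: tracks how much of each logical file
    has been written and advertises that to subscribers."""

    def __init__(self):
        self._sizes = {}       # logical path -> bytes written so far
        self._callbacks = []   # subscribers, e.g. the workflow engine

    def subscribe(self, callback):
        self._callbacks.append(callback)

    def report_write(self, path, nbytes):
        # Called whenever physical storage completes a write.
        self._sizes[path] = self._sizes.get(path, 0) + nbytes
        for callback in self._callbacks:
            callback(path, self._sizes[path])


class WorkflowEngine:
    """Hypothetical engine: starts a task as soon as its input file holds
    at least `threshold` bytes (an assumed notion of 'sufficient')."""

    def __init__(self, manager, threshold):
        self.threshold = threshold
        self.pending = {}   # input path -> name of the task waiting on it
        self.started = []   # task names in the order they were started
        manager.subscribe(self.on_data_available)

    def add_task(self, name, input_path):
        self.pending[input_path] = name

    def on_data_available(self, path, size):
        # Trigger the waiting task once, then forget it.
        if path in self.pending and size >= self.threshold:
            self.started.append(self.pending.pop(path))


manager = StorageManager()
engine = WorkflowEngine(manager, threshold=100)
engine.add_task("alignment", "/pfs/preprocessed.out")
manager.report_write("/pfs/preprocessed.out", 60)   # not enough data yet
manager.report_write("/pfs/preprocessed.out", 50)   # threshold crossed
```

In this sketch the downstream task is started while the upstream task may still be writing, which is the pipelining effect the slide aims for; a real system would also need the reader to block when it catches up with the writer.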

  10. Reference Implementation
     • The PIPE File System (PFS) consists of
       • PFS Manager (storage manager)
       • PFS Data Servers (physical storage)
       • PFS Library (user transparency via FUSE)

  11. PFS Manager
     • Resource Management
       • Storage management
       • PFS Data Server maintenance
       • Logical-to-physical file mapping
       • Directory management
     • Single access point for the clients
       • A client can mount the PIPE File System and manipulate files as usual
     • Schedule Triggering
       • Advertise data availability to the workflow engine
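The logical-to-physical mapping and directory management duties listed above can be illustrated with a small in-memory sketch. Everything here is assumed for illustration: the `NamespaceManager` name, the round-robin placement policy, and the `/data/...` physical naming scheme are not taken from the slides.

```python
import posixpath


class NamespaceManager:
    """Hypothetical manager-side namespace: a directory set plus a table
    mapping each logical file to (data server, physical path)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.dirs = {"/"}      # known logical directories
        self.mapping = {}      # logical path -> (server, physical path)
        self._next = 0         # round-robin cursor (assumed policy)

    def mkdir(self, path):
        path = posixpath.normpath(path)
        if posixpath.dirname(path) not in self.dirs:
            raise FileNotFoundError(path)
        self.dirs.add(path)

    def create(self, logical):
        logical = posixpath.normpath(logical)
        if posixpath.dirname(logical) not in self.dirs:
            raise FileNotFoundError(logical)
        if logical in self.mapping:
            raise FileExistsError(logical)
        server = self.servers[self._next % len(self.servers)]
        self._next += 1
        # Flatten the logical path into a physical file name on the server.
        physical = "/data/" + logical.strip("/").replace("/", "_")
        self.mapping[logical] = (server, physical)
        return self.mapping[logical]

    def lookup(self, logical):
        return self.mapping[posixpath.normpath(logical)]

    def listdir(self, path):
        path = posixpath.normpath(path)
        entries = self.dirs | set(self.mapping)
        return sorted(posixpath.basename(p) for p in entries
                      if p != path and posixpath.dirname(p) == path)


ns = NamespaceManager(["ds1", "ds2"])
ns.mkdir("/tomo")
print(ns.create("/tomo/a.mrc"))
print(ns.listdir("/tomo"))
```

Because clients only ever see logical paths, the manager remains the single access point while the data itself can be spread over several data servers.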

  12. PFS Data Servers
     • Physical File Management
       • Store input/output files in their local storage
       • Serve read/write operations

  13. PFS Library
     • User-level library for applications
     • Two components
       • FUSE kernel module
         • Intercept I/O system calls
         • Redirect the system calls to the PFS Client
       • PFS Client
         • Interpret the I/O system calls
         • Redirect each command to the PFS Manager or a PFS Data Server
         • Maintain the open file list
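The PFS Client's role of routing each call to the manager or to a data server, while keeping an open file list, might look roughly like the sketch below. It deliberately uses no real FUSE binding: the stub manager and data server, the method names, and the file-descriptor numbering are all hypothetical stand-ins for whatever the FUSE layer would hand to the client.

```python
class StubManager:
    """Stand-in for the PFS Manager: answers metadata requests only."""

    def locate(self, path):
        return "ds1"   # assumed: every file lives on one data server


class StubDataServer:
    """Stand-in for a PFS Data Server: holds file contents in memory."""

    def __init__(self):
        self.files = {}

    def write(self, path, offset, data):
        buf = self.files.setdefault(path, bytearray())
        if offset > len(buf):
            buf.extend(bytes(offset - len(buf)))   # zero-fill any gap
        buf[offset:offset + len(data)] = data
        return len(data)

    def read(self, path, offset, size):
        return bytes(self.files.get(path, b"")[offset:offset + size])


class PFSClient:
    """Hypothetical client: metadata goes to the manager, data goes to
    the data server recorded in the open file list."""

    def __init__(self, manager, servers):
        self.manager = manager
        self.servers = servers    # server name -> data server object
        self.open_files = {}      # fd -> (path, server name)
        self._next_fd = 3

    def open(self, path):
        server = self.manager.locate(path)   # metadata request
        fd = self._next_fd
        self._next_fd += 1
        self.open_files[fd] = (path, server)
        return fd

    def write(self, fd, offset, data):
        path, server = self.open_files[fd]
        return self.servers[server].write(path, offset, data)

    def read(self, fd, offset, size):
        path, server = self.open_files[fd]
        return self.servers[server].read(path, offset, size)

    def close(self, fd):
        del self.open_files[fd]


client = PFSClient(StubManager(), {"ds1": StubDataServer()})
fd = client.open("/tomo/a.mrc")
client.write(fd, 0, b"hello")
print(client.read(fd, 1, 3))
client.close(fd)
```

Caching the server in the open file list means only `open` has to contact the manager; subsequent reads and writes go straight to the data server.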

  14. Integration
     [Figure: integration of the workflow scheduler with PFS. Labels: Workflow scheduler; enactment (twice); metadata (twice); p (twice); fuse (twice); data (twice); PFS Manager; PFS Data Server.]

  15. Conclusion
     • We propose a task pipelining framework
     • Our system provides task pipelining in the form of a simple distributed file system
     • A triggering interface is used to enact the next task
