A Task Pipelining Framework for e-Science Workflow Management Systems • Hyeong S. Kim (hskim@dcslab.snu.ac.kr), In Soon Cho (ischo@dcslab.snu.ac.kr), Heon Y. Yeom (yeom@snu.ac.kr) • Dept. of Computer Science and Engineering, Seoul National University • DCSLab, SNU
Outline • Introduction • Motivation • HVEM Grid • Proposed System • PIPE File System • Conclusion
Introduction • Complex Scientific Workflow • Input/output data are becoming larger and larger • In most scientific workflows, we cannot ignore the time spent moving intermediate data, which accounts for a large portion of the running time • Our Focus • Staging is our primary concern • We seek a way to pipeline multiple interconnected tasks • Applications can benefit if the output of a preceding task can be used by the following task as soon as the data is ready • In this paper • We consider several components needed to enable task pipelining • As a reference implementation, we propose the PIPE File System (PFS), which supports various legacy applications without modifying them • Our system can also be described in a workflow specification, so a user can set up task pipelining with no effort beyond supplying a workflow specification for the PFS
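The benefit of consuming output "as soon as the data is ready" can be illustrated with a small back-of-the-envelope model. This is only a sketch under stated assumptions: two tasks, data split into equal chunks, constant per-chunk times, and no transfer cost. The function names and the numbers are illustrative and do not come from the slides.

```python
def staged_time(n_chunks, produce_s, consume_s):
    """Total time when the second task starts only after the first
    task has written all of its output (conventional staging)."""
    return n_chunks * produce_s + n_chunks * consume_s


def pipelined_time(n_chunks, produce_s, consume_s):
    """Total time when the second task processes each chunk as soon
    as it is ready (assumed model of task pipelining)."""
    consumer_free_at = 0.0
    for i in range(n_chunks):
        chunk_ready_at = (i + 1) * produce_s        # producer finishes chunk i
        start = max(chunk_ready_at, consumer_free_at)
        consumer_free_at = start + consume_s        # consumer finishes chunk i
    return consumer_free_at


if __name__ == "__main__":
    # Hypothetical workload: 10 chunks, 2 s to produce and 1 s to consume each.
    print(staged_time(10, 2, 1))      # 30
    print(pipelined_time(10, 2, 1))   # 21.0
```

In this model the pipelined run finishes shortly after the slower task does, instead of paying for both tasks back to back.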
Motivating Application: HVEM Grid • The HVEM (High Voltage Electron Microscope) is financially supported by the Ministry of Science and Technology in Korea • The HVEM was installed in October 2003 at the headquarters of the Korea Basic Science Institute (KBSI), a national user facility • Its main purpose is to offer leading-edge analytical technology to researchers in diverse scientific fields
HVEM Grid System • The High Voltage Electron Microscope (HVEM) Grid system is a tool built on the concepts of Grid and Web Services that lets users: • Control instruments remotely • Manage and control 3-D processing of images • Store data automatically
Image Processing (G-Render) • Grid-based image processing system • 4-step image processing service: • 1) Image preprocessing • 2) Image alignment • 3) Tomogram generation • 4) Segmentation • Enables high-performance image processing by using the Grid to acquire large amounts of computing power
Grid Workflow Management System • [Figure: architecture of a Grid workflow management system] • Build time: Grid users use the Grid Workflow Application Modeling & Definition Tools (workflow design & definition) to produce a Grid Workflow Specification • Run time: the Grid Workflow Enactment Service (workflow execution & control) covers Workflow Scheduling, Data Movement, and Fault Management • Grid Information Services: Resource Info Service, Application Info Service • Grid Middleware: interaction with Grid Resources
Design Consideration • Application-transparency • Supporting various legacy applications • Flexibility • Providing a general solution • Usability
Components Required • Workflow engine • If a sufficient amount of data is available, run the next task immediately • Storage manager • Manage storage • Logical-to-physical mapping • Directory management • Advertise data availability to the workflow engine • Physical storage • Store input/output files • Handle read/write operations
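The interaction between the storage manager and the workflow engine might be sketched as follows. This is a minimal in-memory illustration, not the authors' implementation: the class names, the byte-count threshold as the meaning of "sufficient amount of data", and the callback interface are all assumptions made for the example.

```python
from collections import defaultdict


class StorageManager:
    """Tracks how much of each logical file exists and advertises it."""

    def __init__(self):
        self._bytes_written = defaultdict(int)
        self._watchers = defaultdict(list)  # logical name -> [(threshold, callback)]

    def watch(self, logical_name, threshold_bytes, callback):
        """Ask to be notified once threshold_bytes of the file are available."""
        self._watchers[logical_name].append((threshold_bytes, callback))

    def report_write(self, logical_name, nbytes):
        """Called when physical storage has accepted nbytes more data."""
        self._bytes_written[logical_name] += nbytes
        available = self._bytes_written[logical_name]
        still_waiting = []
        for threshold, callback in self._watchers[logical_name]:
            if available >= threshold:
                callback(logical_name, available)   # advertise availability once
            else:
                still_waiting.append((threshold, callback))
        self._watchers[logical_name] = still_waiting


class WorkflowEngine:
    """Starts a task as soon as enough of its input is available."""

    def __init__(self, storage):
        self.storage = storage
        self.started = []   # stand-in for actually launching tasks

    def add_dependency(self, task_name, input_file, min_bytes):
        self.storage.watch(
            input_file, min_bytes,
            lambda name, available: self.started.append(task_name),
        )


if __name__ == "__main__":
    storage = StorageManager()
    engine = WorkflowEngine(storage)
    # Hypothetical task and file names.
    engine.add_dependency("alignment", "/pfs/preprocessed.img", 4096)
    storage.report_write("/pfs/preprocessed.img", 1024)
    print(engine.started)   # []
    storage.report_write("/pfs/preprocessed.img", 4096)
    print(engine.started)   # ['alignment']
```

Each watcher is removed after it fires, so a task is started at most once per registered dependency even if the producing task keeps writing.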
Reference Implementation • The PIPE File System (PFS) consists of • PFS Manager (storage manager) • PFS Data Servers (physical storage) • PFS Library (user transparency via FUSE)
PFS Manager • Resource Management • Storage management • PFS Data Server maintenance • Logical-to-physical file mapping • Directory management • Single access point for clients • A client can mount the PIPE File System and manipulate files as usual • Schedule Triggering • Advertise data availability to the workflow engine
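The logical-to-physical mapping and directory management duties could look roughly like the sketch below. The slides do not say how files are placed on data servers or how physical paths are named, so the round-robin placement, the `/export/` prefix, and the `Namespace` class are assumptions chosen only to make the idea concrete.

```python
import posixpath
from itertools import cycle


class Namespace:
    """Maps logical PFS paths to (data server, physical path) pairs."""

    def __init__(self, data_servers):
        # Assumed placement policy: rotate through the data servers.
        self._next_server = cycle(data_servers)
        self._files = {}   # logical path -> (server, physical path)

    def create(self, logical_path):
        logical_path = posixpath.normpath(logical_path)
        if logical_path in self._files:
            raise FileExistsError(logical_path)
        server = next(self._next_server)
        # Assumed naming scheme for the file on the data server's local disk.
        physical = "/export/" + logical_path.strip("/").replace("/", "_")
        self._files[logical_path] = (server, physical)
        return self._files[logical_path]

    def lookup(self, logical_path):
        """Resolve a logical path; raises KeyError if it does not exist."""
        return self._files[posixpath.normpath(logical_path)]

    def listdir(self, directory):
        """Names of the files directly inside a logical directory."""
        directory = posixpath.normpath(directory)
        return sorted(
            posixpath.basename(path)
            for path in self._files
            if posixpath.dirname(path) == directory
        )


if __name__ == "__main__":
    ns = Namespace(["ds1", "ds2"])            # hypothetical server names
    print(ns.create("/run1/tilt_000.img"))    # ('ds1', '/export/run1_tilt_000.img')
    print(ns.create("/run1/tilt_001.img"))    # ('ds2', '/export/run1_tilt_001.img')
    print(ns.listdir("/run1"))                # ['tilt_000.img', 'tilt_001.img']
```

Because clients only ever see logical paths, the manager stays the single access point while the data itself can live on any data server.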
PFS Data Servers • Physical File Management • Store input/output files in its local storage • Serve read/write operations
PFS Library • User-level library for applications • Two components • FUSE kernel module • Intercepts I/O system calls • Redirects each system call to the PFS Client • PFS Client • Interprets the I/O system call • Redirects the command to the PFS Manager or a PFS Data Server • Maintains the open file list
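The PFS Client's role (interpret a call, send metadata requests to the manager and data requests to a data server, and keep an open file list) can be sketched without any real FUSE binding. Everything here is hypothetical: the class name, the method signatures, and the use of in-memory dictionaries as stand-ins for the data servers. A real client would be invoked by FUSE callbacks and would talk to remote services.

```python
class PFSClientSketch:
    """Routes intercepted I/O calls to the manager or a data server."""

    def __init__(self, manager_lookup, data_servers):
        # manager_lookup: callable, logical path -> (server id, physical path)
        self._lookup = manager_lookup
        # data_servers: server id -> {physical path: bytearray} (in-memory stand-in)
        self._servers = data_servers
        self._open_files = {}   # the open file list: fd -> (server id, physical path)
        self._next_fd = 3       # 0-2 are conventionally stdin/stdout/stderr

    def open(self, path):
        # Metadata request: goes to the PFS Manager.
        server_id, physical = self._lookup(path)
        fd = self._next_fd
        self._next_fd += 1
        self._open_files[fd] = (server_id, physical)
        return fd

    def write(self, fd, data):
        # Data request: goes straight to the data server holding the file.
        server_id, physical = self._open_files[fd]
        store = self._servers[server_id].setdefault(physical, bytearray())
        store.extend(data)      # append-only write, for simplicity
        return len(data)

    def read(self, fd, size, offset=0):
        server_id, physical = self._open_files[fd]
        store = self._servers[server_id].get(physical, bytearray())
        return bytes(store[offset:offset + size])

    def close(self, fd):
        del self._open_files[fd]


if __name__ == "__main__":
    servers = {"ds1": {}}
    client = PFSClientSketch(lambda path: ("ds1", "/export/out.dat"), servers)
    fd = client.open("/pfs/out.dat")
    client.write(fd, b"tomogram")
    print(client.read(fd, 4))   # b'tomo'
    client.close(fd)
```

Keeping the resolved location in the open file list means the manager is contacted only at open time, and subsequent reads and writes go directly to the data server.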
Integration • [Figure: integration diagram. Labels: Workflow scheduler, enactment, metadata, PFS Manager, fuse, data, PFS Data Server]
Conclusion • We propose a task pipelining framework • Our system provides task pipelining in the form of a simple distributed file system • A triggering interface is used to enact the next task