BPEL4Job: a Fault-handling Design for Job Flow Management. Wei Tan (1), Liana Fong (2), Norman Bobroff (2). (1) Dept. Automation, Tsinghua University, Beijing, China; (2) IBM T. J. Watson Research Center, Hawthorne, USA. tanwei@mails.tsinghua.edu.cn, llfong@us.ibm.com, bobroff@us.ibm.com
Agenda • 1 Introduction • 2 BPEL4Job: a fault-handling design for job flow management • 3 Integrating fault-handling policies with job flow modeling • 4 Fault-handling at the flow execution layer • 5 Implementation and sample application • 6 Conclusion and ongoing & future work
1 Introduction: Motivation • Job flow is especially relevant in orchestrating batch jobs • Enforce job execution sequence • Manage job execution trace • Handle run-time faults at the flow level • Various languages & systems have been devised • DAGMan/Condor, Taverna/myGrid, Job Stream/Tivoli-Workload Scheduler, JobCommand/Tivoli-LoadLeveler • BPEL-based job flow management is attracting more attention • Resources and applications are becoming service-oriented • Requirement to combine business processes (including human tasks) with back-end batch jobs • BPEL provides a framework for flow orchestration, data manipulation, and fault handling, and can be extended or enhanced • BPEL is supported by industry and the open source community
1 Introduction: Challenges • The use of BPEL for job flow is not without technical challenges • Defining a job entity • BPEL does not support JSDL or other job specification languages • Supporting data flow and dependencies • Data staging in/out • Incorporating the asynchronous interaction with schedulers • Job schedulers usually report job status asynchronously • Incorporating fault tolerance and recovery strategies in job flow • Job flow has special fault-handling requirements, such as re-try and re-submit • Supporting dynamic changes of flow instances • Needed when the flow execution logic cannot be fully anticipated in advance
1 Introduction: BPEL4Job • The goal of BPEL4Job • A BPEL-based job flow system with fault-handling capability • Challenges addressed • How to communicate with job schedulers? • A generic job proxy to facilitate asynchronous job submission and job status notification • How to model a job flow with fault-handling capability? • A policy-based, two-stage approach • How to enforce various fault-handling policies at run-time? • A set of fundamental fault-handling schemes, notably including instance migration between flow engines
2 BPEL4Job: a fault-handling design for job flow management • Flow modeling layer • Stage 1: define the base flow, the job definitions, and the fault-handling policies • Stage 2: generate the expanded flow • Flow execution layer • Flow engine • Job proxy • Fault-handling service • Job scheduling layer • Job schedulers
3 Integrating fault-handling policies with job flow modeling • BPEL4Job considers three kinds of policies • Cleanup • Generate a fault report and delete the instance data in the flow engine. • Re-try • Re-execute the job in the same engine. • Re-submit • Export the flow instance state • Restore the flow instance in a different engine, so that the flow can resume from the failed job • More policies could be defined and implemented based on the three fundamental policies • Rollback, alternate job, etc.
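The three policies can be pictured as a small dispatch on the policy attached to a failed job. The following is only a minimal sketch under stated assumptions: `FlowInstance`, `handle_fault`, and the dictionary shapes are hypothetical names invented for illustration and are not part of BPEL4Job or any flow engine API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Policy(Enum):
    CLEANUP = "cleanup"
    RETRY = "re-try"
    RESUBMIT = "re-submit"


@dataclass
class FlowInstance:
    # Hypothetical stand-in for a flow instance held by a flow engine.
    instance_id: str
    engine: str
    failed_job: str
    state: dict = field(default_factory=dict)


def handle_fault(instance, policy, other_engine=None):
    """Apply one of the three fundamental policies and describe the action."""
    if policy is Policy.CLEANUP:
        # Generate a fault report, then delete the instance data.
        report = {"instance": instance.instance_id,
                  "failed_job": instance.failed_job}
        instance.state.clear()
        return {"action": "cleanup", "report": report}
    if policy is Policy.RETRY:
        # Re-execute the failed job in the same engine.
        return {"action": "re-try", "engine": instance.engine,
                "job": instance.failed_job}
    if policy is Policy.RESUBMIT:
        # Export the state and restore it in a different engine.
        if other_engine is None or other_engine == instance.engine:
            raise ValueError("re-submit needs a different engine")
        exported = dict(instance.state)
        instance.engine = other_engine
        instance.state = exported
        return {"action": "re-submit", "engine": other_engine,
                "resume_from": instance.failed_job}
    raise ValueError(f"unknown policy: {policy}")
```

Composite policies such as rollback or alternate job could, in this picture, be built by chaining these three branches.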
3 Integrating fault-handling policies with job flow modeling [Figure: the re-try policy; the re-submit policy; the base flow with policies embedded]
3 Integrating fault-handling policies with job flow modeling [Figure: base flow and expanded flow; the transformation to implement the re-try policy of Job1]
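One way to picture the stage 2 transformation is as a function that rewrites each job carrying a re-try policy into a loop around a fault-catching scope. This is a rough sketch, not the actual BPEL4Job transformation: the dictionary representation of a flow, the `max_attempts` parameter, and the generated variable names are all assumptions made for illustration.

```python
def expand_flow(base_flow, policies):
    """Expand a base flow (ordered job names) using per-job policies.

    A job with a re-try policy is wrapped in a loop whose body invokes the
    job inside a scope with a catch-all handler. Other jobs are invoked
    directly. The structure only mimics BPEL constructs by name.
    """
    expanded = []
    for job in base_flow:
        policy = policies.get(job)
        if policy is not None and policy["type"] == "re-try":
            limit = policy["max_attempts"]  # assumed parameter
            expanded.append({
                "while": f"not {job}_done and {job}_attempts < {limit}",
                "body": {
                    "scope": {
                        "invoke": job,
                        "catchAll": f"increment {job}_attempts",
                    }
                },
            })
        else:
            expanded.append({"invoke": job})
    return expanded


# Example: only Job1 carries a re-try policy.
expanded = expand_flow(
    ["Job1", "Job2"],
    {"Job1": {"type": "re-try", "max_attempts": 3}},
)
```

The modeler only edits the base flow and the policies; the expanded flow is generated and is what the engine would run.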
4 Fault-handling at the flow execution layer • We leverage: • BPEL fault-handling constructs: Catch, CatchAll • We enhance: • Specific capabilities to recognize job failures and to handle faults according to defined policies. • Components in this layer • The generic job proxy for job submission and job status notification • The fault-handling service to enforce the policies defined in the flow modeling layer
The generic job proxy • Receives a job submission request. • Forwards the request to a scheduler and starts to listen for job state notifications from it. • For a notification indicating job success or failure, forwards it to the flow engine and returns; otherwise continues listening.
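The proxy behavior described above can be sketched as a listen loop over scheduler notifications. This is an illustrative sketch only: `FakeScheduler`, `run_job_proxy`, the state names, and the use of an in-process queue in place of real asynchronous messaging are assumptions, not the BPEL4Job implementation.

```python
import queue

# Assumed state names; a real scheduler defines its own.
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


class FakeScheduler:
    """Hypothetical scheduler that emits a fixed sequence of job states."""

    def __init__(self, states):
        self._states = list(states)

    def submit(self, job, notifications):
        for state in self._states:
            notifications.put((job, state))


def run_job_proxy(job, scheduler, notify_engine, timeout=1.0):
    """Submit a job, then listen until a success or failure state arrives."""
    notifications = queue.Queue()
    scheduler.submit(job, notifications)       # forward the request
    while True:
        name, state = notifications.get(timeout=timeout)
        if name == job and state in TERMINAL_STATES:
            notify_engine(job, state)          # forward to the flow engine
            return state
        # Intermediate states (e.g. queued, running): keep listening.


received = []
final = run_job_proxy(
    "Job1",
    FakeScheduler(["QUEUED", "RUNNING", "FAILED"]),
    lambda job, state: received.append((job, state)),
)
```

Only the terminal state reaches the flow engine, so the flow itself does not need to know about scheduler-specific intermediate states.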
Flow re-submission and instance migration • Extract all the information related to a BPEL instance. • Re-shape the instance data and migrate it into another WebSphere Process Server (WPS) engine.
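The extract, re-shape, and migrate steps can be sketched with two in-memory engines. Everything here is a simplifying assumption for illustration: `FlowEngine`, the JSON snapshot, and the fields rewritten in `reshape` are hypothetical and do not reflect how WPS stores or imports instance data.

```python
import json


class FlowEngine:
    """Hypothetical flow engine holding instance data in a dictionary."""

    def __init__(self, name):
        self.name = name
        self.instances = {}


def export_instance(engine, instance_id):
    # Extract all information related to the instance as a snapshot.
    return json.dumps(engine.instances[instance_id])


def reshape(snapshot, target_engine_name):
    # Rewrite engine-specific fields so the target engine can accept them.
    data = json.loads(snapshot)
    data["engine"] = target_engine_name
    data["status"] = "ready-to-resume"
    return data


def migrate(source, target, instance_id):
    data = reshape(export_instance(source, instance_id), target.name)
    target.instances[instance_id] = data
    del source.instances[instance_id]   # the instance now lives on target
    return data


engine_a, engine_b = FlowEngine("engineA"), FlowEngine("engineB")
engine_a.instances["flow-1"] = {
    "engine": "engineA", "status": "failed",
    "failed_job": "Job1", "variables": {"input": "data.tbl"},
}
moved = migrate(engine_a, engine_b, "flow-1")
```

The failed job and the flow variables travel with the snapshot, which is what would let the target engine resume from the failed job rather than from the start.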
Implementation • WebSphere Integration Developer (WID) • WebSphere Process Server (WPS) • Tivoli Dynamic Workload Broker (ITDWB)
Sample Montage Job Flow • Montage: a toolkit for assembling raw astronomy images into custom mosaics. • Developed by NASA & California Institute of Technology. • The assembling process is usually expressed as a job flow. [Figure: Montage job flow over raw images; step labels: Generate Image table (twice), Image projection in parallel, Generate Mosaic, Transform to jpeg]
Montage job flow and the re-start policy • Policy says: re-submit from mImgtbl1 when mAdd1 fails [Figure: base flow and expanded flow (partial)]
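The policy on this slide names a restart point that differs from the failed job. A minimal sketch of how such a mapping could select the jobs to run after re-submission follows; only `mImgtbl1` and `mAdd1` come from the slide, while the other job names, the linear ordering, and the `resume_plan` helper are placeholders assumed for illustration.

```python
def resume_plan(flow, failed_job, restart_points):
    """Return the jobs to run after re-submission.

    flow is an ordered list of job names; restart_points maps a failed job
    to the job the flow should resume from. Without an entry, the flow
    resumes from the failed job itself.
    """
    start = restart_points.get(failed_job, failed_job)
    return flow[flow.index(start):]


# Placeholder flow: only mImgtbl1 and mAdd1 are names from the slide.
montage_flow = ["tableRaw", "projectImages", "mImgtbl1", "mAdd1", "toJpeg"]
plan = resume_plan(montage_flow, "mAdd1", {"mAdd1": "mImgtbl1"})
```

Here the plan starts one step before the failed job, so the image table is regenerated before the mosaic step runs again.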
Instance migration from saba10 to weitan [Figure: (a) Montage initiated and failed at saba10; (b) Montage migrated to weitan; (c) Montage re-started and completed at weitan]
Conclusion • BPEL4Job: an exploration of using BPEL as a job flow language • A two-stage approach for job flow modeling with fault-handling policies • A generic job proxy to handle the asynchronous nature of job submission and job status notification • A set of fundamental fault-handling schemes, including instance migration between flow engines • Future work • Support more complicated fault-handling policies • Involving human tasks, expressed as business rules, etc. • Apply the instance migration technique to • Load balancing between flow engines • Instance migration to a newer version
Thank you for your attention. Please contact me at: Dept. Automation, Tsinghua Univ, Beijing, China http://twtanwei.googlepages.com twtanwei@gmail.com