
BPEL4Job: a Fault-handling Design for Job Flow Management

Wei Tan (1), Liana Fong (2), Norman Bobroff (2). (1) Dept. of Automation, Tsinghua University, Beijing, China; (2) IBM T. J. Watson Research Center, Hawthorne, USA. tanwei@mails.tsinghua.edu.cn, llfong@us.ibm.com, bobroff@us.ibm.com

Presentation Transcript


  1. BPEL4Job: a Fault-handling Design for Job Flow Management • Wei Tan (1), Liana Fong (2), Norman Bobroff (2) • (1) Dept. of Automation, Tsinghua University, Beijing, China • (2) IBM T. J. Watson Research Center, Hawthorne, USA • tanwei@mails.tsinghua.edu.cn, llfong@us.ibm.com, bobroff@us.ibm.com

  2. Agenda • 1 Introduction • 2 BPEL4Job: a fault-handling design for job flow management • 3 Integrating fault-handling policies with job flow modeling • 4 Fault-handling at the flow execution layer • 5 Implementation and sample application • 6 Conclusion and ongoing & future work

  3. 1 Introduction: Motivation • Job flow is especially relevant in orchestrating batch jobs • Enforce the job execution sequence • Manage the job execution trace • Handle run-time faults at the flow level • Various languages & systems have been devised • DAGMan/Condor, Taverna/myGrid, Job Stream/Tivoli Workload Scheduler, JobCommand/Tivoli LoadLeveler • BPEL-based job flow management is attracting more attention • Resources and applications are becoming service-oriented • There is a requirement to combine business processes (including human tasks) with back-end batch jobs • BPEL provides a framework for flow orchestration, data manipulation, and fault handling, and can be extended or enhanced • BPEL is supported by industry and the open-source community

  4. 1 Introduction: Challenges • The use of BPEL for job flow is not without technical challenges • Defining a job entity • BPEL does not support JSDL or other job specification languages • Supporting data flow and dependencies • Data staging in/out • Incorporating asynchronous interaction with schedulers • Job schedulers usually report job status asynchronously • Incorporating fault tolerance and recovery strategies in the job flow • Job flows have special fault-handling requirements, such as re-try and re-submit • Supporting dynamic changes of flow instances • Needed when the flow execution logic cannot be fully anticipated in advance

  5. 1 Introduction: BPEL4Job • The goal of BPEL4Job • A BPEL-based job flow system with fault-handling capability • Challenges addressed • How to communicate with job schedulers? • A generic job proxy to facilitate asynchronous job submission and job status notification • How to model a job flow with fault-handling capability? • A policy-based, two-stage approach • How to enforce various fault-handling policies at run-time? • A set of fundamental fault-handling schemes, notably including instance migration between flow engines

  6. 2 BPEL4Job: a fault-handling design for job flow management • Flow modeling layer • Stage 1: define the base flow, job definitions, and fault-handling policies • Stage 2: generate the expanded flow • Flow execution layer • Flow engine • Job proxy • Fault-handling service • Job scheduling layer • Job schedulers

  7. 3 Integrating fault-handling policies with job flow modeling • BPEL4Job considers three kinds of policies • Cleanup • Generate a fault report and delete the instance data in the flow engine • Re-try • Re-execute the job in the same engine • Re-submit • Export the flow instance state • Restore the flow instance in a different engine, so that the flow can resume from the failed job • More policies could be defined and implemented based on the three fundamental policies • Rollback, alternate job, etc.
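
The three policies can be read as branches of a single dispatcher. The following is a minimal Python sketch of that reading, not the BPEL4Job implementation: `Policy`, `handle_fault`, `run_job`, `migrate`, and `max_retries` are hypothetical names introduced here, and the engine's instance store is modelled as a plain dictionary.

```python
from enum import Enum
from typing import Callable, Dict


class Policy(Enum):
    CLEANUP = "cleanup"
    RETRY = "re-try"
    RESUBMIT = "re-submit"


def handle_fault(
    policy: Policy,
    instance_id: str,
    failed_job: str,
    engine_state: Dict[str, dict],      # assumed stand-in for the engine's instance store
    run_job: Callable[[str], bool],     # assumed callback: True if the job succeeds
    migrate: Callable[[str, str], str], # assumed callback: restores the instance elsewhere
    max_retries: int = 3,               # assumed limit; the slides do not state one
) -> str:
    """Apply one fault-handling policy and return a short outcome message."""
    if policy is Policy.CLEANUP:
        # Generate a fault report, then delete the instance data.
        engine_state.pop(instance_id, None)
        return f"fault report: {failed_job} failed in {instance_id}; instance removed"
    if policy is Policy.RETRY:
        # Re-execute the failed job in the same engine, up to max_retries times.
        for attempt in range(1, max_retries + 1):
            if run_job(failed_job):
                return f"{failed_job} succeeded on retry {attempt}"
        return f"{failed_job} still failing after {max_retries} retries"
    if policy is Policy.RESUBMIT:
        # Delegate to a migration routine that resumes from the failed job.
        return migrate(instance_id, failed_job)
    raise ValueError(f"unknown policy: {policy}")
```

Derived policies such as rollback or alternate job would, under this sketch, be further branches composed from the same three primitives.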

  8. 3 Integrating fault-handling policies with job flow modeling • [Figure panels: the re-try policy; the re-submit policy; the base flow with policies embedded]

  9. 3 Integrating fault-handling policies with job flow modeling • The transformation to implement the re-try policy of Job1 • [Figure panels: base flow; expanded flow]
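
The expanded flow itself is only shown in the slide's figure, so the following Python sketch illustrates the general idea of the stage-2 transformation under assumptions: a job carrying a re-try policy is replaced by a scope that loops over the job and catches its fault. The dictionary shape and the names `expand_flow` and `retry_limits` are hypothetical and do not reproduce the BPEL that BPEL4Job generates.

```python
from typing import Dict, List, Union

Step = Union[str, dict]  # a plain job name, or an expanded retry scope


def expand_flow(base_flow: List[str], retry_limits: Dict[str, int]) -> List[Step]:
    """Replace each job that has a re-try policy with a retry scope."""
    expanded: List[Step] = []
    for job in base_flow:
        if job in retry_limits:
            expanded.append({
                "scope": f"{job}_with_retry",
                # Loop condition of the assumed retry wrapper.
                "loop_while": f"not succeeded and attempts < {retry_limits[job]}",
                "body": job,
                # Assumed fault handler attached to the scope.
                "on_fault": "increment attempts and loop again",
            })
        else:
            # Jobs without a policy are copied through unchanged.
            expanded.append(job)
    return expanded
```

For example, `expand_flow(["Job1", "Job2"], {"Job1": 3})` wraps only `Job1` and leaves `Job2` as it was in the base flow.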

  10. 4 Fault-handling at the flow execution layer • We leverage: • BPEL fault-handling constructs: Catch, CatchAll • We add: • Specific capabilities to recognize job failures and to handle faults according to the defined policies • Components in this layer • The generic job proxy for job submission and job status notification • The fault-handling service to enforce the policies defined in the flow modeling layer

  11. The generic job proxy • Receives a job submission request • Forwards the request to a scheduler and starts listening for job state notifications from it • For a notification indicating job success or failure, forwards it to the flow engine and returns; otherwise continues listening
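
The proxy's behaviour can be sketched as a small loop. This is an illustrative Python sketch under assumptions: `submit`, `notifications`, and `notify_engine` are hypothetical callbacks standing in for the scheduler and flow engine interfaces, and the state names `SUCCEEDED` and `FAILED` are placeholders rather than values taken from any real scheduler.

```python
from typing import Callable, Iterable

# Assumed names for the two terminal job states.
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def job_proxy(
    job_request: dict,
    submit: Callable[[dict], str],                  # forwards the request, returns a job id
    notifications: Callable[[str], Iterable[str]],  # yields job states as they arrive
    notify_engine: Callable[[str, str], None],      # reports the outcome to the flow engine
) -> str:
    """Submit a job, listen for state changes, and report the terminal state."""
    job_id = submit(job_request)
    for state in notifications(job_id):
        if state in TERMINAL_STATES:
            # Success or failure: forward to the flow engine and stop listening.
            notify_engine(job_id, state)
            return state
        # Any other state (e.g. queued, running): keep listening.
    raise RuntimeError(f"notification stream for {job_id} ended without a terminal state")
```

Keeping the scheduler-specific parts behind callbacks is what would make such a proxy generic: the flow engine only ever sees a terminal outcome for a job id.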

  12. Fault-handling schemes in flow execution

  13. Flow re-submission and instance migration • Extract all the information related to a BPEL instance. • Re-shape the instance data and migrate it into another WPS engine.
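
The two migration steps, extract and re-shape, can be sketched as follows. This is a minimal Python sketch under assumptions: each engine is modelled as a dictionary from instance id to instance data, and the fields `jobs`, `variables`, `completed`, and `pending` are hypothetical. No WPS API is used or implied.

```python
import copy
from typing import Dict

Engine = Dict[str, dict]  # assumed stand-in for a flow engine's instance store


def export_instance(engine: Engine, instance_id: str) -> dict:
    """Extract everything recorded for one flow instance."""
    return copy.deepcopy(engine[instance_id])


def reshape(snapshot: dict, resume_from: str, target_name: str) -> dict:
    """Rewrite the snapshot so the target engine resumes at the failed job."""
    jobs = snapshot["jobs"]
    start = jobs.index(resume_from)
    return {
        "jobs": jobs,
        "completed": jobs[:start],   # jobs before the resume point are not re-run
        "pending": jobs[start:],     # the failed job and everything after it
        "variables": snapshot["variables"],
        "engine": target_name,
    }


def migrate(source: Engine, target: Engine, instance_id: str,
            resume_from: str, target_name: str) -> dict:
    """Move one instance from the source engine to the target engine."""
    snapshot = export_instance(source, instance_id)
    target[instance_id] = reshape(snapshot, resume_from, target_name)
    del source[instance_id]
    return target[instance_id]
```

The deep copy in `export_instance` keeps the snapshot independent of the source engine, so removing the instance there cannot affect the restored copy.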

  14. Implementation • WebSphere Integration Developer (WID) • WebSphere Process Server (WPS) • Tivoli Dynamic Workload Broker (ITDWB)

  15. Sample Montage Job Flow • Montage: a toolkit for assembling raw astronomy images into custom mosaics • Developed by NASA & California Institute of Technology • The assembling process is usually expressed as a job flow • [Figure labels: raw images; generate image table; image projection in parallel; generate image table; generate mosaic; transform to JPEG]

  16. Montage job flow and the re-start policy • Policy: re-submit from mImgtbl1 when mAdd1 fails • [Figure panels: base flow; expanded flow (partial)]

  17. Instance migration from saba10 to weitan • (a) Montage initiated and failed at saba10 • (b) Montage migrated to weitan • (c) Montage re-started and completed at weitan

  18. Conclusion • BPEL4Job: an exploration of using BPEL as a job flow language • A two-stage approach for job flow modeling with fault-handling policies • A generic job proxy to facilitate asynchronous job submission and job status notification • A set of fundamental fault-handling schemes, including instance migration between flow engines • Future work • Support more complicated fault-handling policies • Involving Human Task, expressed as business rules, etc. • Apply the instance migration technique to • Load balancing between flow engines • Instance migration to a newer version

  19. Future work

  20. Thank you for your attention. Please contact me at: Dept. Automation, Tsinghua Univ, Beijing, China http://twtanwei.googlepages.com twtanwei@gmail.com
