
A tuple space based jobsubmission system for distributing short jobs



Presentation Transcript


  1. A tuple space based jobsubmission system for distributing short jobs
     Peter Praxmarer, GUP, Joh. Kepler University Linz, praxmarer@gup.jku.at

  2. Agenda
     • Architecture Outline
     • Challenges imposed on the jobsubmission system by POP-C++
     • Challenges imposed on the jobsubmission system by the infrastructure
     • Requirements
     • Today’s system setup
     • Performance results
     Peter Praxmarer, GUP, Universität Linz

  3. Architecture Outline

  4. Challenges: POP-C++ (1)
     • Challenges imposed by the Nqueen app, POP-C++
       • Performance
         • Assumes fast instantiation of so-called Remote Objects on the worker nodes.
         • Remote Objects are instantiated one after another
           -> No bulk job submission of many Remote Objects at once is possible.
           -> Instead, the creation of a single Remote Object corresponds to the submission of a single Grid-job.

  5. Challenges: POP-C++ (2)
     • Challenges imposed by the Nqueen app, POP-C++ (continued)
       • Need for outbound connectivity
       • Fault-tolerance
         • The failure of a Grid-job would require complex error-handling within the POP-C++ library or within the application built on top of POP-C++.
         • Negatively impacts the performance.

  6. Challenges: gLite (1)
     • Challenges imposed by the grid infrastructure (gLite, ProActive VO)
       • Performance
         • Delay of several minutes between the submission of a gLite job and its start-up on a worker node
       • Reliability
         • High rate of failed gLite jobs, for various reasons

  7. Challenges: gLite (2)
     • Challenges imposed by the grid infrastructure (continued)
       • Dynamically changing environment
         • Machines are being added/removed
       • Heterogeneous resources
         • IBM PPC cluster (not yet supported by gLite)
         • Non-gLite cluster at EIF

  8. Architecture Outline

  9. Requirements for the jobsubmission system
     • Has to be fast
       • Job requests have to be fulfilled immediately if a resource is available
     • Has to be reliable
       • A high number of job requests must be processed within a short time frame
     • Has to cope with the dynamic arrival of new resources
     • Should ensure that resources advertised as available are indeed capable of successfully executing the Remote Object.

  10. A Tuple Space based Jobsubmission System (1)
     • Based on ideas introduced in Carriero and Gelernter’s Linda system (1983)
     • A tuple space (TS) is a collection of tuples
     • A tuple is a data object containing one or more components.
       • E.g. (234, “command”), (234, 0, “result string”)
     • The TS is accessed through the following operations: insert, read, take
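
The three operations named on this slide can be illustrated with a minimal in-memory sketch. It is not the presented system's code: the type names (`Field`, `Tuple`, `Template`, `TupleSpace`), the wildcard matching rule, and the non-blocking behaviour are assumptions made for illustration only. Linda-style systems typically block on read/take until a matching tuple arrives, and the real server is accessed over TCP.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// One tuple component: an integer or a string, as in (234, "command").
using Field = std::variant<int, std::string>;
using Tuple = std::vector<Field>;

// Hypothetical matching template: same shape as a tuple,
// with std::nullopt acting as a wildcard for that position.
using Template = std::vector<std::optional<Field>>;

// True if the tuple has the template's length and agrees on every fixed slot.
inline bool matches(const Tuple& t, const Template& tmpl) {
    if (t.size() != tmpl.size()) return false;
    for (std::size_t i = 0; i < t.size(); ++i) {
        if (tmpl[i].has_value() && *tmpl[i] != t[i]) return false;
    }
    return true;
}

// Single-threaded, in-memory tuple space (assumption: no networking, no ACLs).
class TupleSpace {
public:
    // insert: add a tuple to the space.
    void insert(Tuple t) { tuples_.push_back(std::move(t)); }

    // read: return a copy of the first matching tuple and leave it in the space.
    std::optional<Tuple> read(const Template& tmpl) const {
        for (const Tuple& t : tuples_) {
            if (matches(t, tmpl)) return t;
        }
        return std::nullopt;
    }

    // take: return the first matching tuple and remove it from the space.
    std::optional<Tuple> take(const Template& tmpl) {
        for (auto it = tuples_.begin(); it != tuples_.end(); ++it) {
            if (matches(*it, tmpl)) {
                Tuple found = std::move(*it);
                tuples_.erase(it);
                return found;
            }
        }
        return std::nullopt;
    }

    std::size_t size() const { return tuples_.size(); }

private:
    std::vector<Tuple> tuples_;
};

// Usage sketch with the two example tuples from the slide.
inline bool demo() {
    TupleSpace ts;
    ts.insert(Tuple{234, std::string("command")});
    ts.insert(Tuple{234, 0, std::string("result string")});
    assert(ts.size() == 2);

    // Match any two-component tuple whose first component is 234.
    const Template any_pair{Field{234}, std::nullopt};
    const bool seen = ts.read(any_pair).has_value();     // tuple stays in the space
    const std::optional<Tuple> job = ts.take(any_pair);  // tuple is removed
    return seen && job.has_value() && ts.size() == 1;
}
```

In this sketch, `read` lets several clients observe the same tuple, while `take` hands a tuple to exactly one caller, which is the property that would let a tuple space pair one job request with one available resource.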

  11. A Tuple Space based Jobsubmission System (2)
     • Currently a client-server architecture (central TS server, multiple clients)
     • Implemented in C++
     • Network connectivity via TCP (w/o GSI)
     • ACL enabled
     • Provides a simple but fast form of resource brokering
       -> Built to support the execution of short jobs

  12. Today’s system setup
     [Diagram; labels recovered from the slide:]
     • IBM PPC@CSCS
     • crunchmachines@EIF (runs the NQueen App and some Worker)
     • hydra@GUP (hosts 2 TS)
     • tisiphone@GUP (glite-UI)
     • ProActive VO LCG resources
     • lcg-job-submit
     • job-container started via ssh

  13. Performance Results
     • Within this setup:
       • Served up to 500 job containers
       • Served the same number of job requests
       • Time for executing a single job: min. 0.103 s, max. 4.328 s (sample size: 750 jobs)
     • Detailed performance analysis to be published later.

  14. Questions?
