
Master/Worker and Condor Barcelona, 2006


Presentation Transcript


  1. Master/Worker and Condor, Barcelona, 2006

  2. Agenda • Extended user’s tutorial • Advanced uses of Condor: Java programs, DAGMan, Stork, MW, Grid Computing • Case studies, and a discussion of your application’s needs

  3. Why Master Worker? • MW addresses a weakness in Condor: Short jobs • Excellent for dynamic, parallel workflows

  4. A Workflow Problem A problem requires that we do A 60,000 times, and we do B 100,000 times • A takes 1 second • B takes 3 seconds Computation time for the problem is (60000 x 1) + (100000 x 3) = 360,000 seconds or 100 hours

  5. Condor Runs the Workflow Assume that the overhead Condor adds to running each instance of A or B is 20 seconds (an optimistic estimate: the real overhead is usually larger) Time for Condor to do the problem is (60000 x 21) + (100000 x 23) = 3,560,000 seconds or about 989 hours
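The arithmetic on slides 4 and 5 can be checked with a few lines of C++. This is only a sketch of the calculation; the function and constant names are made up for illustration and are not part of Condor or MW.

```cpp
#include <cassert>
#include <cstdint>

// Total seconds for `count` runs, each costing `work_s` seconds of useful
// work plus `overhead_s` seconds of per-job overhead.
constexpr std::int64_t total_seconds(std::int64_t count,
                                     std::int64_t work_s,
                                     std::int64_t overhead_s) {
    return count * (work_s + overhead_s);
}

// Convert seconds to (fractional) hours.
constexpr double to_hours(std::int64_t seconds) { return seconds / 3600.0; }

// Pure computation: 60,000 runs of A (1 s) plus 100,000 runs of B (3 s).
constexpr std::int64_t kPureWork =
    total_seconds(60000, 1, 0) + total_seconds(100000, 3, 0);

// Same workload with the assumed 20 s of overhead per Condor job.
constexpr std::int64_t kWithOverhead =
    total_seconds(60000, 1, 20) + total_seconds(100000, 3, 20);

static_assert(kPureWork == 360000, "matches slide 4");
static_assert(kWithOverhead == 3560000, "matches slide 5");
```

With these numbers the overhead accounts for roughly nine tenths of the total time, which is the weakness the following slides address.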

  6. A Condor Job…

  7. An Often Considered Solution • Bundle several As or Bs into a single Condor job • Must address further issues: • Partial failures • Load balancing • Dynamic creation of work [Figure: several A instances bundled into one Condor job]

  8. Basics of MW The master gives tasks to the workers.

  9. Workers and Tasks Each worker serially takes on tasks, as assigned by the master [Figure: one worker handling the tasks "feed me", "bathe me", and "change diaper" in turn]

  10. Relating MW to Condor • There is 1 master • The master determines the number of workers • Each worker is a Condor job • Each worker receives tasks serially • Many workers do tasks at the same time (in parallel) • Workers communicate only with the master

  11. Solution: Lightweight Tasks Multiplexed on top of Jobs The analogy: Process is to Thread as Condor Job is to an MW Task A Condor job may take minutes to create and dispatch; an MWTask dispatch takes milliseconds

  12. MW is • C++ Framework • A way to re-use Condor worker jobs • Each worker may run many tasks • Results in a very parallel application

  13. MW is not • MPI (Message Passing Interface) • General parallel programming scheme

  14. MW in action [Figure: a master exe on the submit machine, workers started via condor_submit, and tasks (T) distributed among the master and the workers]

  15. You Must Write 3 Classes, the Subclasses of... • MWDriver • MWTask • MWWorker [Figure: Master exe and Worker exe]

  16. An MWTask • Subclass MWTask • Data members for inputs • Data member for results • Serialization of inputs and results • Distinct instances on each side

  17. The Four Task Methods • void MyTask::pack_work(void); • void MyTask::unpack_work(void); • void MyTask::pack_results(void); • void MyTask::unpack_results(void); • Also constructors and destructors!
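As a rough illustration of how the four task methods pair up, the sketch below shows a task whose input travels from master to worker and whose result travels back. `FakeRMC` and `TaskBase` are hypothetical stand-ins written for this example, not the real MW classes, and passing the communication object in through the constructor is an assumption made only to keep the sketch self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the RMC object: a first-in, first-out int buffer.
struct FakeRMC {
    std::vector<int> buf;
    std::size_t pos = 0;
    void pack(int *array, int length) { buf.insert(buf.end(), array, array + length); }
    void unpack(int *array, int length) {
        for (int i = 0; i < length; ++i) array[i] = buf[pos++];
    }
};

// Hypothetical stand-in for the MWTask base class.
struct TaskBase {
    virtual ~TaskBase() = default;
    virtual void pack_work() = 0;
    virtual void unpack_work() = 0;
    virtual void pack_results() = 0;
    virtual void unpack_results() = 0;
};

// A task that carries one int to the worker and one int back.
struct SquareTask : TaskBase {
    FakeRMC *rmc;
    int input;
    int result = 0;
    explicit SquareTask(FakeRMC *r, int in = 0) : rmc(r), input(in) {}
    void pack_work() override { rmc->pack(&input, 1); }         // master sends input
    void unpack_work() override { rmc->unpack(&input, 1); }     // worker receives input
    void pack_results() override { rmc->pack(&result, 1); }     // worker sends result
    void unpack_results() override { rmc->unpack(&result, 1); } // master receives result
};
```

The master-side and worker-side objects are distinct instances, so each `unpack_*` method has to read exactly what the matching `pack_*` method wrote, in the same order.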

  18. RMC • Resource Management and Communication • An abstraction to set up communication, to specify resource requirements, etc. • RMC->pack(int *array, int length); • RMC->unpack(int *array, int length);
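Because `pack` and `unpack` both take an explicit length, a variable-length array is commonly sent as its length followed by its elements. The sketch below shows that pattern against a hypothetical in-memory stand-in (`FakeRMC`); the helper names `send_vector` and `receive_vector` are invented for this example and are not MW functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the RMC object: a first-in, first-out int buffer
// exposing the two calls shown on the slide.
struct FakeRMC {
    std::vector<int> buf;
    std::size_t pos = 0;
    void pack(int *array, int length) { buf.insert(buf.end(), array, array + length); }
    void unpack(int *array, int length) {
        for (int i = 0; i < length; ++i) array[i] = buf[pos++];
    }
};

// Sender: write the element count first, then the elements.
void send_vector(FakeRMC &rmc, std::vector<int> &v) {
    int n = static_cast<int>(v.size());
    rmc.pack(&n, 1);
    rmc.pack(v.data(), n);
}

// Receiver: read the count, size the buffer, then read the elements.
std::vector<int> receive_vector(FakeRMC &rmc) {
    int n = 0;
    rmc.unpack(&n, 1);
    std::vector<int> v(static_cast<std::size_t>(n));
    rmc.unpack(v.data(), n);
    return v;
}
```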

  19. MWWorker • Just one method: executeTask(MWTask *t) • Also constructor and destructor!
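A worker's single method receives a pointer to the base task type, so it typically has to recover the concrete task type before doing the work. The following is a minimal sketch of that idea using hypothetical stand-in classes (`TaskBase`, `WorkerBase`) rather than the real MW headers; the return type and the use of `dynamic_cast` are assumptions of this example.

```cpp
#include <cassert>

// Hypothetical stand-ins for the MWTask and MWWorker base classes.
struct TaskBase {
    virtual ~TaskBase() = default;
};
struct WorkerBase {
    virtual ~WorkerBase() = default;
    virtual void executeTask(TaskBase *t) = 0;
};

// Concrete task: one input, one result.
struct SquareTask : TaskBase {
    int input = 0;
    int result = 0;
};

// Concrete worker: recovers the concrete task type, then fills in the result.
struct SquareWorker : WorkerBase {
    void executeTask(TaskBase *t) override {
        auto *task = dynamic_cast<SquareTask *>(t);
        if (task == nullptr) return;  // not a task this worker understands
        task->result = task->input * task->input;
    }
};
```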

  20. MWDriver (the master) • get_userinfo(int argc, char **argv) • RMC->add_executable(char *exe, char *requirements); • setup_initial_tasks(int num_tasks, MWTask ***init_tasks) • act_on_completed_task(MWTask *t) • RMC->add_task(MWTask *t) • Also constructor and destructor

  21. MWTask ***init_tasks [Figure: init_tasks is a pointer to the array; the array is an array of pointers to tasks; each pointer refers to a task]
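The three levels of indirection in `MWTask ***init_tasks` are easier to see in code. The sketch below follows the signature as printed on the slide and uses a hypothetical `Task` struct in place of `MWTask`. The actual MW header may differ (for example in how the task count is passed), and the cleanup helper `free_tasks` is invented here because the slides do not say who owns the array.

```cpp
#include <cassert>

// Hypothetical stand-in for an MWTask subclass.
struct Task {
    int input = 0;
};

// The caller passes the address of a Task** variable. The function allocates
// an array of Task* and stores its address through that out-parameter.
void setup_initial_tasks(int num_tasks, Task ***init_tasks) {
    *init_tasks = new Task *[num_tasks];   // the array of pointers to tasks
    for (int i = 0; i < num_tasks; ++i) {
        (*init_tasks)[i] = new Task;       // one task per slot
        (*init_tasks)[i]->input = i;
    }
}

// Hypothetical cleanup helper for this sketch only.
void free_tasks(int num_tasks, Task **tasks) {
    for (int i = 0; i < num_tasks; ++i) delete tasks[i];
    delete[] tasks;
}
```

A caller would declare `Task **tasks = nullptr;` and pass `&tasks`, which is where the third `*` comes from.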

  22. MWDriver (the master) • get_userinfo(int argc, char **argv) • RMC->add_executable(char *exe, char *requirements); • setup_initial_tasks(int num_tasks, MWTask ***init_tasks) • act_on_completed_task(MWTask *t) • RMC->add_task(MWTask *t) • Also constructor and destructor

  23. Putting it all together: examples/new_skel • ./new_app MY_PROJECT A Perl script to create appropriately named files containing skeleton code • Use configure --help for options • make

  24. Running an application • Just launch the appropriate master • use condor_q to see it in action

  25. Real MW Applications • MWFATCOP (Chen, Ferris, Linderoth) A branch and cut code for linear integer programming • MWMINLP (Goux, Leyffer, Nocedal) A branch and bound code for nonlinear integer programming • MWQPBB (Linderoth) A (simplicial) branch and bound code for solving quadratically constrained quadratic programs • MWAND (Linderoth, Shen) A nested decomposition based solver for multistage stochastic linear programming • MWATR (Linderoth, Shapiro, Wright) A trust-region-enhanced cutting plane code for linear stochastic programming and statistical verification of solution quality. • MWQAP (Anstreicher, Brixius, Goux, Linderoth) A branch and bound code for solving the quadratic assignment problem

  26. Other resources • http://www.cs.wisc.edu/condor/mw • Online manual • MW-users mailing list

  27. Extra Slides

  28. Advice for Large Runs • Use Personal Condor • Flock, glidein, schedd-on-side, hobblein • Use checkpoints! • Set worker_increment high

  29. Debugging with Independent Mode • Special RMComm for debugging • Single process, can run under gdb

  30. MW Philosophy • Reuse either code or concept • Key idea: Late binding

  31. User-level Checkpoints • MWTask::write_chkpt_info(FILE *) • MWTask::read_chkpt_info(FILE *) • MWDriver::read_master_state(FILE *) • MWDriver::write_master_state(FILE *)
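A task-level checkpoint pair might look roughly like the sketch below: the write method saves the task's state to the given `FILE *`, and the read method restores it in the same order. `SquareTask` is a standalone, hypothetical struct rather than a real `MWTask` subclass, and the text format is an arbitrary choice for this example.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical task with one input and one result.
struct SquareTask {
    int input = 0;
    int result = 0;

    // Save the task state as two whitespace-separated integers.
    void write_chkpt_info(std::FILE *f) {
        std::fprintf(f, "%d %d\n", input, result);
    }

    // Restore the state in the same order; reset it if the data is unreadable.
    void read_chkpt_info(std::FILE *f) {
        if (std::fscanf(f, "%d %d", &input, &result) != 2) {
            input = 0;
            result = 0;
        }
    }
};
```

The master-state methods would presumably follow the same write-then-read symmetry for whatever data the driver subclass holds.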

  32. Example codes with MW • Matmul • Blackbox • knapsack

  33. More on MW • http://www.cs.wisc.edu/condor/mw • Version 0.2 is the latest • It is more stable than the version number suggests! • Mailing list available for discussion • Active development by the Condor team
