Master-Worker Tutorial, Condor Week 2006
Agenda • What is M-W • When to use M-W • How to build a simple M-W application • Q & A
Why M-W? • M-W addresses a weakness in Condor: • Short jobs • Also, for dynamic, parallel workflows
An easy solution: • Why not just wrap up smaller jobs into a bigger Condor job? • Partial failures? • Load balancing? • Dynamic creation of work?
Solution: Lightweight Tasks Multiplexed on top of Jobs • Process : Thread :: Condor Job : MW Task • An MW task dispatches in milliseconds; a Condor job can take minutes
MW is… • A C++ framework • Re-uses Condor worker jobs • Each worker job runs many tasks • Results in a highly parallel application
MW is not • MPI • General parallel programming scheme
MW in action • [Diagram: a master exe on the submit machine, workers started with condor_submit, and tasks (T) distributed between the master and the workers]
You Must Write 3 Classes • Subclasses of: • MWDriver • MWTask • MWWorker • [Diagram: master exe and worker exe]
Your_MWTask • Subclass MWTask • Data members for inputs • Data member for results • Serialization of inputs and results • Distinct instances on each side
The Four Task Methods • void MyTask::pack_work(void); • void MyTask::unpack_work(void); • void MyTask::pack_results(void); • void MyTask::unpack_results(void); • Also ctor/dtor!
RMComms • Abstraction for communication • (and some other stuff…) • RMC->pack(int *array, int length); • RMC->unpack(int *array, int length);
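The four task methods and the RMC pack/unpack calls can be illustrated with a small round trip. This is a minimal sketch that compiles without the MW headers: FakeComm, SumTask, and round_trip_sum are hypothetical names. Only the pack/unpack(array, length) call shape and the four method names come from the slides. Unlike the slide signatures, the methods here take the communication object as a parameter instead of using a shared RMC pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the RMComm object: a FIFO buffer of ints.
// Only the pack/unpack(array, length) shape is taken from the slides.
class FakeComm {
public:
    void pack(const int *array, int length) {
        buffer_.insert(buffer_.end(), array, array + length);
    }
    void unpack(int *array, int length) {
        for (int i = 0; i < length; ++i) {
            array[i] = buffer_.at(read_pos_++);  // throws if over-read
        }
    }
private:
    std::vector<int> buffer_;
    std::size_t read_pos_ = 0;
};

// Illustrative task that sums a list of integers. It does not derive
// from the real MWTask so that the sketch stays self-contained.
class SumTask {
public:
    std::vector<int> inputs;  // data members for inputs
    int result = 0;           // data member for the result

    // Master side: serialize the inputs (length first, then values).
    void pack_work(FakeComm &rmc) {
        int n = static_cast<int>(inputs.size());
        rmc.pack(&n, 1);
        rmc.pack(inputs.data(), n);
    }
    // Worker side: rebuild the inputs in a distinct instance.
    void unpack_work(FakeComm &rmc) {
        int n = 0;
        rmc.unpack(&n, 1);
        inputs.resize(static_cast<std::size_t>(n));
        rmc.unpack(inputs.data(), n);
    }
    // Worker side: serialize the result.
    void pack_results(FakeComm &rmc) { rmc.pack(&result, 1); }
    // Master side: read the result back into the original instance.
    void unpack_results(FakeComm &rmc) { rmc.unpack(&result, 1); }
};

// Simulates master -> worker -> master in a single process.
int round_trip_sum() {
    FakeComm to_worker, to_master;

    SumTask master_side;
    master_side.inputs = {1, 2, 3, 4};
    master_side.pack_work(to_worker);

    SumTask worker_side;  // distinct instance on the worker
    worker_side.unpack_work(to_worker);
    for (int v : worker_side.inputs) worker_side.result += v;
    worker_side.pack_results(to_master);

    master_side.unpack_results(to_master);
    return master_side.result;
}
```

Packing the length before the values lets the receiving side size its buffer before unpacking, which matters because the master and worker hold separate task instances.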
MWWorker • Just one method: • executeTask(MWTask *t) • Also ctor/dtor!
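A worker subclass might look roughly like the sketch below. The base classes in namespace sketch are hypothetical stand-ins for the MW headers, and the void return type of executeTask is an assumption, since the slide gives only the method name and argument. SquareSumTask, SquareSumWorker, and run_two_tasks are illustrative names.

```cpp
#include <cassert>
#include <vector>

namespace sketch {  // hypothetical stand-ins, not the real MW headers

struct MWTask {
    virtual ~MWTask() = default;
};

struct MWWorker {
    virtual ~MWWorker() = default;
    // Return type is an assumption; the slide shows only the name.
    virtual void executeTask(MWTask *t) = 0;
};

}  // namespace sketch

// Illustrative task: sum of squares of the inputs.
struct SquareSumTask : sketch::MWTask {
    std::vector<int> inputs;
    long result = 0;
};

// The one method a worker must supply: do the computation for a task
// and leave the result in the task's own data members.
struct SquareSumWorker : sketch::MWWorker {
    int tasks_run = 0;

    void executeTask(sketch::MWTask *t) override {
        auto *task = dynamic_cast<SquareSumTask *>(t);
        assert(task != nullptr);  // this worker handles only this task type
        task->result = 0;
        for (int v : task->inputs) {
            task->result += static_cast<long>(v) * v;
        }
        ++tasks_run;
    }
};

// One worker instance runs several tasks back to back.
long run_two_tasks() {
    SquareSumWorker worker;
    SquareSumTask a, b;
    a.inputs = {1, 2, 3};  // 1 + 4 + 9 = 14
    b.inputs = {4};        // 16
    worker.executeTask(&a);
    worker.executeTask(&b);
    assert(worker.tasks_run == 2);
    return a.result + b.result;
}
```

The same worker object handles both tasks, which mirrors the idea of re-using one Condor worker job for many MW tasks.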
MWDriver • get_userinfo(int argc, char **argv) • RMC->add_executable(char *exe, char *requirements); • setup_initial_tasks(int num_tasks, MWTask ***init_tasks) • act_on_completed_task(MWTask *t) • RMC->add_task(MWTask *t) • Also ctor/dtor
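The interplay of setup_initial_tasks, act_on_completed_task, and add_task can be sketched as a single-process loop. SketchDriver, SquareTask, and run_sketch_driver are hypothetical, and the signatures are simplified: the tasks live in a std::queue instead of the MWTask *** out-parameter on the slide, and the "worker" is a line of code inside the loop.

```cpp
#include <cassert>
#include <queue>

// Hypothetical task: square one integer.
struct SquareTask {
    int input;
    int result;
};

// Hypothetical driver that loosely mirrors the MWDriver hooks.
class SketchDriver {
public:
    // Counterpart of setup_initial_tasks: create the first batch.
    void setup_initial_tasks() {
        for (int i = 1; i <= 3; ++i) add_task(SquareTask{i, 0});
    }

    // Counterpart of act_on_completed_task: fold in a finished task
    // and, if the result calls for it, create more work on the fly.
    void act_on_completed_task(const SquareTask &t) {
        total_ += t.result;
        if (t.input == 3) add_task(SquareTask{10, 0});
    }

    // Counterpart of RMC->add_task: queue a task for dispatch.
    void add_task(const SquareTask &t) { todo_.push(t); }

    // Master loop: dispatch until no work remains.
    int run() {
        setup_initial_tasks();
        while (!todo_.empty()) {
            SquareTask t = todo_.front();
            todo_.pop();
            t.result = t.input * t.input;  // stands in for a remote worker
            act_on_completed_task(t);
        }
        return total_;
    }

private:
    std::queue<SquareTask> todo_;
    int total_ = 0;
};

int run_sketch_driver() {
    SketchDriver driver;
    return driver.run();  // 1 + 4 + 9 + 100
}
```

The follow-up task added inside act_on_completed_task is the kind of dynamic work creation that is hard to get by wrapping small jobs into one large Condor job.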
Putting it all together: new_skel • ./new_skel MY_PROJECT • Use configure --help for options • make
Debugging with Independent Mode • Special RMComm for debugging • Single process, can run under gdb
Running on the Grid… • Just launch the appropriate master • condor_q to see it in action
Advice for Large Runs • Use a personal Condor • Flock, glide-in, schedd-on-side, hobblein • Use checkpointing! • Set_worker_increment high
User-level Checkpointing • MWTask::write_chkpt_info(FILE *) • MWTask::read_chkpt_info(FILE *) • MWDriver::read_master_state(FILE *) • MWDriver::write_master_state(FILE *)
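A task's checkpoint methods can be as simple as writing and reading its data members through the FILE pointer. The sketch below is an assumption-laden illustration: RangeTask and checkpoint_round_trip are hypothetical, the task does not derive from the real MWTask, and the return types are guesses because the slide lists only names and arguments.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical task state: a range being summed and the partial sum.
struct RangeTask {
    int lo;
    int hi;
    long partial;

    // Write every field needed to recreate the task after a restart.
    void write_chkpt_info(std::FILE *fp) const {
        std::fprintf(fp, "%d %d %ld\n", lo, hi, partial);
    }

    // Read the fields back in the same order; true on success.
    bool read_chkpt_info(std::FILE *fp) {
        return std::fscanf(fp, "%d %d %ld", &lo, &hi, &partial) == 3;
    }
};

// Saves a task to a temporary file and restores it into a new instance.
bool checkpoint_round_trip() {
    std::FILE *fp = std::tmpfile();
    if (fp == nullptr) return false;

    RangeTask saved{0, 100, 4950};
    saved.write_chkpt_info(fp);

    std::rewind(fp);  // reposition to the start before reading
    RangeTask restored{0, 0, 0};
    bool ok = restored.read_chkpt_info(fp);
    std::fclose(fp);

    return ok && restored.lo == saved.lo && restored.hi == saved.hi &&
           restored.partial == saved.partial;
}
```

The read method must consume exactly what the write method produced, in the same order, so it helps to keep the two side by side.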
Example codes with MW • Matmul • Blackbox • Knapsack
MW Philosophy • Reuse either code or concept • Key idea: Late binding
Other resources • http://www.cs.wisc.edu/condor/mw • Online manual • MW-users mailing list
Thank You! Questions? MW Home page: http://www.cs.wisc.edu/condor/mw