
PSWEEP: A Lightweight Pattern for Distributed Computational Experiments

Christopher Mueller and Andrew Lumsdaine, Open Systems Lab, Indiana University


Presentation Transcript


  1. PSWEEP: A Lightweight Pattern for Distributed Computational Experiments
     Christopher Mueller and Andrew Lumsdaine
     Open Systems Lab, Indiana University

  2. Introduction
     • Parameter sweeps are common cluster applications
     • Approaches
       • Scripts (sh, perl: ssh, mpi)
       • Low-level applications (C++, Fortran: MPI)
       • Parameter sweep applications (e.g., Nimrod)
     • Problems
       • Custom solutions become tangled quickly
       • Applications are not available on all platforms

  3. How do we use our clusters?

     Job ID          Username Queue    Jobname    SessID NDS TSK Memory  Time S  Time
     --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
     882576.aviss.av silin    iq       SL_DBJ014Q  14636   2   4     -- 200:0 R 109:4
     890917.aviss.av baikgrp  bg       DA_NPJ001V  27673   1   2     -- 168:0 R 83:32
     890932.aviss.av baikgrp  bg       DA_NPJ002V  18006   1   2     -- 168:0 R 87:31
     959929.aviss.av rllord   iq       RL1_NCQ02V  11982   1   2     -- 120:0 R 56:27
     960044.aviss.av shawnli  bg       Hairy2b     13703   1   1     -- 100:0 R 42:52
     960045.aviss.av shawnli  bg       Xxbp1       21294   1   1     -- 100:0 R 42:51
     960046.aviss.av shawnli  bg       Foxa1       15908   1   1     -- 100:0 R 42:49
     960047.aviss.av shawnli  bg       Foxa2       19881   1   1     -- 100:0 R 42:49
     960048.aviss.av shawnli  bg       Foxd3       19073   1   1     -- 100:0 R 42:49
     960050.aviss.av shawnli  bg       Gsc         20886   1   1     -- 100:0 R 42:04
     960215.aviss.av shawnli  bg       Foxa1mamma  18296   1   1     -- 100:0 R 35:23
     960216.aviss.av shawnli  bg       Foxa2mamma  14926   1   1     -- 100:0 R 34:43
     960217.aviss.av shawnli  bg       Foxd3mamma  15016   1   1     -- 100:0 R 34:43
     960218.aviss.av shawnli  bg       Gata4mamma   7421   1   1     -- 100:0 R 33:11
     960220.aviss.av shawnli  bg       Glimammal    7525   1   1     -- 100:0 R 33:11
     960221.aviss.av shawnli  bg       Gscmammal   16626   1   1     -- 100:0 R 33:03
     960222.aviss.av shawnli  bg       Hairy2bmam  16760   1   1     -- 100:0 R 33:03
     960224.aviss.av shawnli  bg       Hoxd1mamma  32101   1   1     -- 100:0 R 33:01
     960225.aviss.av shawnli  bg       Mixermamma  27958   1   1     -- 100:0 R 32:09
     960279.aviss.av dkberry  mdgrape  run13_07m    5570   1   1     -- 36:00 R 17:04
     960283.aviss.av dbaronia iq       batch.sh    23862   3   6     -- 24:00 R 22:41
     960426.aviss.av cwillenb bg       CWOA_005    18980   1   1     -- 100:0 R 04:52
     960428.aviss.av cwillenb bg       CWOA_006a    1941   1   1     -- 100:0 R 04:52
     960429.aviss.av cwillenb bg       CWOA_007       --   1   1     -- 100:0 Q    --
     960430.aviss.av cwillenb bg       CWOA_008       --   1   1     -- 100:0 Q    --
     960431.aviss.av cwillenb bg       CWOA_009       --   1   1     -- 100:0 Q    --
     960432.aviss.av cwillenb bg       CWOA_010       --   1   1     -- 100:0 Q    --
     960433.aviss.av cwillenb bg       CWOA_011       --   1   1     -- 100:0 Q    --
     960434.aviss.av cwillenb bg       CWOA_012       --   1   1     -- 100:0 Q    --
     963115.aviss.av xsong    bg       par.241        --   8  16     -- 24:00 Q    --
     963116.aviss.av xsong    bg       par.242        --   8  16     -- 24:00 Q    --
     963121.aviss.av xsong    bg       par.53.7       --   8  16     -- 02:00 Q    --
     963122.aviss.av xsong    bg       par.53.8       --  16  32     -- 02:00 Q    --
     963133.aviss.av honfan   iq       HF_MJ370    23299   3   6     -- 120:0 R 07:13
     963167.aviss.av whpitcoc iq       WP_C572_L0  30829   1   2     -- 24:00 R 01:11
     963171.aviss.av whpitcoc iq       WP_C572_L0  17995   1   2     -- 24:00 R 01:11
     963186.aviss.av whpitcoc iq       WP_C572_TS   5235   1   2     -- 24:00 R 00:08
     963187.aviss.av whpitcoc iq       WP_C572_TS  25746   1   2     -- 24:00 R 00:09
     963188.aviss.av whpitcoc iq       WP_C572_TS  13846   1   2     -- 24:00 R 00:09
     963189.aviss.av whpitcoc iq       WP_C572_TS  26613   1   2     -- 24:00 R 00:08

  4. Anatomy of a Parameter Sweep: Parameters and Enumeration Order *

         for i in range(rank, n, size):
             if process: load_image(i)
             elif stats: query_image(i)
             for j in [1, 2, 4, 8]:
                 if process: time(i, j)
                 for k in ['motion', 'gaussian']:
                     if process: process_image(i,j,k)
                     elif stats: image_stats(i,j,k)
                     else: print 'ssh n%d run %d %d' % (i, j, k)
                     if process: clear_process(k)
                     elif bgi: clear_temp(k)
             if process: unload_image(i)

     * Resource distribution is handled by the execution environment, e.g. mpirun

  5. Anatomy of a Parameter Sweep: Tasks and Experiments

         for i in range(rank, n, size):
             if process: load_image(i)
             elif stats: query_image(i)
             for j in [1, 2, 4, 8]:
                 if process: time(i, j)
                 for k in ['motion', 'gaussian']:
                     if process: process_image(i,j,k)
                     elif stats: image_stats(i,j,k)
                     else: print 'ssh n%d run %d %d' % (i, j, k)
                     if process: clear_process(k)
                     elif bgi: clear_temp(k)
             if process: unload_image(i)

  6. Anatomy of a Parameter Sweep: Artifacts and Errors

         for i in range(rank, n, size):
             if process: load_image(i)
             elif stats: query_image(i)
             for j in [1, 2, 4, 8]:
                 if process: time(i, j)
                 for k in ['motion', 'gaussian']:
                     if process: process_image(i,j,k)
                     elif stats: image_stats(i,j,k)
                     else: print 'ssh n%d run %d %d' % (i, j, k)
                     if process: clear_process(k)
                     elif bgi: clear_temp(k)
             if process: unload_image(i)

  7. User's View
     Experiments
       • process: load_image(), unload_image(), time(), process_image(), clear_process()
       • stats: query_image(), image_stats()
       • script gen: print …
     Parameters
       • [0, n]
       • [.01, .1, 1.0]
       • [10, 12, 14]
     Enumerated states [i, j, k]
         0, 0.01, 10
         0, 0.01, 12
         0, 0.01, 14
         0, 0.1, 10
         0, 0.1, 12
         …
     Resources: ?

  8. The PSWEEP Pattern

  9. Abstracting the Loops
     • Parameter. A Parameter is an iterator or container that supplies the values for a variable in the experiment.
     • Enumerator. The Enumerator takes an ordered list of parameters and lexicographically enumerates every combination of their values.
     • State. The State contains the current value of each parameter, in order.

         i = ['house.jpg', 'lena.jpg']
         j = [1, 2, 4, 8]
         k = ['motion', 'gaussian']
         params = [i, j, k]
         e = enumerator(params)
         for state in e:
             process_image(state)
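
The enumeration described on this slide can be sketched in plain Python. This is a minimal illustration, not the PSWEEP implementation: `enumerate_states` is a hypothetical name, and it assumes each parameter is a finite list. It relies on `itertools.product`, which walks its inputs like nested loops, with the rightmost parameter varying fastest.

```python
from itertools import product


def enumerate_states(params):
    """Yield one state per combination, rightmost parameter varying fastest.

    Hypothetical helper for illustration; not part of PSWEEP.
    """
    for combo in product(*params):
        yield list(combo)


i = ['house.jpg', 'lena.jpg']
j = [1, 2, 4, 8]
k = ['motion', 'gaussian']

states = list(enumerate_states([i, j, k]))
print(len(states))   # number of combinations: 2 * 4 * 2
print(states[0])     # first state in enumeration order
print(states[1])     # only the last parameter has advanced
```

Each state is a list holding the current value of every parameter in order, so a single `for state in ...` loop replaces the three nested loops of the earlier slides.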

  10. Abstracting the Experiments
     • Task. A Task is any unit of work performed when a parameter value changes. A Task is subdivided into setup and cleanup operations, corresponding to the work done at the beginning and end of a block of code in a loop, respectively.
     • Experiment. An Experiment is a collection of tasks.

         def PrepareImage(state, img):
             # Setup
             db_load(img, './current.jpg')
             yield  # suspend the function
             # Cleanup
             delete('./current.jpg')

         def ProcessImage(state, alg):
             data = load('./current.jpg')
             img = process(data, alg(value))
             save(img, str(state) + '.jpg')
             return  # no cleanup
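
The setup/cleanup split can be driven with ordinary Python generators. The following is a sketch under stated assumptions: `run_task` is a hypothetical driver written for this illustration, and the tasks only record what they would do in a `log` list instead of touching image files.

```python
import inspect


def run_task(task, state, value, body):
    """Run a task's setup, then `body`, then the task's cleanup.

    Hypothetical driver: a generator task is advanced to its `yield` for
    setup and resumed afterwards for cleanup; a plain function is treated
    as setup only.
    """
    result = task(state, value)
    suspended = result if inspect.isgenerator(result) else None
    if suspended is not None:
        next(suspended, None)   # setup: run up to the yield
    body()
    if suspended is not None:
        next(suspended, None)   # cleanup: run the code after the yield


log = []


def prepare_image(state, img):
    log.append('load ' + img)       # setup
    yield                           # suspend until the inner work is done
    log.append('delete ' + img)     # cleanup


def process_image(state, alg):
    log.append('process ' + alg)    # plain function, no cleanup


def inner_work():
    run_task(process_image, ['house.jpg', 'motion'], 'motion', lambda: None)


run_task(prepare_image, ['house.jpg'], 'house.jpg', inner_work)
print(log)
```

Writing a task as one generator keeps its setup and cleanup next to each other in the source, even though other work runs between them.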

  11. Binding Experiments to State
     Bound Task Semantics. Tasks must execute in the same order they would if the parameter sweep were expanded to nested loops.

         for img in images:
             PrepareImage.setup(img)
             for alg in algs:
                 ProcessImage.setup(alg)
             PrepareImage.cleanup(img)

         e = enumerator([images, algs])
         e.bind(images, PrepareImage)
         e.bind(algs, ProcessImage)
         for state in e: pass

     These examples are equivalent.
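
One way to obtain these semantics is to compare each state with the previous one and run cleanup and setup only for the parameters that changed. The class below is a minimal sketch, not the PSWEEP source: the `Enumerator` internals are assumptions, parameters are assumed to be lists, and the tasks only append to a `log` so the ordering can be inspected.

```python
import inspect
from itertools import product


class Enumerator:
    """Hypothetical enumerator that runs bound tasks in nested-loop order."""

    def __init__(self, params):
        self.params = params
        self.tasks = {}  # parameter position -> task function

    def bind(self, param, task):
        # Locate the parameter by identity, mirroring e.bind(images, PrepareImage).
        position = next(n for n, p in enumerate(self.params) if p is param)
        self.tasks[position] = task

    def _cleanup(self, running, first):
        # Resume suspended tasks from the innermost level out to `first`.
        for level in range(len(self.params) - 1, first - 1, -1):
            gen = running.pop(level, None)
            if gen is not None:
                next(gen, None)  # runs the code after the task's yield

    def __iter__(self):
        running, prev = {}, None
        for idx in product(*(range(len(p)) for p in self.params)):
            state = [p[n] for p, n in zip(self.params, idx)]
            # Outermost position whose value changed since the previous state.
            first = 0 if prev is None else next(
                n for n, (a, b) in enumerate(zip(prev, idx)) if a != b)
            self._cleanup(running, first)
            for level in range(first, len(idx)):
                task = self.tasks.get(level)
                if task is not None:
                    result = task(state, state[level])
                    if inspect.isgenerator(result):
                        next(result, None)  # run setup up to the yield
                        running[level] = result
            prev = idx
            yield state
        self._cleanup(running, 0)


log = []


def prepare_image(state, img):
    log.append('setup ' + img)
    yield
    log.append('cleanup ' + img)


def process_image(state, alg):
    log.append('process %s with %s' % (state[0], alg))


images = ['house.jpg', 'lena.jpg']
algs = ['motion', 'gaussian']
e = Enumerator([images, algs])
e.bind(images, prepare_image)
e.bind(algs, process_image)
for state in e:
    pass
print('\n'.join(log))
```

The log shows each image being set up once, processed with every algorithm, and cleaned up before the next image is set up, which is the order the explicit nested loops would produce.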

  12. Distributing the Workload
     DistributedEnumerator. A DistributedEnumerator is an Enumerator that distributes the state to multiple instances across multiple computing resources.

         e = RoundRobin(params)
         for state in e: pass

     States: p1: [house.jpg, 1, motion] p2: [house.jpg, 1, gaussian] [house.jpg, 2, motion] [house.jpg, 2, gaussian] [house.jpg, 4, motion] [house.jpg, 4, gaussian] [lena.jpg, 1, motion] [lena.jpg, 1, gaussian] [lena.jpg, 2, motion] [lena.jpg, 2, gaussian] [lena.jpg, 4, motion] [lena.jpg, 4, gaussian]

         e = Domain(params, images)
         for state in e: pass

     States:
       p1: [house.jpg, 1, motion] [house.jpg, 1, gaussian] [house.jpg, 2, motion] [house.jpg, 2, gaussian] [house.jpg, 4, motion] [house.jpg, 4, gaussian]
       p2: [lena.jpg, 1, motion] [lena.jpg, 1, gaussian] [lena.jpg, 2, motion] [lena.jpg, 2, gaussian] [lena.jpg, 4, motion] [lena.jpg, 4, gaussian]

         e = MasterWorker(params)
         for state in e: pass

     States: p1: [house.jpg, 1, motion] p2: [house.jpg, 1, gaussian] [house.jpg, 2, motion] [house.jpg, 2, gaussian] [house.jpg, 4, motion] [house.jpg, 4, gaussian] [lena.jpg, 1, motion] [lena.jpg, 1, gaussian] [lena.jpg, 2, motion] [lena.jpg, 2, gaussian] [lena.jpg, 4, motion] [lena.jpg, 4, gaussian]

     The DistributedEnumerators must ensure that bound state semantics are satisfied.
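
The round-robin and domain strategies can be illustrated without a cluster by passing the process rank and process count explicitly. This is a sketch under assumptions: `round_robin` and `domain` are hypothetical functions, not the RoundRobin and Domain classes from the slides; no MPI library is used; the cyclic split in `domain` is one possible partitioning; and bound tasks are not handled here.

```python
from itertools import islice, product


def round_robin(params, rank, size):
    """Every size-th state of the full sweep, starting at offset `rank`."""
    return islice(product(*params), rank, None, size)


def domain(params, split, rank, size):
    """Give each rank a slice of one parameter; sweep the others in full.

    The cyclic slice [rank::size] is an assumption made for this sketch.
    """
    local = [p[rank::size] if p is split else p for p in params]
    return product(*local)


images = ['house.jpg', 'lena.jpg']
sizes = [1, 2, 4]
algs = ['motion', 'gaussian']
params = [images, sizes, algs]

rr0 = list(round_robin(params, 0, 2))
rr1 = list(round_robin(params, 1, 2))
dom0 = list(domain(params, images, 0, 2))
dom1 = list(domain(params, images, 1, 2))

print(rr0[0], rr1[0])    # first state handed to each process
print(dom0[0], dom1[0])  # each process starts on its own image
```

Splitting on the image parameter keeps all work for one image on one process, so an image-level setup task would run only where that image is used. Plain round-robin over states can scatter one image's states across processes, which is why the slide requires distributed enumerators to preserve the bound semantics.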

  13. Implementations
     • Python
       • Designed around Iterators and Generators
       • DistributedEnumerator based on pyMPI
       • Ideal for managing experiments on clusters
     • C++
       • Template metaprogramming techniques remove abstraction penalties
       • Ideal for applications with many nested loops

  14. C++ Example
     Generate HTML tables for days of the week, with hours for the rows and minutes for the columns.

     Task Classes

         struct table_task {
           void setup(State& state) {
             std::cout << "<table title=\"";
             print_last_param()(state);
             std::cout << "\">\n";
           }
           void cleanup(State&) { std::cout << "</table>\n"; }
         };

         struct table_row_task {
           // As above with <tr>
         };

         struct table_data_task {
           // As above with <td>
         };

     Parameter Sweep

         int main() {
           using boost::make_tuple;
           sweep(make_tuple("Sat", "Sun" make_tuple(range(24) make_tuple(range(0,60,10))))
                 empty_state().
                   bind<0>(table_task()).
                   bind<1>(table_row_task()).
                   bind<2>(table_data_task()),
                 print_last_param());
           return 0;
         }
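
For comparison, the same experiment can be written as explicit nested loops. The Python sketch below shows the order in which the three tasks would run. It is one reading of the slide rather than a translation of the C++ library: `html_tables` is a hypothetical function, and it assumes that the row and data tasks print a `title` attribute the same way `table_task` does and that the innermost action prints the minute value.

```python
def html_tables(days, hours, minutes):
    """Return the markup pieces in the order nested loops would emit them."""
    out = []
    for day in days:
        out.append('<table title="%s">' % day)         # table setup
        for hour in hours:
            out.append('<tr title="%s">' % hour)       # row setup
            for minute in minutes:
                out.append('<td title="%s">' % minute)  # cell setup
                out.append(str(minute))                 # innermost action
                out.append('</td>')                     # cell cleanup
            out.append('</tr>')                         # row cleanup
        out.append('</table>')                          # table cleanup
    return out


pieces = html_tables(['Sat', 'Sun'], range(24), range(0, 60, 10))
print('\n'.join(pieces[:6]))
```

Each opening tag corresponds to a task's setup and each closing tag to its cleanup, which is the pairing that binding a task to a parameter position is meant to express.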

  15. Conclusions
     • PSWEEP cleanly separates concerns
       • Parameters
       • Tasks
       • Resources
     • Modern languages enable flexible and high-performance implementations

  16. Reference
     "A Lightweight Pattern for Managing Distributed Computational Experiments." Christopher Mueller, Douglas Gregor, and Andrew Lumsdaine. Submitted to HPDC 2006. http://www.osl.iu.edu/~chemuell/new/psweep.php

  17. Questions?
