
PGENESIS Tutorial WAM-BAMM 05

Greg Hood, Pittsburgh Supercomputing Center, Carnegie Mellon University


Presentation Transcript


  1. PGENESIS Tutorial, WAM-BAMM 05. Greg Hood, Pittsburgh Supercomputing Center, Carnegie Mellon University

  2. Are your models running too slowly? In some situations PGENESIS can be used to speed them up: • Partitioning a large network across processors • Running a large number of simulations Not appropriate for: • Large single-cell models (i.e., those with many compartments)

  3. What is PGENESIS? • Library extension to GENESIS that supports communication among multiple processes, so nearly everything available in GENESIS is available in PGENESIS • Allows multiple processes to perform multiple simulations in parallel • Allows multiple processes to work together cooperatively on a single simulation • Runs on workstations or supercomputers using the PVM or MPI message-passing libraries

  4. History • PGENESIS developed by Goddard and Hood at PSC (1993-1998) • Ported from PVM to MPI by Chukkpalli and Charman (NPACI, ~2000), and also by Panchev (Sunderland, ~2003) • Current contact: pgenesis@psc.edu

  5. Tutorial Outline • What PGENESIS provides • Using PGENESIS for parallel parameter searching • Using PGENESIS for simulating large networks more quickly • Selecting appropriate parallel hardware • Strategies for development and testing

  6. PGENESIS Functionality

  7. How PGENESIS Runs in Parallel (1) • PVM-based PGENESIS: typically one process starts and then spawns n-1 other processes • MPI-based PGENESIS: all n processes are started simultaneously by the mpirun or mpiexec command

  8. How PGENESIS Runs in Parallel (2) For both PVM and MPI-based versions: • mapping of processes to processors is usually 1 to 1, but may be many to 1 during debugging • every process runs the same script (this is not a real limitation, since a script can branch on its own node number)

  9. Nodes and Zones • Each process is referred to as a "node". • Nodes may be organized into "zones". • A node is fully specified by a numeric string of the form “<node>.<zone>”. • Simulations within a zone are kept synchronized in simulation time. • Each node joins the parallel platform using the paron command. • Each node should gracefully terminate by calling paroff

  10. Every node in its own zone • Simulations on each node are not coupled temporally. • Useful for parameter searching. • We refer to nodes as “0.0”, “0.1”, “0.2”, …

  11. All nodes in one zone • Simulations on each node are coupled temporally. • Useful for large network models • Zone numbers can be omitted since we are dealing with only one zone; we can thus refer to nodes as “0”, “1”, “2”, …
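As an illustration of the node naming scheme on the last three slides, the following Python sketch splits a node specifier into its node and zone numbers. The function name `parse_node_spec` and the `my_zone` parameter are hypothetical, and treating an omitted zone as the caller's own zone is an assumption based on the single-zone shorthand; this is not PGENESIS code.

```python
def parse_node_spec(spec, my_zone=0):
    """Split a specifier of the form "<node>.<zone>" into (node, zone).

    If the zone part is omitted (e.g. "2"), fall back to my_zone.
    That fallback is an assumption made for this sketch.
    """
    node, _, zone = spec.partition(".")
    return int(node), (int(zone) if zone else my_zone)
```

For example, `parse_node_spec("0.4")` yields node 0 in zone 4, while `parse_node_spec("2")` yields node 2 in the default zone.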

  12. Nodes have distinct namespaces /elem1 on node 0 refers to an element on node 0 /elem1 on node 1 refers to an element on node 1 To avoid confusion we recommend that you use distinct names for elements on different nodes within a zone. The script writer (i.e., you) is responsible for partitioning a network model across nodes.

  13. GENESIS Terminology
          GENESIS     Computer Science
          Object    = Class
          Element   = Object
          Message   = Connection
          Value     = Message

  14. Who am I? PGENESIS provides several functions that allow a script to determine its place in the overall parallel configuration (all numbering starts at 0):
          mynode         number of this node in this zone
          nnodes         number of nodes in this zone
          mytotalnode    number of this node in platform
          ntotalnodes    number of nodes in platform
          myzone         number of this zone
          nzones         number of zones
          npvmcpu        number of processors in configuration
          mypvmid        PVM task identifier for this node

  15. Styles of Parallel Scripts • Symmetric – Each node executes the same script commands in lock-step style (synchronized explicitly or implicitly). • Master/Worker – One node (usually node 0) coordinates processing and issues commands to the other nodes.

  16. Explicit Synchronization
      barrier: causes the thread to block until all nodes within the zone have reached the corresponding barrier
          barrier                wait at default barrier
          barrier 7              wait at named barrier
          barrier 7 100000       timeout is 100000 seconds
      barrierall: causes the thread to block until all nodes in all zones have reached the corresponding barrier
          barrierall             wait at default barrier
          barrierall 7           wait at named barrier
          barrierall 7 100000    timeout is 100000 seconds

  17. Implicit Synchronization Two commands implicitly execute a zone-wide barrier: step - implicitly causes the thread to block until all nodes within the zone are ready to step (this behavior can be disabled with “setfield /post sync_before_step 0”) reset - implicitly causes the thread to block until all nodes have reset These commands require that all nodes in the zone participate, thus the barrier.

  18. Remote Function Calls (1) An "issuing" node directs a procedure to run on an "executing" node. Examples:
          some_function@2 params...
          some_function@all params...
          some_function@others params...
          some_function@0.4 params...
          some_function@1,3,5 params...

  19. Remote Function Calls (2) • Each remote function call causes the creation of a new thread on the executing node. • All parameters are evaluated on the issuing node. Example: if called from node 1, some_function@2 {mynode} will execute some_function 1 on node 2
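The rule that parameters are evaluated on the issuing node can be mimicked in ordinary Python, where arguments are evaluated by the caller before the callee runs. The `Node` class, its `remote_call` method, and the `functions` table below are hypothetical stand-ins invented for this sketch; they are not part of PGENESIS.

```python
class Node:
    """Toy stand-in for a PGENESIS node (hypothetical, not the PGENESIS API)."""

    def __init__(self, mynode):
        self.mynode = mynode
        self.functions = {}

    def remote_call(self, target, name, *args):
        # By the time this method runs, Python has already evaluated `args`
        # in the caller's context, i.e. on the issuing node.
        return target.functions[name](target, *args)


def some_function(executing_node, value):
    # Runs "on" the executing node but receives the issuer's value.
    return (executing_node.mynode, value)


node1, node2 = Node(1), Node(2)
node2.functions["some_function"] = some_function

# Analogous to issuing `some_function@2 {mynode}` from node 1.
result = node1.remote_call(node2, "some_function", node1.mynode)
```

Here `result` pairs the executing node's number (2) with the value computed on the issuer (1), matching the slide's statement that node 2 executes `some_function 1`.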

  20. Remote Function Calls (3) When does the executing node actually perform the remote function call, since we don't use hardware interrupts? • While waiting at barrier or barrierall. • While waiting for its own remote operations to complete, e.g. func@node, raddmsg • When the simulator is sitting at the prompt waiting for user input. • When the executing script calls clearthread or clearthreads.

  21. Threads A thread is a single flow of control within a PGENESIS script being executed. • When a node starts, there is exactly one thread on it: the thread for the script. • There may potentially be many threads per node. These are stacked up, with only the topmost actually executing at any moment.
          clearthread     yield to one thread awaiting execution (if one exists)
          clearthreads    yield to all threads awaiting execution

  22. Asynchronous Calls (1) The async command allows a script to dispatch an operation on a remote node without waiting for its completion. Example: async some_function@2 params...

  23. Asynchronous Calls (2) One may wait for an async call to complete, either individually:
          future = {async some_function@2 ...}
          ...    // do some work locally
          waiton {future}
      or for an entire set:
          async some_function@2 ...
          async some_function@5 ...
          ...
          waiton all

  24. Asynchronous Calls (3) Asynchronous calls may return a value. Example:
          int future = {async myfunc@1}      // start thread on node 1
          ...                                // do some work locally
          int result = {waiton {future}}     // wait for thread's result
      Thus the term "future": it is a promise of a value some time in the future. waiton calls in that promise.
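The same dispatch, work, then wait pattern exists in Python's standard `concurrent.futures` module, which may help readers who know futures from other languages. This is only an analogy: the worker thread stands in for a remote node, and the argument passed to `myfunc` is a hypothetical addition for the sake of a checkable result.

```python
from concurrent.futures import ThreadPoolExecutor


def myfunc(x):
    # Stand-in for the work done on the executing node.
    return x * x


def run_async_example():
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(myfunc, 7)   # like: async myfunc@1
        local = sum(range(10))            # do some work locally
        result = future.result()          # like: waiton {future}
    return local, result
```

`future.result()` blocks until the value is available, which is the role `waiton` plays in the slides.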

  25. Asynchronous Calls (4) • async returns a value which is only to be used as the parameter of a waiton call, and waiton must only be called with such a value. • Remote function calls from a particular issuing node to a particular executing node are guaranteed to be performed in the sequence they were sent. • There is no guaranteed order among calls involving multiple issuing or executing nodes.

  26. Advice about Barriers (1) • It is very easy to reach deadlock if barriers are not handled correctly. PGENESIS tries to warn you by printing a message that it is waiting at a barrier. • Examples of incorrect barrier usage:
          Each node executes: barrier {mynode}
          Each node executes: barrier@all
          A single node executes: barrier@others; barrier
      However, the following will work:
          async barrier@others; barrier

  27. Advice about Barriers (2) • Guideline: if your script is operating in the symmetric style (all nodes execute all statements), never use barrier@ • If your script is operating in the master-worker style, master must ensure it calls a function on each worker that executes a barrier before the master itself enters the barrier • barrier; async barrier@others will not work.
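The first incorrect pattern above (`barrier {mynode}`) fails because every node waits at a different barrier, so no barrier ever collects all of its participants. The Python sketch below reproduces that effect with `threading.Barrier`; threads stand in for nodes, and `run_nodes` and `barrier_id_for` are hypothetical names chosen for this illustration. A short timeout is used so the mismatched case ends instead of hanging.

```python
import threading


def run_nodes(n_nodes, barrier_id_for, timeout=0.5):
    """Start n_nodes threads; each waits at the barrier chosen by barrier_id_for."""
    ids = {barrier_id_for(n) for n in range(n_nodes)}
    # Every barrier expects all n_nodes participants, as a zone-wide barrier does.
    barriers = {i: threading.Barrier(n_nodes) for i in ids}
    outcome = {}

    def node(mynode):
        try:
            barriers[barrier_id_for(mynode)].wait(timeout=timeout)
            outcome[mynode] = "passed"
        except threading.BrokenBarrierError:
            # Raised when the wait times out before everyone has arrived.
            outcome[mynode] = "timed out"

    threads = [threading.Thread(target=node, args=(n,)) for n in range(n_nodes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome
```

Calling `run_nodes(3, lambda n: 7)` lets every thread pass, whereas `run_nodes(3, lambda n: n)` leaves every thread waiting until its timeout expires.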

  28. Commands for Network Creation Several new commands permit the creation of "remote" (internode) messages:
          raddmsg /local_element /remote_element@2 SPIKE
          rvolumeconnect /local_elements \
              /remote_elements@2 \
              -sourcemask ... -destmask ... \
              -probability 0.5
          rvolumedelay /local_elements -radial 10.0
          rvolumeweight /local_elements -fixed 0.2
          rshowmsg /local_elements

  29. Tips for Avoiding Deadlocks • Use lots of echo statements. • Use barrier IDs. • Do not execute barriers remotely (e.g., barrier@all). • Remember that step usually does an implicit barrier. • Have each node do its own step command, or have one controlling node do a step@all. (similarly for reset) • Do not use the stop command. • Keep things simple.

  30. Motivation • Parallel control of setup can be hard. • Parallel control of simulation can be hard. • Debugging parallel scripts is hard.

  31. How PGENESIS Fits into Schedule • Schedule controls the order in which GENESIS elements get updated. • At beginning of step, all internode data is transferred. • There will be equivalence to serial GENESIS only if remote messages do not pass from earlier to later elements in the schedule.
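The equivalence condition can be seen in a two-element toy model, sketched here in Python under simplifying assumptions: element A is scheduled before element B, B simply copies A's value, and the function names are hypothetical. In the serial case B reads the value A computed in the same step; in the parallel case B only sees what was transferred at the beginning of the step.

```python
def serial_steps(n_steps):
    """A and B on one node: B reads the value A computed earlier in the same step."""
    a, history = 0, []
    for _ in range(n_steps):
        a = a + 1          # A is earlier in the schedule
        b = a              # B is later and sees A's fresh value
        history.append(b)
    return history


def parallel_steps(n_steps):
    """A and B on different nodes: A's value crosses nodes at the start of each step."""
    a, history = 0, []
    for _ in range(n_steps):
        a_received = a     # internode transfer at the beginning of the step
        a = a + 1          # A updates on its own node
        b = a_received     # B sees the value from before this step
        history.append(b)
    return history
```

With three steps, B's history is `[1, 2, 3]` in the serial version and `[0, 1, 2]` in the parallel one: the remote message from an earlier to a later element arrives one step late, which is exactly the case the slide warns about.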

  32. How PGENESIS Fits into Schedule
          addtask Simulate /##[CLASS=postmaster] -action PROCESS
          addtask Simulate /##[CLASS=buffer] -action PROCESS
          addtask Simulate /##[CLASS=projection] -action PROCESS
          addtask Simulate /##[CLASS=spiking] -action PROCESS
          addtask Simulate /##[CLASS=gate] -action PROCESS
          addtask Simulate /##[CLASS=segment][CLASS!=membrane][CLASS!=gate][CLASS!=concentration] -action PROCESS
          addtask Simulate /##[CLASS=membrane] -action PROCESS
          addtask Simulate /##[CLASS=hsolver] -action PROCESS
          addtask Simulate /##[CLASS=concentration] -action PROCESS
          addtask Simulate /##[CLASS=device] -action PROCESS
          addtask Simulate /##[CLASS=output] -action PROCESS

  33. "Hello, world!" for PGENESIS Contents of file hello.g:
          paron -parallel -nodes 4 -output hello.out
          barrier 17
          echo "Hello from node " {mynode}
          barrier 18
          paroff
      Execute on four nodes with:
          pgenesis -nox hello.g

  34. Parameter Searching with PGENESIS

  35. Model Characteristics The following are prerequisites for using PGENESIS to optimize a particular parameter-searching problem: • Model must be expressed in GENESIS. • Decide on the parameter set. • Have a way to evaluate the parameter set. • Have some range for each of the parameter values. • The evaluations over the parameter space should be reasonably well-behaved. • Have a stopping criterion.

  36. Choose a Search Strategy • Genetic Search • Simulated Annealing • Monte Carlo (for very ill-behaved search spaces) • Nelder-Mead (for well-behaved search spaces) • Use as many constraints as you can to restrict the search space • Always do a sanity check on results

  37. An Example Model • We have a one-compartment cell model of a spiking neuron. Dynamics are well-behaved. • Parameters are the conductances for the Na, Kdr, Ka, and KM channels. We know the conductance values to be in the range from 0.1 to 10.0 a priori. • We write spike times to a file, then compare this using a C function, spkcmp, to "experimental" data. • Stop when our match fitness exceeds 20.0.
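Since the search later treats each parameter as a bit string, some mapping from bits to a conductance in the stated range is needed. The slides do not say which mapping the tutorial uses; the Python sketch below assumes a simple linear scaling of the unsigned integer value, and the function name `decode` is hypothetical.

```python
def decode(bits, lo=0.1, hi=10.0):
    """Map a bit string such as "1010" linearly onto the interval [lo, hi].

    Linear scaling is an assumption of this sketch; a logarithmic mapping
    would be an equally plausible choice for conductances.
    """
    max_int = 2 ** len(bits) - 1
    return lo + (hi - lo) * int(bits, 2) / max_int
```

With this mapping the all-zeros string gives the lower bound 0.1 and the all-ones string gives the upper bound 10.0.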

  38. A Parallel Genetic Algorithm • We adopt a population-based approach as opposed to a generation-based one. • We will keep a fixed population "alive" and use the workers to evaluate the fitness of candidate individuals. • If a candidate turns out to be better than some member of the current population, then we replace the worst member of the current population with the new individual.
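The replacement rule in the last bullet can be written in a few lines. The Python sketch below assumes the population is a list of `(fitness, bits)` pairs and that higher fitness is better; the name `integrate_result` is hypothetical. A candidate that beats some member necessarily beats the worst member, so only the worst needs to be checked.

```python
def integrate_result(population, candidate_bits, fitness):
    """Replace the worst member of the population if the candidate is better.

    population is a list of (fitness, bits) pairs and is modified in place.
    Returns True if the candidate was admitted.
    """
    worst = min(range(len(population)), key=lambda i: population[i][0])
    if fitness > population[worst][0]:
        population[worst] = (fitness, candidate_bits)
        return True
    return False
```

Because the population size never changes, workers can report results in any order, which suits the asynchronous master/worker arrangement described later.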

  39. Mutations • Pick a member of the population at random. • Decide whether to do crossover according to the crossover probability. If we are doing crossover, pick another random member of the current population, and combine the "genes" of those individuals. If we aren't doing crossover, just copy the bits of the original individual. • Go through each bit of the bit string, and mutate it with some small probability.
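The three steps above can be sketched in Python as follows. The slide does not specify how the "genes" are combined, so single-point crossover is an assumption here, as are the function name `make_candidate` and the representation of individuals as strings of "0" and "1" with at least two bits.

```python
import random


def make_candidate(population, p_crossover, p_mutation, rng):
    """Build one new bit string from a population of equal-length bit strings."""
    parent = rng.choice(population)
    if rng.random() < p_crossover:
        other = rng.choice(population)
        # Single-point crossover (an assumption; needs at least two bits).
        point = rng.randrange(1, len(parent))
        child = parent[:point] + other[point:]
    else:
        # No crossover: copy the bits of the original individual.
        child = parent
    # Flip each bit independently with probability p_mutation.
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < p_mutation else bit
        for bit in child
    )
```

A seeded generator such as `random.Random(0)` can be passed as `rng` to make a search run reproducible.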

  40. Master/Worker Paradigm (1)

  41. Master/Worker Paradigm (2) • Each node is in its own zone. • Node 0.0 will control the search. • Nodes 0.1 through 0.{n-1} will run the model and perform the evaluation.

  42. Commands for Optimization Typically these are organized in a master/worker fashion with one node (the master) directing the search, and all other nodes evaluating parameter sets. Remote function calls are useful in this context for: • sending tasks to workers: async task@{worker} param1... • having workers return evaluations to master: return_result@{master} result

  43. Main Script
          paron -farm -silent 0 -nodes {n_nodes} \
              -output o.out -executable nxpgenesis
          barrierall
          if ({mytotalnode} == 0)
              init_master
              pb_search {individuals} {population}
          else
              init_worker
          end
          barrierall 7 1000000
          paroff

  44. Master Conducts the Search
          function pb_search
              ...
              for (i = 0; i < individuals && \
                      max_fitness < stopping_criterion; \
                      i = i + 1)
                  // pick random individual from population
                  // decide whether to do crossover mutation
                  // mutate bitstring
                  // assign this task to a worker
                  delegate_task (i)
              end
              finish
              print_results
          end

  45. Master Conducts the Search
          function delegate_task
              ...
              // send the parameters one by one
              for (p = 0; p < parameters; p = p + 1)
                  async set_param@0.{try_node} \
                      {p} {getfield /params[{p}] bits}
              end
              async worker_task@0.{try_node} {index}
              clearthreads
              ...
          end

  46. Worker Evaluates Individuals (1)
          function worker_task (index)
              compute_parameter_values
              // determine the fitness value for
              // this individual
              fit = {evaluate}
              // return result to the master
              return_result@0.0 {mytotalnode} \
                  {index} {fit}
          end

  47. Worker Evaluates Individuals (2)
          function evaluate
              float match, fitness
              // first run the simulation
              newsim {getfield /params[0] value} \
                  {getfield /params[1] value} \
                  {getfield /params[2] value} \
                  {getfield /params[3] value}
              runfI
              call /out/{sim_output_file} FLUSH

  48. Worker Evaluates Individuals (3)
              // then find the simulated spike times
              gen2spk {sim_output_file} {delay} \
                  {current_duration} {total_duration}
              // then compare the simulated spike
              // times with the experimental data
              match = {spkcmp {real_spk_file} \
                  {sim_spk_file} -pow1 0.4 -pow2 0.6 \
                  -msp 0.5 -nmp 200.0}
              fitness = 1.0 / {sqrt {match}}
              return {fitness}
          end

  49. Master Integrates the Results function return_result (node, index, fit) ... end

  50. Comparison of Parallel Parameter Search with Serial Parameter Search • GA scales fairly well • SA scales to a certain extent, but not as well as GA • paths through search space will be different, but if searches are successful, they will converge to the same result
