Parallelization • Geant4 simulation is an embarrassingly parallel computational problem – each event can possibly be treated independently
Why parallelize • Monte Carlo simulation is a computationally demanding problem. • Good statistics require a large number of primary events • Each primary may generate large numbers of secondaries • Complex and large detector setups, from the LHC to Earth or space • Simply increasing the clock frequency (GHz) of CPUs has reached its limits; multicore, multi-processor and cluster architectures are more practical, both thermally and cost-wise • Just another form of divide and conquer: if the problem is too large to solve as a whole, subdivide it.
How to parallelize? • Simplest solution: run the same application on multiple cores/computers in parallel, with different initial parameters. • Few to no adaptations of the application necessary • Overhead because resources are not shared between the processes • Need additional tools to collect results • Intermediate solution: integrate task scheduling into the application. It still runs as a complete process but is aware that it may share resources and needs to collect results. • Modification of the application necessary • Processes still generally run in their own memory space • Result collection implemented in the application
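For the simplest solution, one of the "different initial parameters" is usually the number of primary events each independent run should process. A minimal sketch of that bookkeeping follows; `splitEvents` is a hypothetical helper written for illustration, not part of Geant4.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Geant4 function): number of primary events
// each of `jobs` independent runs should simulate so that the runs add up
// to `totalEvents`.
std::vector<long> splitEvents(long totalEvents, std::size_t jobs) {
    assert(totalEvents >= 0 && jobs > 0);
    const long n = static_cast<long>(jobs);
    // Every job gets the same base share ...
    std::vector<long> perJob(jobs, totalEvents / n);
    // ... and the remainder is handed out one event at a time to the first jobs.
    for (long i = 0; i < totalEvents % n; ++i) {
        perJob[static_cast<std::size_t>(i)] += 1;
    }
    return perJob;
}
```

Each entry would then be passed to one run of the application, for example as a command-line argument or in a macro file; how the parameter reaches the application is left open here.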
How to parallelize? • Multi-threaded • Less overhead because resources are shared (only one process) • Need to take great care to avoid race conditions (e.g. one thread writes to memory that another thread reads) • Bound to the local machine • Multi-threaded and distributed • Requires extensive adaptation of the application • Message passing needed, e.g. with ZeroMQ • Data is passed as messages, which may not be replied to • Messages are passed transparently over network or memory • Very powerful and scalable; nowadays often runs into I/O-bound limits.
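The race condition mentioned for the multi-threaded approach can be illustrated with plain standard C++ threads. The sketch below is not Geant4 code: `HitTally` and `countHits` are hypothetical names, and the "one hit per event" workload is a stand-in for real event processing. The point is only that a shared value must be protected while several threads update it.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared result, e.g. the total number of hits over all events.
struct HitTally {
    long total = 0;
    std::mutex guard;

    void add(long hits) {
        // Only one thread at a time may update the shared total.
        std::lock_guard<std::mutex> lock(guard);
        total += hits;
    }
};

// Each worker thread "processes" its share of events and records one hit
// per event (a placeholder for real work).
long countHits(int threads, int eventsPerThread) {
    assert(threads > 0 && eventsPerThread >= 0);
    HitTally tally;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&tally, eventsPerThread] {
            for (int e = 0; e < eventsPerThread; ++e) {
                tally.add(1);
            }
        });
    }
    for (auto& w : workers) {
        w.join();  // after join() the total can be read without locking
    }
    return tally.total;
}
```

Without the lock, two threads could read the same old value of `total` and both write back an increment of one, so updates would be lost. A common alternative is to let each thread keep a private tally and merge the tallies once at the end, which avoids locking in the inner loop.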
Application-level parallelization • Currently available concurrency solutions in Geant4: • Parallelize on a run level, i.e. run multiple full simulations with different random seeds, then combine the results afterwards • Not really concurrent Geant4, but concurrently run Geant4 • User needs to take care of combining the results afterwards • Overhead from a full Geant4 application being instantiated for each simulation (process vs. thread) • Nevertheless: straightforward, minimal adaptation of the existing simulation, useful for a great variety of parallel architectures
// in application main (needs <ctime> and a header that provides CLHEP::HepRandom, e.g. Randomize.hh)
G4long seed = time(NULL); // get a "unique" seed; jobs started within the same second get the same value
CLHEP::HepRandom::setTheSeed(seed); // set this seed for the RNG
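A seed taken from `time(NULL)` has a resolution of one second, so several simulations launched together by a script can end up with identical seeds and therefore identical event sequences. One possible way around this, sketched under the assumption that the launcher passes each job a distinct index, is to derive the seed from a common base value plus that index. `seedForJob` is a hypothetical helper, not a Geant4 or CLHEP function.

```cpp
#include <cassert>

// Hypothetical helper: derive a distinct seed for each job of one batch.
// baseSeed is chosen once per batch (for example time(NULL) read by the
// launcher); jobIndex is assumed to be unique within the batch.
long seedForJob(long baseSeed, long jobIndex) {
    assert(baseSeed >= 0 && jobIndex >= 0);
    // Distinct indices give distinct seeds for the same base value.
    // Whether neighbouring seeds give sufficiently independent streams
    // depends on the random engine and is not checked here.
    return baseSeed + jobIndex;
}
```

In the application's main, the result would be handed to the call shown on the slide, e.g. `CLHEP::HepRandom::setTheSeed(seedForJob(base, index));`, with `base` and `index` read from the command line. Recording the base seed alongside the results also makes a batch reproducible.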
Process-level Parallelization • Parallelize on a per-event level using ParGeant4 • Replaces the run manager with a parallel version which uses TOP-C based messaging to launch slave processes • Marshaling of hit data has to be specified by the user through specific comments, i.e. which data can be copied and how it should be copied. • ParGeant4 takes care of aggregating the results of parallel processing into sequential containers • Can parallelize on multiple cores and multiple machines • Each slave is essentially a complete Geant4 application in itself • No thread-level parallelism
Parallelization Hints • If you do not want to think about sharing memory, marshaling data or where to modify your code: run the application in parallel. • A simple bash script can start up multiple simulations • Depending on how you persist your results, they can often be combined by simple concatenation. • Using ParGeant4 is more elaborate and requires more code modification, i.e. there are more banana skins along the way.
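Combining results by concatenation can be sketched as follows, assuming each job writes a plain text file with one record per line and no header. The function name and the file names in the usage note are hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: append the lines of every per-job output file to one
// combined file. Returns the number of lines copied.
long concatenateResults(const std::vector<std::string>& inputs,
                        const std::string& output) {
    std::ofstream out(output);
    assert(out.is_open());
    long lines = 0;
    for (const auto& name : inputs) {
        std::ifstream in(name);
        if (!in) {
            continue;  // job produced no file; skipped in this sketch
        }
        std::string line;
        while (std::getline(in, line)) {
            out << line << '\n';
            ++lines;
        }
    }
    return lines;
}
```

A call such as `concatenateResults({"job0.txt", "job1.txt"}, "all.txt")` would merge two runs. Event numbers restart in every job, so if the records carry an event ID, the job index may need to be stored with each record to keep them distinguishable. Formats with headers or binary structure (histogram files, for example) need a format-aware merge instead.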