This presentation describes the collaborative effort between Geant4, DIANE, and Anaphe to parallelize Geant4 simulation, used to optimize detectors for x-ray fluorescence emission from Mercury's crust. It covers the performance gains from parallel simulation, benchmarking and scalability tests, the challenges of reproducibility and of minimizing deployment overhead, and DIANE's role in semi-interactive parallel analysis and in the evaluation of middleware technologies for LHC.
Distributed Simulation with Geant4. Preliminary results of the LowE / DIANE joint project. Jakub T. Mościcki, CERN/IT. Credits also to: Alfonso Mantero, INFN Genova
History • Parallelization of Geant4 simulation is a joint project between Geant4 – DIANE – Anaphe • DIANE is an R&D project in IT/API to study distributed analysis and simulation and create a prototype • initiated early 2001 with very limited resources • Anaphe is an analysis project supported by IT • provides the analysis framework for HEP • The pilot programme includes G4 simulation which produces AIDA/Anaphe histograms • Collaboration started late spring 2002
Sequential Geant4 Simulation • the goal of the simulation: • optimize the detectors for x-ray fluorescence emission from Mercury's crust, in the context of Hermes, the Bepi Colombo ESA mission • requires high statistics ⇒ many events • 20 Mio events ~ 3 hours • up to 100 Mio events might be useful • estimated time ~16 hours
Parallel Geant4 Simulation • increase performance • shift from batch to semi-interactive simulation • speed up the analysis cycle • generate more events – debug the simulation faster • from sequential to parallel simulation • preserve reproducibility of the results • minimize deployment overhead when moving from sequential to parallel simulation • both in terms of time and the amount of code/expertise one must invest
Benchmarking environment • parallel cluster configuration • lxplus: 70 RedHat 6.1 nodes • 7 Intel STL2 (2 x PIII 1 GHz, 512 MB) • 31 ASUS P2B-D (2 x PIII 600 MHz, 512 MB) • 15 Celsius 620 (2 x PIII 550 MHz, 512 MB) • the rest – Kayak (2 x PIII 450 MHz, 128 MB) • reference sequential machine • pcgeant2 (2 x Xeon 1700 MHz, 1 GB)
Benchmarking Caveat • non-exclusive access to interactive machines • 'load-noise' background, unpredictable load peaks • different CPU and RAM on the nodes • AFS used to fetch physics config data • try to remove the noise: • repeat simulations many times to get the correct mean • work at night and at off-peak hours (what about US people using CERN computing facilities?) • etc... • conclusion: • results should be taken with caution and are approximate
Structure of the simulation • initialization phase (constant) • load ~10-15 MB of physics tables, config data etc. • reference sequential machine: ~4 minutes (user time) • cluster nodes: ~5-6 minutes • beamOn ~ f(number of events) • small job: 1-5 Mio events • medium job: 20-40 Mio events • big job: > 50 Mio events
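These two phases suggest a simple run-time model. As a rough sketch, assuming the constant ~4-6 minute initialization and the 20 Mio events / 3 hours sequential rate quoted earlier:

$$T(N) \approx T_{\mathrm{init}} + N\,t_{\mathrm{event}}, \qquad t_{\mathrm{event}} \approx \frac{3\ \mathrm{h}}{20 \times 10^{6}} \approx 0.54\ \mathrm{ms/event}$$

For 100 Mio events this gives roughly 15 h of event processing plus initialization, consistent with the ~16 h estimate on the earlier slide. The constant $T_{\mathrm{init}}$ term is also what makes small parallel jobs comparatively inefficient.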
Benchmarking (comments) • results are approximate • scaling factors for different CPU speeds • but they seem in agreement with expectations • a move from batch to semi-interactive simulation is feasible • small jobs do not gain as much – the large constant initialization time dominates
Problems & solutions • time of job execution = slowest machine... • ...or most loaded one at the moment • often had to wait a long time for last worker to finish • possible solution: • use larger number of smaller workers • fast machines run workers sequentially many times, but... • constant initialization time rather important • initialize once, beamOn many times... to be checked • if this problem is solved we may move towards more interactive simulation
Reproducibility • initial seed of the random engine • make sure that every parallel simulation starts with a seed uniquely determined by the job's initial seed • the number of times the engine is used depends on the initial seed • make sure that correlations between the workers' seeds are avoided • our solution: • use two uncorrelated random engines • one to generate a table of initial seeds (one seed for each worker) • another for the simulation inside the worker
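A minimal sketch of the two-engine scheme, using Python's random module as a stand-in for the random engines actually used in Geant4 (simulate_event is a hypothetical placeholder):

```python
import random

def make_worker_seeds(job_seed, n_workers):
    # engine 1: used only to derive one independent seed per worker,
    # so the whole job is reproducible from a single initial seed
    master_engine = random.Random(job_seed)
    return [master_engine.getrandbits(63) for _ in range(n_workers)]

def run_worker(worker_seed, n_events):
    # engine 2: drives the simulation inside this worker only,
    # uncorrelated with the seed-table engine
    engine = random.Random(worker_seed)
    for _ in range(n_events):
        simulate_event(engine)             # hypothetical per-event simulation
```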
Reproducibility • parameters which need to be fixed to reproduce the simulation: • total number of events • initial seed • ...but also: • number of workers • number of events per worker
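In other words, a reproducible parallel run is fully described by a small record like the following (values are illustrative):

```python
# everything needed to rerun the simulation and obtain identical results
job_descriptor = {
    "total_events":      20_000_000,
    "initial_seed":      12345,
    "n_workers":         70,        # changes the seed table, so it must be recorded
    "events_per_worker": 285_714,   # splitting policy is part of the contract;
                                    # e.g. the last worker takes the remainder
}
```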
Ease of use • user-friendliness • a G4 simulation developer should not need to fight irrelevant technical problems when moving from sequential to parallel G4 simulation • as non-intrusive as possible • minimize the necessary code changes in the original simulation • good separation of the subsystems • the G4 simulation does not need to know that it runs in parallel... • the distributed framework (DIANE) does not need to care about what actually is being simulated (see the DIANE slides below)
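A sketch of what such a separation could look like; the interface and method names below are illustrative assumptions, not DIANE's actual API:

```python
from abc import ABC, abstractmethod

class ITaskWorker(ABC):
    """All the framework ever sees of the application."""

    @abstractmethod
    def init(self, config):
        """One-time setup (e.g. loading Geant4 physics tables)."""

    @abstractmethod
    def run(self, n_events, seed):
        """Process one chunk of work; unaware of any other workers."""

    @abstractmethod
    def collect(self):
        """Return partial results (e.g. AIDA/Anaphe histograms) for merging."""
```

The simulation implements this interface without any knowledge of scheduling or transport; the framework calls it without any knowledge of Geant4.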
What is DIANE? R&D project in IT/API semi-interactive parallel analysis for LHC middleware technology evaluation & choice CORBA, MPI, Condor, LSF... also see how to integrate API products with GRID prototyping (focus on ntuple analysis) time scale and resources: Jan 2001: start (< 1 FTE) June 2002: running prototype exists sample Ntuple analysis with Anaphe event-level parallel Geant4 simulation
What is DIANE? • framework for parallel cluster computation • application-oriented: master-worker model common in HEP applications • application-independent: apps dynamically loaded in a plugin style, callbacks to applications via abstract interfaces • component-based: subsystems and services packaged into component libraries • core architecture uses CORBA and CCM (CORBA Component Model) • integration layer between applications and the GRID • environment and deployment tools
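The "plugin style" loading can be pictured as follows. This is a Python sketch; the real framework uses CORBA/CCM components, and the module and class names here are hypothetical:

```python
import importlib

def load_application(module_name, class_name):
    # the framework has no compile-time dependency on the application:
    # the plugin is named at run time and loaded dynamically
    module = importlib.import_module(module_name)
    return getattr(module, class_name)()

# e.g. the Geant4 x-ray fluorescence simulation as a plugin (hypothetical names)
task = load_application("g4_xray_sim", "FluorescenceTask")
```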
Master/Worker model • applications share the same computation model, so they also share a big part of the framework code • but they have different non-functional requirements: • CPU- vs. IO-intensive • semi-interactive vs. batch • etc...
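Below is a minimal local stand-in for the master/worker cycle, using multiprocessing in place of DIANE's CORBA transport; merge_histograms and the task plugin are hypothetical:

```python
from multiprocessing import Pool

def run_chunk(args):
    n_events, seed = args
    # load the application plugin, as in the previous sketch
    task = load_application("g4_xray_sim", "FluorescenceTask")
    task.init(config={})
    task.run(n_events, seed)
    return task.collect()

def master(total_events, n_workers, seeds):
    chunk = total_events // n_workers
    with Pool(n_workers) as pool:          # DIANE dispatches to cluster nodes instead
        partials = pool.map(run_chunk, [(chunk, s) for s in seeds])
    return merge_histograms(partials)      # hypothetical merge of partial histograms
```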
What DIANE is not • DIANE is not a replacement for a GRID and its services • DIANE is not a hardwired analysis toolkit
DIANE and GRID • DIANE as a GRID computing element • ...via a gateway that understands Grid/JDL • ...Grid/JDL must be able to describe parallel jobs/tasks • DIANE as a user of (low-level) Grid services • ...authentication, security, load balancing... • and profit from existing 3rd-party implementations • the python environment is a rapid prototyping platform and may provide a convenient connection between DIANE and the Globus Toolkit via the pyGlobus API
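Schematically, a gateway of this kind could translate a Grid/JDL-like description into a DIANE job. Every field and helper below is illustrative (the JDL of the time had no standard way to express such parallel tasks), reusing the make_worker_seeds and master sketches from earlier:

```python
def submit_from_jdl(jdl):
    # hypothetical translation of a JDL-like description into a DIANE job;
    # authentication/security would be delegated to Grid services
    n_workers    = int(jdl["NodeNumber"])    # assumed extension for parallel jobs
    total_events = int(jdl["TotalEvents"])   # assumed application-specific field
    seeds = make_worker_seeds(int(jdl["InitialSeed"]), n_workers)
    return master(total_events, n_workers, seeds)
```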
Architecture Overview • layering: abstract middleware interfaces and components • plugin-style application loading
Conclusions • prototype deployment of G4-DIANE • significant performance improvement possible • scalability tests: • 140 Mio events • 70 nodes in the cluster • 1 hour total parallel execution • putting together DIANE and G4 is fairly easy • done in several days... • DIANE may bridge G4 to the GRID world • without necessarily waiting for a fully-fledged GRID infrastructure to become available
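As a rough consistency check, assuming the sequential rate of 20 Mio events in 3 hours quoted earlier:

$$S \approx \frac{140 \times 10^{6} \cdot \frac{3\ \mathrm{h}}{20 \times 10^{6}}}{1\ \mathrm{h}} = \frac{21\ \mathrm{h}}{1\ \mathrm{h}} \approx 21$$

i.e. a speed-up of roughly 21 on 70 (mostly slower, shared) nodes, which is plausible given the heterogeneous CPUs, the constant initialization phase, and the load noise described in the benchmarking caveats.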