Distributed Simulation with Geant4: Preliminary results of the LowE / DIANE joint project. Jakub T. Mościcki, CERN/IT. Credits also to: Alfonso Mantero, INFN Genova
History • Parallelization of Geant4 simulation is a joint project between Geant4, DIANE and Anaphe • DIANE is an R&D project in IT/API to study distributed analysis and simulation and to build a prototype • initiated early 2001 with very limited resources • Anaphe is an analysis project supported by IT • provides the analysis framework for HEP • The pilot programme includes a G4 simulation which produces AIDA/Anaphe histograms • Collaboration started in late spring 2002
Sequential Geant4 Simulation • the goal of the simulation: • optimize the detectors used for X-ray fluorescence emission from Mercury's crust in the context of Hermes, the BepiColombo ESA mission • requires high statistics → many events • 20 Mio events ~ 3 hours • up to 100 Mio events might be useful • estimated time ~ 16 hours
Parallel Geant4 Simulation • increase performance • shift from batch to semi-interactive simulation • speed up the analysis cycle • generate more events, debug the simulation faster • from sequential to parallel simulation • preserve reproducibility of the results • minimize the deployment overhead when moving from sequential to parallel simulation • both in terms of time and the amount of code/expertise one must invest
Benchmarking environment • parallel cluster configuration • lxplus: 70 Red Hat 6.1 nodes • 7 Intel STL2 (2 x PIII 1 GHz, 512 MB) • 31 ASUS P2B-D (2 x PIII 600 MHz, 512 MB) • 15 Celsius 620 (2 x PIII 550 MHz, 512 MB) • the rest: Kayak (2 x PIII 450 MHz, 128 MB) • reference sequential machine • pcgeant2 (2 x Xeon 1700 MHz, 1 GB)
Benchmarking Caveat • non-exclusive access to interactive machines • 'load-noise' background, unpredictable load peaks • different CPU and RAM on the nodes • AFS used to fetch physics config data • try to remove the noise: • repeat simulations many times to get a correct mean • work at night and off-peak hours (what about US people using CERN computing facilities?) • etc. • conclusion: • results are approximate and should be taken with caution
Structure of the simulation • initialization phase (constant) • load ~10-15 MB of physics tables, config data etc. • reference sequential machine: ~ 4 minutes (user time) • cluster nodes: ~ 5-6 minutes • beamOn time ~ f(number of events) • small job: 1-5 Mio events • medium job: 20-40 Mio events • big job: > 50 Mio events • (a rough cost model follows below)
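The constant initialization term explains why small jobs benefit little from parallelism. A rough, illustrative cost model, a sketch only: the timing constants are the approximate figures quoted above, and the model ignores load noise and per-node CPU differences:

```python
# Rough wall-clock model: total time = constant init + event loop / workers.
# INIT_MIN and SEQ_20M_MIN are the approximate measurements quoted above.

INIT_MIN = 5.5        # per-node initialization on cluster nodes, ~5-6 minutes
SEQ_20M_MIN = 180.0   # ~3 hours for 20 Mio events sequentially

def parallel_time(events_mio, workers):
    """Idealized minutes of wall-clock time: each worker initializes once
    (in parallel), then the event loop is split evenly across workers."""
    event_loop = SEQ_20M_MIN * events_mio / 20.0
    return INIT_MIN + event_loop / workers

for events in (1, 20, 100):            # small / medium / big jobs
    for workers in (1, 10, 70):
        print(f"{events:>3} Mio events, {workers:>2} workers: "
              f"~{parallel_time(events, workers):7.1f} min")
```

For a 1 Mio event job the ~5 minute initialization already dominates the total, which matches the comment on the next slide.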
Benchmarking (comments) • results are approximate • scaling factors for different CPU speeds • but they seem in agreement with expectations • moving from batch to semi-interactive simulation is feasible • small jobs do not gain much: the large constant initialization time dominates
Problems & solutions • time of job execution = slowest machine... • ...or most loaded one at the moment • often had to wait a long time for last worker to finish • possible solution: • use larger number of smaller workers • fast machines run workers sequentially many times, but... • constant initialization time rather important • initialize once, beamOn many times... to be checked • if this problem is solved we may move towards more interactive simulation
Reproducibility • initial seed of the random engine • make sure that every parallel simulation starts with a seed uniquely determined by the job's initial seed • the number of times the engine is used depends on the initial seed • make sure that correlations between the workers' seeds are avoided • our solution (sketched below): • use two uncorrelated random engines • one to generate a table of initial seeds (one seed for each worker) • another for the simulation inside the worker
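A sketch of the two-engine scheme, using Python's random module as a stand-in for the engines Geant4 actually uses; function names are illustrative:

```python
import random

def make_worker_seeds(job_seed, n_workers):
    """Engine #1: derives one seed per worker from the job's initial seed.
    It is used only here, never inside the simulation."""
    seeder = random.Random(job_seed)
    seeds, seen = [], set()
    while len(seeds) < n_workers:
        s = seeder.getrandbits(31)
        if s not in seen:             # no two workers may share a seed
            seen.add(s)
            seeds.append(s)
    return seeds

def run_worker(seed, n_events):
    """Engine #2: a fresh, independent engine per worker."""
    engine = random.Random(seed)
    for _ in range(n_events):
        simulate_event(engine)

def simulate_event(engine):
    engine.random()                   # placeholder for the actual physics

for s in make_worker_seeds(job_seed=42, n_workers=4):
    run_worker(s, n_events=1000)
```

Given the same job seed, number of workers and events per worker, every worker replays exactly the same event sequence, which is why the next slide lists all of these parameters.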
Reproducibility • parameters which need to be fixed to reproduce the simulation: • total number of events • initial seed • ...but also: • number of workers • number of events per worker
Ease of use • user-friendliness • a G4 simulation developer should not need to fight with irrelevant technical problems when moving from sequential to parallel G4 simulation • as non-intrusive as possible • minimize the necessary code changes in the original simulation • good separation of the subsystems • the G4 simulation does not need to know that it runs in parallel... • the distributed framework (DIANE) does not need to care about what is actually being simulated (see the architecture overview)
What is DIANE? R&D project in IT/API semi-interactive parallel analysis for LHC middleware technology evaluation & choice CORBA, MPI, Condor, LSF... also see how to integrate API products with GRID prototyping (focus on ntuple analysis) time scale and resources: Jan 2001: start (< 1 FTE) June 2002: running prototype exists sample Ntuple analysis with Anaphe event-level parallel Geant4 simulation
What is DIANE? • framework for parallel cluster computation • application-oriented: master-worker model common in HEP applications • application-independent: apps dynamically loaded in a plugin style, callbacks to applications via abstract interfaces (see the sketch below) • component-based: subsystems and services packaged into component libraries • core architecture uses CORBA and CCM (CORBA Component Model) • integration layer between applications and the GRID • environment and deployment tools
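The plugin idea can be illustrated with a short sketch: the framework programs against an abstract worker interface and loads the concrete application dynamically by name, so neither side depends on the other. Interface, module and class names here are illustrative, not DIANE's real ones:

```python
import importlib
from abc import ABC, abstractmethod

class IWorker(ABC):
    """Abstract callback interface: all the framework ever sees."""
    @abstractmethod
    def initialize(self, params): ...
    @abstractmethod
    def execute(self, task): ...

def load_application(module_name, class_name):
    """Plugin-style loading: the framework never imports the app directly,
    so it stays independent of what is actually being simulated."""
    cls = getattr(importlib.import_module(module_name), class_name)
    app = cls()
    if not isinstance(app, IWorker):
        raise TypeError(f"{class_name} does not implement IWorker")
    return app

# Usage with a hypothetical Geant4 plugin module:
#   app = load_application("g4sim_plugin", "G4SimWorker")
#   app.initialize({"seed": 42})
#   app.execute({"events": 250_000})
```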
Master/Worker model • applications share the same computation model • so they also share a big part of the framework code • but they have different non-functional requirements: • CPU- vs. IO-intensive • semi-interactive vs. batch • etc. • (a sketch of the shared master-side code follows below)
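What the applications share is the dispatch-and-merge skeleton; only the task payload and the merge rule differ. A minimal sketch of the master-side merge for results that combine additively, e.g. histogram bin contents; names are illustrative:

```python
import collections

def merge_partial_results(partials):
    """Generic master-side merge: sum the bin contents returned by each
    worker. The same loop serves any application whose partial results
    combine additively, which is common in HEP."""
    merged = collections.Counter()
    for hist in partials:
        merged.update(hist)
    return dict(merged)

# e.g. two workers report partial histograms of the same observable
partials = [{"bin0": 10, "bin1": 3}, {"bin0": 7, "bin1": 5}]
print(merge_partial_results(partials))   # {'bin0': 17, 'bin1': 8}
```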
What DIANE is not • not a replacement for a GRID and its services • not a hardwired analysis toolkit
DIANE and GRID • DIANE as a GRID computing element • ...via a gateway that understands Grid/JDL • ...Grid/JDL must be able to describe parallel jobs/tasks • DIANE as a user of (low-level) Grid services • ...authentication, security, load balancing... • ...and profit from existing 3rd-party implementations • the Python environment is a rapid prototyping platform and may provide a convenient connection between DIANE and the Globus Toolkit via the pyGlobus API
Architecture Overview • layering: abstract middleware interfaces and components • plugin-style application loading
Conclusions • prototype deployment of G4-DIANE • significant performance improvement is possible • scalability tests: • 140 Mio events • 70 nodes in the cluster • 1 hour of total parallel execution • putting DIANE and G4 together is fairly easy • done in several days... • DIANE may bridge G4 to the GRID world • without necessarily waiting for fully-fledged GRID infrastructure to become available