This presentation describes the collaborative effort between Geant4, DIANE, and Anaphe to parallelize Geant4 simulation, used to optimize detectors for x-ray fluorescence emission from Mercury's crust. It covers the performance gains from parallel simulation, benchmarking and scalability tests, the challenges of reproducibility and of minimizing deployment overhead, and DIANE's role in semi-interactive parallel analysis and in the evaluation of middleware technologies for LHC.
Distributed Simulation with Geant4. Preliminary results of the LowE / DIANE joint project. Jakub T. Mościcki, CERN/IT. Credits also to: Alfonso Mantero, INFN Genova
History • Parallelization of Geant4 simulation is a joint project between Geant4 – DIANE – Anaphe • DIANE is an R&D project in IT/API to study distributed analysis and simulation and create a prototype • initiated early 2001 with very limited resources • Anaphe is an analysis project supported by IT • provides the analysis framework for HEP • The pilot programme includes G4 simulation which produces AIDA/Anaphe histograms • Collaboration started late spring 2002
Sequential Geant4 Simulation • the goal of the simulation: • optimize the detectors for x-ray fluorescence emission from Mercury's crust, in the context of Hermes, the Bepi Colombo ESA mission • requires high statistics ⇒ many events • 20 Mio events ~ 3 hours • up to 100 Mio events might be useful • estimated time ~16 hours
Parallel Geant4 Simulation • increase performance • shift from batch to semi-interactive simulation • speed up the analysis cycle • generate more events – debug the simulation faster • from sequential to parallel simulation • preserve reproducibility of the results • minimize deployment overhead when moving from sequential to parallel simulation • both in terms of time and the amount of code/expertise one must invest
Benchmarking environment • parallel cluster configuration • lxplus: 70 RedHat 6.1 nodes • 7 Intel STL2 (2 x PIII 1 GHz, 512 MB) • 31 ASUS P2B-D (2 x PIII 600 MHz, 512 MB) • 15 Celsius 620 (2 x PIII 550 MHz, 512 MB) • the rest – Kayak (2 x PIII 450 MHz, 128 MB) • reference sequential machine • pcgeant2 (2 x Xeon 1700 MHz, 1 GB)
Benchmarking Caveat • non-exclusive access to interactive machines • 'load-noise' background, unpredictable load peaks • different CPU and RAM on the nodes • AFS used to fetch physics config data • try to remove the noise: • repeat simulations many times to get the correct mean • work at night and at off-peak hours (what about US people using CERN computing facilities?) • etc... • conclusion: • results should be taken with caution and are approximate
Structure of the simulation • initialization phase (constant) • load ~10-15 MB of physics tables, config data etc. • reference sequential machine: ~4 minutes (user time) • cluster nodes: ~5-6 minutes • beamOn ~ f(number of events) • small job: 1-5 Mio events • medium job: 20-40 Mio events • big job: > 50 Mio events
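These two phases suggest a simple run-time model. As a rough sketch, assuming the constant ~4-6 minute initialization and the 20 Mio events / 3 hours sequential rate quoted earlier:

$$T(N) \approx T_{\mathrm{init}} + N\,t_{\mathrm{event}}, \qquad t_{\mathrm{event}} \approx \frac{3\ \mathrm{h}}{20 \times 10^{6}} \approx 0.54\ \mathrm{ms/event}$$

For 100 Mio events this gives roughly 15 h of event processing plus initialization, consistent with the ~16 h estimate on the earlier slide. The constant $T_{\mathrm{init}}$ term is also what makes small parallel jobs comparatively inefficient.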
Benchmarking (comments) • results are approximate • scaling factors for different CPU speeds • but they seem in agreement with expectations • a move from batch to semi-interactive simulation is feasible • small jobs do not gain as much – the large constant initialization time dominates
Problems & solutions • time of job execution = slowest machine... • ...or most loaded one at the moment • often had to wait a long time for last worker to finish • possible solution: • use larger number of smaller workers • fast machines run workers sequentially many times, but... • constant initialization time rather important • initialize once, beamOn many times... to be checked • if this problem is solved we may move towards more interactive simulation
Reproducibility • initial seed of the random engine • make sure that every parallel simulation starts with a seed uniquely determined by the job's initial seed • the number of times the engine is used depends on the initial seed • make sure that correlations between the workers' seeds are avoided • our solution: • use two uncorrelated random engines • one to generate a table of initial seeds (one seed for each worker) • another for the simulation inside the worker
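A minimal sketch of the two-engine scheme, using Python's random module as a stand-in for the random engines actually used in Geant4 (simulate_event is a hypothetical placeholder):

```python
import random

def make_worker_seeds(job_seed, n_workers):
    # engine 1: used only to derive one independent seed per worker,
    # so the whole job is reproducible from a single initial seed
    master_engine = random.Random(job_seed)
    return [master_engine.getrandbits(63) for _ in range(n_workers)]

def run_worker(worker_seed, n_events):
    # engine 2: drives the simulation inside this worker only,
    # uncorrelated with the seed-table engine
    engine = random.Random(worker_seed)
    for _ in range(n_events):
        simulate_event(engine)             # hypothetical per-event simulation
```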
Reproducibility • parameters which need to be fixed to reproduce the simulation: • total number of events • initial seed • ...but also: • number of workers • number of events per worker
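In other words, a reproducible parallel run is fully described by a small record like the following (values are illustrative):

```python
# everything needed to rerun the simulation and obtain identical results
job_descriptor = {
    "total_events":      20_000_000,
    "initial_seed":      12345,
    "n_workers":         70,        # changes the seed table, so it must be recorded
    "events_per_worker": 285_714,   # splitting policy is part of the contract;
                                    # e.g. the last worker takes the remainder
}
```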
Ease of use • user-friendliness • a G4 simulation developer should not need to fight irrelevant technical problems when moving from sequential to parallel G4 simulation • as non-intrusive as possible • minimize the necessary code changes in the original simulation • good separation of the subsystems • the G4 simulation does not need to know that it runs in parallel... • the distributed framework (DIANE) does not need to care about what actually is being simulated (see the DIANE slides below)
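A sketch of what such a separation could look like; the interface and method names below are illustrative assumptions, not DIANE's actual API:

```python
from abc import ABC, abstractmethod

class ITaskWorker(ABC):
    """All the framework ever sees of the application."""

    @abstractmethod
    def init(self, config):
        """One-time setup (e.g. loading Geant4 physics tables)."""

    @abstractmethod
    def run(self, n_events, seed):
        """Process one chunk of work; unaware of any other workers."""

    @abstractmethod
    def collect(self):
        """Return partial results (e.g. AIDA/Anaphe histograms) for merging."""
```

The simulation implements this interface without any knowledge of scheduling or transport; the framework calls it without any knowledge of Geant4.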
What is DIANE? R&D project in IT/API semi-interactive parallel analysis for LHC middleware technology evaluation & choice CORBA, MPI, Condor, LSF... also see how to integrate API products with GRID prototyping (focus on ntuple analysis) time scale and resources: Jan 2001: start (< 1 FTE) June 2002: running prototype exists sample Ntuple analysis with Anaphe event-level parallel Geant4 simulation
What is DIANE? • framework for parallel cluster computation • application-oriented: master-worker model common in HEP applications • application-independent: apps dynamically loaded in a plugin style, callbacks to applications via abstract interfaces • component-based: subsystems and services packaged into component libraries • core architecture uses CORBA and CCM (CORBA Component Model) • integration layer between applications and the GRID • environment and deployment tools
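The "plugin style" loading can be pictured as follows. This is a Python sketch; the real framework uses CORBA/CCM components, and the module and class names here are hypothetical:

```python
import importlib

def load_application(module_name, class_name):
    # the framework has no compile-time dependency on the application:
    # the plugin is named at run time and loaded dynamically
    module = importlib.import_module(module_name)
    return getattr(module, class_name)()

# e.g. the Geant4 x-ray fluorescence simulation as a plugin (hypothetical names)
task = load_application("g4_xray_sim", "FluorescenceTask")
```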
Master/Worker model • applications share the same computation model, so they also share a big part of the framework code • but they have different non-functional requirements: • CPU- vs. IO-intensive • semi-interactive vs. batch • etc...
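Below is a minimal local stand-in for the master/worker cycle, using multiprocessing in place of DIANE's CORBA transport; merge_histograms and the task plugin are hypothetical:

```python
from multiprocessing import Pool

def run_chunk(args):
    n_events, seed = args
    # load the application plugin, as in the previous sketch
    task = load_application("g4_xray_sim", "FluorescenceTask")
    task.init(config={})
    task.run(n_events, seed)
    return task.collect()

def master(total_events, n_workers, seeds):
    chunk = total_events // n_workers
    with Pool(n_workers) as pool:          # DIANE dispatches to cluster nodes instead
        partials = pool.map(run_chunk, [(chunk, s) for s in seeds])
    return merge_histograms(partials)      # hypothetical merge of partial histograms
```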
What DIANE is not • DIANE is not a replacement for a GRID and its services • DIANE is not a hardwired analysis toolkit
DIANE and GRID • DIANE as a GRID computing element • ...via a gateway that understands Grid/JDL • ...Grid/JDL must be able to describe parallel jobs/tasks • DIANE as a user of (low-level) Grid services • ...authentication, security, load balancing... • and profit from existing 3rd-party implementations • the python environment is a rapid prototyping platform and may provide a convenient connection between DIANE and the Globus Toolkit via the pyGlobus API
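Schematically, a gateway of this kind could translate a Grid/JDL-like description into a DIANE job. Every field and helper below is illustrative (the JDL of the time had no standard way to express such parallel tasks), reusing the make_worker_seeds and master sketches from earlier:

```python
def submit_from_jdl(jdl):
    # hypothetical translation of a JDL-like description into a DIANE job;
    # authentication/security would be delegated to Grid services
    n_workers    = int(jdl["NodeNumber"])    # assumed extension for parallel jobs
    total_events = int(jdl["TotalEvents"])   # assumed application-specific field
    seeds = make_worker_seeds(int(jdl["InitialSeed"]), n_workers)
    return master(total_events, n_workers, seeds)
```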
Architecture Overview • layering: abstract middleware interfaces and components • plugin-style application loading
Conclusions • prototype deployment of G4-DIANE • significant performance improvement possible • scalability tests: • 140 Mio events • 70 nodes in the cluster • 1 hour total parallel execution • putting together DIANE and G4 is fairly easy • done in several days... • DIANE may bridge G4 to the GRID world • without necessarily waiting for a fully-fledged GRID infrastructure to become available
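As a rough consistency check, assuming the sequential rate of 20 Mio events in 3 hours quoted earlier:

$$S \approx \frac{140 \times 10^{6} \cdot \frac{3\ \mathrm{h}}{20 \times 10^{6}}}{1\ \mathrm{h}} = \frac{21\ \mathrm{h}}{1\ \mathrm{h}} \approx 21$$

i.e. a speed-up of roughly 21 on 70 (mostly slower, shared) nodes, which is plausible given the heterogeneous CPUs, the constant initialization phase, and the load noise described in the benchmarking caveats.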