150 likes | 162 Views
This presentation provides insights into the CPU performance optimization of Geant4, highlighting benchmarks, code review, and future plans for enhancing computing efficiency. It covers techniques, recommendations, and plans for improving application performance. The content focuses on different benchmark categories, automation, policy definition, and existing benchmark structures. Additionally, it discusses CPU benchmarks, electromagentic and hadronic physics optimizations, and performance changes in various Geant4 versions.
E N D
Geant4 Technical Forum, CERN, 07 November 2007 Geant4 CPU performance : an update J.Apostolakis, G.Cooperman, G.Cosmo, V.Ivanchenko, I.Mclaren, T.Nikitina, A.Ribon CERN PH/SFT
From Geant4 Workshop in Hebden Bridge, 12-19 September 2007(1) • Plenary Section ‘CPU Performance’ • Parallel Section ‘CPU Benchmarks’ • Plenary Discussion Section ‘Computing Performance’ • Different talks in other Sections
From Geant4 Workshop in Hebden Bridge, 12-19 September 2007(2) • CPU performance optimization of Geant4 has been and • is an important consideration • To monitor CPU performance a set of Benchmarks is used • Geant4 Benchmarks are simple setups • each targeted to stress one particular area (transportation, tracking in • magnetic field; electromagnetic physics; hadronic physics) • Benchmarks are run manually by developers • It is a part of the validation test runs at every release • Future plans : run by STT regularly • Profiling and code review very helpful for improvements in • CPU performance • Speedup in 4.9.0 version especially due to CODE REVIEW in • Electromagnetic physics
From Geant4 Workshop in Hebden Bridge, 12-19 September 2007(3) • Depending on the application • different techniques can be used to improve CPU performance • Examples : • Biasing • Neutron_HP • Geometry Description • Tuning • Regular Navigation in phantoms • More details in Alex’s Talk • “Recommendations for improving application performance” • http://indico.cern.ch/materialDisplay.py?contribId=106&sessionId=5&materialId=slides&confId=10311
From Geant4 Workshop in Hebden Bridge, 12-19 September 2007(4) • CPU Benchmarks : Plans • Add more complex benchmarks, better covering different aspects of ‘toolkit’ • Possible Candidates : geometry imported via GDML • navigation on phantoms/regular geometries • low energy benchmarks • specific benchmarks for hadronic physics models • Automation of running the benchmarks • Defining Policy for Benchmarks (platform, system, • structure, frequency of run ) • Adopt dedicated system with total control on it in order to reduce • possible time fluctuations (no afs, controlled system upgrade ) • Start immediately after this workshop
Current Status • Existing Benchmarks are introduced in separate CVS module. • Priority for : easy way to run benchmarks, • easy way for accessing timing information • Structure according to categories: • calorimeter : FullCMS with geometry imported via GDML • HadCalCMS • electromagnetic: be1 (from TestEm9, crystal calorimeter CMS-type) • be2 (from TestEm3, sampling calorimeter ATLAS-type) • transportation: honeycomb (pure tracking) • NTST (tracking in magnetic field, based on BaBar Tracker • with drift chamber) • Each test contents : • Source code, macros and documentation (README, History) • Selection of system/platforms • SLC4, 64 bits architecture • Static optimized libraries
Tracking in Magnetic Field: only transportation process. BaBar Tracker It consists of simulating the BaBar silicon tracker and 40 layers drift chamber, in a 1.5 T constant magnetic field. Only transportation, no physics. 100 B-Bbar events simulated. Locally build with static libraries . With afs version big time variations were measured (5% or more ) Releasesec/event Ratios 7.1.p01a 2. 05 1.00 8.0.p01 2.04 1.01 8.1.p02 2.14 1.04 <---G4FieldTrack::LoadFromArray not inline 8.2 2.31 1.12 <---G4Navigator::LocateGlobalPointAndSetup 8.2.p01 2.31 1.12become virtual 8.3 2.3 1.12 8.3.p01 2.31 1.12 9.0 2.26 1.10<--- G4PropagatorInField 9.0.p01 2.26 1.10(better initialization of G4FieldTrack array) The number of steps and calls to fields are almost the same in all cases.
Electromagnetic physics: CPU benchmark SLC4 • Static build on dedicated SLC4 PC, no libraries from afs • SLC3 to SLC4 migration slightly change ratio between CPU of different tests QGSP QGSP_EMV Better CPU performance in 9.0 mainly due to code review of Electromagnetic physics module
Hadronic physics. Large statistics(1) π- 50 GeV on Copper-Scintillator calorimeter (25 layers, Cu (6cm) - Sci (4mm): a simplified version of CMS HCAL); default 0.7 mm production cut, QGSP_EMV, 4000 events Local installation with static libraries on dedicated computer (SLC4) Release B=0 B=4TRatios #steps/evt 7.1.p01a 1.83 2.07 1.00 1.00 99,050 99,190 8.0.p01 2.00 2.20 1.09 1.06 105,290 105,280 8.1.p02 2.12 2.41 1.16 1.16 105,000 105,620 8.2.p01 2.25 2.56 1.23 1.24 107,290 107,500 8.3 2.22 2.50 1.21 1.21 107,000 106,550 Release B=0 Ratios #steps/evt 7.1.p01a 2.994 1.00 172,240 8.0.p01 3.114 1.04 181,380 8.1.p02 3.160 1.06 175,500 8.2.p01 3.042 1.02 175,690 8.3 3.075 1.03 174,680 sec/evt e- 50 GeV
Main physics changes affecting CPU Electromagnetic physics New model of Multiple Scattering (not in QGSP_EMV) • Hadronic physics CHIPS capture at rest for negatively charged hadrons (G4QStoppingPhysics since 8.1) • Due to these improvements in physics more steps and tracksper eventare produced Which slow down the CPU performance
Pure tracking benchmark Honeycomb calorimeter benchmark It consists of transporting 10,000 geantinos, along predefined directions, in a honeycomb calorimeter made of two modules, each 26 x 50 tubes Releasetotal time Ratios 5.2.p02 2.57s 0.84 6.2.p02 3.05s 1.00 <---G4Navigator becomes base class 7.0.p01 3.00s 0.98 7.1.p01a 3.06s 1.00 8.0.p01 3.07s 1.00 8.1.p02 3.02s 0.99 8.2.p01 3.14s 1.03 <--- in G4Navigator8.3 3.15s 1.03LocateGlobalPointAndSetup() metod 8.3.p01 3.13s 1.02 becomesvirtual 9.0 3.15s 1.03 9.0.p01 3.14s 1.03 These changes in G4Navigator have been done to accommodate the Tgeo/VMC interface (ALICE requirement)
Tracking in Magnetic Field: QGSP_EMV Physics List BaBar Tracker Same Geant4 example as in the previous slide, but this time with the QGSP_EMV Physics List. 100 B-Bbar events simulated. Local build with static libraries. Releasesec/event Ratios 7.1.p01a 3.04 1.00 (QGSP_GN) 8.0.p01 3.78 1.24 8.1.p02 3.85 1.27 8.2 3.72 1.22 8.2.p01 3.84 1.26 8.3 3.91 1.29 8.3.p01 3.89 1.28 9.0 3.57 1.17 <---Code review of Electromagnetic 9.0.p01 3.62 1.19 physics module * The variations are due to tuning and adding safety checks to Urban Multiple Scattering model. *
Electromagnetic physics EM-1 : 10 GeV e- in matrix 5x5 of PbWO4 crystals (CMS-type); cut = 0.7 mm, 1000 events. EM-2 : 10 GeV e- in ATLAS barrel type sampling calorimeter; cut = 0.7 mm, 1000 events. EM-3 : 10 GeV e- in ATLAS barrel type sampling calorimeter; cut = 0.02 mm, 100 events. QGSP QGSP_EMV ReleaseEM-1 EM-2 EM-3 EM-1 EM-2 EM-3 5.2.p02 1.03 0.99 1.59 6.2.p02 0.89 0.98 0.97 7.1.p01 1.00 1.00 1.00 8.0.p01 1.33 2.24 2.26 8.1.p01 1.37 2.43 2.01 1.06 1.08 1.07 8.2.p01 1.27 2.03 1.73 1.03 1.09 1.06 QGSP in 8.x is slower than 7.1 by 20-140% QGSP_EMV in 8.x is slower than 7.1 by 3-9% All numbers are with CERN afs installation for SLC3 and shared libraries
Hadronic physics. Large statistics(2) p- 50 GeV on Copper-Scintillator calorimeter (25 layers, Cu (6cm) - Sci (4mm): a simplified version of CMS HCAL); default 0.7 mm production cut, QGSP_EMV, 4000 events Run in the same conditions as on previous slide but few months later Release B=0 B=4T Ratios #steps/evt 8.3.p01 2.31 2.62 1.00 1.00 105,440 106,290 9.0. 2.14 2.45 0.93 0.94 106,670 106,240 9.1.p01 2.19 2.50 0.95 0.95 106,300 105,620 sec/evt e- 50 GeV sec/evt Release B=0 Ratios #steps/evt 8.3.p01 3.210 1.00 174,640 9.0. 2.959 0.92 174,270 9.1.p01 3.029 0.94 174,290 8.3 3.175 0.99 174,680 8.3(05.2007) 3.075 0.96 174,680