Physics data management tools: computational evolutions and benchmarks. Mincheol Han 1 , Chan-Hyeung Kim 1 , Lorenzo Moneta 2 , Maria Grazia Pia 3 , Hee Seo 1 1 Hanyang University, Korea – 2 CERN, Switzerland – 3 INFN Sezione Di Genova, Italy . SNA + MC 2010
Physics data management tools: computational evolutions and benchmarks. Mincheol Han (1), Chan-Hyeung Kim (1), Lorenzo Moneta (2), Maria Grazia Pia (3), Hee Seo (1). (1) Hanyang University, Korea – (2) CERN, Switzerland – (3) INFN Sezione di Genova, Italy. SNA + MC 2010: Joint International Conference on Supercomputing in Nuclear Applications + Monte Carlo 2010
Physics data libraries • Data libraries • Collection of experimental or theoretical tabulations of physics quantities • e.g. cross sections, form factors, nuclear and atomic parameters etc. • Distributed by data centres: RSICC (ORNL), NEA, NIST… • Essential ingredient of Monte Carlo simulation • Use established data • Speed up simulation w.r.t. using analytical formulae • Common background for different Monte Carlo systems • ENDF/B, ENSDF, JENDL, CENDL, BROND, EEDL, EPDL, EADL…
Dealing with physics data • Data management • Load (and store) data • Retrieve data • Use data: directly, by interpolation • Loading • Usually in the simulation initialization phase • Loading on demand • Retrieving • In the course of the simulation (usually at each step) • Can be source of significant overload
Original design in Geant4: electromagnetic data (Livermore library) • Strategy Pattern • Handle interchangeable interpolation algorithms transparently • Composite Pattern • Handle different data collections transparently • Data for materials • Data for atoms • Data for shells
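The two patterns named on this slide can be sketched as follows. This is a minimal illustration, not the Geant4 code: the class names (`Interpolation`, `LeafDataSet`, `CompositeDataSet`) are hypothetical, each leaf holds only two tabulated points for brevity, and the assumption that a composite returns the sum of its components is made only for the example.

```cpp
#include <cassert>
#include <cmath>
#include <memory>
#include <utility>
#include <vector>

// Strategy: interchangeable interpolation algorithms behind one interface.
class Interpolation {
public:
    virtual ~Interpolation() = default;
    // Interpolate at x between the points (x0, y0) and (x1, y1).
    virtual double calculate(double x, double x0, double y0,
                             double x1, double y1) const = 0;
};

class LinearInterpolation : public Interpolation {
public:
    double calculate(double x, double x0, double y0,
                     double x1, double y1) const override {
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0);
    }
};

// Composite: leaves and collections share the same interface.
class DataSet {
public:
    virtual ~DataSet() = default;
    virtual double findValue(double x) const = 0;
};

// Leaf (e.g. data for one shell); two points only, to keep the sketch short.
class LeafDataSet : public DataSet {
public:
    LeafDataSet(double x0, double y0, double x1, double y1,
                std::shared_ptr<const Interpolation> algorithm)
        : x0_(x0), y0_(y0), x1_(x1), y1_(y1), algorithm_(std::move(algorithm)) {}
    double findValue(double x) const override {
        return algorithm_->calculate(x, x0_, y0_, x1_, y1_);
    }
private:
    double x0_, y0_, x1_, y1_;
    std::shared_ptr<const Interpolation> algorithm_;
};

// Collection (e.g. an atom made of shells); summing is an assumption.
class CompositeDataSet : public DataSet {
public:
    void add(std::unique_ptr<DataSet> component) {
        components_.push_back(std::move(component));
    }
    double findValue(double x) const override {
        double sum = 0.0;
        for (const auto& c : components_) sum += c->findValue(x);
        return sum;
    }
private:
    std::vector<std::unique_ptr<DataSet>> components_;
};
```

Client code calls `findValue` on a `DataSet` without knowing whether it is a shell, an atom or a material, and without knowing which interpolation algorithm is plugged in; both decisions are resolved through virtual calls at run time.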
Can we improve it? This talk: selection of preliminary results. Final and complete results will be documented in a dedicated publication. • Geant4 physics on a diet • Leaner software design • Improve computational performance • Enhance clarity and transparency • Facilitate testing • Ease maintenance • CHEP 2009 R&D: physics models • M.G. Pia et al., Design and performance evaluations of generic programming techniques in a R&D prototype of Geant4 physics • Monte Carlo + CHEP 2010 R&D: physics data • Prototype to evaluate candidate solutions quantitatively
Test set-up • Test case: Livermore library data • EEDL (Evaluated Electron Data Library): ionisation, Bremsstrahlung • EPDL97 (Evaluated Photon Data Library): Compton and Rayleigh scattering, photoelectric effect, pair and triplet production • EADL (Evaluated Atomic Data Library): atomic parameters • Computing environment • Geant4 9.4-beta + G4EMLOW 6.13 • Intel® Core™ Duo CPU E8500 with 3.16 GHz processor, 4 GB RAM, Linux SLC5, gcc 4.3.5 compiler • Intel® CPU U4100 with 1.30 GHz processor, 2 GB RAM, MS Windows XP SP3, MSVC++9 C++ compiler (with SP1) • Load test • loading data for a number of elements between 1 and 100 • each experiment repeated 100 times, the whole series repeated 10 times • Retrieve test • finding the data associated with a randomly chosen atomic number • finding procedure repeated 10^6 times, whole experiment repeated 10 times
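A retrieve test of this kind could be timed along the following lines. This is only a sketch under stated assumptions: the function names `totalTimeMs` and `retrieveTestMs`, the fixed seed and the use of `std::chrono::steady_clock` are choices made for the example, not the harness used by the authors.

```cpp
#include <cassert>
#include <chrono>
#include <random>

// Runs `task` `repetitions` times; returns the total wall-clock time in ms.
template <typename Task>
double totalTimeMs(Task task, long repetitions) {
    assert(repetitions > 0);
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (long i = 0; i < repetitions; ++i) task();
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count();
}

// Hypothetical retrieve test: calls find(Z) for `lookups` atomic numbers
// drawn uniformly from [1, maxZ]. The measured time includes the cost of
// generating the random numbers.
template <typename Find>
double retrieveTestMs(Find find, int maxZ, long lookups, unsigned seed = 12345u) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> pickZ(1, maxZ);
    return totalTimeMs([&] { find(pickZ(gen)); }, lookups);
}
```

Calling `retrieveTestMs` with 10^6 lookups, and repeating the call 10 times, would mirror the repetition scheme listed on the slide; the spread over the 10 repetitions gives a feeling for the measurement noise.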
Data structure • Improve the physical design of the data library itself • Large tabulations split into individual files (one per element) [Figure: excitation data; time (ms) to load data vs. number of elements present in the experimental set-up, original data vs. split data]
Data structure • Large physics tabulations require large memory allocation for storing the data, and time to load them into memory and to search through them • Are all the data really necessary? • Reduce the amount of data • Suppress a point B if it can be interpolated with the same accuracy based on its neighbours A and C [Figure: Compton scattering functions; number of data for each element, and time (ms) to load data vs. number of elements present in the experimental set-up, original data vs. reduced data]
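The rule of suppressing B when it can be recovered from A and C can be sketched as a thinning pass over a tabulation. The slide does not state the interpolation law or the accuracy criterion, so linear interpolation and a relative tolerance `tol` are assumptions here, and the names `interpolable` and `reduceTable` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Point = std::pair<double, double>;  // (x, y), x strictly increasing

// True if every point strictly between indices a and c is reproduced by
// linear interpolation between data[a] and data[c] within relative tolerance tol.
bool interpolable(const std::vector<Point>& data, std::size_t a, std::size_t c,
                  double tol) {
    assert(a < c && c < data.size());
    for (std::size_t b = a + 1; b < c; ++b) {
        const double t = (data[b].first - data[a].first) /
                         (data[c].first - data[a].first);
        const double y = data[a].second + t * (data[c].second - data[a].second);
        if (std::fabs(y - data[b].second) > tol * std::fabs(data[b].second)) {
            return false;
        }
    }
    return true;
}

// Keeps the end points and every interior point that cannot be dropped
// without exceeding the tolerance at some suppressed point.
std::vector<Point> reduceTable(const std::vector<Point>& data, double tol) {
    if (data.size() < 3) return data;
    std::vector<Point> kept{data.front()};
    std::size_t a = 0;  // index of the last kept point
    for (std::size_t c = 2; c < data.size(); ++c) {
        if (!interpolable(data, a, c, tol)) {
            a = c - 1;  // the previous point is needed: keep it
            kept.push_back(data[a]);
        }
    }
    kept.push_back(data.back());
    return kept;
}
```

Each candidate segment is checked against all the points it would suppress, not only the most recent one, so the error at points dropped earlier cannot grow unnoticed as the segment is extended.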
Use forthcoming C++ features • Current implementation uses STL map for most data, STL vector for a few data types • Evaluated unordered_map (a.k.a. hash map) • Included in C++0x TR1 • <tr1/unordered_map> in gcc 4.3.x • <unordered_map> in MSVC [Figure: pair production cross sections; time (ms) to load data vs. number of elements present in the experimental set-up, STL map vs. unordered_map]
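Because `std::map` and `std::unordered_map` share most of their interface, the two can be compared with the same loading and lookup code. The sketch below assumes a standard C++11 or later compiler, where the header is `<unordered_map>`; the helper names `loadByZ` and `findData` and the placeholder data are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <unordered_map>
#include <vector>

using DataSet = std::vector<double>;

// Fills any associative container keyed by atomic number Z with placeholder
// data (real code would read the tabulations from file).
template <typename Container>
Container loadByZ(int maxZ) {
    Container data;
    for (int z = 1; z <= maxZ; ++z) {
        data[z] = DataSet(10, static_cast<double>(z));
    }
    return data;
}

// Returns the data set for Z, or nullptr if Z was not loaded.
// std::map: O(log n) tree search; std::unordered_map: O(1) average hash lookup.
template <typename Container>
const DataSet* findData(const Container& data, int z) {
    const auto it = data.find(z);
    return it == data.end() ? nullptr : &it->second;
}
```

Instantiating both helpers with `std::map<int, DataSet>` and with `std::unordered_map<int, DataSet>` gives two otherwise identical set-ups whose load and retrieve times can be measured side by side.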
Caching pre-calculated data • Recent modification in Geant4 low energy electromagnetic package: cache pre-calculated log10 data • Credit to current Geant4 low energy electromagnetic group • Not to be credited to the authors of this talk • The authors of this talk • Quantified the time for loading/retrieving • Quantified the memory consumption to store additional (cached) data • Reviewed the modified software design and implementation: flaws • ~10% time saving w.r.t. on-the-fly log10 calculation [Figure: time (ms) to load and to retrieve data vs. number of elements present in the experimental set-up, original vs. modified]
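The idea of caching pre-calculated log10 data can be illustrated with a table that stores the logarithms of its grid once, at load time, so that a log-log interpolation only needs one `log10` call per lookup. This is a sketch, not the Geant4 implementation: the class name `CachedLogTable`, the clamping outside the grid and the requirement of strictly positive values are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

class CachedLogTable {
public:
    // Energies sorted ascending; energies and values strictly positive.
    CachedLogTable(std::vector<double> energy, std::vector<double> value)
        : energy_(std::move(energy)), value_(std::move(value)) {
        assert(energy_.size() >= 2 && energy_.size() == value_.size());
        logEnergy_.reserve(energy_.size());
        logValue_.reserve(value_.size());
        // Cache: computed once here instead of at every lookup.
        for (double e : energy_) logEnergy_.push_back(std::log10(e));
        for (double v : value_) logValue_.push_back(std::log10(v));
    }

    // Log-log interpolation at energy e, clamped to the grid end points.
    double find(double e) const {
        if (e <= energy_.front()) return value_.front();
        if (e >= energy_.back()) return value_.back();
        const auto hi = std::upper_bound(energy_.begin(), energy_.end(), e);
        const std::size_t i = static_cast<std::size_t>(hi - energy_.begin());
        const double slope = (logValue_[i] - logValue_[i - 1]) /
                             (logEnergy_[i] - logEnergy_[i - 1]);
        return std::pow(10.0, logValue_[i - 1] +
                                  slope * (std::log10(e) - logEnergy_[i - 1]));
    }

    // Extra memory spent on the cache, in bytes.
    std::size_t cachedBytes() const {
        return (logEnergy_.size() + logValue_.size()) * sizeof(double);
    }

private:
    std::vector<double> energy_, value_;
    std::vector<double> logEnergy_, logValue_;
};
```

The trade-off is visible in `cachedBytes`: in this sketch the cache doubles the storage of each table, which is the kind of memory cost that has to be weighed against the time saved at retrieval.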
Generic programming techniques • OOAD iteration: polymorphic behavior of data sets and interpolation algorithms through dynamic binding is not necessary at runtime • Templates eliminate the overhead due to the virtual table associated with inheritance • Contribution to improving execution speed [Figure: preliminary design]
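Replacing dynamic binding with templates can be sketched with a policy-based data set, where the interpolation algorithm is a template parameter resolved at compile time. The names `LinearPolicy`, `LogLogPolicy` and `DataSetT` are hypothetical and the two-point table is a simplification; the prototype's actual design is not reproduced here.

```cpp
#include <cassert>
#include <cmath>

// Interpolation policies: plain structs with a static function, no virtual table.
struct LinearPolicy {
    static double calculate(double x, double x0, double y0, double x1, double y1) {
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0);
    }
};

struct LogLogPolicy {
    // Requires strictly positive coordinates.
    static double calculate(double x, double x0, double y0, double x1, double y1) {
        const double slope = (std::log10(y1) - std::log10(y0)) /
                             (std::log10(x1) - std::log10(x0));
        return std::pow(10.0, std::log10(y0) +
                                  slope * (std::log10(x) - std::log10(x0)));
    }
};

// Data set parameterised on its interpolation policy (two points for brevity).
template <typename InterpolationPolicy>
class DataSetT {
public:
    DataSetT(double x0, double y0, double x1, double y1)
        : x0_(x0), y0_(y0), x1_(x1), y1_(y1) {}
    // Bound at compile time, so the call can be inlined by the compiler.
    double findValue(double x) const {
        return InterpolationPolicy::calculate(x, x0_, y0_, x1_, y1_);
    }
private:
    double x0_, y0_, x1_, y1_;
};
```

The algorithm can no longer be swapped at run time, which is acceptable when, as the slide argues, that flexibility is not needed once the simulation is configured.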
Effect of prototype design: loading • The extent of the improvement depends on the characteristics of the data • Original design: STL vectors, load all elements [Figure: Rayleigh scattering form factors and Bremsstrahlung cross sections; time (ms) to load data vs. number of elements present in the experimental set-up, original vs. prototype]
Effect of prototype design: retrieving [Figure: Bremsstrahlung spectrum data and pair production cross sections; time (ms) to retrieve data vs. number of elements present in the experimental set-up; original design, prototype design, prototype design + unordered_map]
Use vectors! • Some data sets in the original design do not require the use of STL map • Can be efficiently managed by using STL vectors • Not worthwhile to move them to unordered_map [Figure: Rayleigh scattering form factors; time (ms) to retrieve data vs. number of elements present in the experimental set-up; original design, prototype design (map), prototype design (unordered_map)]
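One way a data set can do without a map is when its key is a small, contiguous integer such as the atomic number, which can then serve directly as a vector index. The slide does not say which property of the data makes vectors sufficient, so this is only one possible reading, and the class `DataByZ` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using DataSet = std::vector<double>;

// Data sets stored contiguously, indexed by atomic number Z in [1, maxZ].
class DataByZ {
public:
    explicit DataByZ(int maxZ) : sets_(static_cast<std::size_t>(maxZ)) {}

    void load(int z, DataSet data) { sets_.at(index(z)) = std::move(data); }

    // O(1) lookup: no tree traversal and no hashing.
    // Throws std::out_of_range if z exceeds maxZ.
    const DataSet& find(int z) const { return sets_.at(index(z)); }

private:
    static std::size_t index(int z) {
        assert(z >= 1);
        return static_cast<std::size_t>(z - 1);
    }
    std::vector<DataSet> sets_;
};
```

With direct indexing there is nothing left for a hash table to speed up, which is consistent with the remark that moving such data to unordered_map is not worthwhile.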
Conclusions • Prototype R&D on Geant4 physics data management • Investigated • Data structure • Software design • Use of C++0x TR1 features • Results • Leaner software • Improved performance • Same conclusions at CHEP 2009 regarding physics modeling • Geant4 R&D phase: RD44, 1994-1998 • Cutting edge technology • Rigorous software development process • Geant4 would profit from reenacting an R&D phase to exploit new technology with the same spirit of scientific openness and rigour as RD44
Acknowledgment: Thanks to CERN Directorate for support