A Toolkit for statistical comparison of data distributions
B. Mascialino, A. Pfeiffer, M. G. Pia, A. Ribon, P. Viarengo
Monte Carlo 2005 - Chattanooga, April 2005
Data analysis
Provide tools for the statistical comparison of distributions:
• equivalent reference distributions
• experimental measurements
• data from reference sources
• functions deriving from theoretical calculations or fits
Typical uses: detector monitoring, simulation validation, reconstruction vs. expectation, regression testing, physics analysis
GoF statistical toolkit
A project to develop a statistical comparison system
Comparison of distributions:
• Qualitative evaluation
• Quantitative evaluation: goodness-of-fit testing
Software process guidelines
• Unified Software Development Process, specifically tailored to the project
• practical guidance and tools from the RUP
• both rigorous and lightweight
• mapping onto ISO 15504
• Guidance from ISO 15504
• Incremental and iterative life cycle model (spiral approach)
Architectural guidelines
• The project adopts a solid architectural approach
• to offer the functionality and the quality needed by the users
• to be maintainable over a large time scale
• to be extensible, to accommodate future evolutions of the requirements
• Component-based approach
• to facilitate re-use and integration in different frameworks
• AIDA
• adopt a (HEP) standard
• no dependence on any specific analysis tool
The tests are specialised on the kind of distribution (binned/unbinned)
G.A.P. Cirrone, S. Donadio, S. Guatelli, A. Mantero, B. Mascialino, S. Parlati, M.G. Pia, A. Pfeiffer, A. Ribon, P. Viarengo, “A Goodness-of-Fit Statistical Toolkit”, IEEE Transactions on Nuclear Science (2004), 51 (5): 2056-2063.
Release StatisticsTesting-V1-01-00 downloadable from the web: http://www.ge.infn.it/geant4/analysis/HEPstatistics/
Chi-squared test
• Applies to binned distributions
• It can also be useful for unbinned distributions, but the data must first be grouped into classes
• Cannot be applied if the theoretical (expected) frequency in any class is < 5
• In that case, one can try to merge contiguous classes until the minimum theoretical frequency is reached
• Otherwise one can use Yates’ correction
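The merging of contiguous classes mentioned above can be sketched in a few lines. This is only an illustration in Python, not the toolkit's own code: the function names `merge_low_bins` and `chi2_statistic`, the left-to-right merging order and the example counts are all assumptions made for this sketch.

```python
def merge_low_bins(observed, expected, min_expected=5.0):
    """Merge contiguous classes (left to right) until each merged class
    has an expected count of at least min_expected."""
    merged_obs, merged_exp = [], []
    acc_obs = acc_exp = 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= min_expected:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
            acc_obs = acc_exp = 0.0
    if acc_exp > 0.0:
        # Leftover classes at the right edge: fold them into the last merged class.
        if merged_exp:
            merged_obs[-1] += acc_obs
            merged_exp[-1] += acc_exp
        else:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
    return merged_obs, merged_exp


def chi2_statistic(observed, expected):
    """Pearson chi-squared statistic: sum over classes of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical binned data; the outer classes have expected counts below 5.
observed = [1, 3, 12, 20, 9, 4, 1]
expected = [2, 4, 11, 18, 10, 3, 2]

obs_m, exp_m = merge_low_bins(observed, expected)
chi2 = chi2_statistic(obs_m, exp_m)
n_classes = len(exp_m)  # degrees of freedom follow from this count
```

After merging, the seven original classes become five, each with an expected count of at least 5. The number of degrees of freedom has to be derived from the merged classes, not from the original binning.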
Tests based on maximum distance (supremum statistics)
• Kolmogorov-Smirnov test
• Goodman approximation of KS test
• Kuiper test
Apply to unbinned distributions.
[Figure: original distributions, their empirical distribution functions, and the maximum distance Dmn between them]
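As a rough illustration of the supremum statistics listed above, the following Python sketch computes the two-sample Kolmogorov-Smirnov distance and the Kuiper statistic from the empirical distribution functions. It is a textbook-style sketch, not the toolkit's implementation; the function names and the sample values are made up for the example. The Goodman approximation is shown in its commonly quoted form (4 D² mn/(m+n), compared with a χ² distribution with 2 degrees of freedom); whether the toolkit uses exactly this form is an assumption.

```python
import math
from bisect import bisect_right


def ecdf_differences(x, y):
    """F_m(z) - G_n(z) evaluated at every point z of the pooled sample."""
    xs, ys = sorted(x), sorted(y)
    m, n = len(xs), len(ys)
    return [bisect_right(xs, z) / m - bisect_right(ys, z) / n for z in xs + ys]


def ks_statistic(x, y):
    """Kolmogorov-Smirnov distance D_mn: largest absolute EDF difference."""
    return max(abs(d) for d in ecdf_differences(x, y))


def kuiper_statistic(x, y):
    """Kuiper statistic V = D+ + D-: largest positive plus largest negative deviation."""
    diffs = ecdf_differences(x, y)
    return max(diffs) - min(diffs)


def goodman_chi2(d, m, n):
    """Goodman approximation: 4 D^2 mn/(m+n), approximately chi-squared with 2 dof."""
    return 4.0 * d * d * m * n / (m + n)


# Hypothetical unbinned samples.
sample_a = [1.0, 2.0, 7.0, 8.0]
sample_b = [3.0, 4.0, 5.0, 6.0]

d_mn = ks_statistic(sample_a, sample_b)        # 0.5
v_mn = kuiper_statistic(sample_a, sample_b)    # 1.0
chi2 = goodman_chi2(d_mn, len(sample_a), len(sample_b))
# For 2 degrees of freedom the chi-squared tail probability is exp(-chi2 / 2).
p_approx = math.exp(-chi2 / 2.0)
```

In this example the two samples differ in both tails, so the Kuiper statistic (which adds the deviations in both directions) is twice the Kolmogorov-Smirnov distance. With samples this small the Goodman approximation is only indicative, since it is a large-sample result.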
Tests containing a weighting function (quadratic statistics + weighting function)
• Fisz-Cramer-von Mises test
• Approx Anderson-Darling test
Apply to binned/unbinned distributions. The statistic is the sum/integral of all the distances between the empirical distribution functions.
[Figure: original distributions and their empirical distribution functions]
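The idea of summing all the distances, optionally with a weighting function, can be sketched as follows in Python. The code uses the textbook two-sample definitions (Cramer-von Mises with unit weight, Anderson-Darling with weight 1/[H(1-H)], where H is the pooled empirical distribution function). It assumes unbinned data without ties and is not the approximate Anderson-Darling algorithm of the toolkit; the function name and the sample values are hypothetical.

```python
from bisect import bisect_right


def quadratic_statistics(x, y):
    """Return (Cramer-von Mises T, Anderson-Darling A^2) for two unbinned samples.

    Both sum the squared EDF difference over the pooled sample; Anderson-Darling
    divides each term by H(1 - H), which emphasises the tails.
    Assumes no tied values.
    """
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    total = n + m
    pooled = sorted(xs + ys)
    cvm_sum = 0.0
    ad_sum = 0.0
    for i, z in enumerate(pooled, start=1):
        f = bisect_right(xs, z) / n
        g = bisect_right(ys, z) / m
        d2 = (f - g) ** 2
        cvm_sum += d2
        if i < total:  # the weight is undefined at the last point, where H = 1
            h = i / total
            ad_sum += d2 / (h * (1.0 - h))
    scale = n * m / total ** 2
    return scale * cvm_sum, scale * ad_sum


# Hypothetical unbinned samples.
sample_a = [1.0, 2.0, 7.0, 8.0]
sample_b = [3.0, 4.0, 5.0, 6.0]
cvm, ad = quadratic_statistics(sample_a, sample_b)
```

Unlike the supremum statistics, every point of the pooled sample contributes to the result, so a difference spread along the whole range of x accumulates instead of being summarised by a single maximum.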
User’s point of view
The user is completely shielded from both statistical and computing complexity.
• Simple user layer
• Only deal with AIDA objects and choice of comparison algorithm
• The user extracts the statistical result from the toolkit, selecting the algorithm by writing one line of code
Software testing
Rigorous software process adopted. Test process: unit tests, integration tests, system tests.
• Testing focuses primarily on the evaluation or assessment of quality of the software product, guaranteeing its correctness and robustness.
• finding and documenting defects in software quality
• validating software product functions as designed
• validating that the requirements have been implemented appropriately
• Test result summaries are included as part of the documentation of the Toolkit release and are available on the web.
Work in progress: new tests
• Weighted KS tests
• Weighted CVM tests
• CVM approximation to a χ² (Tiku test)
• Exact Anderson-Darling test
• Watson test
• Watson approximation to a χ² (Tiku test)
Slide labels: supremum statistics (unbinned distributions); quadratic statistics (binned/unbinned distributions; unbinned distributions).
With these tests the GoF Statistical Toolkit will be the most complete toolkit for the two-sample problem in physics as well as in the statistics domain.
Is χ² the most powerful algorithm?
In terms of power: χ² < supremum statistics tests < tests containing a weight function
The power of a test is the probability of correctly rejecting the null hypothesis, i.e. of rejecting it when it is false.
• χ² loses information in a test for unbinned distributions by grouping the data into cells
• Kac, Kiefer and Wolfowitz (1955) showed that the Kolmogorov-Smirnov test requires n^(4/5) observations, compared to n observations for χ², to attain the same power
• Cramer-von Mises and Anderson-Darling statistics are expected to be superior to Kolmogorov-Smirnov’s, since they compare the two distributions all along the range of x, rather than looking for a marked difference at one point
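To get a feeling for the n^(4/5) scaling quoted above, a short Python calculation is enough. It takes the statement at face value as an asymptotic order-of-magnitude relation and ignores any constant factors; the function name is hypothetical.

```python
def ks_equivalent_sample_size(n_chi2):
    """Observations needed by Kolmogorov-Smirnov to match the power that
    chi-squared reaches with n_chi2 observations, using the n^(4/5) scaling only."""
    return n_chi2 ** 0.8


for n in (100, 1000, 100000):
    print(n, round(ks_equivalent_sample_size(n)))
```

For 1000 observations the scaling gives about 251, so the advantage of the Kolmogorov-Smirnov test over χ² grows with the sample size.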
Microscopic validation of physics
• Physics models under test:
• Geant4 Standard
• Geant4 Low Energy – Livermore
• Geant4 Low Energy – Penelope
• Reference data:
• NIST ESTAR - ICRU 37
• NIST - XCOM
[Figures: p-value vs. Z with the H0 rejection area marked, for Geant4 Standard, Geant4 LowE EEDL and Geant4 LowE Penelope; p-value stability study]
The three Geant4 models are equivalent.
K. Amako, S. Guatelli, V. Ivanchenko, M. Maire, B. Mascialino, K. Murakami, P. Nieminen, L. Pandola, S. Parlati, A. Pfeiffer, M. G. Pia, M. Piergentili, T. Sasaki, L. Urban, Precision validation of Geant4 electromagnetic physics
Radioprotection applications in manned space missions
• Comparison of inflatable and conventional rigid habitat concepts:
• Effect of different shielding materials
• Effect of shielding thickness
• E.m. and hadronic interactions
[Figure: energy deposit in the phantom by GCR p, compared with the KS test; set-up labels: vacuum, air, GCR, Al structure / inflatable structure + shielding; configurations: 2.15 cm Al, 4 cm Al, inflatable habitat + 5 cm water, inflatable habitat + 10 cm water]
Thanks to Susanna Guatelli
S. Guatelli, B. Mascialino, P. Nieminen, M. G. Pia, Radioprotection for interplanetary manned missions
Test beam at Bessy - Bepi-Colombo mission
X-ray fluorescence spectrum in Iceland basalt (EIN = 6.5 keV)
[Figure: counts vs. energy (keV)]
• Very complex distributions
• χ² not appropriate (< 5 entries in some bins, physical information would be lost if rebinned)
• Anderson-Darling: Ac (95%) = 0.752
Thanks to Alfonso Mantero
A. Mantero, B. Mascialino, P. Nieminen, M. G. Pia, A. Owens, M. Bavdaz, A. Peacock, A library for simulated X-ray emission from planetary surfaces
Medical applications - IMRT
[Figures: % dose vs. distance (mm), compared with the Kolmogorov-Smirnov test]
Thanks to Michela Piergentili
F. Foppiano, B. Mascialino, M. G. Pia, M. Piergentili, Geant4 simulation of an accelerator head for intensity modulated radiotherapy
Conclusions
• This is a new, up-to-date, easy-to-handle and powerful tool for statistical comparison in particle physics.
• It is the first tool supplying such a variety of sophisticated and powerful statistical tests in HEP.
• Released and downloadable from the web.
• AIDA interfaces allow its integration in any other data analysis tool.
Applications in: HEP, astrophysics, medical physics, …