1 / 29

Statistical Toolkit Power of Goodness-of-Fit tests

Fluorescence spectrum of Icelandic Basalt 8.3 keV beam. Counts. Energy (keV). Statistical Toolkit Power of Goodness-of-Fit tests. B. Mascialino 1 , A. Pfeiffer 2 , M.G. Pia 1 , A. Ribon 2 , P. Viarengo 3 1 INFN Genova, Italy 2 CERN, Geneva, Switzerland

parmstrong
Download Presentation

Statistical Toolkit Power of Goodness-of-Fit tests

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Fluorescence spectrum of Icelandic Basalt 8.3 keV beam Counts Energy (keV) Statistical ToolkitPower of Goodness-of-Fit tests B. Mascialino1, A. Pfeiffer2, M.G. Pia1, A. Ribon2, P. Viarengo3 1INFN Genova, Italy 2CERN, Geneva, Switzerland 3IST – National Institute for Cancer Research, Genova, Italy CHEP 2006 Mumbai, 13-17 February 2006

  2. Historical background… Some use cases Validation of Geant4 physics models through comparison of simulation vs. experimental data or reference databases • Regression testing • Throughout the software life-cycle • Online DAQ • Monitoring detector behaviour w.r.t. a reference • Simulation validation • Comparison with experimental data • Reconstruction • Comparison of reconstructed vs. expected distributions • Physics analysis • Comparisons of experimental distributions (ATLAS vs. CMS Higgs?) • Comparison with theoretical distributions (data vs. Standard Model) The test statistics computation concerns the agreement between the two samples’ empirical distribution functions

  3. Releases are publicly downloadable from the web • code, documentation etc. • Releases are also distributed with LCG Mathematical Libraries • Also ported to Java, distributed with JAS G.A.P Cirrone, S. Donadio, S. Guatelli, A. Mantero, B. Mascialino, S. Parlati, M.G. Pia, A. Pfeiffer, A. Ribon, P. Viarengo “A Goodness-of-Fit Statistical Toolkit” IEEE- Transactions on Nuclear Science (2004), 51 (5): 2056-2063.

  4. Vision of the project • Basicvision • General purpose tool • Toolkit approach (choice open to users) • Open source product • Independent from specific analysis tools • Easily usable in analysis and other tools • Rigorous software process Clearly define scope, objectives Software quality Flexible, extensible, maintainable system Build on a solid architecture

  5. GoF algorithms (latest public release) • Algorithms for binned distributions • Anderson-Darling test • Chi-squared test • Fisz-Cramer-von Mises test • Algorithms for unbinned distributions • Anderson-Darling test • Cramer-von Mises test • Goodman test(Kolmogorov-Smirnov test in chi-squared approximation) • Kolmogorov-Smirnov test • Kuiper test

  6. Recent extensions: algorithms Improved tests • Fisz-Cramer-von Mises test and Anderson-Darling test • exact asymptotic distribution (earlier: critical values) • Tiku test • Cramer-von Mises test in a chi-squared approximation • It is the most complete software for the comparison of two distributions, even among commercial/professional statistics tools • goal: provide all 2-sample GoF algorithms existing in statistics literature • Publication in preparation to describe the new algorithms • Software release: March 2006 Weighted Kolmogorov-Smirnov, weighted Cramer-von Mises various weighting functions available in literature Watson test can be applied in case of cyclic observations, like the Kuiper test New tests In preparation Girone test

  7. User Layer • Simple user layer • Shields the user from the complexity of the underlying algorithms and design • Only deal with the user’s analysis objects and choice of comparison algorithm • First release: user layer for AIDA analysis objects • LCG Architecture Blueprint, Geant4 requirement • July 2005: added user layer for ROOT histograms • in response to user requirements • Other user layer implementations foreseen • easy to add • sound architecture decouples the mathematical component and the user’s representation of analysis objects

  8. Power of GoF tests • Do we really need such a wide collection of GoF tests? Why? • Which is the most appropriate test to compare two distributions? • How “good” is a test at recognizing real equivalent distributions and rejecting fake ones? Which test to use? • No comprehensive study of the relative power of GoF tests exists in literature • novel research in statistics (not only in physics data analysis!) • Systematic study of allexisting GoF tests in progress • made possible by the extensive collection of tests in the Statistical Toolkit

  9. Parent distribution 1 Parent distribution 2 GoF test Sample 1 n Sample 2 m Confidence Level = 0.05 # pseudoexperiments with p-value < (1-CL) Power = # pseudoexperiments Method for the evaluation of power Pseudoexperiment: a random drawing of two samples from two parent distributions N=1000 Monte Carlo replicas For each test, the p-value computed by the GoF Toolkit derives from the analytical calculation of the asymptotic distribution, often depending on the samples sizes

  10. Gaussian Uniform Exponential Double exponential Cauchy Contaminated Normal Distribution 1 Contaminated Normal Distribution 2 Parent distributions Also Breit-Wigner, other distributions being considered

  11. Characterization of distributions Skewness Tailweight

  12. General alternativeCompare different distributions Unbinned distributions Parent1 ≠ Parent2

  13. Medium tailed vs Medium tailed Symmetric vs Simmetric Short tailed vs Medium tailed Empirical power (%) FLAT vs EXPONENTIAL DOUBLE EXPONENTIAL vs EXPONENTIAL Empirical power (%) Symmetric vs skewed Sample size CvM W KS AD K W Sample size AD No clear winner K CN1 vs CN2 DOUBLE EXPONENTIAL vs CN1 Symmetric vs Symmetric Empirical power (%) Empirical power (%) Symmetric vs Skewed Medium tailed vs Medium tailed Medium tailed vs Medium tailed Sample size Sample size The power increases as a function of the sample size

  14. Symmetric vs Symmetric Medium tailed vs Medium tailed Symmetric vs Symmetric Medium tailed vs Long tailed Simmetric vs Simmetric GAUSSIAN vs DOUBLE EXPONENTIAL GAUSSIAN vs DOUBLE EXPONENTIAL GAUSSIAN vs CN1 Medium tailed vs Long tailed Empirical power (%) Very similar distributions Empirical power (%) Empirical power (%) CvM W KS KS CvM CvM AD K Sample size Long tailed vs Medium tailed Sample size Samples size Symmetric vs Asymmetric GAUSSIAN vs CN2 Empirical power (%) CAUCHY vs EXPONENTIAL Symmetric vs skewed Medium tailed vs Medium tailed Empirical power (%) Sample size AD CvM AD Sample size The power increases as a function of the sample size

  15. Samples size = 5 Sample size = 5 Sample size = 15 Samples size = 15 EXPONENTIAL vs OTHER DISTRIBUTIONS EXPONENTIAL vs OTHER DISTRIBUTIONS Empirical power (%) Empirical power (%) Empirical power (%) Empirical power (%) CvM W KS AD EXPONENTIAL vs OTHER DISTRIBUTIONS EXPONENTIAL vs OTHER DISTRIBUTIONS K Tailweight 2ND distribution Tailweight 2ND distribution Tailweight 2ND distribution Tailweight 2ND distribution Empirical power (%) Empirical power (%) Empirical power (%) Empirical power (%) Tailweight 2ND distribution Tailweight 2ND distribution Tailweight 2ND distribution Tailweight 2ND distribution The power varies as a function of the parent distributions’ characteristics Distribution1 asymmetric Samples size = 15 Samples size = 5 FLAT vs OTHER DISTRIBUTIONS Distribution1 symmetric FLAT vs OTHER DISTRIBUTIONS FLAT vs OTHER DISTRIBUTIONS FLAT vs OTHER DISTRIBUTIONS

  16. Sample size = 5 Sample size = 15 EXPONENTIAL vs OTHER DISTRIBUTIONS Empirical power (%) Empirical power (%) CvM W KS AD EXPONENTIAL vs OTHER DISTRIBUTIONS K Tailweight 2ND distribution Tailweight 2ND distribution Empirical power (%) Empirical power (%) Tailweight 2ND distribution Tailweight 2ND distribution The power varies as a function of parent distributions’ characteristics Distribution1 asymmetric Sample size = 15 Sample size = 5 FLAT vs OTHER DISTRIBUTIONS Distribution1 symmetric FLAT vs OTHER DISTRIBUTIONS

  17. Comparative evaluation of tests Preliminary Tailweight Skewness

  18. Location-scale alternativeSame distribution, shifted or scaled Parent1(x) = Parent2 ((x-θ)/τ)

  19. θ =0.5, τ = 0.5 θ =0.5, τ = 1.5 EXPONENTIAL EXPONENTIAL Empirical power (%) Empirical power (%) CvM W KS Sample size Sample size AD K θ =1.0, τ = 0.5 θ =1.0, τ = 1.5 EXPONENTIAL EXPONENTIAL Empirical power (%) Empirical power (%) Sample size Sample size Power increases as a function of sample size CvM KS K W CvM No clear winner KS

  20. CvM W KS AD K Power increases as a function of sample size θ =0.5, τ = 0.5 θ =0.5, τ = 1.5 CN2 CN2 W K Empirical power (%) Empirical power (%) Sample size Sample size θ =1.0, τ = 1.5 θ =1.0, τ = 0.5 CN2 CN2 No clear winner Empirical power (%) Empirical power (%) Sample size Sample size

  21. θ =0.5, τ = 0.5 θ =0.5, τ = 1.5 N=10 N=10 Empirical power (%) Empirical power (%) Tailweight CvM W KS Tailweight AD K θ =1.0, τ = 1.5 θ =1.0, τ = 0.5 N=10 N=10 Empirical power (%) Empirical power (%) Tailweight Tailweight Power decreases as a function of tailweight No clear winner No clear winner

  22. Binned distributions General alternativeCompare different distributions Parent1 ≠ Parent2

  23. Chi-squared test: POWER % Samples size = 500 Number of bins = 20

  24. Preliminary results • No clear winner for all the considered distributions in general • the performance of a test depends on its intrinsic features as well as on the features of the distributions to be compared • Practical recommendations • first classify the type of the distributions in terms of skewness and tailweight • choose the most appropriate test given the type of distributions • Systematic study of the power in progress • for both binned and unbinned distributions • Topic still subject to research activity in the domain of statistics • Publication in preparation

  25. CvM CvM CvM W W W KS KS KS AD AD AD K K K Surprise… Inefficiency Flat Inefficiency Gaussian Inefficiency Exponential General alternative, same distributions CL = 95%, expect 5% inefficiency Anderson-Darling exhibits an unexpected inefficiency at low numerosity Not documented anywhere in literature! Limitation of applicability?

  26. Outlook • 1-sample GoF tests (comparison w.r.t. a function) • Comparison of two/multi-dimensional distributions • Systematic study of the power of GoF tests • Goal to provide an extensive set of algorithms so far published in statistics literature, with a critical evaluation of their relative strengths and applicability • Treatment of errors, filtering • New release coming soon • New papers in preparation • Other components beyond GoF? Suggestions are welcome…

  27. Conclusions • A novel, complete software toolkit for statistical analysis is being developed • rich set of algorithms • sound architectural design • rigorous software process • A systematic study of the power of GoF tests is in progress • unexplored area of research • Application in various domains • Geant4, HEP, space science, medicine… • Feedback and suggestions are very much appreciated • The project is open to developers interested in statistical methods

  28. IEEE Transactions on Nuclear Sciencehttp://ieeexplore.ieee.org/xpl/RecentIssue.jsp?puNumber=23 • Prime journal on technology in particle/nuclear physics • Review process reorganized about one year ago • Associate Editor dedicated to computing papers • Various papers associated to CHEP 2004 published on IEEE TNS Papers associated to CHEP 2006 are welcome Manuscript submission:http://tns-ieee.manuscriptcentral.com/ Papers submitted for publication will be subject to the regular review process Publications on refereed journals are beneficial not only to authors, but to the whole community of computing-oriented physicists Our “hardware colleagues” have better established publication habits… Further info: Maria.Grazia.Pia@cern.ch

More Related