The Statistical Testing Project

Stefania Donadio and Barbara Mascialino, January 15th, 2003

Aim of the project
This project will provide a new way of analysing physical distributions of real data. It was conceived as a tool for the statistical testing of Geant4: its application areas are physics validation, regression testing and system testing. Its generality may also make it of interest in other experimental contexts. At the moment, the core statistical component is designed to compare two distributions, regardless of their origin.

Distributions
By means of this statistical tool, the user shall be able to compare Geant4 simulation results with:
• equivalent reference distributions,
• experimental measurements,
• data libraries from reference distribution sources,
• functions deriving from theoretical calculations,
• functions deriving from fits, …

Goodness-of-Fit tests
Goodness-of-fit tests are introduced to verify the hypothesis that experimental data come from a random variable whose distribution is known. This problem is important in both theoretical and experimental analysis: the researcher must decide whether the theoretical and experimental distributions follow the same functional law. In other words, the problem is the choice between two alternative hypotheses:
H0: F0(x) = FT(x)
H1: F0(x) ≠ FT(x), F0(x) < FT(x), or F0(x) > FT(x)
In this kind of test, accepting the null hypothesis H0 means that the researcher is able to specify the distribution analysed.
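As an illustration of testing H0 against a known distribution FT, the following is a minimal sketch of a one-sample Kolmogorov-Smirnov statistic, the largest distance between the empirical distribution function of the data and FT. The function name `ks_one_sample` and the `cdf` argument are assumptions made for this example; this is not the code of the statistical package.

```python
def ks_one_sample(sample, cdf):
    """Largest distance between the empirical CDF of `sample` and `cdf`.

    `cdf` is a hypothetical callable returning FT(x) for a number x.
    """
    if not sample:
        raise ValueError("sample must not be empty")
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x,
        # so both sides of the step have to be checked.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d
```

For instance, `ks_one_sample(data, lambda x: x)` compares data on [0, 1] with a uniform distribution. Turning the statistic into a confidence level requires the Kolmogorov distribution, which is left out of this sketch.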

GOF tests inserted in the statistical package
• Pearson's χ² test
• Kolmogorov test
• Kolmogorov-Smirnov test
• Anderson-Darling test (for both continuous and discrete distributions)

Description of tests
Pearson's chi-squared test was introduced to study the goodness of fit of discrete distributions (both quantitative and qualitative). The Kolmogorov-Smirnov test is well suited to verifying the fit of a sample drawn from a continuous random variable. The Anderson-Darling test is designed to be suitable for any data set (Aksenov and Savageau, 2002) with any skewness (symmetric, left-skewed or right-skewed distributions). Moreover, it appears to be sensitive to the fat tails of distributions.
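The two statistics that do not rely on the largest distance alone can be sketched as follows. This is a minimal illustration using the textbook formulas, with hypothetical function names; the Anderson-Darling version shown is the one-sample form against a known continuous distribution, which is an assumption of this example and not necessarily the variant implemented in the package.

```python
import math


def pearson_chi2(observed, expected):
    """Pearson chi-squared statistic for binned counts.

    Every expected count must be strictly positive.
    """
    if len(observed) != len(expected):
        raise ValueError("histograms must have the same number of bins")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def anderson_darling(sample, cdf):
    """One-sample Anderson-Darling statistic A^2 against `cdf`.

    `cdf` is a hypothetical callable; its values at the sample points
    must lie strictly between 0 and 1.
    """
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        f_low = cdf(xs[i - 1])   # F at the i-th smallest point
        f_high = cdf(xs[n - i])  # F at the i-th largest point
        # The logarithms give extra weight to both tails.
        total += (2 * i - 1) * (math.log(f_low) + math.log(1.0 - f_high))
    return -n - total / n
```

The logarithmic terms are what make the Anderson-Darling statistic react strongly to discrepancies in the tails.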

Other tests planned for GOF
The statistical package could be extended with other goodness-of-fit tests, for instance: Lilliefors test, Cramér-von Mises test, Kuiper test, Bayesian methods…

Other methods
The Kolmogorov-Smirnov test can be applied only to continuous distributions, whereas physical distributions are usually binned and therefore not continuous. Following Dagum, these binned distributions could be fitted (possibly with a mixture of more than one fit function). In this way, the Kolmogorov-Smirnov test statistic could be computed between the fitted function and the theoretical distribution, simply by changing the number of degrees of freedom of the test.
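Once a binned distribution has been replaced by a fitted function, the distance between the fitted and the theoretical cumulative distributions can be approximated on a grid. The sketch below assumes that both are available as callables; the names `cdf_distance` and `normal_cdf`, the Gaussian shape and the grid size are all choices made for this example, and the fitting step itself is not shown.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution of a Gaussian, used here as a stand-in fit."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def cdf_distance(cdf_a, cdf_b, lo, hi, steps=1000):
    """Approximate the largest |cdf_a(x) - cdf_b(x)| on a grid over [lo, hi]."""
    d = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        d = max(d, abs(cdf_a(x) - cdf_b(x)))
    return d


# Hypothetical usage: a fitted Gaussian against a theoretical one.
fitted = lambda x: normal_cdf(x, 0.5, 1.0)
theory = lambda x: normal_cdf(x, 0.0, 1.0)
distance = cdf_distance(fitted, theory, -5.0, 5.0)
```

The grid only approximates the true supremum, so the step size should be small compared with the scale on which the two functions vary.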

User requirements
• Comparing distributions
• Converting distributions
• Confidence levels
• Handling distributions
• Treatment of errors
• Plotting

Software Design
• User layer <=> Developer layer
• Based on AIDA interfaces
• It is a general tool with an object-oriented approach

The code
• Chi-squared test => OK
• Anderson-Darling test (discrete distributions)
• Kolmogorov-Smirnov test => OK
• Anderson-Darling test (continuous distributions)

Problems with the existing code
The Chi Squared Quality Checker needs a Gamma function. One was found in the GNU Scientific Library, but it does not work for N > 171. This could be a problem!
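The limit at 171 is consistent with double-precision overflow: Gamma(172) = 171! is larger than the largest representable double. A common workaround, sketched below in Python and not taken from the package, is to work with the logarithm of the Gamma function instead. The function names are hypothetical; in GSL the analogous call would be the log-gamma routine `gsl_sf_lngamma`, and whether it fits the Quality Checker is an assumption to be verified.

```python
import math


def gamma_overflows(x):
    """Return True if the Gamma function overflows a double at x."""
    try:
        math.gamma(x)
    except OverflowError:
        return True
    return False


def chi2_log_pdf(x, dof):
    """Logarithm of the chi-squared density at x > 0.

    Using lgamma avoids evaluating Gamma(dof / 2) directly, so large
    numbers of degrees of freedom do not overflow.
    """
    k = dof / 2.0
    return (k - 1.0) * math.log(x) - x / 2.0 - k * math.log(2.0) - math.lgamma(k)
```

With this formulation, 400 degrees of freedom (which would need Gamma(200)) still give a finite result, and the density itself can be recovered with `math.exp` whenever it is representable.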

Unit tests
Unit tests are to be performed on the statistical package. We would welcome suggestions on reference distributions to use as test cases. Test levels: acceptance test, integration test, system test, unit test. Any suggestions?
