Statistical Toolkit
S. Donadio, B. Mascialino
July 2nd, 2003
Status of algorithms
• Chi2 (binned distributions)
• Chi2 (curves – sets of points)
• Kolmogorov-Smirnov-Goodman
• Kolmogorov-Smirnov
• Cramer-von Mises (binned)
• Cramer-von Mises (unbinned)
• Anderson-Darling (binned)
• Anderson-Darling (unbinned)
• Kuiper
Status of Quality Checkers
• Chi2
• Kolmogorov-Smirnov-Goodman
• Kolmogorov-Smirnov
• Cramer-von Mises
• Anderson-Darling
• Kuiper
Last algorithm (still to be added)
The Lilliefors test is similar to the Kolmogorov-Smirnov test, but it is based on the null hypothesis that the continuous random variable is distributed as a normal N(μ, σ²), where μ and σ² are unknown. In practice, since the parameters are unknown, the researcher must estimate them from the sample itself (x1, x2, ..., xn), which makes it possible to study the standardized sample (z1, z2, ..., zn). The test is performed by comparing the empirical distribution function FO of (z1, z2, ..., zn) with that of the standard normal distribution, Φ(z):
D* = sup |FO(z) – Φ(z)|
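As a minimal sketch (not part of the toolkit), the statistic above can be computed by standardizing the sample with the estimated mean and standard deviation and taking the sup distance between the empirical CDF and Φ; function and variable names here are illustrative.

```python
import numpy as np
from scipy.stats import norm

def lilliefors_statistic(x):
    """Lilliefors test statistic D*: the KS distance between the ECDF of
    the standardized sample and the standard normal CDF, with mu and
    sigma estimated from the sample itself."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Estimate the unknown parameters from the sample.
    mu = x.mean()
    sigma = x.std(ddof=1)
    z = np.sort((x - mu) / sigma)
    cdf = norm.cdf(z)
    # Sup distance: check both one-sided gaps at each jump of the ECDF.
    d_plus = np.max(np.arange(1, n + 1) / n - cdf)
    d_minus = np.max(cdf - np.arange(0, n) / n)
    return max(d_plus, d_minus)

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=200)
d_star = lilliefors_statistic(sample)
```

Note that because μ and σ are estimated from the data, D* must be compared against Lilliefors' critical values, not the standard Kolmogorov-Smirnov ones.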
Lilliefors needs a theoretical function as input
TOOLKIT INPUT:
• binned distributions
• unbinned distributions
• theoretical distributions (test for normality, …)
[Diagram: DISTRIBUTION 1 and DISTRIBUTION 2 feed the TOOLKIT; Lilliefors instead compares one distribution against a THEORETICAL FUNCTION.]
New algorithm: Cramer-von Mises-Tiku
It approximates the Cramer-von Mises test statistic with a χ². It uses the χ² Quality Checker.
Tiku M.L., Chi-squared approximations for the distributions of goodness of fit UN2 and WN2, Biometrika, 52 (1965b), 630.
New algorithm: Kolmogorov-Smirnov (binned)
It allows the calculation of the Kolmogorov-Smirnov test statistic in the case of binned distributions. It uses a different quality checker (see Conover (1971), Gibbons and Chakraborti (1992)). We must still find it!
Treatment of uncertainties
We must decide how to treat errors inside the statistical toolkit. Distributions are entered as a pair of DataPointSets:
• Data
• Weight
The handling of Data and Weight in the computation of the test statistic differs between distributions on the one hand and curves or sets of points on the other.
An example
χ² = Σi (y1i – y2i)² / [(σ1i)² + (σ2i)²]
In the case of two distributions, χ² is computed using only the Weights. In the case of two curves or sets of points, the numerator involves the Data and the denominator uses the Weights. THIS COULD BE MISLEADING!
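A small sketch of the χ² formula above for the curves / sets-of-points case (the function name and call are illustrative, not the toolkit's API):

```python
import numpy as np

def chi2_two_samples(y1, y2, sigma1, sigma2):
    """Chi-squared statistic for two curves or sets of points:
    chi2 = sum_i (y1_i - y2_i)^2 / (sigma1_i^2 + sigma2_i^2)."""
    y1, y2 = np.asarray(y1, float), np.asarray(y2, float)
    s1, s2 = np.asarray(sigma1, float), np.asarray(sigma2, float)
    return np.sum((y1 - y2) ** 2 / (s1 ** 2 + s2 ** 2))

# Point-by-point: (10-11)^2/(1+1) = 0.5 and (12-10)^2/(1+1) = 2.0
chi2 = chi2_two_samples([10.0, 12.0], [11.0, 10.0], [1.0, 1.0], [1.0, 1.0])
```

Here the y values play the role of the "Data" and the σ values the role of the "Weights", which is exactly the naming mismatch the slide warns about.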
Data – Weights – Errors
So, in order to have a coherent language for all the algorithms, we should have:
• Data
• Weights
• Errors
Whenever errors are not necessary for the computation of the test statistic, we could fill them as a null vector.
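A hypothetical sketch of such an input container (the class and attribute names are illustrative only, not the toolkit's actual interface):

```python
import numpy as np

class InputDistribution:
    """Illustrative container: every input carries Data, Weights and
    Errors, so all algorithms share one coherent vocabulary."""
    def __init__(self, data, weights, errors=None):
        self.data = np.asarray(data, float)
        self.weights = np.asarray(weights, float)
        # When errors are not needed for a test statistic,
        # fill them as a null vector, as proposed above.
        if errors is None:
            errors = np.zeros_like(self.data)
        self.errors = np.asarray(errors, float)

d = InputDistribution(data=[1.0, 2.0, 3.0], weights=[5.0, 7.0, 2.0])
```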
Selecting data 1
Elimination of data points if n ≥ 30
CRITERION OF 3 SIGMA: if a point is 3 standard deviations away from the mean of the data points, there is only about a 0.3% probability of obtaining, in a single measurement, a value that far from the mean. We can choose to eliminate this data point.
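A minimal sketch of this criterion (function name and sample data are illustrative); note that with a small sample a single outlier inflates the standard deviation enough that it may never exceed 3σ, which is why the criterion is reserved for larger n:

```python
import numpy as np

def three_sigma_filter(x):
    """Discard points lying more than 3 standard deviations
    from the sample mean."""
    x = np.asarray(x, float)
    mu, sigma = x.mean(), x.std(ddof=1)
    keep = np.abs(x - mu) <= 3.0 * sigma
    return x[keep]

# Illustrative sample: 29 measurements near 10, plus one suspect value.
rng = np.random.default_rng(0)
data = np.append(rng.normal(loc=10.0, scale=0.5, size=29), 25.0)
clean = three_sigma_filter(data)
```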
Selecting data 2
Elimination of data points if n ≤ 10
CHAUVENET'S CRITERION: given n sample observations from a Gaussian distribution, we expect n′ of them to deviate from the mean by z_sus standard deviations or more, where
P(|z| ≥ z_sus) = n′/n, equivalently P(|z| < z_sus) = 1 – n′/n
"If n′ < 0.5, even one observation with this amount of error is unlikely. We can discard a data point if we expect less than half an event to be further from the mean than the suspect data point."
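A minimal sketch of Chauvenet's criterion as stated above (function name and sample values are illustrative): for each point compute the expected count n′ = n · P(|z| ≥ z_i) and flag it for rejection when n′ < 0.5.

```python
import numpy as np
from scipy.stats import norm

def chauvenet_reject(x):
    """Chauvenet's criterion: a point is a rejection candidate when the
    expected number of observations at least as far from the mean,
    n' = n * P(|z| >= z_i), falls below 0.5."""
    x = np.asarray(x, float)
    n = len(x)
    mu, sigma = x.mean(), x.std(ddof=1)
    z = np.abs(x - mu) / sigma
    # Two-sided tail probability times n gives the expected count n'.
    n_expected = n * 2.0 * (1.0 - norm.cdf(z))
    keep = n_expected >= 0.5
    return x[keep], x[~keep]

# Illustrative sample of n = 10 with one suspect point.
data = [10.2, 10.4, 9.8, 10.1, 10.3, 9.9, 9.7, 10.0, 10.2, 15.0]
kept, rejected = chauvenet_reject(data)
```

Unlike the 3-sigma rule, this criterion scales the rejection threshold with n, which is what makes it usable for small samples.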