Mobile Computing Group A quick-and-dirty tutorial on the chi2 test for goodness-of-fit testing
Outline • The presentation follows a pyramid scheme, from the base up: • Background concepts • Goodness-of-fit (GoF) • Chi2 tests for GoF
Background • Descriptive vs. inferential statistics • Descriptive : data are used only for descriptive purposes (tables, graphs, measures of variability etc.) • Inferential : data are used to draw inferences, make predictions etc. • Sample vs. population • A sample is drawn from a population that is assumed to have some characteristics • The sample is often used to make inferences about the population (inferential statistics) : • Hypothesis testing • Estimation of population parameters
Background • Statistic vs. parameter • A statistic is computed from a sample. It can be used for both descriptive and inferential purposes • A parameter refers to the whole population. A sample statistic is often used to infer a population parameter • Example : the sample mean may be used to infer the population mean (expected value) • Hypothesis testing • A procedure where sample data are used to evaluate a hypothesis regarding the population • A hypothesis may refer to several things : properties of a single population, the relation between two populations etc. • Two statistical hypotheses are defined: a null hypothesis H0 and an alternative hypothesis H1 • H0 is often a statement of no effect or no difference. It is the hypothesis the researcher seeks to reject
Background • Inferential statistical test • Hypothesis testing is carried out via an inferential statistical test : • Sample data are manipulated to yield a test statistic • The obtained value of the test statistic is evaluated with respect to a sampling distribution, i.e., a theoretical probability distribution for the possible values of the test statistic • The theoretical values of the statistic are usually tabulated and let one assess the statistical significance of the test result • Goodness-of-fit testing is a type of hypothesis testing • Devise inferential statistical tests, apply them to the sample, and infer how well a theoretical distribution matches the population distribution
GoF as hypothesis testing • Hypothesis H0: • The sample is derived from a theoretical distribution F() • The sample data are manipulated to derive a test statistic • In the case of the chi2 statistic this includes aggregation of data into bins and some computations • The statistic, as computed from data, is checked against the sampling distribution • For the chi2 test, the sampling distribution is the chi2 distribution, hence the name
Goodness-of-fit • Statistical tests and statistics : the big picture • EDF-based tests (e.g., KS test, Anderson-Darling test) • Chi2 type tests • Generalized chi2 statistics • Classical chi2 statistics : Pearson chi2 statistic, log-likelihood ratio statistic, modified chi2 statistic • Specialized tests (e.g., Shapiro-Wilk test for normality)
Pearson chi2 statistic • M : number of bins • Oi (Ni) : observed frequency in bin i • n : sample size • Ei (npi) : expected frequency in bin i according to the theoretical distribution F() • If X1, X2, X3, …, Xn is the random sample and F() the theoretical distribution under test, the Pearson chi2 statistic is computed as: $X^2 = \sum_{i=1}^{M} \frac{(O_i - E_i)^2}{E_i} = \sum_{i=1}^{M} \frac{(N_i - n p_i)^2}{n p_i}$
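The Pearson statistic, i.e. the sum over the M bins of (Oi - Ei)^2 / Ei with Ei = n·pi, can be sketched in a few lines of Python. The function name `pearson_chi2` and the coin-toss counts in the usage lines are illustrative assumptions and do not come from the slides.

```python
def pearson_chi2(observed, probs):
    """Pearson chi2 statistic: sum over bins of (O_i - E_i)^2 / E_i.

    observed: observed frequencies O_i, one per bin
    probs:    bin probabilities p_i under the theoretical distribution F()
    """
    if len(observed) != len(probs):
        raise ValueError("observed and probs must have the same number of bins")
    n = sum(observed)                    # sample size
    expected = [n * p for p in probs]    # E_i = n * p_i
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 100 coin tosses, 55 heads and 45 tails, fair coin under H0.
x2 = pearson_chi2([55, 45], [0.5, 0.5])
print(round(x2, 3))
```

Passing bin probabilities rather than expected counts keeps the function independent of the sample size, which is derived from the observed frequencies.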
Interpretation of chi2 statistic • Theory says that the Pearson chi2 statistic follows a chi2 distribution, whose df are • M-1, when the parameters of the fitted distribution are given a priori (case 0 test) • Somewhere between M-1 and M-1-q, when the q parameters of the distribution are estimated from the sample data • Usually, the df for this case are taken to be M-1-q • Having computed the value of the chi2 statistic X2, I check the chi2 distribution with M-1 (M-1-q) df to find • The probability of getting a value equal to or greater than the computed value X2, called the p-value • If p < a, where a is the significance level of my test, the hypothesis is rejected, otherwise it is retained • Standard values for a are 0.1, 0.05, 0.01; the lower a is, the more conservative I am in rejecting the hypothesis H0
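As a rough sketch of the lookup step, the p-value can be computed directly instead of being read from a table. The code below uses only the Python standard library and handles integer df through the standard recurrence for the upper-tail probability. The names `chi2_sf` and `gof_decision`, and the numbers in the usage lines, are assumptions made for illustration. H0 is rejected when the p-value falls below the significance level a.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi2 variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting points: df = 1 (odd case) or df = 2 (even case).
    if df % 2 == 1:
        k, sf = 1, math.erfc(math.sqrt(half))
    else:
        k, sf = 2, math.exp(-half)
    # Recurrence: SF(k + 2) = SF(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    while k < df:
        sf += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return sf


def gof_decision(x2, num_bins, num_estimated_params=0, alpha=0.05):
    """Return (p_value, reject_h0), taking df = M - 1 - q."""
    df = num_bins - 1 - num_estimated_params
    p_value = chi2_sf(x2, df)
    return p_value, p_value < alpha


# Hypothetical case: X2 = 12.3 from M = 8 bins with q = 2 estimated parameters (df = 5).
p_value, reject = gof_decision(12.3, num_bins=8, num_estimated_params=2, alpha=0.05)
print(p_value, reject)
```

Taking df = M-1-q follows the usual convention stated above; with estimated parameters the true df lie somewhere between M-1-q and M-1, so the resulting p-value is only approximate.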
Example • A die is rolled 120 times • 1 comes up 20 times, 2 comes up 14 times, 3 comes up 18 times, 4 comes up 17 times, 5 comes up 22 times and 6 comes up 29 times • The question is: “Is the die biased?” or, better: “Do these data suggest that the die is biased?” • Hypothesis H0 : the die is not biased • Therefore, according to the null hypothesis these numbers should be distributed uniformly • F() : the discrete uniform distribution
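Example – cont. • Computations: • Expected frequency in each bin : Ei = 120/6 = 20 • X2 = [(20-20)^2 + (14-20)^2 + (18-20)^2 + (17-20)^2 + (22-20)^2 + (29-20)^2] / 20 = 134/20 = 6.7 • Interpretation • The distribution of the test statistic has 5 df • The probability of getting a value greater than or equal to 6.7 under a chi2 distribution with 5 df (the p-value) is about 0.25, which is > a for all a in {0.01, 0.05, 0.1} • Therefore the hypothesis that the die is not biased cannot be rejected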
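The die example can also be cross-checked by simulation. The sketch below is a Monte Carlo approximation, not the chi2 table lookup used in the slides: it rolls a fair die 120 times, repeats this many times, and counts how often the simulated statistic is at least as large as the one observed. The function names, the number of simulations and the seed are assumptions chosen for illustration.

```python
import random


def pearson_chi2(observed, expected):
    """Pearson chi2 statistic from observed and expected frequencies."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulated_p_value(observed, num_sims=5000, seed=0):
    """Estimate P(X2 >= observed X2) under H0 (all faces equally likely)."""
    rng = random.Random(seed)
    n = sum(observed)
    faces = len(observed)
    expected = [n / faces] * faces
    x2_obs = pearson_chi2(observed, expected)
    hits = 0
    for _sim in range(num_sims):
        counts = [0] * faces
        for _roll in range(n):
            counts[rng.randrange(faces)] += 1
        # Small tolerance guards against floating-point ties.
        if pearson_chi2(counts, expected) >= x2_obs - 1e-9:
            hits += 1
    return x2_obs, hits / num_sims


x2, p = simulated_p_value([20, 14, 18, 17, 22, 29])
print(f"X2 = {x2:.1f}, simulated p-value = {p:.2f}")
```

The simulated fraction should land near the tabulated value of roughly 0.25, which illustrates what "sampling distribution of the test statistic" means in practice.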
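Interpretation of Pearson chi2 • Graphical illustration (figure: density of the chi2 distribution with 5 df) • At the 10% significance level, I would reject the hypothesis if the computed X2 > 9.24 (10% of the area under the curve lies to the right of 9.24) • Values marked on the axis and the corresponding right-tail probabilities : 6.7 (p-value 0.25), 9.24 (0.1), 11.07 (0.05), 15.09 (0.01)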
Properties of Pearson chi2 statistic • It can be computed for both discrete and continuous variables • This holds for all chi2 statistics. It gives maximum flexibility but fails to make use of all available information for continuous variables • It is perhaps the simplest one from a computational point of view • As with all chi2 statistics, one needs to define the number and borders of the bins • These are generally a function of the sample size and the theoretical distribution under test
Bin selection • How many and which? • Different opinions in the literature, no rigorous proof of optimality • There seems to be convergence on the following aspects • Probability of bins • The bins should be chosen equiprobable with respect to the theoretical distribution under test • Minimum expected frequencies npi : • (Cramer, 46) : npi > 10, for all bins • (Cochran, 54) : npi > 1 for all bins, npi >= 5 for 80% of bins • (Roscoe and Byars, 71)
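The two minimum-expected-frequency rules quoted with explicit thresholds (Cramer and Cochran) are easy to check mechanically. A minimal sketch follows; the function names and the second list of expected frequencies are hypothetical.

```python
def satisfies_cramer(expected):
    """Cramer's rule as quoted above: every expected frequency n*p_i exceeds 10."""
    return len(expected) > 0 and all(e > 10 for e in expected)


def satisfies_cochran(expected):
    """Cochran's rule as quoted above: all n*p_i > 1 and at least 80% of bins have n*p_i >= 5."""
    if not expected:
        return False
    at_least_five = sum(1 for e in expected if e >= 5)
    return all(e > 1 for e in expected) and at_least_five >= 0.8 * len(expected)


# A fair die rolled 120 times: every bin has expected frequency 20.
die_expected = [120 / 6] * 6
print(satisfies_cramer(die_expected), satisfies_cochran(die_expected))

# Hypothetical unequal bins: only 3 of 5 bins reach an expected frequency of 5.
sparse_expected = [12, 8, 6, 3, 1.5]
print(satisfies_cramer(sparse_expected), satisfies_cochran(sparse_expected))
```

When a check fails, the usual remedy is to merge neighbouring bins until the expected frequencies are large enough, at the cost of fewer degrees of freedom.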
Bin selection • Relation of the number of bins M to the sample size n • (Mann and Wald, 42), (Schorr, 74) : for large sample sizes 1.88 n^(2/5) < M < 3.76 n^(2/5) • (Koehler and Larntz, 80) : for small sample sizes M >= 3, n >= 10 and n^2/M >= 10 • (Roscoe and Byars, 71) • Equi-probable bins : n > M when a = 0.01 and a = 0.05 • Non-equiprobable bins : n > 2M (a = 0.05) and n > 4M (a = 0.01)
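These rules of thumb relating the number of bins to the sample size translate directly into small helper functions. The sketch below is one possible encoding; the function names are assumptions, and the Roscoe and Byars helper covers only the two significance levels listed on the slide.

```python
def mann_wald_bin_range(n):
    """Bounds (low, high) on the number of bins M: 1.88 n^(2/5) < M < 3.76 n^(2/5)."""
    scale = n ** 0.4
    return 1.88 * scale, 3.76 * scale


def koehler_larntz_ok(n, m):
    """Small-sample conditions: M >= 3, n >= 10 and n^2 / M >= 10."""
    return m >= 3 and n >= 10 and n * n / m >= 10


def roscoe_byars_ok(n, m, alpha, equiprobable=True):
    """Sample size n versus number of bins M at significance level 0.05 or 0.01."""
    if alpha not in (0.05, 0.01):
        raise ValueError("rule is stated only for alpha = 0.05 and alpha = 0.01")
    if equiprobable:
        return n > m
    factor = 2 if alpha == 0.05 else 4
    return n > factor * m


low, high = mann_wald_bin_range(1000)
print(f"n = 1000: choose M between {low:.1f} and {high:.1f}")
print(koehler_larntz_ok(n=20, m=5), roscoe_byars_ok(n=20, m=8, alpha=0.01, equiprobable=False))
```

For n = 1000 the Mann and Wald bounds come out at roughly 30 to 60 bins, far more than the handful of bins often used by default.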
Bin selection • Bins vs. sample size according to Mann and Wald
Bin selection : cont. vs. discrete • Continuous distribution (figure: CDF with the probability axis split into steps of 0.1) : equi-probable bins are easy to select • Discrete distribution (figure: CDF over the values 1 to 7) : less straightforward to define equi-probable bins
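For a continuous distribution, equi-probable bin borders follow from the inverse CDF: the i-th border is the point where F() reaches i/M. A minimal sketch is given below, using the exponential distribution purely as an assumed example since its inverse CDF has a closed form; the function names and the sample values are hypothetical.

```python
import bisect
import math


def equiprobable_edges(inverse_cdf, num_bins):
    """Interior bin borders so that each bin has probability 1/M under F()."""
    return [inverse_cdf(i / num_bins) for i in range(1, num_bins)]


def exponential_inverse_cdf(rate):
    """Inverse CDF of the exponential distribution: F^-1(u) = -ln(1 - u) / rate."""
    return lambda u: -math.log(1.0 - u) / rate


def bin_counts(sample, edges):
    """Observed frequency per bin; bin i covers [edges[i-1], edges[i])."""
    counts = [0] * (len(edges) + 1)
    for x in sample:
        counts[bisect.bisect_right(edges, x)] += 1
    return counts


edges = equiprobable_edges(exponential_inverse_cdf(1.0), 4)
counts = bin_counts([0.1, 0.5, 1.0, 2.0, 0.3], edges)
print([round(e, 4) for e in edges], counts)
```

For a discrete distribution the CDF jumps, so i/M generally falls inside a jump and the bins can only be made approximately equi-probable by grouping values.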
References Textbooks • D.J. Sheskin, Handbook of parametric and nonparametric statistical procedures • Introduction (descriptive vs. inferential statistics, hypothesis testing, concepts and terminology) • Test 8 (chap. 8) – The Chi-Square Goodness-of-Fit Test (high-level description with examples and discussion of several aspects) • R. D'Agostino, M. Stephens, Goodness-of-fit techniques • Chapter 3 – Tests of Chi-square type • Reviews the theoretical background and looks more generally at chi2 tests, not only the Pearson test
References Papers • S. Horn, Goodness-of-Fit tests for discrete data: A review and an Application to a Health Impairment scale • Good discussion of the properties and pros/cons of most goodness-of-fit tests for discrete data • accessible, tutorial-like