Week 10 Nov 3-7 Two Mini-Lectures QMM 510 Fall 2014
Chapter Contents • 15.1 Chi-Square Test for Independence • 15.2 Chi-Square Tests for Goodness-of-Fit • 15.3 Uniform Goodness-of-Fit Test • 15.4 Poisson Goodness-of-Fit Test • 15.5 Normal Chi-Square Goodness-of-Fit Test • 15.6 ECDF Tests (Optional)
Chi-Square Tests ML 10.1 Chapter 15 So many topics, so little time …
Chi-Square Test for Independence Chapter 15 Contingency Tables • A contingency table is a cross-tabulation of n paired observations into categories. • Each cell shows the count of observations that fall into the category defined by its row (r) and column (c) heading.
Chi-Square Test for Independence Chapter 15 Contingency Tables • For example:
Chi-Square Test for Independence Chapter 15 Chi-Square Test • In a test of independence for an $r \times c$ contingency table, the hypotheses are $H_0$: Variable A is independent of variable B; $H_1$: Variable A is not independent of variable B. • Use the chi-square test for independence to test these hypotheses. • This nonparametric test is based on frequencies. • The n data pairs are classified into c columns and r rows, and then each observed frequency $f_{jk}$ is compared with the corresponding expected frequency $e_{jk}$.
Chi-Square Test for Independence Chapter 15 Chi-Square Distribution • The critical value comes from the chi-square probability distribution with d.f. degrees of freedom: d.f. = (r – 1)(c – 1), where r = number of rows in the table and c = number of columns in the table. • Appendix E contains critical values for right-tail areas of the chi-square distribution, or use Excel's =CHISQ.INV.RT(α, d.f.). (By contrast, =CHISQ.DIST.RT(x, d.f.) returns the right-tail area, i.e., the p-value, for a test statistic x.) • The mean of a chi-square distribution is d.f., and its variance is 2 d.f.
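As a cross-check on Appendix E or Excel, here is a minimal Python sketch (standard library only) that computes the right-tail area and the right-tail critical value. The helper names `chi2_sf` and `chi2_critical` are our own, not from the slides; the right-tail area is computed from the regularized incomplete gamma function, and the critical value is found by bisection, which is simply our choice of method.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Right-tail area P(X > x) for a chi-square variable with df degrees of freedom.

    Computes Q(df/2, x/2), the regularized upper incomplete gamma function,
    starting from Q(1, y) = exp(-y) (even df) or Q(1/2, y) = erfc(sqrt(y)) (odd df)
    and stepping up with Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1).
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        s, q = 1.0, math.exp(-y)
    else:
        s, q = 0.5, math.erfc(math.sqrt(y))
    while s < df / 2.0:
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
        s += 1.0
    return q


def chi2_critical(alpha: float, df: int) -> float:
    """Right-tail critical value: the x with P(X > x) = alpha (same idea as =CHISQ.INV.RT)."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains the critical value
        hi *= 2.0
    for _ in range(100):  # bisection: chi2_sf decreases as x increases
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


r, c = 3, 4                  # a 3 x 4 contingency table
df = (r - 1) * (c - 1)       # d.f. = (r - 1)(c - 1) = 6
print(df, round(chi2_critical(0.05, df), 2))
```

For a 3 × 4 table this prints 6 and 12.59, the value quoted for d.f. = 6 and α = .05.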
Chi-Square Test for Independence Chapter 15 Chi-Square Distribution Consider the shape of the chi-square distribution:
Chi-Square Test for Independence Chapter 15 Expected Frequencies • Assuming that $H_0$ is true, the expected frequency in row j and column k is $e_{jk} = R_j C_k / n$, where $R_j$ = total for row j (j = 1, 2, …, r), $C_k$ = total for column k (k = 1, 2, …, c), and n = sample size.
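The expected-frequency formula translates directly into code. A minimal Python sketch; the function name and the 2 × 3 table of counts are hypothetical, made up for illustration.

```python
def expected_frequencies(observed):
    """Return e_jk = R_j * C_k / n for every cell of an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]        # R_j
    col_totals = [sum(col) for col in zip(*observed)]  # C_k
    n = sum(row_totals)                                # sample size
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


# Hypothetical 2 x 3 table of counts (invented, not from the slides)
observed = [[10, 20, 30],
            [20, 20, 20]]
expected = expected_frequencies(observed)
print(expected)
```

Here R = (60, 60), C = (30, 40, 50), and n = 120, so both rows of expected frequencies are [15.0, 20.0, 25.0]. The expected table keeps the same row and column totals as the observed table.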
Chi-Square Test for Independence Chapter 15 Steps in Testing the Hypotheses • Step 1: State the Hypotheses • $H_0$: Variable A is independent of variable B • $H_1$: Variable A is not independent of variable B • Step 2: Specify the Decision Rule • Calculate d.f. = (r – 1)(c – 1). • For a given α, look up the right-tail critical value $\chi^2_R$ from Appendix E or by using Excel's =CHISQ.INV.RT(α, d.f.). • Reject $H_0$ if the test statistic exceeds $\chi^2_R$.
Chi-Square Test for Independence Chapter 15 Steps in Testing the Hypotheses • For example, for d.f. = 6 and α = .05, $\chi^2_{.05} = 12.59$.
Chi-Square Test for Independence Chapter 15 Steps in Testing the Hypotheses • Here is the rejection region.
Chi-Square Test for Independence Chapter 15 Steps in Testing the Hypotheses • Step 3: Calculate the Expected Frequencies $e_{jk} = R_j C_k / n$ • For example,
Chi-Square Test for Independence Chapter 15 Steps in Testing the Hypotheses • Step 4: Calculate the Test Statistic • The chi-square test statistic is $\chi^2_{calc} = \sum_{j=1}^{r} \sum_{k=1}^{c} \frac{(f_{jk} - e_{jk})^2}{e_{jk}}$ • Step 5: Make the Decision • Reject $H_0$ if $\chi^2_{calc} > \chi^2_R$ or if the p-value ≤ α.
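Steps 3 to 5 fit in one short Python sketch. The counts are invented for illustration, `chi2_sf` and `independence_test` are our own hypothetical names, and the p-value uses a standard-library computation of the chi-square right-tail area; this is not how MegaStat or Excel compute it internally.

```python
import math


def chi2_sf(x, df):
    """Right-tail chi-square area via the regularized upper incomplete gamma Q(df/2, x/2)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    s, q = (1.0, math.exp(-y)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while s < df / 2.0:  # Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1)
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
        s += 1.0
    return q


def independence_test(observed, alpha=0.05):
    """Return (chi-square statistic, d.f., p-value, reject H0?) for an r x c table."""
    rows = [sum(row) for row in observed]
    cols = [sum(col) for col in zip(*observed)]
    n = sum(rows)
    stat = 0.0
    for j, rt in enumerate(rows):
        for k, ct in enumerate(cols):
            e = rt * ct / n                          # Step 3: e_jk = R_j C_k / n
            stat += (observed[j][k] - e) ** 2 / e    # Step 4: add (f_jk - e_jk)^2 / e_jk
    df = (len(rows) - 1) * (len(cols) - 1)
    p = chi2_sf(stat, df)
    return stat, df, p, p <= alpha                   # Step 5: reject if p-value <= alpha


observed = [[10, 20, 30],   # hypothetical counts
            [20, 20, 20]]
stat, df, p, reject = independence_test(observed)
print(round(stat, 3), df, round(p, 4), reject)
```

With these invented counts the statistic is 16/3 ≈ 5.333 on d.f. = 2, giving p ≈ 0.0695, so $H_0$ is not rejected at α = .05.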
Chi-Square Test for Independence Chapter 15 Example: MegaStat • All cells have $e_{jk} \ge 5$, so Cochran's Rule is met. • Caution: Don't highlight the row or column totals when selecting the data. • The p-value = 0.2154 is not small enough to reject the hypothesis of independence at α = .05.
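Chi-Square Test for Independence Chapter 15 Test of Two Proportions • For a 2 × 2 contingency table, the chi-square test is equivalent to a two-tailed z test for two proportions: the chi-square statistic equals $z^2$, so both tests give the same p-value. • The hypotheses are $H_0: \pi_1 = \pi_2$ versus $H_1: \pi_1 \ne \pi_2$. Figure 14.6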
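A quick numerical check of the 2 × 2 equivalence, as a minimal Python sketch: the Pearson chi-square statistic (no continuity correction) equals the square of the pooled two-proportion z statistic. The counts are hypothetical, and we assume the rows are the two samples with columns for successes and failures.

```python
import math


def chi_square_2x2(x1, n1, x2, n2):
    """Pearson chi-square (no continuity correction); rows = samples, columns = success/failure."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    n = n1 + n2
    col_totals = [x1 + x2, n - x1 - x2]
    stat = 0.0
    for row, row_total in zip(observed, (n1, n2)):
        for f, col_total in zip(row, col_totals):
            e = row_total * col_total / n
            stat += (f - e) ** 2 / e
    return stat


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: pi1 = pi2."""
    p1, p2 = x1 / n1, x2 / n2
    pc = (x1 + x2) / (n1 + n2)  # pooled proportion
    return (p1 - p2) / math.sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))


chi = chi_square_2x2(30, 100, 45, 100)  # hypothetical: 30 of 100 versus 45 of 100
z = two_proportion_z(30, 100, 45, 100)
print(round(chi, 4), round(z ** 2, 4))
```

Both values are 4.8 here, so the chi-square test with 1 d.f. and the two-tailed z test reach the same decision.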
Chi-Square Test for Independence Chapter 15 Small Expected Frequencies • The chi-square test is unreliable if the expected frequencies are too small. • Rules of thumb: • Cochran's Rule requires that $e_{jk} \ge 5$ for all cells. • A more relaxed rule allows up to 20% of the cells to have $e_{jk} < 5$. • Most agree that a chi-square test is infeasible if $e_{jk} < 1$ in any cell. • If this happens, try combining adjacent rows or columns to enlarge the expected frequencies.
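The rules of thumb are easy to automate. A minimal Python sketch with hypothetical helper names and invented counts; which rows to merge is a judgment call, and merging the sparse middle row into the last row here is only for illustration.

```python
def expected_frequencies(observed):
    """e_jk = R_j * C_k / n for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def check_small_expected(expected):
    """Apply the rules of thumb to a table of expected frequencies."""
    cells = [e for row in expected for e in row]
    below_5 = sum(1 for e in cells if e < 5)
    return {
        "cochran_ok": below_5 == 0,                # every cell has e_jk >= 5
        "share_below_5": below_5 / len(cells),     # relaxed rule: at most 0.20
        "any_below_1": any(e < 1 for e in cells),  # test considered infeasible
    }


def combine_adjacent_rows(observed, i):
    """Merge row i with row i + 1 by adding their counts."""
    merged = [a + b for a, b in zip(observed[i], observed[i + 1])]
    return observed[:i] + [merged] + observed[i + 2:]


observed = [[8, 12], [1, 3], [9, 7]]  # hypothetical counts
before = check_small_expected(expected_frequencies(observed))
combined = combine_adjacent_rows(observed, 1)
after = check_small_expected(expected_frequencies(combined))
print(before)
print(combined, after)
```

Before merging, two of the six cells have $e_{jk} < 5$; after merging rows 2 and 3, the expected frequencies are 9 and 11 in each row, so Cochran's Rule is met.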
Chi-Square Test for Independence Chapter 15 Cross-Tabulating Raw Data • Chi-square tests for independence can also be used to analyze quantitative variables by coding them into categories. • For example, the variables Infant Deaths per 1,000 and Doctors per 100,000 can each be coded into various categories:
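Coding numerical data into categories and cross-tabulating them takes only a few lines. A minimal Python sketch; the (x, y) pairs, the cut points, and the Low/Medium/High labels are hypothetical stand-ins, not the infant-death or doctor data. The resulting table can then go through the usual expected-frequency and chi-square calculations.

```python
from bisect import bisect_right

LABELS = ["Low", "Medium", "High"]  # hypothetical class labels


def crosstab(pairs, x_cuts, y_cuts):
    """Code each (x, y) pair into classes and count them in a table (rows = x, columns = y).

    A value v falls in class bisect_right(cuts, v), i.e. the number of cut points <= v.
    """
    table = [[0] * (len(y_cuts) + 1) for _ in range(len(x_cuts) + 1)]
    for x, y in pairs:
        table[bisect_right(x_cuts, x)][bisect_right(y_cuts, y)] += 1
    return table


# Hypothetical (x, y) pairs and cut points, invented for illustration
pairs = [(3.1, 250), (7.8, 120), (12.5, 60), (4.0, 310), (9.9, 90), (15.2, 40)]
table = crosstab(pairs, x_cuts=[5, 10], y_cuts=[100, 200])
for label, row in zip(LABELS, table):
    print(label, row)
```

Each printed row is one x category; the columns are the y categories in the same Low, Medium, High order.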
Chi-Square Test for Independence Chapter 15 Why Do a Chi-Square Test on Numerical Data? • The researcher may believe there’s a relationship between X and Y, but doesn’t want to use regression. • There are outliers or anomalies that prevent us from assuming that the data came from a normal population. • The researcher has numerical data for one variable but not the other.
Chi-Square Test for Independence Chapter 15 3-Way Tables and Higher • More than two variables can be compared using contingency tables. • However, it is difficult to visualize a higher-order table. • For example, you could visualize a cube as a stack of tiled 2-way contingency tables. • Major computer packages permit three-way tables.
Chi-Square Tests for Goodness-of-Fit ML 10.2 Chapter 15 Purpose of the Test • The goodness-of-fit (GOF) test helps you decide whether your sample resembles a particular kind of population. • The chi-square test is versatile and easy to understand. Hypotheses for GOF Tests • The hypotheses are $H_0$: The population follows a _____ distribution; $H_1$: The population does not follow a _____ distribution. • The blank may contain the name of any theoretical distribution (e.g., uniform, Poisson, normal).
Chi-Square Tests for Goodness-of-Fit Chapter 15 Test Statistic and Degrees of Freedom for GOF • Assuming n observations, the observations are grouped into c classes, and then the chi-square test statistic is found using $\chi^2_{calc} = \sum_{j=1}^{c} \frac{(f_j - e_j)^2}{e_j}$, where $f_j$ = the observed frequency of observations in class j and $e_j$ = the expected frequency in class j if the sample came from the hypothesized population.
Chi-Square Tests for Goodness-of-Fit Chapter 15 Test Statistic and Degrees of Freedom for GOF Tests • If the proposed distribution gives a good fit to the sample, the test statistic will be near zero. • The test statistic follows the chi-square distribution with degrees of freedom d.f. = c – m – 1, where c is the number of classes used in the test and m is the number of parameters estimated.
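A minimal Python sketch of the general GOF statistic and its degrees of freedom. The helper names and counts are hypothetical, and the uniform example (m = 0, since nothing is estimated) is only one possible hypothesized distribution; the p-value comes from a standard-library chi-square right-tail computation.

```python
import math


def chi2_sf(x, df):
    """Right-tail chi-square area via the regularized upper incomplete gamma Q(df/2, x/2)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    s, q = (1.0, math.exp(-y)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while s < df / 2.0:  # Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1)
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
        s += 1.0
    return q


def gof_test(observed, expected, m=0):
    """Chi-square GOF statistic and p-value with d.f. = c - m - 1 (m = parameters estimated)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of classes")
    stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    df = len(observed) - m - 1
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts in c = 4 classes, tested against a uniform population
observed = [18, 22, 30, 30]
n = sum(observed)
expected = [n / len(observed)] * len(observed)  # e_j = n / c under a uniform H0
stat, df, p = gof_test(observed, expected, m=0)
print(round(stat, 2), df, round(p, 3))
```

Here the statistic is 4.32 on d.f. = 3, and the p-value falls between 0.20 and 0.25, so the uniform hypothesis would not be rejected at α = .05.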
Normal Chi-Square GOF Test Chapter 15 • Many statistical tests assume a normal population, so this is the most common GOF test. • Two parameters, the mean μ and the standard deviation σ, fully describe a normal distribution. • Unless μ and σ are known a priori, they must be estimated from a sample in order to perform a GOF test for normality. Is the Sample from a Normal Population?
Normal Chi-Square GOF Test Chapter 15 Method 1: Standardize the Data • Transform the sample observations $x_1, x_2, \ldots, x_n$ into standardized z-values $z_i = (x_i - \bar{x})/s$. • Count the sample observations within each interval on the z-scale and compare them with the expected normal frequencies $e_j$. Problem: Frequencies will be small in the end bins yet large in the middle bins (this may violate Cochran's Rule and seems inefficient).
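A minimal Python sketch of Method 1, assuming μ and σ are estimated by the sample mean and sample standard deviation. The data values and the z-interval edges are hypothetical. With n = 20, each outer interval has an expected count of about 20 × 0.0228 ≈ 0.46, which illustrates the problem noted above.

```python
from statistics import NormalDist, mean, stdev


def standardize(xs):
    """z_i = (x_i - xbar) / s using the sample mean and sample standard deviation."""
    xbar, s = mean(xs), stdev(xs)
    return [(x - xbar) / s for x in xs]


def method1_frequencies(xs, edges):
    """Observed counts and expected normal counts for z-intervals [edges[i], edges[i+1])."""
    zs = standardize(xs)
    bins = list(zip(edges, edges[1:]))
    observed = [sum(1 for z in zs if lo <= z < hi) for lo, hi in bins]
    std = NormalDist()  # standard normal
    expected = [len(xs) * (std.cdf(hi) - std.cdf(lo)) for lo, hi in bins]
    return observed, expected


inf = float("inf")
edges = [-inf, -2, -1, 0, 1, 2, inf]  # hypothetical z-scale intervals
xs = [52, 47, 61, 55, 49, 58, 44, 53, 50, 57,
      46, 62, 51, 48, 54, 59, 43, 56, 50, 45]  # hypothetical sample
observed, expected = method1_frequencies(xs, edges)
print(observed)
print([round(e, 2) for e in expected])
```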
Normal Chi-Square GOF Test Chapter 15 Method 2: Equal Bin Widths • Step 1: Divide the exact data range into c groups of equal width, and count the sample observations in each bin to get the observed bin frequencies $f_j$. • Step 2: Convert the bin limits into standardized z-values using $z_j = (a_j - \bar{x})/s$, where $a_j$ is a bin limit. • Step 3: Find the normal area within each bin, assuming a normal distribution. • Step 4: Find the expected frequencies $e_j$ by multiplying each normal area by the sample size n. Problem: Frequencies will be small in the end bins yet large in the middle bins (this may violate Cochran's Rule and seems inefficient).
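Steps 1 to 4 of Method 2, as a minimal Python sketch with hypothetical data and a hypothetical function name. One assumption is ours, not the slide's: when computing normal areas, the first and last bins are treated as open-ended so that the expected frequencies sum to n, as the chi-square comparison requires.

```python
from statistics import NormalDist, mean, stdev


def method2_frequencies(xs, c):
    """Equal-width bins over the data range, with expected counts from a fitted normal.

    Assumption: the outer bins are treated as open-ended when computing normal areas.
    """
    n = len(xs)
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / c
    limits = [lo + i * width for i in range(c + 1)]            # Step 1: equal-width bin limits
    observed = [0] * c
    for x in xs:
        observed[min(int((x - lo) / width), c - 1)] += 1       # the maximum goes in the last bin
    xbar, s = mean(xs), stdev(xs)
    z = [(a - xbar) / s for a in limits]                        # Step 2: standardize the limits
    z[0], z[-1] = float("-inf"), float("inf")                   # open-ended outer bins (assumption)
    std = NormalDist()
    areas = [std.cdf(z[i + 1]) - std.cdf(z[i]) for i in range(c)]  # Step 3: normal areas
    expected = [n * a for a in areas]                               # Step 4: e_j = n * area
    return limits, observed, expected


xs = [52, 47, 61, 55, 49, 58, 44, 53, 50, 57,
      46, 62, 51, 48, 54, 59, 43, 56, 50, 45]  # hypothetical sample
limits, observed, expected = method2_frequencies(xs, 5)
print([round(a, 1) for a in limits])
print(observed, [round(e, 2) for e in expected])
```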
Normal Chi-Square GOF Test Chapter 15 Method 3: Equal Expected Frequencies • Define histogram bins in such a way that an equal number of observations would be expected under the hypothesis of a normal population, i.e., so that $e_j = n/c$. • A normal area of 1/c is expected in each bin. • The first and last classes must be open-ended, so to define c bins we need c – 1 cut points. • Count the observations $f_j$ within each bin. • Compare the $f_j$ with the expected frequencies $e_j = n/c$. Advantage: Makes efficient use of the sample. Disadvantage: The cut points on the z-scale may seem strange.
Normal Chi-Square GOF Test Chapter 15 Method 3: Equal Expected Frequencies • Standard normal cut points for equal area bins. Table 15.16
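Cut points like those in Table 15.16 can be generated with the standard library's `statistics.NormalDist.inv_cdf` (Python 3.8+). A minimal sketch with a hypothetical function name; compare its output with the printed table before relying on it. To move a cut point to the data scale, use $x = \bar{x} + z s$.

```python
from statistics import NormalDist


def equal_area_cut_points(c):
    """The c - 1 standard normal cut points that split the z-scale into c bins of area 1/c each."""
    std = NormalDist()
    return [std.inv_cdf(i / c) for i in range(1, c)]


for c in (4, 5, 6, 8):
    print(c, [round(z, 4) for z in equal_area_cut_points(c)])
```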
Normal Chi-Square GOF Test Chapter 15 Critical Values for Normal GOF Test • Two parameters, μ and σ, are estimated from the sample, so m = 2 and the degrees of freedom are d.f. = c – m – 1 = c – 3. • We need at least four bins to ensure at least one degree of freedom. Small Expected Frequencies • Cochran's Rule suggests $e_j \ge 5$ in each bin (e.g., with 4 equal-expected-frequency bins we would want n ≥ 20, and so on).
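Putting Method 3 and the degrees-of-freedom rule together, a minimal Python sketch. The sample is simulated with a fixed seed purely for illustration (it is not the baby-weight data), `normal_gof` is a hypothetical name, and the p-value comes from a standard-library chi-square right-tail computation. Software such as MegaStat may choose bins differently, so its results need not match.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def chi2_sf(x, df):
    """Right-tail chi-square area via the regularized upper incomplete gamma Q(df/2, x/2)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    s, q = (1.0, math.exp(-y)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while s < df / 2.0:  # Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1)
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
        s += 1.0
    return q


def normal_gof(xs, c):
    """Chi-square GOF test for normality using c equal-expected-frequency bins (Method 3)."""
    n = len(xs)
    if c < 4:
        raise ValueError("need c >= 4 so that d.f. = c - 3 is at least 1")
    if n / c < 5:
        raise ValueError("Cochran's Rule: need e_j = n/c >= 5")
    fitted = NormalDist(mean(xs), stdev(xs))               # estimate mu and sigma (m = 2)
    cuts = [fitted.inv_cdf(i / c) for i in range(1, c)]   # c - 1 cut points on the data scale
    observed = [0] * c
    for x in xs:
        observed[sum(1 for cut in cuts if x > cut)] += 1  # bin index = cut points below x
    e = n / c
    stat = sum((f - e) ** 2 / e for f in observed)
    df = c - 2 - 1                                         # d.f. = c - m - 1 with m = 2
    return stat, df, chi2_sf(stat, df), observed


rng = random.Random(510)                      # hypothetical simulated sample
xs = [rng.gauss(120, 18) for _ in range(80)]
stat, df, p, observed = normal_gof(xs, 8)     # e_j = 80 / 8 = 10 per bin
print(observed, round(stat, 2), df, round(p, 3))
```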
Normal Chi-Square GOF Test Chapter 15 Visual Tests • The fitted normal superimposed on a histogram gives visual clues as to the likely outcome of the GOF test. • A simple “eyeball” inspection of the histogram may suffice to rule out a normal population by revealing outliers or other non-normality issues.
ECDF Tests ML 10.3 Chapter 15 ECDF Tests for Normality • There are alternatives to the chi-square test for normality based on the empirical cumulative distribution function (ECDF). • ECDF tests are done by computer. Details are omitted here. • A small p-value casts doubt on the normality of the population. • The Kolmogorov-Smirnov (K-S) test uses the largest absolute difference between the actual and expected cumulative relative frequencies of the n data values. • The Anderson-Darling (A-D) test is based on a probability plot. When the data fit the hypothesized distribution closely, the probability plot will be close to a straight line. The A-D test is widely used because of its power and attractive visual.
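The two ECDF statistics are short formulas once the data are sorted. A minimal Python sketch that computes only the K-S statistic D and the A-D statistic $A^2$ against a fitted normal; p-values are omitted, as on the slide, because when μ and σ are estimated from the sample the usual K-S tables no longer apply (the Lilliefors situation) and the A-D test needs its own adjusted tables. The data and function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_statistic(xs, dist):
    """D = largest absolute gap between the ECDF of xs and the CDF of dist."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        F = dist.cdf(x)
        # the ECDF steps from (i - 1)/n up to i/n at x, so check both sides of the step
        d = max(d, i / n - F, F - (i - 1) / n)
    return d


def ad_statistic(xs, dist):
    """A^2 = -n - (1/n) * sum over i of (2i - 1) [ln F(x_(i)) + ln(1 - F(x_(n+1-i)))]."""
    xs = sorted(xs)
    n = len(xs)
    total = sum((2 * i - 1) * (math.log(dist.cdf(xs[i - 1])) + math.log(1.0 - dist.cdf(xs[n - i])))
                for i in range(1, n + 1))
    return -n - total / n


xs = [52, 47, 61, 55, 49, 58, 44, 53, 50, 57, 46, 62, 51, 48, 54]  # hypothetical sample
fitted = NormalDist(mean(xs), stdev(xs))  # mu and sigma estimated from the sample
print(round(ks_statistic(xs, fitted), 4), round(ad_statistic(xs, fitted), 4))
```

Larger values of D or $A^2$ indicate a worse fit; turning them into p-values is left to statistical software, as the slide suggests.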
ECDF Tests Chapter 15 Example: Minitab's Anderson-Darling Test for Normality • Data: weights of 80 babies (in ounces) • A near-linear probability plot suggests a good fit to the normal distribution. • The p-value = 0.122 is not small enough to reject a normal population at α = .05.
ECDF Tests Chapter 15 Example: MegaStat's Normality Tests • Data: weights of 80 babies (in ounces) • The p-value = 0.2487 is not small enough to reject a normal population at α = .05 in this chi-square test. • A near-linear probability plot suggests a good fit to the normal distribution. • Note: MegaStat's chi-square test is not as powerful as the A-D test, so we would prefer the A-D test if software is available. The MegaStat probability plot is good, but shows no p-value.