Week 10 Nov 3-7

  1. Week 10 (Nov 3-7): Two Mini-Lectures, QMM 510, Fall 2014

  2. Chapter 15: Chi-Square Tests (ML 10.1)
     Chapter Contents
     15.1 Chi-Square Test for Independence
     15.2 Chi-Square Tests for Goodness-of-Fit
     15.3 Uniform Goodness-of-Fit Test
     15.4 Poisson Goodness-of-Fit Test
     15.5 Normal Chi-Square Goodness-of-Fit Test
     15.6 ECDF Tests (Optional)
     So many topics, so little time …

  3. Chi-Square Test for Independence: Contingency Tables
     • A contingency table is a cross-tabulation of n paired observations into categories.
     • Each cell shows the count of observations that fall into the category defined by its row (r) and column (c) headings.

  4. Chi-Square Test for Independence: Contingency Tables
     • For example: [table]

  5. Chi-Square Test for Independence: Chi-Square Test
     • In a test of independence for an r × c contingency table, the hypotheses are
         H0: Variable A is independent of variable B
         H1: Variable A is not independent of variable B
     • Use the chi-square test for independence to test these hypotheses.
     • This nonparametric test is based on frequencies.
     • The n data pairs are classified into r rows and c columns, and then the observed frequency $f_{jk}$ is compared with the expected frequency $e_{jk}$.

  6. Chi-Square Test for Independence: Chi-Square Distribution
     • The critical value comes from the chi-square probability distribution with d.f. degrees of freedom:
         d.f. = (r – 1)(c – 1)
       where r = number of rows in the table and c = number of columns in the table.
     • Appendix E contains critical values for right-tail areas of the chi-square distribution, or use Excel's =CHISQ.INV.RT(α, d.f.).
     • The mean of a chi-square distribution is d.f., with variance 2·d.f.
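A minimal sketch of these lookups in Python, assuming SciPy's `scipy.stats.chi2` as a stand-in for Appendix E or Excel: `isf` plays the role of Excel's CHISQ.INV.RT (critical value from a right-tail area) and `sf` the role of CHISQ.DIST.RT (right-tail area from a value). It also reproduces the d.f. = 6, α = .05 critical value and checks the mean and variance.

```python
from scipy.stats import chi2

df = 6        # degrees of freedom
alpha = 0.05  # right-tail area

# Critical value with right-tail area alpha (the quantity Excel's CHISQ.INV.RT returns)
critical = chi2.isf(alpha, df)

# Right-tail area beyond a given value (the quantity Excel's CHISQ.DIST.RT returns)
tail_area = chi2.sf(critical, df)

# Mean and variance of the chi-square distribution: d.f. and 2 * d.f.
mean, var = chi2.stats(df, moments="mv")

print(f"critical value = {critical:.2f}")                # critical value = 12.59
print(f"right-tail area = {tail_area:.2f}")              # right-tail area = 0.05
print(f"mean = {float(mean)}, variance = {float(var)}")  # mean = 6.0, variance = 12.0
```

Mixing up the two functions is a common slip: feeding α into the right-tail-area function returns a probability, not a critical value.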

  7. Chi-Square Test for Independence: Chi-Square Distribution
     • Consider the shape of the chi-square distribution: [figure]

  8. Chi-Square Test for Independence: Expected Frequencies
     • Assuming that H0 is true, the expected frequency of row j and column k is
         $e_{jk} = R_j C_k / n$
       where $R_j$ = total for row j (j = 1, 2, …, r), $C_k$ = total for column k (k = 1, 2, …, c), and n = sample size.
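The expected-frequency formula can be sketched with NumPy as an outer product of the row totals and column totals divided by n. The 2 × 3 table of observed counts below is hypothetical, made up only for illustration.

```python
import numpy as np

# Hypothetical 2 x 3 table of observed counts f_jk (rows = variable A, columns = variable B)
observed = np.array([[20, 30, 50],
                     [30, 20, 50]])

row_totals = observed.sum(axis=1)  # R_j
col_totals = observed.sum(axis=0)  # C_k
n = observed.sum()                 # sample size

# e_jk = R_j * C_k / n for every cell at once
expected = np.outer(row_totals, col_totals) / n
print(expected)
# [[25. 25. 50.]
#  [25. 25. 50.]]
```

Because each row of `expected` is proportional to the column totals, the expected table always has the same row and column totals as the observed table.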

  9. Chi-Square Test for Independence: Steps in Testing the Hypotheses
     • Step 1: State the hypotheses.
         H0: Variable A is independent of variable B
         H1: Variable A is not independent of variable B
     • Step 2: Specify the decision rule.
         Calculate d.f. = (r – 1)(c – 1).
         For a given α, look up the right-tail critical value ($\chi^2_R$) from Appendix E or by using Excel's =CHISQ.INV.RT(α, d.f.).
         Reject H0 if the test statistic > $\chi^2_R$.

  10. Chi-Square Test for Independence: Steps in Testing the Hypotheses
     • For example, for d.f. = 6 and α = .05, $\chi^2_{.05} = 12.59$.

  11. Chi-Square Test for Independence: Steps in Testing the Hypotheses
     • Here is the rejection region: [figure]

  12. Chi-Square Test for Independence: Steps in Testing the Hypotheses
     • Step 3: Calculate the expected frequencies, $e_{jk} = R_j C_k / n$.
     • For example: [table]

  13. Chi-Square Test for Independence: Steps in Testing the Hypotheses
     • Step 4: Calculate the test statistic. The chi-square test statistic is
         $\chi^2_{\text{calc}} = \sum_{j=1}^{r} \sum_{k=1}^{c} \frac{(f_{jk} - e_{jk})^2}{e_{jk}}$
     • Step 5: Make the decision. Reject H0 if $\chi^2_{\text{calc}} > \chi^2_R$ or if the p-value ≤ α.
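Steps 2 through 5 can be put together in a short sketch, assuming Python with NumPy and SciPy (not the MegaStat workflow used in the slides) and a hypothetical 2 × 3 table. SciPy's `chi2_contingency` serves only as a cross-check; `correction=False` disables Yates' continuity correction, which SciPy would otherwise apply when d.f. = 1.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Hypothetical observed counts f_jk
observed = np.array([[20, 30, 50],
                     [30, 20, 50]])
alpha = 0.05

# Step 3: expected frequencies e_jk = R_j * C_k / n
n = observed.sum()
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n

# Step 4: chi-square statistic summed over all r x c cells
stat = ((observed - expected) ** 2 / expected).sum()

# Step 2: d.f. and right-tail critical value
r, c = observed.shape
df = (r - 1) * (c - 1)
critical = chi2.isf(alpha, df)
p_value = chi2.sf(stat, df)

# Step 5: reject H0 if stat > critical (equivalently, p-value <= alpha)
reject = stat > critical
print(f"chi2 = {stat:.2f}, d.f. = {df}, critical = {critical:.2f}, p = {p_value:.4f}, reject = {reject}")
# chi2 = 4.00, d.f. = 2, critical = 5.99, p = 0.1353, reject = False

# Cross-check against SciPy's built-in test of independence
stat2, p2, df2, _ = chi2_contingency(observed, correction=False)
```

As in the MegaStat example, a p-value above α means the data are not strong enough evidence to reject independence.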

  14. Chi-Square Test for Independence: Example (MegaStat)
     • All cells have $e_{jk} \ge 5$, so Cochran's Rule is met.
     • Caution: Don't highlight row or column totals.
     • The p-value = 0.2154 is not small enough to reject the hypothesis of independence at α = .05.

  15. Chi-Square Test for Independence: Test of Two Proportions
     • For a 2 × 2 contingency table, the chi-square test is equivalent to a two-tailed z test for two proportions.
     • The hypotheses are H0: $\pi_1 = \pi_2$ versus H1: $\pi_1 \ne \pi_2$ (Figure 14.6).
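To see the equivalence numerically, here is a sketch with hypothetical counts (30 successes out of 100 vs. 45 out of 100): the square of the pooled two-proportion z statistic equals the 2 × 2 chi-square statistic, and the two p-values agree. It assumes SciPy and turns off Yates' correction so the comparison is exact.

```python
import numpy as np
from scipy.stats import chi2_contingency, norm

# Hypothetical data: successes x out of n trials in two groups
x1, n1 = 30, 100
x2, n2 = 45, 100

# Two-tailed z test for H0: pi1 = pi2 using the pooled proportion
p1, p2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)
z = (p1 - p2) / np.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
p_z = 2 * norm.sf(abs(z))

# Same data as a 2 x 2 contingency table (rows = groups, columns = success/failure)
table = np.array([[x1, n1 - x1],
                  [x2, n2 - x2]])
# correction=False turns off Yates' continuity correction so the two tests match exactly
stat, p_chi, df, _ = chi2_contingency(table, correction=False)

print(f"z^2 = {z ** 2:.4f}, chi-square = {stat:.4f}")  # both 4.8000
print(f"p (z test) = {p_z:.4f}, p (chi-square) = {p_chi:.4f}")
```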

  16. Chi-Square Test for Independence: Small Expected Frequencies
     • The chi-square test is unreliable if the expected frequencies are too small.
     • Rules of thumb:
         Cochran's Rule requires that $e_{jk} \ge 5$ for all cells.
         A less strict rule allows up to 20% of the cells to have $e_{jk} < 5$.
         Most agree that a chi-square test is infeasible if $e_{jk} < 1$ in any cell.
     • If this happens, try combining adjacent rows or columns to enlarge the expected frequencies.
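These rules of thumb can be checked mechanically. In this sketch the helper name `small_frequency_rules` and the expected frequencies are hypothetical; the thresholds (5, 20%, and 1) come from the slide.

```python
import numpy as np

def small_frequency_rules(expected, minimum=5.0, max_share_small=0.20):
    """Check a table of expected frequencies e_jk against the small-frequency rules of thumb."""
    e = np.asarray(expected, dtype=float)
    share_small = float(np.mean(e < minimum))  # fraction of cells with e_jk < 5
    return {
        "cochran": bool(np.all(e >= minimum)),                 # every cell has e_jk >= 5
        "at_most_20pct_small": share_small <= max_share_small,  # up to 20% of cells below 5
        "infeasible": bool(np.any(e < 1)),                      # some cell has e_jk < 1
    }

# Hypothetical expected frequencies for a 2 x 3 table: one cell out of six is below 5
expected = [[12.5, 4.0, 8.5],
            [10.0, 6.0, 9.0]]
result = small_frequency_rules(expected)
print(result)
# {'cochran': False, 'at_most_20pct_small': True, 'infeasible': False}
```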

  17. Chi-Square Test for Independence: Cross-Tabulating Raw Data
     • Chi-square tests for independence can also be used to analyze quantitative variables by coding them into categories.
     • For example, the variables Infant Deaths per 1,000 and Doctors per 100,000 can each be coded into various categories: [table]
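A sketch of the coding step, assuming NumPy: each variable is cut into three categories and the pairs are cross-tabulated. The ten data pairs and the cut points (5 and 10 deaths per 1,000; 150 and 300 doctors per 100,000) are hypothetical, not the values behind the slide's table.

```python
import numpy as np

# Hypothetical paired observations
infant_deaths = np.array([4.1, 6.8, 12.5, 3.2, 9.9, 15.0, 5.5, 7.7, 11.2, 2.9])  # per 1,000
doctors = np.array([310, 220, 95, 350, 140, 80, 260, 190, 120, 400])             # per 100,000

# Hypothetical cut points: code 0 = low, 1 = medium, 2 = high
death_edges = [5, 10]      # < 5, 5 to < 10, >= 10
doctor_edges = [150, 300]  # < 150, 150 to < 300, >= 300

rows = np.digitize(infant_deaths, death_edges)  # row category of each pair
cols = np.digitize(doctors, doctor_edges)       # column category of each pair

# Count each (row, column) combination to build the 3 x 3 contingency table
table = np.zeros((3, 3), dtype=int)
np.add.at(table, (rows, cols), 1)
print(table)
# [[0 0 3]
#  [1 3 0]
#  [3 0 0]]
```

With only ten pairs, every expected frequency in this 3 × 3 table would be far below 5, so a real analysis would need more observations or fewer categories before running the test.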

  18. Chi-Square Test for Independence: Why Do a Chi-Square Test on Numerical Data?
     • The researcher may believe there's a relationship between X and Y but doesn't want to use regression.
     • There are outliers or anomalies that prevent us from assuming that the data came from a normal population.
     • The researcher has numerical data for one variable but not the other.

  19. Chi-Square Test for Independence: 3-Way Tables and Higher
     • More than two variables can be compared using contingency tables.
     • However, it is difficult to visualize a higher-order table.
     • For example, you could visualize a three-way table as a cube, that is, a stack of tiled 2-way contingency tables.
     • Major computer packages permit three-way tables.

  20. Chi-Square Tests for Goodness-of-Fit (ML 10.2): Purpose of the Test
     • The goodness-of-fit (GOF) test helps you decide whether your sample resembles a particular kind of population.
     • The chi-square test is versatile and easy to understand.
     • Hypotheses for GOF tests:
         H0: The population follows a _____ distribution
         H1: The population does not follow a _____ distribution
     • The blank may contain the name of any theoretical distribution (e.g., uniform, Poisson, normal).

  21. Chi-Square Tests for Goodness-of-Fit: Test Statistic and Degrees of Freedom for GOF
     • Assuming n observations, the observations are grouped into c classes, and then the chi-square test statistic is found using
         $\chi^2_{\text{calc}} = \sum_{j=1}^{c} \frac{(f_j - e_j)^2}{e_j}$
       where $f_j$ = the observed frequency of observations in class j and $e_j$ = the expected frequency in class j if the sample came from the hypothesized population.

  22. Chi-Square Tests for Goodness-of-Fit: Test Statistic and Degrees of Freedom for GOF Tests
     • If the proposed distribution gives a good fit to the sample, the test statistic will be near zero.
     • The test statistic follows the chi-square distribution with degrees of freedom d.f. = c – m – 1, where c is the number of classes used in the test and m is the number of parameters estimated.
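The GOF statistic and its d.f. = c – m – 1 can be sketched as follows. The helper `gof_test` and the die-roll counts are hypothetical (a uniform GOF example with no estimated parameters, so m = 0); SciPy's `chisquare` serves as a cross-check, with its `ddof` argument playing the role of m.

```python
import numpy as np
from scipy.stats import chi2, chisquare

def gof_test(observed, expected, m):
    """Chi-square GOF statistic with d.f. = c - m - 1 (m = parameters estimated from the sample)."""
    f = np.asarray(observed, dtype=float)
    e = np.asarray(expected, dtype=float)
    stat = np.sum((f - e) ** 2 / e)
    df = len(f) - m - 1
    return stat, df, chi2.sf(stat, df)

# Hypothetical uniform GOF example: 60 rolls of a die, c = 6 classes, e_j = 10 each
observed = [8, 12, 9, 11, 7, 13]
expected = [10] * 6
stat, df, p = gof_test(observed, expected, m=0)
print(f"chi2 = {stat:.2f}, d.f. = {df}, p = {p:.4f}")

# SciPy's chisquare gives the same statistic and p-value
check = chisquare(observed, expected, ddof=0)
```

For the normal GOF test on the following slides, the same helper would be called with m = 2, since the mean and standard deviation are estimated.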

  23. Normal Chi-Square GOF Test: Is the Sample from a Normal Population?
     • Many statistical tests assume a normal population, so this is the most common GOF test.
     • Two parameters, the mean μ and the standard deviation σ, fully describe a normal distribution.
     • Unless μ and σ are known a priori, they must be estimated from a sample in order to perform a GOF test for normality.

  24. Normal Chi-Square GOF Test: Method 1: Standardize the Data
     • Transform sample observations $x_1, x_2, \ldots, x_n$ into standardized z-values.
     • Count the sample observations within each interval on the z-scale and compare them with expected normal frequencies $e_j$.
     • Problem: Frequencies will be small in the end bins yet large in the middle bins (this may violate Cochran's Rule and seems inefficient).

  25. Normal Chi-Square GOF Test: Method 2: Equal Bin Widths
     • Step 1: Divide the exact data range into c groups of equal width, and count the sample observations in each bin to get observed bin frequencies $f_j$.
     • Step 2: Convert the bin limits into standardized z-values: $z = (x - \bar{x}) / s$.
     • Step 3: Find the normal area within each bin assuming a normal distribution.
     • Step 4: Find expected frequencies $e_j$ by multiplying each normal area by the sample size n.
     • Problem: Frequencies will be small in the end bins yet large in the middle bins (this may violate Cochran's Rule and seems inefficient).
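Method 2 can be sketched as below, assuming NumPy and SciPy and a hypothetical simulated sample of 80 values. One assumption goes beyond the slide: the first and last bins are treated as open-ended, so the normal areas sum to 1 and the expected frequencies sum to n.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=100, scale=15, size=80)  # hypothetical sample

c = 6
xbar, s = x.mean(), x.std(ddof=1)

# Step 1: c equal-width bins over the exact data range; observed counts f_j
edges = np.linspace(x.min(), x.max(), c + 1)
observed, _ = np.histogram(x, bins=edges)

# Step 2: standardize the bin limits
z = (edges - xbar) / s
# Assumption: make the first and last bins open-ended so the normal areas sum to 1
z[0], z[-1] = -np.inf, np.inf

# Step 3: normal area within each bin; Step 4: expected frequencies e_j = n * area
area = np.diff(norm.cdf(z))
expected = len(x) * area
print("observed:", observed)
print("expected:", np.round(expected, 1))
```

The printed expected frequencies should typically show the problem the slide describes: the end bins get far smaller $e_j$ than the middle bins.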

  26. Normal Chi-Square GOF Test: Method 3: Equal Expected Frequencies
     • Define histogram bins in such a way that an equal number of observations would be expected under the hypothesis of a normal population, i.e., so that $e_j = n/c$.
     • A normal area of 1/c is expected in each bin.
     • The first and last classes must be open-ended, so to define c bins we need c – 1 cut points.
     • Count the observations $f_j$ within each bin.
     • Compare the $f_j$ with the expected frequencies $e_j = n/c$.
     • Advantage: Makes efficient use of the sample. Disadvantage: Cut points on the z-scale may seem strange.

  27. Normal Chi-Square GOF Test: Method 3: Equal Expected Frequencies
     • Standard normal cut points for equal-area bins: Table 15.16.
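Equal-area cut points like those tabulated in Table 15.16 can be generated from the inverse standard normal CDF. A minimal sketch, assuming SciPy's `norm.ppf`; the function name `equal_area_cutpoints` is hypothetical.

```python
from scipy.stats import norm

def equal_area_cutpoints(c):
    """Return the c - 1 standard normal cut points that split the z-scale into c bins of area 1/c."""
    return [float(norm.ppf(k / c)) for k in range(1, c)]

for c in (4, 5, 6, 8):
    print(c, [round(z, 3) for z in equal_area_cutpoints(c)])
```

For c = 4 the cut points are the quartiles of the standard normal, about –0.674, 0, and 0.674; the cut points are always symmetric about zero.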

  28. Normal Chi-Square GOF Test: Critical Values for Normal GOF Test
     • Two parameters, μ and σ, are estimated from the sample, so m = 2 and the degrees of freedom are d.f. = c – m – 1 = c – 3.
     • We need at least four bins to ensure at least one degree of freedom.
     Small Expected Frequencies
     • Cochran's Rule suggests $e_j \ge 5$ in each bin (e.g., with 4 bins we would want n ≥ 20, and so on).
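Putting Method 3 together with the d.f. and Cochran's Rule checks gives a sketch like the following. It assumes NumPy and SciPy; the function name `normal_gof_equal_expected` and the simulated sample of 80 values are hypothetical, not the slides' data.

```python
import numpy as np
from scipy.stats import chi2, norm

def normal_gof_equal_expected(x, c):
    """Chi-square normality test using c equal-expected-frequency bins (Method 3)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if c < 4:
        raise ValueError("need at least 4 bins so that d.f. = c - 3 >= 1")
    if n / c < 5:
        raise ValueError("Cochran's Rule: need n >= 5c so that each e_j >= 5")
    z = (x - x.mean()) / x.std(ddof=1)      # standardize with estimated mean and s
    cuts = norm.ppf(np.arange(1, c) / c)    # c - 1 equal-area cut points
    observed = np.bincount(np.digitize(z, cuts), minlength=c)  # f_j for bins 0..c-1
    expected = np.full(c, n / c)            # e_j = n / c in every bin
    stat = np.sum((observed - expected) ** 2 / expected)
    df = c - 2 - 1                          # m = 2 parameters estimated
    return stat, df, chi2.sf(stat, df), observed

rng = np.random.default_rng(1)
sample = rng.normal(loc=120, scale=18, size=80)  # hypothetical data
stat, df, p, observed = normal_gof_equal_expected(sample, c=8)
print(f"chi2 = {stat:.2f}, d.f. = {df}, p = {p:.4f}")
print("observed:", observed, "expected in each bin:", 80 / 8)
```

With n = 80 and c = 8, each bin expects 10 observations, which comfortably satisfies Cochran's Rule while leaving d.f. = 5.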

  29. Normal Chi-Square GOF Test: Visual Tests
     • The fitted normal superimposed on a histogram gives visual clues as to the likely outcome of the GOF test.
     • A simple "eyeball" inspection of the histogram may suffice to rule out a normal population by revealing outliers or other non-normality issues.

  30. ECDF Tests (ML 10.3): ECDF Tests for Normality
     • There are alternatives to the chi-square test for normality based on the empirical cumulative distribution function (ECDF).
     • ECDF tests are done by computer. Details are omitted here.
     • A small p-value casts doubt on normality of the population.
     • The Kolmogorov-Smirnov (K-S) test uses the largest absolute difference between the actual and expected cumulative relative frequency of the n data values.
     • The Anderson-Darling (A-D) test is based on a probability plot. When the data fit the hypothesized distribution closely, the probability plot will be close to a straight line. The A-D test is widely used because of its power and attractive visual.
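The K-S statistic described above can be computed directly from the sorted data, and SciPy offers both tests. This is a sketch under stated assumptions: the sample is simulated and hypothetical, and the normal's mean and standard deviation are estimated from it. Because `kstest` treats those parameters as known, its p-value would be too large here (the Lilliefors correction addresses that), so only the statistic is compared. SciPy's `anderson` reports the A-D statistic together with critical values at several significance levels rather than the single p-value Minitab displays.

```python
import numpy as np
from scipy.stats import anderson, kstest, norm

rng = np.random.default_rng(2)
x = np.sort(rng.normal(loc=50, scale=10, size=80))  # hypothetical sample, sorted
n = len(x)

# Fitted normal CDF at each sorted value (mean and s estimated from the sample)
F = norm.cdf(x, loc=x.mean(), scale=x.std(ddof=1))

# K-S statistic: largest gap between the ECDF and the fitted CDF,
# checked just after (i/n) and just before ((i-1)/n) each step of the ECDF
i = np.arange(1, n + 1)
D = max(np.max(i / n - F), np.max(F - (i - 1) / n))

# SciPy's kstest computes the same statistic against the fitted normal
D_scipy = kstest(x, "norm", args=(x.mean(), x.std(ddof=1))).statistic

# Anderson-Darling test for normality (SciPy estimates the parameters internally)
ad = anderson(x, dist="norm")
print(f"K-S D = {D:.4f}, A-D statistic = {ad.statistic:.4f}")
print("A-D critical values:", ad.critical_values, "at levels (%):", ad.significance_level)
```

Reject normality at a given level when the A-D statistic exceeds the critical value listed for that level.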

  31. ECDF Tests: Example: Minitab's Anderson-Darling Test for Normality
     • Data: weights of 80 babies (in ounces)
     • A near-linear probability plot suggests a good fit to the normal distribution.
     • The p-value = 0.122 is not small enough to reject a normal population at α = .05.

  32. ECDF Tests: Example: MegaStat's Normality Tests
     • Data: weights of 80 babies (in ounces)
     • In this chi-square test, the p-value = 0.2487 is not small enough to reject a normal population at α = .05.
     • A near-linear probability plot suggests a good fit to the normal distribution.
     • Note: MegaStat's chi-square test is not as powerful as the A-D test, so we would prefer the A-D test if software is available. The MegaStat probability plot is good, but shows no p-value.

More Related