Chapter 11: Applications of Chi-Square

branhamk

Presentation Transcript


  1. Chapter 11: Applications of Chi-Square

  2. Chapter Goals • Investigate two tests: multinomial experiment, and the contingency table. • Compare experimental results with expected results to determine (1) Preferences (2) Independence (3) Homogeneity • Enumerative data: data that is placed in categories and counted.

  3. 11.1: Chi-Square Statistic • In many problems the data are categorized and the results are shown as counts. • Results are often displayed on a chart showing the number of observations for each possible category.

  4. Background: 1. Suppose there are n observations. 2. Each observation falls into a cell (or class). 3. Observed frequencies in each cell: O1, O2, O3, . . . , Ok. Sum of the observed frequencies is n. 4. Expected, or theoretical, frequencies: E1, E2, E3, . . . , Ek. [Table: summary of notation, pairing each of the k cells with its observed frequency Oi and expected frequency Ei.]

  5. Goal: 1. Compare the observed frequencies with the expected frequencies. 2. Decide whether the observed frequencies seem to agree or seem to disagree with the expected frequencies. Methodology: Use a chi-square statistic: $\chi^{2*} = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$. Small values of χ²: observed frequencies are close to expected frequencies. Large values of χ²: observed frequencies do not agree with expected frequencies.
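As a rough illustration of how the statistic behaves, the Python sketch below sums (O - E)²/E over all cells. The function name `chi_square_statistic` and the counts are hypothetical, chosen only to show that close agreement gives a small value and poor agreement gives a large one.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three cells, each with expected frequency 20.
expected = [20, 20, 20]
close_fit = chi_square_statistic([19, 21, 20], expected)  # observed near expected
poor_fit = chi_square_statistic([5, 35, 20], expected)    # observed far from expected

print(close_fit, poor_fit)
```

The statistic is zero only when every observed frequency equals its expected frequency, which is why the test uses only the right-hand tail.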

  6. Sampling Distribution of χ²*: When n is large and all expected frequencies are greater than or equal to 5, then χ²* has approximately a chi-square distribution. Recall: Properties of the Chi-Square Distribution: 1. χ² is nonnegative in value; it is zero or positively valued. 2. χ² is not symmetrical; it is skewed to the right. 3. χ² is distributed so as to form a family of distributions, a separate distribution for each different number of degrees of freedom.

  7. Various Chi-Square Distributions:

  8. Critical values for chi-square: 1. Table 8, Appendix B. 2. Identified by degrees of freedom (df) and the area under the curve to the right of the critical value. 3. χ²(df, α): critical value of a chi-square distribution with df degrees of freedom and area α to the right. 4. Chi-square distribution is not symmetrical: critical values associated with right and left tails are given separately.

  9. Example: Find χ²(16, 0.05). From Table 8: χ²(16, 0.05) = 26.3

  10. Example: Find χ²(10, 0.99). From Table 8: χ²(10, 0.99) = 2.56
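The two table lookups above can be cross-checked numerically. The sketch below is one possible approach, not the method behind Table 8: it assumes an integer df, computes the right-tail area from the standard recurrence for the chi-square distribution, and finds the critical value by bisection. The names `chi2_sf` and `chi2_critical` are illustrative.

```python
import math


def chi2_sf(x, df):
    """Right-tail area P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a = 1.0
        tail = math.exp(-y)              # tail area for df = 2
    else:
        a = 0.5
        tail = math.erfc(math.sqrt(y))   # tail area for df = 1
    # Each step raises df by 2: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        tail += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return tail


def chi2_critical(df, alpha):
    """Value with area alpha to its right, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:       # widen until the target is bracketed
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(chi2_critical(16, 0.05), 1))   # compare with the table value 26.3
print(round(chi2_critical(10, 0.99), 2))   # compare with the table value 2.56
```

Bisection works here because the right-tail area decreases steadily as x grows, so there is exactly one value with area α to its right.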

  11. Note: 1. The mean value of the chi-square distribution is df. 2. When df > 2, the mean is located to the right of the mode (the value where the curve reaches its high point) and just to the right of the median (the value that splits the distribution, 50% on either side).

  12. Note: 1. There is a separate chi-square distribution for each degree of freedom, df. 2. Assumptions for this chi-square test: a. Information is obtained from a random sample. b. Each observation is classified according to the categorical variable(s) involved in the test. 3. Categorical Variable: a variable that classifies or categorizes each individual into exactly one of several cells or classes; these cells or classes are all inclusive and mutually exclusive. 4. The null and alternative hypotheses may be stated liberally; they are not simply statements about population parameters.

  13. 11.2: Inferences Concerning Multinomial Experiments • Examine the testing procedure for multinomial experiments. • Do the observed frequencies match the expected frequencies? • Hypothesis test is based on the χ²* statistic.

  14. Multinomial Experiment: An experiment with the following characteristics: 1. It consists of n identical independent trials. 2. The outcome of each trial fits into exactly one of k possible cells. 3. There is a probability associated with each particular cell, and these individual probabilities remain constant during the experiment. 4. The experiment will result in a set of observed frequencies, O1, O2, . . . , Ok, where each Oi is the number of times a trial outcome falls into that particular cell. (It must be the case that O1 + O2 + . . . + Ok = n.)

  15. Testing Procedure: 1. H0: The probabilities p1, p2, . . . , pk are correct. Ha: At least two probabilities are incorrect. Allow for liberal interpretation of H0 and Ha. 2. Test statistic: $\chi^{2*} = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$. 3. Use a one-tailed critical region; the right-hand tail. 4. Degrees of freedom: df = k - 1. 5. Expected frequencies: $E_i = n \cdot p_i$. 6. To ensure a good approximation to the chi-square distribution, each expected frequency should be at least 5.

  16. Example: A market research firm conducted a consumer-preference experiment to determine which of 5 new breakfast cereals was the most appealing to adults. Each of 100 sampled consumers tried every cereal and indicated the one he or she preferred. The results are given in the following table: Is there any evidence to suggest the consumers had a preference for one cereal, or did they indicate each cereal was equally likely to be selected? Use α = 0.05.

  17. Solution: If no preference was shown, we expect the 100 consumers to be equally distributed among the 5 cereals. Thus, if no preference is given, we expect (100)(0.2) = 20 consumers in each class. 1. The Set-up: a. Population parameter of concern: Preference for each cereal, the probability that a particular cereal is selected. b. The null and alternative hypotheses: H0: There was no preference shown (equally distributed). Ha: There was a preference shown (not equally distributed). 2. The Hypothesis Test Criteria: a. Assumptions: The 100 consumers represent a random sample. b. Test statistic: χ²* with df = k - 1 = 5 - 1 = 4. c. Level of significance: α = 0.05.

  18. 3. The Sample Evidence: a. Sample information: Table given in the statement of the problem. b. Calculate the value of the test statistic: χ²* = 3.2

  19. 4. The Probability Distribution (Classical Approach): a. Critical value: χ²(k - 1, 0.05) = χ²(4, 0.05) = 9.49 b. χ²* is not in the critical region. 4. The Probability Distribution (p-Value Approach): a. The p-value: Using computer: P = 0.5249. Using Table 8: P > 0.5 b. The p-value is larger than the level of significance, α. 5. The Results: a. Decision: Fail to reject H0. b. Conclusion: At the 0.05 level of significance, there is no evidence to suggest the consumers showed a preference for any one cereal.
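Both decision rules in this example can be checked with a few lines of Python. This is a minimal sketch that relies on a known closed form for the right-tail area of a chi-square distribution with an even number of degrees of freedom; the function name is illustrative, and the critical value 9.49 is simply the table value quoted in the example. The right-tail area beyond 3.2 works out to roughly 0.525, consistent with the table bound P > 0.5.

```python
import math


def chi2_right_tail_even_df(x, df):
    """P(chi-square > x) for even df, using the closed-form finite sum."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    total = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * total


chi2_star = 3.2         # test statistic from the example
df = 4                  # k - 1 = 5 - 1
alpha = 0.05
critical_value = 9.49   # table value quoted in the example

p_value = chi2_right_tail_even_df(chi2_star, df)
reject_classical = chi2_star > critical_value   # classical approach
reject_p_value = p_value < alpha                # p-value approach

print(round(p_value, 4), reject_classical, reject_p_value)
```

The two approaches agree, as they always do when both use the same significance level.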

  20. Example: A sample of 200 individuals was tested for blood type, and the results are used to test the hypothesized distribution of blood types: At the 0.05 level of significance, is there any evidence to suggest the stated distribution is incorrect?

  21. Solution: 1. The Set-up: a. Population parameters of concern: The proportions: P(A), P(B), P(O), P(AB). b. The null and alternative hypotheses: H0: Blood type proportions are 0.41, 0.09, 0.46, 0.04. Ha: Blood type proportions are not 0.41, 0.09, 0.46, 0.04. 2. The Hypothesis Test Criteria: a. Assumptions: The 200 individuals tested form a random sample. b. Test statistic: χ²*, df = 4 - 1 = 3. c. Significance level: α = 0.05.
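Before computing the test statistic, the expected frequencies under H0 can be worked out from the sample size and the hypothesized proportions. The sketch below assumes the four proportions are listed in the same order as the parameters, that is A, B, O, AB; the variable names are illustrative.

```python
n = 200
# Assumed pairing of proportions with blood types, following the order A, B, O, AB.
hypothesized = {"A": 0.41, "B": 0.09, "O": 0.46, "AB": 0.04}

# The hypothesized proportions must describe a full distribution.
assert abs(sum(hypothesized.values()) - 1.0) < 1e-9

# Expected frequency for each cell: n times its hypothesized probability.
expected = {blood_type: n * p for blood_type, p in hypothesized.items()}
df = len(hypothesized) - 1
all_at_least_five = all(e >= 5 for e in expected.values())

for blood_type, e in expected.items():
    print(f"{blood_type}: expected {e:.0f}")
print("df =", df, "| every expected frequency >= 5:", all_at_least_five)
```

The smallest expected frequency is 8, for type AB, so the guideline that every expected frequency be at least 5 is met.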

  22. 3. The Sample Evidence: a. Sample information: Table given in the statement of the problem. b. Calculate the value of the test statistic: χ²* = 10.02

  23. 4. The Probability Distribution (Classical Approach): a. Critical value: χ²(3, 0.05) = 7.82 b. χ²* is in the critical region. 4. The Probability Distribution (p-Value Approach): a. The p-value: By computer: P = 0.0184. Table 8: 0.01 < P < 0.025 b. The p-value is smaller than the level of significance, α. 5. The Results: a. Decision: Reject H0. b. Conclusion: There is evidence to suggest the hypothesized proportions for blood types are incorrect.

  24. 11.3: Inference Concerning Contingency Tables • Contingency table: an arrangement of data into a two-way classification. • Data is sorted into cells, and the observed frequency in each cell is reported. • Contingency table involves two factors, or variables • Usual question: are the two variables independent or dependent?

  25. r × c Contingency Table: 1. r: number of rows; c: number of columns. 2. Used to test the independence of the row factor and the column factor. 3. Degrees of freedom: df = (r - 1)(c - 1). 4. n = grand total. 5. Expected frequency in the ith row and the jth column: $E_{i,j} = \frac{R_i \times C_j}{n}$. Each Ei,j should be at least 5. 6. R1, R2, . . . , Rr and C1, C2, . . . , Cc: marginal totals.

  26. Expected Frequencies for an r × c Contingency Table:

  27. Example: A random sample of registered voters was selected and each was asked his or her opinion on Proposal 129, a property tax reform bill. The distribution of responses is given in the table below. Test the hypothesis “political party is independent of opinion on Proposal 129.” Use α = 0.01.

  28. Solution: 1. The Set-up: a. Population parameters of concern: The independence of variables “political party” and “opinion on tax reform.” b. The null and the alternative hypotheses: H0: Opinion on property tax reform is independent of political party. Ha: Opinion on property tax reform is not independent of political party. 2. The Hypothesis Test Criteria: a. Assumptions: The information was obtained from a random sample in which each individual was classified according to political party and tax reform preference.

  29. b. Test statistic: χ²* with df = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4. c. Level of significance: α = 0.01. 3. The Sample Evidence: a. Sample information: Table given in the statement of the problem. b. Calculate the value of the test statistic: Table with observed frequencies, expected frequencies, and the test statistic given on the next slide.

  30. Contingency table showing sample results and expected values:

  31. 4. The Probability Distribution (Classical Approach): a. Critical value: χ²(4, 0.01) = 13.3 b. χ²* is in the critical region. 4. The Probability Distribution (p-Value Approach): a. The p-value: By computer: P = 0.0068. Table 8: 0.005 < P < 0.01 b. The p-value is smaller than the level of significance, α. 5. The Results: a. Decision: Reject H0. b. Conclusion: There is evidence to suggest that opinion on tax reform and political party are not independent.

  32. Note: Minitab output for the previous Example.
      Chi-Square Test
      Expected counts are printed below observed counts
                     Dem     Rep     Ind   Total
          1           34      11      12      57
                   23.98   15.33   17.69
          2           17      12      18      47
                   19.77   12.64   14.59
          3           10      16      15      41
                   17.25   11.03   12.72
      Total           61      39      45     145
      Chi-Sq = 4.188 + 1.224 + 1.830 +
               0.389 + 0.033 + 0.799 +
               3.046 + 2.242 + 0.407 = 14.156
      DF = 4, P-Value = 0.007
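The Minitab figures can be reproduced from the observed counts alone. The following Python sketch builds each expected count from the row total, column total, and grand total, then sums the nine contributions. Rows 1 to 3 are the opinion categories, whose labels this output does not show; the p-value line uses a closed form that is valid only for df = 4.

```python
import math

# Observed counts: rows 1-3 (opinion categories), columns Dem, Rep, Ind.
observed = [
    [34, 11, 12],
    [17, 12, 18],
    [10, 16, 15],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count in row i, column j: (row total i) * (column total j) / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2_star = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Right-tail area for df = 4 only: exp(-x/2) * (1 + x/2).
p_value = math.exp(-chi2_star / 2) * (1 + chi2_star / 2)

print(round(chi2_star, 3), df, round(p_value, 4))
```

The result agrees with the Minitab output to rounding, and the smallest expected count (about 11.03) is above 5, so the chi-square approximation guideline is satisfied.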

  33. Test for Homogeneity: 1. Another type of contingency table problem. 2. Used when one of the two variables is controlled by the experimenter so that the row (or column) totals are predetermined. 3. Hypothesis test: the distribution of proportions within rows (or columns) is the same for all rows (or columns). 4. May be thought of as a comparison of several multinomial experiments. 5. Test procedure for independence and homogeneity with contingency tables is the same.

  34. Example: A pharmaceutical company conducted an experiment to determine the effectiveness of three new cough suppressants. Each cough syrup was given to 100 random subjects. Is there any evidence to suggest the syrups act differently to suppress coughs? Use α = 0.05.

  35. Solution: 1. The Set-up: a. Population parameters of concern: The proportion of individuals who receive no relief, some relief, or total relief for each of the three cough syrups. b. The null and alternative hypotheses: H0: The proportion of individuals who receive various forms of relief is the same for all three cough syrups. Ha: The proportion of individuals who receive various forms of relief is not the same for all three cough syrups. (In at least one group the proportions are different from the others.)

  36. 2. The Hypothesis Test Criteria: a. Assumptions: The sample information was obtained using three random samples drawn from three separate populations in which each individual was classified according to cough suppressant and relief. b. Test statistic: χ²* with df = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4. c. Level of significance: α = 0.05. 3. The Sample Evidence: a. Sample information: Table given in the statement of the problem. b. Calculate the value of the test statistic:

  37. A portion of the Minitab output:
                       A       B       C   Total
          1           23      29      20      72
                   24.00   24.00   24.00
          2           60      56      50     166
                   55.33   55.33   55.33
          3           17      15      30      62
                   20.67   20.67   20.67
      Total          100     100     100     300
      Chi-Sq = 0.042 + 1.042 + 0.667 +
               0.394 + 0.008 + 0.514 +
               0.651 + 1.554 + 4.215 = 9.085
      DF = 4, P-Value = 0.059
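A homogeneity test may be thought of as a comparison of several multinomial experiments, and the sketch below follows that view. Each syrup's 100 subjects are treated as one multinomial experiment whose cell probabilities are the pooled row proportions, and the three resulting statistics are added. Rows 1 to 3 are the relief categories, which this output does not label; the variable names are illustrative.

```python
# Observed counts per syrup, in the row order 1, 2, 3 of the Minitab output.
observed = {
    "A": [23, 60, 17],
    "B": [29, 56, 15],
    "C": [20, 50, 30],
}

n_total = sum(sum(counts) for counts in observed.values())
n_rows = len(next(iter(observed.values())))

# Pooled proportion in each relief category, estimated from all syrups together.
pooled = [
    sum(counts[i] for counts in observed.values()) / n_total
    for i in range(n_rows)
]

# Treat each syrup as its own multinomial experiment with the pooled proportions.
contributions = {}
for syrup, counts in observed.items():
    n = sum(counts)
    contributions[syrup] = sum(
        (o - n * p) ** 2 / (n * p) for o, p in zip(counts, pooled)
    )

chi2_star = sum(contributions.values())
# The pooled proportions are estimated from the data, so df is (r - 1)(c - 1),
# not the sum of the per-syrup k - 1 values.
df = (n_rows - 1) * (len(observed) - 1)

for syrup, value in contributions.items():
    print(f"syrup {syrup}: {value:.3f}")
print(f"total: {chi2_star:.3f}, df = {df}")
```

The total matches the contingency-table statistic in the Minitab output to rounding, and syrup C accounts for the largest share of it.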

  38. 4. The Probability Distribution (Classical Approach): a. Critical value: χ²(4, 0.05) = 9.49 b. χ²* does not lie in the critical region. 4. The Probability Distribution (p-Value Approach): a. The p-value: By computer: P = 0.059. Table 8: 0.05 < P < 0.10 b. The p-value is larger than the level of significance, α. 5. The Results: a. Decision: Fail to reject H0. b. Conclusion: There is no evidence to suggest the three remedies act differently to suppress coughs.
