1. 14-1 SLIDES PREPARED
By
Lloyd R. Jaisingh Ph.D.
Morehead State University
Morehead KY
2. 14-2 Chapter 14
Chi-Square Procedures
3. 14-3 Outline Do I Need to Read This Chapter? You should read the Chapter if you would like to learn about:
14-1 Properties of the chi-square distribution.
14-2 The chi-square test for goodness-of-fit.
14-3 The chi-square test for independence.
14-4 Benford’s Law.
4. 14-4 Objectives To introduce you to the chi-square distribution.
To use the chi-square distribution to perform tests for goodness-of-fit and independence.
5. 14-5 Objectives To introduce you to Benford’s Law.
To introduce technology integration for chi-square tests.
6. 14-6 14-1 The Chi-Square (χ²) Distribution - Properties It is a continuous distribution.
It is not symmetric.
It is skewed to the right.
The distribution depends on the degrees of freedom, df = n – 1, where n is the sample size.
7. 14-7 14-1 The Chi-Square (χ²) Distribution - Properties The value of a χ² random variable is always nonnegative.
There are infinitely many χ² distributions, since each is uniquely defined by its degrees of freedom.
8. 14-8 14-1 The Chi-Square (χ²) Distribution - Properties For small sample sizes, the χ² distribution is strongly skewed to the right.
As n increases, the χ² distribution becomes more and more symmetrical.
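The shrinking skew can be made concrete. The sketch below relies on the standard result that a χ² distribution with df degrees of freedom has skewness √(8/df); the function name `chi2_skewness` is only an illustrative label, not something from the slides.

```python
import math

def chi2_skewness(df):
    """Skewness of a chi-square distribution with df degrees of freedom."""
    return math.sqrt(8.0 / df)

# Skewness falls toward 0 (perfect symmetry) as the degrees of freedom grow.
for df in (1, 2, 5, 10, 30, 100):
    print(f"df = {df:>3}: skewness = {chi2_skewness(df):.3f}")
```

The printed values decrease steadily, which matches the statement that the distribution becomes more symmetrical as n increases.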
9. 14-9 14-1 The Chi-Square (χ²) Distribution - Properties
10. 14-10 14-1 The Chi-Square (χ²) Distribution - Properties Since we will be using the χ² distribution for the tests in this chapter, we will need to be able to find critical values associated with the distribution.
11. 14-11 Quick Tip Extensive tables of critical values are available for use in solving confidence interval and hypothesis testing problems that are associated with the χ² distribution.
12. 14-12 14-1 The Chi-Square (χ²) Distribution - Properties Notation: χ²_{α, n−1}
Explanation of the notation χ²_{α, n−1}: χ²_{α, n−1} is the χ² value with n − 1 degrees of freedom such that an area of α lies to its right.
13. 14-13 14-1 The Chi-Square (χ²) Distribution - Properties
14. 14-14 14-1 The Chi-Square (χ²) Distribution - Properties Values for the random variable with the appropriate degrees of freedom can be obtained from the tables in the appendix of the text (Table 4).
Example: What is the value of χ²_{0.05, 10}?
15. 14-15 14-1 The Chi-Square (χ²) Distribution - Properties Solution: From Table 4 in the appendix, χ²_{0.05, 10} = 18.307. (Verify).
Example: What is the value of χ²_{0.95, 20}?
Solution: From Table 4 in the appendix, χ²_{0.95, 20} = 10.851. (Verify).
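One way to carry out the "Verify" step without the printed table is to compute the critical values numerically. The sketch below uses only Python's standard library: it evaluates the right-tail area with the closed-form expressions that exist for integer degrees of freedom, then finds the critical value by bisection. The helper names `chi2_sf` and `chi2_critical` are illustrative assumptions, not part of the text.

```python
import math

def chi2_sf(x, df):
    """Right-tail area P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum of Poisson-type terms.
        terms = (half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * sum(terms)
    # Odd df: complementary error function plus a finite correction sum.
    tail = math.erfc(math.sqrt(half))
    for k in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-half) * half ** (k - 0.5) / math.gamma(k + 0.5)
    return tail

def chi2_critical(alpha, df, tol=1e-10):
    """Value with right-tail area alpha, located by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:      # widen the bracket until it contains the answer
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:    # too much area to the right: move up
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(chi2_critical(0.05, 10), 3))  # compare with the table value 18.307
print(round(chi2_critical(0.95, 20), 3))  # compare with the table value 10.851
```

The first argument is the area to the right, matching the χ²_{α, n−1} notation, so an α of 0.95 gives a value in the left part of the distribution.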
16. 14-16 14-2 The Chi-Square test for Goodness of Fit Have you ever wondered whether a sample of observed data (frequency distribution or proportions) fits some pattern or distribution?
We should not expect the pattern to exactly fit a given distribution, so we can look for differences and make conclusions as to the goodness-of-fit of the data.
17. 14-17 14-2 The Chi-Square test for Goodness of Fit From the Figure on the next slide, one can clearly see that the pattern of the sample data does not quite follow the distribution of the population.
As a matter of fact, the sample data deviate quite severely from the population distribution.
18. 14-18 14-2 The Chi-Square test for Goodness of Fit
19. 14-19 14-2 The Chi-Square test for Goodness of Fit Hence one may intuitively conclude in this case that the sample data did not come from the population to which it is compared, because of the large deviations of the sample distribution from the population distribution.
20. 14-20 14-2 The Chi-Square test for Goodness of Fit From the Figure on the next slide, one can observe that the sample distribution follows the population distribution quite closely.
In this case, one may intuitively conclude that the sample data did come from the population to which it is compared because of the very small deviation of the sample distribution from the population distribution.
21. 14-21 14-2 The Chi-Square test for Goodness of Fit
22. 14-22 14-2 The Chi-Square test for Goodness of Fit Generally, we can assume that a good fit exists.
That is, we can propose a hypothesis that a specified theoretical distribution is appropriate to model the pattern.
Below is a summary of the tests for goodness-of-fit.
23. 14-23 14-2 The Chi-Square test for Goodness of Fit
24. 14-24 Quick Tip The chi-square goodness of fit test is always a right-tailed test.
25. 14-25 Quick Tip For the chi-square goodness-of-fit test, the expected frequencies should be at least 5.
When the expected frequency of a class or category is less than 5, this class or category can be combined with another class or category so that the expected frequency is at least 5.
26. 14-26 EXAMPLE Example: There are 4 TV sets that are located in the student center of a large university. At a particular time each day, four different soap operas (1, 2, 3, and 4) are viewed on these TV sets. The percentages of the audience captured by these shows during one semester were 25 percent, 30 percent, 25 percent, and 20 percent, respectively. During the first week of the following semester, 300 students are surveyed.
27. 14-27 EXAMPLE (Continued) (a) If the viewing pattern has not changed, what number of students is expected to watch each soap opera?
Solution: Based on the information, the expected values will be: 0.25 × 300 = 75, 0.30 × 300 = 90, 0.25 × 300 = 75, and 0.20 × 300 = 60.
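The same arithmetic can be sketched in a few lines of Python. The variable names (`shares`, `n_students`, `expected`) are illustrative; the percentages and the sample size of 300 come from the example.

```python
# Audience shares from the previous semester, keyed by soap opera number.
shares = {1: 0.25, 2: 0.30, 3: 0.25, 4: 0.20}
n_students = 300  # students surveyed in the following semester

# Expected count = hypothesized proportion x sample size.
expected = {show: share * n_students for show, share in shares.items()}

for show, count in expected.items():
    print(f"soap opera {show}: expected {count:.0f} students")

# Each expected frequency should be at least 5 for the chi-square test.
print(all(count >= 5 for count in expected.values()))
```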
28. 14-28 EXAMPLE (Continued) (b) Suppose that the actual observed numbers of students viewing the soap operas are given in the following table. Test whether these numbers indicate a change at the 1 percent level of significance.
29. 14-29 EXAMPLE (Continued) Solution: Given α = 0.01 and n = 4 categories, df = 4 – 1 = 3 and χ²_{0.01, 3} = 11.345. The observed and expected frequencies are given below.
30. 14-30 EXAMPLE (Continued) Solution (continued): The χ² test statistic is computed below.
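For reference, the goodness-of-fit statistic has the standard form below. The symbols O_i (observed frequency) and E_i (expected frequency) are labels chosen here; n is the number of categories, as in the solution on slide 29.

```latex
\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i},
\qquad df = n - 1
```

In this example n = 4 and the expected frequencies are 75, 90, 75, and 60. Because the test is right-tailed, the null hypothesis is rejected when the computed χ² exceeds χ²_{0.01, 3} = 11.345.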
31. 14-31 EXAMPLE (Continued) Solution (continued):
32. 14-32 EXAMPLE (Continued) Solution (continued):
33. 14-33 14-3 The Chi-Square test for Independence The chi-square independence test can be used to test for the independence between two variables.
34. 14-34 EXAMPLE Example: A survey was done by a car manufacturer concerning a particular make and model. A group of 500 potential customers were asked whether they purchased their current car because of its appearance, its performance rating, or its fixed price (no negotiating). The results, broken down by gender responses, are given on the next slide.
35. 14-35 EXAMPLE (Continued)
36. 14-36 EXAMPLE (Continued) One way of answering this question is to determine whether the criterion used in buying a car is independent of gender.
37. 14-37 EXAMPLE (Continued) That is, we can do a test for independence.
Thus the null hypothesis will be that the criterion used is independent of gender, while the alternative hypothesis will be that the criterion used is dependent on gender.
38. 14-38 Quick Tips When data are arranged in tabular form for the chi-square independence test, the table is called a contingency table.
Here the table on slide #35 has 2 rows and 3 columns, so we say we have a 2 by 3 (2 × 3) contingency table.
39. 14-39 Quick Tips The degrees of freedom for any contingency table are given by (number of rows – 1) × (number of columns – 1). In this example,
df = (2 – 1) × (3 – 1) = 2.
40. 14-40 EXAMPLE (Continued) In order to test for independence using the chi-square independence test, we must compute expected values under the assumption that the null hypothesis is true.
To find these expected values, we need to compute the row totals and the column totals.
41. 14-41 EXAMPLE (Continued) The table on the next slide shows the observed frequencies with the row and column totals.
These row and column totals are called marginal totals.
42. 14-42 EXAMPLE (Continued)
43. 14-43 EXAMPLE (Continued) Computation of the expected values (example):
The total for the first row (male) is 185, and the total for the first column (appearance) is 180. The expected value for the cell in the table where the first row (male) and first column (appearance) intersect will be (185 × 180)/500 = 66.6.
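The cell calculation and the degrees-of-freedom rule can be written as two small functions. This is a minimal sketch; the function names are illustrative, and only the totals quoted on the slide (185, 180, and 500) are used.

```python
def expected_cell(row_total, col_total, grand_total):
    """Expected count for one cell, assuming the two variables are independent."""
    return row_total * col_total / grand_total

def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for an r x c contingency table."""
    return (n_rows - 1) * (n_cols - 1)

# Male row total 185, appearance column total 180, 500 respondents in all.
print(expected_cell(185, 180, 500))   # 66.6
print(degrees_of_freedom(2, 3))       # 2
```

Repeating the first call for every row and column pair fills in the whole table of expected frequencies.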
44. 14-44 EXAMPLE (Continued) The table on the next slide shows the expected frequencies with the marginal totals.
45. 14-45 EXAMPLE (Continued)
46. 14-46 EXAMPLE (Continued) Solution (continued): The χ² test statistic is computed in the same manner as was done for the goodness-of-fit test.
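A possible sketch of the full calculation is shown below: it builds the expected count for every cell from the marginal totals and sums (observed − expected)² / expected over all cells. The function name `chi_square_independence` is an assumption, and the small table in the demonstration is a toy example, NOT the survey data from slide 35 (which is not reproduced in this transcript).

```python
def chi_square_independence(observed):
    """Return (statistic, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence: row total x column total / n.
            exp = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - exp) ** 2 / exp

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df

# Toy 2 x 2 counts for illustration only (not the car survey data).
toy_table = [[10, 20],
             [20, 10]]
stat, df = chi_square_independence(toy_table)
print(round(stat, 3), df)
```

To apply it to the example, pass the observed table as two rows (male, female) of three counts each (appearance, performance rating, fixed price).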
47. 14-47 EXAMPLE (Continued) Solution (continued):
48. 14-48 EXAMPLE (Continued) Solution (continued): Diagram showing the rejection region.
49. 14-49 14-4 Benford’s Law Frank Benford, in the 1930s, noticed that logarithm tables (these were used by scientists long before the common use of computers and calculators) tended to be worn out on the early pages where the numbers started with the digit 1.
50. 14-50 14-4 Benford’s Law Based on this observation and many others, he discovered that more numbers in the real world started with the digit 1 rather than with 2, and that more started with the digit 2 rather than with 3, and so on.
He later published a formula which describes the proportion of times a number will begin with the digit 1, 2, 3, etc.
51. 14-51 14-4 Benford’s Law This formula is now called Benford’s Law.
The Table on the next slide shows the distribution of the proportions, to three decimal places, for the leading digits of numbers based on Benford’s Law.
52. 14-52 14-4 Benford’s Law
53. 14-53 14-4 Benford’s Law
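The formula usually stated for Benford's Law is P(d) = log10(1 + 1/d) for leading digits d = 1, ..., 9. The sketch below tabulates these proportions to three decimal places; it is assumed, not confirmed by this transcript, that the result matches the slide's table (the value 0.176 for digit 2 is used later in the example). The function name `benford_proportion` is illustrative.

```python
import math

def benford_proportion(d):
    """Proportion of numbers whose leading digit is d, per Benford's Law."""
    return math.log10(1 + 1 / d)

# Proportions for leading digits 1 through 9, rounded to three decimals.
table = {d: round(benford_proportion(d), 3) for d in range(1, 10)}

for digit, proportion in table.items():
    print(f"leading digit {digit}: {proportion:.3f}")
```

The proportions decrease from digit 1 to digit 9, in line with the observation that more numbers start with 1 than with 2, more with 2 than with 3, and so on.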
54. 14-54 14-4 Benford’s Law Example: Students who attend college and apply for student loans must submit a FAFSA (Free Application for Federal Student Aid) form. Part of the required information is the annual income of the parent or parents. A sample of 3,633 forms was drawn from a college’s records, and the proportions, to three decimal places, of the leading digits of the parents’ total annual income were recorded. This information is presented on the next slide.
55. 14-55 14-4 Benford’s Law Test at the 5 percent significance level whether the distribution of the first digits of the parents’ reported total salaries follows Benford’s Law.
56. 14-56 14-4 Benford’s Law Solution: Plots of the proportions of the leading digits for both Benford’s Law and the parents’ salaries are shown below.
57. 14-57 14-4 Benford’s Law Solution (continued): The Table on the next slide shows the computations needed to compute the χ² test statistic.
The value of the test statistic is equal to 507.527.
To obtain the expected frequencies based on Benford’s Law one should multiply the total of 3,633 by Benford’s proportions.
For example, from the table, the expected frequency value of 639.408 is obtained from 3,633×0.176 = 639.408, etc.
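The expected frequencies for all nine digits can be generated the same way. In the sketch below, the list `benford` holds the three-decimal proportions given by log10(1 + 1/d); these are assumed to match the slide's table, of which only the 0.176 entry is quoted in the text.

```python
n_forms = 3633  # FAFSA forms in the sample

# Benford proportions for leading digits 1..9, to three decimal places (assumed).
benford = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]

# Expected frequency = sample size x Benford proportion.
expected = [round(n_forms * p, 3) for p in benford]

for digit, freq in enumerate(expected, start=1):
    print(f"leading digit {digit}: expected frequency {freq:.3f}")
```

The entry for digit 2 reproduces the 639.408 quoted above, and every expected frequency is far above the minimum of 5 required by the test.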
58. 14-58 14-4 Benford’s Law
59. 14-59 14-4 Benford’s Law
60. 14-60 EXAMPLE (Continued) Solution (continued): Diagram showing the rejection region.
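The rejection region can also be located numerically. The sketch below assumes nine leading-digit categories, so df = 9 – 1 = 8, and uses the closed-form right-tail area that holds for even degrees of freedom; the helper names are illustrative. The test statistic 507.527 is the value reported in the solution.

```python
import math

def chi2_sf_even(x, df):
    """Right-tail area for a chi-square variable with EVEN df (closed form)."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

def critical_value(alpha, df, tol=1e-10):
    """Value with right-tail area alpha, located by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even(hi, df) > alpha:   # widen the bracket
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_sf_even(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

alpha = 0.05
df = 9 - 1             # nine leading digits, so eight degrees of freedom (assumed)
statistic = 507.527    # test statistic reported in the solution

critical = critical_value(alpha, df)
reject = statistic > critical
print(f"critical value = {critical:.3f}")
print("reject H0" if reject else "fail to reject H0")
```

Under these assumptions the statistic lies far inside the right-tail rejection region, so the hypothesis that the leading digits follow Benford's Law would be rejected at the 5 percent level.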