Chapter 11 Chi-Square and F Distributions
Table of Contents
11.1 Chi-Square: Tests of Independence and of Homogeneity
11.2 Chi-Square: Goodness of Fit
11.3 Testing and Estimating a Single Variance or Standard Deviation
11.1 Chi-Square: Tests of Independence and of Homogeneity

With a χ² test for independence, we will always be investigating whether two variables A and B covering the same population are independent. Our null and alternate hypotheses will always be of the form
H0: A and B are independent
H1: A and B are not independent
Also, variables A and B will always either be categorical data, or data that can be arranged into categories.
Contingency table: Size = Rows × Columns = 3 × 3
This contingency table contains the observed count by category. To test whether the two variables (keyboard arrangement and learning time) are independent of each other, we will need to generate an expected contingency table for comparison.
To calculate (by hand) each value in the expected contingency table, we use the following formula:
E = (Row total × Column total) / Sample size
If you will be using the TI-83/84 calculator, then the expected contingency table will be generated automatically for you.
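The same formula can be sketched outside the calculator. Here is a minimal Python version (not part of the text’s TI-83/84 method) that builds the expected table from the observed counts of Guided Exercise 2:

```python
# Expected contingency table: E = (row total)(column total) / sample size.
# Observed counts are the keyboard-arrangement data from Guided Exercise 2.
observed = [
    [25, 30, 25],
    [30, 71, 19],
    [35, 49, 16],
]

row_totals = [sum(row) for row in observed]        # [80, 120, 100]
col_totals = [sum(col) for col in zip(*observed)]  # [90, 150, 60]
n = sum(row_totals)                                # sample size: 300

expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print(row)  # matches the matrix the calculator stores in [B]
```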
Calculator function
The TI-83/84 calculator has a function to perform a χ² test of independence: STAT|TESTS|χ²-Test…
Before using this function, we have to enter our observed contingency table into a matrix. To do this, use MATRX|EDIT|… (press [2nd][x⁻¹] to reach the MATRX menu) and then select a matrix to edit.
Guided Exercise 2
Let’s enter our observed contingency table into a matrix, let’s say matrix A: MATRX|EDIT|[A] [ENTER]
The first set of data that the calculator asks for is the matrix size. Since the contingency table is 3 × 3, we enter: [3] [ENTER] [3] [ENTER]
We now have a 3 × 3 matrix into which we can enter our observed contingency table.
Guided Exercise 2
When entering data into a matrix, the calculator will only allow you to go left-to-right, completing each row before going on to the next. The completed matrix should look like
MATRIX[A] 3×3
[25 30 25]
[30 71 19]
[35 49 16]
Guided Exercise 2
Now let’s run STAT|TESTS|χ²-Test…
χ²-Test
  Observed:[A]
  Expected:[B]
  Calculate
Since our observed contingency table is in matrix A, we select it by entering MATRX|NAMES|[A].
Guided Exercise 2
Running Calculate produces
χ²-Test
  χ²=13.31583333
  p=.0098313728
  df=4
The calculator generates the expected matrix itself, so it doesn’t matter which matrix we give in the Expected field; any information already in that matrix will be written over. We will return to this output momentarily. For now, let’s concentrate on matrix B.
Guided Exercise 2
Matrix B can be viewed by going to MATRX|EDIT|[B]. This matrix should look like
MATRIX[B] 3×3
[24 40 16]
[36 60 24]
[30 50 20]
As you can see in the text, this is the calculated expected matrix.
Guided Exercise 2
Q: What does this expected contingency table represent?
A: It represents the situation where keyboard arrangement and learning times are perfectly independent.
Guided Exercise 2
Q: What exactly does perfectly independent mean?
A: It means that the probability of one event from one variable does not change when given any event from the other variable.
Guided Exercise 2
E.g.,
P(21–40 h) = 90 / 300 = 0.30
P(21–40 h given A) = 24 / 80 = 0.30
P(21–40 h given B) = 36 / 120 = 0.30
P(21–40 h given Std) = 30 / 100 = 0.30
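These equalities can be double-checked with a quick Python sketch:

```python
# In the expected table, the probability of the 21–40 h learning-time
# category is the same whether or not we condition on a keyboard
# arrangement — exactly what "perfectly independent" means.
p_overall   = 90 / 300   # P(21–40 h)
p_given_a   = 24 / 80    # P(21–40 h given keyboard A)
p_given_b   = 36 / 120   # P(21–40 h given keyboard B)
p_given_std = 30 / 100   # P(21–40 h given standard keyboard)

print(p_overall == p_given_a == p_given_b == p_given_std)  # True
```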
Guided Exercise 2
We could carry through this exercise to show that every time category is independent of every keyboard arrangement, and vice versa. The expected contingency table is then the hypothetical table that represents perfect independence between two variables.
The χ²-Test function returned more than just the expected contingency table. It also told us that χ² ≈ 13.316, which is shown in Guided Exercise 3 using the formula given on page 580. On page 581, we are told that the degrees of freedom used in the χ² distribution are defined as (Rows – 1)(Columns – 1), in this case (3 – 1)(3 – 1) = 4. This was returned by the χ²-Test function, as well as confirmed in Guided Exercise 4. When testing for independence, we always use the χ² distribution as a right-tailed test. This is illustrated in Figure 11-3 on page 582.
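For readers who want to see the statistic computed directly rather than taken from the calculator, here is a short Python sketch of χ² = Σ(O – E)²/E with d.f. = (Rows – 1)(Columns – 1), using the observed and expected tables from Guided Exercise 2:

```python
# χ² statistic for the test of independence: sum of (O - E)^2 / E over
# all cells of the observed and expected contingency tables.
observed = [[25, 30, 25], [30, 71, 19], [35, 49, 16]]
expected = [[24, 40, 16], [36, 60, 24], [30, 50, 20]]

chi_sq = sum((o - e) ** 2 / e
             for o_row, e_row in zip(observed, expected)
             for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi_sq, 8), df)  # 13.31583333 4, matching the χ²-Test output
```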
The final and most crucial value returned by the χ²-Test function is the P-value, which tells us whether to reject H0 when compared with α. In this case, we are given α = 0.05 in the first sentence after Guided Exercise 4. The calculator returned a P-value of 0.0098. Since P < α, we reject H0 and conclude instead that keyboard arrangement and learning time are not independent. In other words, different keyboard arrangements appear to produce different learning time distributions. Had we not rejected H0, we would have concluded that learning time distributions are the same for each keyboard arrangement.
Test of homogeneity
With a χ² test of homogeneity, we will always be investigating whether different populations (variable A) proportionally show the same distribution across a variable (variable B). Our null and alternate hypotheses will always be of the form
H0: The populations have the same proportions of respective characteristics
H1: The populations have different proportions of respective characteristics
Note
Although the tests for homogeneity and independence have different hypothesis setups, the computations are the same.
Example 2
First, enter the contingency table into a matrix, let’s say matrix [C]. Next run STAT|TESTS|χ²-Test…
χ²-Test
  Observed:[C]
  Expected:[D]
  Calculate
χ²-Test
  χ²=16.05913006
  p=.0011027654
  df=3
Since P = 0.0011, which is less than α = 0.01, we reject the null hypothesis (that pet preference for males and females is proportionally the same), and instead conclude that pet preference is proportionally different for males and females.
Note
The Multinomial Experiments section is not part of this course.
11.2 Chi-Square: Goodness of Fit

With goodness of fit tests, the null and alternate hypotheses are always of the form
H0: The population fits the given distribution
H1: The population does not fit the given distribution
Nuts & bolts
Just as with the test of independence, the goodness of fit test has observed categorical data.
Just as with the test of independence, we will be comparing the observed data distribution against an expected distribution.
Unlike the test of independence, we will be given an expected relative frequency distribution rather than calculating an expected contingency table ourselves.
Unlike the test of independence, there is no direct calculator function to carry out all our computations for us. Nonetheless, the calculator method is still pretty simple.
Nuts & bolts
First, enter the observed frequency distribution into a list. For brevity, let’s call this OList.
Next, find the sample size n for the observed data. Using this, enter each expected relative frequency times n into another list, let’s call it EList.
Outside of the list editor, run sum((OList–EList)²/EList). The value returned is the χ² value for the test. (Note: as in Chapter 10, you can find the sum() function through LIST|MATH|sum.)
Nuts & bolts
Then run χ²cdf(χ²,1E99,k–1), where χ² is the value calculated in the previous step and k is the number of categories. The value returned is the P-value for the test. You can access the χ²cdf() function through DISTR|DISTR|χ²cdf.
As an example, let’s enter the observed data from Table 11-9 on page 593 into L1. This is our OList. Next, let’s enter our expected values into L2. Instead of entering each conveniently calculated value, let’s enter them each as relative frequency times n; for example, the first entry in L2 would be 0.04*500. Now let’s run sum((OList–EList)²/EList), which in this case is sum((L1–L2)²/L2). This returns a χ² value of 14.1538. Since we have 5 categories, we now run χ²cdf(14.1538,1E99,5–1), which returns a P-value of 0.0068.
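If you want to check the χ²cdf step without a TI-83/84, the right-tail probability has a simple closed form when the degrees of freedom are even. The sketch below is a hand-rolled helper (not a library call) and reproduces the P-value for the χ² = 14.1538 just computed, with 5 – 1 = 4 degrees of freedom:

```python
import math

def chi_sq_right_tail(x, df):
    """Right-tail χ² probability P(χ² > x), valid for even df only.

    For even df, P(χ² > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    (General df needs the incomplete gamma function, e.g. from scipy.)
    """
    assert df % 2 == 0, "closed form requires even df"
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(df // 2))

p = chi_sq_right_tail(14.1538, 4)   # same as χ²cdf(14.1538, 1E99, 4)
print(round(p, 4))                  # 0.0068
```

The same helper reproduces the independence test’s P-value (χ² = 13.31583333, df = 4 gives ≈ 0.0098).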
The book does not establish the level of significance α = 0.01 until the fourth full paragraph on page 594. We can see from this that P < α, so we reject the null hypothesis that the observed present population has the same distribution as last year, and instead conclude that the present population has a different distribution.
11.3 Testing and Estimating a Single Variance or Standard Deviation

Theorem 11.1: If we have a normal population with variance σ² and a random sample of n measurements is taken from this population with sample variance s², then
χ² = (n – 1)s² / σ²
has a chi-square distribution with degrees of freedom d.f. = n – 1.
Note
The χ² table on page A25 gives χ² values corresponding to right-sided areas. To find a χ² value corresponding to a left-sided area, look up the right-sided area equal to 1 – (left-sided area).
We will use this knowledge of obtaining left-sided χ² values later in this section. For now, we will be testing variances according to the following method.
Method for testing σ²
1. Calculate χ² using the formula from Theorem 11.1.
2. Calculate the P-value using:
   For a right-tailed test: χ²cdf(χ²,1E99,n–1)
   For a left-tailed test: χ²cdf(0,χ²,n–1)
Example 4
… the manager knows that the standard deviation of waiting times is 7 minutes.
The old standard deviation is σ = 7, so the old variance is σ² = 49.
H0: σ² = 49
H1: σ² < 49
Example 4
H0: σ² = 49
H1: σ² < 49
1. Calculate χ² using the formula from Theorem 11.1:
   (25–1)*5²/49 → 12.24489796
2. Calculate the P-value. Since this is a left-tailed test, run χ²cdf(0,χ²,n–1):
   χ²cdf(0,12.24,25–1) → 0.0229331185
Since P < α (0.0229 < 0.05), we reject H0 and choose H1: σ² < 49.
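Example 4 can also be sketched in Python. The helper below uses the even-df closed form for the right tail (a hand-rolled stand-in for χ²cdf, not a library function); the left-tail P-value is then 1 minus the right tail:

```python
import math

def chi_sq_right_tail(x, df):
    # Right-tail P(χ² > x); closed form valid for even df only.
    assert df % 2 == 0
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(df // 2))

n, s, sigma_sq = 25, 5, 49                 # Example 4: H0 variance is 49
chi_sq = (n - 1) * s ** 2 / sigma_sq       # 12.24489796
p = 1 - chi_sq_right_tail(chi_sq, n - 1)   # left-tailed test

print(round(chi_sq, 8))
# p comes out ≈ 0.023; the calculator's 0.0229331185 used χ² rounded to 12.24.
print(p < 0.05)  # True: reject H0
```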
Note
Sometimes problems will ask you to test a standard deviation instead of a variance. Since the test is set up for variances, always convert such a problem into a test of the corresponding variance.
Note
We will not be performing two-tailed tests for variance in this course.
How to find a confidence interval for σ² and σ
Let x be a random variable with a normal distribution and an unknown population standard deviation σ. Take a random sample of size n from the x distribution and compute the sample standard deviation s.
Then a confidence interval for the population variance σ² is
(n – 1)s² / χ²_U < σ² < (n – 1)s² / χ²_L
and a confidence interval for the population standard deviation σ is
√[(n – 1)s² / χ²_U] < σ < √[(n – 1)s² / χ²_L]
where
c = confidence level (0 < c < 1)
n = sample size (n ≥ 2)
χ²_U = chi-square value from Table 7 of Appendix II using d.f. = n – 1 and right-tail area = (1 – c)/2
χ²_L = chi-square value from Table 7 of Appendix II using d.f. = n – 1 and right-tail area = (1 + c)/2
Note
When inserting the χ² values into the confidence interval formulas, do not square them; the tabled χ² values are themselves already the squared quantities.
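Putting the interval formulas into code may help. Here is a Python sketch reusing n = 25 and s = 5 from Example 4 with c = 0.95; the two χ² values are my own readings from a standard χ² table for d.f. = 24, not values from the text, so treat them as assumed inputs:

```python
import math

n, s, c = 25, 5, 0.95   # sample size and sample sd from Example 4

# Assumed table lookups for d.f. = n - 1 = 24 (replace with your table's values):
chi_sq_U = 39.36   # right-tail area (1 - c)/2 = 0.025
chi_sq_L = 12.40   # right-tail area (1 + c)/2 = 0.975

# Per the note above, the χ² values go in as-is — they are never squared again.
var_lo = (n - 1) * s ** 2 / chi_sq_U   # lower bound for sigma^2
var_hi = (n - 1) * s ** 2 / chi_sq_L   # upper bound for sigma^2

print(f"{var_lo:.2f} < sigma^2 < {var_hi:.2f}")
print(f"{math.sqrt(var_lo):.2f} < sigma < {math.sqrt(var_hi):.2f}")
```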