
The χ² (chi-squared) test for independence

The χ² (chi-squared) test for independence. Joan Ridgway. A random sample of 200 teachers in higher education, secondary schools and primary schools gave the numbers of men and women in each sector.

Presentation Transcript


  1. The χ² (chi-squared) test for independence. Joan Ridgway

  2. A random sample of 200 teachers in higher education, secondary schools and primary schools gave the numbers of men and women in each sector. We might want to find out whether or not there is an association between ‘age-group taught’ and ‘gender’. One way of finding out is to perform a χ² (chi-squared) test for independence. To set up the test, we first state a null hypothesis, H0, and an alternative hypothesis, H1. H0 always states that the two variables are independent, and H1 always states that they are related. In this case, H0 could be “The age-group taught is independent of gender”. H1 could be “There is an association between age-group taught and gender.”

  3. We put the data into a table. The elements in the table are our observed data and the table is known as a contingency table.

  4. We then add the totals to the contingency table. Row totals: 80 and 120. Column totals: 34, 94 and 72. Total sample size: 200.

  5. From the observed data we can calculate the expected frequencies. The expected frequency for each cell will be: (row total × column total) / total sample size. For the first cell this is 80 × 34 / 200 = 13.6, and for the second cell of the same row it is 80 × 94 / 200 = 37.6.

  6. The expected frequency for each cell will be: (row total × column total) / total sample size. In fact, for this table we only need to work out two of the expected values, and the rest will follow from the totals. This gives us the number of degrees of freedom for this table: it is 2. Expected frequencies, first row: 13.6, 37.6, 28.8. Second row: 20.4, 56.4, 43.2.
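The expected-frequency rule can be checked in a few lines of Python. This is a minimal sketch that uses only the totals quoted in the slides (row totals 80 and 120, column totals 34, 94 and 72); the function and variable names are illustrative and not part of the original presentation.

```python
def expected_frequencies(row_totals, col_totals):
    """Expected count for each cell: row total x column total / sample size."""
    n = sum(row_totals)  # total sample size (the column totals add up to the same value)
    return [[r * c / n for c in col_totals] for r in row_totals]


row_totals = [80, 120]      # row totals from the contingency table
col_totals = [34, 94, 72]   # column totals from the contingency table

expected = expected_frequencies(row_totals, col_totals)
for row in expected:
    print([round(v, 1) for v in row])
# prints [13.6, 37.6, 28.8] and then [20.4, 56.4, 43.2]
```

The two printed rows match the expected frequencies shown on the slide.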

  7. In fact, for this table we only need to work out two of the expected values, and the rest will follow from the totals. This tells us that the number of degrees of freedom for this table is 2. You can always find the degrees of freedom by going back to the original table (without the totals): cross off one column and one row, and the number of cells left is the degrees of freedom. As a formula, df = (No. of columns – 1) × (No. of rows – 1). Here df = (3 – 1) × (2 – 1) = 2.
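To see why two expected values are enough for this table, the remaining four can be recovered from the totals by subtraction. The sketch below assumes the two values worked out directly are the first two cells of the first row (13.6 and 37.6); the variable names are illustrative.

```python
row_totals = [80, 120]
col_totals = [34, 94, 72]

# The two expected values worked out directly from the formula.
first_row = [13.6, 37.6]

# The last cell of the first row follows from the row total ...
first_row.append(row_totals[0] - sum(first_row))
# ... and every cell of the second row follows from its column total.
second_row = [c - e for c, e in zip(col_totals, first_row)]

# Degrees of freedom: (No. of columns - 1) x (No. of rows - 1)
df = (len(col_totals) - 1) * (len(row_totals) - 1)

print([round(v, 1) for v in first_row])   # [13.6, 37.6, 28.8]
print([round(v, 1) for v in second_row])  # [20.4, 56.4, 43.2]
print(df)                                 # 2
```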

  8. Now we are ready to calculate the χ² value from the contingency table of observed data and the table of expected frequencies, using the formula: χ²calc = Σ (fo – fe)² / fe, where fo is the observed value, fe is the expected value, and the sum is taken over every cell of the table. Finally, use the table of critical values at the back of your formulae booklet. If the χ²calc value is less than the critical value, we accept H0, the null hypothesis. If the χ²calc value is more than the critical value, we do not accept the null hypothesis, so we accept H1. In this case the χ²calc value is 11.3, and the critical value at 5% for df = 2 is 5.991. So we do not accept H0, the null hypothesis. There is an association between age-group taught and gender.
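The formula and the comparison with the critical value can be sketched in Python as follows. The observed counts are not reproduced in this transcript, so the sketch does not recompute 11.3; it takes that value from the slide. `chi_squared` and `critical_value_df2` are hypothetical helper names, and the critical-value shortcut is valid only for df = 2, where the probability of exceeding x is exp(–x/2).

```python
import math


def chi_squared(observed, expected):
    """Sum of (fo - fe)^2 / fe over every cell of the table."""
    return sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )


def critical_value_df2(significance):
    """Critical value for df = 2 only, where P(chi2 > x) = exp(-x / 2)."""
    return -2 * math.log(significance)


chi2_calc = 11.3                     # value quoted in the slides
critical = critical_value_df2(0.05)  # about 5.991, as in the booklet table
decision = "do not accept H0" if chi2_calc > critical else "accept H0"
print(round(critical, 3), decision)
```

With the observed table to hand, `chi_squared(observed, expected)` would be called with two lists of rows of the same shape.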

  9. If the χ²calc value is less than the critical value, we accept H0, the null hypothesis. If the χ²calc value is more than the critical value, we do not accept the null hypothesis, so we accept H1. Equivalently, if the p-value is less than the significance level, we do not accept H0, the null hypothesis, so we accept H1. If the p-value is more than the significance level, we accept the null hypothesis.
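The p-value rule can be illustrated with the teachers example. This is a small sketch that relies on a shortcut valid only for 2 degrees of freedom, where the p-value is exp(–χ²calc / 2); `p_value_df2` is an illustrative name, and 11.3 is the χ²calc value quoted in the slides.

```python
import math


def p_value_df2(chi2_calc):
    """P(chi2 >= chi2_calc) for 2 degrees of freedom: exp(-chi2_calc / 2)."""
    return math.exp(-chi2_calc / 2)


significance = 0.05
p = p_value_df2(11.3)      # roughly 0.0035
print(p < significance)    # True, so H0 is not accepted
```

Both rules lead to the same decision: χ²calc is above the critical value exactly when the p-value is below the significance level.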

  10. You can do all this on the GDC. Enter the data into a matrix: press MATRIX, choose [EDIT] and press ENTER. Enter the size of your matrix; in this case 2 x 3 (2 rows, 3 columns). Enter your data, pressing ENTER after every value. Then press STAT, choose [TESTS], scroll up to find χ² and press ENTER. You will now see where your table of expected values will be stored; change it if you wish. Otherwise scroll down to Calculate and press ENTER. χ² is given to you, p is the probability (the p-value) and df is the degrees of freedom. To see the table of expected values, go back to MATRIX and press ENTER. Finally, use the table of critical values at the back of your formulae booklet. If the χ²calc value is less than the critical value, we accept the null hypothesis. If the χ²calc value is more than the critical value, we do not accept the null hypothesis, so we accept H1.

  11. Suppose we collect data on the favourite colour of car for men and women. We may want to find out whether favourite colour of car and gender are independent or related. One way of finding out is to perform a χ² (chi-squared) test for independence. To set up the test, we first state a null hypothesis, H0, and an alternative hypothesis, H1. H0 always states that the two variables are independent, and H1 always states that they are related. In this case, H0 could be “The favourite colour of car is independent of gender”. H1 could be “There is an association between favourite colour of car and gender.”

  12. The contingency table with totals. Gender totals: 130 each. Colour totals: 96, 58, 55 and 51. Total sample size: 260.

  13. From the observed data we can calculate the expected frequencies. The expected frequency for each cell will be: (row total × column total) / total sample size. For the first three cells this gives 48, 29 and 27.5; for example, 130 × 96 / 260 = 48.

  14. The expected frequency for each cell will be: (row total × column total) / total sample size. In fact, for this table we only need to work out three of the expected values, and the rest will follow from the totals. This gives us the number of degrees of freedom for this table: it is 3. Expected frequencies: 48, 29, 27.5 and 25.5 in each row.

  15. In fact, for this table we only need to work out three of the expected values, and the rest will follow from the totals. This tells us that the number of degrees of freedom for this table is 3. You can always find the degrees of freedom by going back to the original table (without the totals): cross off one column and one row, and the number of cells left is the degrees of freedom. As a formula, df = (No. of columns – 1) × (No. of rows – 1). Here df = (4 – 1) × (2 – 1) = 3.

  16. Now we are ready to calculate the χ² value from the contingency table of observed data and the table of expected frequencies, using the formula: χ²calc = Σ (fo – fe)² / fe, where fo is the observed value, fe is the expected value, and the sum is taken over every cell of the table. Finally, use the table of critical values at the back of your formulae booklet. If the χ²calc value is less than the critical value, we accept H0, the null hypothesis. If the χ²calc value is more than the critical value, we do not accept the null hypothesis, so we accept H1. In this case the χ²calc value is 6.13, and the critical value at 5% for df = 3 is 7.815. So we accept H0, the null hypothesis. There is no association between favourite colour of car and gender.
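The conclusion for the car-colour example can be cross-checked with a p-value instead of the booklet table. The sketch below is one possible way to compute the upper-tail probability of the χ² distribution for a whole number of degrees of freedom using only Python's standard library; `chi2_p_value` is a hypothetical helper, not something taken from the presentation or from the GDC.

```python
import math


def chi2_p_value(x, df):
    """Upper-tail probability P(chi2 >= x) for a whole number of degrees of freedom."""
    y = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # exact result for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # exact result for df = 1
    # Step up two degrees of freedom at a time until df is reached.
    while a < df / 2:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q


p = chi2_p_value(6.13, 3)                 # chi2_calc and df from the slides
print(round(p, 3))                        # about 0.105
print(p > 0.05)                           # True, so H0 is accepted
print(round(chi2_p_value(7.815, 3), 3))   # about 0.05 at the critical value
```

The p-value is above the 5% significance level, which agrees with χ²calc = 6.13 being below the critical value 7.815.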
