
Tutorial: Chi-Square Distribution

Presented by: Nikki Natividad. Course: BIOL 5081 - Biostatistics.


Presentation Transcript


  1. Tutorial: Chi-Square Distribution. Presented by: Nikki Natividad. Course: BIOL 5081 - Biostatistics

  2. Purpose • To analyze discontinuous (categorical or binned) data in which a number of subjects fall into categories • We want to compare our observed data to what we expect to see: are the differences due to chance, or due to an association? • When can we use the Chi-Square Test? • Testing the outcome of Mendelian crosses • Testing independence: is one factor associated with another? • Testing a population for expected proportions

  3. Assumptions: • 2 or more categories • Independent observations • A sample size of at least 10 • Random sampling • All observations must be used • For the test to be accurate, the expected frequency in each category should be at least 5

  4. Conducting Chi-Square Analysis • Make a hypothesis based on your basic biological question • Determine the expected frequencies • Create a table with observed frequencies, expected frequencies, and chi-square values using the formula $(O - E)^2 / E$ for each category • Find the degrees of freedom: $(c-1)(r-1)$ for a table with $c$ columns and $r$ rows • Find the critical value (the chi-square statistic from the table) in the Chi-Square Distribution table • If the critical value > your calculated chi-square value, you do not reject your null hypothesis; if the calculated value is larger, you reject it.
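
The steps above can be sketched in Python. This is a minimal illustration rather than the presenter's SAS workflow: the function name `chi_square_statistic` and the counts are hypothetical, chosen only to show the arithmetic, and the table value 5.991 is the one quoted later in the slides for α = 0.05 with 2 degrees of freedom.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories (not taken from the slides).
observed = [18, 22, 20]
# Equal expected proportions: the total is split evenly across the categories.
expected = [sum(observed) / len(observed)] * len(observed)

statistic = chi_square_statistic(observed, expected)
critical_value = 5.991  # table value for alpha = 0.05 and 2 degrees of freedom

# Reject the null hypothesis only when the calculated value exceeds the table value.
reject_null = statistic > critical_value
print(statistic, reject_null)
```

With these made-up counts the calculated value stays well below the table value, so the null hypothesis would not be rejected.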

  5. Example 1: Testing for Proportions HO: Horned lizards eat equal amounts of leaf cutter, carpenter and black ants. HA: Horned lizards eat more of one species of ant than the others. $\chi^2 = \sum (O - E)^2 / E$. Calculate degrees of freedom: with a single row of 3 categories, df = 3 - 1 = 2. At a significance level of your choice (e.g. α = 0.05, i.e. 95% confidence), look up the critical value (the chi-square statistic) in a chi-square distribution table.

  6. Example 1: Testing for Proportions $\chi^2_{\alpha=0.05} = 5.991$ (df = 2)

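  7. Example 1: Testing for Proportions Chi-square statistic from the table (critical value): $\chi^2 = 5.991$. Our calculated value: $\chi^2 = 1.90$. *If the chi-square statistic from the table > your calculated value, then you do not reject your null hypothesis: the difference between observed and expected is consistent with chance. If the calculated value is larger, there is a significant difference that is not due to chance. 5.991 > 1.90 ∴ We do not reject our null hypothesis.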

  8. SAS: Example 1 (annotations on the code shown on the slide) • Included to format the table • Define your data • Indicate what you want in your output

  9. SAS: Example 1

  10. SAS: What does the p-value mean? “The exact p-value for a nondirectional test is the sum of probabilities for the table having a test statistic greater than or equal to the value of the observed test statistic.” • High p-value: if the null hypothesis is true, there is a high probability of a test statistic ≥ the observed test statistic. Do not reject the null hypothesis. • Low p-value: if the null hypothesis is true, there is a low probability of a test statistic ≥ the observed test statistic. Reject the null hypothesis.

  11. SAS: Example 1 High probability that Chi-Square statistic > our calculated chi-square statistic. We do not reject our null hypothesis.
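
For 2 degrees of freedom, the upper-tail probability of the chi-square distribution has the closed form exp(-x/2), so the p-value for the calculated value 1.90 from Example 1 can be checked by hand. The sketch below assumes the SAS output on the slide (not reproduced in this transcript) reports this same upper-tail probability; `p_value_df2` is an illustrative name, not a SAS or library function.

```python
import math


def p_value_df2(chi_square):
    """Upper-tail probability P(X >= chi_square) for a chi-square variable with 2 df."""
    if chi_square < 0:
        raise ValueError("a chi-square value cannot be negative")
    return math.exp(-chi_square / 2)


p_example1 = p_value_df2(1.90)    # calculated value from Example 1
p_critical = p_value_df2(5.991)   # table value for alpha = 0.05, df = 2
print(round(p_example1, 4), round(p_critical, 4))
```

The p-value for 1.90 comes out at roughly 0.39, far above 0.05, which agrees with the decision not to reject the null hypothesis; the table value 5.991 maps back to about 0.05, as it should.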

  12. SAS: Example 1

  13. Example 2: Testing Association HO: Gender and eye colour are not associated with each other. HA: Gender and eye colour are associated with each other. SAS options used: • cellchi2 = displays how much each cell contributes to the overall chi-squared value • nocol = do not display column percentages • norow = do not display row percentages • chisq = display chi-square statistics

  14. Example 2: More SAS Examples

  15. Example 2: More SAS Examples Degrees of freedom: (2-1)(3-1) = 1*2 = 2. High probability (78.25%) that Chi-Square statistic > our calculated chi-square statistic. We do not reject our null hypothesis.

  16. Example 2: More SAS Examples If there were an association, we could check which cells account for it by looking at how much each cell contributes to the overall chi-square value.
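
The per-cell contributions mentioned above can be reproduced outside SAS. The sketch below computes expected counts under independence (row total × column total / grand total) and each cell's $(O - E)^2 / E$ term. The 2 x 3 counts are hypothetical, since the gender and eye colour data from the slide are not in this transcript, and the function names are illustrative.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def cell_contributions(table):
    """Each cell's (O - E)^2 / E contribution to the overall chi-square value."""
    expected = expected_counts(table)
    return [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]


# Hypothetical counts: 2 rows (gender) by 3 columns (eye colour).
table = [[20, 15, 15],
         [18, 17, 15]]

contributions = cell_contributions(table)
chi_square = sum(sum(row) for row in contributions)
df = (len(table) - 1) * (len(table[0]) - 1)
print(contributions, chi_square, df)
```

Cells with the largest contributions are the ones where observed and expected counts disagree most, which is where an association, if present, would show up.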

  17. Limitations • No expected frequency should be less than 1 • No more than 1/5 of the expected frequencies should be less than 5 • To correct for this, collect larger samples or combine the categories with small expected frequencies until their combined value is 5 or more • Yates Correction* • When there is only 1 degree of freedom, the regular chi-square test should not be used • Apply the Yates correction by subtracting 0.5 from the absolute value of each calculated O-E term, then continue as usual with the new corrected values

  18. What do these mean?

  19. Likelihood Ratio Chi Square
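
Only the slide title survives here. For orientation, the likelihood ratio chi-square is conventionally defined as $G^2 = 2 \sum O \ln(O / E)$ and is compared against the same chi-square distribution as the Pearson statistic. The sketch below follows that standard definition; the function name and the counts are hypothetical and not taken from the presentation.

```python
import math


def likelihood_ratio_chi_square(observed, expected):
    """G^2 = 2 * sum(O * ln(O / E)); cells with O = 0 contribute nothing."""
    return 2 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )


# Hypothetical counts for three categories with equal expected proportions.
observed = [18, 22, 20]
expected = [20, 20, 20]

g_squared = likelihood_ratio_chi_square(observed, expected)
print(round(g_squared, 4))
```

For counts this close to their expected values, $G^2$ lands very near the Pearson value for the same data (0.4 here), which is the usual behaviour when the sample is reasonably large.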

  20. Continuity-Adjusted Chi-Square Test

  21. Mantel-Haenszel Chi-Square Test $Q_{MH} = (n-1) r^2$ • $r$ is the Pearson correlation coefficient between the row and column variables (which also measures the linear association between row and column) • http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0000000659.htm • Tests the alternative hypothesis that there is a linear association between the row and column variables • Follows a chi-square distribution with 1 degree of freedom
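
The formula above can be sketched in Python. This version assumes the row and column indices are used as scores for the two variables (the choice of scores is an assumption here, not something the slide specifies), and the 2 x 2 counts are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mantel_haenszel_chi_square(table):
    """Q_MH = (n - 1) * r^2, scoring each observation by its row and column index."""
    xs, ys = [], []
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            xs.extend([i] * count)  # row score, repeated once per observation
            ys.extend([j] * count)  # column score, repeated once per observation
    n = len(xs)
    return (n - 1) * pearson_r(xs, ys) ** 2


# Hypothetical 2 x 2 counts, for illustration only.
q_mh = mantel_haenszel_chi_square([[10, 5], [5, 10]])
print(round(q_mh, 4))
```

The result is then compared with the chi-square distribution with 1 degree of freedom, as the slide states.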

  22. Phi Coefficient

  23. Contingency Coefficient

  24. Cramer’s V
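
Slides 22 to 24 survive only as titles. As a reminder of what these measures are, the usual textbook definitions derive all three from the Pearson chi-square value and the sample size $n$: $\phi = \sqrt{\chi^2 / n}$, $C = \sqrt{\chi^2 / (\chi^2 + n)}$, and $V = \sqrt{(\chi^2 / n) / \min(r-1, c-1)}$. The sketch below assumes those definitions (unsigned forms); the function name and input values are hypothetical.

```python
import math


def association_measures(chi_square, n, n_rows, n_cols):
    """Return (phi, contingency coefficient, Cramer's V) from a chi-square value."""
    phi = math.sqrt(chi_square / n)
    contingency = math.sqrt(chi_square / (chi_square + n))
    cramers_v = math.sqrt((chi_square / n) / min(n_rows - 1, n_cols - 1))
    return phi, contingency, cramers_v


# Hypothetical inputs: chi-square of 10 from 100 observations in a 2 x 3 table.
phi, contingency, cramers_v = association_measures(10.0, 100, 2, 3)
print(round(phi, 4), round(contingency, 4), round(cramers_v, 4))
```

All three rescale the chi-square value so that tables of different sizes can be compared; a value near 0 indicates little association.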

  25. Yates & 2 x 2 Contingency Tables HO: Heart Disease is not associated with cholesterol levels. HA: Heart Disease is more likely in patients with a high cholesterol diet. Calculate degrees of freedom: (c-1)(r-1) = 1*1 = 1 We need to use the YATES CORRECTION

  26. Yates & 2 x 2 Contingency Tables HO: Heart Disease is not associated with cholesterol levels. HA: Heart Disease is more likely in patients with a high cholesterol diet. $(|15 - 12.65| - 0.5)^2 / 12.65 = 0.27$
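
The corrected term above can be checked with a few lines of Python. Only the one cell shown on the slide (observed 15, expected 12.65) is used; the remaining cells of the table are not in this transcript, so the full statistic is not recomputed here. The function names are illustrative.

```python
def yates_term(observed, expected):
    """One cell's Yates-corrected contribution: (|O - E| - 0.5)^2 / E."""
    return (abs(observed - expected) - 0.5) ** 2 / expected


def yates_chi_square(observed_cells, expected_cells):
    """Sum of the Yates-corrected terms over all cells of a 2 x 2 table."""
    return sum(yates_term(o, e) for o, e in zip(observed_cells, expected_cells))


# The single cell given on the slide.
term = yates_term(15, 12.65)
print(round(term, 2))  # 0.27
```

Summing `yates_term` over all four cells, as `yates_chi_square` does, gives the corrected statistic that is compared with the table value for 1 degree of freedom.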

  27. Example 1: Testing for Proportions $\chi^2_{\alpha=0.05} = 3.841$ (df = 1)

  28. Yates & 2 x 2 Contingency Tables HO: Heart Disease is not associated with cholesterol levels. HA: Heart Disease is more likely in patients with a high cholesterol diet. 3.841 > 1.42 ∴ We do not reject our null hypothesis.
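
With 1 degree of freedom, the upper-tail probability of the chi-square distribution equals `erfc(sqrt(x / 2))`, so the comparison above can also be expressed as a p-value. A sketch using only the Python standard library; `p_value_df1` is an illustrative name, and the result is not claimed to match the SAS output on the slides, which is not reproduced in this transcript.

```python
import math


def p_value_df1(chi_square):
    """Upper-tail probability P(X >= chi_square) for a chi-square variable with 1 df."""
    if chi_square < 0:
        raise ValueError("a chi-square value cannot be negative")
    return math.erfc(math.sqrt(chi_square / 2))


p_calculated = p_value_df1(1.42)   # Yates-corrected value from the slide
p_critical = p_value_df1(3.841)    # table value for alpha = 0.05, df = 1
print(round(p_calculated, 3), round(p_critical, 3))
```

The p-value for 1.42 falls between 0.20 and 0.25, well above 0.05, which is consistent with not rejecting the null hypothesis.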

  29. Fisher’s Exact Test • Left: Use this one-sided test when the alternative to independence is negative association between the variables. These observations tend to lie in the lower left and upper right cells of the table. Small p-value = Likely negative association. • Right: Use this one-sided test when the alternative to independence is positive association between the variables. These observations tend to lie in the upper left and lower right cells of the table. Small p-value = Likely positive association. • Two-Tail: Use this when there is no prior alternative.
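
The three p-values described above can be sketched from the hypergeometric distribution of the upper-left cell with the table margins held fixed. This is a minimal illustration, assuming the common convention that the two-tail p-value sums the probabilities of all tables no more likely than the observed one. The counts are hypothetical and the function name is illustrative.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Return (left, right, two_tail) p-values for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the upper-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lowest = max(0, col1 - row2)   # smallest upper-left count the margins allow
    highest = min(row1, col1)      # largest upper-left count the margins allow
    p_observed = prob(a)

    left = sum(prob(x) for x in range(lowest, a + 1))
    right = sum(prob(x) for x in range(a, highest + 1))
    # Small relative tolerance so ties with the observed table are included.
    two_tail = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return left, right, two_tail


# Hypothetical 2 x 2 counts, for illustration only.
left, right, two_tail = fisher_exact_2x2(3, 1, 1, 3)
print(round(left, 4), round(right, 4), round(two_tail, 4))
```

Here the right-sided p-value is the small one of the two one-sided results, matching the slide's description: counts concentrated in the upper left and lower right cells point towards positive association.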

  30. Yates & 2 x 2 Contingency Tables

  31. Yates & 2 x 2 Contingency Tables

  32. HO: Heart Disease is not associated with cholesterol levels. HA: Heart Disease is more likely in patients with a high cholesterol diet.

  33. Conclusion • The Chi-square test is important in testing the association between variables and/or checking whether one’s expected proportions match the results of one’s experiment • There are multiple chi-square tests, each suited to a specific sample size, degrees of freedom, and number of categories • We can use SAS to conduct Chi-square tests on our data by using the command proc freq

  34. References • Chi-Square Test Descriptions: http://www.enviroliteracy.org/pdf/materials/1210.pdf • http://129.123.92.202/biol1020/Statistics/Appendix%206%20%20The%20Chi-Square%20TEst.pdf • Ozdemir T and Eyduran E. 2005. Comparison of chi-square and likelihood ratio chi-square tests: power of test. Journal of Applied Sciences Research. 1(2):242-244. • SAS Support website: http://www.sas.com/index.html “FREQ procedure” • YouTube Chi-square SAS Tutorial (user: mbate001): http://www.youtube.com/watch?v=ACbQ8FJTq7k
