Tutorial: Chi-Square Distribution. Presented by: Nikki Natividad. Course: BIOL 5081 - Biostatistics
Purpose
• To analyze discrete (categorical or binned) data, in which each subject falls into one of several categories
• To compare our observed data with what we expect to see: is any difference due to chance, or due to an association?
• When can we use the Chi-Square Test?
  • Testing the outcome of Mendelian crosses
  • Testing independence: is one factor associated with another?
  • Testing whether a population matches expected proportions
Assumptions:
• 2 or more categories
• Independent observations
• A sample size of at least 10
• Random sampling
• All observations must be used
• For the test to be accurate, each expected frequency should be at least 5
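Conducting Chi-Square Analysis
• Make a hypothesis based on your basic biological question
• Determine the expected frequencies
• Create a table with the observed frequencies (O), the expected frequencies (E), and each category's contribution $(O-E)^2/E$; the chi-square value is the sum $\chi^2 = \sum \frac{(O-E)^2}{E}$
• Find the degrees of freedom: $(c-1)(r-1)$ for a contingency table with r rows and c columns, or $c-1$ for a single row of c categories
• Look up the critical chi-square value for your degrees of freedom and significance level in a Chi-Square Distribution table
• If the critical value > your calculated chi-square value, do not reject your null hypothesis; if your calculated value exceeds the critical value, reject it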
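The steps above can be sketched in Python for a goodness-of-fit test. This is a minimal standard-library illustration, not the tutorial's SAS workflow; the function names `chi_square_statistic` and `decide` and the counts are hypothetical, and 5.991 is the table value for df = 2 at α = 0.05.

```python
# Minimal sketch of the chi-square procedure for a goodness-of-fit test.
# The function names and the counts below are hypothetical illustrations.


def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def decide(statistic, critical_value):
    """Reject H0 only when the calculated statistic exceeds the critical value."""
    if statistic > critical_value:
        return "reject H0"
    return "do not reject H0"


observed = [30, 25, 35]                    # hypothetical counts in 3 categories
expected = [sum(observed) / 3] * 3         # H0: equal proportions -> 30 each
stat = chi_square_statistic(observed, expected)
df = len(observed) - 1                     # one row of c categories: c - 1
print(round(stat, 3), df, decide(stat, 5.991))  # 5.991: table value, df = 2, alpha = 0.05
```

With these made-up counts the statistic is about 1.667 on 2 degrees of freedom, below 5.991, so H0 is not rejected.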
Example 1: Testing for Proportions
HO: Horned lizards eat equal amounts of leaf cutter, carpenter and black ants.
HA: Horned lizards eat more of one ant species than of the others.
$\chi^2 = \sum \frac{(O-E)^2}{E}$, summed over all categories
Degrees of freedom: with a single row of c = 3 categories, df = c - 1 = 3 - 1 = 2
At a significance level of your choice (e.g. α = 0.05, i.e. 95% confidence), look up the critical value in a Chi-square distribution table.
Example 1: Testing for Proportions
Critical value: $\chi^2_{0.05,\,2} = 5.991$
Example 1: Testing for Proportions
Critical value: $\chi^2_{0.05,\,2} = 5.991$
Our calculated value: $\chi^2 = 1.90$
If the critical value > your calculated value, you do not reject your null hypothesis: there is no significant difference, and the deviation from the expected proportions can be attributed to chance.
5.991 > 1.90 ∴ We do not reject our null hypothesis.
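For 2 degrees of freedom the chi-square upper-tail probability has the closed form $P(X \ge x) = e^{-x/2}$, so Example 1's table lookup and decision can be double-checked in a few lines. This sketch uses only the reported value 1.90; the lizard counts themselves are not reproduced.

```python
import math

# df = 2 only: P(X >= x) = exp(-x / 2), so the critical value is -2 ln(alpha).
ALPHA = 0.05
critical = -2 * math.log(ALPHA)          # about 5.991, matching the table
calculated = 1.90                        # value reported in Example 1
p_value = math.exp(-calculated / 2)      # upper-tail probability of 1.90

print(f"critical = {critical:.3f}, p = {p_value:.3f}")
print("reject H0" if calculated > critical else "do not reject H0")
```

This prints a critical value of 5.991 and a p-value of about 0.387, so the conclusion matches the slide.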
SAS: Example 1
Annotations on the SAS code:
• Included to format the table
• Define your data
• Indicate what you want in your output
SAS: What does the p-value mean?
“The exact p-value for a nondirectional test is the sum of probabilities for the table having a test statistic greater than or equal to the value of the observed test statistic.”
High p-value: if the null hypothesis is true, a test statistic at least as large as the observed one is likely. Do not reject the null hypothesis.
Low p-value (below your chosen α, e.g. 0.05): if the null hypothesis is true, a test statistic at least as large as the observed one is unlikely. Reject the null hypothesis.
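The p-value for a chi-square test is the upper-tail probability of the chi-square distribution at the calculated statistic. SAS reports it directly; the sketch below only makes the definition concrete. It assumes integer degrees of freedom and uses the standard closed forms for that case; the name `chi2_sf` is a hypothetical helper, not a SAS or library function.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1.

    Closed forms for integer degrees of freedom:
      even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
      odd df:  erfc(sqrt(x/2))
               + sqrt(2/pi) * exp(-x/2) * sum_{i=1}^{(df-1)/2} x^(i - 1/2) / (1*3*...*(2i-1))
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0                 # i = 0 term
        for i in range(1, df // 2):
            term *= (x / 2) / i                # (x/2)^i / i!
            total += term
        return math.exp(-x / 2) * total
    total, term = 0.0, math.sqrt(x)            # i = 1 term: x^(1/2) / 1
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)            # next odd factor in the denominator
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


print(round(chi2_sf(1.90, 2), 3))   # p-value for Example 1
```

At the table's critical values (3.841 for df = 1, 5.991 for df = 2, 7.815 for df = 3) this returns approximately 0.05, which is a quick sanity check on the formulas.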
SAS: Example 1
The p-value is high: if the null hypothesis is true, there is a high probability of obtaining a chi-square statistic at least as large as our calculated value. We do not reject our null hypothesis.
Example 2: Testing Association
HO: Gender and eye colour are not associated with each other.
HA: Gender and eye colour are associated with each other.
SAS TABLES statement options:
• cellchi2 = display how much each cell contributes to the overall chi-square value
• nocol = do not display column percentages
• norow = do not display row percentages
• chisq = display chi-square statistics
Example 2: More SAS Examples
Degrees of freedom: (2-1)(3-1) = 1 × 2 = 2
p-value = 0.7825: if the null hypothesis is true, there is a 78.25% probability of obtaining a chi-square statistic at least as large as ours. We do not reject our null hypothesis.
Example 2: More SAS Examples
If there were an association, we could identify which cells drive it by checking how much each cell contributes to the overall chi-square value.
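The per-cell contributions that the cellchi2 option displays can be reproduced by hand: each expected count is (row total × column total) / grand total, and each cell contributes $(O-E)^2/E$. The 2 x 3 counts below are made up purely for illustration and are not the data behind the SAS output; the helper names are hypothetical.

```python
# Hypothetical 2 x 3 table (rows: gender, columns: eye colour).
# These counts are invented for illustration only.
table = [
    [10, 12, 8],
    [11, 9, 10],
]


def expected_counts(table):
    """E_ij = row total * column total / grand total, assuming independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def cell_contributions(table):
    """(O - E)^2 / E for every cell, the quantity cellchi2 reports."""
    expected = expected_counts(table)
    return [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]


contrib = cell_contributions(table)
chi2 = sum(map(sum, contrib))                       # overall chi-square
df = (len(table) - 1) * (len(table[0]) - 1)         # (r - 1)(c - 1)
for row in contrib:
    print([round(v, 3) for v in row])
print(round(chi2, 3), df)
```

The cells with the largest contributions are the ones that would describe an association, if one were found.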
Limitations
• No expected frequency should be less than 1
• No more than 1/5 of the expected frequencies should be less than 5
• To correct for this, collect larger samples or combine the categories with small expected frequencies until their combined expected value is 5 or more
• Yates Correction
  • When there is only 1 degree of freedom, the regular chi-square test should not be used
  • Apply the Yates correction by subtracting 0.5 from the absolute value of each O - E term before squaring, $\chi^2 = \sum \frac{(|O-E| - 0.5)^2}{E}$, then continue as usual with the corrected values
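The expected-frequency rules and the "combine small categories" remedy can be sketched as below. Merging adjacent categories from left to right is just one simple strategy, assumed here for illustration; which categories it makes biological sense to combine is a judgement the analyst has to make. The helper names and counts are hypothetical.

```python
def expected_ok(expected):
    """Rules of thumb from the slide: no expected count below 1, and
    no more than 1/5 of the expected counts below 5."""
    if any(e < 1 for e in expected):
        return False
    small = sum(1 for e in expected if e < 5)
    return small <= len(expected) / 5


def pool_small(observed, expected, minimum=5):
    """Merge adjacent categories, left to right, until each pooled expected
    count reaches `minimum`. A leftover tail that never reaches the minimum
    is folded into the last pooled group."""
    pooled_o, pooled_e = [], []
    acc_o, acc_e = 0, 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= minimum:
            pooled_o.append(acc_o)
            pooled_e.append(acc_e)
            acc_o, acc_e = 0, 0.0
    if acc_o or acc_e:
        if pooled_e:
            pooled_o[-1] += acc_o
            pooled_e[-1] += acc_e
        else:
            pooled_o.append(acc_o)
            pooled_e.append(acc_e)
    return pooled_o, pooled_e


observed = [20, 15, 3, 2]            # hypothetical counts
expected = [18.0, 14.0, 4.0, 4.0]    # hypothetical expected counts
print(expected_ok(expected))         # two of four expected counts are below 5
o2, e2 = pool_small(observed, expected)
print(o2, e2, expected_ok(e2))
```

After pooling, the degrees of freedom must be recomputed from the new number of categories.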
Mantel-Haenszel Chi-Square Test
• $Q_{MH} = (n-1)r^2$, where n is the total sample size and r is the Pearson correlation coefficient between the row and column variables (r measures the linear association between row and column)
• http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0000000659.htm
• Tests the alternative hypothesis that there is a linear association between the row and column variables (both variables must be on an ordinal scale)
• Follows a Chi-square distribution with 1 degree of freedom
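A minimal sketch of $Q_{MH} = (n-1)r^2$: each cell's count acts as a weight on its (row score, column score) pair, r is the weighted Pearson correlation, and the p-value comes from a chi-square with 1 degree of freedom, whose upper tail is $\operatorname{erfc}(\sqrt{Q/2})$. The table and the scores 1, 2, 3 are hypothetical; SAS's default scores may differ from the ones chosen here.

```python
import math


def mantel_haenszel_q(table, row_scores, col_scores):
    """Q_MH = (n - 1) * r^2, with r the Pearson correlation between the row
    scores and column scores, each cell weighted by its count."""
    cells = [
        (rs, cs, count)
        for rs, row in zip(row_scores, table)
        for cs, count in zip(col_scores, row)
    ]
    n = sum(count for _, _, count in cells)
    mean_r = sum(rs * w for rs, _, w in cells) / n
    mean_c = sum(cs * w for _, cs, w in cells) / n
    cov = sum(w * (rs - mean_r) * (cs - mean_c) for rs, cs, w in cells)
    var_r = sum(w * (rs - mean_r) ** 2 for rs, _, w in cells)
    var_c = sum(w * (cs - mean_c) ** 2 for _, cs, w in cells)
    r = cov / math.sqrt(var_r * var_c)
    return (n - 1) * r ** 2


# Hypothetical table: 2 ordered rows x 3 ordered columns.
table = [[20, 10, 5], [5, 10, 20]]
q = mantel_haenszel_q(table, row_scores=[1, 2], col_scores=[1, 2, 3])
p = math.erfc(math.sqrt(q / 2))   # upper tail of a chi-square with 1 df
print(round(q, 3), p)
```

For a perfectly correlated table such as [[5, 0], [0, 5]], r = 1 and $Q_{MH} = n - 1 = 9$; for [[5, 5], [5, 5]], r = 0 and $Q_{MH} = 0$.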
Yates & 2 x 2 Contingency Tables
HO: Heart Disease is not associated with cholesterol levels.
HA: Heart Disease is more likely in patients with a high cholesterol diet.
Degrees of freedom: (c-1)(r-1) = 1 × 1 = 1
With only 1 degree of freedom, we need to use the YATES CORRECTION.
Yates & 2 x 2 Contingency Tables
Yates-corrected contribution of one cell (O = 15, E = 12.65):
$\frac{(|15 - 12.65| - 0.5)^2}{12.65} = \frac{1.85^2}{12.65} = 0.27$
Yates & 2 x 2 Contingency Tables
Critical value: $\chi^2_{0.05,\,1} = 3.841$
Yates & 2 x 2 Contingency Tables
Critical value 3.841 > our calculated (Yates-corrected) value of 1.42 ∴ We do not reject our null hypothesis.
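The Yates arithmetic for the heart disease example can be checked with a short sketch. Only the one cell shown on the slide (O = 15, E = 12.65) and the reported total of 1.42 are used; the full 2 x 2 table is not reproduced, so `yates_chi_square` is exercised on hypothetical counts only. The `max(..., 0)` floor is an added safeguard for cells with |O - E| < 0.5, not part of the slide's formula.

```python
import math


def yates_cell(o, e):
    """Yates-corrected contribution of one cell: (|O - E| - 0.5)^2 / E.
    The floor at 0 guards against over-correcting when |O - E| < 0.5."""
    return max(abs(o - e) - 0.5, 0.0) ** 2 / e


def yates_chi_square(table):
    """Yates-corrected chi-square for a 2 x 2 table of observed counts."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return sum(
        yates_cell(table[i][j], row_totals[i] * col_totals[j] / n)
        for i in range(2)
        for j in range(2)
    )


# One cell from the heart disease example: O = 15, E = 12.65.
print(round(yates_cell(15, 12.65), 2))   # 0.27, as on the slide

# Decision at alpha = 0.05 with df = 1, using the reported total of 1.42.
total = 1.42
p_value = math.erfc(math.sqrt(total / 2))   # upper tail of chi-square, 1 df
print("reject H0" if total > 3.841 else "do not reject H0", round(p_value, 3))
```

The p-value for 1.42 on 1 degree of freedom is roughly 0.23, well above 0.05, which agrees with the slide's decision.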
Fisher’s Exact Test
• Left: Use this one-sided test when the alternative to independence is negative association between the variables. These observations tend to lie in the lower left and upper right cells of the table. Small p-value = likely negative association.
• Right: Use this one-sided test when the alternative to independence is positive association between the variables. These observations tend to lie in the upper left and lower right cells of the table. Small p-value = likely positive association.
• Two-tailed: Use this test when you have no prior directional alternative.
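With all margins of a 2 x 2 table fixed, the upper-left count follows a hypergeometric distribution, and the three Fisher p-values described above are sums over that distribution. The sketch below follows those definitions (two-sided = total probability of tables no more likely than the observed one); the function name and the counts are hypothetical, not the heart disease data.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Left, right and two-sided Fisher p-values for the table [[a, b], [c, d]].

    With all margins fixed, the top-left count X is hypergeometric.
    left  = P(X <= a)  (small -> evidence of negative association)
    right = P(X >= a)  (small -> evidence of positive association)
    two-sided = total probability of tables no more likely than the observed one
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    denom = comb(n, col1)
    prob = {
        x: comb(row1, x) * comb(n - row1, col1 - x) / denom
        for x in range(lo, hi + 1)
    }
    p_obs = prob[a]
    left = sum(p for x, p in prob.items() if x <= a)
    right = sum(p for x, p in prob.items() if x >= a)
    # Small relative tolerance so tables tied with the observed one are included.
    two_sided = sum(p for p in prob.values() if p <= p_obs * (1 + 1e-7))
    return left, right, two_sided


# Hypothetical 2 x 2 counts (not the heart disease data).
print(fisher_exact_2x2(3, 1, 1, 3))
```

Here the counts sit mostly in the upper left and lower right cells, so the right-sided p-value (17/70) is the smaller one-sided result.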
Conclusion
• The Chi-square test is important for testing the association between variables and for checking whether one's expected proportions match the results of one's experiment
• There are multiple chi-square tests, each suited to particular sample sizes, degrees of freedom, and numbers of categories
• We can use SAS to conduct Chi-square tests on our data with the FREQ procedure (proc freq)
References
• Chi-Square Test descriptions: http://www.enviroliteracy.org/pdf/materials/1210.pdf
• Chi-Square Test descriptions: http://129.123.92.202/biol1020/Statistics/Appendix%206%20%20The%20Chi-Square%20TEst.pdf
• Ozdemir T and Eyduran E. 2005. Comparison of chi-square and likelihood ratio chi-square tests: power of test. Journal of Applied Sciences Research. 1(2):242-244.
• SAS Support website, “FREQ procedure”: http://www.sas.com/index.html
• YouTube Chi-square SAS Tutorial (user: mbate001): http://www.youtube.com/watch?v=ACbQ8FJTq7k