Learn about expected frequencies, calculating the chi-squared statistic, significance testing for contingency tables, and when (not) to use chi-squared tests in sociology. Explore the relationship between support for tax reform and support for environmental issues. Understand how to test the independence of attitudes with a chi-squared test and how to interpret the results.
Sociology 601 Class 12: October 8, 2009
• The Chi-Squared Test (8.2)
  • expected frequencies
  • calculating Chi-square
  • finding p
• When (not) to use Chi-squared tests (8.3)
• chi-squared residuals
8.2 Chi-squared statistical significance test for contingency tables.

                        support tax reform?
                        Yes     No    Tot
    support      Yes    150    100    250
    environment? No     200     50    250
                 Tot    350    150    500

• “Is the level of support for the environment independent of the level of support for tax reform?”
• If not, these two measures may have some causal link worth investigating.
• Q: which causes which?
2x2 table: a z-test for proportions
With a 2x2 table, we can use a significance test for independent-sample proportions (review 7.2). Stata reports it as a z statistic. Here x is the proportion supporting tax reform among the 250 environment supporters (150/250 = .6) and y is the same proportion among the 250 non-supporters (200/250 = .8).

    . prtesti 250 .6 250 .8

    Two-sample test of proportion                      x: Number of obs =      250
                                                       y: Number of obs =      250
    ------------------------------------------------------------------------------
        Variable |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               x |         .6   .0309839                      .5392727    .6607273
               y |         .8   .0252982                      .7504164    .8495836
    -------------+----------------------------------------------------------------
            diff |        -.2        .04                     -.2783986   -.1216014
                 |  under Ho:   .0409878    -4.88   0.000
    ------------------------------------------------------------------------------
            diff = prop(x) - prop(y)                                  z =  -4.8795
        Ho: diff = 0

        Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
     Pr(Z < z) = 0.0000         Pr(|Z| > |z|) = 0.0000          Pr(Z > z) = 1.0000
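The z statistic in the Stata output can be reproduced by hand. The following is a minimal sketch in Python (not part of the original lecture, which uses Stata); the function name `two_proportion_z` is an illustrative assumption. It uses the pooled standard error that Stata labels "under Ho".

```python
import math

def two_proportion_z(n1, p1, n2, p2):
    """z statistic and two-sided p-value for H0: p1 == p2 (pooled SE)."""
    # Pooled proportion: the overall success rate if both groups share one p.
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    # Standard error of the difference under the null hypothesis.
    se0 = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se0
    # Two-sided p-value from the standard normal distribution.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided

z, p = two_proportion_z(250, 0.6, 250, 0.8)
print(round(z, 4), p)
```

With the lecture's inputs this should give z close to the -4.8795 that Stata reports, with a two-sided p-value far below .001.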
Moving beyond 2x2 tables:
• Comparing conditional probabilities is fine when there are only two groups to compare and two possible outcomes for each group.
• The Chi-Square (χ²) test is a new technique that makes comparisons more flexible.
• χ² is built on the null hypothesis that every cell should have the frequency you would expect if the variables were independently distributed.
• fe is the expected count for each cell.
  • fe = (row total * column total) / table total (A&F 254)
  • fe = total N * unconditional row probability * unconditional column probability
  • fe = column N * unconditional row probability
• A test for the whole table combines the comparisons of observed counts with fe for every cell.
Calculating expected cell counts:
• The expected cell count is the count we would expect in a cell if
  • environmental support among tax reform advocates and among tax reform opponents were identical, or if
  • environmental support among tax reform advocates were the same as environmental support among the whole sample, or if
  • tax reform support among environmentalists were the same as among non-environmentalists.
• 50% of the sample supports environmental spending, so
  • fe(1,1) = .5 * 350 = 175
  • fe(1,1) = 250 * 350 / 500 = 175 (A&F)
  • fe(1,1) = 500 * (350/500)[taxes] * (250/500)[environment] = 175
  • fe(1,2) = 75
  • fe(2,1) = 175
  • fe(2,2) = 75
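The A&F formula (row total * column total / table total) can be applied to every cell at once. This is a small Python sketch rather than anything from the lecture's Stata session; the helper name `expected_counts` is an assumption made for illustration.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: support environment (yes, no); columns: support tax reform (yes, no).
observed = [[150, 100], [200, 50]]
fe = expected_counts(observed)
print(fe)  # [[175.0, 75.0], [175.0, 75.0]]
```

Because both rows have the same total (250), both rows get the same expected counts.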
Testing independence of support for tax reform and environmental spending:
• New approach: Chi-squared test for independence of attitudes toward taxes and the environment.
• Test statistic:
  • $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$, summed over all cells
  • where $f_o$ is the observed count in each cell
  • and where $f_e$ is the expected count for each cell, assuming that attitudes toward taxes will be the same for people who support environmental issues as for people who do not support environmental issues.
Assumptions and hypothesis for a chi-squared test:
• Assumptions:
  • two categorical variables (for this course)
  • random sample or stratified random sample
  • fe ≥ 5 for all cells
• Hypothesis: Ho: the two variables are statistically independent.
  • this means that the distribution of each variable is the same at every value of the other variable
Using expected cell counts to calculate a Chi-squared test statistic
• The test statistic is analogous to a t-statistic…
• but the form of the equation makes it difficult to see that the χ² statistic is a difference between the observed and expected values, divided by an estimate of the typical variation we would expect from random sampling error.
• Test statistic:
  • $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
  • $= \frac{(150-175)^2}{175} + \frac{(100-75)^2}{75} + \frac{(200-175)^2}{175} + \frac{(50-75)^2}{75}$
  • $= 3.5714 + 8.3333 + 3.5714 + 8.3333 = 23.81$
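The arithmetic on this slide can be checked with a few lines of Python. This is a sketch, not the lecture's own code; `chi_squared` is an illustrative name.

```python
def chi_squared(observed):
    """Pearson chi-squared statistic: sum over cells of (fo - fe)^2 / fe."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n  # expected count for cell (i, j)
            total += (fo - fe) ** 2 / fe
    return total

chi2 = chi_squared([[150, 100], [200, 50]])
print(round(chi2, 2))  # 23.81
```

For a 2x2 table this value equals the square of the z statistic from the two-proportion test: (-4.8795)² ≈ 23.81.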
Degrees of freedom for a Chi-squared statistic:
• We now have a test statistic: χ² = 23.81
• How do we assign a p-value to this?
• Step 1: calculate the degrees of freedom.
  • Given the row and column marginal totals, how many cells do we need to fill in before the rest are determined automatically?
  • Answer: 1 in this case, so df = 1.
  • General answer: df = (r-1)*(c-1), where r is the number of rows and c is the number of columns.
p-value for a Chi-squared statistic:
• Assign a p-value to the statistic: χ² = 23.81, df = 1
• Given the degrees of freedom, look up the p-value.
  • Go to Table C on page 670.
  • Go down to the row for df = 1.
  • Move across the χ² values to the largest tabled value that is smaller than the measured χ².
  • Look up the corresponding p-value at the top of the column: p < .001
• The chi-squared test is always a 1-tailed test: we always use the right tail of the distribution.
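The table lookup can be cross-checked numerically. The sketch below is in Python and is not part of the lecture; it covers only df = 1 and df = 2, where the right-tail probability has a simple closed form (erfc(sqrt(x/2)) and exp(-x/2) respectively). The function names are assumptions made for illustration.

```python
import math

def degrees_of_freedom(n_rows, n_cols):
    """df = (r - 1) * (c - 1) for an r x c contingency table."""
    return (n_rows - 1) * (n_cols - 1)

def chi2_right_tail(x, df):
    """Right-tail p-value of a chi-squared statistic, for df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch handles only df = 1 or df = 2")

df = degrees_of_freedom(2, 2)
p_value = chi2_right_tail(23.81, df)
print(df, p_value)
```

For χ² = 23.81 with df = 1 the result is far below .001, which agrees with the table lookup. For larger tables a statistics package or a printed table is still needed.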
Do your own chi-squared test:
You watch 50 beachcombers to see if they are wearing sandals and if they are wearing shorts.

                    wearing shorts?
                    Yes    No    Tot
    sandals?  Yes    20    10     30
              No     10    10     20
              Tot    30    20     50

Q: Does a beachcomber’s chance of wearing sandals depend on whether they are wearing shorts?
Chi-Squared Tests for tables larger than 2x2
• Here is a command to run a chi-squared test on the gender and partyid data from the 1996 GSS (cf. 8.1)

    . tab partyid3 sex, chi2

                |   respondents sex
       partyid3 |      male     female |     Total
    ------------+----------------------+----------
       Democrat |       350        627 |       977
    Independent |       514        557 |     1,071
     Republican |       400        407 |       807
    ------------+----------------------+----------
          Total |     1,264      1,591 |     2,855

              Pearson chi2(2) =  43.4391   Pr = 0.000
Add expected cell counts

    . tab partyid3 sex, chi2 exp

    +--------------------+
    | Key                |
    |--------------------|
    |     frequency      |
    | expected frequency |
    +--------------------+

                |   respondents sex
       partyid3 |      male     female |     Total
    ------------+----------------------+----------
       Democrat |       350        627 |       977
                |     432.5      544.5 |     977.0
    ------------+----------------------+----------
    Independent |       514        557 |     1,071
                |     474.2      596.8 |   1,071.0
    ------------+----------------------+----------
     Republican |       400        407 |       807
                |     357.3      449.7 |     807.0
    ------------+----------------------+----------
          Total |     1,264      1,591 |     2,855
                |   1,264.0    1,591.0 |   2,855.0

              Pearson chi2(2) =  43.4391   Pr = 0.000
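The same steps used for the 2x2 table carry over to the 3x2 party-by-sex table. The Python sketch below (not the lecture's Stata code; all names are illustrative) recomputes the expected counts and the Pearson statistic from the observed frequencies, and uses the df = 2 closed form exp(-x/2) for the right-tail p-value.

```python
import math

# Observed counts from the 1996 GSS table: rows are Democrat, Independent,
# Republican; columns are male, female.
observed = [[350, 627], [514, 557], [400, 407]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell under independence.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-squared: sum of (fo - fe)^2 / fe over all six cells.
chi2 = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = math.exp(-chi2 / 2)  # right-tail probability, valid only for df = 2

print([[round(fe, 1) for fe in exp_row] for exp_row in expected])
print(round(chi2, 4), df, p_value)
```

The rounded expected counts should match the second line of each cell in the Stata output, and the statistic should agree with Pearson chi2(2) = 43.4391.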
8.3 When not to do a chi-squared test
1.) Do not do a Chi-squared test when the expected value of a cell is less than 5.
The Problem: The total χ² is 6.28, so p < .05, but 4.5 of the total comes from one cell with fe = 2.
(It is okay to do a Chi-squared test if a cell has an expected value above 5 and an observed value below 5!)
A small sample alternative to a chi-squared test When the sample size is too small for a chi-squared test, you may treat the contingency table as a small sample comparison of two population proportions. This means you should do a Fisher’s exact test for population proportions. A Fisher’s exact test will also work okay on large samples, but you sometimes will bog down the computer with lengthy computations. (This is especially likely to happen when the tables are 5X4 or larger).
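To make the idea concrete, here is a minimal Python sketch of a two-sided Fisher's exact test for a 2x2 table. It is not from the lecture. The table used (3, 1, 1, 3) is hypothetical illustration data, the function name is an assumption, and the two-sided rule used here (sum the probabilities of all tables that are no more likely than the observed one) is one common convention.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2 = a + b, c + d          # row totals
    c1 = a + c                     # first column total
    n = r1 + r2
    denom = comb(n, c1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # holding all marginal totals fixed.
        return comb(r1, x) * comb(r2, c1 - x) / denom

    lo, hi = max(0, c1 - r2), min(r1, c1)
    p_obs = prob(a)
    # Sum over every table at least as unlikely as the observed one
    # (small tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

p_small = fisher_exact_2x2(3, 1, 1, 3)  # hypothetical small sample, N = 8
print(round(p_small, 4))  # 0.4857
```

The enumeration loop is why the test can bog down on large tables: the number of possible tables with the same margins grows quickly with N and with the number of rows and columns.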
Fisher’s exact test in STATA (not necessary in this case because of large N).

    . tab partyid3 sex, chi exact

    Enumerating sample-space combinations:
    stage 3:  enumerations = 1
    stage 2:  enumerations = 158
    stage 1:  enumerations = 0

                |   respondents sex
       partyid3 |      male     female |     Total
    ------------+----------------------+----------
       Democrat |       350        627 |       977
    Independent |       514        557 |     1,071
     Republican |       400        407 |       807
    ------------+----------------------+----------
          Total |     1,264      1,591 |     2,855

              Pearson chi2(2) =  43.4391   Pr = 0.000
               Fisher's exact =                 0.000
When not to do a chi-squared test (#2)
2.) Do not do a Chi-squared test for cell values that are not observed frequencies.
The Problem: If you use percentages, you misstate the sample size as 100.
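A quick numerical illustration of the problem, as a Python sketch that is not part of the lecture (the helper name is an assumption): feeding the tax reform table in as percentages of the 500 respondents shrinks the statistic by a factor of 500/100 = 5, because the formula treats the cell values as if only 100 people had been sampled.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a table of cell values."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (value - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, value in enumerate(row)
    )

counts = [[150, 100], [200, 50]]                    # observed frequencies, N = 500
n_total = sum(sum(row) for row in counts)
percents = [[100 * x / n_total for x in row] for row in counts]  # 30, 20, 40, 10

chi2_counts = chi_squared(counts)
chi2_percents = chi_squared(percents)
print(round(chi2_counts, 2), round(chi2_percents, 2))  # 23.81 4.76
```

The pattern of association is identical in both tables; only the implied sample size differs, and the statistic scales with it.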
When not to do a chi-squared test (#3)
3.) Do not do a Chi-squared test to find a difference in population proportions for dependent samples.
The Problem: You want to know if the speech changed people’s opinions. A χ² test would tell you if opinions after the speech depend on opinions before the speech.