Sociology 5811: Lecture 16: Crosstabs 2: Measures of Association, Plus Differences in Proportions

Announcements • Final project proposals due Nov 15 • Get started now!!! • Find a dataset • figure out what hypotheses you might test • Today: Wrap up Crosstabs • If time remains, we’ll discuss project ideas…

Review: Chi-square Test • Chi-Square test is a test of independence • Null hypothesis: the two categorical variables are statistically independent • There is no relationship between them • H0: Gender and political party are independent • Alternate hypothesis: the variables are related, not independent of each other • H1: Gender and political party are not independent • Test is based on comparing the observed cell values with the values you’d expect if there were no relationship between variables.

Review: Expected Cell Values • If two variables are independent, cell values will depend only on row & column marginals • Marginals reflect frequencies… And, if frequency is high, all cells in that row (or column) should be high • The formula for the expected value in a cell is: $E_{ij} = \frac{f_i f_j}{N}$ • $f_i$ and $f_j$ are the row and column marginals • N is the total sample size
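The expected-value rule can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `expected_counts` and the 2x2 table below are hypothetical, chosen only to show that each expected count is (row marginal × column marginal) / N.

```python
def expected_counts(observed):
    """Return expected cell counts under independence.

    `observed` is a table given as a list of rows (hypothetical input format).
    """
    row_totals = [sum(row) for row in observed]        # row marginals
    col_totals = [sum(col) for col in zip(*observed)]  # column marginals
    n = sum(row_totals)                                # total sample size N
    # Each expected count is (row marginal * column marginal) / N.
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical table, invented for illustration only.
table = [[20, 30],
         [30, 20]]
print(expected_counts(table))
```

Because every marginal in this made-up table is 50 and N is 100, all four expected counts are equal, which is what independence looks like when the marginals are balanced.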

Review: Chi-square Test • The Chi-square formula: $\chi^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(E_{ij} - O_{ij})^2}{E_{ij}}$ • Where: • R = total number of rows in the table • C = total number of columns in the table • Eij = the expected frequency in row i, column j • Oij = the observed frequency in row i, column j • Assumption for test: Large N (>100) • Degrees of freedom for the critical value: (R-1)(C-1).

Chi-square Test of Independence • Example: Gender and Political Views • Let’s pretend that N of 68 is sufficient

Chi-square Test of Independence • Compute (E – O)² / E for each cell

Chi-Square Test of Independence • Finally, sum up to compute the Chi-square • χ² = .55 + .95 + .66 + 1.14 = 3.31 • What is the critical value for α = .05? • Degrees of freedom: (R-1)(C-1) = (2-1)(2-1) = 1 • According to Knoke, p. 509: Critical value is 3.84 • Question: Can we reject H0? • No. χ² of 3.31 is less than the critical value • We cannot conclude that there is a relationship between gender and political party affiliation.
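The whole test can be sketched as a short Python function. This is an illustrative sketch under stated assumptions: the cell counts (27, 10, 16, 15) are taken from the Yule's Q and odds ratio examples later in the lecture, their arrangement into rows and columns is assumed, and the name `chi_square` is hypothetical. The critical value 3.84 is the one quoted from Knoke for α = .05 with 1 degree of freedom.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            stat += (e - o) ** 2 / e               # this cell's contribution
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


# Assumed layout: rows = party (Democrat, Republican), columns = (women, men).
table = [[27, 10],
         [16, 15]]
stat, df = chi_square(table)

CRITICAL_VALUE = 3.84  # alpha = .05, df = 1, as quoted in the lecture
reject_h0 = stat > CRITICAL_VALUE
print(f"chi-square = {stat:.2f}, df = {df}, reject H0: {reject_h0}")
```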

Chi-square Test of Independence • Weaknesses of chi-square tests: • 1. If the sample is very large, we almost always reject H0. • Even tiny covariations are statistically significant • But, they may not be socially meaningful differences • 2. It doesn’t tell us how strong the relationship is • It doesn’t tell us if it is a large, meaningful difference or a very small one • It is only a test of “independence” vs. “dependence” • Measures of Association address this shortcoming.

Measures of Association • Separate from the issue of independence, statisticians have created measures of association • They are measures that tell us how strong the relationship is between two variables • [Figure: examples of weak association and strong association]

Crosstab Association: Yule’s Q • #1: Yule’s Q • Appropriate only for 2x2 tables (2 rows, 2 columns) • Label cell frequencies a through d (top row: a, b; bottom row: c, d) • Recall that extreme values along the “diagonal” (cells a & d) or the “off-diagonal” (b & c) indicate a strong relationship. • Yule’s Q captures that in a measure: $Q = \frac{bc - ad}{bc + ad}$ • 0 = no association. -1, +1 = strong association

Crosstab Association: Yule’s Q • Rule of thumb for interpreting Yule’s Q: see Bohrnstedt & Knoke, p. 150

Crosstab Association: Yule’s Q • Example: Gender and Political Party Affiliation • Cell frequencies: a = 27, b = 10, c = 16, d = 15 • Calculate “bc”: bc = (10)(16) = 160 • Calculate “ad”: ad = (27)(15) = 405 • Q = (bc - ad) / (bc + ad) = (160 - 405) / (160 + 405) = -245 / 565 = -.43 • -.43 = “weak association”
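A minimal Python sketch of the calculation follows. It assumes the sign convention Q = (bc - ad) / (bc + ad), which matches the negative value reported for this example; the function name `yules_q` and the second, perfectly associated table are hypothetical.

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2x2 table with cells a, b (top row) and c, d (bottom row).

    Assumes the convention Q = (bc - ad) / (bc + ad).
    """
    bc = b * c  # off-diagonal product
    ad = a * d  # diagonal product
    return (bc - ad) / (bc + ad)


# Cell counts from the gender / party example.
q_example = yules_q(27, 10, 16, 15)
print(f"Q = {q_example:.2f}")

# Hypothetical table with every case on the off-diagonal: Q reaches its maximum.
q_perfect = yules_q(0, 10, 10, 0)
print(f"Q = {q_perfect:.2f}")
```

Note that Q is undefined when both products are zero, so a table with an empty row or column needs separate handling.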

Association: Other Measures • Phi (φ) • Very similar to Yule’s Q • Only for 2x2 tables, ranges from –1 to 1, 0 = no assoc. • Gamma (G) • Based on a very different method of calculation • Not limited to 2x2 tables • Requires ordered variables • Tau-c (τc) and Somers’ d (dyx) • Same basic principle as Gamma • Several others discussed in Knoke, Norusis.

Crosstab Association: Gamma • Gamma, like Q, is based on comparing “diagonal” to “off-diagonal” cases. • But, it does so differently • Jargon: • Concordant pairs: Pairs of cases where one case is higher on both variables than another case • Discordant pairs: Pairs of cases for which the first case (when compared to a second) is higher on one variable but lower on another

Crosstab Association: Gamma • Example: Approval of candidates • Cases in “Love Trees/Love Guns” cell make concordant pairs with cases lower on both • All 71 individuals can be a pair with everyone in the lower cells. Just multiply! • (71)(659 + 1498 + 431 + 467) = 216,905 concordant pairs

Crosstab Association: Gamma • More possible concordant pairs • The “Love Guns/Trees are OK” cell and the “Trees = OK/Love Guns” cells also can have concordant pairs • These 603 can pair with all those that score lower on approval for Guns & Trees: (603)(659 + 431) = 657,270 concordant pairs • These can pair lower too: (452)(431 + 467) = 405,896 concordant pairs

Crosstab Association: Gamma • Discordant pairs: Pairs where a first person ranks higher on one dimension (e.g., approval of Trees) but lower on the other (e.g., approval of Guns) • The top-left cell is higher on Guns but lower on Trees than those in the lower right. They make pairs: (1205)(1498 + 452 + 467 + 1120) = 4,262,085 discordant pairs

Crosstab Association: Gamma • If all pairs are concordant or all pairs are discordant, the variables are strongly related • If there are an equal number of discordant and concordant pairs, the variables are weakly associated. • Formula for Gamma: $G = \frac{n_s - n_d}{n_s + n_d}$ • ns = number of concordant pairs • nd = number of discordant pairs
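The pair counting that Gamma relies on can be sketched as follows. This is an illustrative sketch, not the lecture's own procedure: the 3x3 table is hypothetical (the approval table from the slides is not reproduced here), the function names are invented, and the code assumes that rows and columns are both ordered in the same direction (for example, low to high).

```python
def pair_counts(table):
    """Count concordant (ns) and discordant (nd) pairs in an ordered table.

    Assumes rows and columns are ordered in the same direction.
    """
    n_rows, n_cols = len(table), len(table[0])
    ns = nd = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # only cells in later rows
                for m in range(n_cols):
                    if m > j:                   # later row and later column
                        ns += table[i][j] * table[k][m]
                    elif m < j:                 # later row but earlier column
                        nd += table[i][j] * table[k][m]
    return ns, nd


def gamma(table):
    """Gamma = (ns - nd) / (ns + nd)."""
    ns, nd = pair_counts(table)
    return (ns - nd) / (ns + nd)


# Hypothetical ordered 3x3 table, invented for illustration only.
table = [[10, 5, 2],
         [4, 8, 6],
         [1, 3, 12]]
ns, nd = pair_counts(table)
g = gamma(table)
print(f"concordant = {ns}, discordant = {nd}, gamma = {g:.2f}")
```

Pairs tied on either variable (same row or same column) are skipped, which is why they appear in neither count.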

Crosstab Association: Gamma • Calculation of Gamma is typically done by computer • Zero indicates no association • +1 = strong positive association • -1 = strong negative association • It is possible to do hypothesis tests on Gamma • To determine if population gamma differs from zero • Requirements: random sample, N > 50 • See Knoke, p. 155-6.

Crosstab Association • Final remarks: • You have a variety of possible measures to assess association among variables. Which one should you use? • Yule’s Q and Phi require a 2x2 table • Larger ordered tables: use Gamma, Tau-c, Somers’ d • Ideally, report more than one to show that your findings are robust.

Odds Ratios • Odds ratios are a powerful way of analyzing relationships in crosstabs • Many advanced categorical data analysis techniques are based on odds ratios • Review: What is a probability? • p(A) = # of outcomes that are “A” divided by total number of outcomes • To convert a frequency distribution to a probability distribution, simply divide frequency by N • The same can be done with crosstabs: Cell frequency over N is probability.

Odds Ratios • If total N = 68, probability of drawing cases is:
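Converting a crosstab to cell probabilities is a single division per cell. The sketch below assumes the cell counts used in the Yule's Q and odds ratio examples (27, 10, 16, 15, which sum to 68); the row and column arrangement is an assumption.

```python
# Assumed layout: rows = party (Democrat, Republican), columns = (women, men).
counts = [[27, 10],
          [16, 15]]

n = sum(sum(row) for row in counts)  # total sample size N

# Cell probability = cell frequency / N
probabilities = [[cell / n for cell in row] for row in counts]

for row in probabilities:
    print([round(p, 3) for p in row])
```

The four probabilities sum to 1, as any probability distribution must.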

Odds Ratios • Odds are similar to probability… but not quite • Odds of A = Number of outcomes that are A, divided by number of outcomes that are not A • Note: Denominator is different than for probability • Ex: Probability of rolling 1 on a 6-sided die = 1/6 • Odds of rolling a 1 on a six-sided die = 1/5 • Odds can also be calculated from probabilities: $\text{odds}(A) = \frac{p(A)}{1 - p(A)}$
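The relationship between odds and probability can be checked with exact fractions. This is a small sketch using the die example; the function names are hypothetical, and the conversion assumed is odds = p / (1 - p), with the inverse p = odds / (1 + odds).

```python
from fractions import Fraction


def odds_from_probability(p):
    """odds = p / (1 - p); undefined when p equals 1."""
    return p / (1 - p)


def probability_from_odds(odds):
    """Inverse conversion: p = odds / (1 + odds)."""
    return odds / (1 + odds)


p_one = Fraction(1, 6)                   # probability of rolling a 1
odds_one = odds_from_probability(p_one)  # odds of rolling a 1
print(p_one, odds_one, probability_from_odds(odds_one))
```

Using `Fraction` keeps the arithmetic exact, so the round trip from probability to odds and back returns the original value.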

Odds Ratios • Conditional odds = odds of being in one category of a variable within a specific category of another variable • Instead of overall odds of being democrat, conditional odds are about a particular subgroup in a table • Example: For women, what are the odds of being democrat? • Conditional odds of being democrat are: 27 / 16 = 1.69 • Note: Odds for women are different from odds for men

Odds Ratios • If variables in a crosstab are independent, their conditional odds are equal • Odds of falling into one category or another are same for all values of other variable • If variables in a crosstab are associated, conditional odds differ • Odds can be compared by making a ratio • Ratio is equal to 1 if odds are the same for two groups • Ratios much greater or less than 1 indicate very different odds.

Odds Ratios • Formula for Odds Ratio in 2x2 table (cells a, b in the top row; c, d in the bottom row): $OR = \frac{bc}{ad}$ • Ex: OR = (10)(16) / [(27)(15)] = 160 / 405 = .395 • Interpretation: men have .395 times the odds of being a democrat compared to women • Inverted value (1/.395 = 2.5) indicates that the odds of women being democrat are 2.5 times men’s odds
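The odds ratio is just a ratio of two conditional odds, which a short Python sketch makes explicit. The helper name `conditional_odds` is hypothetical. The counts come from the example: 27 Democrats and 16 non-Democrats among women, 10 and 15 among men (the men's counts are inferred from the cell products in the example).

```python
def conditional_odds(in_category, not_in_category):
    """Odds of being in a category within one subgroup of the table."""
    return in_category / not_in_category


women_odds = conditional_odds(27, 16)  # odds of being democrat, women
men_odds = conditional_odds(10, 15)    # odds of being democrat, men

# Odds ratio comparing men to women; algebraically equal to bc / ad.
odds_ratio = men_odds / women_odds
inverse_ratio = 1 / odds_ratio         # women's odds relative to men's

print(f"women: {women_odds:.2f}, men: {men_odds:.2f}")
print(f"OR (men vs. women) = {odds_ratio:.3f}, inverse = {inverse_ratio:.2f}")
```

Writing the ratio as men's odds over women's odds, rather than as bc / ad, makes the direction of the comparison visible in the code.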

Odds Ratios: Final Remarks • 1. Cells with zeros cause problems for odds ratios • Ratios with zero in denominator are undefined. • Thus, you need to have full cells • 2. Odds ratios can be used to measure association • Indeed, Yule’s Q is based on them • 3. Odds ratios form the basis for most advanced categorical data analysis techniques • For now it may be easier to use Yule’s Q, etc. But, if you need to do advanced techniques, you will use odds ratios.

Tests for Difference in Proportions • Another approach to small (2x2) tables: • Instead of making a crosstab, you can just think about the proportion of people in a given category • More similar to a T-test than a Chi-square test • Ex: Do you approve of Pres. Bush? (Yes/No) • Sample: N = 86 women, 80 men • Proportion of women that approve: PW = .70 • Proportion of men that approve: PM = .78 • Issue: Do the populations of men/women differ? • Or are the differences just due to sampling variability?

Tests for Difference in Proportions • Hypotheses: • Again, the typical null hypothesis is that there are no differences between groups • Which is equivalent to statistical independence • H0: Proportion women = proportion men • H1: Proportion women not = proportion men • Note: One-tailed directional hypotheses can also be used.

Tests for Difference in Proportions • Strategy: Figure out the sampling distribution for differences in proportions • Statisticians have determined relevant info: • 1. If samples are “large”, the sampling distribution of difference in proportions is normal • The Z-distribution can be used for hypothesis tests • 2. A Z-value can be calculated using the formula: $Z = \frac{P_1 - P_2}{\hat{\sigma}_{P_1 - P_2}}$, where the denominator is the standard error of the difference

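Tests for Difference in Proportions • Standard error can be estimated as: $\hat{\sigma}_{P_1 - P_2} = \sqrt{P_{both}(1 - P_{both})} \sqrt{\frac{N_1 + N_2}{N_1 N_2}}$ • Where: $P_{both} = \frac{N_1 P_1 + N_2 P_2}{N_1 + N_2}$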

Difference in Proportions: Example • Q: Do you approve of Pres. Bush? (Yes/No) • Sample: N = 86 women, 80 men • Women: N = 86, PW = .70 • Men: N = 80, PM = .78 • Total N is “Large”: 166 people • So, we can use a Z-test • Use α = .05, two-tailed: critical Z = 1.96

Difference in Proportions: Example • Use formula to calculate Z-value • And, estimate the Standard Error as:

Difference in Proportions: Example • First: Calculate Pboth: Pboth = [(86)(.70) + (80)(.78)] / (86 + 80) = 122.6 / 166 = .739

Difference in Proportions: Example • Plug in Pboth = .739: S.E. = sqrt[(.739)(1 - .739)] × sqrt[(86 + 80) / ((86)(80))] = (.4392)(.1553) = .0682

Difference in Proportions: Example • Finally, plug in S.E. and calculate Z: Z = (.70 - .78) / .0682 = -1.17

Difference in Proportions: Example • Results: • Critical Z = 1.96 • Observed Z = -1.17, so |Z| = 1.17 is less than 1.96 • Conclusion: We can’t reject null hypothesis • Women and Men do not clearly differ in approval of Bush
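The steps of this example can be sketched in Python. The sketch assumes the pooled-proportion form of the standard error, S.E. = sqrt(Pboth(1 - Pboth)) × sqrt((N1 + N2) / (N1 N2)), which is the standard estimate built from a combined proportion; the function name `two_proportion_z` is hypothetical.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Return (pooled proportion, standard error, Z) for two sample proportions.

    Assumes the pooled-proportion estimate of the standard error.
    """
    p_both = (n1 * p1 + n2 * p2) / (n1 + n2)  # proportion in both groups combined
    se = math.sqrt(p_both * (1 - p_both)) * math.sqrt((n1 + n2) / (n1 * n2))
    z = (p1 - p2) / se
    return p_both, se, z


# Women: N = 86, PW = .70; men: N = 80, PM = .78
p_both, se, z = two_proportion_z(0.70, 86, 0.78, 80)

CRITICAL_Z = 1.96  # alpha = .05, two-tailed
reject_h0 = abs(z) > CRITICAL_Z
print(f"P_both = {p_both:.3f}, S.E. = {se:.4f}, Z = {z:.2f}, reject H0: {reject_h0}")
```

For a two-tailed test the comparison uses the absolute value of Z, so the sign of the difference (women minus men, or the reverse) does not change the conclusion.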
