Action Research
More Crosstab Measures
INFO 515
Glenn Booker
Lecture #9
Nominal Crosstab Tests
• Four more measures which could apply to nominal data in a crosstab:
  • Eta
  • Lambda
  • Goodman and Kruskal’s tau
  • Uncertainty coefficient
Eta Coefficient
• Used when the dependent variable uses an interval or ratio scale, and the independent variable is nominal or ordinal
• Eta (η) squared is the proportion of the dependent variable’s variance which is explained by the independent variable
• Eta squared is symmetric, and ranges from 0 to 1
• This is the same eta from the end of lecture 6
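As a rough illustration of the "proportion of variance explained" idea, eta squared can be computed as the between-group sum of squares divided by the total sum of squares. This is a plain Python sketch, not SPSS output; the function name `eta_squared` and the toy data are made up for the example.

```python
def eta_squared(groups):
    """groups maps each category of the nominal variable to a list of
    values of the interval/ratio variable (hypothetical input layout)."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    # Total variation of the dependent variable around the grand mean.
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    # Variation accounted for by the group (category) means.
    ss_between = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return ss_between / ss_total


# Toy data, invented for illustration only.
toy = {"north": [10, 12, 14], "south": [20, 22, 24]}
print(round(eta_squared(toy), 3))
```

When the group means are identical the result is 0, and when all variation lies between the groups it is 1.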
Directional vs Symmetric
• Directional measures give a different answer depending on whether A is dependent on B, or B is dependent on A
• Symmetric measures don’t care which variable is dependent or independent
• Tests indicate whether there is a statistically significant relationship; measures, here, describe the strength of association
Directional Measures
• Directional measures help determine how much the dependent variable is affected by the independent variable
• Directional measures for nominal data:
  • Lambda (recommended)
  • Goodman and Kruskal’s tau
  • Uncertainty coefficient
Directional Measures
• Directional measures generally range from 0 to 1
• A value of 0 means the independent variable doesn’t help predict the dependent variable
• A value of 1 means the independent variable perfectly predicts the resulting dependent variable
Directional Measures
• In this context, either variable can be considered dependent or independent
  • Does A predict B?
  • Does B predict A?
• A “symmetric” value is the weighted average of the two possible selections (A predicts B, or B predicts A)
Proportional Reduction in Error
• Proportional Reduction in Error (PRE) measures find the fractional reduction in errors due to some factor (such as an independent variable)
  PRE = (Error without X – Error with X) / Error without X
• Two we’ll look at are Lambda, and Goodman and Kruskal’s Tau
Lambda Coefficient
• Lambda has a symmetric option for output
• Its Value is the proportion of the dependent variable predicted by the independent one
• The Asymptotic Std. Error allows a 95% confidence interval to be made
• “Approx. T” is the Value divided by the Std. Error computed as if the parameter were zero (not the usual definition!)
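A minimal sketch of lambda as a proportional reduction in error, assuming rows hold the independent variable and columns hold the dependent one. Errors "without X" are the cases missed by always guessing the most common column; errors "with X" are the cases missed by guessing the most common column within each row. The function name and the toy counts are illustrative, not taken from the GSS data or from SPSS.

```python
def goodman_kruskal_lambda(table):
    """table[i][j] = count; rows = independent variable,
    columns = dependent variable (assumed orientation)."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    # Errors made by always guessing the modal category of the dependent variable.
    errors_without = n - max(col_totals)
    # Errors made by guessing the modal category within each row.
    errors_with = sum(sum(row) - max(row) for row in table)
    return (errors_without - errors_with) / errors_without


# Toy counts, invented for illustration.
table = [[40, 10],
         [30, 20]]
transposed = [list(col) for col in zip(*table)]

print(goodman_kruskal_lambda(table))       # columns treated as dependent
print(goodman_kruskal_lambda(transposed))  # rows treated as dependent
```

The two calls give different values for the same table, which is what makes lambda a directional measure.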
Goodman and Kruskal’s Tau
• SPSS note: Goodman and Kruskal’s Tau is not directly selected; it appears only when Lambda is checked!
• Does not have Symmetric option
• Does not approximate T
• Based on chi square
• Otherwise similar to Lambda for interpretation
Uncertainty Coefficient
• Does have symmetric dependency option
• Does have T approximation
• Also based on chi square
• Goodman and Kruskal’s tau and the Uncertainty Coefficient may give results opposite to Lambda’s, so use them cautiously!
Nominal Example
• Use “GSS91 political.sav” data set
• Use Analyze / Descriptive Statistics / Crosstabs…
• Select “region” for Row(s), and “relig” for Column(s)
• Under “Statistics…” select Lambda, and Uncertainty Coefficient
Nominal Example
[SPSS output for the region by relig crosstab; image not reproduced]
Nominal Example - Lambda
• Focus on the Lambda (λ) output first
• Lambda measures the percent of error reduction when using the independent variable to predict the dependent variable
• Calculation based on any desired outcome contributing to lambda
• Lambda ranges from 0 to 1
Nominal Example
• As usual, we want Sig. < 0.050 for the meaning of lambda to be statistically significant
• If Region is dependent, then we see that religious preference is a significant (sig. = 0.000) predictor
• “relig” contributes (Value) 4.8% +/- (Std Error) 1.2% of the variability of a person’s region
Lambda Example
• 95% confidence interval of that contribution is (not shown) 4.8 – 2*1.2 = 2.4% to 4.8 + 2*1.2 = 7.2%
• But “region” is not a significant predictor of “relig” (sig. = 0.099)
• Ignore the value of lambda if it isn’t significant
• The symmetric value is significant, and its Value is between the other two lambda values
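The interval arithmetic above can be checked with a few lines of Python. This sketch uses the same rule of thumb as the slide (Value plus or minus two standard errors); the helper name `approx_interval` is made up for the example.

```python
def approx_interval(value, std_error, multiplier=2.0):
    """Approximate 95% interval as value +/- multiplier * std_error."""
    half_width = multiplier * std_error
    return value - half_width, value + half_width


# Value and Std. Error quoted on the slide, in percent.
low, high = approx_interval(4.8, 1.2)
print(f"{low:.1f}% to {high:.1f}%")
```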
G and K Tau Example
• Goodman and Kruskal’s tau (τ) is similar to lambda, but is based on predictions in the same proportion as the marginal totals (individual row or column subtotals)
• No symmetric value is given – it’s only directional
• Same method for interpretation, but notice it predicts both variables can be significant as dependent, and ‘relig’ is much stronger!
Still from slide 13
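To make "predictions in the same proportion as the marginal totals" concrete, here is a hedged sketch of the usual textbook form of Goodman and Kruskal's tau. It assumes rows are the independent variable and columns the dependent one; the function name and toy counts are invented, and the result is not meant to reproduce the SPSS output.

```python
def goodman_kruskal_tau(table):
    """table[i][j] = count; rows = independent, columns = dependent (assumed)."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    # Expected errors when guessing in proportion to the column totals alone.
    errors_without = n - sum(c * c for c in col_totals) / n
    # Expected errors when guessing in proportion to the counts within each row.
    errors_with = n - sum(
        sum(cell * cell for cell in row) / sum(row)
        for row in table
        if sum(row) > 0
    )
    return (errors_without - errors_with) / errors_without


# Toy counts, invented for illustration.
table = [[40, 10],
         [30, 20]]
print(round(goodman_kruskal_tau(table), 4))
```

On this toy table lambda (with columns dependent) is 0 while tau is small but positive, which illustrates how the two measures can disagree.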
Uncertainty Coefficient Example
• Is a measure of association that indicates the proportional reduction in error when values of one variable are used to predict values of the other variable
• The program calculates both symmetric and directional versions of it
• Here, gives results similar to G and K Tau
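One common way to define the directional uncertainty coefficient is through entropy: the share of the dependent variable's uncertainty that disappears once the other variable is known. The sketch below assumes that definition, with rows as the predictor and columns as the dependent variable; the function names and test tables are made up and this is not the program's own routine.

```python
import math


def entropy(counts):
    """Shannon entropy (natural log) of a list of counts; zero counts are skipped."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def uncertainty_coefficient(table):
    """U(columns | rows) = (H(rows) + H(columns) - H(cells)) / H(columns)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    cells = [cell for row in table for cell in row]
    h_rows = entropy(row_totals)
    h_cols = entropy(col_totals)
    h_joint = entropy(cells)
    return (h_rows + h_cols - h_joint) / h_cols


# Invented tables: no association, then perfect association.
print(round(uncertainty_coefficient([[10, 10], [10, 10]]), 6))
print(round(uncertainty_coefficient([[5, 0], [0, 5]]), 6))
```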
Tests for 2x2 Tables
• Many special measures can be applied to a 2x2 table, including:
  • Relative risk
  • Odds ratio
• Look at these in the context of answering questions like: “Are people who approve of women working more likely to vote for a woman President?”
Tests for 2x2 Tables
• Use “GSS91 social.sav” data set
• Variables are “should women work” (fework) and “vote for woman president” (fepres)
• Isolate the cases using Data / Select Cases
• Use the If condition
  (fepres=1 | fepres=2) & (fework=1 | fework=2)
  ‘|’ means ‘or’; ‘&’ means ‘and’
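For readers more used to code than to the Select Cases dialog, the same filter logic can be sketched in Python. The records below are invented (codes such as 8 and 0 stand in for any value other than 1 or 2), and `keep_case` is a hypothetical helper, not part of SPSS.

```python
# Invented example records; only the codes 1 and 2 should survive the filter.
cases = [
    {"fepres": 1, "fework": 2},
    {"fepres": 8, "fework": 1},
    {"fepres": 2, "fework": 0},
    {"fepres": 2, "fework": 1},
]


def keep_case(case):
    # Mirrors (fepres=1 | fepres=2) & (fework=1 | fework=2).
    return (case["fepres"] == 1 or case["fepres"] == 2) and (
        case["fework"] == 1 or case["fework"] == 2
    )


selected = [case for case in cases if keep_case(case)]
print(len(selected))
```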
Tests for 2x2 Tables
• Use Analyze / Descriptive Statistics / Crosstabs…
• Select “fework” for Row(s), and “fepres” for Column(s)
• For Statistics select Risk
• For Cells select Row percentages
• This gives 947 valid cases
Tests for 2x2 Tables
[SPSS output for the fework by fepres crosstab; image not reproduced]
Tests for 2x2 Tables
[SPSS Risk output; image not reproduced]
• ‘cohort’ = subset
Relative Risk
• The relative risk is a ratio of percentages
• It is very directional
• Those who (approve of women working) are 1.178 times as likely to (vote for a woman president) as those who do not
• Based on the row percentages 93.4%/79.3% = 1.178
• Note the 95% confidence intervals for each ratio are given; roughly 1.09 to 1.27 for this example
Relative Risk
• Conversely, those who approve of women working are 0.317 times as likely to say they would not vote for a woman president, with a broader confidence interval of 0.22 to 0.47
• The rounded percentages give 6.6/20.7 ≈ 0.319; the reported 0.317 presumably reflects unrounded values
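Both ratios can be reproduced from the row percentages quoted on these slides. The sketch below assumes 93.4 and 6.6 come from one row of the crosstab and 79.3 and 20.7 from the other; the variable names are invented, and because the inputs are rounded the results may differ slightly in the third decimal from the program's output.

```python
def relative_risk(pct_in_row_1, pct_in_row_2):
    """Ratio of the same column's row percentage across the two rows."""
    return pct_in_row_1 / pct_in_row_2


# Row percentages as quoted on the slides (rounded to one decimal).
row_1 = {"col_1": 93.4, "col_2": 6.6}
row_2 = {"col_1": 79.3, "col_2": 20.7}

rr_col_1 = relative_risk(row_1["col_1"], row_2["col_1"])
rr_col_2 = relative_risk(row_1["col_2"], row_2["col_2"])
print(round(rr_col_1, 3), round(rr_col_2, 3))
```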
Odds Ratio
• The odds of an event are the ratio of (the probability that the event occurs) to (the probability that the event does not occur); the odds ratio compares these odds between two groups
• The odds ratio that someone who (approves of women working) also (would vote for a woman president) has two terms
• One is the odds of (voting for a woman president) among (those who approve of women working) (93.4/6.6 = 14.152)...
Odds Ratio
• ...divided by the odds of (voting for a woman president) among (those who do NOT approve of women working) (79.3/20.7 = 3.831)
• Hence the odds ratio is 14.152/3.831 = 3.694, or (93.4*20.7)/(6.6*79.3)
• Round-off error, probably in the 6.6 value, kept us from getting the stated odds ratio of 3.712 (first row of output on slide 23)
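The odds ratio arithmetic can be sketched the same way. The helper names are invented, and the inputs are the rounded row percentages from the slides, so the result matches the hand calculation rather than the program's stated 3.712.

```python
def odds(probability):
    """Odds = P(event) / P(no event)."""
    return probability / (1.0 - probability)


def odds_ratio(p_group_1, p_group_2):
    """Ratio of the odds of the event in group 1 to the odds in group 2."""
    return odds(p_group_1) / odds(p_group_2)


# Proportions taken from the rounded row percentages 93.4% and 79.3%.
odds_1 = odds(0.934)
odds_2 = odds(0.793)
ratio = odds_ratio(0.934, 0.793)
print(round(odds_1, 3), round(odds_2, 3), round(ratio, 3))
```

The cross-product form (93.4*20.7)/(6.6*79.3) is the same quantity written without the intermediate odds.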
Square Tables (RxR)
• Tables with the same number of rows as columns (RxR tables) also have special measures
• Cohen’s Kappa (κ), which measures the strength of agreement (did two people’s measurements match well?)
• Applies for R values of one nominal variable
Kappa
• Kappa is used only when the rows and columns have the same categories
  • Set of possible diagnoses reached by two different doctors
  • Two sets of outcomes which are believed to be dependent on each other
• Kappa is at most one; it is one when the diagonal has the only non-zero values, and zero when agreement is no better than chance (negative values, meaning less agreement than chance, are possible but uncommon)
Kappa Example
• Example here is the educational level of one’s parents (maeduc and paeduc; as in ‘ma and pa education’)
• Use “GSS91 social.sav” data set
• Define new variables madeg and padeg, which are derived from maeduc and paeduc (convert years of education into rough levels of achievement)
Kappa Example
• New scale for madeg and padeg is
  • Education <12 is code 1, “LT High School”
  • Education 12-15 is code 2, “High School”
  • Education 16 is code 3, “Bachelor degree”
  • Education 17+ is code 4, “Graduate”
• Use Analyze / Descriptive Statistics / Crosstabs…
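The recoding rule can be written as a small function. This is only a sketch of the mapping listed above; `degree_level` and `LABELS` are hypothetical names, and how the recode is actually done in SPSS is not shown in these slides.

```python
LABELS = {
    1: "LT High School",
    2: "High School",
    3: "Bachelor degree",
    4: "Graduate",
}


def degree_level(years_of_education):
    """Map years of education to the 1-4 code used for madeg and padeg."""
    if years_of_education < 12:
        return 1
    if years_of_education <= 15:
        return 2
    if years_of_education == 16:
        return 3
    return 4  # 17 or more years


for years in (8, 12, 16, 20):
    code = degree_level(years)
    print(years, code, LABELS[code])
```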
Kappa Example
• Select “padeg” for Row(s), and “madeg” for Column(s)
• For Statistics select Kappa
• The basic crosstab just shows the data counts (next slide)
• Then we get the Kappa measure (slide after next)
• As usual, check to make sure the result is significant before going any further
Kappa Example
[SPSS crosstab output: padeg by madeg counts; image not reproduced]
Kappa Example
[SPSS output: Kappa measure; image not reproduced]
Kappa Example
• Here the significance is 0.000, very clearly significant (< 0.050)
• This is confirmed by the approximate T of over 20 - as before, this T is based on the null hypothesis
• The actual value of kappa and its standard error are 0.325 +/- 0.018
• What does this mean?
Kappa
• Kappa is judged on a fairly fixed scale
  • Kappa below 0.40 indicates poor agreement beyond chance
  • Kappa from 0.40 to 0.75 is fair to good agreement
  • Kappa above 0.75 is strong agreement
• So in this case we are confident there is poor agreement between parents’ education
Scale from J.L. Fleiss, 1981
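For reference, kappa compares the observed agreement on the diagonal with the agreement expected by chance from the row and column totals. The sketch below uses that standard definition on an invented 2x2 table and adds a helper for the scale above; the names `cohens_kappa` and `fleiss_label` are made up, and the counts are not the parents' education data.

```python
def cohens_kappa(table):
    """table must be square, with the same categories on rows and columns."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Proportion of cases on the diagonal (both ratings agree).
    observed = sum(table[i][i] for i in range(len(table))) / n
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1.0 - expected)


def fleiss_label(kappa):
    """Rough label following the scale quoted on the slide."""
    if kappa < 0.40:
        return "poor"
    if kappa <= 0.75:
        return "fair to good"
    return "strong"


# Invented counts for illustration.
toy = [[20, 5],
       [10, 15]]
print(round(cohens_kappa(toy), 3))
print(fleiss_label(0.325))  # the kappa value reported in this example
```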
Ordinal Crosstab Measures
• Several association measures can be used for a table with R rows and C columns which contain ordinal data (and presumably R ≠ C)
  • Kendall’s tau-b
  • Kendall’s tau-c
  • (Goodman and Kruskal’s) Gamma (preferred)
  • Somers’ d
  • Spearman’s Correlation Coefficient
General RxC Table Measures
• Many are based on comparing pairs of cases on the two variables, A and B
• If B increases when A increases, the pair is concordant
• If B decreases when A increases, the pair is discordant
• If the two cases have the same value of A (or of B), the pair is tied on that variable
General RxC Table Measures
• The number of concordant pairs is “P”
• The number of discordant pairs is “Q”
• The number of ties on X is “Tx”
• The number of ties on Y is “Ty”
• The smaller of the number of rows R and columns C is called “m”
  m = min(R,C)
• Given this vocabulary, we can define many measures
General RxC Table Measures
• Kendall’s tau-b is
  tau-b = (P-Q) / sqrt[ (P+Q+Tx)*(P+Q+Ty) ]
• Kendall’s tau-c is
  tau-c = 2m*(P-Q) / [N^2*(m-1)]
  where N is the number of cases
• Gamma (γ) is
  Gamma = (P-Q) / (P+Q)
• Somers’ d is
  dy = (P-Q) / (P+Q+Ty) or dx = (P-Q) / (P+Q+Tx)
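The formulas above can be tried on a handful of cases by counting pairs directly. This sketch assumes each pair of cases is counted once, that Tx and Ty count pairs tied on only that one variable, and that pairs tied on both variables are left out of all four counts; the function names and the five toy cases are invented.

```python
import math
from itertools import combinations


def pair_counts(xs, ys):
    """Return (P, Q, Tx, Ty) over all pairs of cases."""
    p = q = tx = ty = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue      # tied on both variables: not counted (assumption)
        elif dx == 0:
            tx += 1       # tied on X only
        elif dy == 0:
            ty += 1       # tied on Y only
        elif dx * dy > 0:
            p += 1        # concordant
        else:
            q += 1        # discordant
    return p, q, tx, ty


def ordinal_measures(xs, ys, rows, cols):
    p, q, tx, ty = pair_counts(xs, ys)
    n = len(xs)
    m = min(rows, cols)
    return {
        "tau_b": (p - q) / math.sqrt((p + q + tx) * (p + q + ty)),
        "tau_c": 2 * m * (p - q) / (n ** 2 * (m - 1)),
        "gamma": (p - q) / (p + q),
        "d_y": (p - q) / (p + q + ty),
        "d_x": (p - q) / (p + q + tx),
    }


# Five invented cases on two 3-level ordinal variables.
xs = [1, 2, 2, 3, 3]
ys = [2, 1, 3, 2, 3]
print(pair_counts(xs, ys))
print(ordinal_measures(xs, ys, rows=3, cols=3))
```

Gamma ignores ties entirely, so on the same data it is never smaller in magnitude than tau-b or Somers' d.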
General RxC Table Measures
• All of the RxC measures are symmetric except Somers’ d, which has both symmetric and directional values given
• All are evaluated by their significance, which also has an approximate T score
• All are expressed by a Value +/- its Std Error
RxC Measures Example
• Use “GSS91 social.sav” data set
• Use Analyze / Descriptive Statistics / Crosstabs…
• Select “paeduc” for Row(s), and “maeduc” for Column(s)
• Under “Statistics…” select Eta, Correlations, Gamma, Somers’ d, Kendall’s tau-b and tau-c
RxC Measures Example
• This compares the number of years of education of one’s mother and father to see how strongly they affect one another
• The crosstab data table is very large, since it ranges from 0 to 20 for each category, with irregular gaps (we’re not using the simplified categories from the Kappa example)
• Hence we’re not showing it here!
RxC Measures Example
[SPSS output; image not reproduced]
• Both measures show the mother’s education is a slightly better predictor
RxC Measures Example
• Directional measures:
  • Somers’ d is significant
  • It shows that there are about 55% +/- 2% more concordant pairs than discordant ones, excluding ties on the independent variable
  • The Eta measure shows that around 69% of the variability of one parent’s education is shared with the other’s
RxC Measures Example
[SPSS output for the symmetric measures; image not reproduced]
RxC Measures Example
• All of the symmetric measures are statistically significant, with approximate t values around 27-28
• The Kendall tau-b and tau-c measures disagree a little on the magnitude of the agreement
• Gamma and Spearman give fairly strong positive correlations
RxC Measures Example
• Spearman, like ‘r’, ranges from -1 to +1, and does not require a normal distribution
  • Based on ordered categories, not their values
• Even ‘r’ can be calculated for this case, and it gives results similar to Gamma and Spearman
Yule’s Q
• A special case of gamma for a 2x2 table is called Yule’s Q
• It is appropriate for ordinal data in 2x2 tables; so values for each variable are Low/High, Yes/No, or similar
• Define Yule’s Q = (a*d – b*c) / (a*d + b*c)
• See PDF page 59 of Action Research handout for the definition of a, b, c, and d (cell labels)
Yule’s Q
• Measures the strength and direction of association from -1 (perfect negative association) to 0 (no association) to +1 (perfect positive association)
• Judge the results for Yule’s Q by the table on page 59 of Action Research handout; and see pages 58-64 for other related discussion
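A short sketch of the Yule's Q formula follows. The handout's cell labels are not reproduced in these slides, so the layout here is an assumption: a and b are taken as the first row, c and d as the second row, reading left to right. The counts are invented.

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2x2 table laid out as [[a, b], [c, d]] (assumed layout)."""
    return (a * d - b * c) / (a * d + b * c)


# Invented tables covering the three reference points of the scale.
print(yules_q(10, 0, 0, 10))   # all cases on the main diagonal
print(yules_q(5, 5, 5, 5))     # no association
print(yules_q(0, 10, 10, 0))   # all cases off the main diagonal
```

Because a*d/(b*c) is the odds ratio of the table, Q can also be written as (OR - 1)/(OR + 1) whenever b*c is not zero.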