Action Research More Crosstab Measures

Action Research
More Crosstab Measures
INFO 515, Glenn Booker, Lecture #9

Nominal Crosstab Tests
• Four more measures that could apply to nominal data in a crosstab
  • Eta
  • Lambda
  • Goodman and Kruskal’s tau
  • Uncertainty coefficient

Eta Coefficient
• Used when the dependent variable uses an interval or ratio scale, and the independent variable is nominal or ordinal
• Eta (η) squared is the proportion of the dependent variable’s variance which is explained by the independent variable
• Eta squared is directional (its value depends on which variable is treated as dependent), and ranges from 0 to 1
• This is the same eta from the end of lecture 6
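
To make the "proportion of variance explained" idea concrete, here is a minimal Python sketch of eta squared as the between-group sum of squares divided by the total sum of squares. The function name and the tiny data set are made up for illustration; they are not from the lecture's data files.

```python
def eta_squared(groups):
    """groups maps each category of the independent (nominal) variable
    to the list of dependent (interval/ratio) values seen in that category."""
    values = [v for vals in groups.values() for v in vals]
    grand_mean = sum(values) / len(values)
    # Total variability of the dependent variable around the grand mean.
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    # Variability explained by knowing the category (group means vs grand mean).
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in groups.values()
    )
    return ss_between / ss_total

# Hypothetical scores for two regions (illustrative numbers only).
scores = {"north": [10, 12, 14], "south": [20, 22, 24]}
print(round(eta_squared(scores), 3))
```

With these made-up numbers most of the variance lies between the two groups, so eta squared comes out close to 1.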

Directional vs Symmetric
• Directional measures give a different answer depending on whether A is dependent on B, or B is dependent on A
• Symmetric measures don’t care which variable is dependent or independent
• Tests indicate whether there is a statistically significant relationship; measures, here, describe the strength of association

Directional Measures
• Directional measures help determine how much the dependent variable is affected by the independent variable
• Directional measures for nominal data:
  • Lambda (recommended)
  • Goodman and Kruskal’s tau
  • Uncertainty coefficient

Directional Measures
• Directional measures generally range from 0 to 1
• A value of 0 means the independent variable doesn’t help predict the dependent variable
• A value of 1 means the independent variable perfectly predicts the resulting dependent variable

Directional Measures
• In this context, either variable can be considered dependent or independent
  • Does A predict B?
  • Does B predict A?
• A “symmetric” value is the weighted average of the two possible selections (A predicts B, or B predicts A)

Proportional Reduction in Error
• Proportional Reduction in Error (PRE) measures find the fractional reduction in errors due to some factor (such as an independent variable)
  PRE = (Error without X – Error with X) / Error without X
• Two we’ll look at are Lambda, and Goodman and Kruskal’s Tau
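
As an illustration of the PRE idea, the following Python sketch computes lambda for a crosstab by counting prediction errors with and without the independent variable. It assumes the usual modal-category rule (always guess the most common category); the function name and the counts are hypothetical, not taken from the GSS91 files.

```python
def pre_lambda(table):
    """Lambda with the column variable dependent and the row variable independent.
    table[i][j] is the count in row category i and column category j."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    # Without X: always guess the most common column, so all other cases are errors.
    errors_without = n - max(col_totals)
    # With X: within each row, guess that row's most common column.
    errors_with = n - sum(max(row) for row in table)
    return (errors_without - errors_with) / errors_without

# Hypothetical 2x2 counts (illustrative only).
counts = [[40, 10],
          [30, 20]]
transposed = [list(col) for col in zip(*counts)]

print(pre_lambda(counts))      # rows predicting columns
print(pre_lambda(transposed))  # columns predicting rows
```

The two calls give different values for the same table, which is what "directional" means here.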

Lambda Coefficient
• Lambda has a symmetric option for output
• Its Value is the proportional reduction in errors when the independent variable is used to predict the dependent one
• The Asymptotic Std. Error allows a 95% confidence interval to be made
• “Approx. T” is the Value divided by the Std. Error computed as if the parameter were zero (not the usual definition!)

Goodman and Kruskal’s Tau
• SPSS note: Goodman and Kruskal’s Tau is not directly selected; it appears only when Lambda is checked!
• Does not have Symmetric option
• Does not approximate T
• Based on chi square
• Otherwise similar to Lambda for interpretation

Uncertainty Coefficient
• Does have symmetric dependency option
• Does have T approximation
• Also based on chi square
• Goodman and Kruskal’s tau and the Uncertainty Coefficient may give results opposite to Lambda’s, so use them cautiously!

Nominal Example
• Use “GSS91 political.sav” data set
• Use Analyze / Descriptive Statistics / Crosstabs…
• Select “region” for Row(s), and “relig” for Column(s)
• Under “Statistics…” select Lambda, and Uncertainty Coefficient

Nominal Example
[Slide 13: SPSS output only]

Nominal Example - Lambda
• Focus on the Lambda (λ) output first
• Lambda measures the percent of error reduction when using the independent variable to predict the dependent variable
• Calculation based on any desired outcome contributing to lambda
• Lambda ranges from 0 to 1

Nominal Example
• As usual, we want Sig. < 0.050 for the meaning of lambda to be statistically significant
• If Region is dependent, then we see that religious preference is a significant (sig. = 0.000) predictor
• “relig” contributes (Value) 4.8% +/- (Std Error) 1.2% of the variability of a person’s region

Lambda Example
• 95% confidence interval of that contribution is (not shown) 4.8 – 2*1.2 = 2.4% to 4.8 + 2*1.2 = 7.2%
• But “region” is not a significant predictor of “relig” (sig. = 0.099)
• Ignore the value of lambda if it isn’t significant
• The symmetric value is significant, and its Value is between the other two lambda values

G and K Tau Example
• Goodman and Kruskal’s tau (τ) is similar to lambda, but is based on predictions in the same proportion as the marginal totals (individual row or column subtotals)
• No symmetric value is given – it’s only directional
• Same method for interpretation, but notice it predicts both variables can be significant as dependent, and ‘relig’ is much stronger!
(Output still from slide 13)
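
The "predictions in proportion to the marginal totals" rule can be sketched in Python as follows. This is a minimal illustration under the standard definition of Goodman and Kruskal's tau (expected errors from proportional guessing), with a hypothetical function name and made-up counts rather than the lecture's data.

```python
def gk_tau(table):
    """Goodman and Kruskal's tau with columns dependent, rows independent.
    table[i][j] is the count in row category i and column category j."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    # Expected errors when guessing columns in proportion to the column totals.
    errors_without = n - sum(c * c for c in col_totals) / n
    # Expected errors when guessing in proportion to each row's own distribution.
    errors_with = n - sum(
        sum(cell * cell for cell in row) / sum(row)
        for row in table
        if sum(row) > 0
    )
    return (errors_without - errors_with) / errors_without

# Hypothetical counts (illustrative only).
counts = [[40, 10],
          [30, 20]]
print(round(gk_tau(counts), 4))
```

For this made-up table the modal category is the same in both rows, so lambda would be 0, yet tau is small but non-zero. That is one way the two measures can disagree.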

Uncertainty Coefficient Example
• Is a measure of association that indicates the proportional reduction in error when values of one variable are used to predict values of the other variable
• The program calculates both symmetric and directional versions of it
• Here, gives results similar to G and K Tau
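
For readers who want to see what is being reduced, the sketch below computes the directional uncertainty coefficient from entropies, assuming the common definition U(column | row) = [H(row) + H(column) - H(joint)] / H(column). The function names and the counts are illustrative assumptions, not SPSS internals.

```python
from math import log

def entropy(counts):
    """Shannon entropy (natural log) of a list of counts; zero cells are skipped."""
    total = sum(counts)
    return -sum(c / total * log(c / total) for c in counts if c > 0)

def uncertainty_coefficient(table):
    """Directional U with columns dependent and rows independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    cells = [cell for row in table for cell in row]
    # Information shared between the row and column variables.
    shared = entropy(row_totals) + entropy(col_totals) - entropy(cells)
    return shared / entropy(col_totals)

# Hypothetical tables (illustrative only).
independent = [[10, 20], [20, 40]]   # rows tell us nothing about columns
perfect = [[50, 0], [0, 50]]         # rows determine columns exactly
print(round(uncertainty_coefficient(independent), 6))
print(round(uncertainty_coefficient(perfect), 6))
```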

Tests for 2x2 Tables
• Many special measures can be applied to a 2x2 table, including:
  • Relative risk
  • Odds ratio
• Look at these in the context of answering questions like: “Are people who approve of women working more likely to vote for a woman President?”

Tests for 2x2 Tables
• Use “GSS91 social.sav” data set
• Variables are “should women work” (fework) and “vote for woman president” (fepres)
• Isolate the cases using Data / Select Cases
• Use the If condition
  (fepres=1 | fepres=2) & (fework=1 | fework=2)
  where ‘|’ means ‘or’ and ‘&’ means ‘and’
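
The logic of that selection condition can be mirrored in Python. This is only a sketch of the boolean logic, not of SPSS itself: the list of records and the codes other than 1 and 2 are invented for illustration.

```python
# Hypothetical records; codes other than 1 and 2 stand in for any other response.
cases = [
    {"fepres": 1, "fework": 2},
    {"fepres": 8, "fework": 1},
    {"fepres": 2, "fework": 1},
    {"fepres": 1, "fework": 9},
]

# Same structure as (fepres=1 | fepres=2) & (fework=1 | fework=2).
selected = [
    c for c in cases
    if (c["fepres"] == 1 or c["fepres"] == 2)
    and (c["fework"] == 1 or c["fework"] == 2)
]
print(len(selected))
```

Only the records with both answers coded 1 or 2 survive, which is what leaves a clean 2x2 table.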

Tests for 2x2 Tables
• Use Analyze / Descriptive Statistics / Crosstabs…
• Select “fework” for Row(s), and “fepres” for Column(s)
• For Statistics select Risk
• For Cells select Row percentages
• This gives 947 valid cases

Tests for 2x2 Tables
[Slide 22: SPSS output only]

Tests for 2x2 Tables
[Slide 23: SPSS output only]
‘cohort’ = subset

Relative Risk
• The relative risk is a ratio of percentages
• It is very directional
• Those who (approve of women working) are 1.178 times as likely to (say they would vote for a woman president) as those who do not approve
• Based on the row percentages 93.4%/79.3% = 1.178
• Note the 95% confidence intervals for each ratio are given; roughly 1.09 to 1.27 for this example

Relative Risk
• Conversely, those who approve of women working are 0.317 times as likely to say they would not vote for a woman president (6.6/20.7 ≈ 0.32 from the rounded percentages), with a broader confidence interval of 0.22 to 0.47
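
As a quick check of the arithmetic, the two relative risks can be recomputed from the row percentages quoted above. This is only a sketch: the variable names are illustrative, and because the inputs are rounded percentages rather than the underlying counts, the second ratio differs slightly from the quoted 0.317.

```python
# Row percentages quoted in the lecture; each row of the crosstab sums to 100.
row_1 = (93.4, 6.6)
row_2 = (79.3, 20.7)

# A relative risk is the ratio of the two row percentages within one column.
risk_first_column = row_1[0] / row_2[0]
risk_second_column = row_1[1] / row_2[1]

print(round(risk_first_column, 3))
print(round(risk_second_column, 3))
```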

Odds Ratio
• The odds of an event are the ratio of (the probability that the event occurs) to (the probability that the event does not occur); an odds ratio compares the odds for two groups
• The odds ratio that someone who (approves of women working) also (would vote for a woman president) has two terms
• One is the odds of (voting for a woman president) among (those who approve of women working) (93.4/6.6=14.152)...

Odds Ratio
• Divided by the odds of (voting for a woman president) among (those who would NOT approve of women working) (79.3/20.7=3.831)
• Hence the odds ratio is 14.152/3.831 = 3.694 or (93.4*20.7)/(6.6*79.3)
• Round off error, probably in the 6.6 value, kept us from getting the stated odds ratio of 3.712 (first row of output on slide 23)
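
The same row percentages give the odds ratio in either of the two forms mentioned above. The sketch below uses illustrative variable names and the rounded percentages from the slides, so it reproduces the hand calculation rather than the 3.712 that SPSS reports from the raw counts.

```python
# Row percentages quoted in the lecture; each row of the crosstab sums to 100.
row_1 = (93.4, 6.6)
row_2 = (79.3, 20.7)

# Odds within each row: first column versus second column.
odds_row_1 = row_1[0] / row_1[1]
odds_row_2 = row_2[0] / row_2[1]

# Ratio of the two odds, and the equivalent cross-product form.
odds_ratio = odds_row_1 / odds_row_2
cross_product = (row_1[0] * row_2[1]) / (row_1[1] * row_2[0])

print(round(odds_row_1, 3), round(odds_row_2, 3))
print(round(odds_ratio, 3), round(cross_product, 3))
```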

Square Tables (RxR)
• Tables with the same number of rows as columns (RxR tables) also have special measures
• Cohen’s Kappa (κ), which measures the strength of agreement (did two people’s measurements match well?)
• Applies for R values of one nominal variable

Kappa
• Kappa is used only when the rows and columns have the same categories
  • Set of possible diagnoses reached by two different doctors
  • Two sets of outcomes which are believed to be dependent on each other
• Kappa is zero when agreement is no better than chance, and one when the diagonal has the only non-zero values; negative values (agreement worse than chance) are possible but rare
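
A minimal Python sketch of Cohen's kappa may help show why a diagonal-only table gives one. It assumes the standard definition, (observed agreement - chance agreement) / (1 - chance agreement); the function name and the counts are hypothetical.

```python
def cohens_kappa(table):
    """Kappa for a square table whose rows and columns share the same categories."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Observed agreement: share of cases on the diagonal.
    observed = sum(table[i][i] for i in range(len(table))) / n
    # Agreement expected by chance from the marginal totals alone.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical ratings by two raters (illustrative only).
ratings = [[20, 5],
           [10, 15]]
print(round(cohens_kappa(ratings), 3))
print(cohens_kappa([[10, 0], [0, 20]]))  # only diagonal cells are non-zero
```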

Kappa Example
• Example here is the educational level of one’s parents (maeduc and paeduc; as in ‘ma and pa education’)
• Use “GSS91 social.sav” data set
• Define new variables madeg and padeg, which are derived from maeduc and paeduc (convert years of education into rough levels of achievement)

Kappa Example
• New scale for madeg and padeg is
  • Education <12 is code 1, “LT High School”
  • Education 12-15 is code 2, “High School”
  • Education 16 is code 3, “Bachelor degree”
  • Education 17+ is code 4, “Graduate”
• Use Analyze / Descriptive Statistics / Crosstabs…
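
The recoding rule above can be written as a small function. This is a sketch outside SPSS: the function name is made up, and it assumes years of education are recorded as whole numbers.

```python
def degree_level(years):
    """Map years of education to the 1-4 codes used for madeg and padeg."""
    if years < 12:
        return 1  # LT High School
    if years <= 15:
        return 2  # High School
    if years == 16:
        return 3  # Bachelor degree
    return 4      # Graduate (17 or more years)

# Hypothetical values of maeduc (illustrative only).
print([degree_level(y) for y in (8, 12, 15, 16, 20)])
```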

Kappa Example
• Select “padeg” for Row(s), and “madeg” for Column(s)
• For Statistics select Kappa
• The basic crosstab just shows the data counts (next slide)
• Then we get the Kappa measure (slide after next)
• As usual, check to make sure the result is significant before going any further

Kappa Example
[Slide shows SPSS output only]

Kappa Example
• Here the significance is 0.000, very clearly significant (< 0.050)
• This is confirmed by the approximate T of over 20 - as before, this T is based on the null hypothesis
• The actual value of kappa and its standard error are 0.325 +/- 0.018
• What does this mean?

Kappa
• Kappa is judged on a fairly fixed scale
  • Kappa below 0.40 indicates poor agreement beyond chance
  • Kappa from 0.40 to 0.75 is fair to good agreement
  • Kappa above 0.75 is strong agreement
• So in this case we are confident there is poor agreement between parents’ education
Scale from J.L. Fleiss, 1981
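
To show why the conclusion is stated with confidence, the sketch below applies the scale to the reported kappa and to a rough 95% interval of plus or minus two standard errors (the same rule of thumb used for lambda earlier). The function name and labels are illustrative.

```python
def fleiss_label(kappa):
    """Label a kappa value using the thresholds quoted from Fleiss (1981)."""
    if kappa < 0.40:
        return "poor"
    if kappa <= 0.75:
        return "fair to good"
    return "strong"

# Values reported in the lecture: kappa 0.325 with standard error 0.018.
kappa, std_error = 0.325, 0.018
low, high = kappa - 2 * std_error, kappa + 2 * std_error

print(fleiss_label(kappa), fleiss_label(low), fleiss_label(high))
```

Both ends of the rough interval stay below 0.40, so the "poor agreement" label does not depend on sampling error here.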

Ordinal Crosstab Measures
• Several association measures can be used for a table with R rows and C columns which contain ordinal data (and presumably R ≠ C)
  • Kendall’s tau-b
  • Kendall’s tau-c
  • (Goodman and Kruskal’s) Gamma (preferred)
  • Somers’ d
  • Spearman’s Correlation Coefficient

General RxC Table Measures
• Many are based on comparing pairs of cases on the two variables
• If the case with the larger A also has the larger B, the pair is concordant
• If the case with the larger A has the smaller B, the pair is discordant
• If the two cases have equal A or equal B, the pair is tied

General RxC Table Measures
• The number of concordant pairs is “P”
• The number of discordant pairs is “Q”
• The number of ties on X is “Tx”
• The number of ties on Y is “Ty”
• The smaller of the number of rows R and columns C is called “m”
  m = min(R,C)
• Given this vocabulary, we can define many measures

General RxC Table Measures
• Kendall’s tau-b is
  tau-b = (P-Q) / sqrt[ (P+Q+Tx)*(P+Q+Ty) ]
• Kendall’s tau-c is
  tau-c = 2m*(P-Q) / [N^2*(m-1)], where N is the total number of cases
• Gamma (γ) is
  Gamma = (P-Q) / (P+Q)
• Somers’ d is
  dy = (P-Q) / (P+Q+Ty) or dx = (P-Q) / (P+Q+Tx)
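
The formulas above can be tried out on a small table with the following Python sketch. It assumes rows are the ordered categories of X and columns the ordered categories of Y, and that Tx and Ty count pairs tied on that variable only (pairs tied on both are left out). The function names and the counts are hypothetical.

```python
from math import sqrt

def pair_counts(table):
    """Return (P, Q, Tx, Ty) for a table with ordered rows (X) and columns (Y)."""
    rows, cols = len(table), len(table[0])
    p = q = tx = ty = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(i, rows):
                for l in range(cols):
                    pairs = table[i][j] * table[k][l]
                    if k > i and l > j:
                        p += pairs    # higher on X and higher on Y: concordant
                    elif k > i and l < j:
                        q += pairs    # higher on X but lower on Y: discordant
                    elif k > i and l == j:
                        ty += pairs   # same column: tied on Y only
                    elif k == i and l > j:
                        tx += pairs   # same row: tied on X only
    return p, q, tx, ty

def ordinal_measures(table):
    p, q, tx, ty = pair_counts(table)
    n = sum(sum(row) for row in table)
    m = min(len(table), len(table[0]))
    return {
        "tau_b": (p - q) / sqrt((p + q + tx) * (p + q + ty)),
        "tau_c": 2 * m * (p - q) / (n ** 2 * (m - 1)),
        "gamma": (p - q) / (p + q),
        "d_y": (p - q) / (p + q + ty),
        "d_x": (p - q) / (p + q + tx),
    }

# Hypothetical 2x2 counts (illustrative only).
counts = [[20, 10],
          [5, 15]]
print(pair_counts(counts))
for name, value in ordinal_measures(counts).items():
    print(name, round(value, 3))
```

Gamma ignores ties entirely, which is why it comes out larger than the other measures for the same table.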

General RxC Table Measures
• All of the RxC measures are symmetric except Somers’ d, which has both symmetric and directional values given
• All are evaluated by their significance, which also has an approximate T score
• All are expressed by a Value +/- its Std Error

RxC Measures Example
• Use “GSS91 social.sav” data set
• Use Analyze / Descriptive Statistics / Crosstabs…
• Select “paeduc” for Row(s), and “maeduc” for Column(s)
• Under “Statistics…” select Eta, Correlations, Gamma, Somers’ d, Kendall’s tau-b and tau-c

RxC Measures Example
• This compares the number of years of education of one’s mother and father to see how strongly they affect one another
• The crosstab data table is very large, since it ranges from 0 to 20 for each category, with irregular gaps (we’re not using the simplified categories from the Kappa example)
• Hence we’re not showing it here!

RxC Measures Example
[Slide shows SPSS output only]
Both measures show the mother’s education is a slightly better predictor

RxC Measures Example
• Directional measures:
  • Somers’ d is significant
  • It shows that there are about 55% +/- 2% more concordant pairs than discordant ones, excluding ties on the independent variable
  • The Eta measure shows that around 69% of the variability of one parent’s education is shared with the other’s

RxC Measures Example
[Slide shows SPSS output only]

RxC Measures Example
• All of the symmetric measures are statistically significant, with approximate t values around 27-28
• The Kendall tau-b and tau-c measures disagree a little on the magnitude of the agreement
• Gamma and Spearman give fairly strong positive correlations

RxC Measures Example
• Spearman, like ‘r’, ranges from -1 to +1, and does not require a normal distribution
• Based on ordered categories, not their values
• Even ‘r’ can be calculated for this case, and it gives results similar to Gamma and Spearman

Yule’s Q
• A special case of gamma for a 2x2 table is called Yule’s Q
• It is appropriate for ordinal data in 2x2 tables; so values for each variable are Low/High, Yes/No, or similar
• Define Yule’s Q = (a*d – b*c) / (a*d + b*c)
• See PDF page 59 of Action Research handout for the definition of a, b, c, and d (cell labels)
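
A short Python sketch of the formula follows. It assumes the common labelling in which a and b are the first row and c and d the second row of the 2x2 table; the handout's own cell labels should be checked against this. The function name and the counts are made up.

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2x2 table laid out as [[a, b], [c, d]] (assumed labelling)."""
    return (a * d - b * c) / (a * d + b * c)

# Hypothetical counts (illustrative only).
print(round(yules_q(20, 10, 5, 15), 3))
print(yules_q(10, 0, 0, 10))   # all cases on the main diagonal
print(yules_q(0, 10, 10, 0))   # all cases on the other diagonal
```

The last two calls show the extremes of +1 and -1 for perfect positive and perfect negative association.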

Yule’s Q
• Measures the strength and direction of association from -1 (perfect negative association) to 0 (no association) to +1 (perfect positive association)
• Judge the results for Yule’s Q by the table on page 59 of Action Research handout; and see pages 58-64 for other related discussion
