Correlations between 2 variables and Controlling for 3rd variables
Computing λ (lambda) • Predicting Item • Sex unknown: • Prediction = “No” • 13 correct, 6 wrong • Error ratio = 6/19 • Sex known: • Prediction • if Female: “No” • if Male: “Yes” • 14 (= 9 + 5) correct, 5 wrong • Error ratio = 5/19 • Proportional reduction of error ratio: $\lambda = \frac{6/19 - 5/19}{6/19} = \frac{1}{6} \approx .17$, i.e. when predicting “Item” one makes 17% fewer errors if “Sex” is known than if “Sex” is not known
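A minimal Python sketch of the counting rule behind λ: always predict the most frequent category, once ignoring Sex and once within each Sex. The cell counts are not printed on the slide; they are our back-calculation from its totals (13 “No” / 6 “Yes”, and 9 + 5 correct predictions when Sex is known), which only fit Female: 9 “No”, 1 “Yes” and Male: 4 “No”, 5 “Yes”. The function name `goodman_kruskal_lambda` is our own.

```python
def goodman_kruskal_lambda(table):
    """Lambda for predicting the column variable from the row variable.

    table[i][j] = number of cases in row category i and column category j.
    Prediction rule: guess the modal (most frequent) column category,
    either overall (row unknown) or within the observed row (row known).
    """
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    errors_without = n - max(col_totals)                       # row variable unknown
    errors_with = sum(sum(row) - max(row) for row in table)    # row variable known
    return (errors_without - errors_with) / errors_without


# Back-calculated Sex x Item table: rows = Female, Male; columns = "No", "Yes"
sex_item = [[9, 1],
            [4, 5]]
print(round(goodman_kruskal_lambda(sex_item), 3))  # 0.167
```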
Controlling for age • Predicting Item • Sex known and Young • errors = 2 • Sex known and Old • errors = 1 • Error ratio = 3/19 • Proportional reduction of error ratio: $\frac{5/19 - 3/19}{5/19} = \frac{2}{5} = .40$, i.e. when predicting “Item”, knowing both “Age” and “Sex” reduces the errors by 40% compared with knowing only “Sex”
Marriages / Divorces / Births (per 1000 inhabitants) in the US Las Vegas!
Marriages / Divorces / Births (per 1000 inhabitants) in the US : dichotomized data
Lambda (λ) (Goodman & Kruskal, 1954) To what extent does the prediction of the rows improve when the columns are known in a 2×2 contingency table (and vice versa)? • Marriage ratio predicted WITH knowledge of the divorce ratio: High if Divorces High; Low if Divorces Low • Marriage ratio predicted WITHOUT knowing the divorce ratio: High (25 > 23)
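Lambda is not (always) symmetrical • predicting Marriages % (from Divorces %): $\lambda = \frac{23 - 11}{23} = .522$ • predicting Divorce % (from Marriages %): $\lambda = \frac{24 - 11}{24} = .542$ • predicting Marriages % = .304 • predicting Births % = .304 • predicting Births % = … • predicting Divorce % = …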
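A Python sketch of the asymmetry. The 2×2 cell counts are our own back-calculation, not copied from the slide: with 25/23 High/Low marriage ratios, 11 errors in both prediction directions (11/48), and λ = .542 implying a 24/24 split for divorces, the only consistent table is [[19, 6], [5, 18]] (rows: Marriages High/Low; columns: Divorces High/Low). The function name `lambda_predicting` is hypothetical.

```python
def lambda_predicting(table, target):
    """Goodman & Kruskal's lambda for a two-way table of counts.

    target="rows": predict the row category from the column category.
    target="cols": predict the column category from the row category.
    """
    if target == "rows":
        # Transpose so that the predictor (original columns) becomes the rows.
        table = [list(col) for col in zip(*table)]
    elif target != "cols":
        raise ValueError("target must be 'rows' or 'cols'")
    n = sum(map(sum, table))
    errors_without = n - max(sum(col) for col in zip(*table))  # modal guess overall
    errors_with = sum(sum(row) - max(row) for row in table)    # modal guess per predictor category
    return (errors_without - errors_with) / errors_without


# Reconstructed table: rows = Marriages (High, Low), columns = Divorces (High, Low)
marriages_divorces = [[19, 6],
                      [5, 18]]
print(round(lambda_predicting(marriages_divorces, "rows"), 3),
      round(lambda_predicting(marriages_divorces, "cols"), 3))  # 0.522 0.542
```

The same 11 errors are divided by different baselines (23 vs 24), which is why the two lambdas differ.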
Controlling for a 3rd variable: partial lambda • λ predicting Divorces (from Marriages) = .542 • Error ratio predicting Divorces when Marriages are known = 11/48 • To what extent does this error ratio decrease if not only Marriages but also Births are known? ("predicting Divorces controlling for Marriages") • Partial λ = 0: the prediction of Divorces from Births AND Marriages is NOT better than the prediction of Divorces based on Marriages only
Lambda and partial lambda predicting Divorces • predicting Divorces controlling for Births: partial λ = .267, i.e. the prediction of Divorces based on knowledge of both Births AND Marriages is 26.7% better than the prediction of Divorces based on knowledge of Births only
[Figure: Generalization: scatterplot of y against x with data points (xi, yi)]
Generalization: relating continuous variables • λ: To what extent does the prediction of the rows improve when the columns are known in a 2×2 contingency table (and vice versa)? • Generalization r²: To what extent does the prediction of variable y improve when one uses the regression line $y = ax + b$, compared with not using this information? • Sum of squared errors predicting WITHOUT knowledge of x: $\sum_i (y_i - \bar{y})^2$ • Sum of squared errors predicting WITH knowledge of x: $\sum_i (y_i - a x_i - b)^2$ • Improvement: $r^2 = \frac{\sum_i (y_i - \bar{y})^2 - \sum_i (y_i - a x_i - b)^2}{\sum_i (y_i - \bar{y})^2}$
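A Python sketch of r² as a proportional reduction of error: compare the squared error of predicting every y by its mean with the squared error of predicting it from the least-squares line. The data are hypothetical and the function name `r_squared` is our own.

```python
def r_squared(x, y):
    """Proportional reduction in squared error from using y = a*x + b instead of the mean of y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx        # least-squares slope
    b = my - a * mx      # least-squares intercept
    sse_without = sum((yi - my) ** 2 for yi in y)                      # predict every y by its mean
    sse_with = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))  # predict by the regression line
    return (sse_without - sse_with) / sse_without


# Hypothetical data: SSE drops from 6.0 to 2.4, a 60% reduction
print(round(r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 3))  # 0.6
```

The result equals the squared Pearson correlation, which is why the slides call this quantity r².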
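Partial r² • Prediction of y from z only: find $a_z$ and $b_z$ such that $\sum_i (y_i - a_z z_i - b_z)^2$ is minimal • Prediction of y knowing both x and z: find $a$, $b_1$ and $b_2$ such that $\sum_i (y_i - a - b_1 x_i - b_2 z_i)^2$ is minimal • To what extent does the error made when predicting y decrease if x is taken into account on top of z? $r^2_{yx \cdot z} = \frac{\sum_i (y_i - a_z z_i - b_z)^2 - \sum_i (y_i - a - b_1 x_i - b_2 z_i)^2}{\sum_i (y_i - a_z z_i - b_z)^2}$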
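A pure-Python sketch of partial r² following the two minimizations on this slide: fit y on z alone, then fit y on x and z through the normal equations, and compare the two sums of squared errors. All function names and the data are hypothetical; in practice a linear-algebra library would replace the hand-written solver.

```python
def mean(v):
    return sum(v) / len(v)


def sse_from_z(y, z):
    """SSE of the least-squares line y ≈ a_z*z + b_z (prediction from z only)."""
    mz, my = mean(z), mean(y)
    a_z = (sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
           / sum((zi - mz) ** 2 for zi in z))
    b_z = my - a_z * mz
    return sum((yi - (a_z * zi + b_z)) ** 2 for zi, yi in zip(z, y))


def solve(m, v):
    """Solve the square system m @ s = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [vi] for row, vi in zip(m, v)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    s = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s[r] = (aug[r][n] - sum(aug[r][c] * s[c] for c in range(r + 1, n))) / aug[r][r]
    return s


def sse_from_x_and_z(y, x, z):
    """SSE of the least-squares fit y ≈ a + b1*x + b2*z (normal equations)."""
    rows = [(1.0, xi, zi) for xi, zi in zip(x, z)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    a, b1, b2 = solve(xtx, xty)
    return sum((yi - (a + b1 * xi + b2 * zi)) ** 2 for xi, zi, yi in zip(x, z, y))


def partial_r2(y, x, z):
    """Proportional reduction in SSE when x is added on top of z."""
    sse_z = sse_from_z(y, z)
    return (sse_z - sse_from_x_and_z(y, x, z)) / sse_z


# Hypothetical data (not the state data from the slides)
x = [1, 2, 3, 4, 5, 6]
z = [2, 1, 4, 3, 6, 5]
y = [7.5, 7.2, 17.9, 17.1, 27.4, 28.3]
print(round(partial_r2(y, x, z), 3))
```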
Application: X = Births, Y = Divorces, Z = Marriages. Wording the result: • When predicting the divorce rate in a state from its marriage rate, the sum of squared errors can be reduced by another .007 (less than 1%) if the state's birth rate is taken into account as well • In essence, birth rate and divorce rate are unrelated when controlling for marriage rate • When marriage rates are held constant, there is (almost) no relation between a state's birth rate and its divorce rate
Relation between a continuous variable and a dichotomous variable • Regression line: y = 9.6 + 1.4x • Dummy variable x: Autocratic = 0, Democratic = 1
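With a 0/1 dummy, the least-squares line passes through both group means: the intercept is the mean of the group coded 0 and the slope is the difference between the two means. Read that way, y = 9.6 + 1.4x says autocratic teams average 9.6 and democratic teams 9.6 + 1.4 = 11.0. The Python sketch below checks this on hypothetical scores; `fit_line` is our own helper.

```python
def fit_line(x, y):
    """Least-squares intercept b and slope a for y ≈ b + a*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - a * mx, a


# Hypothetical productivity scores; dummy coding as on the slide: autocratic = 0, democratic = 1
style = [0, 0, 0, 0, 1, 1, 1, 1]
productivity = [8, 9, 10, 11, 10, 11, 12, 13]
intercept, slope = fit_line(style, productivity)
print(intercept, slope)  # 9.5 2.0  (autocratic mean 9.5; democratic mean 11.5 = 9.5 + 2.0)
```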
Computing r²: r² = .15, i.e. when the leadership style of a team is known, its productivity can be predicted 15% better (15% less squared error) than without that information
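r² depends on the difference between the group means and on the within-group variance: • a larger difference between the group means gives a higher r² • a smaller within-group variance gives a higher r²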
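For a 0/1 predictor the regression predicts each case by its group mean, so the remaining squared error is the within-group sum of squares, and r² equals the between-group sum of squares divided by the total sum of squares. A Python sketch with hypothetical scores shows both effects; `r2_two_groups` is our own name.

```python
def r2_two_groups(group0, group1):
    """r² for a 0/1 dummy predictor, computed as between-group SS / total SS."""
    scores = group0 + group1
    grand = sum(scores) / len(scores)
    m0, m1 = sum(group0) / len(group0), sum(group1) / len(group1)
    ss_total = sum((s - grand) ** 2 for s in scores)
    ss_between = len(group0) * (m0 - grand) ** 2 + len(group1) * (m1 - grand) ** 2
    return ss_between / ss_total


print(round(r2_two_groups([8, 9, 10, 11], [10, 11, 12, 13]), 3))  # 0.444  baseline
print(round(r2_two_groups([8, 9, 10, 11], [12, 13, 14, 15]), 3))  # 0.762  means further apart
print(round(r2_two_groups([9, 9, 10, 10], [11, 11, 12, 12]), 3))  # 0.8    same means, less spread
```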
r and the t-test • Generalizing a finding in a sample to the population • H0: there is no relation between a dichotomous variable x (group membership) and the continuous variable y • H0 is unlikely if r² is high. WHAT IS "HIGH"? • Statistics: if in the population there is no relation between x and y, then for sample data $t = \frac{r\sqrt{N-2}}{\sqrt{1-r^2}}$ follows a Student t-distribution with N − 2 degrees of freedom
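A Python sketch of the test statistic, assuming the standard conversion t = r√(N − 2)/√(1 − r²) with N − 2 degrees of freedom. It reproduces the t = 2.19 quoted in this section for r = .04 and N = 3000, and gives t ≈ 1.2 (below the rule-of-thumb value of 2) for the leadership example with r = .39 and N = 10. The function name is our own.

```python
import math


def t_from_r(r, n):
    """t statistic for H0 'no relation between x and y'; compare with t(n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


print(round(t_from_r(0.04, 3000), 2))  # 2.19
print(round(t_from_r(0.39, 10), 2))    # 1.2
```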
[Figure: Student t-distribution versus normal distribution N(0,1): density curves of t(5) and t(100), with the Type I error region marked]
Example - Marriages / Divorces / Births (per 1000 inhabitants) in the US • Correlations (N=48-1) • Las Vegas! • Does a correlation of .347 suffice to conclude that a relation between births and divorces exists in the population? • Rule of thumb: $t > 2$ (a t-table is better) • Conclusion: most likely this relation exists in the population
Example – leadership style and productivity • r² between leadership style and productivity equals .15 (N = 10 teams, r = .39) • Conclusion: the observed relation is not strong enough to reject the null hypothesis, which states that leadership style and productivity are unrelated • However: this does NOT imply that the null hypothesis is correct!
Classical approach for the t-test: autocratic teams versus democratic teams
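When group membership is coded as a 0/1 dummy, the classical two-sample t-test with pooled variance and the r-based route give exactly the same t. A Python sketch checking this on hypothetical team scores (not the slide's data); both function names are our own.

```python
import math


def pooled_t(group0, group1):
    """Classical two-sample t statistic with pooled variance (df = n0 + n1 - 2)."""
    n0, n1 = len(group0), len(group1)
    m0, m1 = sum(group0) / n0, sum(group1) / n1
    ss0 = sum((v - m0) ** 2 for v in group0)
    ss1 = sum((v - m1) ** 2 for v in group1)
    pooled_var = (ss0 + ss1) / (n0 + n1 - 2)
    return (m1 - m0) / math.sqrt(pooled_var * (1 / n0 + 1 / n1))


def t_via_r(group0, group1):
    """Same test via the correlation of y with a 0/1 dummy: t = r*sqrt(N-2)/sqrt(1-r^2)."""
    x = [0] * len(group0) + [1] * len(group1)
    y = group0 + group1
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


autocratic = [8, 9, 10, 11]     # hypothetical scores
democratic = [10, 11, 12, 13]   # hypothetical scores
print(round(pooled_t(autocratic, democratic), 3), round(t_via_r(autocratic, democratic), 3))  # 2.191 2.191
```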
BE CAREFUL!! Statistically significant ≠ theoretically relevant • One can ALWAYS choose N large enough to make the t-test indicate a significant relation … • Example: r = .04, r² = .0016, N = 3000 yields t = 2.19
Tolerance towards homosexuals • I won't associate with known homosexuals if I can help it. (+) • I would not be afraid for my child to have a homosexual teacher. (-) • It is normal for magazine stands to openly display homosexual magazines. (-) • Homosexuality, as far as I'm concerned, is not sinful. (-) • The increasing acceptance of homosexuality in our society is aiding in the deterioration of morals. (+) • I would vote for a homosexual in an election for public office. (-) • A TV comedy series centered on a homosexual couple would be a nice change from most series concentrated on heterosexual couples. (-) • Many homosexuals are good role models. (-) • If I were a parent, I could accept my son or daughter being gay. (-) • Homosexual websites, chat rooms, and programs should be banned from the internet. (+) 1=Strongly agree 2=Agree 3=Disagree 4=Strongly disagree
Descriptives: non-Catholics versus Catholics
Scatterplot: intolerance score (Y) against Catholicism (x) • Regression line: Y = 15.24 + 6.80x
Computation of the t-test (via r, OR via the classical approach) • Conclusion: REJECT the null hypothesis, which states that Catholics are as tolerant towards homosexuals as non-Catholics are