Correlations between 2 variables and Controlling for 3rd variables

Presentation Transcript


  1. Correlations between 2 variables and Controlling for 3rd variables

  2. Binary variables: an example

  3. Computing λ (lambda): predicting "Item"
     • Sex unknown: prediction = "No"; 13 correct, 6 wrong; error ratio $E_1 = 6/19$
     • Sex known: prediction = "No" if female, "Yes" if male; 14 (= 9 + 5) correct, 5 wrong; error ratio $E_2 = 5/19$
     • Proportional reduction of the error ratio: $\lambda = \frac{E_1 - E_2}{E_1} = \frac{6/19 - 5/19}{6/19} = \frac{1}{6} \approx .17$
     • When predicting "Item", one makes 17% fewer errors if "Sex" is known than if it is not.
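The arithmetic of this slide can be checked with a short Python sketch. The four cell counts are not printed in the transcript; they are reconstructed from the numbers that are (13 "No" and 6 "Yes" overall, 9 correct predictions for females and 5 for males), so treat them as an assumption. The function names are hypothetical.

```python
# Sex x Item counts, reconstructed (assumption) from the slide's totals:
# 13 "No" / 6 "Yes" overall; 9 females and 5 males are predicted correctly.
counts = {
    "Female": {"No": 9, "Yes": 1},
    "Male": {"No": 4, "Yes": 5},
}


def modal_errors(category_counts):
    """Errors made when always predicting the most frequent category."""
    return sum(category_counts.values()) - max(category_counts.values())


def goodman_kruskal_lambda(table):
    """Lambda for predicting the column variable (Item) from the row variable (Sex)."""
    # Marginal distribution of Item, ignoring Sex
    marginal = {}
    for row in table.values():
        for category, n in row.items():
            marginal[category] = marginal.get(category, 0) + n
    e1 = modal_errors(marginal)                             # errors without Sex
    e2 = sum(modal_errors(row) for row in table.values())  # errors with Sex
    return e1, e2, (e1 - e2) / e1


e1, e2, lam = goodman_kruskal_lambda(counts)
print(e1, e2, round(lam, 3))  # 6 5 0.167
```

Working with error counts instead of error ratios gives the same λ, because both ratios share the denominator 19.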

  4. Controlling for age: predicting "Item"
     • Sex known and young: 2 errors
     • Sex known and old: 1 error
     • Error ratio = 3/19
     • Proportional reduction of the error ratio: $\lambda = \frac{5/19 - 3/19}{5/19} = \frac{2}{5} = .40$
     • The proportional reduction of error when predicting "Item" if both "Age" and "Sex" are known, compared with knowing only "Sex", equals 40%.
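Controlling for age uses the same proportional-reduction-of-error idea: the errors made within each age group are added up and compared with the 5 errors made when only Sex is known. A minimal Python sketch using only the error counts from the slides (the helper name is hypothetical):

```python
def proportional_reduction_in_error(errors_before, errors_after):
    """PRE = (E_before - E_after) / E_before; counts or ratios over the same N both work."""
    if errors_before == 0:
        raise ValueError("no errors to reduce")
    return (errors_before - errors_after) / errors_before


# Errors predicting "Item" from Sex alone (from the lambda slide)
errors_sex_only = 5
# Errors predicting "Item" from Sex within each age group (from this slide)
errors_by_age_group = {"Young": 2, "Old": 1}
errors_sex_and_age = sum(errors_by_age_group.values())

partial_lambda = proportional_reduction_in_error(errors_sex_only, errors_sex_and_age)
print(errors_sex_and_age, round(partial_lambda, 2))  # 3 0.4
```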

  5. Marriages / Divorces / Births (per 1000 inhabitants) in the US Las Vegas!

  6. Marriages / Divorces / Births (per 1000 inhabitants) in the US : dichotomized data

  7. Lambda () (GoodMale & Kruskal, 1954) To what extent does the prediction of the rows improve when the columns are known in a 22 contingency table (and vice versa)? Marriage ratio predicted WITH knowledge on divorce ratio : High if Divorces HighLow if Divorces Low Marriage ratio predicted WITHOUT knowing divorce ratio : High (25>23)

  8. Lambda is not (always) symmetrical
     • λ predicting Marriages % = .522; λ predicting Divorces % = .542
     • λ predicting Marriages % = .304; λ predicting Births % = .304
     • λ predicting Births % = …; λ predicting Divorces % = …
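The asymmetry can be reproduced with a small Python sketch. The cell counts below are an assumption: they are reconstructed so that they agree with what the slides report (N = 48, 25 high vs 23 low marriage states, 11 errors when predicting Divorces from Marriages, λ = .522 and .542). They are not the original data table, and the function names are hypothetical.

```python
# 2x2 table (rows: Marriages, columns: Divorces), reconstructed (assumption)
# to match the margins and lambda values reported on the slides.
table = [
    [19, 6],   # Marriages High: Divorces High, Divorces Low
    [5, 18],   # Marriages Low:  Divorces High, Divorces Low
]


def lambda_rows_given_cols(t):
    """Lambda for predicting the row variable when the column variable is known."""
    n = sum(map(sum, t))
    row_totals = [sum(r) for r in t]
    e1 = n - max(row_totals)                           # always predict the modal row
    e2 = sum(sum(col) - max(col) for col in zip(*t))   # modal row within each column
    return (e1 - e2) / e1


def lambda_cols_given_rows(t):
    """Same measure with the roles of rows and columns swapped."""
    transposed = [list(col) for col in zip(*t)]
    return lambda_rows_given_cols(transposed)


lam_marriages = lambda_rows_given_cols(table)  # predicting Marriages from Divorces
lam_divorces = lambda_cols_given_rows(table)   # predicting Divorces from Marriages
print(round(lam_marriages, 3), round(lam_divorces, 3))  # 0.522 0.542
```

Both directions make 11 errors with the predictor known, but the baselines differ (23 vs 24 errors without it), which is why the two lambdas are not equal.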

  9. Controlling for a 3rd variable: partial lambda
     • λ predicting Divorces = .542
     • Error ratio predicting Divorces when Marriages are known = 11/48
     • To what extent does this ratio decrease if not only Marriages but also Births are known? This is "λ predicting Divorces controlling for Marriages".
     • The prediction of Divorces from Births AND Marriages is NOT better than the prediction of Divorces from Marriages only.
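Written out, partial lambda compares the error ratio with Marriages known to the error ratio with Marriages and Births known. Because the slide states that adding Births does not improve the prediction, both error ratios equal 11/48 and the partial lambda is zero. The symbols $E_{D\mid M}$ and $E_{D\mid M,B}$ are notation introduced here for illustration:

```latex
\lambda_{DB \cdot M}
  = \frac{E_{D\mid M} - E_{D\mid M,B}}{E_{D\mid M}}
  = \frac{11/48 - 11/48}{11/48}
  = 0
```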

  10. Lambda and partial lambda
     • λ predicting Divorces
     • λ predicting Divorces controlling for Births
     • The prediction of Divorces based on knowledge of both Births AND Marriages is 26.7% better than the prediction of Divorces based on knowledge of Births only.
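A general partial lambda can be computed from a three-way frequency table: count the modal-prediction errors for y within each category of the control variable z, then within each combination of z and x, and take the proportional reduction. In the slide's terms, z would be Births, x Marriages and y Divorces. The frequencies below are hypothetical and are NOT the data behind the slides; the function names are also hypothetical.

```python
from collections import defaultdict


def modal_errors(counts):
    """Errors made when predicting the most frequent category of y."""
    return sum(counts.values()) - max(counts.values())


def errors_given(data, predictors):
    """Total modal-prediction errors for y within each combination of the predictors.

    data maps (z, x, y) tuples to frequencies; predictors holds indices (0 = z, 1 = x).
    """
    groups = defaultdict(lambda: defaultdict(int))
    for key, n in data.items():
        group = tuple(key[i] for i in predictors)
        groups[group][key[2]] += n
    return sum(modal_errors(g) for g in groups.values())


def partial_lambda(data):
    """Proportional reduction in errors predicting y when x is added on top of z."""
    e_z = errors_given(data, (0,))
    e_zx = errors_given(data, (0, 1))
    return (e_z - e_zx) / e_z


# Hypothetical frequencies keyed (z, x, y); NOT the data from the slides.
data = {
    ("Low", "Low", "Low"): 10, ("Low", "Low", "High"): 2,
    ("Low", "High", "Low"): 3, ("Low", "High", "High"): 5,
    ("High", "Low", "Low"): 4, ("High", "Low", "High"): 6,
    ("High", "High", "Low"): 1, ("High", "High", "High"): 9,
}
print(errors_given(data, (0,)), errors_given(data, (0, 1)), round(partial_lambda(data), 3))
# 12 10 0.167
```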

  11. Generalization (figure: data points $(x_i, y_i)$ plotted on axes x and y)

  12. Generalization: relating continuous variables
     • λ: to what extent does the prediction of the rows improve when the columns are known in a 2×2 contingency table (and vice versa)?
     • Generalization, r²: to what extent does the prediction of variable y improve when one uses the regression line y = ax + b, compared with not using this information?
     • Sum of squared errors predicting WITHOUT knowledge of x: $SS_0 = \sum_i (y_i - \bar{y})^2$
     • Sum of squared errors predicting WITH knowledge of x: $SS_1 = \sum_i \left(y_i - (a x_i + b)\right)^2$
     • Improvement: $r^2 = \frac{SS_0 - SS_1}{SS_0}$
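Read as a proportional reduction of error, r² compares the squared errors made when every y is predicted by the mean of y with the squared errors left over by the least-squares line. A minimal pure-Python sketch on hypothetical data, with hypothetical function names, checking that this equals the squared Pearson correlation:

```python
def mean(v):
    return sum(v) / len(v)


def least_squares_line(x, y):
    """Slope a and intercept b of the line y = a*x + b that minimises the squared errors."""
    mx, my = mean(x), mean(y)
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return a, my - a * mx


def r_squared_as_pre(x, y):
    """r^2 as the proportional reduction of the sum of squared errors."""
    my = mean(y)
    ss_without = sum((yi - my) ** 2 for yi in y)                     # predict by the mean
    a, b = least_squares_line(x, y)
    ss_with = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))  # predict by the line
    return (ss_without - ss_with) / ss_without


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data
x = [1, 2, 3, 4, 5, 6]
y = [2.0, 2.9, 4.2, 4.8, 6.1, 6.7]
print(round(r_squared_as_pre(x, y), 4), round(pearson_r(x, y) ** 2, 4))
```

Both printed values coincide, which is why the proportional reduction of squared error can be reported simply as r².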

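  13. Partial r²
     • Prediction of y from z only: find $a_z$ and $b_z$ such that $\sum_i \left(y_i - (a_z z_i + b_z)\right)^2$ is minimal; call this minimum $SS_z$
     • Prediction of y knowing both x and z: find $a_{zx}$, $b_1$ and $b_2$ such that $\sum_i \left(y_i - (a_{zx} + b_1 x_i + b_2 z_i)\right)^2$ is minimal; call this minimum $SS_{zx}$
     • To what extent does the error made when predicting y decrease if x is taken into account on top of z? $r^2_{yx \cdot z} = \frac{SS_z - SS_{zx}}{SS_z}$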

  14. Application: X = Births, Y = Divorces, Z = Marriages. Wording the result:
     • When predicting the divorce rate in a state from its marriage rate, the (squared) prediction error can be reduced by a further .007 (less than 1%) if the state's birth rate is taken into account as well.
     • In essence, birth rate and divorce rate are unrelated when controlling for marriage rate.
     • When holding marriage rates constant, there is (almost) no relation between a state's birth rate and its divorce rate.

  15. Relation between a continuous variable and a dichotomous variable
     • Regression line: y = 9.6 + 1.4x
     • Dummy variable: Autocratic = 0, Democratic = 1
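With a 0/1 dummy as predictor, the least-squares line passes through the two group means: the intercept is the mean of the group coded 0 and the slope is the difference between the two means. In the slide's line this would make 9.6 the autocratic mean and 1.4 the difference between the democratic and autocratic means. A minimal sketch with hypothetical scores (NOT the data behind the slide):

```python
# Hypothetical team productivity scores; NOT the data behind the slide.
autocratic = [7.0, 9.0, 8.0, 10.0]    # dummy x = 0
democratic = [10.0, 12.0, 11.0, 13.0]  # dummy x = 1

x = [0] * len(autocratic) + [1] * len(democratic)
y = autocratic + democratic

# Ordinary least-squares line y = intercept + slope * x
n = len(x)
mx, my = sum(x) / n, sum(y) / n
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
intercept = my - slope * mx

mean_a = sum(autocratic) / len(autocratic)
mean_d = sum(democratic) / len(democratic)
print(intercept, mean_a)          # intercept equals the autocratic mean
print(slope, mean_d - mean_a)     # slope equals the difference in means
```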

  16. Computing r²: when the leadership style of a team is known, its productivity can be predicted 15% better (r² = .15) than without that information.

  17. r² depends on the difference between the group means and on the within-group variance (figure: two panels, each labelled "Higher r²")
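For a 0/1 dummy, the regression predictions are the two group means, so the errors left "with x" are the within-group deviations. Writing $n_0, n_1$ for the group sizes, $\bar{y}_0, \bar{y}_1$ for the group means and $\bar{y}$ for the overall mean (notation introduced here), r² becomes a ratio of between-group to total sum of squares. The numerator grows with the difference between the group means; the within-group term in the denominator grows with the within-group variance:

```latex
r^2 = \frac{SS_{\text{between}}}{SS_{\text{between}} + SS_{\text{within}}}
    = \frac{n_0(\bar{y}_0 - \bar{y})^2 + n_1(\bar{y}_1 - \bar{y})^2}
           {n_0(\bar{y}_0 - \bar{y})^2 + n_1(\bar{y}_1 - \bar{y})^2
            + \sum_{g \in \{0,1\}} \sum_{i \in g} (y_i - \bar{y}_g)^2}
```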

  18. r and the t-test: generalizing a finding in a sample to the population
     • H0: there is no relation between a dichotomous variable x (group membership) and the continuous variable y
     • H0 is unlikely if r² is high. WHAT IS "HIGH"?
     • Statistics: if in the population there is no relation between x and y, then for sample data $t = \frac{r\sqrt{N-2}}{\sqrt{1-r^2}}$ follows a Student t-distribution with N − 2 degrees of freedom.
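The conversion from r to t is a one-line formula. The sketch below (function name hypothetical) applies it to the three examples in this deck; for the births/divorces example it assumes N = 48, the number of states in the dichotomized table.

```python
import math


def t_from_r(r, n):
    """t statistic for H0 'no relation in the population', with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


print(round(t_from_r(0.04, 3000), 2))   # 2.19, the large-N example
print(round(t_from_r(0.39, 10), 2))     # leadership style, N = 10 teams
print(round(t_from_r(0.347, 48), 2))    # births vs divorces, assuming N = 48
```

Compared with the rule of thumb |t| > 2, the leadership example stays below the threshold, while the births/divorces correlation and the r = .04 example with N = 3000 exceed it.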

  19. Student t-distribution versus normal distribution N(0,1) (figure: density curves of N(0,1), t(100) and t(5), with a region labelled "Type I error")

  20. Example: Marriages / Divorces / Births (per 1000 inhabitants) in the US
     • Correlations (N=48-1); Las Vegas!
     • Does a correlation of .347 suffice to conclude that a relation between births and divorces exists in the population?
     • Rule of thumb: |t| > 2 (a t-table is better)
     • Conclusion: most likely this relation exists in the population.

  21. Example: leadership style and productivity
     • r² between leadership style and productivity equals .15 (N = 10 teams, r = .39)
     • Conclusion: the observed relation is not strong enough to reject the null hypothesis that leadership style and productivity are unrelated.
     • However, this does NOT imply that the null hypothesis is correct!

  22. Classical approach for the t-test (autocratic teams vs democratic teams)
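The classical two-sample t-test with pooled variance gives exactly the same t as the r-based formula, where r is the correlation between a 0/1 group dummy and y. A minimal sketch with hypothetical scores (NOT the data on the slide) and hypothetical function names:

```python
import math


def pooled_t(group0, group1):
    """Classical two-sample t-test with pooled variance (df = n0 + n1 - 2)."""
    n0, n1 = len(group0), len(group1)
    m0, m1 = sum(group0) / n0, sum(group1) / n1
    ss0 = sum((v - m0) ** 2 for v in group0)
    ss1 = sum((v - m1) ** 2 for v in group1)
    sp2 = (ss0 + ss1) / (n0 + n1 - 2)            # pooled variance estimate
    return (m1 - m0) / math.sqrt(sp2 * (1 / n0 + 1 / n1))


def point_biserial_t(group0, group1):
    """Same test via r between a 0/1 dummy and y: t = r*sqrt(N-2)/sqrt(1-r^2)."""
    x = [0] * len(group0) + [1] * len(group1)
    y = list(group0) + list(group1)
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Hypothetical productivity scores; NOT the data on the slide.
autocratic = [7, 9, 8, 10, 9]
democratic = [10, 12, 9, 11, 13]
print(round(pooled_t(autocratic, democratic), 4),
      round(point_biserial_t(autocratic, democratic), 4))
```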

  23. BE CAREFUL!! Statistically significant ≠ theoretically relevant
     • One can ALWAYS choose N large enough to make the t-test indicate a significant relation.
     • Example: r = .04, r² = .0016, N = 3000 yields t = 2.19

  24. Tolerance towards homosexuals
     • I won't associate with known homosexuals if I can help it. (+)
     • I would not be afraid for my child to have a homosexual teacher. (-)
     • It is normal for magazine stands to openly display homosexual magazines. (-)
     • Homosexuality, as far as I'm concerned, is not sinful. (-)
     • The increasing acceptance of homosexuality in our society is aiding in the deterioration of morals. (+)
     • I would vote for a homosexual in an election for public office. (-)
     • A TV comedy series centered on a homosexual couple would be a nice change from most series concentrated on heterosexual couples. (-)
     • Many homosexuals are good role models. (-)
     • If I were a parent, I could accept my son or daughter being gay. (-)
     • Homosexual websites, chat rooms, and programs should be banned from the internet. (+)
     Response scale: 1 = Strongly agree, 2 = Agree, 3 = Disagree, 4 = Strongly disagree

  25. Descriptives: non-Catholics vs Catholics

  26. Histogram

  27. Boxplot

  28. Scatterplot: intolerance score (Y) against Catholicism (x), with regression line Y = 15.24 + 6.80x

  29. Computation of the t-test (two alternative computations, via r or via the classical formula). Conclusion: REJECT the null hypothesis that Catholics are as tolerant towards homosexuals as non-Catholics are.
