Correlations between 2 variables and Controlling for 3rd variables

Binary variables: an example

Computing λ (lambda)
• Predicting “Item”
• Sex unknown: prediction = “No”; 13 correct, 6 wrong; error ratio = 6/19
• Sex known: prediction = “No” if Female, “Yes” if Male; 14 (= 9 + 5) correct, 5 wrong; error ratio = 5/19
• Proportional reduction of error ratio: $\lambda = \frac{6/19 - 5/19}{6/19} = \frac{1}{6} \approx .17$
When predicting “Item”, one makes 17% fewer errors if “Sex” is known than if “Sex” is not known.
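A minimal Python sketch of λ for predicting the row variable from the column variable: without the column variable every case gets the modal row category; with it, each column gets its own modal row category. The 2×2 table itself is not in the transcript. The one below is reconstructed from the slide's counts, assuming the 9 correct predictions are the women (predicted “No”) and the 5 the men (predicted “Yes”); together with 13 “No” answers out of 19 this forces Female = 9 No / 1 Yes and Male = 4 No / 5 Yes.

```python
from fractions import Fraction


def goodman_kruskal_lambda(table):
    """Lambda for predicting the ROW variable from the COLUMN variable.

    table[r][c] holds the count for row category r and column category c.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    # Errors without the column variable: always predict the modal row.
    errors_without = n - max(row_totals)
    # Errors with the column variable: predict the modal row within each column.
    errors_with = sum(sum(col) - max(col) for col in zip(*table))
    return Fraction(errors_without - errors_with, errors_without)


# Rows: Item = "No", "Yes"; columns: Sex = Female, Male.
# Reconstructed from the slide's counts (assumption, see lead-in).
item_by_sex = [
    [9, 4],  # "No"
    [1, 5],  # "Yes"
]

lam = goodman_kruskal_lambda(item_by_sex)
print(lam, float(lam))  # 1/6 0.16666666666666666
```

Using exact fractions keeps the result identical to the hand computation (6 errors without, 5 with); the function assumes at least one error without the column variable, otherwise λ is undefined.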

Controlling for age
• Predicting “Item”
• Sex known and Young: errors = 2
• Sex known and Old: errors = 1
• Error ratio = 3/19
• Proportional reduction of error ratio: $\frac{5/19 - 3/19}{5/19} = \frac{2}{5} = .40$
The proportional reduction of error when predicting “Item” if both “Age” and “Sex” are known, compared to when only “Sex” is known, equals 40%.
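Partial λ can be sketched the same way, assuming that “controlling” means predicting within each category of the control variable and summing the errors. The name `partial_lambda` and the one-table-per-stratum input are illustrative choices, not taken from the slides. Because the transcript gives only error counts (5 with Sex known, 2 + 1 with Sex and Age known), the last line uses those counts directly.

```python
from fractions import Fraction


def modal_errors(table):
    """Errors when predicting the row variable by the modal row in each column."""
    return sum(sum(col) - max(col) for col in zip(*table))


def partial_lambda(strata):
    """Proportional reduction of error from adding a control variable.

    strata: one row-by-column table per category of the control variable
    (hypothetical layout). Baseline: predict rows from columns in the pooled
    table. Controlled: predict rows from columns within each stratum.
    """
    n_rows, n_cols = len(strata[0]), len(strata[0][0])
    pooled = [[sum(t[r][c] for t in strata) for c in range(n_cols)]
              for r in range(n_rows)]
    errors_base = modal_errors(pooled)
    errors_controlled = sum(modal_errors(t) for t in strata)
    return Fraction(errors_base - errors_controlled, errors_base)


# Slide's counts: 5 errors with Sex known, 2 + 1 with Sex and Age known.
print(Fraction(5 - (2 + 1), 5))  # 2/5
```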

Marriages / Divorces / Births (per 1000 inhabitants) in the US Las Vegas!

Marriages / Divorces / Births (per 1000 inhabitants) in the US: dichotomized data

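Lambda (λ) (Goodman & Kruskal, 1954)
To what extent does the prediction of the rows improve when the columns are known in a 2×2 contingency table (and vice versa)?
$\lambda = \frac{E_{\text{without}} - E_{\text{with}}}{E_{\text{without}}}$, where E is the number of prediction errors without and with knowledge of the other variable.
• Marriage ratio predicted WITH knowledge of the divorce ratio: High if Divorces High; Low if Divorces Low
• Marriage ratio predicted WITHOUT knowing the divorce ratio: High (25 > 23)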

Lambda is not (always) symmetrical
• Marriages and Divorces: λ predicting Marriages % = .522; λ predicting Divorces % = .542
• Marriages and Births: λ predicting Marriages % = .304; λ predicting Births % = .304
• Births and Divorces: λ predicting Births % = ; λ predicting Divorces % =
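Why λ can be asymmetric: it compares modal predictions in one direction only, so transposing the table (predicting columns from rows) can change the result. A short sketch with a small hypothetical table, since the state-level counts are not reproduced in the transcript:

```python
from fractions import Fraction


def lambda_rows_from_cols(table):
    """Goodman-Kruskal lambda for predicting rows from columns."""
    n = sum(map(sum, table))
    errors_without = n - max(map(sum, table))
    errors_with = sum(sum(col) - max(col) for col in zip(*table))
    return Fraction(errors_without - errors_with, errors_without)


def transpose(table):
    """Swap rows and columns so the other variable becomes the one predicted."""
    return [list(col) for col in zip(*table)]


# Hypothetical 2x2 table, not the state data.
table = [
    [30, 10],
    [5, 15],
]
print(lambda_rows_from_cols(table))             # 1/4 (predicting rows)
print(lambda_rows_from_cols(transpose(table)))  # 2/5 (predicting columns)
```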

Controlling for a 3rd variable: partial lambda
• λ predicting Divorces = .542
• Error ratio predicting Divorces when Marriages are known = 11/48
• To what extent does this error ratio decrease if not only Marriages but also Births are known? This is “λ predicting Divorces controlling for Marriages”.
The prediction of Divorces from Births AND Marriages is NOT better than the prediction of Divorces from Marriages only, i.e. this partial λ equals 0.

Lambda and partial lambda
• λ predicting Divorces
• λ predicting Divorces controlling for Births
The prediction of Divorces based on knowledge of both Births AND Marriages is 26.7% better than the prediction of Divorces based on knowledge of Births only.

Figure: generalization to a scatterplot of y against x (data points (x_i, y_i))

Generalization: relating continuous variables
• λ: To what extent does the prediction of the rows improve when the columns are known in a 2×2 contingency table (and vice versa)?
• Generalization, r²: To what extent does the prediction of variable y improve when one uses the regression line y = ax + b, compared with not using this information?
• Sum of squared errors predicting WITHOUT knowledge of x: $SSE_0 = \sum_i (y_i - \bar{y})^2$
• Sum of squared errors predicting WITH knowledge of x: $SSE_1 = \sum_i \left(y_i - (a x_i + b)\right)^2$
• Improvement: $r^2 = \frac{SSE_0 - SSE_1}{SSE_0}$
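A sketch of r² computed literally as a proportional reduction of squared error, assuming ordinary least squares for the line y = ax + b (a = slope, b = intercept, as in the slide's notation). The data are hypothetical.

```python
def r_squared_as_pre(xs, ys):
    """r² as the proportional reduction in squared prediction error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Least-squares line y = a*x + b.
    a = sxy / sxx
    b = mean_y - a * mean_x
    # WITHOUT x: predict the mean of y for every case.
    sse_without = sum((y - mean_y) ** 2 for y in ys)
    # WITH x: predict a*x + b.
    sse_with = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return (sse_without - sse_with) / sse_without


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
print(round(r_squared_as_pre(xs, ys), 4))  # 0.6
```

For these numbers the correlation is 6/√60 ≈ .775, and its square is the same .6, which is the point of the generalization.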

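Partial r²
• Prediction of y from z only: find $a_z$ and $b_z$ such that $SSE_z = \sum_i \left(y_i - (a_z z_i + b_z)\right)^2$ is minimal
• Prediction of y knowing both x and z: find $a_{zx}$, $b_1$ and $b_2$ such that $SSE_{zx} = \sum_i \left(y_i - (a_{zx} + b_1 x_i + b_2 z_i)\right)^2$ is minimal
• To what extent does the error made when predicting y decrease if x is taken into account on top of z? $r^2_{yx \cdot z} = \frac{SSE_z - SSE_{zx}}{SSE_z}$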
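A pure-Python sketch of partial r² as the drop in squared error when x is added to a regression of y on z. It assumes ordinary least squares and solves the two-predictor case through the 2×2 normal equations on centered data; the helper `centered_sums` and the data are hypothetical.

```python
def centered_sums(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def partial_r_squared(xs, ys, zs):
    """Reduction in SSE when x is added to a regression of y on z."""
    syy = centered_sums(ys, ys)
    szz = centered_sums(zs, zs)
    sxx = centered_sums(xs, xs)
    sxz = centered_sums(xs, zs)
    sxy = centered_sums(xs, ys)
    szy = centered_sums(zs, ys)
    # y predicted from z only.
    sse_z = syy - szy ** 2 / szz
    # y predicted from x and z: solve the normal equations for b1 (x) and b2 (z).
    det = sxx * szz - sxz ** 2
    b1 = (sxy * szz - szy * sxz) / det
    b2 = (szy * sxx - sxy * sxz) / det
    sse_zx = syy - b1 * sxy - b2 * szy
    return (sse_z - sse_zx) / sse_z


# Hypothetical data (roles as in the application: x = Births, y = Divorces, z = Marriages).
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 3, 2, 5, 4, 7]
zs = [2, 1, 4, 3, 6, 5]
print(partial_r_squared(xs, ys, zs))
```

The result equals the square of the partial correlation $r_{yx \cdot z} = (r_{xy} - r_{xz} r_{yz}) / \sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}$, which gives an independent check.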

Application: X = Births, Y = Divorces, Z = Marriages
Wording the result:
• When predicting the divorce rate in a state from its marriage rate, the (squared) prediction error can be reduced by another .007 (less than 1%) if the state's birth rate is taken into account as well.
• In essence, birth rate and divorce rate are unrelated when controlling for marriage rate.
• When marriage rates are held constant, there is (almost) no relation between a state's birth rate and its divorce rate.

Relation between a continuous variable and a dichotomous variable
• Regression line: y = 9.6 + 1.4x
• Dummy variable: Autocratic = 0, Democratic = 1
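With a 0/1 dummy, the least-squares line passes through the two group means: the intercept is the mean of the group coded 0 and the slope is the difference between the group means. On the slide this makes 9.6 the autocratic mean and 9.6 + 1.4 = 11.0 the democratic mean. A sketch with hypothetical productivity scores:

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


# Hypothetical scores; dummy coding: autocratic = 0, democratic = 1.
autocratic = [8, 10, 9, 11]       # mean 9.5
democratic = [12, 10, 11, 13, 9]  # mean 11.0
xs = [0] * len(autocratic) + [1] * len(democratic)
ys = autocratic + democratic

intercept, slope = fit_line(xs, ys)
print(round(intercept, 6), round(slope, 6))  # 9.5 1.5
```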

Computing r²
When we know a team's leadership style, we can predict its productivity 15% better than when we do not have that information (r² = .15).

r² depends on the difference between the group means and on the within-group variance: the further apart the group means, the higher r²; the smaller the within-group variance, the higher r².
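For a 0/1 dummy, r² equals the between-group sum of squares divided by the total sum of squares, which makes both effects easy to see. A sketch with hypothetical groups: moving the means apart, or shrinking the spread within each group while keeping the means, both raise r².

```python
def r_squared_two_groups(group0, group1):
    """r² between a 0/1 dummy and y, computed as SS_between / SS_total."""
    ys = group0 + group1
    grand = sum(ys) / len(ys)
    m0 = sum(group0) / len(group0)
    m1 = sum(group1) / len(group1)
    ss_total = sum((y - grand) ** 2 for y in ys)
    ss_between = len(group0) * (m0 - grand) ** 2 + len(group1) * (m1 - grand) ** 2
    return ss_between / ss_total


# Hypothetical data.
base = r_squared_two_groups([8, 10, 12], [10, 12, 14])
further_apart = r_squared_two_groups([8, 10, 12], [14, 16, 18])  # larger mean difference
less_spread = r_squared_two_groups([9, 10, 11], [11, 12, 13])    # same means, less spread
print(round(base, 3), round(further_apart, 3), round(less_spread, 3))  # 0.273 0.771 0.6
```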

r and the t-test
Generalizing a finding in a sample to the population.
• H0: there is no relation between a dichotomous variable x (group membership) and the continuous variable y.
• H0 is unlikely if r² is high. WHAT IS “HIGH”?
• Statistics: if in the population there is no relation between x and y, then for sample data $t = \frac{r\sqrt{N-2}}{\sqrt{1-r^2}}$ is distributed as Student's t with N − 2 degrees of freedom.
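The test statistic t = r√(N − 2)/√(1 − r²), with N − 2 degrees of freedom, is a one-liner. This sketch reproduces the values behind the leadership example (r = .39, N = 10) and the large-sample example (r = .04, N = 3000) discussed in this section; the function name is illustrative.

```python
import math


def t_from_r(r, n):
    """t statistic for H0: no correlation in the population (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


print(round(t_from_r(0.39, 10), 2))    # 1.2  (leadership style vs productivity)
print(round(t_from_r(0.04, 3000), 2))  # 2.19 (tiny r, very large N)
```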

Figure: Student t-distribution versus normal distribution N(0,1), with curves for t(5) and t(100); the tails mark the Type I error region.

Example: Marriages / Divorces / Births (per 1000 inhabitants) in the US
Correlations (N = 48 − 1). Las Vegas!
Does a correlation of .347 suffice to conclude that a relation between births and divorces exists in the population?
• Rule of thumb: t > 2 (a t-table is better)
• Conclusion: most likely this relation exists in the population

Example: leadership style and productivity
r² between leadership style and productivity equals .15 (N = 10 teams, r = .39), so $t = \frac{.39\sqrt{8}}{\sqrt{1 - .39^2}} \approx 1.20$.
Conclusion: the observed relation is not strong enough to reject the null hypothesis that leadership style and productivity are unrelated.
However, this does NOT imply that the null hypothesis is correct!

Classical approach for the t-test: autocratic teams versus democratic teams
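The classical independent-samples t-test with pooled variance and the t computed from r (point-biserial correlation between the 0/1 dummy and y) are the same test. A sketch that computes both on hypothetical team scores, assuming equal population variances as the pooled test does:

```python
import math


def pooled_t(group0, group1):
    """Classical independent-samples t with pooled variance (df = n0 + n1 - 2)."""
    n0, n1 = len(group0), len(group1)
    m0, m1 = sum(group0) / n0, sum(group1) / n1
    ss0 = sum((y - m0) ** 2 for y in group0)
    ss1 = sum((y - m1) ** 2 for y in group1)
    pooled_var = (ss0 + ss1) / (n0 + n1 - 2)
    return (m1 - m0) / math.sqrt(pooled_var * (1 / n0 + 1 / n1))


def t_via_r(group0, group1):
    """Same test via r between a 0/1 dummy and y: t = r*sqrt(n-2)/sqrt(1-r²)."""
    xs = [0] * len(group0) + [1] * len(group1)
    ys = group0 + group1
    n = len(ys)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


autocratic = [8, 10, 9, 11]       # hypothetical productivity scores
democratic = [12, 10, 11, 13, 9]
print(pooled_t(autocratic, democratic), t_via_r(autocratic, democratic))
```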

BE CAREFUL!! Statistically significant ≠ theoretically relevant.
One can ALWAYS choose N large enough to make the t-test indicate a significant relation. Example: r = .04 (r² = .0016) with N = 3000 yields t = 2.19.
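To make the warning concrete, this sketch searches for the smallest N at which a fixed r reaches the rule-of-thumb threshold t ≥ 2. For r = .04 it lands just under 2,500 observations, although r² stays at .0016. The function names and the threshold default are illustrative assumptions.

```python
import math


def t_from_r(r, n):
    """t statistic for H0: no correlation in the population (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def smallest_n_for_t(r, threshold=2.0):
    """Smallest sample size n at which t_from_r(r, n) reaches the threshold."""
    n = 3
    while t_from_r(r, n) < threshold:
        n += 1
    return n


print(smallest_n_for_t(0.04))
print(smallest_n_for_t(0.6))  # 10
```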

Tolerance towards homosexuals
• I won't associate with known homosexuals if I can help it. (+)
• I would not be afraid for my child to have a homosexual teacher. (-)
• It is normal for magazine stands to openly display homosexual magazines. (-)
• Homosexuality, as far as I'm concerned, is not sinful. (-)
• The increasing acceptance of homosexuality in our society is aiding in the deterioration of morals. (+)
• I would vote for a homosexual in an election for public office. (-)
• A TV comedy series centered on a homosexual couple would be a nice change from most series concentrated on heterosexual couples. (-)
• Many homosexuals are good role models. (-)
• If I were a parent, I could accept my son or daughter being gay. (-)
• Homosexual websites, chat rooms, and programs should be banned from the internet. (+)
Response scale: 1 = Strongly agree, 2 = Agree, 3 = Disagree, 4 = Strongly disagree
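One way the ten answers could be combined into an intolerance score, sketched under an explicit assumption: the slides do not spell out the scoring rule. Here a (+) item is read as one where agreeing signals intolerance, so it is reversed (5 − answer), while (−) items are kept as answered; higher totals then mean more intolerance, on a 10 to 40 range. `ITEM_SIGNS` and `intolerance_score` are hypothetical names.

```python
# Scoring direction per item, in the order listed (assumed interpretation of (+)/(-)).
ITEM_SIGNS = ["+", "-", "-", "-", "+", "-", "-", "-", "-", "+"]


def intolerance_score(responses):
    """Sum score where higher means more intolerant (assumed scoring rule).

    responses: ten answers coded 1=Strongly agree ... 4=Strongly disagree.
    Assumption: "+" items are reversed (5 - answer); "-" items are kept.
    """
    if len(responses) != len(ITEM_SIGNS):
        raise ValueError("expected one answer per item")
    total = 0
    for answer, sign in zip(responses, ITEM_SIGNS):
        if answer not in (1, 2, 3, 4):
            raise ValueError("answers must be coded 1-4")
        total += 5 - answer if sign == "+" else answer
    return total


# Most tolerant possible answer pattern under this rule.
print(intolerance_score([4, 1, 1, 1, 4, 1, 1, 1, 1, 4]))  # 10
```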

Descriptives: non-Catholics versus Catholics

Histogram

Boxplot

Scatterplot: intolerance score (Y) against Catholicism (x), with regression line Y = 15.24 + 6.80x

Computation of the t-test (classical approach OR via r)
Conclusion: REJECT the null hypothesis that Catholics are as tolerant towards homosexuals as non-Catholics are.
