Bivariate Relationships

Plotting a Line

Review: Covariance
• When x tends to be above its mean whenever y is above its mean, AND x tends to be below its mean whenever y is below its mean, there is positive covariation
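To make the definition concrete, the sketch below computes the sample covariance for the six (x, y) pairs that appear on the "X and Y" slide later in this deck. The variable names are illustrative only.

```python
# The six (x, y) pairs from the "X and Y" slide later in this deck.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Each product is positive when x and y fall on the same side of their means.
products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]

# Sample covariance: sum of the products divided by n - 1.
covariance = sum(products) / (n - 1)

print(products)
print(covariance)
```

Four of the six products are positive and the two negative ones are small, so the covariance comes out positive.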

[Figure: plot showing positive covariance, with reference lines at mean urban % and mean female literacy]

Expected value
• But we may want more specific information than that: we may want to know the expected value of y for each value of x
• I may know the mean height of everyone in the class
• But if I know gender, then I can generate two expected values
• If you remember, we are always trying to do better than the mean
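A minimal sketch of the height example follows. The heights are hypothetical numbers invented for illustration, not class data; the point is only that two group means predict better than one overall mean.

```python
# Hypothetical heights in cm, invented purely for illustration.
heights = [("F", 160), ("F", 165), ("F", 170),
           ("M", 172), ("M", 178), ("M", 184)]

overall_mean = sum(h for _, h in heights) / len(heights)

# Expected value of height within each group.
group_means = {}
for group in ("F", "M"):
    values = [h for g, h in heights if g == group]
    group_means[group] = sum(values) / len(values)

# Squared prediction error using one mean versus two group means.
sse_overall = sum((h - overall_mean) ** 2 for _, h in heights)
sse_groups = sum((h - group_means[g]) ** 2 for g, h in heights)

print(overall_mean, group_means)
print(sse_overall, sse_groups)
```

With these made-up numbers, knowing the group cuts the squared error to about a third of what the overall mean leaves.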

Regression analysis: important to know the substantive effect
• For every 10K dollars given in humanitarian aid, there is an increase of 3K spent on weapons
• Different from: for every 10K dollars given in humanitarian aid, there is a .5K increase in spending on weapons
• Different from: for every 10K dollars given in humanitarian aid, there is an 8K increase in spending on weapons
• Unit of analysis?

Regression equation
• y = a + bx + e
• ŷ = a + bx
• ŷ is also known as yhat
• y is the observed value of the dependent variable
• yhat is the predicted value
• a is the intercept
• b is the slope
• e is the error (residual): y - yhat

X and Y
 Y   X
 2   1
 1   2
 4   3
 3   4
 6   5
 5   6
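The intercept and slope for these six pairs can be obtained with the usual least-squares formulas. This is a sketch in plain Python rather than the Stata used on the slides; the names `sxy` and `sxx` are illustrative.

```python
# The six pairs above (second pair is y = 1, x = 2, as in the Descriptives table).
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sum of cross-products and sum of squares of x around the means.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)

b = sxy / sxx            # slope
a = mean_y - b * mean_x  # intercept: the line passes through the two means

print(round(a, 4), round(b, 7))
```

The result, a = 0.6 and b = 0.8285714, agrees with the `_cons` and `x` coefficients in the regression output further on.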

X and Y

Predicted values
[Figure: scatterplot of y against x with the fitted line; predicted values at x = 1 through 6 are 1.43, 2.26, 3.09, 3.91, 4.74, 5.57]

Residual values
[Figure: scatterplot of y against x with the fitted line; residuals at x = 1 through 6 are .57, -1.26, .91, -.91, 1.26, -.57]
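The predicted and residual values in the two plots can be reproduced from the fitted line. The sketch below takes a = 0.6 and b = 0.8285714 from the regression output shown later and applies yhat = a + bx to each x.

```python
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

# Coefficients as reported in the regression output (_cons and x).
a = 0.6
b = 0.8285714

predicted = [a + b * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

print([round(p, 2) for p in predicted])
print([round(r, 2) for r in residuals])
```

The residuals cancel out: their sum is zero up to rounding in the slope.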

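Descriptives
 y   x   pred    res    exp   unexp   totvar
 2   1   1.43   0.57   4.29   0.33    2.25
 1   2   2.26  -1.26   1.54   1.58    6.25
 4   3   3.09   0.91   0.17   0.84    0.25
 3   4   3.91  -0.91   0.17   0.84    0.25
 5   6   5.57  -0.57   4.29   0.33    2.25
 6   5   4.74   1.26   1.54   1.58    6.25

generate totvar = (y - 3.5)^2
generate exp = (pred - 3.5)^2
generate unexp = (y - pred)^2

• 3.5 is the mean of y
• unexp is the squared residual, since res = y - pred
• Summing each column gives the explained, unexplained, and total sums of squares; dividing a sum by its degrees of freedom (n - 1 for the total) gives the corresponding variance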
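As a check on the decomposition, the sketch below computes the explained, unexplained, and total sums of squares for the six pairs, taking the unexplained part as the squared residual (y - pred)^2. It is plain Python, not the Stata used on the slides.

```python
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares fit.
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
    sum((xi - mean_x) ** 2 for xi in x)
a = mean_y - b * mean_x
pred = [a + b * xi for xi in x]

ss_explained = sum((p - mean_y) ** 2 for p in pred)             # Model SS
ss_unexplained = sum((yi - p) ** 2 for yi, p in zip(y, pred))   # Residual SS
ss_total = sum((yi - mean_y) ** 2 for yi in y)                  # Total SS

print(round(ss_explained, 4), round(ss_unexplained, 4), round(ss_total, 4))
```

The explained and unexplained parts add up to the total, and the three values match the Model, Residual, and Total rows of the regression output.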

Descriptive statistics

Bivariate regression

. regr y x, beta

      Source |       SS       df       MS              Number of obs =       6
-------------+------------------------------           F(  1,     4) =    8.76
       Model |  12.0142857     1  12.0142857           Prob > F      =  0.0416
    Residual |  5.48571429     4  1.37142857           R-squared     =  0.6865
-------------+------------------------------           Adj R-squared =  0.6082
       Total |        17.5     5         3.5           Root MSE      =  1.1711

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
           x |   .8285714   .2799417     2.96   0.042                 .8285714
       _cons |         .6   1.090216     0.55   0.611                        .
------------------------------------------------------------------------------
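Most of the summary statistics in the upper right of the output follow directly from the SS and df columns. The sketch below recomputes them from the printed values; the variable names are illustrative.

```python
import math

# Values copied from the Source table above.
ss_model, df_model = 12.0142857, 1
ss_resid, df_resid = 5.48571429, 4
ss_total, df_total = 17.5, 5

ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid
ms_total = ss_total / df_total

r_squared = ss_model / ss_total
adj_r_squared = 1 - ms_resid / ms_total
f_stat = ms_model / ms_resid
root_mse = math.sqrt(ms_resid)

# t for the slope is the coefficient divided by its standard error.
t_slope = 0.8285714 / 0.2799417

print(round(r_squared, 4), round(adj_r_squared, 4))
print(round(f_stat, 2), round(root_mse, 4), round(t_slope, 2))
```

With a single independent variable, F is the square of the slope's t statistic.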

Why is the unstandardized slope the same as the standardized slope?
• In other words, what would have to be the case for this to be true?

Standard deviations are the same

Descriptive Statistics
       Mean     Std. Deviation   N
 y     3.5000   1.87083          6
 x     3.5000   1.87083          6
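The standardized slope is the unstandardized slope rescaled by the ratio of the two standard deviations, so equal standard deviations leave it unchanged. A minimal sketch, with `standardized_slope` as an illustrative helper name:

```python
def standardized_slope(b, sd_x, sd_y):
    """Beta = b * (SD of x / SD of y)."""
    return b * sd_x / sd_y

b = 0.8285714    # unstandardized slope from the regression output
sd_x = 1.87083   # from the descriptive statistics
sd_y = 1.87083

beta = standardized_slope(b, sd_x, sd_y)
print(round(beta, 7))
```

Because sd_x / sd_y = 1 here, beta equals b.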

Correlations and other statistics

Another example

Descriptives Syntax

. sum happy

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       happy |        11    1.909091     .700649          1          3

. sum occpres

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     occpres |        11    1.909091    .8312094          1          3

Life Happiness and Occupational Prestige

      General |             occpres
    Happiness |         1          2          3 |     Total
--------------+---------------------------------+----------
Not Too Happy |         0          1          2 |         3
              |      0.00      25.00      66.67 |     27.27
--------------+---------------------------------+----------
 Pretty Happy |         3          3          0 |         6
              |     75.00      75.00       0.00 |     54.55
--------------+---------------------------------+----------
   Very Happy |         1          0          1 |         2
              |     25.00       0.00      33.33 |     18.18
--------------+---------------------------------+----------
        Total |         4          4          3 |        11
              |    100.00     100.00     100.00 |    100.00

(each cell: frequency, then column percentage)

Kendall's tau-b = -0.3689  ASE = 0.318
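Kendall's tau-b can be recomputed from the cell counts by counting concordant and discordant pairs and correcting for ties. The sketch below is one straightforward way to do that in plain Python; `kendall_tau_b` is an illustrative helper, not a Stata routine.

```python
import math

# Rows: Not Too Happy, Pretty Happy, Very Happy; columns: occpres 1, 2, 3.
table = [
    [0, 1, 2],
    [3, 3, 0],
    [1, 0, 1],
]

def kendall_tau_b(counts):
    rows, cols = len(counts), len(counts[0])
    concordant = discordant = 0
    # Compare every cell with every cell in a lower row.
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):
                for m in range(cols):
                    pairs = counts[i][j] * counts[k][m]
                    if m > j:
                        concordant += pairs   # both variables increase
                    elif m < j:
                        discordant += pairs   # one increases, one decreases
    n = sum(sum(row) for row in counts)
    total_pairs = n * (n - 1) // 2
    # Pairs tied on the row variable and on the column variable.
    row_ties = sum(t * (t - 1) // 2 for t in (sum(r) for r in counts))
    col_ties = sum(t * (t - 1) // 2 for t in (sum(c) for c in zip(*counts)))
    return (concordant - discordant) / math.sqrt(
        (total_pairs - row_ties) * (total_pairs - col_ties)
    )

tau_b = kendall_tau_b(table)
print(round(tau_b, 4))
```

There are 7 concordant and 21 discordant pairs, which gives the negative tau-b reported under the table.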

Correlation between happiness and occupational prestige

. corr happy prestg80
(obs=11)

             |    happy prestg80
-------------+------------------
       happy |   1.0000
    prestg80 |  -0.5181   1.0000

Correlation between happiness and categorical occupational prestige

. corr happy occpres
(obs=11)

             |    happy  occpres
-------------+------------------
       happy |   1.0000
     occpres |  -0.3590   1.0000
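The correlation with the categorical prestige variable can be recovered from the crosstab alone, because the table records every (happy, occpres) pair. The sketch below assumes happy is scored 1 = Not Too Happy, 2 = Pretty Happy, 3 = Very Happy, which is the scoring that reproduces the reported mean of 1.909.

```python
import math

# Rows: happy = 1, 2, 3; columns: occpres = 1, 2, 3 (counts from the crosstab).
table = [
    [0, 1, 2],
    [3, 3, 0],
    [1, 0, 1],
]

# Expand the counts into one (happy, occpres) pair per respondent.
pairs = []
for happy, row in enumerate(table, start=1):
    for occpres, count in enumerate(row, start=1):
        pairs.extend([(happy, occpres)] * count)

n = len(pairs)
mean_h = sum(h for h, _ in pairs) / n
mean_o = sum(o for _, o in pairs) / n

s_ho = sum((h - mean_h) * (o - mean_o) for h, o in pairs)
s_hh = sum((h - mean_h) ** 2 for h, _ in pairs)
s_oo = sum((o - mean_o) ** 2 for _, o in pairs)

r = s_ho / math.sqrt(s_hh * s_oo)
sd_h = math.sqrt(s_hh / (n - 1))
sd_o = math.sqrt(s_oo / (n - 1))

print(round(r, 4), round(sd_h, 6), round(sd_o, 7))
```

The same pairs also reproduce the standard deviations reported by `sum happy` and `sum occpres`.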

Life Happiness and Prestige

Regression Syntax
• Syntax is regr DV IV
• regr happy prestg80, beta
• Reports beta coefficients: same as Pearson r (when there is only one independent variable)
• regr happy prestg80
• Reports confidence intervals instead of betas

Regression results

      Source |       SS       df       MS              Number of obs =      11
-------------+------------------------------           F(  1,     9) =    3.30
       Model |  1.31753739     1  1.31753739           Prob > F      =  0.1026
    Residual |  3.59155351     9  .399061502           R-squared     =  0.2684
-------------+------------------------------           Adj R-squared =  0.1871
       Total |  4.90909091    10  .490909091           Root MSE      =  .63171

------------------------------------------------------------------------------
       happy |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
    prestg80 |  -.0380391   .0209348    -1.82   0.103                 -.518061
       _cons |   3.330371   .8050567     4.14   0.003                        .
------------------------------------------------------------------------------

• Why are we not that confident in our results?
• Why is the beta so much larger than the coefficient for the slope?
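One way to approach the second question is to rescale the slope by the two standard deviations. The sketch below uses the slope from this output and the standard deviations reported by `sum happy` and `sum prestg80` elsewhere in the deck.

```python
b = -0.0380391      # change in happy per one point of prestg80
sd_happy = 0.700649
sd_prestg80 = 9.542251

# Beta: change in happy, in SD units, per one SD of prestg80.
beta = b * sd_prestg80 / sd_happy

print(round(beta, 4))
print(round(beta ** 2, 4))
```

A one-point change in prestige is tiny relative to its spread of about 9.5 points, so the per-point slope is small while the per-SD slope is not. With one independent variable, beta equals Pearson r and its square equals R-squared.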

Regression results

      Source |       SS       df       MS              Number of obs =      11
-------------+------------------------------           F(  1,     9) =    3.30
       Model |  244.378788     1  244.378788           Prob > F      =  0.1026
    Residual |  666.166667     9  74.0185185           R-squared     =  0.2684
-------------+------------------------------           Adj R-squared =  0.1871
       Total |  910.545455    10  91.0545455           Root MSE      =  8.6034

------------------------------------------------------------------------------
    prestg80 |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
       happy |  -7.055556    3.88302    -1.82   0.103                 -.518061
       _cons |   50.83333   7.853795     6.47   0.000                        .
------------------------------------------------------------------------------

• Why is the coefficient so much bigger?
• What happens to the confidence?
• Why is the beta the same?
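Swapping the dependent and independent variables changes the units of the slope but not the correlation. The sketch below rebuilds both slopes from r and the two standard deviations reported elsewhere in the deck.

```python
r = -0.518061        # beta / Pearson r, identical in both regressions
sd_happy = 0.700649
sd_prestg80 = 9.542251

# Slope of happy on prestg80, and of prestg80 on happy.
b_happy_on_prestige = r * sd_happy / sd_prestg80
b_prestige_on_happy = r * sd_prestg80 / sd_happy

print(round(b_happy_on_prestige, 6), round(b_prestige_on_happy, 4))
print(round(b_happy_on_prestige * b_prestige_on_happy, 4))
```

The second slope is larger because it is measured in prestige points per unit of happiness. The product of the two slopes is r squared, which is the R-squared shared by both outputs.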

[Figure: scatterplot of "Are you happy?" (mean = 1.9) against Occupational Prestige (mean = 37), with the fitted line]
Life happiness = 3.33 - .038 × Occupational Prestige
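The fitted equation can be used to generate an expected happiness score for any prestige score. The sketch below uses the unrounded coefficients from the regression output and evaluates the line at the minimum, mean, and maximum of prestg80 as reported by `sum prestg80`; `predict_happy` is an illustrative name.

```python
def predict_happy(prestige):
    """Expected happiness from the fitted line happy = a + b * prestg80."""
    a = 3.330371     # _cons
    b = -0.0380391   # prestg80 coefficient
    return a + b * prestige

at_min = predict_happy(22)
at_mean = predict_happy(37.36364)
at_max = predict_happy(51)

print(round(at_min, 3), round(at_mean, 3), round(at_max, 3))
```

At the mean of prestige the prediction is the mean of happiness, 1.909, since a least-squares line always passes through the point of means.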

What happens to the confidence if we keep the slope the same but double the n?

What happens to the confidence if we keep the doubled n but decrease the variance of occupational prestige?
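Both questions can be explored with the formula for the standard error of a bivariate slope, Root MSE / (SD of x × sqrt(n - 1)). The sketch below holds the slope and Root MSE fixed at the reported values, which is a simplifying assumption, and the halved standard deviation is a hypothetical value chosen for illustration.

```python
import math

def slope_se(root_mse, sd_x, n):
    """Standard error of a bivariate slope: s_e / (sd_x * sqrt(n - 1))."""
    return root_mse / (sd_x * math.sqrt(n - 1))

b = -0.0380391       # slope, held fixed
root_mse = 0.63171   # residual spread, assumed unchanged
sd_x = 9.542251      # SD of prestg80

se_original = slope_se(root_mse, sd_x, 11)
se_double_n = slope_se(root_mse, sd_x, 22)        # same slope, doubled n
se_less_var = slope_se(root_mse, sd_x / 2, 22)    # hypothetical: SD of x halved

for label, se in [("original", se_original),
                  ("double n", se_double_n),
                  ("double n, half SD", se_less_var)]:
    print(label, round(se, 4), round(b / se, 2))
```

Under these assumptions, doubling n shrinks the standard error and pushes t further from zero, while shrinking the spread of occupational prestige inflates the standard error again.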

Syntax
• generate occpres = 1 if prestg80 < 33
• replace occpres = 2 if prestg80 >= 33 & prestg80 < 45
• replace occpres = 3 if prestg80 >= 45 & !missing(prestg80)
• label define highmedlow 1 low 2 med 3 high
• label values occpres highmedlow
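The recoding logic can also be written as a small function, which makes the cut points easy to test. This is a Python sketch rather than Stata; `recode_prestige` is an illustrative name, and it assumes a score of exactly 45 belongs to the high group.

```python
LABELS = {1: "low", 2: "med", 3: "high"}

def recode_prestige(score):
    """Collapse a prestg80 score into 1 (low), 2 (med), or 3 (high)."""
    if score < 33:
        return 1
    if score < 45:
        return 2
    return 3  # assumption: 45 and above counts as high

for score in (22, 33, 44, 45, 51):
    code = recode_prestige(score)
    print(score, code, LABELS[code])
```

Checking the boundary scores 33 and 45 explicitly guards against leaving a value unassigned.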

. sum prestg80

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    prestg80 |        11    37.36364    9.542251         22         51
