Psychology 10: Analysis of Psychological Data
May 5, 2014
The plan for today
• Illustrating the principle of least squares
• Another example of regression
• Comparing the regression of Y on X with the regression of X on Y
• Inference in regression
• Assumptions for inference in regression
• The residuals plot
Another example of regression
• Recall that we calculated the correlation between Peabody and Raven scores.
• The correlation was about .60.
The regression of Peabody on Raven
• When we calculated the correlation, SP was 3058.825 and SS(Raven) was 5563.975.
• Accordingly, the estimated slope is 3058.825 / 5563.975 = 0.5497553.
• The estimated intercept is M(Peabody) - 0.5497553 × M(Raven) = 81.675 - 0.5497553 × 32.525 = 63.7942089.
• So Peabody = 63.79 + 0.55 × Raven.
Reversing the regression
• If we consider regressing Raven on Peabody, the only additional information we need is SS(Peabody) = 4650.775.
• Slope: 3058.825 / 4650.775 = 0.6577022109.
• Intercept: 32.525 - 0.6577022109 × 81.675 = -21.19282808.
• Raven = -21.19 + 0.66 × Peabody.
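The arithmetic on these two slides is easy to script. A minimal Python sketch, using only the summary statistics quoted above (SP, the two sums of squares, and the two means):

```python
# Slope and intercept from summary statistics (values from the slides).
SP = 3058.825
SS_raven, SS_peabody = 5563.975, 4650.775
M_raven, M_peabody = 32.525, 81.675

# Regression of Peabody (Y) on Raven (X): slope = SP / SS_X.
slope_py = SP / SS_raven                       # 0.5497553
intercept_py = M_peabody - slope_py * M_raven  # 63.79

# Reversing the regression: Raven (Y) on Peabody (X).
slope_rp = SP / SS_peabody                     # 0.6577022
intercept_rp = M_raven - slope_rp * M_peabody  # -21.19

print(f"Peabody = {intercept_py:.2f} + {slope_py:.2f} * Raven")
print(f"Raven = {intercept_rp:.2f} + {slope_rp:.2f} * Peabody")
```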
Inference in regression
• We can think about how much a regression is explaining by asking how much of the total variation is not residual variation around the line.
• In fact, the total variation in Y can be broken down into variation in the predicted values and residual variation.
• This decomposition forms the basis for inference about the slope.
Inference about the slope
• Null hypothesis: H0: β = 0 (the population slope is zero).
• Test: calculate a sum of squares for the predicted values.
• SS(Predicted) has 1 degree of freedom, so it is also the mean square.
• Calculate a sum of squares for the residuals.
• SS(Residual) has N - 2 degrees of freedom; divide to get the mean square.
Inference about the slope (cont.)
• Under the null hypothesis, the ratio of those mean squares has an F distribution.
• (Demonstration, saving time by using software to do the calculations; a sketch follows below.)
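One way to run that demonstration: a minimal Python sketch that builds the ANOVA quantities for the regression of Peabody on Raven from the summary statistics we already have (SP, SS(Raven), SS(Peabody), and N = 40):

```python
# ANOVA quantities from summary statistics (values from the slides).
SP = 3058.825          # sum of products
SS_raven = 5563.975    # SS for the predictor (Raven)
SS_peabody = 4650.775  # SS for the outcome (Peabody) = total SS
N = 40                 # number of pairs, so df(residual) = N - 2 = 38

ss_predicted = SP**2 / SS_raven          # SS explained by the regression
ss_residual = SS_peabody - ss_predicted  # what is left over
ms_predicted = ss_predicted / 1          # 1 df for the slope
ms_residual = ss_residual / (N - 2)      # N - 2 df for the residuals
F = ms_predicted / ms_residual

print(f"SS(pred) = {ss_predicted:.3f}, SS(res) = {ss_residual:.3f}, F = {F:.2f}")
# SS(pred) = 1681.605, SS(res) = 2969.170, F = 21.52
```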
The ANOVA table
Filling in the table step by step: the degrees of freedom come first (1 for the regression; N - 2 = 38 for the residual), then the mean squares (each SS divided by its df), then F = MS(Regression) / MS(Residual).

Source        SS         df    MS         F
----------------------------------------------
Regression    1681.605    1    1681.605   21.52
Residual      2969.170   38      78.136
----------------------------------------------
Total         4650.775   39
Inference about the slope (cont.)
• From the F table, the critical value for 1 and 38 df (at α = .05) is 4.10.
• We got 21.52.
• We reject the null hypothesis and conclude that the population slope is not zero.
• (Demonstration that the F statistic is identical for the regression of Raven on Peabody; see the sketch below.)
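A sketch of both demonstrations: swapping the roles of X and Y leaves F unchanged, because in either direction SS(Predicted)/SS(Total) equals r², so F = (N - 2) r² / (1 - r²). The critical value comes from scipy.stats.f (assuming SciPy is available):

```python
from scipy.stats import f

SP, SS_raven, SS_peabody, N = 3058.825, 5563.975, 4650.775, 40

def regression_F(sp, ss_x, ss_y, n):
    """F statistic for the regression of Y on X, from summary statistics."""
    ss_pred = sp ** 2 / ss_x
    ss_res = ss_y - ss_pred
    return (ss_pred / 1) / (ss_res / (n - 2))

print(regression_F(SP, SS_raven, SS_peabody, N))  # Peabody on Raven: 21.52
print(regression_F(SP, SS_peabody, SS_raven, N))  # Raven on Peabody: 21.52

# Critical value for 1 and 38 df at alpha = .05:
print(f.ppf(0.95, 1, 38))  # about 4.10
```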
Assumptions
• Linear relationship.
• Independent errors.
• Equal variance in errors across the range of predicted values.
• Normally distributed errors.
Checking assumptions
• Linearity: examine the scatterplot (and residual plot).
• Independent errors: examine the procedures that generated the data.
• Equal variance of errors: examine the residual plot (a plotting sketch follows below).
• Normality of errors: examine the distribution of residuals.
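A minimal matplotlib sketch of a residual plot, here fed with the (rounded) predicted values and residuals from the blood pressure example later in this section; any regression output would do:

```python
import matplotlib.pyplot as plt

# Rounded predicted values and residuals from the blood pressure example.
predicted = [94.98, 83.84, 71.54, 75.64, 79.16, 79.16, 78.57, 79.74, 73.30, 82.09]
residuals = [0.02, -1.84, -6.54, -9.64, 5.84, 9.84, -4.57, -5.74, 12.71, -0.09]

plt.scatter(predicted, residuals)
plt.axhline(0, linestyle="--")  # residuals should scatter evenly around zero
plt.xlabel("Predicted value")
plt.ylabel("Residual")
plt.title("Residual plot")
plt.show()
```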
Normality: stem-and-leaf plot of residuals

-1 | 443210
-0 | 988765
-0 | 44332110
 0 | 01123344
 0 | 55666889
 1 | 12
 1 | 5
 2 |
 2 | 7
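For readers who want to produce such a display themselves, here is a small illustrative sketch (unlike the display above, it does not split each stem into high and low halves):

```python
from collections import defaultdict

def stem_and_leaf(values, unit=10):
    """Print a basic stem-and-leaf display of `values`.
    Negative values get their own stems, so '-0' and '0' stay distinct."""
    groups = defaultdict(list)
    for v in values:
        n = int(abs(v))               # magnitude, fractional part dropped
        sign = -1 if v < 0 else 1
        groups[(sign, n // unit)].append(n % unit)
    # Order stems from most negative upward.
    for sign, stem in sorted(groups, key=lambda k: (k[0], k[0] * k[1])):
        leaves = sorted(groups[(sign, stem)], reverse=(sign < 0))
        print(f"{'-' if sign < 0 else ' '}{stem} | {''.join(str(d) for d in leaves)}")

# Example: the blood pressure residuals from later in this section.
stem_and_leaf([0.02, -1.84, -6.54, -9.64, 5.84, 9.84, -4.57, -5.74, 12.71, -0.09])
```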
Inference about correlations, revisited
• When we discussed inference about correlations, we talked about the assumption of bivariate normality.
• We noted that this assumption can be relaxed.
• The test about the regression slope is the same as the test about the correlation.
• So the assumptions for testing the correlation are the same as the assumptions for testing the slope.
Blood pressure example

Systolic   Diastolic
   157        95
   138        82
   117        65
   124        66
   130        85
   130        89
   129        74
   131        74
   120        86
   135        82
Blood pressure example
• We had previously calculated that SP = 652.2, SS(Systolic) = 1112.9, and SS(Diastolic) = 867.6, leading to a correlation of 652.2 / √(1112.9 × 867.6) = .664.
Blood pressure example
• The slope for the regression of Diastolic on Systolic is 652.2 / 1112.9 = 0.5860365.
• The intercept is 79.8 - 0.5860365 × 131.1 = 2.970615.
• Predicted values are given by 2.970615 + 0.5860365 × Systolic.
Blood pressure example

Systolic   Diastolic   Predicted
   157        95        94.97834
   138        82        83.84365
   117        65        71.53689
   124        66        75.63914
   130        85        79.15536
   130        89        79.15536
   129        74        78.56932
   131        74        79.74140
   120        86        73.29500
   135        82        82.08554
Coefficient of determination
• The sum of squares for those predicted values is 382.213.
• Recall that the sum of squares for Diastolic BP was 867.6.
• The coefficient of determination is 382.213 / 867.6 = 0.4405406.
• Note that the square root is .664, our previously calculated correlation.
Residuals
• The residuals are simply the observed values of the dependent variable minus the predicted values.
Blood pressure example

Systolic   Diastolic   Predicted     Residual
   157        95        94.97834    0.02165514
   138        82        83.84365   -1.84365172
   117        65        71.53689   -6.53688561
   124        66        75.63914   -9.63914098
   130        85        79.15536    5.84464013
   130        89        79.15536    9.84464013
   129        74        78.56932   -4.56932339
   131        74        79.74140   -5.74139635
   120        86        73.29500   12.70500494
   135        82        82.08554   -0.08554228
Inference
• The sum of squares of those residuals is 485.387.
• We already know that the sum of squares for the predicted values is 382.213.
• Note the additivity: 382.213 + 485.387 = 867.6, the Diastolic SS.
• MS(predicted) = 382.213 / 1 = 382.213.
• MS(residuals) = 485.387 / 8 = 60.67338.
• F = 382.213 / 60.67338 = 6.300 (a full worked sketch follows below).
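Putting the whole blood pressure example together, a Python sketch that reproduces the slope, intercept, predicted values, residuals, sums of squares, and F from the raw data; the p-value via scipy.stats.f is an addition, assuming SciPy is available:

```python
import numpy as np
from scipy.stats import f

# Raw blood pressure data from the slides.
systolic = np.array([157, 138, 117, 124, 130, 130, 129, 131, 120, 135])
diastolic = np.array([95, 82, 65, 66, 85, 89, 74, 74, 86, 82])

sp = np.sum((systolic - systolic.mean()) * (diastolic - diastolic.mean()))  # 652.2
ss_x = np.sum((systolic - systolic.mean()) ** 2)    # 1112.9
ss_y = np.sum((diastolic - diastolic.mean()) ** 2)  # 867.6

slope = sp / ss_x                                       # 0.5860365
intercept = diastolic.mean() - slope * systolic.mean()  # 2.970615
predicted = intercept + slope * systolic
residuals = diastolic - predicted

ss_pred = np.sum((predicted - predicted.mean()) ** 2)  # 382.213
ss_res = np.sum(residuals ** 2)                        # 485.387
F = (ss_pred / 1) / (ss_res / (len(systolic) - 2))     # 6.300
p_value = f.sf(F, 1, len(systolic) - 2)                # about .04

print(f"r^2 = {ss_pred / ss_y:.3f}, F = {F:.3f}, p = {p_value:.3f}")
```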