Chapter 7 Correlation, Bivariate Regression, and Multiple Regression
Pearson’s Product Moment Correlation • Correlation measures the association between two variables. • Correlation quantifies the extent to which the mean, variation, and direction of one variable are related to those of another variable. • The correlation coefficient r ranges from +1 to -1. • Correlation can be used for prediction. • Correlation does not indicate the cause of a relationship.
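As a minimal sketch of how r can be computed from raw scores, the function below uses the standard deviation-score formula for Pearson's r. The function name `pearson_r` and the small data set are made up for illustration and do not come from the chapter.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of the deviations from each mean
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustration data (not from the chapter)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

Reversing the sign of every Y score flips the sign of r but not its size, which is why strength is judged on the absolute value of r.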
Scatter Plot • A scatter plot gives a visual description of the relationship between two variables. • The line of best fit is defined as the line that minimizes the sum of the squared vertical deviations (up or down) from the data points to the line.
Line of Best Fit Minimizes Squared Deviations from a Data Point to the Line
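The least-squares idea can be sketched in a few lines. The helper names `least_squares_line` and `sum_squared_deviations` and the data are illustrative assumptions, not part of the slides; the formulas are the usual closed-form solution for a bivariate least-squares line.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the line minimizing squared vertical deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_deviations(x, y, slope, intercept):
    """Sum of squared vertical distances from each data point to the line."""
    return sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))

# Made-up illustration data (not from the chapter)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
m, b = least_squares_line(x, y)
best = sum_squared_deviations(x, y, m, b)
# Nudging the slope away from the fitted value can only increase the sum
worse = sum_squared_deviations(x, y, m + 0.1, b)
print(f"Y = {m:.2f}X + {b:.2f}, squared deviations = {best:.2f} (nudged line: {worse:.2f})")
```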
Always do a Scatter Plot to Check the Shape of the Relationship
Will a Linear Fit Work? y = 0.5246x - 2.2473, R² = 0.4259
Linear Fit: y = 0.0012x - 1.0767, R² = 0.0035
Evaluating the Strength of a Correlation • For predictions, an absolute value of r below .7 may produce unacceptably large errors, especially if the SDs of either or both X & Y are large. • As a general rule: • An absolute value of r greater than or equal to .9 is good. • An absolute value of r of .7 - .8 is moderate. • An absolute value of r of .5 - .7 is low. • Absolute values of r below .5 give an R² below .25 (25%); they are poor and thus not useful for predicting.
Significant Correlation? • If N is large (N = 90), then a correlation of .205 is significant. • Always think about R²: how much of the variance in Y is X accounting for? • r = .205 gives R² = .042, so X accounts for only 4.2% of the variance in Y. • This will lead to poor predictions. • A 95% confidence interval will also show how poor the prediction is.
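The arithmetic behind this point is simply squaring r. A small sketch, with the hypothetical helper name `variance_explained`, makes the link between r and the share of variance accounted for explicit:

```python
def variance_explained(r):
    """Proportion of the variance in Y accounted for by X (r squared)."""
    return r ** 2

r = 0.205
r2 = variance_explained(r)
print(f"r = {r}, R^2 = {r2:.3f} ({r2:.1%} of the variance in Y)")

# The same calculation for a few other values of r
for value in (0.5, 0.7, 0.9):
    print(f"r = {value}: R^2 = {variance_explained(value):.2f}")
```

Even an r of .7 leaves about half of the variance in Y unexplained, which is why a statistically significant correlation is not automatically a useful one for prediction.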
The Venn diagram shows R², the amount of variance in Y that is explained by X. • R² = .64 (64%): variance in Y that is explained by X. • 1 - R² = .36 (36%): unexplained variance in Y.
The vertical distance (up or down) from a data point to the line of best fit is a RESIDUAL. • r = .845, R² = .714 (71.4%) • General form: Y = mX + b • Fitted line: Y = .72 X + 13
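To make the residual concrete, here is a minimal sketch that uses the fitted line quoted on the slide, Y = .72 X + 13. The observed data point (X = 50, Y = 55) is hypothetical and chosen only for illustration.

```python
def predict(x, slope=0.72, intercept=13.0):
    """Predicted Y from the slide's line of best fit, Y = .72 X + 13."""
    return slope * x + intercept

def residual(x, y_observed):
    """Observed Y minus predicted Y; positive means the point lies above the line."""
    return y_observed - predict(x)

# Hypothetical observation, not taken from the chapter's data
x_obs, y_obs = 50, 55
y_hat = predict(x_obs)          # about 49
error = residual(x_obs, y_obs)  # about +6: the point sits above the line
print(f"predicted = {y_hat:.1f}, residual = {error:+.1f}")
```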
Standard Error of Estimate (SEE): SD of Y Prediction Errors • The SEE is the SD of the prediction errors (residuals) when predicting Y from X. • The SEE is used to make a confidence interval for the prediction equation.
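A rough sketch of the SEE calculation follows. It assumes the common textbook definition that divides the sum of squared residuals by N - 2 for a one-predictor equation; the data and the function name are made up for illustration.

```python
import math

def standard_error_of_estimate(x, y, slope, intercept):
    """SD of the residuals around the regression line (N - 2 in the denominator)."""
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    ss_residual = sum(e ** 2 for e in residuals)
    return math.sqrt(ss_residual / (len(x) - 2))

# Made-up illustration data; 0.6 and 2.2 are the least-squares slope
# and intercept for exactly these five points
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
see = standard_error_of_estimate(x, y, slope=0.6, intercept=2.2)
print(f"SEE = {see:.3f}")
```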
Linear Regression: Statistics • Enter the variables. • Click the Statistics button.
Linear Regression: Output • 71.5% of the variance in Y is explained by X. • Correlation: r = .845 between X and Y.
Regression Output • Prediction equation: Y = .726 (X) + 12.859 • 95% CI: Y = .726 (X) + 12.859 ± 1.96 (6.06), where 6.06 is the SEE.
The SEE is used to compute confidence intervals for the prediction equation.
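The interval can be sketched directly from the numbers in the regression output: slope .726, constant 12.859, SEE 6.06, and the multiplier 1.96. The function name and the input value X = 60 are assumptions made only for this illustration.

```python
def predict_with_interval(x, slope=0.726, intercept=12.859, see=6.06, z=1.96):
    """Return (lower, predicted, upper) using predicted Y plus or minus z * SEE."""
    y_hat = slope * x + intercept
    margin = z * see
    return y_hat - margin, y_hat, y_hat + margin

# X = 60 is a hypothetical score, not a value from the chapter
low, y_hat, high = predict_with_interval(60)
print(f"predicted Y = {y_hat:.1f}, 95% interval = {low:.1f} to {high:.1f}")
```

The width of the interval depends only on the SEE here, so a larger SEE widens every prediction by the same amount.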
Example of a 95% confidence interval. • Both r and SDY are critical to the accuracy of prediction. • If SDY is small and r is big, prediction errors will be small. • If SDY is big and r is small, prediction errors will be large. • We are 95% sure the mean falls between 45.1 and 67.3.
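One way to see why both r and SDY matter is the widely used large-sample approximation SEE ≈ SDY × √(1 − r²). This formula is not stated on the slide and is brought in here as an assumption; the SD and r values below are hypothetical.

```python
import math

def approx_see(sd_y, r):
    """Large-sample approximation: SEE = SD of Y times sqrt(1 - r squared)."""
    return sd_y * math.sqrt(1 - r ** 2)

# Hypothetical values chosen to contrast the two cases
small_error = approx_see(sd_y=5, r=0.9)   # small SD of Y, big r
large_error = approx_see(sd_y=15, r=0.5)  # big SD of Y, small r
print(f"small SDY, big r: SEE = {small_error:.2f}")
print(f"big SDY, small r: SEE = {large_error:.2f}")
```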
Multiple Regression • Multiple regression is used to predict one Y (dependent) variable from two or more X (independent) variables. • The advantages of multiple regression over bivariate regression are: • It provides a lower standard error of estimate. • It determines which variables contribute to the prediction and which do not.
Multiple Regression • General form: Y = b1X1 + b2X2 + b3X3 + … + bnXn + C • b1, b2, b3, … bn are coefficients that give weight to the independent variables according to their relative contribution to the prediction of Y. • X1, X2, X3, … Xn are the predictors (independent variables). • C is a constant, similar to the Y intercept. • Example: Body Fat = Abdominal + Tricep + Thigh, with each predictor weighted by its coefficient.
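A minimal sketch of fitting such an equation by ordinary least squares is given below, using only the standard library. The function name `fit_multiple_regression` and the measurements are made up; the Y values were generated without noise from b = (0.5, 0.3, 0.2) and C = 2 so that the fit can be checked, and they are not real body fat data.

```python
def fit_multiple_regression(rows, y):
    """Return ([b1..bn], C) from ordinary least squares via the normal equations."""
    # Design matrix with a trailing column of 1s for the constant C
    X = [list(row) + [1.0] for row in rows]
    k = len(X[0])
    # Augmented matrix for (X^T X) beta = X^T y
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(A[row][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(k):
            if row != col:
                factor = A[row][col] / A[col][col]
                A[row] = [a - factor * p for a, p in zip(A[row], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return beta[:-1], beta[-1]

# Made-up (abdominal, tricep, thigh) measurements, for illustration only
rows = [(20, 10, 15), (25, 12, 18), (30, 15, 20),
        (22, 9, 17), (28, 14, 25), (35, 11, 22)]
# Noise-free Y generated as 0.5*abdominal + 0.3*tricep + 0.2*thigh + 2
body_fat = [18.0, 21.7, 25.5, 19.1, 25.2, 27.2]

coefficients, constant = fit_multiple_regression(rows, body_fat)
print("b =", [round(b, 3) for b in coefficients], "C =", round(constant, 3))
```

With real data the fit would not be exact, and the residuals from this equation would feed into the SEE in the same way as in the bivariate case.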
List the variables and the order in which they enter the equation • X2 has the biggest area (C), so it comes in first. • X1 comes in next because area (A) is bigger than area (E). Both A and E are unique, not common to C. • X3 comes in next; it uniquely adds area (E). • X4 is not related to Y, so it is NOT in the equation.
Ideal Relationship Between Predictors and Y • Each variable accounts for unique variance in Y. • Very little overlap of the predictors. • Order to enter? X1, X3, X4, X2, X5
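The "area" language of the Venn diagrams can be approximated numerically with squared correlations: r² between a predictor and Y for the shared area, and r² between two predictors for their overlap. The sketch below covers only the first step (which predictor enters first) and the overlap check; deciding later entries needs each predictor's unique contribution, which this sketch does not compute. All names and data are hypothetical.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

# Made-up scores, for illustration only
y = [10, 12, 15, 19, 24, 30]
predictors = {
    "X1": [1, 2, 3, 4, 5, 6],
    "X2": [2, 1, 2, 1, 2, 1],
}

# Variance in Y shared with each predictor (the overlap area with Y)
shared_with_y = {name: pearson_r(scores, y) ** 2 for name, scores in predictors.items()}
# The predictor sharing the most variance with Y is the first to enter
first_in = max(shared_with_y, key=shared_with_y.get)
# Overlap between the two predictors themselves; ideally this is small
overlap = pearson_r(predictors["X1"], predictors["X2"]) ** 2

print("enters first:", first_in)
print(f"predictor overlap (r squared) = {overlap:.3f}")
```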