Correlation and Regression A BRIEF overview
Correlation Coefficients • Continuous IV & DV • or dichotomous variables (coded as 0-1) • the mean of a 0-1 variable is interpreted as a proportion • Pearson product-moment correlation coefficient ranges from -1.0 to +1.0
Interpreting Correlations • 1.0, + or -, indicates a perfect relationship • a correlation of 0 = no linear association between the variables • in between = varying degrees of relatedness • r² is the proportion of variance shared by the two variables • which variable is X and which is Y doesn’t matter
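The last two bullets can be checked numerically. The sketch below is only an illustration: the helper name `pearson_r` and the five paired scores are invented here, not taken from the slides. It computes r in both orderings and squares it to get the shared variance.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r computed from sums of deviation products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired scores, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_xy = pearson_r(x, y)        # X and Y in the original order
r_yx = pearson_r(y, x)        # same data with the roles swapped
shared_variance = r_xy ** 2   # r squared: proportion of shared variance

print(round(r_xy, 3), round(r_yx, 3), round(shared_variance, 2))
```

Swapping the two lists leaves r unchanged because the formula treats X and Y symmetrically.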
Positive Correlation • regression line is the line of best fit • with a 1.0 correlation, all points fall exactly on the line • a 1.0 correlation does not mean the paired values are identical • it means Y is an exact linear function of X (for example, the difference between them is constant)
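A short check of the point that a perfect correlation does not require identical values. The data and the helper `pearson_r` are assumptions made for this sketch. One Y series differs from X by a constant, the other is a rescaled version of X, and both lie exactly on a straight line.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r computed from sums of deviation products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical values, invented for illustration only
x = [1, 2, 3, 4]
y_shifted = [v + 10 for v in x]     # constant difference of 10
y_scaled = [2 * v + 3 for v in x]   # different slope and intercept

r_shifted = pearson_r(x, y_shifted)
r_scaled = pearson_r(x, y_scaled)
print(r_shifted, r_scaled)
```

Neither Y series equals X, yet both correlate perfectly with it, since every point falls on a line.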
Negative Correlation • if r = -1.0, all points fall directly on the regression line • the line slopes downward from left to right • the sign of the correlation tells us the direction of the relationship • the absolute value tells us its size or magnitude
Zero correlation • no linear relationship between the variables • a positive or negative correlation gives us predictive power
Correlation Coefficient • r = Pearson Product-Moment Correlation Coefficient • zx = z score for variable x • zy = z score for variable y • N = number of paired X-Y values • Definitional formula: r = Σ(zx · zy) / N
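The definitional formula can be sketched in a few lines: convert each variable to z scores, multiply the paired z scores, and average the products over N. This sketch assumes the z scores use the population standard deviation (dividing by N), which is what makes dividing the sum of products by N give r. The function names and data are hypothetical.

```python
from statistics import fmean, pstdev

def z_scores(values):
    """z scores using the population standard deviation (divide by N)."""
    m = fmean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]

def pearson_r_from_z(xs, ys):
    """r as the mean of the products of paired z scores."""
    zx = z_scores(xs)
    zy = z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

# Hypothetical paired scores, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r_from_z(x, y)
print(round(r, 3))
```

If the z scores were built with the sample standard deviation (dividing by N - 1), the sum of products would be divided by N - 1 instead.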
Interpreting correlation coefficients • comprehensive description of relationship • direction and strength • need an adequate number of pairs • more than 30 or so • interpreted the same way for a sample or a population • the population parameter is rho (ρ) • scatterplots and r • more tightly clustered around the line = higher correlation
Examples of correlations • -1.0 negative limit • -.80 relationship between juvenile street crime and socioeconomic level • .43 manual dexterity and assembly line performance • .60 height and weight • 1.0 positive limit
Describing r’s • Effect size index (Cohen’s guidelines): small r = .10, medium r = .30, large r = .50 • A common set of descriptive labels: • Very high = .80 or more • Strong = .60 - .80 • Moderate = .40 - .60 • Low = .20 - .40 • Very low = .20 or less • small correlations can be very important
Nonlinearity and range restriction • if the relationship doesn't follow a linear pattern, Pearson r can badly understate it • r is based on a straight-line function • if the variability of one or both variables is restricted, the value of r that can be observed decreases
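Both problems can be illustrated with simulated data. Everything in this sketch is an assumption made for demonstration: the perfectly curved relationship y = x², the seeded random sample, and the cutoff of 1.0 used to restrict the range of X.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r computed from sums of deviation products."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Nonlinearity: Y is completely determined by X, but not linearly
x_curve = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [v ** 2 for v in x_curve]
r_curve = pearson_r(x_curve, y_curve)

# Range restriction: simulated linear relationship with noise
rng = random.Random(0)  # fixed seed so the run is repeatable
x_full = [rng.gauss(0, 1) for _ in range(2000)]
y_full = [v + rng.gauss(0, 1) for v in x_full]
r_full = pearson_r(x_full, y_full)

# Keep only the pairs whose X value exceeds an arbitrary cutoff
kept = [(a, b) for a, b in zip(x_full, y_full) if a > 1.0]
r_restricted = pearson_r([a for a, _ in kept], [b for _, b in kept])

print(r_curve, round(r_full, 2), round(r_restricted, 2))
```

The curved data give an r of zero even though Y is perfectly predictable from X, and the restricted subsample gives a smaller r than the full sample drawn from the same relationship.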
Simple linear regression • enables us to make a “best” prediction of the value of one variable given our knowledge of its relationship with another variable • generates the line that minimizes the sum of the squared vertical distances between the points in the plot and the line • no other line will produce a smaller sum of squared residuals (errors of estimation) • this is the least squares property
Regression line • The line will have the form Y'=A+BX • Where: Y' = predicted value of Y • A = Y intercept of the line • B = slope of the line • X = score of X we are using to predict Y
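A minimal sketch of fitting Y' = A + BX and checking the least squares property. The helper names `fit_line` and `sum_squared_errors` and the data are hypothetical. The slope here is computed from deviation sums, which is algebraically equivalent to the raw-sums formula usually used for hand calculation.

```python
def fit_line(xs, ys):
    """Return (A, B) for the least squares line Y' = A + B*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx             # slope
    a = mean_y - b * mean_x   # Y intercept
    return a, b

def sum_squared_errors(xs, ys, a, b):
    """Sum of squared residuals (Y - Y') for the line Y' = a + b*X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical paired scores, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

A, B = fit_line(x, y)
best = sum_squared_errors(x, y, A, B)

# Nudging the intercept or the slope away from the fit increases the error
worse_intercept = sum_squared_errors(x, y, A + 0.1, B)
worse_slope = sum_squared_errors(x, y, A, B + 0.1)

print(round(A, 2), round(B, 2), round(best, 2))
```

Any other choice of A and B gives a larger sum of squared residuals than the fitted pair; the two perturbed lines are just two examples.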
Ordering of variables • unlike with r, which variable is designated as X and which as Y makes a difference • different regression coefficients result if we flip them • generally, if you can designate one variable as the dependent variable on logical grounds, that one is Y
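To see that flipping the variables changes the coefficients, the sketch below fits the line both ways on the same invented data. The helper name `slope` is hypothetical.

```python
def slope(xs, ys):
    """Least squares slope for predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical paired scores, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b_y_on_x = slope(x, y)   # predicting Y from X
b_x_on_y = slope(y, x)   # predicting X from Y

# The product of the two slopes equals r squared
r_squared = b_y_on_x * b_x_on_y

print(round(b_y_on_x, 2), round(b_x_on_y, 2), round(r_squared, 2))
```

The two slopes differ because each line minimizes errors in a different direction (errors in Y versus errors in X); they coincide only when the correlation is perfect and the variables have equal variability.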
Moving to prediction • statistically significant relationship between college entrance exam scores and GPA • how can we use entrance scores to predict GPA?
Calculating the slope (b) • b = (NΣXY - ΣXΣY) / (NΣX² - (ΣX)²) • N = number of pairs of scores; the rest of the terms are the sums of the X, Y, X², Y², and XY columns we’re already familiar with (ΣY² is needed for r, not for b)
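The column-sum approach can be sketched as follows. The formula in the comments is the standard computational form of the least squares slope; the data and variable names are invented for illustration.

```python
from math import sqrt

# Hypothetical paired scores, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)                                   # N: number of pairs
sum_x = sum(x)                               # sum of the X column
sum_y = sum(y)                               # sum of the Y column
sum_x2 = sum(v * v for v in x)               # sum of the X squared column
sum_y2 = sum(v * v for v in y)               # sum of the Y squared column
sum_xy = sum(a * b for a, b in zip(x, y))    # sum of the XY column

# Computational slope formula:
# b = (N*sum(XY) - sum(X)*sum(Y)) / (N*sum(X^2) - sum(X)^2)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# The same column sums also give r; this is where sum(Y^2) is used
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(b, round(r, 3))
```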
Calculating Y-intercept (a) • a = Ȳ - bX̄ • b = slope of the regression line • Ȳ = the mean of the Y values • X̄ = the mean of the X values
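Once the slope is known, the intercept follows from the two means. In this sketch the slope value of 0.6 is assumed to have already been computed for the same invented data; the check at the end shows that the resulting line passes through the point of means.

```python
# Hypothetical paired scores, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b = 0.6  # slope for these data, assumed already calculated

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Intercept: mean of Y minus slope times mean of X
a = mean_y - b * mean_x

# Predicting at the mean of X returns the mean of Y
predicted_at_mean = a + b * mean_x

print(round(a, 2), round(predicted_at_mean, 2))
```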
Let’s make up a small example • SAT – GPA correlation • How high is it generally? • Start with a scatter plot • Enter points that reflect the relationship we think exists • Translate into values • Calculate r & regression coefficients
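The steps on this slide can be sketched end to end. The seven SAT-GPA pairs below are entirely made up to reflect a positive relationship, in the spirit of the slide; they are not real data, and the function name `correlation_and_line` is hypothetical.

```python
from math import sqrt

def correlation_and_line(xs, ys):
    """Return (r, a, b) from the raw column sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(v * v for v in xs)
    sum_y2 = sum(v * v for v in ys)
    sum_xy = sum(p * q for p, q in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    r = numerator / sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    b = numerator / (n * sum_x2 - sum_x ** 2)   # slope
    a = sum_y / n - b * (sum_x / n)             # Y intercept
    return r, a, b

# Made-up values translated from an imagined scatter plot (not real data)
sat = [400, 450, 500, 550, 600, 650, 700]
gpa = [2.0, 2.4, 2.5, 2.9, 3.0, 3.4, 3.6]

r, a, b = correlation_and_line(sat, gpa)

# Predicted GPA for a hypothetical applicant with an entrance score of 575
predicted_gpa = a + b * 575

print(round(r, 3), round(a, 3), round(b, 5), round(predicted_gpa, 2))
```

Because the points were invented to lie close to a line, r comes out much higher than would be expected for real entrance scores and grades.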