rxy • When two variables are correlated, we can predict a score on one variable from a score on the other • The stronger the correlation, the more accurate our prediction will be
rxy • We need a measure of the “strength” of a correlation
rxy • We need a number that gets bigger when big numbers are paired with big numbers and small numbers are paired with small numbers • We need a number that gets smaller when big numbers are paired with small numbers and small numbers are paired with big numbers
rxy • Remember the height/weight example: • A big number should indicate this (strong positive correlation) [Figure: scatterplot of points a to f; axes marked 5’0 to 5’10 for height and 100 to 150 for weight]
rxy • Remember the height/weight example: • A small number should indicate this (strong negative correlation) [Figure: scatterplot of points a to f; axes marked 5’0 to 5’10 for height and 100 to 150 for weight]
rxy • Two sets of scores, xi and yi • What could we do?
rxy • What could we do? • Multiply each pair of scores and sum the products, $\sum_i x_i y_i$. This sum is: • Greatest when big numbers are paired with big numbers and small numbers with small numbers • Least when small numbers are paired with big numbers and big numbers are paired with small numbers
rxy • Analogy: this gets you the most money [Figure: pennies, quarters, and loonies]
rxy • Analogy: this gets you the least [Figure: pennies, quarters, and loonies]
rxy • Analogy, because: 3 × $1 + 2 × $0.25 + 1 × $0.01 = $3.51 is more than 1 × $1 + 2 × $0.25 + 3 × $0.01 = $1.53
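The coin arithmetic is a small sum of products, and it can be checked with a short sketch. The function name `total_cents` is made up for illustration, and the amounts are kept in cents so that the sums stay exact.

```python
# Coin values in cents (integers avoid floating-point rounding).
values = [100, 25, 1]  # loonie, quarter, penny


def total_cents(counts, coin_values):
    """Sum of products: each count multiplied by its paired coin value."""
    return sum(c * v for c, v in zip(counts, coin_values))


most = total_cents([3, 2, 1], values)   # biggest count paired with biggest value
least = total_cents([1, 2, 3], values)  # biggest count paired with smallest value

print(most, least)  # 351 153
```

The same three counts and the same three coin values appear in both totals; only the pairing changes.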
rxy • But there’s a problem: the sum of products is not a good measure, because its value depends on n AND on the size of the numbers
rxy • Try this: divide the sum of products by n, $\frac{\sum_i x_i y_i}{n}$ • Still not so good: it no longer depends on n, but it does depend on the size of the x’s and y’s
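A minimal sketch of the two candidate measures so far, using made-up scores, shows the behaviour described: the sum of products grows with n, and the mean product does not, but both grow with the size of the scores. The function names are illustrative only.

```python
def sum_of_products(xs, ys):
    """Multiply paired scores and add up the products."""
    return sum(x * y for x, y in zip(xs, ys))


def mean_product(xs, ys):
    """Sum of products divided by the number of pairs, n."""
    return sum_of_products(xs, ys) / len(xs)


# Illustrative scores, not data from the slides.
x = [1, 2, 3]
y = [1, 2, 3]

# The same pattern observed twice: n doubles from 3 to 6.
x_twice, y_twice = x * 2, y * 2

# The same pattern, but every score is 10 times larger.
x_big = [10 * v for v in x]
y_big = [10 * v for v in y]

print(sum_of_products(x, y), sum_of_products(x_twice, y_twice))  # sum grows with n
print(mean_product(x, y), mean_product(x_twice, y_twice))        # mean is unchanged by n
print(mean_product(x_big, y_big))                                # mean still grows with score size
```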
rxy • How about multiplying deviation scores? • This compares each score to the mean of its own variable
rxy • Multiply deviation scores: $\frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{n}$ • Now the value depends on the spread of the data
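As a rough illustration, the sketch below averages the products of deviation scores. It assumes the sum is divided by n, and the scores are made up. Adding a constant to every score leaves the result unchanged, but stretching the spread of one variable changes it.

```python
def mean_deviation_product(xs, ys):
    """Average product of deviations from each variable's own mean (divides by n, an assumption)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Illustrative scores, not data from the slides.
x = [1, 2, 3]
y = [2, 4, 6]

base = mean_deviation_product(x, y)

# Shift every score by 100: deviations from the mean are unchanged.
shifted = mean_deviation_product([v + 100 for v in x], [v + 100 for v in y])

# Stretch x by a factor of 10: the spread of x grows, and so does the measure.
stretched = mean_deviation_product([10 * v for v in x], y)

print(base, shifted, stretched)
```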
rxy • So standardize the scores
rxy • This measures the strength of correlation: $r_{xy} = \frac{\sum_i z_{x_i} z_{y_i}}{n}$
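One way to sketch the standardized measure in code is shown below. It assumes z-scores are computed with the standard deviation that divides by n, and that the cross-products are averaged over n; a text that uses n - 1 in both places gives the same r. The height and weight values are illustrative pairings built from the axis ticks of the height/weight example, not real data.

```python
import math


def z_scores(values):
    """Standardize scores: subtract the mean, divide by the SD (dividing by n, an assumption)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def r_xy(xs, ys):
    """Average cross-product of paired z-scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


# Illustrative pairings only: heights in inches, weights on the 100 to 150 scale.
heights = [60, 62, 64, 66, 68, 70]
weights = [100, 110, 120, 130, 140, 150]

r_pos = r_xy(heights, weights)                        # big paired with big
r_neg = r_xy(heights, weights[::-1])                  # big paired with small
r_cm = r_xy([h * 2.54 for h in heights], weights)     # same heights in centimetres

print(round(r_pos, 6), round(r_neg, 6), round(r_cm, 6))
```

Changing the units of height does not change the result, which is the point of standardizing the scores first.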
rxy • rxy ranges from -1.0, indicating a perfect negative correlation, to +1.0, indicating a perfect positive correlation • An rxy of zero indicates no linear correlation: the scores show no straight-line relationship with each other.
rxy • rxy also has a geometric meaning • Recall that the mean of the zx and zy distributions is zero and each z-score is a deviation from the mean
rxy • Each point (zx, zy) lands in one of four quadrants [Figure: zx and zy axes with one point plotted at (zx, zy)]
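rxy • Notice that in quadrant I both zx and zy are positive, so the cross-product zx·zy is positive
rxy • Notice that in quadrant II zx is negative and zy is positive, so the cross-product zx·zy is negative
rxy • Notice that in quadrant III zx is negative and zy is negative, so the cross-product zx·zy is positive
rxy • Notice that in quadrant IV zx is positive and zy is negative, so the cross-product zx·zy is negative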
rxy • So if most points tend to fall around a line with a positive (45 degree) slope (quadrants I and III), the cross-products will tend to be positive • If most points tend to fall around a line with a negative slope (quadrants II and IV), the cross-products will tend to be negative [Figure: the four quadrants, labeled I to IV]
rxy • So if the points are randomly scattered, the negative and positive cross-products tend to cancel, and rxy is near zero
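The quadrant argument can be sketched as a small lookup. The function name `quadrant` and the four points are hypothetical, chosen only so that one point lands in each quadrant.

```python
def quadrant(zx, zy):
    """Name the quadrant of a point (zx, zy); points on an axis belong to none."""
    if zx > 0 and zy > 0:
        return "I"
    if zx < 0 and zy > 0:
        return "II"
    if zx < 0 and zy < 0:
        return "III"
    if zx > 0 and zy < 0:
        return "IV"
    return "axis"


# Hypothetical z-score pairs, one per quadrant.
points = [(1.2, 0.9), (-0.8, 1.1), (-1.0, -1.3), (0.7, -0.5)]

for zx, zy in points:
    sign = "positive" if zx * zy > 0 else "negative"
    print(quadrant(zx, zy), sign)
```

Points in quadrants I and III contribute positive cross-products to the sum, and points in quadrants II and IV contribute negative ones.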
Covariance • a related measure of the relationship between scores on two different variables is the covariance
Covariance • Notice that the variance ($S_x^2$) is the covariance between a variable and itself!
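The relationship between covariance, variance, and rxy can be written out as follows. This is a sketch that assumes sums are divided by n; some texts divide by n - 1 instead, which changes the covariance and the variance but not rxy. The symbol $\mathrm{cov}_{xy}$ is notation introduced here for illustration.

```latex
\mathrm{cov}_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n}

\mathrm{cov}_{xx} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})}{n}
                  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} = S_x^2

r_{xy} = \frac{\mathrm{cov}_{xy}}{S_x S_y}
```

Dividing the covariance by both standard deviations is the same as multiplying z-scores, which is why rxy does not depend on the spread of the data while the covariance does.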
Regression • If two variables are perfectly correlated (r = ±1.0), then one can exactly predict a score on one variable given a score on the other
Regression • For example: a university charges a $250 registration fee plus $100 per credit
Regression • tuition = $100(X) + $250 • where X is the number of credits • Notice this is a linear relationship (an equation of the form y = ax + b), where: • a = $100/credit • b = $250 • x = number of credits
Regression • Tuition as a function of credit hours is a straight line • There is a perfect correlation between credit hours and tuition • You could predict perfectly the tuition required given the number of credit hours
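The tuition example can be sketched in a few lines. The function names and the list of credit loads are made up for illustration, and the correlation helper assumes sums are divided by n (the divisor cancels in r).

```python
import math


def tuition(credits):
    """Tuition in dollars: $100 per credit plus a $250 registration fee."""
    return 100 * credits + 250


def pearson_r(xs, ys):
    """Correlation as covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


# Hypothetical credit loads.
credit_hours = [3, 6, 9, 12, 15]
fees = [tuition(c) for c in credit_hours]

print(fees)                                   # [550, 850, 1150, 1450, 1750]
print(round(pearson_r(credit_hours, fees), 6))
```

Because every fee lies exactly on the line y = 100x + 250, the correlation comes out as 1 (up to floating-point rounding), and `tuition` predicts the fee for any credit load without error.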
Next Time • Regression - read chapter 8