Chapter 7 - Part 1: Correlation
Correlation Topics • Correlational research – what is it and how do you do “co-relational” research? • The three questions: • Is it a linear or curvilinear correlation? • Is it a positive or negative relationship? • How strong is the relationship? • Solving these questions with t scores and r, the estimated correlation coefficient derived from the tx and ty scores of individuals in a random sample.
Correlational research – how to start. • To begin a correlational study, we select a population or, far more frequently, select a random sample from a population. • (Since we use samples most of the time, we will mostly use the formulae and symbols for computing a correlation from a sample.) • We then obtain two scores from each individual, one score on each of two variables. (These are usually variables that we think might be related to each other for interesting reasons.) We call one variable X and the other Y.
Correlational research: comparing tX & tY scores • We translate the raw scores on the X variable to t scores (called tX scores) and raw scores on the Y variable to tY scores. • So each individual has a pair of scores, a tX score and a tY score. • You determine how similar or different the tX and tY scores in the pairs are, on the average, by subtracting tY from tX, then squaring, summing, and averaging the tX and tY differences.
The estimated correlation coefficient, Pearson’s r • With a simple formula, you transform the average squared differences between the t scores to Pearson’s correlation coefficient, r • Pearson’s r indicates (with a single number), both the direction and strength of the relationship between the two variables in your sample. • r also estimates the correlation in the population from which the sample was drawn • In Ch. 8, you will learn when you can use r that way.
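The steps on the last two slides (raw scores to t scores, then squared differences, then r) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not show the sample formula here, so the sketch assumes that t scores use the sample standard deviation s (n - 1 denominator) and that the squared differences are averaged by dividing by the number of pairs minus 1. The boat data and the function names are made up for the example.

```python
import statistics

def t_scores(raw):
    """Convert raw sample scores to t scores: (score - sample mean) / s."""
    mean = statistics.fmean(raw)
    s = statistics.stdev(raw)  # assumption: s uses the n - 1 denominator
    return [(value - mean) / s for value in raw]

def r_from_t_scores(x_raw, y_raw):
    """Estimate r as 1 minus half the average squared tX - tY difference."""
    tx, ty = t_scores(x_raw), t_scores(y_raw)
    sum_sq_diff = sum((a - b) ** 2 for a, b in zip(tx, ty))
    n_pairs = len(tx)
    # Assumption: "averaging" in a sample divides by (number of pairs - 1).
    return 1 - 0.5 * sum_sq_diff / (n_pairs - 1)

# Hypothetical sample: boat length in feet (X) and price in $1000s (Y).
boat_length = [20, 25, 30, 35, 40]
boat_price = [15, 22, 41, 50, 72]

r = r_from_t_scores(boat_length, boat_price)
print(f"r = {r:.3f}")
```

Under these assumptions the result agrees with the usual product-moment formula for r, so the "difference between t scores" view and the textbook formula are two routes to the same number.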
Going from pairs of raw scores to r: Linearity - A preliminary question. • Once you have scores on two variables, you ask, “Is this a linear or curvilinear relationship?” • Psychology is a relatively new science and this is an intro stat course • For both reasons, you will only learn how to deal with linear relationships between two variables and save correlation with three or more variables and curvilinear relationships for grad school. BUT YOU MUST KNOW WHAT A LINEAR RELATIONSHIP IS, AND HOW TO RECOGNIZE A NONLINEAR (CURVILINEAR) CORRELATION.
Linearity vs. Curvilinearity • In a linear relationship, as scores on one variable go from low to high, scores on the other variable either generally increase or generally decrease. • In a curvilinear relationship, as scores on one variable go from low to high, scores on the other variable change direction. They can go (1) down and then up, (2) up and then down, (3) up and down and then up again, (4) up or down and then flat, etc.
Examples of linear relationships. • For example, think of the relationship of the size of a pleasure boat (X) and its cost (Y). As one variable (boat size) increases, scores on the other variable (cost) also increase. • Another example of a linear relationship: the relationship between the size of a car and the number of miles per gallon it gets. In general, as cars get gradually larger (X), they tend to get fewer miles per gallon (Y).
A curvilinear relationship • In a curvilinear relationship, as scores on the X variable go gradually from low to high, the Y variable changes direction. • For example, think of the relationship between age (X) and height (Y). • As age increases from 0-14 or so, height increases also. • But then people stop growing. As age increases, height stays the same. • Thus the Y variable, height, changes direction. It goes from gradually rising to flat. • If you graph age and height, the best fitting line is a curved line.
Correlation Characteristics: Which line best shows the relationship between age (X) and height (Y)? [Figure: linear vs. curvilinear best-fitting lines]
Another non-linear relationship: shortstops and linemen. Great shortstops may be too small to be great football linemen. Is this a linear relationship?
Player    Baseball skill    Football potential
David     Terrible          Terrible
Ben       Very Poor         Average
Ed        Poor              Average
Frank     Average           Very Good
Chuck     Good              Excellent
Al        Very Good         Good
George    Excellent         Poor
Plot the dots! • To check whether a relationship is linear, make a graph and place the scores on it. • That’s what I mean by “Plot the dots.” • If you really want to know what is going on with data, Plot the dots! • Here is a graph for the baseball skills and football potential data.
[Figure: scatterplot of Football Skill (Y axis, Terrible to Excellent) against Baseball Skill (X axis, Terrible to Excellent), with one dot each for David, Ben, Ed, Frank, Chuck, Al, and George.] When you plot the dots, is this linear? NO! It is best described by a curved line. It is a curvilinear relationship!
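If you do not have graph paper handy, "plot the dots" can be approximated with a crude text plot. The sketch below uses the baseball and football ratings from the slide; coding the seven rating labels as 1 to 7 is an assumption made only so the dots can be placed on a grid, and the function name is made up for the example.

```python
# Rating labels from the slide, coded 1-7 (the numeric coding is an assumption).
LEVELS = ["Terrible", "Very Poor", "Poor", "Average", "Good", "Very Good", "Excellent"]
CODE = {label: position + 1 for position, label in enumerate(LEVELS)}

# (name, baseball skill, football potential) as listed on the slide.
PLAYERS = [
    ("David", "Terrible", "Terrible"),
    ("Ben", "Very Poor", "Average"),
    ("Ed", "Poor", "Average"),
    ("Frank", "Average", "Very Good"),
    ("Chuck", "Good", "Excellent"),
    ("Al", "Very Good", "Good"),
    ("George", "Excellent", "Poor"),
]

def plot_rows(players):
    """Return text rows of a scatterplot: football on Y, baseball on X."""
    points = {(CODE[baseball], CODE[football]) for _, baseball, football in players}
    rows = []
    for y in range(len(LEVELS), 0, -1):  # top row is the highest football level
        cells = "".join(
            " * " if (x, y) in points else " . " for x in range(1, len(LEVELS) + 1)
        )
        rows.append(f"{LEVELS[y - 1]:>10} |{cells}")
    return rows

# Baseball skill runs left (Terrible) to right (Excellent) along each row.
print("\n".join(plot_rows(PLAYERS)))
```

Reading the printed grid from left to right, the stars climb and then drop again, which is the up-then-down pattern of a curvilinear relationship.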
After you know a correlation is linear, there are two other questions: the direction and the strength of the correlation. But first, a definition of high and low scores. • Definition of high and low scores: • High scores are scores above the mean of each variable. They are represented by positive t scores. • Low scores are scores below the mean of each variable. They are represented by negative t scores.
Positive relationships • In a positive relationship, as X scores gradually increase, Y scores tend to increase as well. Example: The longer a sailboat is, the more it tends to cost. As length goes up, price tends to go up. • In a positive correlation, X and Y scores tend to be on the same side of their respective means. • As a result, the tX and tY scores tend to be similar and the difference between them (tX – tY) tends to be small. • Since (tX – tY) is small, the squared difference between them, (tX – tY)², also tends to be small.
Graphing a positive relationship. • In a positive correlation high scores on X tend to go with high scores on Y. On a graph, as the line runs from left to right, scores increase on the X axis. At the same time, Y scores also generally get higher. So, the line will tend to rise as it runs. • Remember from math, slope equals how far a line rises on the Y axis for each unit it moves from left to right or “runs” along the X axis. • If a line rises from left to right, “rise” is positive. Run is always positive. So a positive rise divided by an (always) positive run results in a positive slope. (That’s why we call it a “positive” correlation.)
[Figure: Positive vs. negative scatterplot. Both axes run from -3 to 3; one line of dots is labeled "Positive relationship" and the other "Negative relationship".]
[Figure: Graphic display of a strong POSITIVE correlation. Both axes run from -3 to 3.]
Negative relationships • In a negative relationship, as X scores gradually increase, Y scores tend to decrease. Example: The more years a sailboat is used, the less it tends to cost. As use goes up, price tends to go down. • In a negative correlation, X and Y scores tend to be on opposite sides of their respective means. • As a result, the tX and tY scores tend to be dissimilar and the difference between them (tX – tY) tends to be large. • Since (tX – tY) is large, the squared difference between them, (tX – tY)², also tends to be large.
Graphing a negative relationship • In a negative correlation, high scores on X tend to go with low scores on Y. On a graph, as the line runs from left to right, scores increase on the X axis. At the same time, Y scores get lower. So, the line will tend to fall as it runs. • Remember from math, slope equals how far a line rises on the Y axis for each unit it moves from left to right or “runs” along the X axis. • If a line falls from left to right, “rise” is negative. Run is always positive. So a negative rise divided by an (always) positive run results in a negative slope. (That’s why we call it a “negative” correlation.)
Summary: • When t scores are consistently more similar than different, we have a positive correlation. On a graph the dots will rise from your left to your right. • When t scores are consistently more different than similar, we have a negative correlation. On a graph the dots will fall from your left to your right.
How strong is the relationship between the tX and tY scores? • Here the question is about the consistency with which tX and tY scores are either similar or dissimilar.
t scores: sign and size • There are two aspects to the consistency of the relationship between tX and tY scores. • First, are the t scores consistently of the same sign (positive correlation) or of opposite signs (negative correlation)? • If they are almost always one way or the other, you have at least a moderately strong relationship. • On the other hand, if you sometimes see t scores on the same side of the mean and sometimes on opposite sides, you have a relatively weak correlation.
t scores: sign and size • If there is a consistent pattern of same signed t scores (positive correlation) or a consistent pattern of opposite signed t scores (negative correlation), then whether the tX and tY scores are about the same distance from the mean comes into play. • The large majority of t scores (usually well over 95%) range from –2.50 to +2.50. • Given a consistent positive or negative correlation, the more similar in size the t scores, the stronger the correlation.
Positive correlations: • Perfect: tX and tY scores are all the same sign and are identical in size. • Strong: tX and tY scores are almost all the same sign and are fairly similar in size. • Moderate: tX and tY scores are predominately the same sign. This is especially true for pairs in which one of the values is one or more standard deviations from the mean. Size may be fairly dissimilar. • Weak: tX and tY scores are a little more often the same sign than opposite in sign. Nothing can be said about size.
Negative correlations: • Perfect: tX and tY scores are all of the opposite sign and are identical in size. • Strong: tX and tY scores are almost all of opposite sign and are fairly similar in size. • Moderate: tX and tY scores are predominately opposite in sign. This is especially true for pairs in which one of the values is one or more standard deviations from the mean. Size may be fairly dissimilar. • Weak: tX and tY scores are a little more often of opposite signs than the same in sign. Nothing can be said about size.
Unrelated (independent) variables • When the size and sign of the tX scores bear no relationship to the size and sign of the tY scores, the variables are unrelated. • We can also call the variables “independent of” or “orthogonal to” each other. The three terms (unrelated, independent, and orthogonal) are synonymous in this context.
Graphing it on t axes: The strength of a relationship tells us approximately how the dots representing pairs of t scores will fall around a best fitting line. • Perfect - scores fall exactly on a straight line whose slope will be +1.00 or –1.00. • Strong - most scores fall near the line, whose slope will be close to +.750 or -.750. • Moderate - some are near the line, some not. The slope of the line will be close to +.500 or -.500.
Graphing it on t axes (continued): • Weak - some scores fall fairly close to the line, but others fall quite far from it. The slope of the line will be close to +.250 or -.250. • Independent - the scores are not close to the line and form a circular or square pattern. The best fitting line will be the X axis, a line with a slope of 0.000.
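The slopes quoted above (+1.00, about .750, about .500, and so on) reflect a general fact: when both variables are plotted as t scores, the slope of the least-squares best fitting line equals r. The sketch below checks this numerically. The data are made up, the function names are hypothetical, and t scores are assumed to use the sample standard deviation s (n - 1 denominator).

```python
import math
import statistics

def t_scores(raw):
    """Convert raw sample scores to t scores using the sample SD (n - 1)."""
    mean = statistics.fmean(raw)
    s = statistics.stdev(raw)
    return [(value - mean) / s for value in raw]

def best_fit_slope(xs, ys):
    """Least-squares slope of the line predicting ys from xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    rise = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    run = sum((x - mx) ** 2 for x in xs)
    return rise / run

def pearson_r(xs, ys):
    """Conventional product-moment r computed from raw scores."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical raw scores on two variables.
x_raw = [1, 2, 3, 4, 5, 6]
y_raw = [2, 1, 4, 3, 6, 5]

slope_on_t_axes = best_fit_slope(t_scores(x_raw), t_scores(y_raw))
print(f"slope on t axes = {slope_on_t_axes:.3f}, r = {pearson_r(x_raw, y_raw):.3f}")
```

Because the two printed values match, reading the slope of the best fitting line on t axes is another way of reading the correlation itself.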
[Figure: Strength of a relationship: Perfect. Both axes run from -1.5 to 1.5.]
[Figure: Strength of a relationship: Very Strong. Both axes run from -3 to 3.]
[Figure: Strength of a relationship: Moderate. Both axes run from -3 to 3.]
[Figure: Strength of a relationship: Independent. Both axes run from -3 to 3.]
[Figures: four unlabeled practice scatterplots, each with both axes running from -3 to 3.] What is this relationship?
Comparing apples to oranges? Use Z or t scores! • You can use correlation to look for the relationship between ANY two values that you can measure on a single subject. • However, there may not be any relationship; the variables may be independent. • A correlation tells us whether scores are consistently similar on two measures, consistently different from each other, or have no real pattern.
Comparing apples to oranges? Use t scores! • To compare scores on two different variables, you transform them into ZX and ZY scores if you are studying a population or tX and tY scores if you have a sample. • ZX and ZY scores (or tX and tY scores) can be directly compared to each other to see whether they are consistently similar, consistently quite different, or show no consistent pattern of similarity or difference
Comparing variables • Anxiety symptoms, e.g., heartbeat, with number of hours driving to class. • Hat size with drawing ability. • Math ability with verbal ability. • Number of children with IQ. • Turn them all into Z or t scores
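To make the "apples to oranges" point concrete, here is a minimal sketch that puts two variables measured in different units onto a common scale. Following the slides, Z scores use the population mean and standard deviation and t scores use their sample counterparts; the n - 1 denominator for the sample standard deviation is an assumption, and the heart-rate and driving-time numbers are invented for illustration.

```python
import statistics

def z_scores(population):
    """Z scores: (score - mu) / sigma, treating the data as a whole population."""
    mu = statistics.fmean(population)
    sigma = statistics.pstdev(population)
    return [(value - mu) / sigma for value in population]

def t_scores(sample):
    """t scores: (score - sample mean) / s, treating the data as a sample."""
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)  # assumption: n - 1 denominator
    return [(value - mean) / s for value in sample]

# Hypothetical measurements on the same five people.
heart_rate_bpm = [62, 70, 75, 81, 92]
drive_hours = [0.5, 1.0, 1.0, 2.0, 3.5]

# After standardizing, both variables are in "distance from the mean" units,
# so each person's pair of scores can be compared directly.
for t_heart, t_drive in zip(t_scores(heart_rate_bpm), t_scores(drive_hours)):
    print(f"tX = {t_heart:+.2f}   tY = {t_drive:+.2f}")
```

Beats per minute and hours cannot be subtracted from each other, but their t scores can, which is what the (tX – tY) comparison relies on.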
Pearson’s Correlation Coefficient • coefficient - noun, a number that serves as a measure of some property. • The correlation coefficient indexes BOTH the consistency and direction of a correlation with a single number
Pearson’s rho • Pearson’s rho (ρ) is the parameter that characterizes the strength and direction of a linear relationship (and only a linear relationship) between two variables. To compute rho, you must have the entire population. Then you can compute sigma, mu, Z scores and rho. • The formula: rho = 1 − ½ [Σ(ZX − ZY)² / NP], where NP is the number of pairs of Z scores in the population. • In English: The correlation coefficient equals 1 minus half the average squared distance between the Z scores.
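For readers who want to see why "1 minus half the average squared distance" behaves like a correlation, the short derivation below may help. It assumes the standard product-moment definition of rho as the average product of paired Z scores (a definition not stated on these slides) and uses the fact that population Z scores have an average square of 1.

```latex
\begin{align*}
\frac{1}{N_P}\sum (Z_X - Z_Y)^2
  &= \frac{1}{N_P}\sum Z_X^2 + \frac{1}{N_P}\sum Z_Y^2
     - \frac{2}{N_P}\sum Z_X Z_Y \\
  &= 1 + 1 - 2\rho \;=\; 2 - 2\rho \\[4pt]
\Rightarrow\quad
\rho &= 1 - \frac{1}{2}\cdot\frac{\sum (Z_X - Z_Y)^2}{N_P}
\end{align*}
```

The same algebra shows the range of the average squared distance: it is 0 when every ZY equals ZX, and 4 when every ZY equals −ZX, because each squared distance is then (2ZX)² = 4ZX² and the average of ZX² is 1.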
Pearson’s rho • When you have a perfect positive correlation, the Z scores will be identical in size and sign. So the average squared distance will be zero and rho = 1.000-1/2(0.000) = 1.000 • When you have a perfect negative correlation, the Z scores will be identical in size and opposite in sign. It can be proven algebraically that the average squared distance in that case will be 4.000: rho = 1.000-1/2(4.000) = -1.000 • When you have two totally independent variables, the average squared distance will be 2.000 (halfway between 0.000 and 4.000). Thus, rho = 1.000-1/2(2.000) = 0.000
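The three benchmark cases above can be checked numerically with a tiny population. This is a sketch with made-up data and hypothetical function names; the third data set is merely constructed to be uncorrelated with X, which is the sense of "independent" used in these slides.

```python
import statistics

def z_scores(population):
    """Z scores using the population mean (mu) and population SD (sigma)."""
    mu = statistics.fmean(population)
    sigma = statistics.pstdev(population)
    return [(value - mu) / sigma for value in population]

def average_squared_distance(x, y):
    """Average of (ZX - ZY) squared over all pairs in the population."""
    zx, zy = z_scores(x), z_scores(y)
    return sum((a - b) ** 2 for a, b in zip(zx, zy)) / len(zx)

def rho(x, y):
    """rho = 1 minus half the average squared distance between Z scores."""
    return 1 - 0.5 * average_squared_distance(x, y)

x = [1, 2, 3, 4]
cases = {
    "perfect positive": [10, 20, 30, 40],  # same order as X
    "perfect negative": [40, 30, 20, 10],  # reverse order of X
    "uncorrelated": [1, -1, -1, 1],        # built so X and Y do not co-vary
}

for label, y in cases.items():
    distance = average_squared_distance(x, y)
    print(f"{label}: average squared distance = {distance:.3f}, rho = {rho(x, y):.3f}")
```

Up to floating-point rounding, the average squared distances come out at the slide's benchmark values of 0, 4, and 2, giving rho values of 1, -1, and 0.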
Pearson’s Correlation Coefficient • Thus, rho varies from -1.000 (perfect negative correlation) to 0.000 (independent variables) to +1.000 (perfect positive correlation). • A negative value indicates a negative relationship; a positive value indicates a positive relationship. • Values of rho close to 1.000 or -1.000 indicate a strong (consistent) relationship; values close to 0.000 indicate a weak (inconsistent) or independent relationship.
Estimating rho with r • Computing rho involves finding the actual average squared distance between the ZX and ZY scores in the whole population. • In computing r, we are estimating rho.