The Correlation Coefficient. Social Security Numbers. A Scatter Diagram. The Point of Averages. Where is the center of the cloud? Take the average of the x-values and the average of the y-values; this is the point of averages. It locates the center of the cloud.
The Point of Averages • Where is the center of the cloud? • Take the average of the x-values and the average of the y-values; this is the point of averages. • It locates the center of the cloud. • Similarly, take the SD of the x-values and the SD of the y-values.
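A minimal sketch of the point of averages and the two SDs. The data values are made up for illustration and do not come from the slides, and the SD is assumed to divide by the number of points (population SD).

```python
from statistics import mean, pstdev

# Hypothetical data, for illustration only.
xs = [1, 3, 4, 5, 7]
ys = [5, 9, 7, 1, 13]

# The point of averages: (average of x-values, average of y-values).
point_of_averages = (mean(xs), mean(ys))

# Horizontal and vertical spread of the cloud.
# pstdev divides by n; this convention is an assumption.
sd_x = pstdev(xs)
sd_y = pstdev(ys)

print("point of averages:", point_of_averages)
print("SD of x:", sd_x, "SD of y:", sd_y)
```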
The Correlation Coefficient • An association can be stronger or weaker. • Remember: a strong association means that knowing one variable helps to predict the other variable to a large extent. • The correlation coefficient is a numerical value expressing the strength of the association.
The Correlation Coefficient • We denote the correlation coefficient by r. • If r = 0, the cloud is completely formless; there is no correlation between the variables. • If r = 1, all the points lie exactly on a line that slopes upward (not necessarily the line y = x) and there is perfect correlation.
The Correlation Coefficient • What about negative values? • The correlation coefficient is always between –1 and 1. A negative value shows negative association; a positive value shows positive association. • Note that –0.90 shows the same degree of association as +0.90, only negative instead of positive.
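The properties of r can be checked numerically. The sketch below uses a hypothetical helper `correlation` and made-up data; it assumes the SD divides by the number of points. Points on a rising line should give r close to 1, points on a falling line r close to –1, and negating every y-value should flip the sign of r without changing its size.

```python
from statistics import mean, pstdev

def correlation(xs, ys):
    """Average of the products of x and y in standard units (hypothetical helper)."""
    avg_x, sd_x = mean(xs), pstdev(xs)
    avg_y, sd_y = mean(ys), pstdev(ys)
    products = [((x - avg_x) / sd_x) * ((y - avg_y) / sd_y)
                for x, y in zip(xs, ys)]
    return mean(products)

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]     # exactly on a line, but not the line y = x
falling = [-2 * x + 1 for x in xs]   # exactly on a downward-sloping line
noisy = [2, 1, 4, 3, 6]              # a loose upward cloud

r_rising = correlation(xs, rising)
r_falling = correlation(xs, falling)
r_noisy = correlation(xs, noisy)
r_flipped = correlation(xs, [-y for y in noisy])

print(round(r_rising, 6), round(r_falling, 6))
print(round(r_noisy, 6), round(r_flipped, 6))
```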
Computing the Correlation Coefficient • Convert each variable to standard units. • The average of the products gives the correlation coefficient r. • r = average of [(x in standard units) × (y in standard units)]
Example • We must first convert to standard units. • Find the average and the SD of the x-values: average = 4, SD = 2. • Convert each x-value: subtract the average to get the deviation, then divide by the SD. • Then do the same for the y-values.
Example • Finally, take the average of the products. • In this example, r = 0.40. • r = average of [(x in standard units) × (y in standard units)]
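The steps of the example can be sketched in code. The data table itself is not reproduced in this text, so the values below are an assumption: they were chosen only because they are consistent with the stated figures (x-values with average 4 and SD 2, and r = 0.40). The SD is assumed to divide by the number of points.

```python
from statistics import mean, pstdev

# Assumed data, chosen to match the figures quoted in the example
# (average of x = 4, SD of x = 2, r = 0.40); not taken from the slides.
xs = [1, 3, 4, 5, 7]
ys = [5, 9, 7, 1, 13]

# Step 1: average and SD of each variable (SD divides by n).
avg_x, sd_x = mean(xs), pstdev(xs)
avg_y, sd_y = mean(ys), pstdev(ys)

# Step 2: convert to standard units: (value - average) / SD.
x_std = [(x - avg_x) / sd_x for x in xs]
y_std = [(y - avg_y) / sd_y for y in ys]

# Step 3: multiply pairwise and take the average of the products.
products = [a * b for a, b in zip(x_std, y_std)]
r = mean(products)

print("x in standard units:", x_std)
print("y in standard units:", y_std)
print("products:", products)
print("r =", round(r, 2))
```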
The SD line • If there is some association, the points in the scatter diagram cluster around a line. But around which line? • Generally, this is the SD line. It is the line through the point of averages. • It climbs at the rate of one vertical SD for each horizontal SD. • Its slope is (SD of y) / (SD of x) in case of a positive correlation, and –(SD of y) / (SD of x) in case of a negative correlation.
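A minimal sketch of the SD line as a slope and an intercept. The function name `sd_line` and the data are hypothetical, and the SD is assumed to divide by the number of points. The slide does not say what to do when r = 0, so this sketch raises an error in that case.

```python
from statistics import mean, pstdev

def sd_line(xs, ys):
    """Return (slope, intercept) of the SD line (hypothetical helper)."""
    avg_x, avg_y = mean(xs), mean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)

    # The sign of r decides whether the line rises or falls.
    r = mean([((x - avg_x) / sd_x) * ((y - avg_y) / sd_y)
              for x, y in zip(xs, ys)])
    if r == 0:
        raise ValueError("r = 0: the direction of the SD line is not defined here")

    slope = sd_y / sd_x if r > 0 else -sd_y / sd_x
    # The line passes through the point of averages (avg_x, avg_y).
    intercept = avg_y - slope * avg_x
    return slope, intercept

# Made-up data, for illustration only.
xs = [1, 3, 4, 5, 7]
ys = [5, 9, 7, 1, 13]

slope, intercept = sd_line(xs, ys)
print("SD line: y =", slope, "* x +", intercept)
```

Moving one SD of x to the right of the point of averages along this line raises y by exactly one SD of y, which is the "one vertical SD for each horizontal SD" rule.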
Five-point Summary • Remember the five-point summary of a data set: minimum, lower quartile, median, upper quartile, and maximum. • A five-point summary for a scatter diagram is: the average of the x-values, the SD of the x-values, the average of the y-values, the SD of the y-values, and the correlation coefficient r.