Correlation

tekla

Presentation Transcript


  1. Correlation
     • We can often see the strength of the relationship between two quantitative variables in a scatterplot, but be careful. The two figures here both show the same data, on different scales. The second seems to show a stronger association…
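
The figures themselves are not part of this transcript. As a rough stand-in, the sketch below draws one made-up data set twice, changing only the axis limits. The data, the seed, and the axis limits are all assumptions chosen for illustration. The points look more tightly clustered in the wide-axis panel, yet r is a single number for both panels because the data are identical.

```r
# Hypothetical data: 50 points with a moderate positive linear association.
set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)
y <- 3 + 0.5 * x + rnorm(50, sd = 1)

# Two panels side by side: same data, different axis ranges.
op <- par(mfrow = c(1, 2))
plot(x, y, main = "Tight axes")
plot(x, y, xlim = c(-20, 40), ylim = c(-20, 40), main = "Wide axes")
par(op)

# The correlation depends only on the data, not on how the plot is scaled.
r <- cor(x, y)
print(r)
```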

  2. Here’s a formula for Pearson’s correlation coefficient:
     • This formula is not for computing r but for understanding r. Notice that the first step in the formula is to standardize each x and y value (that is, to find how many standard deviations above or below its mean each value lies) and then to multiply the two standardized values together.
     • When two variables x and y are positively associated, their standardized values tend to be both positive or both negative (think of height and weight), so the product is positive.
     • When two variables are negatively associated, then when x, for example, is above its mean, y tends to be below its mean (and vice versa), so the product is negative.
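
The formula itself does not appear in the transcript. Assuming the slide showed the usual standardized-product form that the bullets describe (the notation here is an assumption, not taken from the slide), it would read

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

where $\bar{x}, \bar{y}$ are the sample means and $s_x, s_y$ the sample standard deviations. A minimal R sketch, using made-up heights and weights, follows the same steps and compares the result with the built-in `cor()`:

```r
# Hypothetical data: heights (inches) and weights (pounds) of six people.
x <- c(62, 65, 67, 70, 72, 74)
y <- c(120, 140, 150, 165, 180, 200)
n <- length(x)

# Step 1: standardize each value (distance from the mean in s.d. units).
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)

# Step 2: multiply the paired standardized values.
products <- zx * zy

# Step 3: add the products and divide by n - 1.
r_by_hand <- sum(products) / (n - 1)

# cor() should agree up to floating-point rounding.
print(c(by_hand = r_by_hand, builtin = cor(x, y)))
```

Printing `products` shows why r is positive here: people below average in height also tend to be below average in weight, so most of the products are positive.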

  3. The correlation coefficient, r, is a numerical measure of the strength of the linear relationship between two quantitative variables.
     • It is always a number between -1 and +1. Positive r indicates positive association; negative r indicates negative association.
     • r = +1 implies a perfect positive relationship: the points fall exactly on a straight line with positive slope.
     • r = -1 implies a perfect negative relationship: the points fall exactly on a straight line with negative slope.
     • r ≈ 0 implies a very weak linear relationship.
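
The three reference values can be reproduced with tiny made-up data sets. In this sketch every vector is hypothetical and chosen only so that the answer is easy to predict:

```r
x <- 1:8

# Points exactly on a line with positive slope, then with negative slope.
r_up   <- cor(x, 2 * x + 3)
r_down <- cor(x, 20 - 0.5 * x)

# A pattern that is symmetric about the middle of x has no linear trend.
y_flat <- c(1, -1, -1, 1, 1, -1, -1, 1)
r_flat <- cor(x, y_flat)

print(c(up = r_up, down = r_down, flat = r_flat))
```

Note that the steepness of the line plays no role: slopes of 2 and -0.5 both give a correlation of magnitude 1, because every point falls exactly on the line.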

  4. Correlation makes no distinction between explanatory and response variables; it doesn’t matter which is which.
     • Both variables must be quantitative.
     • r uses standardized values of the observations, so changing the scale (the units of measurement) of one or both variables doesn’t affect the value of r.
     • r measures the strength of the linear relationship between the two variables. It does not measure the strength of non-linear (curvilinear) relationships, no matter how strong they are.
     • r is not resistant to outliers; be careful about using r when there are outliers in either variable.
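
Each of these properties can be checked numerically. The following is a small sketch on made-up data; the vectors, the unit conversions, and the outlying point (20, 0) are all assumptions for illustration:

```r
# Hypothetical, nearly linear data.
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.8, 16.1)
r <- cor(x, y)

# 1. Symmetry: swapping the roles of x and y leaves r unchanged.
r_swapped <- cor(y, x)

# 2. Change of units: rescaling by positive constants leaves r unchanged.
r_rescaled <- cor(2.54 * x, y / 1000)

# 3. Non-linear relationship: y is an exact function of x, yet r is 0
#    because the relationship is a symmetric curve, not a line.
u <- -5:5
r_curve <- cor(u, u^2)

# 4. Outliers: one extreme point can change r drastically.
r_outlier <- cor(c(x, 20), c(y, 0))

print(c(r = r, swapped = r_swapped, rescaled = r_rescaled,
        curve = r_curve, with_outlier = r_outlier))
```

In this example a single added point is enough to turn a correlation close to +1 into a weak negative one, which is why a scatterplot should always accompany r.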

  5. To explore how extreme outlying observations influence r, see the applet on Correlation and Regression at whfreeman.com/ips6e.
     • Homework:
     • Reading 2.1
     • Use R to draw a scatterplot, use different plotting characters to show a "lurking variable", compute the correlation coefficient, compute the slope and intercept of the regression line, and plot the regression line on the scatterplot (see the next page for some code to do all this…)
     • HW: On page 16 of Reading & Problems 2.1, do problems #4.3, 4.7, and 4.9 using R. Also, look at the UN data on GDP and CO2 emissions: plot, correlate, regress… DESCRIBE/EXPLAIN WHAT YOU FIND!

  6. plot(x, y)                        # gives a scatterplot of y (vertical) against x (horizontal)
     # To use a different plotting character, set the pch= option, as in
     plot(x, y, pch = 15)              # (or try different numbers)
     # or plot(x, y, pch = "x")
     # or plot(x, y, pch = as.numeric(sex))   # works if sex is a factor
     plot(x, y, pch = 15, cex = 1.5)   # cex = 1.5 makes the plotting characters 1.5 times the default size
     cor(x, y)                         # gives the Pearson correlation coefficient, denoted r, between x and y
     lm(y ~ x)                         # gives the least-squares linear regression of y on x
     abline(lm(y ~ x))                 # draws the regression line on a scatterplot that is already drawn
     summary(lm(y ~ x))                # shows more detail about the slope and intercept
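
The commands above assume that `x`, `y`, and `sex` already exist in the workspace. As one possible end-to-end run, the sketch below substitutes R's built-in `mtcars` data; the choice of data set and of `am` (transmission type) as the grouping variable are assumptions for illustration, not part of the homework.

```r
# Built-in example data (an assumption; substitute your own variables).
x <- mtcars$wt                 # car weight, in 1000 lb
y <- mtcars$mpg                # fuel economy, in miles per gallon
group <- factor(mtcars$am)     # transmission type, standing in for a "lurking variable"

# Scatterplot with one plotting symbol per group level.
plot(x, y, pch = as.numeric(group), cex = 1.5,
     xlab = "Weight (1000 lb)", ylab = "Miles per gallon")

r <- cor(x, y)                 # Pearson correlation coefficient
fit <- lm(y ~ x)               # least-squares regression of y on x
abline(fit)                    # add the fitted line to the existing plot

# The least-squares slope equals r * sd(y) / sd(x).
slope <- unname(coef(fit)["x"])
print(c(r = r, intercept = unname(coef(fit)[1]), slope = slope,
        slope_from_r = r * sd(y) / sd(x)))
```

Storing the fit in `fit` avoids refitting the model for each of `abline()`, `coef()`, and `summary()`.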
