Section 7.2 ~ Interpreting Correlations



Presentation Transcript


  1. Section 7.2 ~ Interpreting Correlations Introduction to Probability and Statistics Ms. Young ~ room 113

  2. Objective Sec. 7.2 • After this section you will be aware of important cautions concerning the interpretation of correlations, especially the effects of outliers, the effects of grouping data, and the crucial fact that correlation does not necessarily imply causality.

  3. Sec. 7.2 Beware of Outliers • When examining a scatterplot to determine correlation, watch for outliers • They can greatly affect the correlation coefficient, possibly leading to a misleading conclusion about the relationship between the variables • The scatterplot below has an outlier located in the top right • With the outlier included, r = 0.880, which represents a very strong positive correlation • Without the outlier, the correlation coefficient is 0, which represents no correlation at all • Even though outliers can create an apparent correlation or mask a real one, you should not remove them without a strong reason to believe that they do not belong in the data set
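The effect of a single outlier on r can be checked numerically. The sketch below uses made-up points (not the data from the slide's scatterplot) and a small helper, `pearson_r`, written here only for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data (not the slide's): five points with no linear trend.
xs = [1, 1, 3, 3, 2]
ys = [1, 3, 1, 3, 2]
r_without = pearson_r(xs, ys)

# Add one made-up outlier far to the top right.
r_with = pearson_r(xs + [10], ys + [10])

print(f"without outlier: r = {r_without:.3f}")
print(f"with outlier:    r = {r_with:.3f}")
```

In this toy data the five clustered points have no linear trend at all, yet one distant point is enough to produce a strong positive r.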

  4. Sec. 7.2 Example 1 ~ Masked Correlation • You’ve conducted a study to determine how the number of calories a person consumes in a day correlates with time spent in vigorous bicycling. Your sample consisted of ten women cyclists, all of approximately the same height and weight. Over a period of two weeks, you asked each woman to record the amount of time she spent cycling each day and what she ate on each of those days. You used the eating records to calculate the calories consumed each day. The diagram below shows each woman’s mean time spent cycling on the horizontal axis and mean caloric intake on the vertical axis. Do higher cycling times correspond to higher intake of calories?

  5. Sec. 7.2 Example 1 ~ Solution • If you look at the data as a whole, your eye will probably tell you that there is a positive correlation in which greater cycling time tends to go with higher caloric intake. But the correlation is very weak, with a correlation coefficient of 0.374 • However, notice that two points are outliers: one representing a cyclist who cycled about a half-hour per day and consumed more than 3,000 calories, and the other representing a cyclist who cycled more than 2 hours per day on only 1,200 calories • It’s difficult to explain the two outliers, given that all the women in the sample have similar heights and weights. We might therefore suspect that these two women either recorded their data incorrectly or were not following their usual habits during the two-week study. If we can confirm this suspicion, then we would have reason to delete the two data points as invalid. • The correlation is quite strong without those two outlier points, and suggests that the number of calories consumed rises by a little more than 500 calories for each hour of cycling, but we should not remove the outliers without confirming our suspicion that they were invalid data points, and we should report our reasons for leaving them out.
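A rough sketch of how two outliers can mask a correlation, and of how the "calories per hour" figure comes from the slope of the best-fit line. The cycling times and calorie values are invented for illustration (the study's actual data are not in the transcript), and the helper names are assumptions.

```python
from math import sqrt

def sums(xs, ys):
    """Return (Sxy, Sxx, Syy): sums of products and squares of deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy

def pearson_r(xs, ys):
    sxy, sxx, syy = sums(xs, ys)
    return sxy / sqrt(sxx * syy)

def slope(xs, ys):
    """Slope of the least-squares line: change in y per unit of x."""
    sxy, sxx, _ = sums(xs, ys)
    return sxy / sxx

# Hypothetical cyclists: hours cycled per day, calories eaten per day.
hours = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0, 2.25]
calories = [1750, 1900, 2000, 2150, 2250, 2350, 2500, 2600]

# Two made-up outliers like those described on the slide.
hours_all = hours + [0.5, 2.2]
calories_all = calories + [3100, 1200]

r_clean = pearson_r(hours, calories)
r_all = pearson_r(hours_all, calories_all)
slope_clean = slope(hours, calories)

print(f"r with outliers:    {r_all:.3f}")
print(f"r without outliers: {r_clean:.3f}")
print(f"calories per extra hour (without outliers): {slope_clean:.0f}")
```

With these invented numbers the two outliers pull r close to zero, while the remaining eight points line up almost perfectly. That is the pattern the solution describes, though the exact values differ from the slide's.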

  6. Sec. 7.2 Beware of Inappropriate Grouping • Sometimes grouping data inappropriately can hide correlations • Data may appear to have no correlation, but when grouped differently, a correlation is apparent • Ex. ~ Consider a study in which researchers seek a correlation between hours of TV watched per week and high school grade point average (GPA). They collect the 21 data pairs in Table 7.3. • The scatterplot shows virtually no correlation, and the correlation coefficient equals -0.063 • The apparent conclusion is that TV viewing habits are unrelated to academic achievement

  7. Sec. 7.2 Beware of Inappropriate Grouping Cont’d… • However, after further investigation, one astute researcher realizes that some of the students watched mostly educational programs, while others tended to watch comedies, dramas, and movies. • She therefore divides the data set into two groups, one for the students who watched mostly educational television and one for the other students.

  8. Sec. 7.2 Beware of Inappropriate Grouping Cont’d… • After graphing each of the groups separately, we find two very strong correlations: • A strong positive correlation for the students who watched educational programs (r = 0.855) • A strong negative correlation for the other students (r = -0.951).
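The same effect can be reproduced with a small made-up data set (these are not the 21 pairs from Table 7.3): two groups with opposite trends cancel out when pooled. The `pearson_r` helper is defined here only for the illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical students: weekly TV hours and GPA.
tv_hours = [2, 4, 6, 8, 10]
gpa_educational = [2.5, 3.0, 3.1, 3.6, 3.8]  # GPA rises with TV hours
gpa_other = [3.8, 3.4, 3.3, 2.8, 2.6]        # GPA falls with TV hours

r_educational = pearson_r(tv_hours, gpa_educational)
r_other = pearson_r(tv_hours, gpa_other)

# Pool both groups into one data set, as in the original analysis.
r_pooled = pearson_r(tv_hours + tv_hours, gpa_educational + gpa_other)

print(f"educational viewers: r = {r_educational:.3f}")
print(f"other viewers:       r = {r_other:.3f}")
print(f"pooled:              r = {r_pooled:.3f}")
```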

  9. Sec. 7.2 Beware of Inappropriate Grouping Cont’d… • Sometimes data may appear to have a correlation, but when grouped differently there is no correlation • Ex. ~ Consider the data collected by a consumer group studying the relationship between the weights and prices of cars. • The data set as a whole shows a strong positive correlation (r = 0.949) • After closer examination, you can see that there are two rather distinct categories: light cars and heavy cars • If you analyze the light cars alone, r = 0.019 (nearly no correlation) • If you analyze the heavy cars alone, r = -0.022 (nearly no correlation) • The overall correlation is misleading: it arises only from the separation between the two clusters, not from any relationship between weight and price within either cluster
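A minimal sketch of the cluster effect, using invented car weights and prices (not the consumer group's data). Within each cluster, weight and price are unrelated, yet the pooled data show a strong correlation because the clusters sit far apart.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical cars: weight in pounds, price in thousands of dollars.
light_weight = [2000, 2200, 2400, 2600]
light_price = [15.0, 17.0, 17.0, 15.0]
heavy_weight = [4000, 4200, 4400, 4600]
heavy_price = [40.0, 42.0, 42.0, 40.0]

r_light = pearson_r(light_weight, light_price)
r_heavy = pearson_r(heavy_weight, heavy_price)
r_all = pearson_r(light_weight + heavy_weight, light_price + heavy_price)

print(f"light cars only: r = {r_light:.3f}")
print(f"heavy cars only: r = {r_heavy:.3f}")
print(f"all cars:        r = {r_all:.3f}")
```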

  10. Sec. 7.2 Correlation Does Not Imply Causality • Just because the numbers tell us that there is a correlation between two variables, it does not mean that one variable causes the other • In other words, “correlation does not imply causality” • Here are some possible explanations for a correlation: • The correlation may be a coincidence • Ex. ~ Super Bowl and the stock market (refer to ex. 2 on P.303) • Both correlated variables might be directly influenced by some common underlying cause • Ex. ~ As eggnog sales increase in Pennsylvania, accident rates increase as well; the underlying cause is the season, since eggnog is typically sold in the winter and accidents are more common in the winter due to inclement weather • One of the correlated variables may actually be a cause of the other, but it may be just one of several causes • Ex. ~ There is a correlation between smoking and lung cancer, but smoking is not the only way one can get lung cancer
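The common-cause explanation can be illustrated with a toy model. In the sketch below, every number is invented: a monthly "coldness" index drives both eggnog sales and accident counts, and neither quantity appears in the formula for the other, yet the two end up strongly correlated.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical coldness index for January through December.
coldness = [9, 8, 5, 3, 1, 0, 0, 1, 3, 5, 8, 9]

# Small made-up month-to-month fluctuations, unrelated to each other.
wiggle_e = [1, -1, 0, 1, -1, 0, 1, -1, 0, 1, -1, 0]
wiggle_a = [0, 1, -1, 0, 1, -1, 0, 1, -1, 0, 1, -1]

# Each variable depends only on coldness (the common cause), never on the other.
eggnog_sales = [10 * c + 5 * w for c, w in zip(coldness, wiggle_e)]
accidents = [20 + 3 * c + 2 * w for c, w in zip(coldness, wiggle_a)]

r_eggnog_accidents = pearson_r(eggnog_sales, accidents)
print(f"eggnog sales vs. accidents: r = {r_eggnog_accidents:.3f}")
```

Because the model is built so that eggnog has no effect on accidents, any correlation it reports can only come from the shared dependence on the season.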
