Lesson 3 - 3

leo-murphy

Presentation Transcript


  1. Lesson 3 - 3 Correlation and Regression Wisdom

  2. Knowledge Objectives • Recall the three limitations on the use of correlation and regression. • Explain what is meant by an outlier in bivariate data. • Explain what is meant by an influential observation and how it relates to regression. • Define a lurking variable. • Give an example of what it means to say “association does not imply causation.”

  3. Construction Objectives • Given a scatterplot in a regression setting, identify outliers and influential observations • Explain how correlations based on averages differ from correlations based on individuals

  4. Vocabulary • Influential Observation – an observation that if removed would markedly change the result of the regression calculation

  5. Limitations • Correlation and regression describe only linear relationships • Extrapolation (using the model outside the range of the data) often produces unreliable predictions • Correlation is not resistant (to outliers!)
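
The first and third limitations can be checked numerically. The sketch below uses made-up data (not from the lesson) and a hand-written `pearson_r` helper, which is an illustrative name rather than anything the slides define. It shows a perfectly curved relationship with r = 0, and a perfectly linear data set whose correlation drops sharply once a single stray point is added.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Limitation 1: y is completely determined by x, but the pattern is curved.
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [x ** 2 for x in curve_x]
r_curved = pearson_r(curve_x, curve_y)

# Limitation 3: six made-up points that lie exactly on y = 2x ...
line_x = [1, 2, 3, 4, 5, 6]
line_y = [2, 4, 6, 8, 10, 12]
r_clean = pearson_r(line_x, line_y)

# ... and the same points plus one hypothetical outlier at (7, 0).
r_with_outlier = pearson_r(line_x + [7], line_y + [0])

print(f"curved data:       r = {r_curved:.2f}")        # 0.00
print(f"linear data:       r = {r_clean:.2f}")         # 1.00
print(f"linear + outlier:  r = {r_with_outlier:.2f}")  # 0.25
```

A correlation of 0 for the curved data does not mean "no relationship", only "no linear relationship", which is why the summary slide recommends plotting the data first.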

  6. Outliers vs Influential Observations • An outlier is an observation that lies outside the overall pattern of the other observations • Outliers in the Y direction have large residuals but may not influence the slope of the regression line • Outliers in the X direction are often influential observations • An influential observation is one whose removal would markedly change the result of the regression calculation
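
The distinction between the two kinds of outlier can be sketched with a small least-squares fit. The data here are invented for illustration: five points on the line y = 10 - x, plus either one outlier in the Y direction or one in the X direction. The `fit_line` helper is an assumed name, not something from the lesson.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical base data lying exactly on y = 10 - x.
base_x = [1, 2, 3, 4, 5]
base_y = [9, 8, 7, 6, 5]
a_base, b_base = fit_line(base_x, base_y)

# Case A: outlier in the Y direction, placed at the middle of the x range.
a_y, b_y = fit_line(base_x + [3], base_y + [15])
resid_y = 15 - (a_y + b_y * 3)

# Case B: outlier in the X direction, far to the right of the other points.
a_x, b_x = fit_line(base_x + [15], base_y + [9])
resid_x = 9 - (a_x + b_x * 15)

print(f"base slope:              {b_base:.3f}")
print(f"with Y outlier: slope {b_y:.3f}, residual of new point {resid_y:.2f}")
print(f"with X outlier: slope {b_x:.3f}, residual of new point {resid_x:.2f}")
```

In this constructed example the Y-direction outlier leaves the slope at -1 and shows up as a large residual, while the X-direction outlier pulls the line toward itself: the slope changes sign, yet the point's own residual is small. That is why a residual plot alone may not reveal an influential observation.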

  7. Example 1 Does the age at which a child begins to talk predict a later score on a test of mental ability? A study of the development of 21 children recorded the age in months at which they spoke their first word and their later Gesell Adaptive Score (GAS).

  8. Example 1 cont • What is the equation of the LS regression line used to model this data? y-hat = 109.8738 – 1.127x, with r = -0.64 • What is the interpretation of this data? The scatterplot and the slope of the regression line indicate a negative association: children who begin to speak later tend to have lower test scores than early talkers. The slope suggests that for each additional month of age at first word, the predicted Gesell score decreases by about 1.13 points. The y-intercept has no real meaning in this case, since an age of 0 months at first word is not realistic.
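
The slope interpretation above can be checked by plugging values into the fitted equation. The sketch below uses only the coefficients and r quoted on the slide. The ages 10 and 11 months are illustrative inputs chosen here, since the transcript does not list the raw data, and `predict_gas` is an assumed helper name.

```python
# Coefficients and correlation as quoted on the slide.
INTERCEPT = 109.8738
SLOPE = -1.127
R = -0.64

def predict_gas(age_months):
    """Predicted Gesell score from age (in months) at first word."""
    return INTERCEPT + SLOPE * age_months

# Two illustrative ages one month apart (not taken from the data set).
pred_10 = predict_gas(10)
pred_11 = predict_gas(11)
change_per_month = pred_11 - pred_10

# r squared: fraction of the variation in scores accounted for
# by the linear relationship with age.
r_squared = R ** 2

print(f"predicted score at 10 months: {pred_10:.2f}")           # 98.60
print(f"predicted score at 11 months: {pred_11:.2f}")           # 97.48
print(f"change for one extra month:   {change_per_month:.3f}")  # -1.127
print(f"r squared:                    {r_squared:.4f}")         # 0.4096
```

Calling `predict_gas(0)` simply returns the intercept, 109.8738, which illustrates the extrapolation warning: the equation will produce a number for any input, whether or not that input makes sense.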

  9. Example 1 cont • Are there any outliers? Child #19 is an outlier in the Y-direction and child #18 is an outlier in the X-direction. • Are there any influential observations? Child #18 is also an influential observation because it has a strong influence on the position of the regression line.

  10. Example 1 cont [Figures: scatterplot with regression line; residual plot]

  11. Lurking or Extraneous Variable • The relationship between two variables can often be misunderstood unless you take other variables into account • Association does not imply causation! • Monthly counts of Rocky Mountain spotted fever cases and of drownings are highly correlated, but neither causes the other; a lurking variable such as the season (both rise in warm weather) can account for the association
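
A lurking variable can be illustrated with a small constructed data set. Everything below is hypothetical: the monthly temperatures, the two series driven by them, and the noise values, which were chosen by hand so that the effect comes out cleanly. Real data would be messier. The sketch correlates the two series directly, then again after removing the part of each that a straight-line fit on temperature accounts for.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def residuals(xs, ys):
    """Residuals of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical monthly mean temperatures, January to December.
temps = [30, 34, 44, 55, 65, 74, 79, 77, 69, 57, 45, 35]

# Hand-picked noise, constructed to be unrelated to each other.
noise_a = [1, -1, 0, 1, -1, 0, 1, -1, 0, 1, -1, 0]
noise_b = [1, 1, -2, -1, -1, 2, 1, 1, -2, -1, -1, 2]

# Two made-up monthly series that both depend on temperature only.
fever = [0.2 * t + a for t, a in zip(temps, noise_a)]
drownings = [0.5 * t + b for t, b in zip(temps, noise_b)]

r_raw = pearson_r(fever, drownings)
r_adjusted = pearson_r(residuals(temps, fever), residuals(temps, drownings))

print(f"correlation of the two series:       {r_raw:.2f}")
print(f"after accounting for temperature:    {r_adjusted:.2f}")
```

With these made-up numbers the raw correlation is roughly 0.96, while the correlation of the temperature-adjusted residuals is essentially zero: once the lurking variable is taken into account, nothing links the two series.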

  12. Remember Sampling Distributions • When we looked at individual values, they had much broader spreads (variances) than when we looked at the distributions of x-bar • The same is true of correlations based on averaged data: strong correlations may exist between averages, but individuals will have much greater variances • Correlations based on averages are usually too high when applied to individuals.
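
The gap between correlations of averages and correlations of individuals can be sketched with invented grouped data. Below, four hypothetical groups have centers on the line y = x, and each group contains four individuals scattered around its center. The group layout and offsets are assumptions made purely for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical group centers (c, c) and within-group scatter (dx, dy).
centers = [1, 2, 3, 4]
offsets = [(-1, 2), (1, -2), (-1, -2), (1, 2)]

# Build each group as a list of (x, y) individuals.
groups = [[(c + dx, c + dy) for dx, dy in offsets] for c in centers]

# Correlation across all 16 individuals.
ind_x = [x for group in groups for x, _ in group]
ind_y = [y for group in groups for _, y in group]
r_individuals = pearson_r(ind_x, ind_y)

# Correlation across the 4 group averages.
mean_x = [sum(x for x, _ in group) / len(group) for group in groups]
mean_y = [sum(y for _, y in group) / len(group) for group in groups]
r_means = pearson_r(mean_x, mean_y)

print(f"correlation of individuals:    {r_individuals:.2f}")  # 0.36
print(f"correlation of group averages: {r_means:.2f}")        # 1.00
```

Averaging cancels the within-group scatter, so the group means fall exactly on a line here even though the individuals are only weakly correlated. Quoting the averages' correlation as if it described individuals would badly overstate the relationship.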

  13. Summary and Homework • Summary • Correlation and regression must be interpreted with caution • Plot data to be sure that the relationship is roughly linear and to detect outliers • Check for influential observations that substantially change the regression line • Lurking variables may explain the relationship between the explanatory and response variables • Homework • pg 242-3 3.63-67
