Chapter 6 Association between Quantitative Variables

bayle

Presentation Transcript


  1. Chapter 6 Association between Quantitative Variables

  2. 6.1 Scatterplots • Is household natural gas consumption associated with climate? • Annual household natural gas consumption is measured in thousands of cubic feet (MCF) • Climate is measured by the National Weather Service using heating degree days (HDD)

  3. 6.1 Scatterplots • Association between Numerical Variables • A graph displaying pairs of values as points on a two-dimensional grid • The explanatory variable is placed on the x-axis • The response variable is placed on the y-axis

  4. 6.1 Scatterplots • Scatterplot of Natural Gas Consumption (y) versus Heating Degree Days (x)

  5. 6.2 Association in Scatterplots • Visual Test for Association • Compare the original scatterplot to others in which the x- and y-coordinates are randomly re-paired • If you can pick out the original as the one with a pattern, then there is an association
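
The visual test can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `shuffled_copies`, its parameters, and the data are hypothetical. Shuffling the y-values breaks any pairing with x, so each shuffled panel shows what "no association" looks like for the same values.

```python
import random

def shuffled_copies(x, y, n_copies=3, seed=0):
    """Return data sets in which the y-values are randomly re-paired with x."""
    rng = random.Random(seed)          # seeded so the sketch is repeatable
    copies = []
    for _ in range(n_copies):
        y_shuffled = list(y)           # copy, so the original pairing is kept
        rng.shuffle(y_shuffled)        # random re-matching of the coordinates
        copies.append(list(zip(x, y_shuffled)))
    return copies

# Hypothetical illustrative data (not the gas-consumption data from the slides).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 3, 5, 4, 6, 8, 7, 9]

# One original panel hidden among the shuffled ones.
panels = shuffled_copies(x, y) + [list(zip(x, y))]
random.Random(1).shuffle(panels)
```

Plotting each entry of `panels` side by side and asking a viewer to pick the original reproduces the test described on the slide.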

  6. 6.2 Association in Scatterplots • Describing Association 1. Direction. Does it trend up or down? 2. Curvature. Is the pattern linear or curved? 3. Variation. Are the points tightly clustered around the trend? 4. Outliers. Is there something unexpected?

  7. 6.2 Association in Scatterplots • Gas Consumption vs. Heating Degree Days 1. Direction: Positive. 2. Curvature: Linear. 3. Variation: Considerable scatter. 4. Outliers: None apparent.

  8. 6.3 Measuring Association • Covariance • A measure that quantifies the linear association • Depends on units of measurement and is therefore difficult to interpret
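
A short sketch of why the covariance depends on units, assuming the usual sample formula (sum of the products of deviations from the means, divided by n - 1). The six data pairs are hypothetical and are not the six homes used in the slides.

```python
def covariance(x, y):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

# Hypothetical data for six homes (illustration only).
hdd = [2000, 3000, 4000, 5000, 6000, 7000]     # heating degree days
gas_mcf = [60, 70, 95, 90, 120, 130]           # thousands of cubic feet

cov_mcf = covariance(hdd, gas_mcf)             # units: HDD x MCF

# Same homes, gas re-expressed in cubic feet (1 MCF = 1,000 cubic feet).
gas_cf = [g * 1000 for g in gas_mcf]
cov_cf = covariance(hdd, gas_cf)               # units: HDD x cubic feet
```

The association has not changed, yet `cov_cf` is 1,000 times `cov_mcf`. That is the sense in which the covariance is hard to interpret on its own.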

  9. 6.3 Measuring Association • Calculating the Covariance (for n = 6 homes)

  10. 6.3 Measuring Association • Correlation (r) • Standardized measure of the strength of the linear association (has no units) • Always between -1 and +1 • Easy to interpret
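
The unit-free property can be checked directly. This sketch assumes the standard definition of r as the covariance divided by the product of the two standard deviations (the n - 1 factors cancel), and it uses hypothetical data rather than the data behind the slides.

```python
import math

def correlation(x, y):
    """Correlation r: covariance divided by the product of the two SDs."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for six homes (illustration only).
hdd = [2000, 3000, 4000, 5000, 6000, 7000]
gas_mcf = [60, 70, 95, 90, 120, 130]
gas_cf = [g * 1000 for g in gas_mcf]           # same gas use, in cubic feet

r_mcf = correlation(hdd, gas_mcf)
r_cf = correlation(hdd, gas_cf)                # changing units leaves r alone
```

Both calls return the same value, which lies between -1 and +1.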

  11. 6.3 Measuring Association • Gas Consumption and Heating Degree Days • Cov(HDD, Gas) = 56,308.9 HDD × MCF • Corr(HDD, Gas) = 0.58 • The association is positive and moderate.

  12. 6.3 Measuring Association • Scatterplot for r = 1

  13. 6.3 Measuring Association • Scatterplot for r = -0.95

  14. 6.3 Measuring Association • Scatterplot for r = 0.75

  15. 6.3 Measuring Association • Scatterplot for r = -0.50

  16. 6.3 Measuring Association • Scatterplot for r = 0

  17. 6.3 Measuring Association • Correlation Size • Depends on context • Correlations between macroeconomic variables often approach 1 • Smaller correlations are typical for behavioral data

  18. 6.3 Measuring Association • Macroeconomic Variables

  19. 6.3 Measuring Association • Consumer Behavior Variables

  20. 6.4 Summarizing Association with a Line • Expressed using z-scores: $\hat{z}_y = r\,z_x$ • Slope-Intercept Form: $\hat{y} = a + b\,x$ with $a = \bar{y} - b\,\bar{x}$ and $b = r\,s_y / s_x$
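
A minimal sketch of moving between the two forms of the line, assuming the standard relationships: slope b = r·s_y/s_x and intercept a = ȳ − b·x̄. The function name `correlation_line` and the summary statistics below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def correlation_line(mean_x, sd_x, mean_y, sd_y, r):
    """Return (intercept a, slope b) of the line y-hat = a + b * x."""
    b = r * sd_y / sd_x           # slope: r rescaled from z-scores to raw units
    a = mean_y - b * mean_x       # intercept: the line passes through the means
    return a, b

# Hypothetical summary statistics (illustration only).
mean_x, sd_x = 10.0, 2.0
mean_y, sd_y = 50.0, 10.0
r = 0.5

a, b = correlation_line(mean_x, sd_x, mean_y, sd_y, r)

# Check against the z-score form at a point one SD above the mean of x.
x_new = mean_x + sd_x
y_hat = a + b * x_new
z_x = (x_new - mean_x) / sd_x
z_y_hat = (y_hat - mean_y) / sd_y     # equals r * z_x
```

A point one standard deviation above the mean of x is predicted to be only r standard deviations above the mean of y.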

  21. 6.4 Summarizing Association with a Line • Line Relating Gas Consumption (y) to Heating Degree Days (x)

  22. 6.4 Summarizing Association with a Line • Lines and Prediction • Use the correlation line to customize an ad for estimated savings from insulation based on climate. • For a home in a cold climate (HDD = 8,800), the predicted gas consumption is 141 MCF. • At $10 / MCF, the predicted cost is $1,410. • Assuming that insulation saves 30% on the gas bill, the estimated savings are $423.
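
The arithmetic on this slide can be written out as a short sketch. The prediction of 141 MCF, the $10 price, and the 30% savings rate are taken from the slide. The function name `estimated_savings` is hypothetical, and the fitted line itself is not reproduced because the transcript does not give its equation.

```python
def estimated_savings(predicted_mcf, price_per_mcf, savings_percent):
    """Return (predicted annual cost, estimated savings) in dollars."""
    cost = predicted_mcf * price_per_mcf
    savings = cost * savings_percent / 100
    return cost, savings

# Values from the slide: 141 MCF predicted at HDD = 8,800, $10 per MCF, 30% saved.
cost, savings = estimated_savings(141, 10, 30)
print(f"Predicted cost: ${cost:,.0f}; estimated savings: ${savings:,.0f}")
```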

  23. 6.4 Summarizing Association with a Line • Nonlinear Patterns • If the association is not linear, a line may be a poor summary of the pattern • Covariance and correlation measure only linear association • Inspect the scatterplot before relying on these statistics to measure association
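
A small constructed example shows how a strong nonlinear pattern can leave the correlation at zero. The data are artificial (y is exactly x squared on a range symmetric around zero), and the `correlation` helper is a plain-Python sketch of the standard formula.

```python
import math

def correlation(x, y):
    """Correlation r: covariance divided by the product of the two SDs."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Artificial data: y is completely determined by x, but not linearly.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r = correlation(x, y)     # zero, even though y depends perfectly on x
```

The scatterplot would show a clear U-shape. The statistic alone would suggest no association, which is why the plot comes first.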

  24. 6.5 Spurious Correlation • Lurking Variables • Scatterplots and correlation reveal association, not causation • Spurious correlations result from underlying lurking variables

  25. 6.5 Spurious Correlation • Checklist: Covariance and Correlation • Numerical variables • No obvious lurking variables • Linear • Outliers

  26. 4M Example 6.1: LOCATING A NEW STORE • Motivation • Is it better to locate a new retail outlet far from competing stores?

  27. 4M Example 6.1: LOCATING A NEW STORE • Method • Is there an association between sales at the retail outlets and distance to the nearest competitor? • For 55 stores in the chain, data are gathered for total sales in the prior year and distance in miles from the nearest competitor.

  28. 4M Example 6.1: LOCATING A NEW STORE • Mechanics

  29. 4M Example 6.1: LOCATING A NEW STORE • Mechanics • The correlation between sales and distance is r = 0.741

  30. 4M Example 6.1: LOCATING A NEW STORE • Message • The data show a strong, positive linear association between distance to the nearest competitor and sales. It is better to locate a new store far from its competitors.

  31. Best Practices • To understand the relationship between two numerical variables, start with a scatterplot. • Look at the plot, look at the plot, look at the plot. • Use clear labels for the scatterplot.

  32. Best Practices (Continued) • Describe a relationship completely. • Consider the possibility of lurking variables. • Use a correlation to quantify the association between two numerical variables that are linearly related.

  33. Pitfalls • Don’t use the correlation if data are categorical. • Don’t treat association and correlation as causation. • Don’t assume that a correlation of zero means that the variables are not associated. • Don’t assume that a correlation near -1 or +1 means near perfect association.
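
The last pitfall can be illustrated with a constructed example: a single outlier can push the correlation close to +1 even when the remaining points show no association at all. The data are artificial and the `correlation` helper is a plain-Python sketch of the standard formula.

```python
import math

def correlation(x, y):
    """Correlation r: covariance divided by the product of the two SDs."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Artificial cluster of four points with no linear association.
cluster_x = [1, 2, 1, 2]
cluster_y = [2, 1, 1, 2]
r_cluster = correlation(cluster_x, cluster_y)

# Add one point far from the cluster.
r_with_outlier = correlation(cluster_x + [100], cluster_y + [100])
```

Here `r_cluster` is zero while `r_with_outlier` exceeds 0.99, so a value near +1 says little about how well most of the points follow a line.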
