
Chapter 2: Looking at Data - Relationships

Presentation Transcript


  1. Chapter 2: Looking at Data - Relationships http://www.forbes.com/sites/erikaandersen/2012/03/23/true-fact-the-lack-of-pirates-is-causing-global-warming/

  2. General Procedure • Plot the data. • Look for the overall pattern. • Calculate a numeric summary. • Answer the question (which will be defined shortly)

  3. 2.1: Relationships - Goals • Be able to define what is meant by an association between variables. • Be able to categorize whether a variable is a response variable or an explanatory variable. • Be able to identify the key characteristics of a data set.

  4. Questions • What objects do the data describe? • What variables are present and how are they measured? • Are all of the variables quantitative? • Are the variables associated with each other?

  5. Association (cont.) Two variables are associated if knowing the values of one of the variables tells you something about the values of the other variable. • Do you want to explore the association? • Do you want to show causality?

  6. Variable Types • Response variable (Y): outcome of the study • Explanatory variable (X): explains or causes changes in the response variable

  7. Key Characteristics of Data • Cases: Identify what they are and how many there are. • Label: Identify what the label variable is (if present). • Categorical or quantitative: Classify each variable as categorical or quantitative. • Values: Identify the possible values for each variable. • Explanatory or response: Classify each variable as explanatory or response.

  8. 2.2: Scatterplots - Goals • Be able to create a scatterplot (lab) • Be able to interpret a scatterplot • Pattern • Outliers • Form, direction and strength of a relationship • Be able to interpret scatterplots which have categorical variables.

  9. Scatterplot - Procedure • Decide which variable is the explanatory variable and put it on the X axis. The response variable goes on the Y axis. • Label and scale your axes. • Plot the (x,y) pairs.

  10. Example: Scatterplot The following data are used to determine the relationship between age and change in systolic blood pressure (BP, mm Hg) after 24 hours in response to a particular treatment. a) Draw a scatterplot of these data.

  11. Example: Scatterplot (cont) [Figure: scatterplot; horizontal axis: Age]

  12. Pattern • Form • Direction • Strength • Outliers

  13. Pattern [Figure panels: No relationship; Nonlinear; Linear]

  14. Outliers

  15. Example: Scatterplot (cont) [Figure: scatterplot; horizontal axis: Age]

  16. Scatterplot with Categorical Variables http://statland.org/Software_Help/Minitab/MTBpul2.htm

  17. 2.3: Correlation - Goals • Be able to use (and calculate) the correlation to describe the direction and strength of a linear relationship. • Be able to recognize the properties of the correlation. • Be able to determine when (and when not) you can use correlation to measure the association.

  18. Sample correlation, r (Pearson’s Sample Correlation Coefficient)
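The formula on this slide did not survive in the transcript. For reference, the standard definition of Pearson's sample correlation is given below. The notation is an assumption chosen to match slide 33: n is the number of (x, y) pairs, x̄ and ȳ are the sample means, and s_x and s_y are the sample standard deviations.

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```

Each factor in the sum is a standardized value, so r is the average product of the standardized x and y values (with n - 1 in the denominator).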

  19. Sum of Squares
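The slide body is missing from the transcript. A common way to write the correlation in terms of sums of squares is sketched below. The symbols S_xx, S_yy and S_xy are assumed names and may differ from the ones used on the original slide.

```latex
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})

r = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}}
```

This agrees with the standardized-value definition of r because the sample standard deviations are s_x = sqrt(S_xx / (n - 1)) and s_y = sqrt(S_yy / (n - 1)).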

  20. Properties of Correlation • r > 0 ==> positive association • r < 0 ==> negative association • r is always a number between -1 and 1. • The strength of the linear relationship increases as |r| approaches 1. • |r| = 1 only occurs if there is a perfect linear relationship. • r = 0 ==> x and y are uncorrelated.
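The properties above can be checked numerically. The following is a minimal sketch using only the Python standard library. The helper name `pearson_r` and all data values are made up for illustration and are not from the presentation.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation computed as S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

xs = [1, 2, 3, 4, 5]                    # illustrative x values
up = [2 * x + 1 for x in xs]            # perfect increasing line
down = [10 - 3 * x for x in xs]         # perfect decreasing line
noisy = [2.9, 5.2, 6.8, 9.1, 11.3]      # roughly linear, made-up values

r_up = pearson_r(xs, up)        # perfect positive linear relationship
r_down = pearson_r(xs, down)    # perfect negative linear relationship
r_noisy = pearson_r(xs, noisy)  # strong but not perfect

print(f"perfect increasing line: r = {r_up:.4f}")
print(f"perfect decreasing line: r = {r_down:.4f}")
print(f"noisy increasing data:   r = {r_noisy:.4f}")
```

The sign of r follows the direction of the association, and |r| reaches 1 only for the two perfectly linear data sets.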

  21. Positive/Negative Correlation

  22. Example: Positive/Negative Correlation 1) Would the correlation between the age of a used car and its price be positive or negative? Why? 2) Would the correlation between the weight of a vehicle and miles per gallon be positive or negative? Why?

  23. Variety of Correlation Values

  24. Value of r

  25. Comments about Correlation • Correlation makes no distinction between explanatory and response variables. • r has no units and does not change when the units of x and y change.
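Both comments can be illustrated with a short sketch. The height and weight values below are invented for the example. The helper `pearson_r` is a hypothetical name, not something defined in the presentation.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation computed as S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Made-up measurements in inches and pounds.
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [115, 120, 140, 155, 160, 180]

# The same measurements converted to centimeters and kilograms.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.45359237 for w in weights_lb]

r_imperial = pearson_r(heights_in, weights_lb)
r_metric = pearson_r(heights_cm, weights_kg)
r_swapped = pearson_r(weights_lb, heights_in)  # roles of x and y exchanged

print(f"inches vs pounds: r = {r_imperial:.6f}")
print(f"cm vs kg:         r = {r_metric:.6f}")
print(f"pounds vs inches: r = {r_swapped:.6f}")
```

All three values agree (up to floating-point rounding), because r is built from standardized values and treats x and y symmetrically.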

  26. Cautions about Correlation • Correlation requires that both variables be quantitative. • Correlation measures the strength of LINEAR relationships only. • The correlation is not resistant to outliers. • Correlation is not a complete summary of bivariate data.
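Two of these cautions are easy to demonstrate. The sketch below uses made-up data: a perfect quadratic relationship for which r is zero, and a perfectly linear data set whose correlation changes sign after a single outlier is added. The helper `pearson_r` is a hypothetical name.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation computed as S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Caution 1: r measures LINEAR association only.
# y is completely determined by x, yet the relationship is a parabola.
xs_curve = [-3, -2, -1, 0, 1, 2, 3]
ys_curve = [x ** 2 for x in xs_curve]
r_curve = pearson_r(xs_curve, ys_curve)

# Caution 2: r is not resistant to outliers.
xs_line = [1, 2, 3, 4, 5, 6]
ys_line = [1, 2, 3, 4, 5, 6]
r_clean = pearson_r(xs_line, ys_line)
r_outlier = pearson_r(xs_line + [7], ys_line + [-10])  # one extreme point added

print(f"parabola:          r = {r_curve:.4f}")
print(f"line, no outlier:  r = {r_clean:.4f}")
print(f"line, one outlier: r = {r_outlier:.4f}")
```

A value of r near zero therefore does not rule out an association, and one unusual point can dominate the summary, which is why the data should always be plotted first.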

  27. Datasets with r = 0.816
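The figure for this slide is missing. Four data sets sharing r ≈ 0.816 are most likely Anscombe's quartet, but that identification is an assumption and is not stated in the transcript. The sketch below computes r for the published Anscombe values. The helper `pearson_r` is a hypothetical name.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation computed as S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Anscombe's quartet (assumed to be the data sets shown on the slide).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  (x4,   [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

correlations = {name: pearson_r(xs, ys) for name, (xs, ys) in quartet.items()}
for name, r in correlations.items():
    print(f"data set {name}: r = {r:.3f}")
```

The four correlations agree to three decimal places even though the scatterplots look very different (linear, curved, linear with one outlier, and a single influential point), which illustrates why r is not a complete summary of bivariate data.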

  28. Questions about Correlation • Does a small r indicate that x and y are NOT associated? • Does a large r indicate that x and y are linearly associated?

  29. 2.4: Least-Squares Regression - Goals • Be able to generally describe the method of ‘Least-Squares Regression’. • Be able to calculate and interpret the regression line. • Using the least-squares regression line, be able to predict the value of y for any appropriate value of x. • Be able to calculate r². • Be able to explain the meaning of r². • Be able to discern what r² does NOT explain.

  30. Idea of Linear Regression

  31. Linear Regression • ŷ = b0 + b1x • b0 = ȳ - b1x̄
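The slope formula does not appear in the transcript. The standard least-squares expressions are written out below, using the summary-statistic notation of slide 33 (r, s_x, s_y, x̄, ȳ). The intercept line repeats the formula already on the slide.

```latex
\hat{y} = b_0 + b_1 x, \qquad
b_1 = r \, \frac{s_y}{s_x}, \qquad
b_0 = \bar{y} - b_1 \bar{x}
```

The slope formula matches the first fact on slide 34: a change of one standard deviation in x corresponds to a change of r standard deviations in ŷ.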

  32. Example: Regression Line ŷ = 20.11 - 0.526x [Figure: scatterplot with fitted line; horizontal axis: Age]

  33. Example: Regression Line The following data are used to determine the relationship between age and change in systolic blood pressure (BP, mm Hg) after 24 hours in response to a particular treatment. x̄ = 52.727, ȳ = -7.636, sx = 14.164, sy = 9.688, r = -0.76951 b) What is the regression line for these data? c) What would the predicted value be for someone who is 51 years old?
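Parts b) and c) can be worked from the summary statistics alone. The sketch below assumes the standard slope formula b1 = r·sy/sx together with the intercept formula from slide 31. The variable and function names are illustrative.

```python
# Summary statistics from the slide (age in years, BP change in mm Hg).
x_bar, y_bar = 52.727, -7.636
s_x, s_y = 14.164, 9.688
r = -0.76951

# b) Least-squares slope and intercept.
b1 = r * s_y / s_x        # slope: change in predicted BP per year of age
b0 = y_bar - b1 * x_bar   # intercept: forces the line through (x_bar, y_bar)

def predict(age):
    """Predicted change in systolic BP (mm Hg) for a given age."""
    return b0 + b1 * age

# c) Prediction for a 51-year-old.
y_hat_51 = predict(51)

print(f"slope b1     = {b1:.4f}")
print(f"intercept b0 = {b0:.3f}")
print(f"predicted BP change at age 51 = {y_hat_51:.2f} mm Hg")
```

The slope comes out at about -0.526 and the intercept at about 20.12, in line with ŷ = 20.11 - 0.526x on slide 32. The small difference in the intercept is presumably due to rounding in the summary statistics. The predicted change for a 51-year-old is roughly -6.7 mm Hg.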

  34. Facts about Least-Squares Regression • Slope: A change of one standard deviation in x corresponds to a change of r standard deviations in y. • Intercept: the predicted value of y when x = 0. • The line passes through the point (x̄,ȳ). • There is an inherent difference between x and y: the regression of y on x is not the same line as the regression of x on y.

  35. r² • Coefficient of determination. • Fraction of the variation of the values of y that is explained by the least-squares regression of y on x.

  36. Example: Regression Line The following data are used to determine the relationship between age and change in systolic blood pressure (BP, mm Hg) after 24 hours in response to a particular treatment. d) What percent of the variation of Y is explained by the regression line?
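Part d) follows directly from the correlation reported on slide 33 (r = -0.76951). A worked version of the arithmetic:

```latex
r^2 = (-0.76951)^2 \approx 0.592
```

So about 59.2% of the variation in the change in systolic blood pressure is explained by the least-squares regression on age. The remaining 40.8% or so is not accounted for by the line.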

  37. Beware of interpretation of r² • Linearity • Outliers • Good prediction

  38. 2.5: Cautions about Correlation and Regression - Goals • Be able to calculate the residuals. • Be able to use a residual plot to assess the fit of a regression line. • Be able to identify outliers and influential observations by looking at scatterplots and residual plots. • Be able to determine when you can predict a new value. • Be able to identify lurking variables that can influence the relationship between two variables. • Be able to explain the difference between association and causation.

  39. Residuals
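The slide body is missing from the transcript. In the usual definition, a residual is the observed value minus the predicted value: residual = y - ŷ. The sketch below fits a least-squares line to a small made-up data set and computes the residuals. The data and the names `fit_line` and `residuals` are illustrative and not from the presentation.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residuals(xs, ys, b0, b1):
    """Observed y minus predicted y_hat for each (x, y) pair."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Made-up data, roughly linear.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
res = residuals(xs, ys, b0, b1)

for x, y, e in zip(xs, ys, res):
    print(f"x = {x}, observed y = {y}, residual = {e:+.3f}")
print(f"sum of residuals = {sum(res):.2e}")
```

For a least-squares line the residuals always sum to zero (up to rounding), so a residual plot is centered on the horizontal line at 0. Patterns around that line are what signal a poor fit.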

  40. Example: Regression Line The following data are used to determine the relationship between age and change in systolic blood pressure (BP, mm Hg) after 24 hours in response to a particular treatment. e) What is the residual for someone who is 51 years old?

  41. Residual Plots [Figure panels: Good; Linearity violation]

  42. Residual Plots [Figure panels: Good; Constant variance violation]

  43. Residual Plots – BP [Figure panels: Original; Y outlier]

  44. Residual Plots – BP [Figure panels: Original; X outlier]

  45. Cautions about Correlation and Regression: Extrapolation

  46. Cautions about Correlation and Regression: • Both describe linear relationships only. • Both are affected by outliers. • Always PLOT the data. • Beware of extrapolation. • Beware of lurking variables. • Lurking variables are important in the study, but are not included among the variables studied. • Confounding variables confuse the issue. • Correlation (association) does NOT imply causation!

  47. Lurking Variables 1. For children, there is an extremely strong correlation between shoe size and math scores. 2. There is a very strong correlation between ice cream sales and number of deaths by drowning. 3. There is a very strong correlation between number of churches in a town and number of bars in a town.
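The first example can be imitated with a small simulation. In the sketch below, age is the lurking variable: it drives both shoe size and math score, and neither of those affects the other. All numbers (age range, noise levels, sample size) are invented for illustration. The helper `pearson_r` is a hypothetical name.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample correlation computed as S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

rng = random.Random(0)  # fixed seed so the run is repeatable

ages, shoe_sizes, math_scores = [], [], []
for _ in range(5000):
    age = rng.uniform(5, 12)                           # lurking variable
    ages.append(age)
    shoe_sizes.append(age + rng.gauss(0, 1))           # depends on age only
    math_scores.append(10 * age + rng.gauss(0, 10))    # depends on age only

# Correlation across all children: strong, driven entirely by age.
r_all = pearson_r(shoe_sizes, math_scores)

# Correlation among children of nearly the same age: close to zero.
band = [(s, m) for a, s, m in zip(ages, shoe_sizes, math_scores)
        if 8.0 <= a < 8.5]
r_band = pearson_r([s for s, _ in band], [m for _, m in band])

print(f"all ages:        r = {r_all:.2f}")
print(f"ages 8.0 to 8.5: r = {r_band:.2f} (n = {len(band)})")
```

Once age is held roughly constant, the association between shoe size and math score largely disappears. That is the signature of a common response to a lurking variable rather than a causal link.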

  48. What is the lurking variable? http://www.forbes.com/sites/erikaandersen/2012/03/23/true-fact-the-lack-of-pirates-is-causing-global-warming/

  49. 2.7: The Question of Causation - Goals • Be able to explain an association • Causation • Common response • Confounding variables • Apply the criteria for establishing causation.

  50. Causation Association does not mean causation!
