This chapter explores the use of scatterplots to display and interpret relationships between two quantitative variables. It also introduces the concept of correlation to measure the strength and direction of linear association. The chapter highlights the importance of considering other variables that may influence the relationship between two variables.
Chapter 4 • Scatterplots and Correlation
Chapter outline • Explanatory and response variables • Displaying relationships: Scatterplots • Interpreting scatterplots • Adding categorical variables to scatterplots • Measuring linear association: correlation r • Facts about correlation
Explanatory and Response Variables • A response variable measures an outcome of a study. • An explanatory variable explains, influences, or causes changes in a response variable. • Explanatory variables are also called independent variables, and response variables are also called dependent variables. • Be careful: the relationship between two variables can be strongly influenced by other variables that are lurking in the background.
Explanatory and response variables • Note: It is not necessary to have a cause-and-effect relationship between explanatory and response variables. • Example 4.1 (p. 80) • Example: Cigarette smoking and lung cancer • Example: Sales of personal computers and athletic shoes
Displaying relationships: Scatterplots • A scatterplot displays the relationship between two quantitative variables measured on the same individuals. • It is the most common way to display the relation between two quantitative variables. • It displays the form, direction, and strength of the relationship between two quantitative variables. • The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. Each individual in the data appears as the point in the plot fixed by the values of both variables for that individual.
Interpreting scatterplots • How to examine a scatterplot: • Look for the overall pattern, described by the form, direction, and strength of the relationship. • Look for outliers or other deviations from this pattern.
Interpreting scatterplots • Overall Pattern • Form: Linear relationships, where the points show a straight-line pattern, are an important form of relationship between two variables. Curved relationships and clusters (a number of similar individuals that occur together) are other forms to watch for. • Direction: If the relationship has a clear direction, we speak of either positive association (as x increases, y tends to increase) or negative association (as x increases, y tends to decrease). • Strength: The strength of a relationship is determined by how close the points in the scatterplot lie to a line.
Scatterplot & Correlation • Scatterplots provide a visual tool for looking at the relationship between two variables. Unfortunately, our eyes are not good tools for judging the strength of the relationship. Changes in the scale or the amount of white space in the graph can easily change our judgment of the strength of the relationship. • Correlation is a numerical measure we use to show the strength of linear association.
Measuring linear association: correlation r (the Pearson product-moment correlation coefficient, or simply the correlation coefficient) • The correlation r measures the strength and direction of the linear association between two quantitative variables, usually labeled X and Y.
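The slide text does not reproduce the formula for r. A minimal sketch, assuming the standard textbook definition (the sum of products of standardized x and y values, divided by n - 1), might look like the following. The function names and the five data pairs are made up for illustration and do not come from the chapter.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """Pearson correlation r: average product of standardized values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_std(xs), sample_std(ys)
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    products = (((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


# Hypothetical data: five individuals, two quantitative variables.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)
print(round(r, 2))  # positive: y tends to increase with x
```

Because each value is standardized before the products are summed, r carries no units, which is why it can serve as a common scale for the strength of linear association.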
Facts about correlation • What kind of variables do we use? • 1. Correlation makes no distinction between explanatory and response variables. • 2. Both variables must be quantitative. • Numerical properties • 1. -1 ≤ r ≤ 1 • 2. r > 0: positive association between the variables • 3. r < 0: negative association between the variables • 4. r = 1 or r = -1 indicates a perfect linear relationship. • 5. The closer |r| is to 1, the stronger the linear relationship. • 6. r is affected by a few outliers; it is not resistant. • 7. r does not describe curved relationships. • 8. It is not easy to guess the value of r from the appearance of a scatterplot.
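Several of these facts can be checked numerically. The sketch below is illustrative only: the helper `correlation` uses an algebraically equivalent form of the usual definition of r, and all data sets are invented for the demonstration.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson r computed from sums of squared and cross deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# No distinction between explanatory and response: swapping roles keeps r.
r_xy = correlation(xs, ys)
r_yx = correlation(ys, xs)

# Points exactly on a line give r = 1 (rising) or r = -1 (falling).
r_up = correlation(xs, [2 * x + 1 for x in xs])
r_down = correlation(xs, [10 - 3 * x for x in xs])

# Not resistant: one added outlier (20, 0) flips the sign of r here.
r_outlier = correlation(xs + [20], ys + [0])

# Curved relationships are not described: y = x^2 on symmetric x gives r = 0,
# even though y is completely determined by x.
curve_x = [-2, -1, 0, 1, 2]
r_curve = correlation(curve_x, [x * x for x in curve_x])

print(r_xy, r_yx, r_up, r_down, r_outlier, r_curve)
```

The outlier case shows why a scatterplot should always accompany r: a single unusual individual can change both the size and the sign of the correlation.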