180 likes | 196 Views
Learn how to analyze scatterplots, identify linear relationships, and interpret correlation coefficients to understand data patterns effectively. Explore examples and challenges with correlations in various contexts.
E N D
Chapter 4 (cont.) Scatterplots and Correlation
Explanatory and Response Variables Last time we were interested in studying the relationship between two variables by measuring both variables on the same individuals. • a response variable measures an outcome of a study • an explanatory variable explains or influences changes in a response variable • sometimes there is no distinction
Explanatory and Response Variables • What were our explanatory and response variables? • If a distinction exists, plot the explanatory variable on the horizontal (x) axis and plot the response variable on the vertical (y) axis when making a scatterplot.
Analyzing a Scatterplot • Look for an overall pattern and deviations from this pattern • Describe pattern by form, direction, and strength of the relationship • Look for outliers
Form Some relationships are such that the points of a scatterplot tend to fall along a straight line -- linear relationship The pattern may also indicated a curved relationship, some other relationship, or that there is no relationship at all. We will be exploring the cases where there is a linear relationship in more detail in Chapter 5.
Direction • Positive association • above-average values of one variable tend to accompany above-average values of the other variable, and below-average values tend to occur together • Negative association • above-average values of one variable tend to accompany below-average values of the other variable, and vice versa
Direction • Which ingredients are positively associated with ratings? • Negatively associated with ratings? • Does this positive or negative association make sense given your own understanding about the nutritional value of the ingredient?
Direction • Analyze the vitamin-ratings relationship. Why is this relationship surprising? • What could explain the negative relationship? (Perhaps Consumer Reports is not using vitamins in their formula.) Why? (There is very little variability in the vitamin content among the cereals, so it is not a useful way to distinguish cereals.)
Variability • We attempted to evaluate the strength of the relationship between the ingredient and the rating • Some ingredients are more strongly related to the rating, like sugar. It was easier to make a confident prediction on the sugar scatterplot. • For other ingredients, like protein, the relationship with rating is not as strong. We had to use a much bigger range of ratings when estimating on the protein scatterplot.
Variability • Now we will investigate a measurement of variability in scatterplots called the correlation coefficient, which is denoted with the letter r. • r is roughly an average of the product of z-scores of the x- and y-values. You will not need to find r by hand. • In MINITAB: • To find r: Stat>Basic Statistics>Correlation • To make a scatterplot: Graph>Scatterplot>(Simple; With Groups for 4.31 and 4.43 on your HW) • Double click on the two columns that contain the variables of interest, then click OK.
Investigating the Properties of r • Answer Questions 1-4 on your Class Work Handout • What do you think the correlation coefficient (r) measures? • measures the strength of the relationship: the stronger the relationship, the larger the magnitude of r. • measures the direction of the relationship: positive r indicates a positive association,negative r indicates a negative association. • Is there a largest possible value for r, or can it have larger and larger values without limit? Is there a smallest possible value? • r can only be as small as -1 and as large as 1:
Correlation Coefficient • special values for r : • a perfect positive linear relationship would have r = +1 • a perfect negative linear relationship would have r = -1 • both variables must be quantitative; no distinction between response and explanatory variables is necessary to find r • r has no units; does not change when measurement units are changed (ex: ft. or in.)
Linear Correlation with Nonlinear Scatterplots • Based on your current experience with r, • If r is close to zero, can there be a strong relationship between the two variables? • If r is close to one, can the relationship between the variables be nonlinear? • Answer questions 6 and 7
Examples of Correlations • Husband’s versus Wife’s ages • r = .94 • Husband’s versus Wife’s heights • r = .36 • Professional Golfer’s Putting Success: Distance of putt in feet versus percent success • r = -.94
Problems with Correlations • Outliers can inflate or deflate correlations (see next slide) • Groups combined inappropriately may mask relationships (a third variable) • groups may have different relationships when separated
Outliers and Correlation A B For each scatterplot above, how does the outlier affect the correlation? A: outlier decreases the correlation B: outlier increases the correlation
A quick HW note: You do not need to find r by hand in 4.35 (or ever). Use MINITAB in your homework; on your exam you will be making your best estimation just by looking at a scatterplot. I will accept a range of answers as long as your justification makes sense.