1. 5-1 SLIDES PREPARED
By
Lloyd R. Jaisingh Ph.D.
Morehead State University
Morehead KY
2. 5-2 Chapter 5 Exploring Bivariate Data
3. 5-3 Outline Do I Need to Read This Chapter?
5-1 Scatter Plots
5-2 Looking for Patterns in the Data
5-3 Linear Correlation
5-4 Correlation and Causation
4. 5-4 Outline 5-5 Least-Squares Regression Line
5-6 The Coefficient of Determination
5-7 Residual Plots
5-8 Outliers and Influential Points
5. 5-5 Objectives Introduce some basic statistical terms related to correlation and regression analysis.
Give a basic introduction to the concepts of linear correlation and linear regression analysis.
6. 5-6 5-1 Scatter Plots In simple correlation and regression studies, data are collected on two quantitative variables (bivariate data) to determine whether a relationship exists between the two variables.
To illustrate this graphically, consider the following example.
7. 5-7 5-1 Scatter Plots Example: The bivariate data given in the following table relate the high temperature (°F) reached on a given day and the number of cans of soft drinks sold from a particular vending machine in front of a grocery store. Data were collected for 15 different days.
8. 5-8 5-1 Scatter Plots
9. 5-9 5-1 Scatter Plots To analyze graphically, we can display the data on a two-dimensional graph.
We can plot the number of cans of soft drinks along the vertical axis and the temperature along the horizontal axis.
Such plots are called scatter plots.
10. 5-10 5-1 Scatter Plots
11. 5-11 5-1 Scatter Plots The variable plotted along the vertical axis is called the dependent variable.
The variable plotted along the horizontal axis is called the independent variable.
Notation: We will let y represent the dependent variable and we will let x represent the independent variable.
12. 5-12 5-1 Scatter Plots Explanation of the term – scatter plot: A scatter plot is a graph of the ordered pairs (x, y) of values for the independent variable x and the dependent variable y.
13. 5-13 5-1 Scatter Plots NOTE: The number of cans of soft drinks sold will depend on the temperature.
Thus, the dependent variable (y) will be the number of cans of soft drinks sold, and the independent variable (x) will be the temperature.
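A scatter plot is simply the ordered pairs (x, y) drawn on a grid. The sketch below renders one as plain text. The function name text_scatter and the temperature and sales figures are hypothetical, made up for illustration; they are not the 15 observations from the table in the example.

```python
def text_scatter(xs, ys, width=30, height=10):
    """Render the ordered pairs (x, y) as a rough text scatter plot.

    x runs left to right (independent variable),
    y runs bottom to top (dependent variable).
    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    return "\n".join("".join(line) for line in grid)


# Hypothetical data: x = high temperature, y = cans sold.
temps = [70, 75, 80, 85, 90, 95]
cans = [30, 34, 41, 47, 52, 60]

plot = text_scatter(temps, cans)
print(plot)
```

The points climb from the lower left to the upper right, which is the kind of pattern one would expect if sales rise with temperature.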
14. 5-14 5-2 Looking at Patterns Detecting an association or a relationship for bivariate data starts with a scatter plot.
When examining a scatter plot, one should try to answer the following questions:
Is there a straight-line pattern or association?
15. 5-15 5-2 Looking at Patterns Does the pattern or association slope upward or downward?
Are the plotted values tightly clustered together in the pattern or widely separated?
Are there noticeable deviations from the pattern?
16. 5-16 Quick Tips: Two variables are said to be positively related if larger values of one variable tend to be associated with larger values of the other.
Two variables are said to be negatively related if larger values of one variable tend to be associated with smaller values of the other.
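One possible way to check the direction of a relationship numerically, sketched here as an illustration and not taken from the slides, is to look at the sign of the sum of products of deviations from the means. The function name association_direction and the data are hypothetical.

```python
def association_direction(xs, ys):
    """Classify the direction of association from the sign of
    sum((x - mean_x) * (y - mean_y))."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if s > 0:
        return "positive"   # larger x tends to go with larger y
    if s < 0:
        return "negative"   # larger x tends to go with smaller y
    return "none"


# Hypothetical data for illustration only.
up = association_direction([1, 2, 3, 4], [2, 4, 5, 9])
down = association_direction([1, 2, 3, 4], [9, 5, 4, 2])
print(up, down)
```

This only reports direction. It says nothing about how tightly the points cluster around a line, which is what the correlation coefficient measures.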
17. 5-17 Perfect Positive Linear Association
18. 5-18 Perfect Negative Linear Association
19. 5-19 Very Strong Positive Linear Association
20. 5-20 Very Strong Negative Linear Association
21. 5-21 Little or No Association
22. 5-22 Nonlinear Association
23. 5-23 5-3 Correlation So far you have seen how a scatter plot provides a visual display of the association between two variables.
Here we discuss a numerical measure of the linear association between two variables: the Pearson product-moment correlation coefficient, or simply the correlation coefficient.
24. 5-24 5-3 Correlation Explanation of the term – sample correlation coefficient: The sample correlation coefficient measures the strength and direction of the linear relationship between two variables using sample data.
The sample correlation coefficient is denoted by the letter r and is computed from the equation on the next slide.
25. 5-25 5-3 Correlation
n is the number of (x,y) data pairs.
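As a minimal sketch, assuming the standard computational form of Pearson's r (the slide's own equation may be written differently), the coefficient can be computed from five column totals, which is why a table of x, y, xy, x² and y² helps:

r = [nΣxy − (Σx)(Σy)] / sqrt([nΣx² − (Σx)²][nΣy² − (Σy)²])

The function name pearson_r and the data below are hypothetical and are not the observations from the example that follows.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient via the computational formula."""
    n = len(xs)                                   # number of (x, y) pairs
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when x or y has no variation")
    return numerator / denominator


# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 7, 4, 1]
r = pearson_r(xs, ys)
print(round(r, 4))
```

For these made-up values the totals are Σx = 15, Σy = 30, Σxy = 68, Σx² = 55 and Σy² = 230, so the numerator is 5(68) − 15(30) = −110 and r is negative, consistent with y falling as x rises.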
26. 5-26 5-3 Correlation Example: Compute the linear correlation coefficient for the following set of observations for the independent variable x and the dependent variable y.
27. 5-27 5-3 Correlation Solution: The formula may look intimidating, but we can construct a table to help with the computations.
28. 5-28 5-3 Correlation Solution: Using the values from the previous table, we have
29. 5-29 5-3 Correlation Note: We may use available technology to help compute the correlation coefficient. The following is a MINITAB output with the value.
30. 5-30 5-3 Correlation The scatter plot displays the negative correlation between x and y.
31. 5-31 Properties of the Correlation Coefficient The range of the correlation coefficient is from –1 to +1.
If there is a perfect positive linear relationship between the variables, the value of r will be equal to +1.
If there is a perfect negative linear relationship between the variables, the value of r will be equal to –1.
32. 5-32 Properties of the Correlation Coefficient If there is a strong positive linear relationship between the variables, the value of r will be close to +1.
If there is a strong negative linear relationship between the variables, the value of r will be close to –1.
If there is little or no linear relationship between the variables, the value of r will be close to 0.
33. 5-33 Quick Tip: One should always examine the scatter plot and not just rely on the value of the linear correlation.
This measure will not detect curvilinear or other types of complex relationships.
That is, there may be a non-linear relationship between two variables even though the linear correlation is close to 0. See the next slide.
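The properties above, and the warning about curved patterns, can be checked on small made-up data sets. The sketch below uses the deviation form of r, which is algebraically equivalent to the computational form; the function name corr and all data are hypothetical.

```python
import math


def corr(xs, ys):
    """Sample correlation coefficient, deviation-from-the-mean form."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Perfect positive line y = 2x + 1 and perfect negative line y = 11 - 2x.
r_pos = corr([1, 2, 3, 4], [3, 5, 7, 9])
r_neg = corr([1, 2, 3, 4], [9, 7, 5, 3])

# Exact curved relationship y = x^2 on x values symmetric about 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
r_curve = corr(xs, [x * x for x in xs])

print(r_pos, r_neg, r_curve)
```

The parabola is a perfect relationship, yet its linear correlation is 0 because the rising and falling halves cancel. That is why the scatter plot should be examined alongside r.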
34. 5-34 Quick Tip:
35. 5-35 Correlation and Causation
36. 5-36 Correlation and Causation
37. 5-37 Correlation and Causation
38. 5-38 Correlation and Causation
39. 5-39 5-5 Least-Squares Regression Line
40. 5-40 5-5 Least-Squares Regression Line
41. 5-41 5-5 Least-Squares Regression Line
42. 5-42 5-5 Least-Squares Regression Line
43. 5-43 5-5 Least-Squares Regression Line
44. 5-44 5-5 Least-Squares Regression Line
45. 5-45 5-5 Least-Squares Regression Line
46. 5-46 5-5 Least-Squares Regression Line
47. 5-47 5-5 Least-Squares Regression Line
48. 5-48 5-5 Least-Squares Regression Line
49. 5-49 5-5 Least-Squares Regression Line
50. 5-50 Quick Tip: When using the line of best fit to make predictions, use only values of the independent variable that lie within the range of the observed values.
Predicting from values outside the observed range may give incorrect results, because we do not know how the relationship behaves outside this range.
51. 5-51 Quick Tip: The model reflects the behavior of the association between the two variables only within the range of the observed values.
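The tip can be built into the prediction step. The sketch below assumes the usual least-squares formulas for a line ŷ = b0 + b1·x, namely b1 = [nΣxy − (Σx)(Σy)] / [nΣx² − (Σx)²] and b0 = ȳ − b1·x̄. The function names fit_line and predict, and the data, are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = sum_y / n - slope * sum_x / n
    return slope, intercept


def predict(x, slope, intercept, observed_xs):
    """Predict y at x, refusing to extrapolate beyond the observed x range."""
    low, high = min(observed_xs), max(observed_xs)
    if not low <= x <= high:
        raise ValueError(f"x={x} is outside the observed range [{low}, {high}]")
    return intercept + slope * x


# Hypothetical data for illustration only.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
inside = predict(2.5, slope, intercept, xs)
print(slope, intercept, inside)
```

Raising an error is one design choice. A softer alternative would be to return the prediction together with a warning, leaving the decision to the analyst.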
52. 5-52 5-5 Least-Squares Regression Line
53. 5-53 5-6 The Coefficient of Determination
54. 5-54 5-6 The Coefficient of Determination
55. 5-55 5-6 The Coefficient of Determination
56. 5-56 Display of the Least-Squares Regression Line Superimposed on the Scatter Plot