1. 1 Chapter 8 Correlation/Linear Regression Linear Relationships: If the explanatory and response variables show a straight-line pattern, then we say they follow a linear relationship.
Curved relationships and clusters are other forms to watch for.
3. 3 Chapter 8 Correlation/Linear Regression Direction: If the relationship has a clear direction, we speak of either positive association or negative association.
Positive association: high values of the two variables tend to occur together.
Negative association: high values of one variable tend to occur with low values of the other variable.
4. 4 Chapter 8 Correlation/Linear Regression Correlation is a number that measures the strength of a linear relationship between two quantitative variables.
Correlation is always between -1 and 1 inclusive.
The sign of the correlation coefficient indicates whether the association between the variables is positive or negative.
5. 5 Chapter 8 Correlation/Linear Regression Strong correlation: r is between 0.8 and 1 or between -1 and -0.8
Moderate correlation: r is between 0.5 and 0.8 or between -0.8 and -0.5
Weak correlation: r is between -0.5 and 0.5
6. 6 Chapter 8 Correlation/Linear Regression Correlation does not distinguish between X and Y
Correlation is unitless
Correlation measures the strength of linear relationship between two quantitative variables
7. 7 Chapter 8 Correlation/Linear Regression
8. 8 Choose the best description of the scatter plot Moderate, negative, linear association
Strong, curved, association
Moderate, positive, linear association
Strong, negative, non-linear association
Weak, positive, linear association
9. 9 Which of the following values is most likely to represent the correlation coefficient for the data shown in this scatterplot? r = -0.67
r = -0.10
r = 0.71
r = 0.96
r = 1.00
10. 10 Which of the following values is most likely to represent the correlation coefficient for the data shown in this scatterplot? r = -0.67
r = -0.10
r = 0.71
r = 0.96
r = 1.00
11. 11 Which of the following values is most likely to represent the correlation coefficient for the data shown in this scatterplot? r = -0.67
r = -0.10
r = 0.71
r = 0.96
r = 1.00
12. 12 Cautions about Correlation It should only be used
To describe the relationship between 2 QUANTITATIVE variables
When the association is “linear enough”
When there are no outliers
Correlation does NOT imply causation
13. 13 A teacher at an elementary school measures the heights of children on the playground and then makes a scatter plot of the children’s heights and reading test scores. The data meet the conditions for correlation, so she calculates r = .79. Which conclusion is most accurate?
Being taller causes students to read better
Being shorter causes students to read better
Taller students tend to have better reading scores
Shorter students tend to have better reading scores
14. 14 Chapter 8 Linear Models Easiest to understand and analyze
Relationships are often linear
Variables with non-linear relationship can often be transformed into linear relationship through an appropriate transformation
Even when a relationship is non-linear, a linear model may provide an accurate approximation for a limited range of values.
Strength: The strength of a linear relationship is determined by how close the points in the scatterplot lie to a straight line
15. Least Squares Regression Line - Calculations
16. 16 Chapter 8 Linear Models Not all data fall on a straight line!
Residual = Data – Model, or equivalently
Residual = Observed y – Predicted y
17. 17 Chapter 8 Linear Models Example
x = Fat    y = Calories
19         410
31         580
34         590
35         570
39         640
39         680
43         660
18. 18 Chapter 8 Linear Models
19. 19 Chapter 8 Linear Models
20. 20 Chapter 8 Linear Models S = 27.3340 R-Sq = 92.3% R-Sq(adj) = 90.7%
Residual Plot
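The fit behind the summary line above can be reproduced by hand. The following is a minimal sketch, assuming the fat/calorie table from the Example slide is the data behind S, R-Sq and R-Sq(adj); the variable names are illustrative and not from the slides.

```python
from math import sqrt

# Fat (x) and calories (y) from the Example slide.
fat = [19, 31, 34, 35, 39, 39, 43]
calories = [410, 580, 590, 570, 640, 680, 660]

n = len(fat)
mean_x = sum(fat) / n
mean_y = sum(calories) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in fat)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(fat, calories))
sst = sum((y - mean_y) ** 2 for y in calories)

# Least squares slope and intercept.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed y - predicted y, one per data point.
predicted = [intercept + slope * x for x in fat]
residuals = [y - p for y, p in zip(calories, predicted)]
sse = sum(e ** 2 for e in residuals)

s = sqrt(sse / (n - 2))                               # residual standard deviation
r_squared = 1 - sse / sst                             # R-Sq
r_squared_adj = 1 - (sse / (n - 2)) / (sst / (n - 1))  # R-Sq(adj)

print(f"calories = {intercept:.2f} + {slope:.2f} * fat")
print(f"S = {s:.4f}  R-Sq = {r_squared:.1%}  R-Sq(adj) = {r_squared_adj:.1%}")
```

Plotting `residuals` against `fat` gives the residual plot; a patternless scatter around zero suggests the linear model is reasonable.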
21. 21 Chapter 9 Regression Wisdom Extrapolation: Reaching beyond the data
Outliers: Regression models are sensitive to outliers
Leverage: An unusual data point whose x value is far from the mean of the x values
A point with high leverage has the potential to change the regression line.
22. 22 Chapter 9 Regression Wisdom Influential: A point is influential if omitting it from the analysis gives a very different model.
Influence depends on leverage and residual
Lurking variable: A variable that is not included in the linear model or study but may affect the variables that are.
23. 23 Chapter 9 Regression Wisdom Lurking variables may influence correlation and regression models.
Association is not causation!
24. 24 Summary r is a number between -1 and 1
r = 1 or r = -1 indicates a perfect correlation case where all data points lie on a straight line
r > 0 indicates positive association
r < 0 indicates negative association
r value does not change when units of measurement are changed (correlation has no units!)
Correlation treats X and Y symmetrically. The correlation of X with Y is the same as the correlation of Y with X
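The properties listed in this summary can be checked numerically. This is a small sketch, assuming the fat/calorie values from the Chapter 8 example; the function name `correlation` and the factor of 1000 used for the unit change are illustrative choices, not from the slides.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Fat and calorie values from the Chapter 8 example.
fat = [19, 31, 34, 35, 39, 39, 43]
calories = [410, 580, 590, 570, 640, 680, 660]

r = correlation(fat, calories)

# Swapping X and Y leaves r unchanged (correlation is symmetric).
r_swapped = correlation(calories, fat)

# Changing units (here: multiplying every fat value by 1000) leaves r unchanged.
r_rescaled = correlation([f * 1000 for f in fat], calories)

print(round(r, 3), round(r_swapped, 3), round(r_rescaled, 3))
```

All three printed values agree, and r is positive because higher fat values tend to occur with higher calorie values.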
25. 25 Summary
Quantitative variable condition: Do not apply correlation to categorical variables
Correlation can be misleading if the relationship is not linear
Outliers distort correlation dramatically. Report correlation with/without outliers.
26. 26 More Examples for Checking Linear Enough Condition: All four data sets have r = .82
27. 27 In which case is a linear model appropriate?
30. 30 Calculating r with the TI-83/84 The first time you do this:
Press 2nd, CATALOG (above 0)
Scroll down to DiagnosticOn
Press ENTER, ENTER
Read “Done”
Your calculator will remember this setting even when turned off
31. 31 Calculating r with the TI-83/84 Press STAT, ENTER
If there are old values in L1:
Highlight L1, press CLEAR, then ENTER
If there are old values in L2:
Highlight L2, press CLEAR, then ENTER
Enter predictor (x) values in L1
Enter response (y) values in L2
Pairs must line up
There must be the same number of predictor and response values
32. 32 Calculating r with the TI-83/84 Press STAT, > (to CALC)
Scroll down to LinReg(ax+b), press ENTER, ENTER
Read r at bottom of screen
33. 33 Re-Expression with the TI-83/84 Most common re-expressions are built in.
To see what’s available, try
STAT
CALC
Scroll down to see
5:QuadReg
6:CubicReg
9:LnReg
0:ExpReg
A:PwrReg
34. 34 Example
X: Age in months
Y: Height in inches
X: 18 19 20 21 22 23 24
Y: 29.9 30.3 30.7 31 31.38 31.45 31.9
35. 35 Chapter 9 Prediction, Residuals, Influence Linear Model: Height = 24.212 +.321 * Age
Correlation: r = .992
Examples
Age = 24 months, Observed Height = 31.9
Predicted Height = 31.916
Residual = 31.9 – 31.916 = -0.016
36. 36 Chapter 9 Prediction, Residuals, Influence Age = 20 years (20*12 = 240)
Predicted Height = 24.212 + .321 * 240 = 101.252 inches, about 8.4 ft!!
Residual = BIG!
Be aware of Extrapolation!
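The prediction, residual, and extrapolation steps can be sketched in a few lines. This assumes the linear model stated on the slide (Height = 24.212 + .321 * Age, age in months, height in inches); the function name `predict_height` and the range check are illustrative additions.

```python
# Ages (months) that the model was fitted on, from the Example slide.
ages = [18, 19, 20, 21, 22, 23, 24]

def predict_height(age_months):
    """Predicted height in inches from the slide's linear model."""
    return 24.212 + 0.321 * age_months

# Inside the data range: age 24 months, observed height 31.9 inches.
observed = 31.9
predicted = predict_height(24)
residual = observed - predicted  # observed minus predicted

# Far outside the data range: 20 years = 240 months.
age_adult = 20 * 12
extrapolated_inches = predict_height(age_adult)
extrapolated_feet = extrapolated_inches / 12

# A simple guard: is the requested age within the fitted range?
in_range = min(ages) <= age_adult <= max(ages)

print(f"predicted at 24 months: {predicted:.3f} in, residual: {residual:.3f}")
print(f"predicted at 240 months: {extrapolated_feet:.1f} ft, within data range: {in_range}")
```

The model describes toddlers well, but nothing in the data supports using it at 240 months, which is why the extrapolated height is absurd.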
37. 37 Example 4. Relationship between calories and sugar content: A researcher tracked the sugar content and calories of 15 baked goods and found the following information:
Average sugar content: 7.0 grams
Standard deviation of sugar content: 4.4 grams
Average calories: 107.0 calories
Standard deviation of calories: 19.5 calories
Correlation between sugar content and calories: 0.564
38. 38 Solution to Example
a) Find a linear model that describes this example:
b_{1} = r * S_{y}/S_{x} = 0.564 * 19.5/4.4 = 2.50 calories per gram of sugar
b_{0} = mean of (Y) – b_{1} * mean of (X) = 107 – 2.50 * 7 = 89.5 calories
Linear Model: y = b_{0} + b_{1}x
y = 89.5 + 2.50x or better
calories = 89.5 + 2.50 * sugar
b) How many calories are there in a muffin with 6.5 grams of sugar?
calories = 89.5 +2.50* 6.5 = 105.75
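The same calculation can be written out as code. This is a minimal sketch using only the summary statistics given in the example; the variable names are illustrative, and the slope is rounded to two decimals to follow the slide's arithmetic.

```python
# Summary statistics from the example.
r = 0.564
mean_sugar, sd_sugar = 7.0, 4.4         # grams
mean_calories, sd_calories = 107.0, 19.5

# Slope: b1 = r * s_y / s_x, rounded as on the slide.
b1 = round(r * sd_calories / sd_sugar, 2)

# Intercept: b0 = mean(y) - b1 * mean(x).
b0 = mean_calories - b1 * mean_sugar

def predict_calories(sugar_grams):
    """Predicted calories for a given sugar content in grams."""
    return b0 + b1 * sugar_grams

print(f"calories = {b0:.1f} + {b1:.2f} * sugar")
print(f"6.5 g of sugar -> {predict_calories(6.5):.2f} calories")
```

Because the line always passes through the point of means, `predict_calories(7.0)` returns the average of 107 calories.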
39. 39 Chapter 10 Re-expressing Data Example: The data show the number of academic journals published on the Internet during the last decade.
40. 40 Chapter 10 Re-expressing Data
41. 41 Chapter 10 Re-expressing Data Re-express data to linearize:
42. 42 Chapter 10 Re-expressing Data
44. 44 Chapter 10 Re-expressing Data Least Squares Regression Line has the following equation:
Log(journals) = 1.22 + 0.346 * Year
Problem:
How many journals will be published online in the year 2000?
45. 45 Chapter 10 Re-expressing Data
Answer
Log(journals) = 1.22 + 0.346 * 9 = 4.334 (the year 2000 is entered as Year = 9)
journals = 10^(4.334) ≈ 21,577
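Predicting from a re-expressed model takes two steps: predict on the log scale, then undo the logarithm. The sketch below assumes Log means the base-10 logarithm (consistent with the 10^(4.334) in the answer) and that Year = 9 stands for the year 2000, as the answer's arithmetic implies; the function name is illustrative.

```python
def predicted_journals(year_code):
    """Back-transform the fitted line Log(journals) = 1.22 + 0.346 * Year."""
    log_journals = 1.22 + 0.346 * year_code  # prediction on the log10 scale
    return 10 ** log_journals                # undo the log to get a count

# Year code 9 is the value the slides use for the year 2000.
log_value = 1.22 + 0.346 * 9
journals_2000 = predicted_journals(9)

print(f"Log(journals) = {log_value:.3f}")
print(f"journals = {journals_2000:.0f}")
```

Forgetting the back-transformation is a common slip: 4.334 is the logarithm of the count, not the count itself.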
46. 46 Chapter 10 Re-expressing Data Why Re-expressing data?
Make a distribution of a variable more symmetric
Make the spread of several groups more alike, even if their centers differ
Make the form of a scatterplot more nearly linear
Make the scatter in a scatterplot spread out more evenly rather than thickening at one end.
47. 47 Chapter 10 Re-expressing Data The Ladder of Powers:
Power 2: the square of the data values y^2
Try this for unimodal distributions that are skewed to the left.
Power 1: No change at all
Power ½: the square root of the data values
y^(1/2)
Try this for counted data
Power 0: the logarithm of the data values log(y)
Try this for measurements that cannot be negative
Especially those that grow by percentage increases
Salaries and populations are good examples.