650 likes | 903 Views
Learning Objectives. Identify variable type: Response or ExplanatoryDefine AssociationContingency tablesCalculate proportions and conditional proportions. Learning Objective 1: Response and Explanatory variables. Response variable (Dependent Variable) the outcome variable on which comparisons ar
E N D
1. Chapter 3Association: Contingency, Correlation, and Regression Section 3.1
How Can We Explore the Association between Two Categorical Variables?
2. Learning Objectives Identify variable type: Response or Explanatory
Define Association
Contingency tables
Calculate proportions and conditional proportions
3. Learning Objective 1:Response and Explanatory variables Response variable (Dependent Variable)
the outcome variable on which comparisons are made
Explanatory variable (Independent variable)
defines the groups to be compared with respect to values on the response variable
Example: Response/Explanatory
Blood alcohol level/# of beers consumed
Grade on test/Amount of study time
Yield of corn per bushel/Amount of rainfall
4. Learning Objective 2:Association The main purpose of data analysis with two variables is to investigate whether there is an association and to describe that association
An association exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable
5. Learning Objective 3:Contingency Table A contingency table:
Displays two categorical variables
The rows list the categories of one variable
The columns list the categories of the other variable
Entries in the table are frequencies
6. Learning Objective 3:Contingency Table
7. Learning Objective 4:Calculate proportions and conditional proportions
8. Learning Objective 4: Calculate proportions and conditional proportions
What proportion of organic foods contain pesticides?
What proportion of conventionally grown foods contain pesticides?
What proportion of all sampled items contain pesticide residuals?
9. Learning Objective 4:Calculate proportions and conditional proportions
10. Learning Objective 4:Calculate proportions and conditional proportions If there was no association between organic and conventional foods, then the proportions for the response variable categories would be the same for each food type
11. Chapter 3Association: Contingency, Correlation, and Regression Section 3.2
How Can We Explore the Association between Two Quantitative Variables?
12. Learning Objectives: Constructing scatterplots
Interpreting a scatterplot
Correlation
Calculating correlation
13. Learning Objective 1:Scatterplot Graphical display of relationship between two quantitative variables:
Horizontal Axis: Explanatory variable, x
Vertical Axis: Response variable, y
14. Learning Objective 1:Internet Usage and Gross National Product (GDP) Data Set
15. Learning Objective 1:Internet Usage and Gross National Product (GDP)
16. Learning Objective 1:Baseball Average and Team Scoring
17. Learning Objective 1:Baseball Average and Team Scoring
18. Learning Objective 2:Interpreting Scatterplots You can describe the overall pattern of a scatterplot by the trend, direction, and strength of the relationship between the two variables
Trend: linear, curved, clusters, no pattern
Direction: positive, negative, no direction
Strength: how closely the points fit the trend
Also look for outliers from the overall trend
19. Learning Objective 2:Interpreting Scatterplots: Direction/Association Two quantitative variables x and y are
Positively associated when
High values of x tend to occur with high values of y
Low values of x tend to occur with low values of y
Negatively associated when high values of one variable tend to pair with low values of the other variable
20. Learning Objective 2:Example: 100 cars on the lot of a used-car dealership Would you expect a positive association, a
negative association or no association between
the age of the car and the mileage on the
odometer?
Positive association
Negative association
No association
21. Learning Objective 2:Example: Did the Butterfly Ballot Cost Al Gore the 2000 Presidential Election?
22. Learning Objective 3:Linear Correlation, r Measures the strength and direction of the linear association between x and y
A positive r value indicates a positive association
A negative r value indicates a negative association
An r value close to +1 or -1 indicates a strong linear association
An r value close to 0 indicates a weak association
23. Learning Objective 3:Correlation coefficient: Measuring Strength & Direction of a Linear Relationship
24. Learning Objective 3:Properties of Correlation Always falls between -1 and +1
Sign of correlation denotes direction
(-) indicates negative linear association
(+) indicates positive linear association
Correlation has a unitless measure - does not depend on the variables’ units
Two variables have the same correlation no matter which is treated as the response variable
Correlation is not resistant to outliers
Correlation only measures strength of linear relationship
25. Leaning Objective 4:Calculating the Correlation Coefficient
26. Learning Objective 4:Calculating the Correlation Coefficient
27. Learning Objective 4:Internet Usage and Gross National Product (GDP) STAT CALC menu
Choose 8: LinReg(a+bx)
1st number = x variable
2nd number = y variable
Enter
28. Learning Objective 4:Baseball Average and Team Scoring Enter x data into L1
Enter y data into L2
STAT CALC memu
Choose 8: LinReg(a+bx)
1st number = x variable
2nd number = y variable
Enter
29. Learning Objective 4:Cereal: Sodium and Sugar
30. Chapter 3Association: Contingency, Correlation, and Regression Section 3.3
How Can We Predict the Outcome of a Variable?
31. Learning Objectives Definition of a regression line
Use a regression equation for prediction
Interpret the slope and y-intercept of a regression line
Identify the least-squares regression line as the one that minimizes the sum of squared residuals
Calculate the least-squares regression line
32. Learning Objectives Compare roles of explanatory and response variables in correlation and regression
Calculate r2 and interpret
33. Learning Objective 1:Regression Analysis The first step of a regression analysis is to identify the response and explanatory variables
We use y to denote the response variable
We use x to denote the explanatory variable
34. Learning Objective 1:Regression Line A regression line is a straight line that describes how the response variable (y) changes as the explanatory variable (x) changes
A regression line predicts the value of the response variable (y) for a given level of the explanatory variable (x)
The y-intercept of the regression line is denoted by a
The slope of the regression line is denoted by b
35. Learning Objective 2:Example: How Can Anthropologists Predict Height Using Human Remains? Regression Equation:
is the predicted height and is the length of a femur (thighbone), measured in centimeters
36. Learning Objective 3:Interpreting the y-Intercept y-Intercept:
The predicted value for y when x = 0
Helps in plotting the line
May not have any interpretative value if no observations had x values near 0
37. Learning Objective 3:Interpreting the Slope Slope: measures the change in the predicted variable (y) for a 1 unit increase in the explanatory variable in (x)
Example: A 1 cm increase in femur length results in a 2.4 cm increase in predicted height
38. Learning Objective 3:Slope Values: Positive, Negative, Equal to 0
39. Learning Objective 3:Regression Line At a given value of x, the equation:
Predicts a single value of the response variable
But… we should not expect all subjects at that value of x to have the same value of y
Variability occurs in the y values!
40. Learning Objective 3:The Regression Line The regression line connects the estimated means of y at the various x values
In summary,
Describes the relationship between x and the estimated means of y at the various values of x
41. Learning Objective 4:Residuals Measures the size of the prediction errors, the vertical distance between the point and the regression line
Each observation has a residual
Calculation for each residual:
A large residual indicates an unusual observation
42. Learning Objective 4:“Least Squares Method” Yields the Regression Line Residual sum of squares:
The least squares regression line is the line that minimizes the vertical distance between the points and their predictions, i.e., it minimizes the residual sum of squares
Note: the sum of the residuals about the regression line will always be zero
43. Learning Objective 5:Regression Formulas for y-Intercept and Slope
Slope:
Y-Intercept:
44. Learning Objective 5:Calculating the slope and y intercept for the regression line
45. Learning Objective 5:Internet Usage and Gross National Product (GDP)
46. Learning Objective 5:Internet Usage and Gross National Product Enter x data into L1
Enter y data into L2
STAT CALC menu
Choose 8: LinReg(a+bx)
1st number = x variable
2nd number = y variable
Enter
47. Learning Objective 5:Baseball Average and Team Scoring
48. Learning Objective 5:Baseball average and Team Scoring Enter x data into L1
Enter y data into L2
STAT CALC
Choose 8: LinReg(a+bx)
1st number = x variable
2nd number = y variable
Enter
49. Learning Objective 5:Cereal: Sodium and Sugar
50. Learning Objective 6:The Slope and the Correlation Correlation:
Describes the strength of the linear association between 2 variables
Does not change when the units of measurement change
Does not depend upon which variable is the response and which is the explanatory
51. Learning Objective 6:The Slope and the Correlation Slope:
Numerical value depends on the units used to measure the variables
Does not tell us whether the association is strong or weak
The two variables must be identified as response and explanatory variables
The regression equation can be used to predict values of the response variable for given values of the explanatory variable
52. Learning Objective 7:The Squared Correlation When a strong linear association exists, the regression equation predictions tend to be much better than the predictions using only
We measure the proportional reduction in error and call it, r2
53. Learning Objective 7:The Squared Correlation measures the proportion of the variation in the y-values that is accounted for by the linear relationship of y with x
A correlation of .9 means that
81% of the variation in the y-values can be explained by the explanatory variable, x
54. Chapter 3Association: Contingency, Correlation, and Regression Section 3.4
What Are Some Cautions in Analyzing Association?
55. Learning Objectives: Extrapolation
Outliers and Influential Observations
Correlations does not imply causation
Lurking variables and confounding
Simpson’s Paradox
56. Learning Objective 1:Extrapolation Extrapolation: Using a regression line to predict y-values for x-values outside the observed range of the data
Riskier the farther we move from the range of the given x-values
There is no guarantee that the relationship given by the regression equation holds outside the range of sampled x-values
57. Learning Objective 2:Outliers and Influential Points
Construct a scatterplot
Search for data points that are well outside of the trend that the remainder of the data points follow
58. Learning Objective 2:Outliers and Influential Points A regression outlier is an observation that lies far away from the trend that the rest of the data follows
An observation is influential if
Its x value is relatively low or high compared to the remainder of the data
The observation is a regression outlier
Influential observations tend to pull the regression line toward that data point and away from the rest of the data
59. Learning Objective 2:Outliers and Influential Points Impact of removing an Influential data point
60. Learning Objective 3:Correlation does not Imply Causation A strong correlation between x and y means that there is a strong linear association that exists between the two variables
A strong correlation between x and y, does not mean that x causes y
61. Data are available for all fires in Chicago last year on x = number of firefighters at the fires and y = cost of damages due to fire Would you expect the correlation to be negative, zero, or positive?
If the correlation is positive, does this mean that having more firefighters at a fire causes the damages to be worse? Yes or No
Identify a third variable that could be considered a common cause of x and y:
Distance from the fire station
Intensity of the fire
Size of the fire
62. Learning Objective 4:Lurking Variables & Confounding A lurking variable is a variable, usually unobserved, that influences the association between the variables of primary interest
Ice cream sales and drowning – lurking variable =temperature
Reading level and shoe size – lurking variable=age
Childhood obesity rate and GDP-lurking variable=time
When two explanatory variables are both associated with a response variable but are also associated with each other, there is said to be confounding
Lurking variables are not measured in the study but have the potential for confounding
63. Learning Objective 5:Simpson’s Paradox Simpson’s Paradox:
When the direction of an association between two variables changes after we include a third variable and analyze the data at separate levels of that variable
64. Learning Objective 5:Simpson’s Paradox Example
65. Learning Objective 5:Simpson’s Paradox Example
66. Learning Objective 5:Simpson’s Paradox Example