Chapter 5. Regression
Regression
• Like correlation, regression addresses the relationship between a quantitative explanatory variable (X) and a quantitative response variable (Y).
• The objective of regression is to describe the best-fitting line through the data.
• As with correlation, start by looking at the data with a scatterplot.
Same data as last week
Inspect scatterplot for linearity
The Regression Line
The regression line predicts values of Y with this equation (the “regression model”): ŷ = a + b∙X
where:
ŷ ≡ predicted value of Y at a given X
a ≡ intercept
b ≡ slope
a and b are called regression coefficients.
Calculation of slope & intercept
Example: calculation of regression coefficients
Last week we calculated:
Therefore: ŷ = a + b∙X = 68.716 + 0.420∙X
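The slope and intercept formulas themselves do not appear in this transcript. The sketch below assumes the usual least-squares forms, b = r∙(s_Y / s_X) and a = ȳ − b∙x̄. The function name `fit_line` and the data are hypothetical and are not the course data.

```python
from statistics import mean, stdev

def fit_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x, assuming b = r*s_y/s_x and a = y_bar - b*x_bar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 denominator)
    # Pearson r: sum of products of z-scores, divided by n - 1
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)
    b = r * s_y / s_x        # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

# Hypothetical data lying exactly on the line y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
print(round(a, 6), round(b, 6))
```

Because the hypothetical points fall exactly on a line, the sketch should recover an intercept of about 1 and a slope of about 2.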
Regression Coefficients by Calculator
This course supports the TI-30XIIS. Other calculators are acceptable but are not supported by the instructor.
BEWARE! The TI-30XIIS mislabels the slope & intercept. The slope is mislabeled as a and the intercept is mislabeled as b. It should be the other way around!
Interpretation of Slope b
• The slope is the predicted change in Y per one-unit increase in X.
• Example: ŷ = 68.7 + 0.42∙X
• The slope = 0.42. Each unit increase in X (GDP) is associated with a 0.42 increase in predicted Y (life expectancy).
Interpretation: Intercept a
• The intercept is where the line crosses the Y-axis (the predicted Y when X = 0).
• Example: ŷ = 68.7 + 0.42∙X
• The intercept = 68.7.
• We do NOT normally interpret the intercept, because X = 0 often lies outside the range of the data.
Regression Line for Prediction
• Use the regression equation to predict Y given X
• Example: ŷ = 68.7 + (0.420)X
• What is the predicted life expectancy in a country with a GDP of 20.0?
ŷ = a + bX = 68.7 + (0.420)(20.0) = 77.1 (77.12 if the unrounded intercept 68.716 is used)
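The prediction above, and the reading of the slope as change in ŷ per unit of X, can be checked with a short Python sketch. The coefficients are the rounded values from the slides, and the function name `predict` is a hypothetical choice.

```python
# Coefficients as rounded on the slides
A = 68.7    # intercept a
B = 0.420   # slope b

def predict(x, a=A, b=B):
    """Predicted Y (life expectancy) at a given X (GDP): y-hat = a + b*X."""
    return a + b * x

y_hat = predict(20.0)
print(f"{y_hat:.1f}")

# The slope is the change in predicted Y per one-unit increase in X
step = predict(21.0) - predict(20.0)
print(f"{step:.2f}")
```

With the rounded intercept 68.7 this prints 77.1 for the prediction and 0.42 for the one-unit step. Substituting the unrounded intercept 68.716 gives 77.12.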
Coefficient of Determination
Denoted r² (the square of r)
Interpretation: fraction of the variation in Y “explained” by X
Illustration: Our example showed r = 0.809. Therefore, r² = 0.809² ≈ 0.65.
Interpretation: about 65% of the variation in Y (life expectancy) is mathematically “explained” by X (GDP).
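To see why r² is read as a fraction of variation "explained", the Python sketch below computes it two ways on hypothetical data (not the course data): by squaring r, and as 1 minus the residual variation divided by the total variation in Y. It uses the equivalent least-squares slope S_xy / S_xx; the variable names are illustrative.

```python
from statistics import mean

# Hypothetical data, not the course data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

x_bar, y_bar = mean(xs), mean(ys)
s_xx = sum((x - x_bar) ** 2 for x in xs)   # variation in X
s_yy = sum((y - y_bar) ** 2 for y in ys)   # total variation in Y
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

r = s_xy / (s_xx * s_yy) ** 0.5   # Pearson correlation
b = s_xy / s_xx                   # least-squares slope
a = y_bar - b * x_bar             # least-squares intercept

# Variation in Y left over after fitting the line
ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
explained = 1 - ss_resid / s_yy

print(f"r^2 = {r ** 2:.4f}, explained fraction = {explained:.4f}")
```

The two printed values should agree up to floating-point error, which is the sense in which X "explains" a share of the variation in Y.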
Cautions about regression
• Linear relationships only (see prior lecture)
• Influenced by outliers
• Should not be extrapolated beyond the observed range of X
• Association is not equal to causation! (Beware of lurking variables.)
Outliers and Influential Points
• An outlier is an observation that lies far from the regression line
• Outliers in the Y direction have large residuals
• Outliers in the X direction are often influential: removing them can change the regression line markedly
Example: Influential Outlier (Gesell Adaptive Score and “First Word”)
[Figure: regression line for all data compared with the line after removing child 18]
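The effect of a single influential point can be sketched in Python by fitting the line with and without it. The data below are hypothetical and are not the Gesell data, and `slope_intercept` is an illustrative helper.

```python
from statistics import mean

def slope_intercept(points):
    """Least-squares (a, b) for y-hat = a + b*x from a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

# Hypothetical points on y = 2x, plus one point far out in the X direction
base = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0), (5.0, 10.0)]
outlier = (20.0, 5.0)

a_all, b_all = slope_intercept(base + [outlier])
a_without, b_without = slope_intercept(base)

print(f"all data:        slope = {b_all:.3f}")
print(f"outlier removed: slope = {b_without:.3f}")
```

In this made-up example one point far out in X pulls the slope from 2 down to nearly 0, which is what "influential" means here.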
Extrapolation
• Extrapolation is the use of the regression equation for predictions outside the observed range of the explanatory variable X
• Do NOT extrapolate!
• See next slide
Example: extrapolation (Sarah’s height)
• Figure: Sarah’s height from age 36 to 60 months (3 to 5 years)
• Regression model: ŷ = 72 + 0.4∙X
• To predict Sarah’s height at 42 months: ŷ = 72 + 0.4(42) = 88.8 cm ≈ 35” (~3’)
Example: Extrapolation
• Do NOT use the regression model to predict Sarah’s height at age 360 months (30 years)!
• ŷ = 72 + 0.4∙X = 72 + 0.4(360) = 216 cm = more than 7’ tall (clearly ridiculous)
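One way to enforce the "do not extrapolate" rule in code is to refuse predictions outside the observed range of X. The Python sketch below uses the model and the age range from the Sarah example. The function name and the choice to raise an error are assumptions made for illustration.

```python
AGE_MIN, AGE_MAX = 36, 60  # months covered by the data in the example

def predict_height_cm(age_months):
    """Predicted height from y-hat = 72 + 0.4*X, only inside the observed age range."""
    if not AGE_MIN <= age_months <= AGE_MAX:
        raise ValueError(
            f"age {age_months} months is outside the observed range "
            f"{AGE_MIN}-{AGE_MAX}; refusing to extrapolate"
        )
    return 72 + 0.4 * age_months

print(f"{predict_height_cm(42):.1f}")   # inside the range: fine

try:
    predict_height_cm(360)              # 30 years: far outside the range
except ValueError as err:
    print(err)
```

Raising an error is only one possible design. Returning the value together with a warning would also work, as long as the caller cannot mistake an extrapolated value for a supported prediction.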
Association does not imply causation
Even strong correlations may be non-causal.
See pp. 144–145 for examples!
Association does not imply causation
Criteria to establish causation (pp. 144–146):
• Strength of relationship
• Experimentation
• Consistency
• Dose-response
• Temporality
• Plausibility