
Chapter 4

Presentation Transcript


  1. Chapter 4 Regression

  2. Regression • Like correlation, regression addresses linear relationships between quantitative variables X and Y • Objective of correlation → quantify the direction and strength of the linear association • Objective of regression → derive the best-fitting line that describes the association • We are especially interested in the slope of the line

  3. Same illustrative data as Ch 3 • Enter data into calculator

  4. Algebraic equation for a line • y = a + b∙X, where • b ≡ slope ≡ change in Y per unit X • a ≡ intercept ≡ value of Y when X = 0

  5. Statistical Equation for a Line • ŷ = a + b∙X, where: • ŷ ≡ predicted average of Y at a given level of X • a ≡ intercept • b ≡ slope • a and b are called regression coefficients

  6. How do we find the equation for the best fitting line through the scatter cloud? Ans: We use the “least squares method”
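
As a sketch of what "best fitting" means here, in the notation of the slides (the symbol SSE and the index i are introduced for illustration and are not in the original): the least squares line is the choice of a and b that makes the sum of squared vertical distances between the observed points and the line as small as possible.

```latex
\[
(a, b) \;=\; \arg\min_{a,\,b} \; \mathrm{SSE}(a, b),
\qquad
\mathrm{SSE}(a, b) \;=\; \sum_{i=1}^{n} \bigl( y_i - (a + b\,x_i) \bigr)^2
\]
```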

  7. These formulas derive the coefficients for the least squares regression line
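
The formulas themselves are not reproduced in this transcript. A common textbook form, assumed here rather than taken from the slide, is b = r∙(s_Y / s_X) for the slope and a = ȳ − b∙x̄ for the intercept. The sketch below applies that form to a small made-up data set (hypothetical numbers, not the GDP data); the function names are illustrative.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired samples."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    n = len(xs)
    # Sum of products of z-scores, divided by n - 1 to match the sample SDs.
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y
               for x, y in zip(xs, ys)) / (n - 1)


def least_squares_line(xs, ys):
    """Return (a, b) for the line y-hat = a + b*X."""
    r = correlation(xs, ys)
    b = r * stdev(ys) / stdev(xs)   # slope (assumed formula: b = r * s_y / s_x)
    a = mean(ys) - b * mean(xs)     # intercept (assumed formula: a = y-bar - b * x-bar)
    return a, b


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}*X")
```

Computing the slope from r and the two standard deviations mirrors a hand calculation from summary statistics; it gives the same line as minimizing the squared residuals directly.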

  8. Illustrative Example (GDP & Life Expectancy) • Statistics for illustrative data (calculated with TI-30XIIS) • Calculation of regression coefficients by hand:

  9. “Least Squares” Regression Coefficients via TI-30XIIS • STAT > 2-VAR > DATA > STATVAR • BEWARE! The TI-30XIIS labels the slope as a and the intercept as b, the reverse of the notation used here (ŷ = a + b∙X). Swap the two when reading the calculator's output.

  10. Interpretation of Slope (GDP & Life Expectancy) • ŷ = 68.7 + 0.42∙X • Each $1K increase in GDP is associated with a 0.42-year increase in life expectancy • b = increase in Y per unit X = 0.42 years per 1 unit of X

  11. Interpretation of Intercept • Mathematically: the predicted value of Y when X = 0 • In the real world: has no interpretation unless X = 0 is a plausible value

  12. Regression Line for Prediction • Example: What is the predicted life expectancy of a country with a GDP of 20? • ŷ at X = 20: 68.7 + (0.42)X = 68.7 + (0.42)(20) = 77.1 years • The regression line always goes through (x̄, ȳ), which in this case is (21.5, 77.8) • To draw the regression line, connect any two points on the line
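
The prediction step can be sketched as a one-line function. The coefficients are the rounded values shown on the slides; the function name and the assumption that GDP is measured in thousands of dollars (as in the slope interpretation) are illustrative.

```python
A = 68.7   # intercept from the slides (rounded)
B = 0.42   # slope from the slides (rounded): years per $1K of GDP


def predict_life_expectancy(gdp):
    """Predicted life expectancy in years at a given GDP (in $1K)."""
    return A + B * gdp


print(round(predict_life_expectancy(20), 2))    # country with a GDP of 20
print(round(predict_life_expectancy(21.5), 2))  # at x-bar = 21.5
```

With these rounded coefficients the prediction at X = 20 is about 77.1, and the prediction at x̄ = 21.5 is about 77.7, which matches ȳ = 77.8 only up to rounding of a and b.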

  13. Coefficient of Determination r² • Interpretation: proportion of the variability in Y mathematically explained by X • Our example: r = 0.809, so r² = 0.809² ≈ 0.65 • Interpretation: about 65% of the variability in Y (life expectancy) is mathematically explained* by X (GDP) • * mathematically explained ≠ causally explained
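
To make "proportion of variability explained" concrete, the sketch below computes r² as 1 − (residual sum of squares) / (total sum of squares) for a least squares line. The data are hypothetical and the function name is illustrative. The last line simply squares the r reported on the slide.

```python
from statistics import mean


def r_squared(xs, ys):
    """Proportion of the variability in Y explained by the least squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx                  # least squares slope
    a = y_bar - b * x_bar            # least squares intercept
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_resid / ss_total


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(r_squared(xs, ys), 3))

# Squaring the correlation reported on the slide (r = 0.809):
print(round(0.809 ** 2, 3))
```

For a simple linear regression this quantity equals the square of the correlation coefficient, which is why the slide obtains it by squaring r.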

  14. Cautions about linear regression • Applies to linear relationships only • Strongly influenced by outliers, especially when the outlier is in the X direction • Do not extrapolate! • Association ≠ causation (beware of lurking variables)

  15. Outliers / Influential Points • Outliers in the X direction have strong influence (tip the line) • Example (right) • Child 18 = outlier in X direction • Changes the slope substantially (figure panels: without outlier, with outlier)
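
The "tipping" effect can be sketched with made-up numbers (these are hypothetical, not the children's data from the figure): fit the slope once on the original points and once after adding a single point far out in the X direction.

```python
from statistics import mean


def slope(xs, ys):
    """Least squares slope of ys on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# One extra point far to the right that does not follow the trend.
xs_out = xs + [15]
ys_out = ys + [3]

print(round(slope(xs, ys), 3))          # slope without the outlier
print(round(slope(xs_out, ys_out), 3))  # slope with the outlier
```

In this example a single point is enough to pull the slope from clearly positive to slightly negative, because points far from x̄ carry the most weight in the slope calculation.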

  16. Do Not Extrapolate! • Example (right): Sarah’s height from age 3 to 5 • Least squares regression line: ŷ = 2.32 + 0.159∙X • Predict height at age 30 • ŷ = 2.32 + 0.159(30) = 7.09 ft (ridiculous) • → Do NOT extrapolate beyond the range of X
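
One way to build this caution into code is to have the prediction function refuse X values outside the observed range unless explicitly told otherwise. This is a minimal sketch: the coefficients are those printed on the slide, the units (feet, years) are an assumption based on the ′ mark, and the function and parameter names are hypothetical.

```python
A, B = 2.32, 0.159     # coefficients from the slide (assumed: height in feet, age in years)
X_MIN, X_MAX = 3, 5    # range of ages actually observed


def predict_height(age, allow_extrapolation=False):
    """Predicted height; refuses ages outside the observed range by default."""
    if not allow_extrapolation and not (X_MIN <= age <= X_MAX):
        raise ValueError(f"age {age} is outside the observed range {X_MIN}-{X_MAX}")
    return A + B * age


print(round(predict_height(4), 2))                             # inside the data range
print(round(predict_height(30, allow_extrapolation=True), 2))  # extrapolated far beyond it
```

Evaluated with the coefficients as printed, the line gives about 7.09 at age 30, an implausible adult height, because growth between ages 3 and 5 says nothing about growth decades later.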

  17. “Association” is not the same as “causation” • Lurking variable ≡ an extraneous factor (Z) that is associated with both X and Y • Lurking variables can confound an association • Association ≠ Causation

  18. Example of Confounding by a Lurking Variable • Explanatory variable X ≡ number of prior children • Response variable Y ≡ the risk of Down’s syndrome • Lurking variable Z ≡ advanced age of mother • X is associated with Y, but does not cause Y in this example • Z does cause Y • (Diagram labels: Number of children, Mental retardation, Older mother)
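
A small numerical sketch of confounding, using hypothetical numbers constructed so that Z drives both X and Y (they are not real data on births). The check used here, correlating what is left of X and Y after a least squares fit on Z, is one common way to illustrate the idea and is not taken from the slides.

```python
from math import sqrt
from statistics import mean


def corr(xs, ys):
    """Pearson correlation coefficient."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


def residuals(ys, zs):
    """What is left of ys after a least squares fit on zs."""
    z_bar, y_bar = mean(zs), mean(ys)
    b = (sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
         / sum((z - z_bar) ** 2 for z in zs))
    a = y_bar - b * z_bar
    return [y - (a + b * z) for z, y in zip(zs, ys)]


# Hypothetical values: X and Y are each Z plus unrelated noise.
z = [0, 1, 2, 3, 4, 5, 6, 7]
x = [1, 0, 1, 4, 5, 4, 5, 8]
y = [1, 2, 1, 2, 3, 4, 7, 8]

r_xy = corr(x, y)                                        # raw association
r_xy_given_z = corr(residuals(x, z), residuals(y, z))    # association after removing Z
print(round(r_xy, 2), round(r_xy_given_z, 2))
```

In this constructed example X and Y are strongly correlated overall, yet the association disappears once the part of each that is explained by Z is removed, which is the pattern a lurking variable produces.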

  19. Criteria used to establish causality, with examples about smoking (X) and lung cancer (Y) • Strength of association: X & Y are strongly correlated • Consistency of findings: many studies have shown X & Y to be correlated • Dose-response relationship: the more you smoke, the more you increase your risk • Temporality (time relation): lung cancer occurs after 10–20 years of smoking • Biological plausibility: chemicals in cigarette smoke are mutagenic
