Chapter 4 Regression
Regression
• Like correlation, regression addresses linear relationships between quantitative variables X & Y
• Objective of correlation: quantify the direction and strength of a linear association
• Objective of regression: derive the best-fitting line that describes the association
• We are especially interested in the slope of the line
Same illustrative data as Ch 3. Enter the data into the calculator.
Algebraic equation for a line
• Y = a + b∙X, where
• b ≡ slope ≡ change in Y per unit X
• a ≡ intercept ≡ value of Y when X = 0
Statistical Equation for a Line
ŷ = a + b∙X, where:
• ŷ ≡ predicted average of Y at a given level of X
• a ≡ intercept
• b ≡ slope
a and b are called regression coefficients.
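How do we find the equation for the best-fitting line through the scatter cloud?
Answer: We use the “least squares method”, which chooses a and b so that the sum of the squared vertical distances (residuals) between the observed Y values and the line is as small as possible.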
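These formulas derive the coefficients for the least squares regression line:
• b = r ∙ (s_Y / s_X)
• a = ȳ − b∙x̄
where r is the correlation coefficient, s_X and s_Y are the standard deviations of X and Y, and x̄ and ȳ are their means.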
Illustrative Example (GDP & Life Expectancy)
Statistics for illustrative data (calculated with TI-30XIIS)
Calculation of regression coefficients by hand:
“Least Squares” Regression Coefficients via TI-30XIIS
Key sequence: STAT > 2-VAR > DATA > STATVAR
BEWARE! The TI-30XIIS labels the slope as a and the intercept as b. This is the reverse of the notation used here (ŷ = a + b∙X), so read the calculator's a as our b and its b as our a!
Interpretation of Slope (GDP & Life Expectancy)
ŷ = 68.7 + 0.42∙X
• b = (increase in Y) / (1 unit increase in X) = 0.42 years per $1K of GDP
• Each $1K increase in GDP is associated with a 0.42-year increase in life expectancy
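A short Python check of the slope interpretation, using the example's rounded coefficients (a = 68.7, b = 0.42) and assuming X is measured in $1K units. The function name `predict` is illustrative. It shows that the predicted change for a one-unit increase in X equals b no matter where on the line you start.

```python
def predict(x, a=68.7, b=0.42):
    """Predicted life expectancy (years) at GDP x ($1K units): y-hat = a + b*X."""
    return a + b * x


# The change in y-hat for a one-unit ($1K) increase in X is the same
# everywhere on the line and equals the slope b.
for x in (10, 20, 30):
    change = predict(x + 1) - predict(x)
    print(f"X = {x} -> {x + 1}: change in y-hat = {change:.2f} years")
```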
Interpretation of Intercept
• Mathematically: the predicted value of Y when X = 0
• In the real world: has no meaningful interpretation unless a value of X = 0 is plausible (a country with a GDP of 0 is not, so a = 68.7 has no practical meaning here)
Regression Line for Prediction
• Example: What is the predicted life expectancy of a country with a GDP of 20 ($K)?
• ŷ(X = 20) = 68.7 + (0.42)(20) = 68.7 + 8.4 = 77.1 years
• The regression line always passes through (x̄, ȳ), which in this case is (21.5, 77.8)
• To draw the regression line, connect any two points on the line
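A minimal Python sketch of prediction from the example line ŷ = 68.7 + 0.42∙X, assuming GDP is in $1K units; the names `A`, `B`, and `predict` are illustrative. Because the coefficients are rounded, the prediction at x̄ = 21.5 comes out slightly below ȳ = 77.8; with unrounded least squares coefficients it would match exactly.

```python
# Rounded coefficients from the GDP & life expectancy example
A, B = 68.7, 0.42


def predict(x):
    """Predicted life expectancy (years) for GDP x, in $1K units."""
    return A + B * x


print(f"GDP = 20:   y-hat = {predict(20):.1f} years")    # 77.1
# With unrounded coefficients the line passes exactly through (x-bar, y-bar);
# with the rounded ones it lands close to y-bar = 77.8.
print(f"GDP = 21.5: y-hat = {predict(21.5):.2f} years")  # 77.73
```

Any two such points, for example (20, 77.1) and (21.5, 77.73), are enough to draw the line.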
Coefficient of Determination (r²)
Interpretation: the proportion of the variability in Y mathematically explained by X
Our example: r = .809, so r² = .809² ≈ 0.65
Interpretation: about 65% of the variability in Y (life expectancy) is mathematically explained* by X (GDP)
* mathematically explained ≠ causally explained
Cautions about linear regression
• Applies to linear relationships only
• Strongly influenced by outliers, especially when the outlier is in the X direction
• Do not extrapolate!
• Association ≠ causation (beware of lurking variables)
Outliers / Influential Points
• Outliers in the X direction have strong influence (they “tip” the line)
• Example: Child 18 is an outlier in the X direction
• Including it changes the slope substantially
(Figure: fitted lines without the outlier and with the outlier.)
Do Not Extrapolate!
• Example: Sarah's height from age 3 to 5
• Least squares regression line: ŷ = 2.32 + .159∙X
• Predict height at age 30:
• ŷ = 2.32 + .159(30) = 2.32 + 4.77 = 7.09 ft (ridiculous)
• Do NOT extrapolate beyond the range of X
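One way to enforce the rule in code is to store the observed range of X alongside the fitted line and refuse predictions outside it. This Python sketch assumes the line ŷ = 2.32 + 0.159∙X was fitted on ages 3 to 5 with height in feet; the function `predict_height` and its range check are a hypothetical design, not a standard API.

```python
def predict_height(age, a=2.32, b=0.159, x_min=3.0, x_max=5.0):
    """Predicted height (ft) from y-hat = 2.32 + 0.159*X, fitted on ages 3 to 5.

    Raises ValueError instead of extrapolating outside the observed range of X.
    """
    if not x_min <= age <= x_max:
        raise ValueError(f"age {age} is outside the observed range [{x_min}, {x_max}]")
    return a + b * age


print(f"{predict_height(4):.2f} ft")  # 2.96 ft, within the range of the data
try:
    predict_height(30)
except ValueError as err:
    print("refused:", err)
```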
“Association” is not the same as “causation”
• Lurking variable ≡ an extraneous factor (Z) that is associated with both X and Y
• Lurking variables can confound an association
• Association ≠ Causation
Example of Confounding by a Lurking Variable
• Explanatory variable X ≡ number of prior children
• Response variable Y ≡ the risk of Down's syndrome
• Lurking variable Z ≡ advanced age of the mother
• In this example, X is associated with Y but does not cause Y; Z does cause Y
(Diagram: Older mother (Z) linked to both Number of children (X) and the outcome (Y).)
Criteria used to establish causality, with examples about smoking (X) and lung cancer (Y)
• Strength of association: X & Y are strongly correlated
• Consistency of findings: many studies have shown X & Y to be correlated
• Dose-response relationship: the more you smoke, the greater the risk
• Temporality (time relation): lung cancer occurs after 10–20 years of smoking
• Biological plausibility: chemicals in cigarette smoke are mutagenic