160 likes | 313 Views
15: Linear Regression. Expected change in Y per unit X. Introduction (p. 15.1). X = independent (explanatory) variable Y = dependent (response) variable Use instead of correlation when distribution of X is fixed by researcher (i.e., set number at each level of X )
E N D
15: Linear Regression Expected change in Y per unit X
Introduction (p. 15.1) • X = independent (explanatory) variable • Y = dependent (response) variable • Use instead of correlation when distribution of X is fixed by researcher (i.e., set number at each level of X) studying functional dependency between X and Y
Illustrative data (bicycle.sav) (p. 15.1) • Same as prior chapter • X = percent receiving reduce or free meal (RFM) • Y = percent using helmets (HELM) • n = 12 (outlier removed to study linear relation)
How formulas determine best line(p. 15.2) • Distance of points from line = residuals (dotted) • Minimizes sum of square residuals • Least squares regression line
Formulas for Least Squares Coefficients with Illustrative Data (p. 15.2 – 15.3) SPSS output:
Interpretation of Slope (b)(p. 15.3) • b = expected change in Y per unit X • Keep track of units! • Y = helmet users per 100 • X = % receiving free lunch • e.g., b of –0.54 predicts decrease of 0.54 units of Y for each unit X
Predicting Average Y • ŷ = a + bx Predicted Y = intercept + (slope)(x) HELM = 47.49 + (–0.54)(RFM) • What is predicted HELM when RFM = 50? ŷ = 47.49 + (–0.54)(50) = 20.5 Average HELM predicted to be 20.5 in neighborhood where 50% of children receive reduced or free meal • What is average Y when x = 20? ŷ = 47.49 +(–0.54)(20) = 36.7
Confidence Interval for Slope Parameter(p. 15.4) 95% confidence Interval for ß = where b = point estimate for slope tn-2,.975 = 97.5th percentile (from t table or StaTable) seb = standard error of slope estimate (formula 5) standard error of regression
Illustrative Example (bicycle.sav) • 95% confidence interval for = –0.54 ± (t10,.975)(0.1058) = –0.54 ± (2.23)(0.1058) = –0.54 ± 0.24 = (–0.78, –0.30)
Interpret 95% confidence interval Model: Point estimate for slope (b) = –0.54 Standard error of slope (seb) = 0.24 95% confidence interval for = (–0.78, –0.30) Interpretation: slope estimate = –0.54 ± 0.24 We are 95% confident the slope parameter falls between –0.78 and –0.30
Significance Test(p. 15.5) • H0: ß = 0 • tstat (formula 7) with df = n – 2 • Convert tstat to p value df =12 – 2 = 10 p = 2×area beyond tstat on t10 Use t table and StaTable
Regression ANOVA(not in Reader & NR) • SPSS also does an analysis of variance on regression model • Sum of squares of fitted values around grand mean = (ŷi – ÿ)² • Sum of squares of residuals around line = (yi– ŷi )² • Fstat provides same p value as tstat • Want to learn more about relation between ANOVA and regression? (Take regression course)
Distributional Assumptions(p. 15.5) • Linearity • Independence • Normality • Equal variance
Validity Assumptions(p. 15.6) • Data = farr1852.sav X = mean elevation above sea level Y = cholera mortality per 10,000 • Scatterplot (right) shows negative correlation • Correlation and regression computations reveal: r = -0.88 ŷ = 129.9 + (-1.33)x p = .009 • Farr used these results to support miasma theory and refute contagion theory • But data not valid (confounded by “polluted water source”)