Regression
• Correlation measures the strength of the linear relationship
• Great! But what is that relationship? How do we describe it?
• regression, regression line, regression equation
• Regression line is used for prediction

Predicting weights from heights
• Independent variable: height
• Dependent variable: weight
• How can we predict one from the other?
• Regression is to a scatter plot as the mean is to a histogram.

[Figure: Salary by years employed. Scatter plot of SALARY (vertical axis) against YRS EM (horizontal axis).]

Regression by local averages
• Approximation of local averages by regression line
• Inappropriate use of regression line (use other methods)

The equation of a line
• y = a + bx
• a represents the y-intercept
  • when x equals zero, y equals a
  • Is this always meaningful in the context of a problem?
  • Is it always useful in defining a line?
• b represents the slope of the line (rise/run)
  • for every unit change in x, y changes by b
  • Does this mean that if we physically change x by one unit, y will change by b units? Say we gain another year of experience. Will our salary go up by 1107?

Regression equation
• What is the predicted weight of somebody whose height is h cm?
• w = intercept + slope × h
• This is known as the regression equation.
• How do we get this formula?
• We have a statistical model

Regression line by minimising residual errors
[Figure: a residual]
• e_i = error of the i-th observation from the regression line
• The best candidate line will minimise these errors
• No line can make all errors vanish (some positive, some negative)

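One common way to make "minimise these errors" concrete is least squares: choose the intercept and slope that minimise the sum of the squared errors. The Python sketch below assumes that criterion. The function name `fit_line` and the height and weight values are hypothetical, made up only for illustration.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data, for illustration only: heights (cm) and weights (kg).
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 75, 86]

a, b = fit_line(heights, weights)

# residual = observed - predicted, one per observation.
residuals = [w - (a + b * h) for h, w in zip(heights, weights)]
print(a, b)
print(residuals)
```

With these made-up numbers some residuals are positive and some negative, and they sum to zero (up to rounding), which is why the errors cannot simply be added up and minimised without squaring them first.
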
Regression and correlation
• Want to predict weight for those people who are 1 SD more than avg. height.
• SD line says:
  • Predicted wt. = overall avg. wt. + SD of wt.
• Regression line says:
  • Predicted wt. = overall avg. wt. + r × SD of wt.
• For people who are k SDs away from avg. height:
  • Predicted wt. = overall avg. wt. + r × k × SD of wt.
• Clearly valid for r = 0 or r = 1

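The rule on this slide can be tried out numerically. The Python sketch below uses made-up heights and weights (hypothetical values, as are the function names) and computes r as the average product of the two variables in standard units. At r = 0 the prediction is just the overall average weight, and at r = 1 it coincides with the SD line.

```python
from statistics import mean, pstdev

# Hypothetical data, for illustration only: heights (cm) and weights (kg).
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 75, 86]

avg_h, sd_h = mean(heights), pstdev(heights)
avg_w, sd_w = mean(weights), pstdev(weights)

# Correlation r: the average product of the two variables in standard units.
r = mean([((h - avg_h) / sd_h) * ((w - avg_w) / sd_w)
          for h, w in zip(heights, weights)])


def regression_prediction(k, r=r):
    """Predicted weight for someone k SDs above average height."""
    return avg_w + r * k * sd_w


def sd_line_prediction(k):
    """Weight on the SD line for someone k SDs above average height."""
    return avg_w + k * sd_w


print(r)
print(regression_prediction(1), sd_line_prediction(1))
```

Because r is below 1 for these numbers, the regression prediction for someone 1 SD above average height is less than 1 SD above average weight.
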
RMS error of regression
• RMS error = √(1 − r²) × SD of y
• RMS error is inversely related to correlation: the closer r is to ±1, the smaller the RMS error
• RMS error is to regression what SD is to average

Residuals
• residual = observed − predicted

Example: ozone vs. temperature
> air[,c(1,3)]
  ozone temperature
   3.45          67
   3.30          72
   2.29          74
   2.62          62
   2.84          65
   . . .
> cor(ozone,temperature)
[1] 0.7531038

Fitting a regression model in S
> ozone.lm <- lm(ozone ~ temperature, data = air)
Coefficients:
              Value Std. Error t value Pr(>|t|)
(Intercept)   -2.23       0.46   -4.82   0.0000
temperature    0.07       0.01   11.95   0.0000
Multiple R-Squared: 0.5672
> var(ozone)
[1] 0.7928069
> var(resid(ozone.lm))
[1] 0.3431544
> cor(ozone,temperature)
[1] 0.7531038

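The three numbers printed above are tied together. For a simple linear regression, R-squared equals r², which in turn equals 1 − var(residuals) / var(y), and the spread of the residuals is √(1 − r²) times the spread of y. The Python sketch below only re-checks that arithmetic with the values copied from the S output. The variable names are chosen here for illustration.

```python
from math import sqrt

# Values copied from the S output above.
var_ozone = 0.7928069   # var(ozone)
var_resid = 0.3431544   # var(resid(ozone.lm))
r = 0.7531038           # cor(ozone, temperature)

# Two routes to R-squared: squared correlation, and share of variance explained.
r_squared = r ** 2
explained = 1 - var_resid / var_ozone

# Spread of the residuals, directly and via sqrt(1 - r^2) * SD of y.
resid_sd = sqrt(var_resid)
resid_sd_from_r = sqrt(1 - r_squared) * sqrt(var_ozone)

print(round(r_squared, 4), round(explained, 4))
print(round(resid_sd, 4), round(resid_sd_from_r, 4))
```

Both routes agree with the reported Multiple R-Squared of 0.5672 after rounding. The same divisor is used in both variances, so the ratio does not depend on whether that divisor is n or n − 1.
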
Checking model appropriateness
• What assumptions have we made in the regression model?

Checking model assumptions in S-plus
> par(mfrow=c(2,3))
> plot(ozone.lm)

Extrapolation
Beware of extrapolation
Pizza party at the Frat.
• How many laps would you predict a pledge could run if he ate 6 slices of pizza?
• How many laps if he ate 9 slices of pizza?
• A pledge shows off and eats 35 slices of pizza. How many laps would you predict he would run?

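One way to guard against extrapolation in code is to refuse predictions outside the range of x values the line was fitted on. The Python sketch below does this. The pizza data behind the slide is not available here, so the fitted line (laps = 20 − 1.5 × slices) and the observed range of 1 to 10 slices are hypothetical, as is the function name `predict`.

```python
def predict(x, intercept, slope, x_min, x_max):
    """Predict y from the line, refusing to extrapolate outside [x_min, x_max]."""
    if not x_min <= x <= x_max:
        raise ValueError(
            f"x={x} is outside the observed range [{x_min}, {x_max}]"
        )
    return intercept + slope * x


# Hypothetical fitted line and observed range, for illustration only.
INTERCEPT, SLOPE = 20.0, -1.5      # laps = 20 - 1.5 * slices
SLICES_MIN, SLICES_MAX = 1, 10     # slices actually seen in the (made-up) data

laps_6 = predict(6, INTERCEPT, SLOPE, SLICES_MIN, SLICES_MAX)
laps_9 = predict(9, INTERCEPT, SLOPE, SLICES_MIN, SLICES_MAX)
print(laps_6, laps_9)

# Plugging 35 slices straight into the line gives a negative number of laps,
# which is why the guarded function raises instead of answering.
naive_35 = INTERCEPT + SLOPE * 35
print(naive_35)
```

Under these assumed numbers, 6 and 9 slices fall inside the observed range and get sensible predictions, while 35 slices would give a meaningless negative lap count if the line were applied blindly.
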