Linear Regression: The Least-Squares Regression Model
Regression Line A regression line is a line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.
Equation of a Regression Line
• A regression line relating x to y has an equation of the form ŷ = a + bx
• ŷ (read "y hat") is the predicted value of the response variable y for a given value of the explanatory variable x
• b is the slope, the amount by which ŷ is expected to change when x increases by one unit
• a is the y-intercept, the predicted value of y when x = 0
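A minimal sketch of the prediction equation ŷ = a + bx in code (the intercept, slope, and x value below are made up for illustration):

```python
def predict(a, b, x):
    """Predicted response y-hat = a + b*x for a regression line
    with y-intercept a and slope b."""
    return a + b * x

# Hypothetical line with intercept 5 and slope 2:
print(predict(5, 2, 10))  # 5 + 2*10 = 25
print(predict(5, 2, 0))   # at x = 0 the prediction is the intercept, 5
```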
Prediction
• Interpolation is the use of a regression line to predict within the range of known observations
• Extrapolation is the use of a regression line to predict outside the range of known observations
• Predictions from extrapolation are often not accurate
Residuals
• A residual is the difference between an observed value of the response variable and the value predicted by the regression line
• Residual = observed y − predicted y
• Residual = y − ŷ
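The residual definition above, residual = y − ŷ, as a short sketch (data points and the line ŷ = 1 + 2x are hypothetical):

```python
def residuals(xs, ys, a, b):
    """Residual = observed y minus predicted y-hat = y - (a + b*x),
    one residual per data point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data and a hypothetical line y-hat = 1 + 2x:
xs = [1, 2, 3]
ys = [3.1, 4.9, 7.2]
print(residuals(xs, ys, 1, 2))  # roughly [0.1, -0.1, 0.2]
```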
Least-Squares Regression Line
• The least-squares regression line of y on x is the line that makes the sum of the squared residuals as small as possible
• Equation: ŷ = a + bx, with slope b = r(s_y / s_x) and intercept a = ȳ − b·x̄
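The least-squares slope and intercept can be computed directly from the data. A self-contained sketch (the three points below are chosen to lie exactly on y = 2x so the answer is easy to check):

```python
def least_squares(xs, ys):
    """Fit y-hat = a + b*x by minimizing the sum of squared residuals.
    Uses b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    and  a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b

# Points exactly on y = 2x, so the fit recovers intercept 0 and slope 2:
print(least_squares([1, 2, 3], [2, 4, 6]))  # (0.0, 2.0)
```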
How well does a line fit the data? Since residuals tell us how far the data fall from the regression line, they are a natural place to look to assess the fit. A residual plot is a scatterplot of the residuals against the explanatory variable.
How do residual plots help us assess the fit of the data?
• A residual plot in effect turns the regression line horizontal
• Residual plots magnify the deviations of the points from the line
• This makes it easier to see unusual observations and patterns
What we look for in residual plots
• NO obvious pattern
• A curved pattern indicates a nonlinear relationship
• A megaphone pattern indicates that the residuals grow (or shrink) as x increases
• The residuals should be relatively small; their typical size is the typical prediction error
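One useful fact behind these checks: for a least-squares fit, the residuals always sum to (essentially) zero, so any visible pattern in a residual plot points to a problem with the linear model, not with the fitting. A sketch with made-up, roughly linear data:

```python
def fit_and_residuals(xs, ys):
    """Fit the least-squares line, then return the residuals y - y_hat."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical, roughly linear data:
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
res = fit_and_residuals(xs, ys)
print(sum(res))  # near 0: least-squares residuals always sum to zero
```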
The average prediction error The standard deviation of the residuals, s = √( Σ residual² / (n − 2) ), measures the typical size of the prediction errors.
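The standard deviation of the residuals as a short sketch (the residual values are made up so the answer is easy to check by hand):

```python
from math import sqrt

def residual_std(residuals):
    """Standard deviation of residuals: s = sqrt(sum(residual^2) / (n - 2)),
    the typical size of a prediction error."""
    n = len(residuals)
    return sqrt(sum(r * r for r in residuals) / (n - 2))

# (1 + 1 + 4 + 4) / (4 - 2) = 5, so s = sqrt(5):
print(residual_std([1, -1, 2, -2]))  # about 2.236
```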
Home example
• We want to predict the price of a home in Arvada. A random sample of 10 homes for sale is taken (prices in thousands of dollars).
• Make a prediction for the cost of the 11th house if we know its square footage is 1789 ft²
Well, here is what I would do
• I would make a scatterplot
• Then I would find the least-squares regression line
• Finally, I would use the regression line to predict the cost
• Here's what I found:
• ŷ = 231.67 + .33x
• r = .87
• r² = .76
• Thus the predicted price of the home would be $353.47 thousand
So what is this r² thing?
• r² is the coefficient of determination
• Yes, I know it is r squared, but why do we bother?
More house example Now I am going to change one small thing in our house example: we don't know the size of the 11th house. What would you predict the price to be now? With no x value to plug in, the best we can do is predict the mean price, $339.8 thousand. Not as good as our last prediction, but not bad.
Explained vs. Unexplained Variability
• We would expect our linear regression model to predict the price better than the mean, but is it really that much better?
• The sum of squared prediction errors if we use the mean is 70913.6
• This is the TOTAL sum of squares, SST
• The sum of squared residuals is 16754.6
• This is the ERROR sum of squares, SSE
How SST and SSE make r²
• The ratio SSE/SST tells us the proportion of variation in y still left unexplained
• SSE/SST = 16754.6/70913.6 = .236
• Thus 23.6% of the variation is unaccounted for by our model
• Thus the proportion accounted for by our model is 1 − .236 = .764
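The r² calculation above, using the sums of squares from the house example:

```python
# r^2 = 1 - SSE/SST, with the house-example sums of squares:
SSE = 16754.6   # sum of squared residuals (error)
SST = 70913.6   # total sum of squares around the mean
r_squared = 1 - SSE / SST
print(round(r_squared, 3))  # 0.764
```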
HOLD ON Wasn't r² = .76? Yes, it was. In fact, we can always calculate r² as 1 − SSE/SST. Thus, what r² tells us is the proportion of the variability in y explained by the model.