Linear Regression
• Essentials
• Line Basics
• y = mx + b vs. $\hat{y} = b_0 + b_1 x$
• Definitions
• Scatter Plot & Regression Line
• Notation & Formulae
• Regression Considerations
• Line of Best Fit – Least Squares Line
• Example
Essentials: Regression (Predictions Based upon the Known)
• Understand what the regression process does: prediction.
• Be able to state the steps we use leading up to the decision to conduct regression.
• Be able to calculate the slope of a line and the y-intercept.
• Be able to calculate a regression equation and apply it to predict other values. Know that these predictions are estimates, not necessarily the actual values that might occur.
• Know what the Least Squares Property and the Line of Best Fit are, and what a residual is.
A Linear Equation in One Independent Variable: y = mx + b
• b is the y-intercept (the point at which the line intersects the y-axis). It is the value of y when x = 0.
• y is the dependent variable (also called the response variable). Its value depends on the value of x.
• x is the independent variable (also called the predictor variable).
• m is the slope of the line. The slope indicates how much the y-value increases (or decreases, if the slope is negative) when the x-value increases by 1 unit. When m is positive, the line slopes upward; when m is negative, it slopes downward.
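A minimal Python sketch of the line basics: the slope through two points is m = (y2 - y1) / (x2 - x1), and b then follows from y = mx + b at either point. The helper names `slope`, `intercept`, and `line` are hypothetical illustrations, not from the slides; the two points are the ones plotted on the graph slides, (-1, 4) and (-2, 2).

```python
# Hypothetical helper names for illustration; not from the slides.

def slope(p1, p2):
    """Slope m = (y2 - y1) / (x2 - x1) of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        raise ValueError("vertical line: the slope is undefined")
    return (y2 - y1) / (x2 - x1)


def intercept(m, point):
    """y-intercept b, found by solving y = m*x + b for b at a known point."""
    x, y = point
    return y - m * x


def line(m, b, x):
    """Value of y = m*x + b at x."""
    return m * x + b


m = slope((-2, 2), (-1, 4))   # (4 - 2) / (-1 - (-2)) = 2.0
b = intercept(m, (-1, 4))     # 4 - 2.0 * (-1) = 6.0
print(m, b)                   # 2.0 6.0
print(line(2, 1, 0))          # y = 2x + 1 at x = 0 gives its intercept, 1
```

Because m = 2 is positive, the line through these two points slopes upward, as described above.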
[Figure: blank coordinate plane, x-axis from -4 to 4, y-axis from -5 to 5]
[Figure: the points (-1, 4) and (-2, 2) plotted on the coordinate plane]
[Figure: the line y = 2x + 1, an example of y = mx + b, graphed on the coordinate plane]
The Regression Equation
$\hat{y} = b_0 + b_1 x$ (recall y = mx + b)
where:
• x is the independent variable (predictor variable)
• $\hat{y}$ is the dependent variable (response variable)
• $b_0$ is the y-intercept
• $b_1$ is the slope
Regression Definitions
• Regression Equation: given a collection of paired data, the regression equation $\hat{y} = b_0 + b_1 x$ algebraically describes the relationship between the two variables.
• Regression Line (line of best fit or least-squares line): the graph of the regression equation.
Always Look at a Scatterplot First
You should be able to “see” a straight line passing through the data points.
The regression line is calculated to minimize the distances of the observed values from the line (more precisely, the sum of their squared vertical distances).
Notation for the Regression Equation
Quantity | Population Parameter | Sample Statistic
y-intercept of regression equation | $\beta_0$ | $b_0$
Slope of regression equation | $\beta_1$ | $b_1$
Equation of the regression line | $y = \beta_0 + \beta_1 x$ | $\hat{y} = b_0 + b_1 x$
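Formulas for b0 and b1
Slope: $b_1 = \dfrac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$
y-intercept: $b_0 = \bar{y} - b_1 \bar{x}$
NOTE: If you do not find b1 first, then b0 may be determined by: $b_0 = \dfrac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2}$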
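The two routes to b0 should agree, which is easy to check in code. This minimal sketch (function names hypothetical) computes b1 and b0 from the sums Σx, Σy, Σx², and Σxy, using the four points from the residuals slide later in the section, whose least-squares line is ŷ = 5 + 4x.

```python
# Hypothetical function names; a sketch of the sum-based formulas above.

def regression_coefficients(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired (x, y) values")
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx ** 2
    if denom == 0:
        raise ValueError("all x-values are equal; the slope is undefined")
    b1 = (n * sxy - sx * sy) / denom
    b0 = sy / n - b1 * sx / n          # b0 = y-bar - b1 * x-bar
    return b0, b1


def intercept_direct(xs, ys):
    """b0 from the alternative formula, without computing b1 first."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (sy * sxx - sx * sxy) / (n * sxx - sx ** 2)


xs = [1, 2, 4, 5]
ys = [4, 24, 8, 32]
print(regression_coefficients(xs, ys))   # (5.0, 4.0)
print(intercept_direct(xs, ys))          # 5.0
```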
The Regression Line $\hat{y} = b_0 + b_1 x$
• Fits the sample points best.
• The sum of the squared vertical distances between this line and the sample points is at a minimum.
When Is It Reasonable to Do Regression?
Start by asking the following:
1. Does it make sense to look at the relationship between these two variables?
2. Does a scatter plot show a relationship (either positive or negative)?
3. If yes to both, calculate r (the correlation coefficient). Is the correlation statistically significant?
   • Yes: go on to regression.
   • No: the best estimate is the mean of the y variable.
4. Conduct regression analysis (if yes above).
5. Use the regression equation to calculate (estimate) a y-value for a specific x-value.
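The "is the correlation significant?" step can be sketched as follows, assuming the usual t test for H0: ρ = 0 with n - 2 degrees of freedom, t = r·√((n - 2) / (1 - r²)). The critical value 2.262 (two-tailed, α = 0.05, 9 degrees of freedom) comes from a standard t table and is supplied as an input, not derived by the code; the data are the Orion Cars sample from the example slide, and the function names are hypothetical.

```python
import math

# Hypothetical function names; a sketch of computing r and testing H0: rho = 0.

def correlation(xs, ys):
    """Linear correlation coefficient r computed from the usual sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def t_statistic(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


ages = [5, 4, 6, 5, 5, 5, 6, 6, 2, 7, 7]                 # years
prices = [85, 103, 70, 82, 89, 98, 66, 95, 169, 70, 48]  # $100s

r = correlation(ages, prices)
t = t_statistic(r, len(ages))
T_CRITICAL = 2.262          # two-tailed, alpha = 0.05, df = 9 (from a t table)
significant = abs(t) > T_CRITICAL

print(f"r = {r:.3f}, t = {t:.2f}, significant: {significant}")
# r = -0.924, t = -7.24, significant: True
```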
Predictions
In predicting a value of y based on some given value of x:
1. If there is not a significant linear correlation, the best predicted y-value is $\bar{y}$.
2. If there is a significant linear correlation, the best predicted y-value is found by substituting the x-value into the regression equation.
Predicting the Value of a Variable
Start: Calculate the value of r and test the hypothesis that $\rho = 0$.
Is there a significant linear correlation?
• Yes: Use the regression equation to make predictions. Substitute the given value in the regression equation.
• No: Given any value of one variable, the best predicted value of the other variable is its sample mean.
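The prediction flowchart can be sketched as a single function. This is a minimal illustration with hypothetical names; it compares |r| against a critical r value supplied by the caller (0.950 is the tabled value for n = 4 at α = 0.05). For the four points from the residuals slide, r is about 0.553, so the flowchart's "No" branch applies and the best prediction is ȳ.

```python
import math

# Hypothetical function names; a sketch of the flowchart's two branches.

def fit_line(xs, ys):
    """Least-squares (b0, b1) and correlation r from the usual sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b0 = sy / n - b1 * sx / n
    r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return b0, b1, r


def best_prediction(x, xs, ys, r_critical):
    """Best predicted y for a given x, following the flowchart."""
    b0, b1, r = fit_line(xs, ys)
    if abs(r) > r_critical:        # significant linear correlation
        return b0 + b1 * x         # substitute x into the regression equation
    return sum(ys) / len(ys)       # otherwise the best prediction is y-bar


xs = [1, 2, 4, 5]
ys = [4, 24, 8, 32]
print(best_prediction(3, xs, ys, r_critical=0.950))   # 17.0, the sample mean
```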
Guidelines for Using the Regression Equation
• If there is no significant linear correlation, don’t use the regression equation to make predictions.
• When using the regression equation for predictions, stay within the scope of the available sample data (don’t predict for x-values far outside the sampled range).
• A regression equation based on old data is not necessarily valid now.
• Don’t make predictions about a population that is different from the population from which the sample data were drawn.
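One way to respect the scope guideline in code is to warn whenever a prediction falls outside the range of x-values in the sample. The function name `predict_in_scope` is hypothetical, and the coefficients 195.47 and -20.26 are rounded least-squares values computed from the Orion Cars data, whose ages run from 2 to 7 years.

```python
import warnings

# Hypothetical function name; a sketch of flagging extrapolation.

def predict_in_scope(x, b0, b1, x_min, x_max):
    """Return y-hat = b0 + b1*x, warning if x is outside the sampled x-range."""
    if not x_min <= x <= x_max:
        warnings.warn(
            f"x = {x} is outside the sample range [{x_min}, {x_max}]; "
            "this prediction is an extrapolation and may be meaningless"
        )
    return b0 + b1 * x


# Rounded Orion Cars coefficients (price in $100s); sampled ages span 2 to 7 years.
B0, B1 = 195.47, -20.26
print(predict_in_scope(4.5, B0, B1, 2, 7))   # inside the range: no warning
print(predict_in_scope(10, B0, B1, 2, 7))    # warns; the predicted price is negative
```

A warning rather than an exception keeps the estimate available while making the extrapolation visible.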
Definitions
• Marginal Change: the amount a variable changes when the other variable changes by exactly one unit. For a regression equation, the slope $b_1$ is the marginal change in $\hat{y}$ when x increases by one unit.
• Outlier: a point lying far away from the other data points.
• Influential Points: points that strongly affect the graph of the regression line.
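For a regression line, the marginal change in ŷ is exactly the slope, as a one-line derivation shows:

```latex
\hat{y}(x + 1) - \hat{y}(x) = \bigl[b_0 + b_1 (x + 1)\bigr] - \bigl[b_0 + b_1 x\bigr] = b_1
```

With the Orion Cars data (slope about -20.26, price in $100s), each additional year of age lowers the predicted price by about 20.26 hundred dollars, roughly $2,026.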
Residuals and the Least-Squares Property
Definitions
• Residual: for a sample of paired (x, y) data, the difference $y - \hat{y}$ between an observed sample y-value and the value of $\hat{y}$, which is the value of y predicted by the regression equation.
• Least-Squares Property: a straight line satisfies this property if the sum of the squares of the residuals is the smallest sum possible.
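Residuals and the Least-Squares Property
Regression equation: $\hat{y} = 5 + 4x$
x | y | $\hat{y} = 5 + 4x$ | Residual $y - \hat{y}$
1 | 4 | 9 | -5
2 | 24 | 13 | 11
4 | 8 | 21 | -13
5 | 32 | 25 | 7
[Figure: scatter plot of the four points (x from 0 to 5, y from 0 to 32) with the line $\hat{y} = 5 + 4x$ and each residual marked]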
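The residuals in this example, and the least-squares property itself, can be checked with a short sketch (function names hypothetical). It computes each residual y - ŷ for ŷ = 5 + 4x, sums their squares, and then shows that nudging the intercept or slope produces a larger sum of squared residuals, as the least-squares property requires.

```python
# Hypothetical function names; a sketch of residuals and the sum of their squares.

def residuals(xs, ys, b0, b1):
    """Observed minus predicted, y - y_hat, for each (x, y) pair."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def sum_squared_residuals(xs, ys, b0, b1):
    """The quantity the least-squares line minimizes."""
    return sum(e * e for e in residuals(xs, ys, b0, b1))


xs = [1, 2, 4, 5]
ys = [4, 24, 8, 32]
print(residuals(xs, ys, 5, 4))               # [-5, 11, -13, 7]
print(sum_squared_residuals(xs, ys, 5, 4))   # 364

# Any other line does worse on the same data; each sum below exceeds 364.
for b0, b1 in [(5, 4.5), (5, 3.5), (6, 4), (4, 4)]:
    print(b0, b1, sum_squared_residuals(xs, ys, b0, b1))
```

The residuals of the least-squares line also sum to zero (-5 + 11 - 13 + 7 = 0).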
Example: Orion Cars
• Orion Cars: The age and price for a sample of 11 Orions are noted below. Calculate a correlation coefficient and, if appropriate, a regression equation for the relationship. Determine the value of cars that are 4.5 years and 10 years old.
Car | Age (yrs.) | Price ($100s)
1 | 5 | 85
2 | 4 | 103
3 | 6 | 70
4 | 5 | 82
5 | 5 | 89
6 | 5 | 98
7 | 6 | 66
8 | 6 | 95
9 | 2 | 169
10 | 7 | 70
11 | 7 | 48
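The whole Orion Cars calculation fits in a short script. This is a minimal sketch that applies the sum-based formulas for r, b1, and b0 to the table, then substitutes the two requested ages; variable names are illustrative.

```python
import math

# Orion Cars sample from the table (variable names are illustrative).
ages = [5, 4, 6, 5, 5, 5, 6, 6, 2, 7, 7]                 # years
prices = [85, 103, 70, 82, 89, 98, 66, 95, 169, 70, 48]  # $100s

n = len(ages)
sx, sy = sum(ages), sum(prices)
sxx = sum(x * x for x in ages)
syy = sum(y * y for y in prices)
sxy = sum(x * y for x, y in zip(ages, prices))

# Correlation coefficient and least-squares coefficients
r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
b0 = sy / n - b1 * sx / n

print(f"r = {r:.3f}")                       # r = -0.924
print(f"y-hat = {b0:.2f} + ({b1:.2f})x")    # y-hat = 195.47 + (-20.26)x

for age in (4.5, 10):
    print(f"age {age}: predicted price {b0 + b1 * age:.2f} ($100s)")
```

Since |r| ≈ 0.924 exceeds the critical value 0.602 for n = 11 (α = 0.05), regression is appropriate. The 4.5-year estimate, about 104.29 ($100s) or roughly $10,429, lies inside the sampled ages (2 to 7 years). The 10-year estimate comes out negative (about -7.14), which illustrates why predictions should stay within the scope of the sample data.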
Example: Orion Cars (Price in thousands)
[Figure: Orion Cars regression line fitted with and without the influential point (price in thousands)]
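The effect of an influential point can be sketched by refitting without it. The slide does not say which car is the influential point; this sketch assumes it is car 9 (2 years, 169), which lies far from the other cars in both age and price. The function name `fit` is hypothetical.

```python
# Hypothetical function name; assumption: the influential point is car 9.

def fit(xs, ys):
    """Least-squares (b0, b1) for y-hat = b0 + b1*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b0 = sy / n - b1 * sx / n
    return b0, b1


ages = [5, 4, 6, 5, 5, 5, 6, 6, 2, 7, 7]                 # years
prices = [85, 103, 70, 82, 89, 98, 66, 95, 169, 70, 48]  # $100s

# Assumed influential point: car 9 (age 2, price 169), at list index 8.
keep = [i for i in range(len(ages)) if i != 8]
with_point = fit(ages, prices)
without_point = fit([ages[i] for i in keep], [prices[i] for i in keep])

print("with:    y-hat = %.2f + (%.2f)x" % with_point)     # 195.47 + (-20.26)x
print("without: y-hat = %.2f + (%.2f)x" % without_point)  # 160.33 + (-14.24)x
```

Dropping that single car changes the slope from about -20.26 to about -14.24, which is what makes it influential.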