Chapter 4, Section 2: Least-Squares Regression
Chapter 4 – Section 2 • Learning objectives • Find the least-squares regression line and use the line to make predictions • Interpret the slope and the y-intercept of the least-squares regression line • Compute the sum of squared residuals
Chapter 4 – Section 2 • If we have two variables X and Y, we often would like to model the relation as a line • Draw a line through the scatter diagram • We want to find the line that “best” describes the linear relationship … the regression line
Chapter 4 – Section 2 • We want to use a linear model • Linear models can be written in several different (equivalent) ways • y = mx + b • y – y1 = m(x – x1) • y = b1x + b0 • Because the slope and the intercept are both important to analyze, we will use y = b1x + b0
Chapter 4 – Section 2 • One difference between math and statistics is that statistics assumes that the measurements are not exact, that there is an error or residual • The formula for the residual is always Residual = Observed – Predicted • This relationship is not just for this chapter … it is the general way of defining error in statistics
Chapter 4 – Section 2 • For example, say that we want to predict a value of y for a specific value of x • Assume that we are using y = 10x + 25 as our model • To predict the value of y when x = 3, the model gives us y = 10(3) + 25 = 55, or a predicted value of 55 • Assume the actual value of y for x = 3 is equal to 50 • The actual value is 50, the predicted value is 55, so the residual (or error) is 50 – 55 = –5
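A minimal sketch of this calculation in Python (the model y = 10x + 25 and the observed value 50 come from the slide's example; the function name is just illustrative):

```python
# Residual = Observed - Predicted, using the slide's example model y = 10x + 25
def predict(x):
    """Predicted y from the example model."""
    return 10 * x + 25

observed_y = 50           # actual value of y when x = 3
predicted_y = predict(3)  # 10(3) + 25 = 55
residual = observed_y - predicted_y
print(predicted_y, residual)  # 55 -5
```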
Chapter 4 – Section 2 • What the residual is on the scatter diagram [Figure: scatter diagram with the model line; the residual is the vertical gap between the observed value of y and the predicted value of y at the x value of interest]
Chapter 4 – Section 2 • We want to minimize the residuals, but we need to define what this means • We use the method of least-squares • We consider a possible linear model • We calculate the residual for each point • We add up the squares of the residuals • The line that has the smallest sum of squared residuals is called the least-squares regression line
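In symbols (standard notation, not shown on the slide), the least-squares regression line is the choice of b0 and b1 that minimizes the sum of squared residuals:

$$
\min_{b_0,\, b_1} \; \sum_{i=1}^{n} \bigl( y_i - (b_1 x_i + b_0) \bigr)^2
$$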
Chapter 4 – Section 2 • The equation for the least-squares regression line is given by y = b1x + b0 • b1 is the slope of the least-squares regression line • b0 is the y-intercept of the least-squares regression line
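For reference, the standard closed-form formulas for these coefficients (they come from the general least-squares method, not from the slide itself) are

$$
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = r \cdot \frac{s_y}{s_x},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
$$

where r is the correlation coefficient and s_x, s_y are the sample standard deviations of x and y.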
Chapter 4 – Section 2 • Finding the values of b1 and b0 by hand is a very tedious process • You should use software for this • Finding the coefficients b1 and b0 is only the first step of a regression analysis • We need to interpret the slope b1 • We need to interpret the y-intercept b0 • We need to do quite a bit more statistical analysis … this is covered in Section 4.3 and also in Chapter 14
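As one illustration of "use software for this," here is a minimal sketch using NumPy; the data values are made up for the example, and any statistics package will report the same slope and intercept:

```python
import numpy as np

# Hypothetical (x, y) data, just to illustrate the software call
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([27.0, 52.0, 50.0, 68.0, 74.0])

# np.polyfit with degree 1 returns the least-squares slope b1 and intercept b0
b1, b0 = np.polyfit(x, y, 1)
print(f"y = {b1:.2f}x + {b0:.2f}")

# Use the fitted line to make a prediction, e.g. at x = 3
predicted = b1 * 3 + b0
```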
Chapter 4 – Section 2 • Learning objectives • Find the least-squares regression line and use the line to make predictions • Interpret the slope and the y-intercept of the least-squares regression line • Compute the sum of squared residuals
Chapter 4 – Section 2 • Interpreting the slope b1 • The slope is sometimes defined as Rise / Run • The slope is also sometimes defined as the change in y divided by the change in x (Δy / Δx) • The slope relates changes in y to changes in x
Chapter 4 – Section 2 • For example, if b1 = 4 • If x increases by 1, then y will increase by 4 • If x decreases by 1, then y will decrease by 4 • A positive linear relationship • For example, if b1 = –7 • If x increases by 1, then y will decrease by 7 • If x decreases by 1, then y will increase by 7 • A negative linear relationship
Chapter 4 – Section 2 • For example, say that a researcher studies the population in a town (the y or response variable) in each year (the x or predictor variable) • To simplify the calculations, years are measured from 1900 (i.e. x = 55 is the year 1955) • The model used is y = 300x + 12,000 • A slope of 300 means that the model predicts that, on the average, the population increases by 300 per year
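A quick worked check of this interpretation, using the slide's own model: for x = 55 (the year 1955) the model predicts y = 300(55) + 12,000 = 28,500, and for x = 56 it predicts y = 300(56) + 12,000 = 28,800, so the prediction grows by exactly the slope, 300, for each one-year increase in x.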
Chapter 4 – Section 2 • Interpreting the y-intercept b0 • Sometimes b0 has an interpretation, and sometimes not • If 0 is a reasonable value for x, then b0 can be interpreted as the value of y when x is 0 • If 0 is not a reasonable value for x, then b0 does not have an interpretation • In general, we should not use the model for values of x that are much larger or much smaller than the observed values
Chapter 4 – Section 2 • For example, say that a researcher studies the population in a town (the y or response variable) in each year (the x or predictor variable) • To simplify the calculations, years are measured from 1900 (i.e. x = 55 is the year 1955) • The model used is y = 300x + 12,000 • An intercept of 12,000 means that the model predicts that the town had a population of 12,000 in the year 1900 (i.e. when x = 0)
Chapter 4 – Section 2 • Learning objectives • Find the least-squares regression line and use the line to make predictions • Interpret the slope and the y-intercept of the least-squares regression line • Compute the sum of squared residuals
Chapter 4 – Section 2 • After finding the slope b1 and the intercept b0, it is very useful to compute the residuals, particularly the sum of squared residuals, Σ (Observed – Predicted)² • Again, this is a tedious computation by hand • Any least-squares regression software will compute this quantity • We will use it in future sections
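Continuing the earlier NumPy sketch (again with made-up data; the variable names are just illustrative), the sum of squared residuals can be computed directly from the fitted line:

```python
import numpy as np

# Hypothetical data and fitted least-squares line (see the earlier sketch)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([27.0, 52.0, 50.0, 68.0, 74.0])
b1, b0 = np.polyfit(x, y, 1)

predicted = b1 * x + b0       # predicted y for each observed x
residuals = y - predicted     # Residual = Observed - Predicted
sum_sq_residuals = np.sum(residuals ** 2)
print(sum_sq_residuals)
```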
Summary: Chapter 4 – Section 2 • We can find the least-squares regression line, which is the “best” linear model for a set of data • The slope can be interpreted as the predicted change in y for every increase of 1 in x • The intercept can be interpreted as the predicted value of y when x is 0, as long as a value of 0 for x is reasonable