
Linear Regression

  1. Linear Regression: Least Squares Method and the Meaning of r²

  2. We are given the following ordered pairs: (1.2,1), (1.3,1.6), (1.7,2.7), (2,2), (3,1.8), (3,3), (3.8,3.3), (4,4.2). They are shown in the scatterplot below:
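
Since the original scatterplot is an image, here is a minimal Python sketch that reproduces it from the eight ordered pairs (assuming matplotlib is available; any plotting library would do):

```python
# Reproduce the scatterplot of the eight ordered pairs.
import matplotlib.pyplot as plt

xs = [1.2, 1.3, 1.7, 2.0, 3.0, 3.0, 3.8, 4.0]
ys = [1.0, 1.6, 2.7, 2.0, 1.8, 3.0, 3.3, 4.2]

plt.scatter(xs, ys)
plt.xlabel("x")
plt.ylabel("y")
plt.title("The eight data points")
plt.show()
```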

  3. Now we show the line y = ȳ. This is the mean of the y values.
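
As a quick check, the mean of these y values can be computed directly; for this data it works out to 2.45:

```python
ys = [1.0, 1.6, 2.7, 2.0, 1.8, 3.0, 3.3, 4.2]
y_bar = sum(ys) / len(ys)  # mean of the y values
print(y_bar)               # 2.45 (up to floating-point rounding)
```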

  4. The line segments show the deviations from the mean for each data point.

  5. The squares of the deviations are shown geometrically. Squaring makes each difference positive. The greater the variation in y, the larger the squares; if the y values are close together, the squares will be small.

  6. This is the geometric representation of the sum of the squares from the previous slide.
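
Summing the squared deviations is straightforward in code. This total is the quantity named SST later in the presentation, and for this data it comes to 7.6:

```python
ys = [1.0, 1.6, 2.7, 2.0, 1.8, 3.0, 3.3, 4.2]
y_bar = sum(ys) / len(ys)                 # 2.45
sq_devs = [(y - y_bar) ** 2 for y in ys]  # one "green square" per point
print(sum(sq_devs))                       # 7.6 (up to rounding)
```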

  7. Now the best fit line is shown.
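
The slides show the fitted line but not its equation. Here is a sketch of the standard closed-form simple linear regression fit (slope = Sxy/Sxx, intercept = ȳ − slope·x̄); the coefficients in the comments are what this computation yields, not values quoted on the slides:

```python
xs = [1.2, 1.3, 1.7, 2.0, 3.0, 3.0, 3.8, 4.0]
ys = [1.0, 1.6, 2.7, 2.0, 1.8, 3.0, 3.3, 4.2]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# Closed-form least squares: minimizes the sum of squared residuals.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
slope = sxy / sxx                  # ~0.78
intercept = y_bar - slope * x_bar  # ~0.50
print(slope, intercept)
```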

  8. The directed distance from each data point to the best fit line is called the residual. For each point this is the difference between the actual y value and the predicted y value. As with deviations, some residuals are positive and some are negative; together they add to zero.

  9. This graph gives a geometric representation of the squares of the residuals. As with the squares of the deviations, this produces all positive quantities.

  10. This is the geometric representation of the sum of the squares of the residuals. This quantity is minimized in the least squares method of linear regression. We use the line that produces the smallest sum of the squares of the residuals.
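
Putting slides 8 through 10 into code: with the fitted line from above (re-derived here so the snippet stands alone), the residuals sum to essentially zero, and the sum of their squares is the SSE that least squares minimizes:

```python
xs = [1.2, 1.3, 1.7, 2.0, 3.0, 3.0, 3.8, 4.0]
ys = [1.0, 1.6, 2.7, 2.0, 1.8, 3.0, 3.3, 4.2]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

# Residual = actual y minus predicted y, for each point.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(sum(residuals))                 # ~0: the residuals cancel out
print(sum(r * r for r in residuals))  # ~2.44: this is SSE
```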

  11. We now see both the squares of the deviations from the mean (green squares) and squares of residuals (red squares).

  12. This geometric representation of the sum of the squares of the residuals (in red) shows that this quantity is a portion of the total sum of the squares of the deviations (imagine the entire green square). All of the variation in y is represented by the larger green square, and the part that is not explainable by the regression equation is in red. The green square is called SST, the total sum of squares about the mean. The red square is called SSE, the sum of squares of the error about the line.

  13. If we now measure the quantities SSE and SST, we can make a useful calculation, the coefficient of determination: r² = (SST − SSE)/SST = 1 − SSE/SST. Recall that SST is the total sum of the squares of the deviations about the mean value of y, and SSE is the sum of the squares of the error (residuals) about the line. N.B. This r² is the square of the correlation coefficient.
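
A sketch of the full calculation, including a check that 1 − SSE/SST agrees with the square of the Pearson correlation coefficient; the cross-check uses the standard product-moment formula, which is not shown on the slides:

```python
import math

xs = [1.2, 1.3, 1.7, 2.0, 3.0, 3.0, 3.8, 4.0]
ys = [1.0, 1.6, 2.7, 2.0, 1.8, 3.0, 3.3, 4.2]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
sst = sum((y - y_bar) ** 2 for y in ys)       # total sum of squares

slope = sxy / sxx
intercept = y_bar - slope * x_bar
sse = sum((y - (intercept + slope * x)) ** 2  # error sum of squares
          for x, y in zip(xs, ys))

r_squared = 1 - sse / sst
r = sxy / math.sqrt(sxx * sst)   # Pearson correlation coefficient
print(r_squared, r ** 2)         # both ~0.68
```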

  14. In our example, SST = 7.6 and SSE = 2.44. Therefore r² = 1 − 2.44/7.6 ≈ 0.68. This means that 68% of the variation in y is explained by the regression line. The meaning of r² is extremely important in statistics.
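
As a final sanity check (assuming NumPy is installed), the squared Pearson correlation of the raw data reproduces the same value:

```python
# Cross-check: the squared correlation of the raw data should match
# 1 - SSE/SST computed above.
import numpy as np

xs = [1.2, 1.3, 1.7, 2.0, 3.0, 3.0, 3.8, 4.0]
ys = [1.0, 1.6, 2.7, 2.0, 1.8, 3.0, 3.3, 4.2]

r = np.corrcoef(xs, ys)[0, 1]
print(r ** 2)  # ~0.68
```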
