
Least Squares Regression


hicksb


Presentation Transcript


  1. Least Squares Regression Fitting a Line to Bivariate Data

  2. Automating Least Squares Line and Related Calculations
     • Excel:
       • In text: see p. 200-201
       • Excel has extensive capabilities related to least squares lines; in the Excel help search line, type terms such as slope, intercept, trendline, and regression for more information.
     • Statcrunch:
       • In the left panel of our class webpage http://www.stat.ncsu.edu/people/reiland/courses/st311/ click on Student Resources; in “Statcrunch Instructional Videos” see “Scatterplots and Regression”; in “Many Statcrunch Instructional Videos” see videos 16, 19, 24, 48, and 48 (these numbers may change as more videos are added to this YouTube site).
     • TI calculator:
       • In the left panel of our class webpage click on Student Resources; under “Graphing Calculators, Online Calculations”, either
         • click on TI Graphing Calculator Guide and see p. 7-9, or
         • click on Online Graphing Calculator Tutorials

  3. Linear Relationships: Food for Thought • Avg. occupants per car: 1980: 6/car; 1990: 3/car; 2000: 1.5/car • By the year 2010 every fourth car will have nobody in it! • What kind of mathematical relationship is there between year and avg. no. of occupants per car? • Why might the relationship break down by 2010?

  4. Basic Terminology • Scatterplots, correlation: interested in association between 2 variables (assign x and y arbitrarily) • Least squares regression: does one quantitative variable explain or cause changes in another variable?

  5. Basic Terminology (cont.) • Explanatory variable: explains or causes changes in the other variable; the x-variable (independent variable). • Response variable: the y-variable; it responds to changes in the x-variable (dependent variable).

  6. Examples • Fertilizer (x) → corn yield (y) • Advertising $ (x) → store income (y) • Drug dose (x) → blood pressure (y) • Daily temperature (x) → natural gas demand (y) • Change in minimum wage (x) → unemployment rate (y)

  7. Simplest Relationship • Simplest equation that describes the dependence of variable y on variable x: y = b0 + b1x • linear equation • graph is a line with slope b1 and y-intercept b0

  8. Graph of y = b0 + b1x [Figure: line with y-intercept b0 and slope b1 = rise/run]

  9. Notation • (x1, y1), (x2, y2), . . . , (xn, yn) • Draw the line y = b0 + b1x through the scatterplot; the point on the line corresponding to xi is ŷi = b0 + b1xi

  10. Observed y, Predicted y • Predicted y when x = 2.7: ŷ = b0 + b1x = b0 + b1(2.7)

  11. Scatterplot: Fuel Consumption vs Car Weight “Best” line?

  12. Scatterplot with least squares prediction line

  13. How do we draw the line? Residuals

  14. Residuals: graphically

  15. Criterion for choosing what line to draw: method of least squares • The method of least squares chooses the line that makes the sum of squares of the residuals as small as possible • This line has the slope b1 and intercept b0 that minimize the sum of squared residuals: SSE = Σ (yi − ŷi)² = Σ (yi − b0 − b1xi)²
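The criterion on slide 15 can be tried out numerically. The sketch below is only an illustration: it borrows the car weight and fuel consumption pairs listed on slide 31, and the three candidate lines are hypothetical values picked for the comparison, not lines from the presentation. It computes the sum of squared residuals for each candidate and reports the smallest.

```python
# Car weight (x, in 1000s of lb) and fuel consumption (y) pairs, copied from slide 31.
xs = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
ys = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def sum_squared_residuals(b0, b1, xs, ys):
    """Sum of (observed y - predicted y)^2 for the line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical candidate lines as (intercept, slope); chosen for illustration only.
candidates = [(-0.36, 1.64), (0.0, 1.5), (-1.0, 1.9)]

scores = {line: sum_squared_residuals(line[0], line[1], xs, ys) for line in candidates}
best_line = min(scores, key=scores.get)

for (b0, b1), score in scores.items():
    print(f"b0 = {b0:5.2f}, b1 = {b1:4.2f}: sum of squared residuals = {score:.3f}")
print("smallest sum of squared residuals:", best_line)
```

Among these three candidates the first line wins; the least squares line is the one that would win against every possible candidate.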

  16. Least Squares Line y = b0 + b1x: Slope b1 and Intercept b0
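The formulas shown on slide 16 did not survive in this transcript. As a stand-in, the standard textbook expressions for the least squares slope and intercept are given below, written in the b0, b1 notation of slide 7; this is an assumption about what the slide displayed, not a copy of it.

```latex
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```

The second formula is why the least squares line always passes through the point of means: substituting x = x̄ gives ŷ = b0 + b1x̄ = ȳ.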

  17. Example: Income vs Consumption Expenditure

  18. Questions • Construct scatterplot; determine if linear model is appropriate. If so … • … find the least squares prediction line • Estimate consumption expenditure in a household with an income of (i) $6,000 (ii) $25,000. Comfortable with estimates? • Compute the residuals

  19. Scatterplot

  20. Solution

  21. Calculations

  22. least squares prediction line

  23. Least Squares Prediction Line

  24. Consumption Expenditure Prediction When x = $6,000 (x = 6): predicted y = 7.4

  25. Consumption Expenditure Prediction When x = $25,000 (x = 25): predicted y = 11.2
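The two predictions on slides 24 and 25 can be reproduced with a short sketch. It assumes the fitted line is ŷ = 6.2 + 0.2x, using the intercept and slope that appear in the slide 30 calculation, with income x measured in thousands of dollars; the function name is hypothetical.

```python
# Intercept and slope as used in the slide 30 calculation (assumed to be the fitted line).
B0 = 6.2
B1 = 0.2


def predict_expenditure(income_thousands):
    """Predicted consumption expenditure for an income given in $1000s."""
    return B0 + B1 * income_thousands


for income in (6, 25):
    prediction = predict_expenditure(income)
    print(f"income = ${income},000: predicted expenditure = {prediction:.1f}")
```

The arithmetic works for any x, which is exactly why the question "Comfortable with estimates?" matters: the formula gives no warning when x lies far from the incomes that were used to fit the line.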

  26. The least squares line always goes through the point with coordinates (x̄, ȳ) • (x̄, ȳ) = (9, 8)

  27. C. Compute the Residuals

  28. Residuals

  29. Income Residual Plot

  30. Σ residuals, Σ (residuals)² • Note that • Σ residuals = 0 • Σ (residuals)² = 3.6 • From formula on slide 15: SSE = Σyi² − b0·Σyi − b1·Σxiyi = 330 − 6.2(40) − 0.2(392) = 330 − 248 − 78.4 = 3.6 • Any other line drawn through the scatterplot will have Σ (residuals)² > 3.6
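The slide 30 arithmetic can be checked in a few lines. This is a minimal sketch using only the column sums quoted on the slide (Σyi² = 330, Σyi = 40, Σxiyi = 392); the function name is hypothetical. Note that this shortcut is valid only when b0 and b1 are the least squares values, because it relies on the residuals summing to zero.

```python
def sse_shortcut(sum_y_squared, sum_y, sum_xy, b0, b1):
    """SSE = sum(y^2) - b0*sum(y) - b1*sum(x*y); valid for the least squares b0, b1 only."""
    return sum_y_squared - b0 * sum_y - b1 * sum_xy


# Column sums and coefficients as quoted on slide 30.
sse = sse_shortcut(sum_y_squared=330, sum_y=40, sum_xy=392, b0=6.2, b1=0.2)
print(f"SSE = {sse:.1f}")
```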

  31. Car Weight, Fuel Consumption Example, cont. (xi, yi): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

  32. col. sum

  33. Calculations

  34. Scatterplot with least squares prediction line

  35. The Least Squares Line Always Goes Through (x̄, ȳ) • (x̄, ȳ) = (2.9, 4.39)
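The calculation slides for this example were images and are not in the transcript, so here is a minimal sketch of the fit using the ten (xi, yi) pairs from slide 31 and the standard least squares formulas. The function name is hypothetical. It also checks the slide 35 claim that the line passes through (x̄, ȳ).

```python
# Car weight (x, in 1000s of lb) and fuel consumption (y) pairs, copied from slide 31.
xs = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
ys = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def fit_line(xs, ys):
    """Return (b0, b1, x_bar, y_bar) for the least squares line y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1, x_bar, y_bar


b0, b1, x_bar, y_bar = fit_line(xs, ys)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"(x_bar, y_bar) = ({x_bar:.2f}, {y_bar:.2f})")
print(f"line evaluated at x_bar: {b0 + b1 * x_bar:.2f}")
```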

  36. Using the least squares line for prediction. Fuel consumption of 3,000 lb car? (x=3)

  37. Be Careful! Fuel consumption of 500 lb car? (x = .5) x = .5 is outside the range of the x-data that we used to determine the least squares line
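One way to make the slide 37 warning concrete is a prediction function that refuses to extrapolate. The sketch below is an illustration under stated assumptions: the coefficients are rounded values from a least squares fit of the slide 31 data, the x-range is the smallest and largest car weight in that data, and the function name and error message are hypothetical.

```python
# Rounded least squares coefficients for the slide 31 data (assumed, not quoted from a slide).
B0 = -0.3631
B1 = 1.6390
# Smallest and largest car weight (1000s of lb) in the slide 31 data.
X_MIN = 1.9
X_MAX = 4.1


def predict_fuel_consumption(weight_thousands_lb):
    """Predict fuel consumption, but only inside the range of the observed x-data."""
    if not (X_MIN <= weight_thousands_lb <= X_MAX):
        raise ValueError(
            f"x = {weight_thousands_lb} is outside the observed range [{X_MIN}, {X_MAX}]"
        )
    return B0 + B1 * weight_thousands_lb


print(f"3,000 lb car: {predict_fuel_consumption(3.0):.2f}")

try:
    predict_fuel_consumption(0.5)
except ValueError as error:
    print("500 lb car:", error)
```

Raising an error is a deliberately strict design choice; a warning would also work. Without the check, x = .5 would give a prediction of about 0.46, which the data cannot support.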

  38. Avoid GIGO! Evaluating the least squares line • Create scatterplot. Approximately linear? • Calculate r², the square of the correlation coefficient • Examine residual plot

  39. r²: The Variation Accounted For • The square of the correlation coefficient r gives important information about the usefulness of the least squares line

  40. r²: important information for evaluating the usefulness of the least squares line • −1 ≤ r ≤ 1 implies 0 ≤ r² ≤ 1 • The square of the correlation coefficient, r², is the fraction of the variation in y that is explained by the least squares regression of y on x. • Put another way, r² is the fraction of the variation in y that is explained by the variation in x.

  41. Example: car weight, fuel consumption • x = car weight, y = fuel consumption • r² = (.9766)² ≈ .95 • About 95% of the variation in fuel consumption (y) is explained by the linear relationship between car weight (x) and fuel consumption (y). • What else affects fuel consumption? • Driver, size of engine, tires, road, etc.
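The r and r² values on slide 41 can be recomputed from the slide 31 data. The sketch below uses the standard formula for the correlation coefficient and then checks the "fraction of variation explained" reading of r² by computing 1 − SSE/SST for the least squares line. Function and variable names are hypothetical.

```python
import math

# Car weight (x, in 1000s of lb) and fuel consumption (y) pairs, copied from slide 31.
xs = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
ys = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)   # total variation in y (SST)

# Correlation coefficient and its square.
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# Least squares line and the variation it leaves unexplained (SSE).
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
fraction_explained = 1 - sse / syy

print(f"r = {r:.4f}, r^2 = {r_squared:.2f}")
print(f"1 - SSE/SST = {fraction_explained:.2f}")
```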

  42. Example: SAT scores

  43. SAT scores: calculations

  44. SAT scores: result • r² = (−.868)² = .7534 • If 57% of NC seniors take the SAT, the predicted mean score is

  45. Avoid GIGO! Evaluating the least squares line • Create scatterplot. Approximately linear? • Calculate r², the square of the correlation coefficient • Examine residual plot

  46. Residuals • residual = observed y − predicted y = y − ŷ • Properties of residuals • The residuals always sum to 0 (therefore the mean of the residuals is 0) • The least squares line always goes through the point (x̄, ȳ)

  47. Graphically: residual = y − ŷ [Figure: observed yi and predicted ŷi at x = xi; the residual ei = yi − ŷi is the vertical distance between them]

  48. Residual Plot • Residuals help us determine if fitting a least squares line to the data makes sense • When a least squares line is appropriate, it should model the underlying relationship; nothing interesting should be left behind • We make a scatterplot of the residuals in the hope of finding… NOTHING!

  49. Car Wt/ Fuel Consump: Residuals

      CAR WT.   FUEL CONSUMP.   Pred FUEL CONSUMP.   Residuals
      3.4       5.5             5.209498069           0.290501931
      3.8       5.9             5.865096525           0.034903475
      4.1       6.5             6.356795367           0.143204633
      2.2       3.3             3.242702703           0.057297297
      2.6       3.6             3.898301158          -0.29830115
      2.9       4.6             4.39                  0.21
      2         2.9             2.914903475          -0.01490347
      2.7       3.6             4.062200772          -0.46220077
      1.9       3.1             2.751003861           0.348996139
      3.4       4.9             5.209498069          -0.309498069
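The residual table on slide 49 can be regenerated from the slide 31 data. This is a minimal sketch that fits the least squares line with the standard formulas, computes each residual as observed y minus predicted y, and confirms the slide 46 property that the residuals sum to zero (up to floating-point rounding). Variable names are hypothetical.

```python
# Car weight (x, in 1000s of lb) and fuel consumption (y) pairs, copied from slide 31.
xs = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
ys = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

# Fit the least squares line y = b0 + b1*x.
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
b0 = y_bar - b1 * x_bar

# Predicted values and residuals (observed y minus predicted y).
predicted = [b0 + b1 * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

print(f"{'x':>5} {'y':>5} {'predicted':>12} {'residual':>12}")
for x, y, y_hat, e in zip(xs, ys, predicted, residuals):
    print(f"{x:5.1f} {y:5.1f} {y_hat:12.9f} {e:12.9f}")
print(f"sum of residuals = {sum(residuals):.9f}")
```

Plotting these residuals against car weight gives the residual plot that slide 48 asks us to inspect for leftover patterns.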

  50. Example: Car wt/fuel consump. residual plot
