Chapters 8, 9, 10: Linear Regression. Fitting a Line to Bivariate Data
Basic Terminology
• Explanatory variable: explains or causes changes in the other variable; the x variable (also called the independent variable).
• Response variable: the y variable; it responds to changes in the x variable (also called the dependent variable).
Simplest Relationship
• The simplest equation describing the dependence of variable y on variable x: y = b0 + b1x
• This is a linear equation; its graph is a line with slope b1 and y-intercept b0.
Graph of y = b0 + b1x: a line that crosses the y-axis at b0 and has slope b1 = rise/run.
Notation
• Data: (x1, y1), (x2, y2), . . . , (xn, yn)
• When we draw the line y = b0 + b1x through the scatterplot, the point on the line corresponding to xi is the predicted value ŷi = b0 + b1xi.
Observed y, Predicted y: the predicted y when x = 2.7 is ŷ = b0 + b1x = b0 + b1(2.7).
Scatterplot: Fuel Consumption vs. Car Weight. Which line is the "best" line?
Criterion for choosing which line to draw: the method of least squares
• The method of least squares chooses the line that makes the sum of squares of the residuals as small as possible.
• That is, this line has the slope b1 and intercept b0 that minimize Σ [yi − (b0 + b1xi)]².
Car Weight, Fuel Consumption Example, cont. (xi, yi): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)
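As a minimal illustration (not part of the original slides), here is a short Python sketch that fits the least squares line to the ten (weight, fuel consumption) pairs above, using the standard formulas b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and b0 = ȳ − b1x̄. The printed means should match the (2.9, 4.39) noted on the next slide.

# Minimal sketch: least squares fit for the car weight (x, in 1000s of lbs)
# and fuel consumption (y) data listed above.
import numpy as np

x = np.array([3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4])
y = np.array([5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9])

# Slope and intercept that minimize the sum of squared residuals
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

print("x-bar, y-bar:", x.mean(), y.mean())     # (2.9, 4.39)
print("slope b1:", b1, " intercept b0:", b0)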
The least squares line always goes through (x̄, ȳ). Here (x̄, ȳ) = (2.9, 4.39).
Using the least squares line for prediction: what is the fuel consumption of a 3,000 lb car? (x = 3)
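Continuing the sketch above (reusing the b0 and b1 computed there, an assumption of that illustration), the prediction is simply the value of the fitted line at x = 3:

# Predicted fuel consumption for a 3,000 lb car (x = 3 in 1000s of lbs)
y_hat_3000 = b0 + b1 * 3.0
print("predicted fuel consumption at x = 3:", round(y_hat_3000, 2))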
Be careful! What is the fuel consumption of a 500 lb car (x = 0.5)? x = 0.5 is outside the range of the x-data used to determine the least squares line, so the fitted line should not be trusted there.
Avoid GIGO! Evaluating the least squares line:
• Create a scatterplot. Is the relationship approximately linear?
• Calculate r2, the square of the correlation coefficient.
• Examine the residual plot.
r2: The Variation Accounted For
• The square of the correlation coefficient, r2, gives important information about the usefulness of the least squares line.
r2: important information for evaluating the usefulness of the least squares line
• −1 ≤ r ≤ 1 implies 0 ≤ r2 ≤ 1.
• r2 is the fraction of the variation in y that is explained by the least squares regression of y on x; equivalently, it is the fraction of the variation in y explained by differences in x.
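As a rough illustration with the car-weight data from the earlier sketch (reusing x, y, b0, and b1 defined there), r2 can be computed either from the correlation coefficient or directly as the fraction of the variation in y accounted for by the fitted line; the two agree:

# Two equivalent ways to get r^2 for the car data
r = np.corrcoef(x, y)[0, 1]           # correlation coefficient
y_hat = b0 + b1 * x                   # fitted (predicted) values
ss_res = np.sum((y - y_hat) ** 2)     # variation left unexplained by the line
ss_tot = np.sum((y - y.mean()) ** 2)  # total variation in y
print("r^2:", r ** 2)
print("1 - SS_res/SS_tot:", 1 - ss_res / ss_tot)   # same value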
March Madness example: let S(k) be the Sagarin rating of the kth-seeded team, and let Yij be the Vegas point spread between seed i and seed j (i < j). 94.8% of the variation in point spreads is explained by the variation in Sagarin rating differences.
SAT scores example: r2 = (−0.86845)² = 0.7542, so approximately 75.4% of the variation in mean SAT math scores is explained by differences in the percent of seniors taking the SAT.
Avoid GIGO! Evaluating the least squares line:
• Create a scatterplot. Is the relationship approximately linear?
• Calculate r2, the square of the correlation coefficient.
• Examine the residual plot.
Residuals
• residual = observed y − predicted y = y − ŷ
• Properties of residuals:
• The residuals always sum to 0 (therefore the mean of the residuals is 0).
• The least squares line always goes through the point (x̄, ȳ).
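A quick check of the first property, continuing the same sketch (reusing y and y_hat from the r2 snippet above): the residuals from the least squares fit sum to zero up to floating point rounding.

# Residuals = observed y - predicted y; for the least squares line they sum to (essentially) zero
residuals = y - y_hat
print("sum of residuals:", residuals.sum())   # ~0, up to rounding error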
Graphically, the residual at xi is the vertical distance from the observed point to the fitted line: ei = yi − ŷi.
Residual Plot
• Residuals help us determine whether fitting a least squares line to the data makes sense.
• When a least squares line is appropriate, it should model the underlying relationship; nothing interesting should be left behind.
• We make a scatterplot of the residuals in the hope of finding… NOTHING!
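A minimal residual-plot sketch for the same car data (not part of the original slides), assuming matplotlib is available and reusing x and residuals from the snippets above; a patternless cloud around zero is what we hope to see.

# Residual plot: residuals versus x, with a reference line at 0
import matplotlib.pyplot as plt

plt.scatter(x, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("Car weight (1000s of lbs)")
plt.ylabel("Residual")
plt.title("Residual plot: hope to see no pattern")
plt.show()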
Residuals: Sagarin Ratings and Point Spreads

Yij     Predicted Yij   Residual
20      23.48573586     -3.485735859
24      21.3717734       2.628226598
18      13.96719139      4.032808608
11      11.52185104     -0.521851036
6        5.774158519     0.225841481
8.5      7.613877198     0.886122802
4        1.683355495     2.316644505
4        2.186135755     1.813864245
28      27.26801463      0.731985367
16      15.53266629      0.467333708
11.5    10.56199781      0.938002187
12      10.11635167      1.883648327
4        5.397073324    -1.397073324
7        6.836853159     0.163146841
-1.5     1.500526309    -3.000526309
2        1.946172449     0.053827551
25      23.58857728      1.411422725
18.5    18.34366502      0.156334982
10.5    12.85878945     -2.358789455
11.5    10.95050983      0.549490168
4.5      2.597501422     1.902498578
5        6.631170326    -1.631170326
4        3.203123099     0.796876901
-3.5     0.095026946    -3.595026946
23      24.15991848     -1.15991848
20.5    21.24607834     -0.746078337
18      20.0919691      -2.091969104
10.5    11.62469245     -1.124692453
9        6.836853159     2.163146841
7        5.979841353     1.020158647
2        3.283110867    -1.283110867
5        6.745438567    -1.745438567