Chapter 5 Residuals, Residual Plots, Coefficient of determination, & Influential points
Residuals (error) - • The vertical deviation between the observations & the LSRL • The sum of the residuals from the LSRL is always zero • residual = observed y - predicted y (observed - expected)
Residual plot • A scatterplot of the (x, residual) pairs. • Residuals can be graphed against other statistics besides x • Purpose is to tell if a linear association exists between the x & y variables
Consider a population of adult women. Let's examine the relationship between their height and weight. [Figure: weight versus height, with heights 60, 64, and 68 marked on the horizontal axis]
Suppose we now take a random sample from our population of women. [Figure: weight versus height for the sample, with the residuals marked]
Residual plot • A scatterplot of the (x, residual) pairs. • Residuals can be graphed against other statistics besides x • Purpose is to tell if a linear association exists between the x & y variables • If no pattern exists between the points in the residual plot, then the association is linear.
[Figure: two residual plots, one labeled "Linear" and one labeled "Not linear"]
One measure of the success of knee surgery is post-surgical range of motion for the knee joint following a knee dislocation. Is there a linear relationship between age & range of motion?

Age   Range of Motion
35    154
24    142
40    137
31    133
28    122
25    126
26    135
16    135
14    108
20    120
21    127
30    122

Graph the data and find the LSRL: Predicted range of motion = 107.58 + 0.87(age)
Using the knee surgery data and Predicted range of motion = 107.58 + 0.87(age): Find the predicted y's. Find the residuals.
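The two steps above can be sketched in Python. This is only an illustration: the slides do not name a tool, and the helper name `fit_lsrl` is made up here. The sketch fits the LSRL to the knee surgery data, lists each predicted y and residual, and checks that the residuals sum to (essentially) zero.

```python
# Knee surgery data from the slides: age and range of motion.
ages = [35, 24, 40, 31, 28, 25, 26, 16, 14, 20, 21, 30]
rom = [154, 142, 137, 133, 122, 126, 135, 135, 108, 120, 127, 122]


def fit_lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares regression line.

    Hypothetical helper for illustration; not part of the original slides.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


intercept, slope = fit_lsrl(ages, rom)

# Predicted y (y-hat) for each age, then residual = observed - predicted.
predicted = [intercept + slope * x for x in ages]
residuals = [y - y_hat for y, y_hat in zip(rom, predicted)]

print(f"Predicted range of motion = {intercept:.2f} + {slope:.2f}(age)")
for x, y, y_hat, e in zip(ages, rom, predicted, residuals):
    print(f"age={x:>2}  observed={y:>3}  predicted={y_hat:7.2f}  residual={e:7.2f}")

# Zero up to floating-point rounding.
print(f"sum of residuals = {sum(residuals):.6f}")
```

If the rounded coefficients 107.58 and 0.87 are used instead of the unrounded fit, the residuals will sum to a small nonzero number because of the rounding.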
Sketch a residual plot for the knee surgery data. [Figure: residuals plotted against x (age)] Since there is no pattern in the residual plot, there is a linear relationship between age and range of motion.
Plot the residuals against the y-hats. How does this residual plot compare to the previous one?
[Figure: residuals plotted against x, and residuals plotted against y-hat] Residual plots show the same pattern whether the residuals are plotted against x or against y-hat.
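A short Python sketch of why the two plots match, using the rounded LSRL from the slides (the variable names are assumptions for illustration). Each y-hat is a linear function of x, so the points keep the same residuals and, because the slope is positive, the same left-to-right order. Only the horizontal scale changes.

```python
# Knee surgery data from the slides.
ages = [35, 24, 40, 31, 28, 25, 26, 16, 14, 20, 21, 30]
rom = [154, 142, 137, 133, 122, 126, 135, 135, 108, 120, 127, 122]

# Rounded LSRL from the slides: y-hat = 107.58 + 0.87 * age.
y_hats = [107.58 + 0.87 * x for x in ages]
residuals = [y - y_hat for y, y_hat in zip(rom, y_hats)]

# Points for the two residual plots; the vertical coordinates are identical.
plot_vs_x = list(zip(ages, residuals))
plot_vs_y_hat = list(zip(y_hats, residuals))

# Left-to-right order of the points in each plot (as indices into the data).
order_by_x = sorted(range(len(ages)), key=lambda i: ages[i])
order_by_y_hat = sorted(range(len(ages)), key=lambda i: y_hats[i])

print("same left-to-right order:", order_by_x == order_by_y_hat)
```

With a negative slope the order would be reversed, so the plot against y-hat would be a left-to-right mirror image of the plot against x.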
Coefficient of determination - • r² • gives the approximate proportion of variation in y that can be attributed to a linear relationship between x & y • remains the same no matter which variable is labeled x
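The last bullet can be checked numerically. The sketch below (Python, with a hypothetical helper `r_squared`) fits the LSRL twice on the knee surgery data from the slides, once with age as x and once with range of motion as x, and computes r² each time as 1 minus the ratio of the squared error about the line to the squared error about the mean.

```python
# Knee surgery data from the slides.
ages = [35, 24, 40, 31, 28, 25, 26, 16, 14, 20, 21, 30]
rom = [154, 142, 137, 133, 122, 126, 135, 135, 108, 120, 127, 122]


def r_squared(xs, ys):
    """Fit the LSRL of ys on xs and return 1 - SSE/SSy (illustrative helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Squared error about the regression line and about the mean of ys.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ssy = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / ssy


r2_rom_on_age = r_squared(ages, rom)   # age labeled x
r2_age_on_rom = r_squared(rom, ages)   # range of motion labeled x

print(f"r^2 with age as x:             {r2_rom_on_age:.4f}")
print(f"r^2 with range of motion as x: {r2_age_on_rom:.4f}")
```

The two regression lines themselves are different; only r² (and r) is unchanged by the swap.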
Interpretation of r²: Approximately r² (expressed as a percent) of the variation in y can be explained by the LSRL of x & y.
How well does age predict the range of motion after knee surgery? Approximately 30.6% of the variation in range of motion after knee surgery can be explained by the linear regression of age and range of motion.
Let's examine r² using the knee surgery data. Suppose you were going to predict a future y but you didn't know the x-value. Your best guess would be the overall mean of the existing y's. Sum of the squared residuals (errors) using the mean of y: SSy = 1564.917
Now suppose you were going to predict a future y but you DO know the x-value. Your best guess would be the point on the LSRL for that x-value (y-hat). Sum of the squared residuals (errors) using the LSRL: SSE = 1085.735
By what percent did the sum of the squared error go down when you went from just an "overall mean" model to the "regression on x" model? (1564.917 - 1085.735) / 1564.917 ≈ 0.306, or 30.6%. This is r²: the proportion of the variation in the y-values that is explained by the x-values.
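The comparison of the two models can be reproduced with a few lines of Python. This is a sketch under the assumption that the LSRL is refit from the data rather than taken from the rounded equation; the names `ss_mean_model` and `ss_lsrl_model` are made up for this example.

```python
# Knee surgery data from the slides.
ages = [35, 24, 40, 31, 28, 25, 26, 16, 14, 20, 21, 30]
rom = [154, 142, 137, 133, 122, 126, 135, 135, 108, 120, 127, 122]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(rom) / n

# Least-squares slope and intercept.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, rom))
sxx = sum((x - x_bar) ** 2 for x in ages)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# "Overall mean" model: every prediction is y_bar.
ss_mean_model = sum((y - y_bar) ** 2 for y in rom)

# "Regression on x" model: every prediction is the point on the LSRL.
ss_lsrl_model = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(ages, rom))

# Proportional reduction in squared error = r^2.
r_squared = (ss_mean_model - ss_lsrl_model) / ss_mean_model

print(f"squared error using the mean of y: {ss_mean_model:.3f}")
print(f"squared error using the LSRL:      {ss_lsrl_model:.3f}")
print(f"r^2 = {r_squared:.3f}")
```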
Computer-generated regression analysis of knee surgery data:

Predictor   Coef     Stdev    T      P
Constant    107.58   11.12    9.67   0.000
Age         0.8710   0.4146   2.10   0.062

s = 10.42   R-sq = 30.6%   R-sq(adj) = 23.7%

What is the equation of the LSRL? Find the slope & y-intercept.
What are the correlation coefficient and the coefficient of determination?
Be sure to convert r² to a decimal before taking the square root! NEVER use adjusted r²!
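A minimal Python sketch of recovering r from the printout, assuming the values R-sq = 30.6% and slope = 0.8710 shown above. The square root alone gives only the magnitude of r; the sign is taken from the slope of the LSRL. The variable names are illustrative.

```python
import math

# Values read from the regression printout.
r_sq_percent = 30.6   # R-sq, as a percent
slope = 0.8710        # coefficient of Age

# Convert the percent to a decimal BEFORE taking the square root.
r_sq = r_sq_percent / 100

# r has the same sign as the slope of the LSRL.
r = math.copysign(math.sqrt(r_sq), slope)

# Common mistake: taking the square root of the percent itself.
wrong = math.sqrt(r_sq_percent)

print(f"r = {r:.3f}")
print(f"square root of the percent = {wrong:.2f} (not a valid correlation)")
```

The mistaken value is greater than 1, which is impossible for a correlation coefficient, so it is easy to catch.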
Outlier - In a regression setting, an outlier is a data point with a large residual.
Influential point- • A point that influences where the LSRL is located • If removed, it will significantly change the slope of the LSRL • Usually small residual (or 0)
One factor in the development of tennis elbow is the impact-induced vibration of the racket and arm at ball contact.

Racket   Resonance (Hz)   Acceleration (m/sec/sec)
1        105              36.0
2        106              35.0
3        110              34.5
4        111              36.8
5        112              37.0
6        113              34.0
7        113              34.2
8        114              33.8
9        114              35.0
10       119              35.0
11       120              33.6
12       121              34.2
13       126              36.2
14       189              30.0

Sketch a scatterplot of these data. Calculate the LSRL & correlation coefficient. Does there appear to be an influential point? If so, remove it and then calculate the new LSRL & correlation coefficient.
With all 14 rackets: Predicted acceleration = 42.37 - 0.06(resonance) r = -0.775 r² = 60.1% The point (189, 30) could be influential. Remove it & recalculate the LSRL.
With (189, 30) removed: Predicted acceleration = 38.81 - 0.033(resonance) r = -0.174 r² = 3% The point (189, 30) was influential since it moved the LSRL.
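The before-and-after comparison can be sketched in Python. The helper `lsrl_and_r` is hypothetical and simply applies the usual least-squares formulas to the racket data from the slides, first with all 14 rackets and then with racket 14, the point (189, 30), left out.

```python
import math

# Racket data from the slides: resonance (Hz) and acceleration (m/sec/sec).
resonance = [105, 106, 110, 111, 112, 113, 113, 114, 114, 119, 120, 121, 126, 189]
acceleration = [36.0, 35.0, 34.5, 36.8, 37.0, 34.0, 34.2,
                33.8, 35.0, 35.0, 33.6, 34.2, 36.2, 30.0]


def lsrl_and_r(xs, ys):
    """Return (intercept, slope, r) for the LSRL of ys on xs (illustrative helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


# Fit with all 14 rackets, then without the last one, (189, 30).
full = lsrl_and_r(resonance, acceleration)
trimmed = lsrl_and_r(resonance[:-1], acceleration[:-1])

for label, (b0, b1, r) in (("all 14 points", full), ("(189, 30) removed", trimmed)):
    print(f"{label}: acceleration-hat = {b0:.2f} + ({b1:.3f})(resonance), "
          f"r = {r:.3f}, r^2 = {r * r:.1%}")
```

Dropping that single point roughly halves the slope and takes r² from about 60% to about 3%, which is what makes it influential.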
Which of these measures are resistant? • LSRL • Correlation coefficient • Coefficient of determination NONE – all are affected by outliers
Find the correlation coefficient and describe the relationship. r = 0.9861 There is a strong, positive, linear relationship between tuition and year at the UofA.
Find the LSRL: Predicted tuition = 3821.26 + 311.45(year)
Interpret the slope. For each 1-year increase, UofA tuition goes up by an average of $311.45.
Find the coefficient of determination. Interpret in context of problem. r² = 97.2% Approximately 97.2% of the variation in tuition can be explained by the linear relationship between tuition and year at the UofA.
Make residual plots of (x, residuals) and (y-hat, residuals). Sketch and compare. A linear model is not the best model: there is a definite curved pattern in the residual plot!