Influential Points By Noelle Hodge
Does the age at which a child begins to talk predict a later score on a test of mental ability? A study of the development of young children recorded the age in months at which each of 21 children spoke their first word and their Gesell Adaptive Score, the result of an aptitude test taken much later. The data appear below. Enter the data into your calculator in List 1 and List 2.
Calculate the LSRL for the data. • Sketch a scatter plot with the LSRL
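For anyone checking the calculator output by hand, here is a minimal Python sketch of the least-squares computation. The function name `lsrl` and the five data pairs are hypothetical, made up for illustration; they are not the study's data.

```python
def lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of squared deviations in x, and sum of cross-products of deviations
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the line passes through (x-bar, y-bar)
    return intercept, slope

# Hypothetical values for illustration only, NOT the 21 children in the study
ages = [8, 10, 12, 14, 16]
scores = [110, 104, 100, 94, 92]

a, b = lsrl(ages, scores)
print(f"score-hat = {a:.2f} + ({b:.2f}) * age")
```

With the real data in List 1 and List 2, the calculator's LinReg(a+bx) output should agree with this formula.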
Is there a point that seems like an outlier in the y-direction? • Circle it. • Which child is it? Child 19
Is there a point that seems like an outlier in the x-direction? • Circle it. • Which child is it? Child 18
Remove the point you chose for the outlier in the y-direction. • Sketch the scatter plot with the LSRL
How do this LSRL and scatter plot differ from the original? • With point removed: Original:
Insert this data point back into your data. • STAT -> Edit • 2nd -> DEL (Insert) -> (enter 17) • Cursor over to the y column -> 2nd -> DEL (Insert) -> (enter 121)
Remove the point you chose for the outlier in the x-direction. • Sketch the scatter plot with the LSRL
How do this LSRL and scatter plot differ from the original? • With point removed: Original:
Influential points • Influence depends on both leverage and residual; a case with high leverage whose y-value sits right on the line fit to the rest of the data is not influential. Removing that case won’t change the slope, even if it does affect R². A case with modest leverage but a very large residual can be influential. • If a point has enough leverage, it can pull the line right to it. Then it is highly influential, but its residual is small. • The only way to be sure is to fit both regressions.
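The "fit both regressions" check can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper `fit` and every data value are hypothetical, chosen so that one added point has high leverage but lies on the line, and the other has modest leverage but a large residual.

```python
def fit(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar, sxy ** 2 / (sxx * syy)

# Hypothetical base data (not from the study)
base_x = [1, 2, 3, 4, 5]
base_y = [2.5, 3.5, 6.5, 7.5, 10.0]
slope_base, int_base, r2_base = fit(base_x, base_y)

# Case A: high leverage (x far from the rest), y exactly on the base line
x_far = 20
y_on_line = int_base + slope_base * x_far
slope_a, _, r2_a = fit(base_x + [x_far], base_y + [y_on_line])

# Case B: modest leverage (x within the observed range), very large residual
slope_b, _, r2_b = fit(base_x + [5], base_y + [25.0])

print(f"base:   slope={slope_base:.3f}  R^2={r2_base:.3f}")
print(f"case A: slope={slope_a:.3f}  R^2={r2_a:.3f}")
print(f"case B: slope={slope_b:.3f}  R^2={r2_b:.3f}")
```

In this made-up example, case A leaves the slope where it was while R² moves, and case B shifts the slope noticeably, which is the pattern the slide describes.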
Influential Points • Unusual points in a regression often tell us more about the data and the model than any other points. • Whenever you have influential points, you should fit the linear model to the other points alone and then compare the two regression models to understand how they differ.
Just Checking: • For each of the three scatter plots, tell whether the point indicated is a HIGH LEVERAGE POINT, would have a LARGE RESIDUAL, or IS INFLUENTIAL. • Plot 1: Not high leverage, not influential, large residual • Plot 2: High leverage, influential, not large residual • Plot 3: High leverage, not influential, small residual [Figure: three scatter plots of y versus x, each with one indicated point]