Summing Up October 2011
Extrapolation Problems • What happens if we assume that the linear model continues outside of the range of the predictors? • Example in the book, page 201: • Predictor: Years after 1890 (year) • Response: Median age of men at marriage (age) • R² value = 0.926
What does the model tell us? • First off, we notice that the model puts the median age at marriage in 1890 at around 25.7. • Over 50 years, the median age would have dropped 2 years, a slope of about -0.04 years of age per calendar year. • If this continued through 2000 (110 years after 1890), the median age would have been around 21.3. • In 2000, the median age at first marriage was 27, not 21.3. What happened?
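The arithmetic on this slide can be checked with a short sketch. The intercept and slope below are reconstructed from the slide's own figures (25.7 in 1890, a 2-year drop over 50 years); they are an assumption, not the book's fitted equation, and the function name is made up for illustration.

```python
# Assumed linear model, reconstructed from the slide's numbers
# (not quoted from the book): age = 25.7 - 0.04 * year
INTERCEPT = 25.7      # modeled median age at year = 0, i.e. 1890
SLOPE = -2 / 50       # 2-year drop over 50 years

OBSERVED_2000 = 27    # actual median age in 2000, as stated on the slide


def predicted_age(calendar_year):
    """Median age at marriage predicted by the assumed linear model."""
    years_after_1890 = calendar_year - 1890
    return INTERCEPT + SLOPE * years_after_1890


# 1940 is 50 years after 1890, the span the slide describes.
print(round(predicted_age(1940), 1))

# 2000 is 110 years after 1890: here we are extrapolating.
extrapolated = predicted_age(2000)
print(round(extrapolated, 1))

# How far off the extrapolation is from the observed value.
miss = OBSERVED_2000 - extrapolated
print(round(miss, 1))
```

The model misses the observed 2000 value by several years, even though R² was high on the data it was fitted to. A high R² describes the fit inside the observed range only.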
Perils of Extrapolation • The conditions that persisted in the first part of the twentieth century changed. The linear relationship stopped being meaningful. • Past results do not guarantee future success. • Bear Stearns survived the stock market crash of 1929. • It did not survive the subprime mortgage crisis of the mid-2000s, though, and was sold to JP Morgan at $10/share, down from $133/share at its high point.
Perils of Causation • Which of the following is more likely? • #1: Michael and Lindsey’s marriage seemed very happy. One day he came home from work and killed her. • #2: Michael and Lindsey’s marriage seemed very happy. One day he came home from work and killed her in order to collect on an insurance policy. • The correct answer is #1.
The Narrative Fallacy • People do not look at data in a vacuum. When we look at data, we want it to tell us a story. We construct a narrative. • That narrative is based in fact, but it is not necessarily the truth. • Revisiting the tragic ballad of Michael and Lindsey’s marriage: why is the correct answer #1? • #1 encompasses #2 and all other explanations. Every scenario in which #2 happens is also a scenario in which #1 happens, so #2 can never be more likely than #1. • It is possible that he killed her for insurance, but there are other possible explanations.
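The reason #1 must be at least as likely as #2 can be written as a one-line probability argument. The event names are introduced here for illustration: let A be "he killed her" and B be "the motive was an insurance payout".

```latex
% Statement #1 is the event A; statement #2 is the event A \cap B.
\[
  P(A \cap B) \;=\; P(A)\,P(B \mid A) \;\le\; P(A),
  \qquad \text{since } 0 \le P(B \mid A) \le 1 .
\]
```

Adding a detail to a story can make it feel more plausible, but it can only keep the probability the same or lower it.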
Leverage or Influential Points • Sometimes, a single point can have an undue influence on the shape of a linear model. • A data point is said to have high leverage if its x-value is far away from the mean of the predictor variable (x). • A data point is influential if omitting it from a linear model would greatly change the model. • Identify influential points by refitting the model without them and comparing the results. Residuals alone can mislead: an influential point can pull the line toward itself and end up with a small residual. • A point with high leverage is not necessarily influential, especially if it lies close to the line fitted through the remaining points.
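The difference between leverage and influence can be seen on a small made-up data set. This is a minimal sketch in plain Python: the data and all function names are hypothetical, and the leverage formula used is the standard one for simple linear regression, h_i = 1/n + (x_i - x̄)² / Σ(x_j - x̄)².

```python
def fit_line(xs, ys):
    """Least-squares fit; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def leverages(xs):
    """Leverage of each point: depends only on how far x is from the mean."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]


def residuals(xs, ys):
    """Observed y minus fitted y for each point."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def slope_change_without(xs, ys, i):
    """How much the slope changes when point i is omitted and the line refit."""
    _, full_slope = fit_line(xs, ys)
    _, reduced_slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
    return reduced_slope - full_slope


# Hypothetical data: five points on y = 2x, plus one point far out at x = 20.
xs = [1, 2, 3, 4, 5, 20]
ys_on_line = [2, 4, 6, 8, 10, 40]    # far point follows the same trend
ys_off_line = [2, 4, 6, 8, 10, 10]   # far point breaks the trend

lev = leverages(xs)                   # identical for both data sets
change_on = slope_change_without(xs, ys_on_line, 5)
change_off = slope_change_without(xs, ys_off_line, 5)
res_off = residuals(xs, ys_off_line)

print("leverage of far point:", round(lev[5], 3))
print("slope change, far point on the line:", round(change_on, 3))
print("slope change, far point off the line:", round(change_off, 3))
print("residuals with far point off the line:", [round(r, 2) for r in res_off])
```

In both data sets the far point has the same high leverage, because leverage depends only on x. It is influential only in the second set, where removing it changes the slope substantially. Its residual there is smaller in magnitude than that of the ordinary point at x = 5, which is why the check is to refit without the point and not to look for a large residual.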
Lurking Variables (look out behind you) • When we go forward with our unit on experimental design, the concept of a lurking variable will be reintroduced. • A lurking variable is a variable, not included in the model, that may have an impact on both of the variables being studied. It can cause two variables that have no direct effect on each other to exhibit correlation. • Consider lurking variables when you interpret a linear model.
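A small simulation can show how a lurking variable produces correlation between two variables that do not affect each other. Everything here is made up for illustration: z plays the role of the lurking variable, and x and y are each built from z plus independent noise.

```python
import math
import random


def pearson_r(a, b):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    cov = sum((u - a_bar) * (v - b_bar) for u, v in zip(a, b))
    var_a = sum((u - a_bar) ** 2 for u in a)
    var_b = sum((v - b_bar) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical lurking variable z drives both x and y.
z = [rng.uniform(0, 10) for _ in range(500)]
x = [zi + rng.gauss(0, 1) for zi in z]  # x depends on z, not on y
y = [zi + rng.gauss(0, 1) for zi in z]  # y depends on z, not on x

# x and y look strongly related even though neither affects the other.
r_xy = pearson_r(x, y)

# Once z is accounted for, only independent noise is left in each variable.
x_given_z = [xi - zi for xi, zi in zip(x, z)]
y_given_z = [yi - zi for yi, zi in zip(y, z)]
r_after = pearson_r(x_given_z, y_given_z)

print("correlation of x and y:", round(r_xy, 2))
print("correlation after removing z:", round(r_after, 2))
```

The first correlation is strong and the second is close to zero, so the apparent relationship between x and y comes entirely from z.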