Prediction in Regression: Leaning Tower of Pisa Case Study

Stat 31, Section 1, Last Time • Linear Regression • Developed probability distributions (model) • Used for Inference • Computed with Excel • Needed Glossary of terms • Prediction • For new X, predict corresponding Y • Used point on line • Confidence Interval for mean of new Y • Prediction Interval for value of new Y

Prediction in Regression Revisit Class Example 28, (now 10.13 – 10.15) Old 10.8: Engineers made measurements of the Leaning Tower of Pisa over the years 1975 – 1987. “Lean” is the difference between a point's position if the tower were straight and its actual position, in tenths of a millimeter, in excess of 2.9 meters. The data are listed above…

Prediction in Regression Class Example 28, Old 10.9 (now 10.14) • In 1918 the lean was 2.9071 meters (the coded value is 71). Using the least squares equation for the years 1975 to 1987, calculate a predicted value for the lean in 1918. • Although the least squares line gives an excellent fit for 1975 – 1987, this pattern did not extend back to 1918. Why? https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg28.xls
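The 1918 calculation can be sketched in a few lines of Python instead of Excel. The data table is not reproduced in this transcript, so the values below are an assumption: they are the coded lean measurements as usually printed with this textbook exercise (year minus 1900, lean in tenths of a millimeter beyond 2.9 meters). The helper name `least_squares` is ours, not from the class spreadsheet.

```python
# ASSUMPTION: coded Pisa data as usually printed with this textbook exercise.
years = list(range(75, 88))  # 1975..1987 coded as 75..87
lean = [642, 644, 656, 667, 673, 688, 696, 698, 713, 717, 725, 742, 757]


def least_squares(x, y):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


intercept, slope = least_squares(years, lean)

# 1918 is coded as 18, far outside the observed range 75..87.
predicted_1918 = intercept + slope * 18
observed_1918 = 71  # coded value given on the slide

print(f"fitted line: lean = {intercept:.2f} + {slope:.4f} * year")
print(f"1918 predicted: {predicted_1918:.1f}, observed: {observed_1918}")
```

Under these assumed data the line predicts a coded lean of roughly 107 for 1918, well above the observed 71. The fit describes 1975 – 1987 only, and 1918 is a long extrapolation from that window.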

Prediction in Regression Class Example 28, Old 10.10 (now 10.15) • How would you code the explanatory variable for the year 2002? • The engineers working on the tower were most interested in how much it would lean if no corrective action were taken. Use the least squares line to predict the lean in 2005. https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg28.xls

Prediction in Regression Class Example 28, Old 10.10 (now 10.15) (c) To give a margin of error for the lean in 2005, would you use a confidence interval for the mean, or a prediction interval? Explain your choice. https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg28.xls

Prediction in Regression Class Example 28, Old 10.10 (now 10.15) • Give the values of the 95% confidence interval for the mean, and the 95% prediction interval. How do they compare? Recall generic formula (same for both):
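The formula itself did not come through in this transcript. A standard way to write it, assuming the usual simple linear regression notation (fitted line $\hat{y} = a + b x^{*}$ at the new value $x^{*}$, and $t^{*}$ the critical value of the $t$ distribution with $n - 2$ degrees of freedom), is:

```latex
\hat{y} \;\pm\; t^{*} \cdot SE,
\qquad \hat{y} = a + b\,x^{*}
```

Both intervals are centered at the same point on the line; only the standard error $SE$ changes.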

Prediction in Regression Class Example 28, Old 10.10 (now 10.15) Difference was in form for SE: CI for mean: PI for value: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg28.xls
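A sketch of both intervals for 2005 (coded 105), assuming the standard textbook standard errors: for the mean, $s\sqrt{1/n + (x^{*}-\bar{x})^2 / \sum (x_i-\bar{x})^2}$, and for a new value the same expression with an extra 1 under the square root. The data are the same assumed coded values as in the textbook exercise, and the critical value 2.201 is read from a $t$ table for 11 degrees of freedom rather than computed.

```python
import math

# ASSUMPTION: coded Pisa data as usually printed with this textbook exercise.
years = list(range(75, 88))
lean = [642, 644, 656, 667, 673, 688, 696, 698, 713, 717, 725, 742, 757]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(lean) / n
sxx = sum((x - x_bar) ** 2 for x in years)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, lean))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Regression standard error s, with n - 2 degrees of freedom.
residuals = [y - (intercept + slope * x) for x, y in zip(years, lean)]
s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))

T_STAR = 2.201  # 95% two-sided t critical value, 11 df (table value)

x_new = 105  # year 2005, coded
y_hat = intercept + slope * x_new

leverage = 1 / n + (x_new - x_bar) ** 2 / sxx
se_mean = s * math.sqrt(leverage)       # SE for the mean of new Y
se_pred = s * math.sqrt(1 + leverage)   # SE for a single new Y

ci_mean = (y_hat - T_STAR * se_mean, y_hat + T_STAR * se_mean)
pi_value = (y_hat - T_STAR * se_pred, y_hat + T_STAR * se_pred)

print(f"predicted lean in 2005: {y_hat:.1f}")
print(f"95% CI for mean:   ({ci_mean[0]:.1f}, {ci_mean[1]:.1f})")
print(f"95% PI for value:  ({pi_value[0]:.1f}, {pi_value[1]:.1f})")
```

The prediction interval is always the wider of the two, since it also accounts for the scatter of a single new observation about the line.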

Outliers in Regression Caution about regression: Outliers can have a major impact http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html • A single point can throw the slope way off • And the intercept too • Can watch for this, using a plot • And a residual plot shows this, too
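A small illustration of the first two bullets, using made-up numbers (not data from the class or the linked page): eight points lying exactly on a line, then the same points plus one distant, low point.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up data lying exactly on y = 1 + 2x.
x_clean = [1, 2, 3, 4, 5, 6, 7, 8]
y_clean = [1 + 2 * xi for xi in x_clean]
a_clean, b_clean = least_squares(x_clean, y_clean)

# Add one outlying point far to the right and far below the line.
x_out = x_clean + [20]
y_out = y_clean + [5]
a_out, b_out = least_squares(x_out, y_out)

print(f"without outlier: intercept {a_clean:.2f}, slope {b_clean:.2f}")
print(f"with outlier:    intercept {a_out:.2f}, slope {b_out:.2f}")

# Residuals from the second fit, ready to plot against x_out.
residuals_out = [yi - (a_out + b_out * xi) for xi, yi in zip(x_out, y_out)]
print([round(r, 2) for r in residuals_out])
```

One point is enough to flatten the slope almost completely and push the intercept up. Plotting `residuals_out` against `x_out` shows a clear trend among the first eight points instead of random scatter.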

And Now for Something Completely Different Think about “Places Rated”… Idea: we live in a great place, but how great is it? • Objective comparisons? • Appears annually in several magazines • Get different answer? • Even in following years?

And Now for Something Completely Different Places Rated: • What drives differences? • There are several factors… • And they get different weights? • How different can these be???

And Now for Something Completely Different Places Rated: Google’s View • There are many websites • You can interact with them • And choose your own weights • How different can these be?

And Now for Something Completely Different Interesting Analysis: “Analysis of Data from the Places Rated Almanac,” R. A. Becker, L. Denby, R. McGill, A. R. Wilks, The American Statistician, 41, 169-186. Web available at: http://www.jstor.org/journals/astata.html

And Now for Something Completely Different Background: • From 1985 • 329 metropolitan areas • Considered 7 factors

And Now for Something Completely Different The factors:

And Now for Something Completely Different How good can we be? • Can we fudge the weights to win? • Who else can also win? • How many can place last?

And Now for Something Completely Different Main approach: Consider all possible sets of weights Some conclusions: • 143 cities can be best (over 1/3) • 150 could be last (almost ½) • 59 could be either 1st or last!?!
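To see how "all possible sets of weights" can change the winner, here is a toy sketch. It is not the method of the paper: it simply steps through a coarse grid of non-negative weights that sum to one, and the four places and their scores are invented for illustration.

```python
from itertools import product

# HYPOTHETICAL scores (higher is better) for four made-up places on three factors.
scores = {
    "A": (9, 2, 5),
    "B": (3, 8, 4),
    "C": (5, 5, 5),
    "D": (2, 3, 1),
}

STEPS = 10  # weights are multiples of 1/10; integers avoid rounding ties

can_be_first = set()
can_be_last = set()

for w1, w2 in product(range(STEPS + 1), repeat=2):
    w3 = STEPS - w1 - w2
    if w3 < 0:
        continue  # weights must be non-negative and sum to STEPS
    totals = {
        name: w1 * s[0] + w2 * s[1] + w3 * s[2] for name, s in scores.items()
    }
    ranked = sorted(totals, key=totals.get, reverse=True)
    # Count only outright (untied) first and last places.
    if totals[ranked[0]] > totals[ranked[1]]:
        can_be_first.add(ranked[0])
    if totals[ranked[-1]] < totals[ranked[-2]]:
        can_be_last.add(ranked[-1])

print("can be first:", sorted(can_be_first))
print("can be last: ", sorted(can_be_last))
print("either:      ", sorted(can_be_first & can_be_last))
```

Even in this tiny example, three of the four places can be ranked first, and place A can end up first or last depending only on the weights.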

And Now for Something Completely Different Where are we in this? Weights for us to be best: No weights leaving us last….

And Now for Something Completely Different Main lessons: • Take such ratings with a grain of salt • Be aware of other weightings • Ask about the priorities of whoever chose the weights… • Ask how the results change for other weights…

Nonlinear Regression Class Example 41: World Population https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg41.xls Main lessons: • Data can be non-linear • Identify with plot • Residuals even more powerful at this • Look for systematic structure
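The "systematic structure" in residuals can be seen with made-up data (not the world population figures from the spreadsheet): a quantity that doubles at each step, fitted with a straight line.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up curved data: y doubles each time x increases by 1.
x = [0, 1, 2, 3, 4, 5]
y = [1, 2, 4, 8, 16, 32]

a, b = least_squares(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

for xi, r in zip(x, residuals):
    print(f"x = {xi}: residual {r:+.2f}")
```

The residuals are positive at both ends and negative in the middle. That U shape is the kind of pattern a straight-line model leaves behind when the data are curved.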

Nonlinear Regression Class Example 41: World Population https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg41.xls When data are non-linear: • There is non-linear regression • But not covered here • Can use linear regression on transformed data • Log transform often useful
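A minimal sketch of the last two bullets, again with invented data rather than the spreadsheet values: the series below grows exactly exponentially, so a straight line fits log(y) perfectly even though it would not fit y.

```python
import math


def least_squares(x, y):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up exponential growth: y = 2 * exp(0.03 * t).
t = list(range(0, 50, 5))
y = [2 * math.exp(0.03 * ti) for ti in t]

# Fit a straight line to log(y) instead of y.
log_y = [math.log(yi) for yi in y]
log_intercept, growth_rate = least_squares(t, log_y)

# Predict on the log scale, then transform back to the original scale.
t_new = 60
y_new = math.exp(log_intercept + growth_rate * t_new)

print(f"log-scale line: log(y) = {log_intercept:.4f} + {growth_rate:.4f} * t")
print(f"back-transformed prediction at t = {t_new}: {y_new:.2f}")
```

The slope on the log scale is the growth rate, and predictions are exponentiated to return to the original units.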
