
Lecture 9: Diagnostics & Review




  1. Lecture 9: Diagnostics & Review, February 10, 2014

  2. Question
     A least squares regression line is determined from a sample of values for variables x and y, where x = size of a listed home (in sq ft) and y = selling price of the home (in $). Which of the following is true about the fitted model ŷ = b0 + b1x?
     • If the correlation r between x and y is positive, then b1 must be positive.
     • The units of the intercept and slope will be the same as those of the response variable, y.
     • If r² = 0.85, then it is appropriate to conclude that a change in x will cause a change in y.
     • None of the above, more than one of the above, or not enough information to tell.

  3. Question
     Same question as slide 2. The true statement is the first one:
     • If the correlation r between x and y is positive, then b1 must be positive.
       b1 = r * sy / sx, so if r > 0, then b1 > 0, because sy > 0 and sx > 0.
     • The other statements are false: the slope's units are $ per sq ft (only the intercept is in $), and a high r² does not show that a change in x causes a change in y.
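A quick numerical check of b1 = r * sy / sx, as a minimal sketch: the home sizes and prices below are made-up illustration values, and `ols_fit` and `correlation` are hypothetical helper names, not from the lecture. Both routes give the same slope because r * sy / sx simplifies algebraically to Sxy / Sxx.

```python
from math import sqrt
from statistics import mean, stdev


def ols_fit(x, y):
    """Least squares intercept b0 and slope b1 for y = b0 + b1*x."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def correlation(x, y):
    """Sample correlation r between x and y."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical homes: size in sq ft, selling price in $
sqft = [1200, 1500, 1800, 2100, 2600, 3000]
price = [210_000, 240_000, 275_000, 300_000, 360_000, 395_000]

b0, b1 = ols_fit(sqft, price)
r = correlation(sqft, price)
b1_from_r = r * stdev(price) / stdev(sqft)  # sy and sx are sample SDs (n - 1)

print(f"b1 from least squares: {b1:.4f} $ per sq ft")
print(f"r * sy / sx:           {b1_from_r:.4f} $ per sq ft")
```

Since sy and sx are always positive, the sign of b1 always matches the sign of r.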

  4. Administrative
     • Problem set 4 due (9 am)
         • How was it?
     • Next week: Multiple Regression
     • Exam Wednesday
         • Sample question
         • Taken from Exam 1, #37, last year

  5. Last time
     • What did we talk about?
         • Outliers
         • Sensitivity analysis
         • Heteroscedasticity

  6. Common problems and fixes
     Say we're estimating the price of a lease from the size of the house:
     Price = β0 + β1 * SqFt + ε
     Interpretation of the estimates?
     • β0 would be the fixed cost, and
     • β1 would be the marginal cost (per sq ft).

  7. Common Problems: Heteroscedasticity
     Heteroscedasticity: what does it mean for your analysis?
     • Point estimates for the β's?
         • Still OK: no bias.
     • Prediction and confidence intervals?
         • Not reliable; too narrow or too wide.
     • Hypothesis tests regarding β0 and β1 are not reliable.

  8. Common Problems: Heteroscedasticity
     Fixing the problem:
     • Revise the model; how to do so depends on the substance.
     • Try revising the model to estimate Price/SqFt by dividing the original equation by SqFt:
       Price/SqFt = β1 + β0 * (1/SqFt) + ε/SqFt
     • Notice the change in the intercept and slope: the intercept is now the marginal cost β1, and the slope on 1/SqFt is the fixed cost β0.
         • Don't be locked into thinking the intercept is the fixed cost.
         • How to interpret them depends on the model.
         • Think about the data!

  9. Common Problems: Heteroscedasticity
     Fixing the problem:
     Price/SqFt = M + F * (1/SqFt) + ε
     (M is the marginal cost per sq ft and F is the fixed cost.)
     • Revise by thinking about the substance.
         • Here, that meant predicting price per sq ft directly.
     • Don't revise by doing weird things.
         • Use theory!
     • After revising, check whether the residuals have similar variances.
         • Sometimes they won't.
         • In this case they do.
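To see why the revised model helps, here is a small simulation sketch using only the Python standard library. The data are simulated, not the lecture's: a fixed cost F = 50,000, a marginal cost M = 100 $ per sq ft, and an error SD that grows in proportion to SqFt are all assumptions. The sketch fits Price on SqFt and Price/SqFt on 1/SqFt, then compares the residual SD for large homes (over 3,000 sq ft) with that for small homes (under 2,000 sq ft); a ratio near 1 suggests similar variances. The helper names are hypothetical.

```python
import random
from statistics import mean, stdev


def ols_fit(x, y):
    """Least squares (intercept, slope) for y = b0 + b1*x."""
    xbar, ybar = mean(x), mean(y)
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
    return ybar - b1 * xbar, b1


def residuals(x, y, b0, b1):
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]


def spread_ratio(sqft, resid, small=2000, large=3000):
    """SD of residuals for large homes divided by SD for small homes."""
    small_res = [e for s, e in zip(sqft, resid) if s < small]
    large_res = [e for s, e in zip(sqft, resid) if s > large]
    return stdev(large_res) / stdev(small_res)


# Simulated (hypothetical) data: error SD proportional to SqFt
rng = random.Random(1)
F, M = 50_000, 100
sqft = [rng.uniform(1000, 4000) for _ in range(500)]
price = [F + M * s + rng.gauss(0, 20 * s) for s in sqft]

# Original model: Price = b0 + b1 * SqFt
b0, b1 = ols_fit(sqft, price)
ratio_original = spread_ratio(sqft, residuals(sqft, price, b0, b1))

# Revised model: Price/SqFt = M + F * (1/SqFt); intercept is M, slope is F
inv = [1 / s for s in sqft]
ppsf = [p / s for p, s in zip(price, sqft)]
M_hat, F_hat = ols_fit(inv, ppsf)
ratio_revised = spread_ratio(sqft, residuals(inv, ppsf, M_hat, F_hat))

print(f"original: fixed ~ {b0:,.0f}, marginal ~ {b1:.1f}, spread ratio {ratio_original:.2f}")
print(f"revised:  fixed ~ {F_hat:,.0f}, marginal ~ {M_hat:.1f}, spread ratio {ratio_revised:.2f}")
```

In the revised fit, the residuals are ε/SqFt, whose SD is constant under these assumptions, so the spread ratio should come out close to 1 while the original model's ratio is well above 1.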

  10. Common Problems: Heteroscedasticity
      Comparing the revised and original models:
      • The revised model may have a different (and smaller) R².
          • Again, so what? R² is useful, but it's only one notion of fit.
      • In the example, the revised model provides narrower (hence better) confidence intervals for the fixed and variable costs.
      [Figures: original model vs. revised model]

  11. Common Problems: Heteroscedasticity
      Comparing the revised and original models:
      • The revised model also provides a more sensible prediction interval.
      • The original data indicated that the prices of large homes varied more.

  12. Common Problems: Heteroscedasticity
      How do you know how to remodel the problem?
      • Practice.
      • Creativity; try different things.
      • There is no magic bullet; sometimes you can't.

  13. Common Problems: Correlated Errors
      Problem: dependence between residuals (autocorrelation)
      • The amount of error (measured by the size of the residual) you make at observation t + 1 is related to the amount of error you make at observation t.
      • Why is this a problem?
          • The SRM assumes that the errors, ε, are independent.
      • This is a common problem for time series data, but not only a time series problem.
          • Recall the U-shaped pattern in one of the earlier residual plots.

  14. Common Problems: Correlated Errors
      Detecting the problem:
      • Easier with time series data:
          • Plot the residuals versus time and look for a pattern (is the residual at t + 1 related to the one at t?). This is not guaranteed to find the problem, but it often helps.
      • Use the Durbin-Watson statistic to test for correlation between adjacent residuals (a.k.a. serial correlation or autocorrelation).
          • With time series data, adjacency is temporal.
          • In non-time-series data, we're still talking about errors next to one another being related.
      • For things like spatial autocorrelation, there are more advanced methods, such as mapping the residuals, and tests we can do.

  15. Durbin-Watson Statistic
      • Tests whether the correlation between adjacent residuals is 0
          • Null hypothesis: H0: ρε = 0
      • It's calculated from the residuals e_t, t = 1, ..., n (in time order), as:
        $D = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$
      • From the Durbin-Watson statistic D and the sample size, you can calculate the p-value for the hypothesis test.
      • You'll see this more in multiple regression and forecasting.
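A minimal Python sketch of the Durbin-Watson statistic, D = sum over t = 2..n of (e_t - e_{t-1})^2 divided by the sum over t = 1..n of e_t^2, for residuals listed in time (or adjacency) order. D ranges from 0 to 4: values near 2 are consistent with no lag-1 autocorrelation, values well below 2 point to positive autocorrelation, and values well above 2 to negative autocorrelation. The residual sequences are made up, and the sketch does not compute the p-value, which needs D's null distribution from tables or statistical software.

```python
def durbin_watson(resid):
    """Durbin-Watson statistic for residuals in time (or adjacency) order.

    D = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2
    """
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den


# Hypothetical residual sequences, in observation order
drifting = [1, 2, 3, 2, 1, -1, -2, -3, -2, -1]  # long runs of the same sign
alternating = [1, -1, 1, -1, 1, -1, 1, -1]      # sign flips at every step

print(f"drifting:    D = {durbin_watson(drifting):.3f}")     # well below 2
print(f"alternating: D = {durbin_watson(alternating):.3f}")  # well above 2
```

Long runs of same-signed residuals make the successive differences small relative to the residuals themselves, which is why positive autocorrelation pushes D toward 0.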

  16. Common Problems: Correlated Errors
      Consequences of dependence:
      • With (positive) autocorrelation in the errors, the estimated standard errors are too small.
      • The estimated slope and intercept are less precise than the output indicates.

  17. Common Problems: Correlated Errors
      How do you fix it?
      • Try to model it directly or transform the data.
      • Example: number of mobile phone users
          • Growth isn't linear; try different transformations.
      [Figures: original data; transformed data]
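As an illustration of the transformation idea, here is a Python sketch with hypothetical subscriber counts that grow by exactly 40% per year (not the lecture's mobile phone data). A straight line fit to the raw counts leaves residuals that are positive at both ends and negative in the middle, so neighboring residuals are related. Fitting ln(users) against year gives an exactly linear pattern here, and exp of the log-scale slope, minus 1, recovers the yearly growth rate. Real data would leave some residual scatter that still needs checking.

```python
import math
from statistics import mean


def ols_fit(x, y):
    """Least squares (intercept, slope) for y = b0 + b1*x."""
    xbar, ybar = mean(x), mean(y)
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
    return ybar - b1 * xbar, b1


# Hypothetical subscriber counts (millions), growing 40% per year
years = list(range(11))
users = [5 * 1.4 ** t for t in years]

# Straight line on the original scale: residuals follow a U shape
b0, b1 = ols_fit(years, users)
raw_resid = [u - (b0 + b1 * t) for t, u in zip(years, users)]

# Straight line on the log scale: ln(users) = ln(5) + ln(1.4) * t
log_users = [math.log(u) for u in users]
a0, a1 = ols_fit(years, log_users)
growth = math.exp(a1) - 1  # convert the log-scale slope back to a growth rate

print("signs of raw residuals:", "".join("+" if e > 0 else "-" for e in raw_resid))
print(f"estimated yearly growth: {growth:.1%}")
```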

  18. Common Problems: Correlated Errors
      Does this fix the problem?
      • The linear pattern looks better.
      • You still need to check the other SRM conditions!
          • Omitted variables?
          • Analysis of residuals: there might still be a problem.
      [Figures: original data; transformed data]

  19. Exam Review
      • Download diamonds.xlsx
      • Regress price on weight
      • Are the residuals normally distributed?
          • Yes
          • No
          • Maybe?
          • I have no idea how to verify that
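One common way to check whether residuals are nearly normal is a normal quantile plot: sort the residuals and plot them against standard normal quantiles, where a roughly straight line suggests near-normality. The Python sketch below summarizes that plot by its correlation instead of drawing it. The plotting positions (i - 3/8)/(n + 1/4) (Blom's) are one conventional choice, the residual sets are made up rather than taken from diamonds.xlsx, and there is no single standard cutoff for "close enough to 1", so treat the number as a supplement to looking at the plot.

```python
from math import sqrt
from statistics import NormalDist, mean


def correlation(a, b):
    """Sample correlation between two equal-length sequences."""
    abar, bbar = mean(a), mean(b)
    sab = sum((x - abar) * (y - bbar) for x, y in zip(a, b))
    saa = sum((x - abar) ** 2 for x in a)
    sbb = sum((y - bbar) ** 2 for y in b)
    return sab / sqrt(saa * sbb)


def normal_quantile_correlation(resid):
    """Correlation between sorted residuals and standard normal quantiles.

    Values near 1 mean the normal quantile plot is close to a straight line.
    """
    n = len(resid)
    z = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return correlation(z, sorted(resid))


# Hypothetical residual sets (not from diamonds.xlsx)
bell_shaped = [NormalDist(0, 150).inv_cdf((i - 0.375) / 30.25) for i in range(1, 31)]
right_skewed = [2.0 ** k for k in range(30)]

r_bell = normal_quantile_correlation(bell_shaped)
r_skew = normal_quantile_correlation(right_skewed)
print(f"bell-shaped residuals:  r = {r_bell:.4f}")
print(f"right-skewed residuals: r = {r_skew:.4f}")
```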

  20. Exam Review
      • Using your regression model from the last slide, predict the price of a diamond that weighs 0.44 carats.
      • What is the approximate 95% confidence interval?
          • [$877.75, $1558.61]
          • [$2324.80, $3014.69]
          • [-$97.97, $184.95]
          • [$2330.41, $3009.09]
          • I have no idea

  21. Exam Review
      • Using your regression model from the last slide, predict the price of a diamond that weighs 0.28 carats.
      • What is the prediction interval?
          • [$877.75, $1558.61]
          • [$452.57, $1129.46]
          • [$764.38, $1058.25]
          • [$345.61, $678.34]
          • I have no idea
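A sketch of the two intervals for simple regression, in Python with hypothetical data (not diamonds.xlsx). The confidence interval covers the mean price of all diamonds of weight x0 and uses SE = s_e * sqrt(1/n + (x0 - x̄)²/Sxx). The prediction interval covers the price of a single diamond and adds 1 under the square root, so it is always wider. By default the sketch uses the normal quantile (about 1.96) as a large-sample stand-in for the t quantile with n - 2 degrees of freedom; for small samples, pass `t_crit` from a t table. The function name `predict_with_intervals` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def predict_with_intervals(x, y, x0, level=0.95, t_crit=None):
    """Fit y = b0 + b1*x; return (yhat, confidence interval, prediction interval) at x0."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    # Residual SD s_e, with n - 2 degrees of freedom
    s_e = sqrt(sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y)) / (n - 2))
    if t_crit is None:
        # Assumption: normal quantile as a large-sample stand-in for t with n - 2 df
        t_crit = NormalDist().inv_cdf(0.5 + level / 2)
    yhat = b0 + b1 * x0
    se_mean = s_e * sqrt(1 / n + (x0 - xbar) ** 2 / sxx)      # mean response at x0
    se_pred = s_e * sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)  # one new observation at x0
    ci = (yhat - t_crit * se_mean, yhat + t_crit * se_mean)
    pi = (yhat - t_crit * se_pred, yhat + t_crit * se_pred)
    return yhat, ci, pi


# Hypothetical (weight in carats, price in $) pairs, not the diamonds.xlsx data
weight = [0.18, 0.22, 0.25, 0.28, 0.31, 0.35, 0.40, 0.44, 0.48]
price = [520, 690, 780, 905, 1010, 1180, 1390, 1500, 1700]

yhat, ci, pi = predict_with_intervals(weight, price, 0.28)
print(f"predicted price at 0.28 ct: ${yhat:,.2f}")
print(f"approx. 95% CI: [${ci[0]:,.2f}, ${ci[1]:,.2f}]")
print(f"approx. 95% PI: [${pi[0]:,.2f}, ${pi[1]:,.2f}]")
```

With only nine hypothetical points, the t quantile with 7 df (about 2.36) would give noticeably wider intervals than the 1.96 default, e.g. `predict_with_intervals(weight, price, 0.28, t_crit=2.365)`.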

  22. Exam Review
      • Question about transformations:
          • Again, no magic bullet. Try different ones.
      • How do you decide whether to transform X or Y?
          • It often depends on the substance.

  23. Exam Review
      • Transformations
          • A common mistake is to forget to convert back to the appropriate units.
          • Say your data (and your interest) are in km/l, and you transform the response to liters per 100 km. Don't forget to transform predictions back to km/l: km/l = 100 / (liters per 100 km).
          • Similarly for ln(x): undo it with the exponential function [in Excel, =EXP()].
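A small Python sketch of converting predictions back to the units of interest. It assumes a model that predicts either liters per 100 km or a natural log; the function names and example predictions are hypothetical. Converting L/100 km to km/l means dividing 100 by the value (8 L per 100 km is 12.5 km/l), and undoing ln uses exp, the counterpart of Excel's =EXP().

```python
import math


def km_per_l_from_l_per_100km(l_per_100km):
    """Convert a prediction in liters per 100 km back to km per liter."""
    return 100.0 / l_per_100km


def undo_log(ln_prediction):
    """Convert a prediction made on the ln scale back to the original units."""
    return math.exp(ln_prediction)  # same role as =EXP() in Excel


# Hypothetical model outputs
pred_l_per_100km = 8.0            # a car predicted to use 8 L per 100 km
pred_ln_price = math.log(250.0)   # a prediction of ln(price)

print(km_per_l_from_l_per_100km(pred_l_per_100km))  # 12.5
print(undo_log(pred_ln_price))
```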

  24. Exam Review
      • Conditions for the SRM
          • Know them.
          • Don't hesitate to try fitting a model if they are violated; just be cautious.
      • Some of you might think a regression model is inappropriate if you don't see a pattern in the data.
          • It's totally fine to try to fit a model.
          • The slope will probably be close to 0.

  25. Exam Review
      Checklist:
      • Is the association between y and x linear?
          • An association could exist even if you don't obviously see it (much more common in multiple regression).
      • Have omitted/lurking variables been ruled out?
          • On the exam, I'll try to give you the necessary info.
      • Are the errors evidently independent?
          • How do you verify this?
      • Are the variances of the residuals similar?
          • How do you verify this?
      • Are the residuals nearly normal?
          • How do you verify this?

  26. Exam Review
      • What do you need to know?
          • Everything from chapters 19 through 22…
          • No CAPM; we'll come back to it.
      • What do you need to know from last semester?
          • Statistics builds on itself. I'll assume you're comfortable with some basic concepts (confidence intervals, hypothesis tests, z-scores, means, etc.).
      • Will there be decision problems like those on Quiz 1? Maybe, but probably not. I want this exam to be more about applied data analysis.

  27. Exam Review
      • Types of questions?
          • Possibly homework-like
          • Some business-related decision making
          • Some non-business-related analysis
      • Best way to study?
          • Do the problems. Then do more.
