
Regression Models

Regression Models. Professor William Greene, Stern School of Business, IOMS Department, Department of Economics. Regression and Forecasting Models. Part 3 – Model Fit and Correlation. Correlation and Linear Association. Height (inches) and Income ($/mo.) in first post-MBA Job (men).


Presentation Transcript


  1. Regression Models

    Professor William Greene, Stern School of Business, IOMS Department, Department of Economics
  2. Regression and Forecasting Models

    Part 3 – Model Fit and Correlation
  3. Correlation and Linear Association

    Height (inches) and Income ($/mo.) in first post-MBA Job (men). WSJ, 12/30/86.

    Ht.  Inc.    Ht.  Inc.    Ht.  Inc.
    70   2990    68   2910    75   3150
    67   2870    66   2840    68   2860
    69   2950    71   3180    69   2930
    70   3140    68   3020    76   3210
    65   2790    73   3220    71   3180
    73   3230    73   3370    66   2670
    64   2880    70   3180    69   3050
    70   3140    71   3340    65   2750
    69   3000    69   2970    67   2960
    73   3170    73   3240    70   3050

    Correlation = 0.845
  4. Correlation Coefficient for Two Variables
  5. Correlation and Linear Association

    Height and income data from slide 3.
    Standard deviation of height = 2.978
    Standard deviation of income = 176.903
    Covariance of height and income = 445.034
    Correlation = 445.034 / (2.978 × 176.903) = 0.845
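The slide 5 arithmetic can be reproduced from the 30 observations on slide 3. The sketch below is a plain-Python illustration with hypothetical helper names; it assumes the N - 1 divisor for the sample variance and covariance, which is the choice that matches the reported standard deviations.

```python
import math

# Height (inches) and income ($/mo.) for the 30 men in the WSJ data (slide 3).
height = [70, 68, 75, 67, 66, 68, 69, 71, 69, 70, 68, 76, 65, 73, 71,
          73, 73, 66, 64, 70, 69, 70, 71, 65, 69, 69, 67, 73, 73, 70]
income = [2990, 2910, 3150, 2870, 2840, 2860, 2950, 3180, 2930, 3140,
          3020, 3210, 2790, 3220, 3180, 3230, 3370, 2670, 2880, 3180,
          3050, 3140, 3340, 2750, 3000, 2970, 2960, 3170, 3240, 3050]


def sample_cov(x, y):
    """Sample covariance with the N - 1 divisor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def sample_sd(x):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_cov(x, x))


def correlation(x, y):
    """r_xy = cov(x, y) / (sd(x) * sd(y))."""
    return sample_cov(x, y) / (sample_sd(x) * sample_sd(y))


print(f"{sample_sd(height):.3f}")               # 2.978
print(f"{sample_sd(income):.3f}")               # 176.903
print(f"{sample_cov(height, income):.3f}")      # 445.034
print(f"{correlation(height, income):.3f}")     # 0.845
```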
  6. Sample Correlation Coefficients

    Panels: rxy = -.06 (close to 0); rxy = 0.723; rxy = +1.000; rxy = -.402
  7. Inference About a Correlation Coefficient
  8. Correlation and Linear Association

    Height and income data from slide 3. Correlation = 0.845
    $t = \frac{r}{\sqrt{(1-r^2)/(N-2)}} = \frac{0.845}{\sqrt{(1-0.845^2)/(30-2)}} = 8.361$
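The t statistic on slide 8 depends only on r and N. A minimal sketch of that formula (the function name is hypothetical) reproduces the slide's value:

```python
import math


def corr_t_stat(r: float, n: int) -> float:
    """t statistic for H0: no correlation, t = r / sqrt((1 - r^2) / (n - 2))."""
    return r / math.sqrt((1 - r * r) / (n - 2))


print(f"{corr_t_stat(0.845, 30):.3f}")  # 8.361
```

The statistic has N - 2 degrees of freedom, so here it is compared with a t distribution with 28 degrees of freedom.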
  9. Correlation is Not Causality

    Height and income data from slide 3. Correlation = 0.845
  10. Linear regression is about correlation

    Panels: regression of salary vs. years of experience; regression of fuel bill vs. number of rooms for a sample of homes.
    In both panels the variables are highly correlated, so the regression does a good job of predicting changes in the y variable associated with changes in the x variable.
  11. Regression Algebra
  12. Variance Decomposition
  13. ANOVA Table
  14. Fit of the Model to the Data
  15. Explained Variation

    The proportion of variation “explained” by the regression is called R-squared (R²). It is also called the coefficient of determination. (It is the square of something, to be shown later.)
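The variance decomposition behind R² (total = explained + residual) can be checked numerically. This is a sketch with hypothetical function names, run on toy numbers chosen only so the arithmetic is easy to verify by hand; they are not data from the slides.

```python
def fit_line(x, y):
    """Least squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


def decompose(x, y):
    """Return (SST, SSR, SSE, R^2), where SST = SSR + SSE and R^2 = SSR / SST."""
    b0, b1 = fit_line(x, y)
    my = sum(y) / len(y)
    fitted = [b0 + b1 * a for a in x]
    sst = sum((b - my) ** 2 for b in y)                     # total variation
    ssr = sum((f - my) ** 2 for f in fitted)                # explained by the regression
    sse = sum((b - f) ** 2 for b, f in zip(y, fitted))      # residual (unexplained)
    return sst, ssr, sse, ssr / sst


# Hypothetical toy data.
sst, ssr, sse, r2 = decompose([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(sst, 6), round(ssr, 6), round(sse, 6), round(r2, 6))  # 6.0 3.6 2.4 0.6
```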
  16. Movie Madness Fit: R²
  17. Pretty Good Fit: R² = .722

    Regression of fuel bill on number of rooms
  18. Regression Fits

    Panels: R² = 0.924; R² = 0.522; R² = 0.424; R² = 0.880
  19. R² is still positive even if the correlation is negative

    R² = 0.338
  20. R Squared Benchmarks

    Aggregate time series: expect .9+.
    Cross sections: .5 is good. Sometimes we do much better.
    Large survey data sets: .2 is not bad.
    R² = 0.924 in this cross section.
  21. R-Squared is rxy²

    R-squared is the square of the correlation between yi and the predicted yi, which is b0 + b1xi. Because b0 + b1xi is a linear function of xi, its correlation with yi equals the correlation between yi and xi up to sign (the sign flips when b1 is negative), so the squared correlations are identical. Therefore, R² = rxy². A regression with a high R² predicts yi well.
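A quick numerical check of this slide (and of slide 19): the sketch below fits a line to hypothetical toy data with a negative slope and compares rxy, the correlation between y and the fitted values, and R². The helper name is illustrative only.

```python
import math


def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data with a negative relationship (not from the slides).
x = [1, 2, 3, 4, 5]
y = [5, 3, 4, 1, 2]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
b0 = my - b1 * mx
fitted = [b0 + b1 * a for a in x]

r_xy = corr(x, y)          # negative correlation
r_yfit = corr(y, fitted)   # same size, opposite sign because b1 < 0
r2 = 1 - sum((b - f) ** 2 for b, f in zip(y, fitted)) / sum((b - my) ** 2 for b in y)
print(round(r_xy, 6), round(r_yfit, 6), round(r2, 6))  # -0.8 0.8 0.64
```

Both correlations square to the same R², which is why R² is positive even when the correlation is negative.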
  22. Squared Correlations

    Panels: rxy² = 0.522; rxy² = .161; rxy² = .924
  23. Regression Fits

    Panels: regression of salary vs. years of experience; regression of fuel bill vs. number of rooms for a sample of homes
  24. Is R² Large?

    Is there really a relationship between x and y? We cannot be 100% certain. We can be “statistically certain” (within limits) by examining R². The F statistic is used for this purpose.
  25. The F Ratio
  26. Is R² Large?

    Since $F = \frac{(N-2)R^2}{1-R^2}$, if R² is “large,” then F will be large. For a model with one explanatory variable in it, the standard benchmark value for a “large” F is 4.
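Because F scales with N - 2, the same R² can be "small" or "large" depending on the sample size. The sketch below evaluates the slide's formula for one hypothetical R² of 0.10 at two hypothetical sample sizes; neither value comes from the slides.

```python
def f_from_r2(r2: float, n: int) -> float:
    """F statistic for a one-regressor model: F = (N - 2) R^2 / (1 - R^2)."""
    return (n - 2) * r2 / (1 - r2)


for n in (30, 100):
    print(n, f"{f_from_r2(0.10, n):.2f}")
# 30 3.11
# 100 10.89
```

With 30 observations the F value falls below the benchmark of 4; with 100 observations the same fit clears it easily.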
  27. Movie Madness Fit: R² and F
  28. Why Use F and not R²?

    When is R² “large”? We have no benchmarks to decide. We do have tables of the F distribution that tell us when F is statistically large: yes or no.
  29. F Table

    n2 is N-2. The “critical value” depends on the number of observations. If F is larger than the value in the table, conclude that there is a “statistically significant” relationship. There is a huge table on pages 826-833 of your text. Analysts now use computer programs, not tables like this, to find the critical values of F for their model and data.
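As a rough illustration of what such programs compute for a one-regressor model, the sketch below finds the 5% critical value of F with 1 and N-2 degrees of freedom as the square of the two-sided Student t critical value with N-2 degrees of freedom. It integrates the t density with Simpson's rule and bisects on the cutoff, using only the Python standard library. The function names are hypothetical; in practice a statistics package (for example SciPy's `scipy.stats.f.ppf`) would be used instead.

```python
import math


def t_mass_0_to(c: float, df: int, steps: int = 2000) -> float:
    """P(0 < T < c) for Student t with df degrees of freedom, by Simpson's rule."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def dens(t: float) -> float:
        return const * (1 + t * t / df) ** (-(df + 1) / 2)

    h = c / steps                      # steps is even, as Simpson's rule requires
    total = dens(0.0) + dens(c)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * dens(i * h)
    return total * h / 3


def f_critical(n: int, alpha: float = 0.05) -> float:
    """Critical value of F(1, n - 2), computed as the squared two-sided t cutoff."""
    df = n - 2
    target = 0.5 - alpha / 2           # probability mass between 0 and the t cutoff
    lo, hi = 0.0, 50.0
    for _ in range(50):                # bisection: t_mass_0_to is increasing in c
        mid = (lo + hi) / 2
        if t_mass_0_to(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return ((lo + hi) / 2) ** 2


for n in (30, 62, 1002):
    print(n, f"{f_critical(n):.2f}")
# 30 4.20
# 62 4.00
# 1002 3.85
```

The critical value settles near 4 for moderate samples and drifts toward 3.84 as N grows, which is why 4 works as a rule-of-thumb benchmark.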
  30. Internet Buzz Regression

    (n2 is N-2)
    Regression Analysis: BoxOffice versus Buzz
    The regression equation is
    BoxOffice = - 14.4 + 72.7 Buzz

    Predictor     Coef  SE Coef      T      P
    Constant   -14.360    5.546  -2.59  0.012
    Buzz         72.72    10.94   6.65  0.000

    S = 13.3863   R-Sq = 42.4%   R-Sq(adj) = 41.4%

    Analysis of Variance
    Source          DF       SS      MS      F      P
    Regression       1   7913.6  7913.6  44.16  0.000
    Residual Error  60  10751.5   179.2
    Total           61  18665.1
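The summary numbers in this output can be re-derived from the ANOVA sums of squares. The sketch below assumes the standard definitions: F is the ratio of the mean squares, R-Sq is explained over total variation, R-Sq(adj) is 1 minus the ratio of the residual and total mean squares, and S is the square root of the residual mean square. Variable names are illustrative.

```python
import math

# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_reg, ss_res, ss_tot = 7913.6, 10751.5, 18665.1
df_reg, df_res, df_tot = 1, 60, 61

ms_reg = ss_reg / df_reg
ms_res = ss_res / df_res                     # 179.2 after rounding
f_stat = ms_reg / ms_res                     # F = MS(regression) / MS(residual)
r_sq = ss_reg / ss_tot                       # R-Sq = explained / total
r_sq_adj = 1 - ms_res / (ss_tot / df_tot)    # R-Sq(adj)
s = math.sqrt(ms_res)                        # S, standard error of the regression

print(f"F = {f_stat:.2f}, R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}, S = {s:.3f}")
# F = 44.16, R-Sq = 42.4%, R-Sq(adj) = 41.4%, S = 13.386
```

The same F follows from the slide 26 formula with R² = 0.424 and N = 62, since the total degrees of freedom are N - 1 = 61.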
  31. Inference About a Correlation Coefficient This is F
  32. $135 Million Klimt, to Ronald Lauder http://www.nytimes.com/2006/06/19/arts/design/19klim.html?ex=1308369600&en=37eb32381038a749&ei=5088&partner=rssnyt&emc=rss
  33. $100 Million … sort of Stephen Wynn with a Prized Possession, 2007
  34. An Enduring Art Mystery

    Why do larger paintings command higher prices?
    Graphics show relative sizes of the two works: The Persistence of Memory (Salvador Dali, 1931) and The Persistence of Econometrics (Greene, 2011).
  35. Monet in Large and Small

    Sale prices of 328 signed Monet paintings
    Model: $\log(\text{price}) = a + b\,\log(\text{surface area}) + e$, with price in dollars
    The residuals do not show any obvious patterns that seem inconsistent with the assumptions of the model.
  36. The Data

    Note: using logs in this context is common when analyzing financial measurements (e.g., price) and when percentage changes are more interesting than unit changes. (E.g., what is the % premium when the painting is 10% larger?)
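In the log-log model, a painting that is 10% larger has a predicted price ratio of 1.10 raised to the power b, because log(P1) - log(P0) = b[log(A1) - log(A0)] = b log(1.10). The sketch below computes that premium; the slope b = 0.5 is a made-up illustrative value, not the Monet estimate, and the function name is hypothetical.

```python
def size_premium(b: float, pct_larger: float) -> float:
    """Proportional price premium implied by log(price) = a + b*log(area) + e
    when the area is larger by pct_larger (0.10 means 10%), holding e fixed."""
    return (1 + pct_larger) ** b - 1


# b = 0.5 is a hypothetical slope used only for illustration.
exact = size_premium(0.5, 0.10)
approx = 0.5 * 0.10            # the common "b times the % change" shortcut
print(f"{exact:.2%} vs approx {approx:.2%}")  # 4.88% vs approx 5.00%
```

For small percentage changes, b times the percentage change is a close approximation, which is why b is read as an elasticity.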
  37. Application: Monet Paintings

    Does the size of the painting really explain the sale prices of Monet’s paintings?
    Investigate: compute the regression.
    Hypothesis: the slope is actually zero.
    Rejection region: slope estimates that are very far from zero.
    Result: the hypothesis that β = 0 is rejected.
  38. An Equivalent Test

    Is there a relationship?
    H0: no correlation
    Rejection region: large R²
    Test: $F = \frac{(N-2)R^2}{1-R^2}$; reject H0 if F > 4
    Math result: $F = t^2$. The degrees of freedom for the F statistic are 1 and N-2.
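The result F = t² can be checked against the Buzz regression output: the t ratio for the slope is the coefficient divided by its standard error, and its square should match the ANOVA F. A minimal sketch using the printed values:

```python
# Slope and standard error copied from the Buzz regression output.
coef, se = 72.72, 10.94
t = coef / se                           # t ratio for H0: slope = 0
print(f"{t:.2f} {t ** 2:.2f}")          # 6.65 44.18
```

The small gap between 44.18 and the reported F = 44.16 comes from rounding in the printed coefficient and standard error.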
  39. Monet Regression: There seems to be a regression. Is there a theory?