
Class 23

Class 23. The most over-rated statistic. The four assumptions. The most important hypothesis test yet. Using yes/no variables in regressions. Adjusted R-square. Pg 9-12 Pfeifer note.

locke

Presentation Transcript


  1. Class 23 • The most over-rated statistic • The four assumptions • The most important hypothesis test yet • Using yes/no variables in regressions

  2. Adjusted R-square (Pg 9-12 Pfeifer note) Our better method of forecasting hours would use a mean of 7.9 and standard deviation of 3.89 (and the t-distribution with 14 dof). The sample variance is s² = 3.89² ≈ 15.1: the variation in Hours that regression will try to explain.

  3. Adjusted R-square (Pg 9-12 Pfeifer note) Our better method of forecasting hours for job A would use a mean of 10.51 and standard deviation of 2.77 (and the t-distribution with 13 dof). The squared standard error is (standard error)² ≈ 7.69: the variation in Hours regression leaves unexplained.

  4. Adjusted R-square (Pg 9-12 Pfeifer note) • Adjusted R-square is the percentage of variation explained. • The initial variation is s² = 15.1. • The variation left unexplained (after using MSF in a regression) is (standard error)² = 7.69. • Adjusted R-square = (s² − (standard error)²)/s² • Adjusted R-square = (15.1 − 7.69)/15.1 = 0.49 • The regression using MSF explained 49% of the variation in hours. • The “adjusted” happened in the calculation of s and standard error.
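The arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration using the two variances quoted on the slide; the function name `adjusted_r_square` is an assumption introduced here, not something from the Pfeifer note.

```python
# Values quoted on the slide; nothing here is re-estimated from raw data.
initial_variance = 15.1      # s squared: sample variance of Hours
unexplained_variance = 7.69  # squared standard error of the regression on MSF


def adjusted_r_square(initial, unexplained):
    """Share of the initial variance that the regression removes."""
    return (initial - unexplained) / initial


adj_r2 = adjusted_r_square(initial_variance, unexplained_variance)
print(round(adj_r2, 2))  # rounds to 0.49, matching the slide
```

The same function gives 1.0 when the standard error is zero and 0.0 when the standard error equals s, which are the two end points the Pfeifer note describes.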

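  5. From the Pfeifer note [Figure: scale relating standard error to adjusted R-square. Standard error = 0 corresponds to Adj R-square = 1.0; standard error = s corresponds to Adj R-square = 0.0; Adj R-square = 0.5 is marked in between.]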

  6. Why Pfeifer says R² is over-rated • There is no standard for how large it should be: in some situations an adjusted R² of 0.05 would be FANTASTIC; in others, an adjusted R² of 0.96 would be DISAPPOINTING. • It has no real use, unlike “standard error”, which is needed to make probability forecasts. • It is usually redundant: when comparing models, lower standard errors mean higher adjusted R², and the correlation coefficient (which shares the same sign as b) ≈ the square root of adjusted R².

  7. The Coal Pile Example • The firm needed a way to estimate the weight of a coal pile (based on its dimensions). We just used MULTIPLE regression. 96% of the variation in W is explained by this regression.

  8. The Coal Pile Example • Engineer Bob calculated the Volume of each pile and used simple regression… 100% of the variation in W is explained by this regression. Standard error went from 20.6 to 2.8!!!

  9. The Four Assumptions (Sec 5 of Pfeifer note; Sec 12.4 of EMBS) • Linearity • Independence: the n observations were sampled independently from the same population. • Homoskedasticity: all Y’s given X share a common σ. • Normality: the probability distribution of Y│X is normal. Errors are normal; Y’s don’t have to be.

  10. The four assumptions (Sec 5 of Pfeifer note; Sec 12.4 of EMBS) Our better method of forecasting hours for job A would use a mean of 10.51 and standard deviation of 2.77 (and the t-distribution with 13 dof). • Linearity • Independence (all 15 points count equally) • Homoskedasticity • Normality

  11. Hypotheses (P 13 of Pfeifer note; Sec 12.5 of EMBS) • H0: P=0.5 (LTT, wunderdog) • H0: Independence (supermarket job and response, treatment and heart attack, light and myopia, tosser and outcome) • H0: μ=100 (IQ) • H0: μM = μF (heights, weights, batting average) • H0: μcompact = μmid = μlarge (displacement)

  12. H0: b=0 (P 13 of Pfeifer note; Sec 12.5 of EMBS) • b=0 means X and Y are independent. In this way it’s like the chi-squared independence test, but for numerical variables. • b=0 means don’t use X to forecast Y: don’t put X in the regression equation. • b=0 means just use the sample mean of Y (Ȳ) to forecast Y. • b=0 means the “true” adj R-square is zero.

  13. Testing b=0 is EASY!!! (P 13 of Pfeifer note; Sec 12.5 of EMBS) • H0: μ=100 • P-value from the t.dist with n-1 dof • H0: b=0 • t-stat = (estimated b − 0)/(se of coef) • P-value from t.dist using n-2 dof. [Regression output callouts: the t-stat to test b=0, the 2-tailed p-value, the standard error of the coefficient.]
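A minimal sketch of the b=0 test in plain Python, assuming a simple regression with one X. The data are made up for illustration (they are not the class data), the function names are hypothetical, and the p-value comes from numerically integrating the t density rather than from a spreadsheet t.dist function.

```python
import math


def slope_t_test(x, y):
    """Fit y = a + b*x by least squares; return b, se of coef, t-stat, dof."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    std_error = math.sqrt(sse / (n - 2))   # regression standard error
    se_coef = std_error / math.sqrt(sxx)   # standard error of the coefficient
    t_stat = (b - 0) / se_coef             # t-stat to test H0: b = 0
    return b, se_coef, t_stat, n - 2


def t_density(t, dof):
    """Density of the t-distribution with dof degrees of freedom."""
    coef = math.gamma((dof + 1) / 2) / (
        math.sqrt(dof * math.pi) * math.gamma(dof / 2)
    )
    return coef * (1 + t * t / dof) ** (-(dof + 1) / 2)


def two_tailed_p(t_stat, dof, steps=20000):
    """Two-tailed p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_density(0.0, dof) + t_density(upper, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, dof)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b, se_coef, t_stat, dof = slope_t_test(x, y)
p_value = two_tailed_p(t_stat, dof)
print(b, se_coef, t_stat, dof, p_value)
```

A small p-value is evidence against H0: b=0, that is, evidence that X belongs in the regression equation.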

  14. Using Yes/No variables in Regression (Sec 8 of Pfeifer note; Sec 13.7 of EMBS) [Grid of variable types: Numerical or Categorical, by Numerical or Categorical.] Does MPG “depend” on fuel type? n=60

  15. Fuel type (yes/no) and mpg (numerical) (Sec 8 of Pfeifer note; Sec 13.7 of EMBS) H0: μP = μR, or H0: μP – μR = 0 • Un-stack the data so there are two columns of MPG data. • Data Analysis, T-test two sample.

  16. Using Yes/No variables in Regression (Sec 8 of Pfeifer note; Sec 13.7 of EMBS) • Convert the categorical variable into a 1/0 DUMMY variable. Use an if statement to do this. • It won’t matter which is assigned 1 and which is assigned 0. • It doesn’t even matter what 2 numbers you assign to the two categories (regression will adjust). • Regress MPG (numerical) on DUMMY (1/0 numerical). • Test H0: b=0 using the regression output.
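The steps above can be sketched in plain Python. The fuel types and MPG values below are invented for illustration (the class data set has n=60 and is not reproduced here), and `fit_line` is a hypothetical helper name.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data: "R" = regular, "P" = premium.
fuel = ["R", "P", "R", "P", "R", "P", "R", "R"]
mpg = [28.0, 24.5, 27.0, 25.0, 29.0, 23.5, 26.5, 28.5]

# The "if statement" step: premium becomes 1, regular becomes 0.
dummy = [1 if f == "P" else 0 for f in fuel]
a, b = fit_line(dummy, mpg)

# Group means, for comparison with the regression coefficients.
regular = [m for f, m in zip(fuel, mpg) if f == "R"]
premium = [m for f, m in zip(fuel, mpg) if f == "P"]
mean_r = sum(regular) / len(regular)
mean_p = sum(premium) / len(premium)

# Swapping which category gets the 1 only flips the sign of b.
a_flip, b_flip = fit_line([1 - d for d in dummy], mpg)

print(a, b, mean_r, mean_p, a_flip, b_flip)
```

With a 1/0 dummy, the intercept equals the mean of the group coded 0 and the slope equals the difference between the two group means, which is why testing H0: b=0 is equivalent to testing H0: μP = μR.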

  17. Using Yes/No variables in Regression (Sec 8 of Pfeifer note; Sec 13.7 of EMBS)

  18. Regression with one Dummy variable • When D=0 (regular), the forecast MPG is 27.7. • When D=1 (premium), the forecast MPG is 24.3. • H0: μP = μR, or H0: μP – μR = 0, or H0: b = 0.
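The regression equation on this slide did not survive, so the following is a reading of it rather than a quotation. It assumes the two values 27.7 and 24.3 are the fitted MPG at D=0 and D=1, and writes a and b for the intercept and the dummy coefficient:

```latex
\begin{align*}
\widehat{\mathrm{MPG}} &= a + bD \\
\widehat{\mathrm{MPG}}\,\big|_{D=0} &= a = 27.7 \quad (\text{regular}) \\
\widehat{\mathrm{MPG}}\,\big|_{D=1} &= a + b = 24.3 \quad (\text{premium}) \\
b &= 24.3 - 27.7 = -3.4
\end{align*}
```

Under that reading, b is the estimated difference μP – μR, so the three forms of H0 on the slide are the same hypothesis.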

  19. What we learned today • We learned about “adjusted R square” • The most over-rated statistic of all time. • We learned the four assumptions required to use regression to make a probability forecast of Y│X. • And how to check each of them. • We learned how to test H0: b=0. • And why this is such an important test. • We learned how to use a yes/no variable in a regression. • Create a dummy variable.
