
Econometrics Econ. 504


lonnieblock


  1. Econometrics Econ. 504, Chapter 7: The Normality Assumption and Inference with OLS

  2. I. Role of Normality Assumption

  3. Different Distributions

  4. It is assumed that the unobserved factors are normally distributed around the population regression function. • The form and the variance of the distribution do not depend on any of the explanatory variables.

  5. Since the assumption of normality is so crucial, we add it to the CLRM assumptions. • This assumption is stronger than the previous assumptions because it already implies two of them: zero conditional mean of u and homoskedasticity.

  6. Remember, the normality assumption is not required to perform OLS estimation; it is needed only when you produce confidence intervals and/or perform hypothesis tests with OLS estimates.

  7. II. Estimations under the Normality Assumption • The error term contains the influence of many different forces (random variables) that affect the dependent variable (Y) and are not captured by the independent variables (Xs). • The assumption of normality for u rests on the central limit theorem: a sum of many random variables is approximately normally distributed as long as many random variables are present and the influence of any one of them is small.

  8. In some applications, the assumption of normality for the error term is difficult to justify. • For example, if Y has limited or skewed values (e.g. wages, prices), you can use log values to obtain a distribution that is approximately normal. • Under normality, OLS is the best unbiased estimator. Also, for the purposes of statistical inference, the assumption of normality can be replaced by a large sample size.

  9. III. OLS Standard Errors & the t-distribution

  10. Because the error variance is unknown and must be estimated, the appropriate probability distribution becomes "t" instead of the standard normal. • Note that the t-distribution is close to the standard normal distribution if n-k-1 is large.
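As a sketch of the standard result behind this slide, the standardized OLS estimate follows a t-distribution once the unknown error variance is replaced by its estimate. The notation is assumed here: \hat{\beta}_j is the OLS estimate of \beta_j, se is its standard error, n is the sample size, and k is the number of explanatory variables.

```latex
\frac{\hat{\beta}_j - \beta_j}{\operatorname{se}(\hat{\beta}_j)} \sim t_{n-k-1}
```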

  11. Testing the Significance of Individual Regression Coefficients: • Once you estimate a regression and have your OLS estimates, you need to know what conclusions can be drawn from your results. • What do the selected variables suggest about your hypothesized relationship? • What is the probability that results like the ones you produced were the result of chance? • To address these issues, you need to test the individual significance of your regression coefficients.

  12. A regression coefficient is statistically significant (meaning the results are unlikely to have happened just by chance) if you can provide solid evidence that the true parameter value isn't zero. • In order to provide strong evidence that the true parameter value isn't zero, you need to show that it is highly unlikely that the X variable associated with that coefficient has no effect on your dependent variable Y.

  13. The statistical significance of a coefficient does not determine the importance of the variable or the magnitude of its effect. • Keep in mind that statistical significance provides only evidence of a positive or negative effect (in the case of a one-sided test). • For magnitude and importance, you need to focus on the value of the coefficient.

  14. IV. Testing of Significance Approach • You can report the statistical significance of your coefficients (the result of your hypothesis test) with either the test of significance approach or the confidence interval approach. • The test of significance approach gives you a test statistic that is used to decide whether to reject your null hypothesis. • The confidence interval approach provides a range of plausible values for the parameter you are estimating.

  15. First: Confidence Interval Approach • Provides a range (lower and upper limit) of values that would contain the true value of the parameter. • If you're testing a hypothesis, the position of the assumed parameter value relative to your estimated interval determines whether you reject or do not reject the null hypothesis.
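A sketch of how such an interval is usually built for a single coefficient. The notation is assumed: \hat{\beta}_j is the estimate, se its standard error, and c_{\alpha/2} the critical value of the t-distribution with n-k-1 degrees of freedom that leaves probability \alpha/2 in each tail.

```latex
\left[\; \hat{\beta}_j - c_{\alpha/2}\,\operatorname{se}(\hat{\beta}_j),\;\;
         \hat{\beta}_j + c_{\alpha/2}\,\operatorname{se}(\hat{\beta}_j) \;\right]
```

The lower and upper limits of this interval bound the confidence region with coverage 1-\alpha.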

  16. Figure: confidence interval (probability 1-α) between the lower limit and the upper limit, with a critical region (probability α/2) in each tail. • If the hypothesized value of your parameter of interest falls in the critical region, you reject the null hypothesis. If it falls in the confidence interval, you fail to reject the null hypothesis.

  17. Testing against one-sided alternatives (greater than zero) • Reject the null hypothesis in favour of the alternative hypothesis if the estimated coefficient is too large (i.e. its t-statistic is larger than a critical value). • Construct the critical value so that, if the null hypothesis is true, it is rejected in, for example, 5% of the cases. • In the given example, this is the point of the t-distribution with 28 degrees of freedom that is exceeded in 5% of the cases. • Conclusion: Reject if the t-statistic is greater than 1.701.
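The critical value quoted above can be checked numerically. The following is a minimal sketch using only the Python standard library: it integrates the t density with Simpson's rule and inverts the result by bisection. The helper names (`t_pdf`, `t_cdf`, `t_quantile`) are illustrative and not part of the slides; in practice a statistics package would normally supply this quantile directly.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry around zero."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_quantile(p: float, df: int) -> float:
    """Value c with P(T <= c) = p, located by bisection."""
    lo, hi = -100.0, 100.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# One-sided 5% test against "greater than zero" with 28 degrees of freedom.
print(round(t_quantile(0.95, 28), 3))  # 1.701
```

For a one-sided alternative of "less than zero", the same function is called with the lower-tail probability, for example `t_quantile(0.05, df)`.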

  18. Critical values for the 5% and the 1% significance level (these are conventional significance levels). • The null hypothesis is rejected because the t-statistic exceeds the critical value. The effect of experience on hourly wage is statistically greater than zero at the 5% (and even at the 1%) significance level.

  19. Testing against one-sided alternatives (less than zero) • Reject the null hypothesis in favour of the alternative hypothesis if the estimated coefficient is too small (i.e. its t-statistic is smaller than a critical value). • Construct the critical value so that, if the null hypothesis is true, it is rejected in, for example, 5% of the cases. • In the given example, this is the point of the t-distribution with 18 degrees of freedom so that 5% of the cases are below the point. • Conclusion: Reject if the t-statistic is less than -1.734.

  20. Example: Math test = 2.274 + 0.00046 Tech-W + 0.048 staff - 0.0002 enroll, with standard errors (6.113), (0.00010), (0.040), (0.00022). • t-statistic for enroll = -0.0002 / 0.00022 ≈ -0.91. • Degrees of freedom = n - k - 1 = n - 4. • Critical values for the 5% and the 15% significance level. • The null hypothesis is not rejected because the t-statistic is not smaller than the critical value. One cannot reject the hypothesis that there is no effect of school size on student performance (not even for a lax significance level of 15%).

  21. Testing against two-sided alternatives • Reject the null hypothesis in favour of the alternative hypothesis if the absolute value of the estimated coefficient (and hence of its t-statistic) is too large. • Construct the critical value so that, if the null hypothesis is true, it is rejected in, for example, 5% of the cases. • In the given example, these are the points of the t-distribution so that 5% of the cases lie in the two tails. • Conclusion: Reject if the t-statistic is less than -2.06 or greater than 2.06, i.e. if its absolute value exceeds 2.06.

  22. Example: Coll_GPA = 1.39 + 0.421 hsGPA + 0.015 ACT - 0.083 skipped, with standard errors (0.33), (0.094), (0.011), (0.026). • t-statistics (coefficient divided by standard error): hsGPA ≈ 4.48, ACT ≈ 1.36, skipped ≈ -3.19. • The effects of hsGPA and skipped are significantly different from zero at the 1% significance level. • The effect of ACT is not significantly different from zero, not even at the 10% significance level.
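The t-statistics for this example follow directly from the reported coefficients and standard errors. A minimal sketch of the arithmetic, where the function name `t_statistic` and the dictionary layout are illustrative choices rather than anything from the slides:

```python
# Coefficients and standard errors as reported in the college GPA example.
estimates = {
    "hsGPA": (0.421, 0.094),
    "ACT": (0.015, 0.011),
    "skipped": (-0.083, 0.026),
}


def t_statistic(coef: float, se: float, hypothesized: float = 0.0) -> float:
    """t-ratio for H0: beta = hypothesized (zero for a significance test)."""
    return (coef - hypothesized) / se


t_stats = {name: t_statistic(b, se) for name, (b, se) in estimates.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}")
```

Each ratio is then compared with the critical value for the chosen significance level and degrees of freedom.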

  23. General Conclusion For Testing: • If a regression coefficient is different from zero in a two-sided test, the corresponding variable is said to be "statistically significant". • If the number of degrees of freedom is large enough (greater than 120) so that the normal approximation applies, the following rules of thumb apply: |t-ratio| > 1.645: statistically significant at the 10% level; |t-ratio| > 1.96: statistically significant at the 5% level; |t-ratio| > 2.576: statistically significant at the 1% level.
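The large-sample rules of thumb correspond to two-sided critical values of the standard normal distribution. A short sketch using Python's standard library (the function name `two_sided_critical_value` is an illustrative choice):

```python
from statistics import NormalDist


def two_sided_critical_value(alpha: float) -> float:
    """Standard normal value c such that P(|Z| > c) = alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.10, 0.05, 0.01):
    c = two_sided_critical_value(alpha)
    print(f"{alpha:.0%} level: reject if |t| > {c:.3f}")
```

With few degrees of freedom the t-distribution has heavier tails, so the corresponding critical values are larger than these normal ones.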

  24. If a variable is statistically significant, discuss the magnitude of the coefficient to get an idea of its economic or practical importance. • The fact that a coefficient is statistically significant does not necessarily mean it is economically or practically significant. How? • If a variable is statistically and economically important but has the "wrong" sign, the regression model might be misspecified.

  25. If a variable is statistically insignificant at the usual levels (10%, 5%, 1%), one may think of dropping it from the regression. • If the sample size is small, effects might be imprecisely estimated, so the case for dropping insignificant variables is less strong. • If a test of significance fails to reject H0, do not say that we accept H0. It is preferable to say "do not reject" rather than "accept."

  26. Computing p-values for t-tests • If the significance level is made smaller and smaller, there will be a point where the null hypothesis cannot be rejected anymore. • The reason is that, by lowering the significance level, one tries harder and harder to avoid the error of rejecting a correct H0. • The p-value is the smallest level of significance at which the null hypothesis can be rejected.

  27. The smallest significance level at which the null hypothesis is still rejected is called the p-value of the hypothesis test. • A small p-value is evidence against the null hypothesis because one would reject the null hypothesis even at small significance levels. • A large p-value means the data provide little evidence against the null hypothesis. • P-values are more informative than tests at fixed significance levels.
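A minimal sketch of a two-sided p-value calculation. It assumes the degrees of freedom are large enough for the standard normal approximation to the t-distribution; with few degrees of freedom the t-distribution itself should be used. The function name and the example t-values are illustrative.

```python
from statistics import NormalDist


def two_sided_p_value(t_stat: float) -> float:
    """Approximate p-value for H0: beta = 0 against a two-sided alternative.

    Assumption: large degrees of freedom, so the standard normal
    distribution stands in for the t-distribution.
    """
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))


# Illustrative t-ratios: one large, one modest.
for t in (4.48, 1.36):
    print(f"t = {t:.2f}: p-value = {two_sided_p_value(t):.4f}")
```

The null hypothesis is rejected at significance level α exactly when the p-value is below α, which is why a reported p-value lets the reader apply any significance level.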

  28. V. Testing multiple linear restrictions: The F test • Model 1: Joint Significance Test (testing exclusion restrictions) • Dependent variable: salary of a major league baseball player. Explanatory variables: years in the league, average number of games per year, batting average, home runs per year, runs batted in per year. • Test the null hypothesis that the performance measures have no effect (can be excluded from the regression) against the alternative that at least one of them has an effect.

  29. Estimation of the unrestricted model • None of these variables is statistically significant when tested individually. • Idea: How would the model fit change if these variables were dropped from the regression?

  30. Estimation of the restricted model • The sum of squared residuals necessarily increases, but is the increase statistically significant? • The relative increase of the sum of squared residuals when going from H1 to H0, adjusted for the number of restrictions, follows an F-distribution (if the null hypothesis H0 is correct).
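A sketch of the usual form of this test statistic. The notation is assumed: SSR_r and SSR_ur are the sums of squared residuals of the restricted and unrestricted models, q is the number of restrictions, and n-k-1 is the degrees of freedom in the unrestricted model.

```latex
F = \frac{(SSR_{r} - SSR_{ur})/q}{SSR_{ur}/(n-k-1)} \sim F_{q,\,n-k-1}
\quad \text{under } H_0
```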

  31. Rejection rule • The F-distributed variable only takes on positive values. This corresponds to the fact that the sum of squared residuals can only increase if one moves from H1 to H0. • Choose the critical value so that the null hypothesis is rejected in, for example, 5% of the cases, although it is true.

  32. Test decision in example • Inputs to the test: the number of restrictions to be tested and the degrees of freedom in the unrestricted model. • The null hypothesis is overwhelmingly rejected (even at very small significance levels). • Discussion: The three variables are "jointly significant". They were not significant when tested individually. The likely reason is multicollinearity between them.
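A minimal sketch of how the F statistic for exclusion restrictions is computed from the two sums of squared residuals. The function name and the numbers passed to it are hypothetical placeholders chosen for illustration; they are not the values from the baseball salary example.

```python
def f_statistic(ssr_restricted: float, ssr_unrestricted: float,
                q: int, df_unrestricted: int) -> float:
    """F statistic for q exclusion restrictions.

    q is the number of restrictions tested; df_unrestricted is the
    degrees of freedom (n - k - 1) of the unrestricted model.
    """
    if ssr_restricted < ssr_unrestricted:
        raise ValueError("dropping variables cannot lower the SSR")
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / df_unrestricted
    return numerator / denominator


# Hypothetical inputs, for illustration only.
f_value = f_statistic(ssr_restricted=120.0, ssr_unrestricted=100.0,
                      q=3, df_unrestricted=200)
print(f"F = {f_value:.2f}")
```

The resulting value is compared with the critical value of the F-distribution with (q, n-k-1) degrees of freedom.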

  33. Model 2: Overall Significance Test • Test of overall significance of a regression. • The null hypothesis states that the explanatory variables are not useful at all in explaining the dependent variable. • Restricted model: regression on a constant only. • The test of overall significance is reported in most regression packages; the null hypothesis is usually overwhelmingly rejected.
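A sketch of the usual R-squared form of the overall significance statistic. The notation is assumed: R^2 is the coefficient of determination of the unrestricted regression, k the number of explanatory variables, and n the sample size. The null hypothesis is that all k slope coefficients are zero.

```latex
F = \frac{R^{2}/k}{(1 - R^{2})/(n-k-1)} \sim F_{k,\,n-k-1}
\quad \text{under } H_0:\ \beta_1 = \beta_2 = \dots = \beta_k = 0
```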
