
Lecture 5 The Problem of Statistical Inference (Chapters 5 and 8)



  1. Lecture 5 The Problem of Statistical Inference (Chapters 5 and 8) • Hypothesis Testing in the Two-Variable Regression Model -- Continued • Testing Hypotheses about a Regression Coefficient • Test of Significance Approach • Analysis of Variance Approach

  2. Hypothesis Testing in the Multiple Regression Model • Introduction • Testing Joint Hypotheses • Testing Significance of a Group of Coefficients • Testing Significance of the Overall Model • Testing for Causality • Testing Linear Restrictions on Coefficients • Testing Equality of Two Regression Coefficients • Testing Structural Stability of Regression Models

  3. Quick Review • Last time we saw that, in the CNLR model, we can test a null hypothesis such as H0: β2 = β2* against, say, a two-sided alternative such as H1: β2 ≠ β2* • We said one way to do this is to use the t test of significance, where t = (β̂2 - β2*)/SE(β̂2) ~ t(n-2)

  4. Quick Review • So, once we estimate the regression equation, we compute the above t ratio. • Next, we choose a level of significance, α, and use it to look up the critical t value from the t table. • Finally, we use an appropriate decision rule to decide whether or not we should reject the null in favor of the alternative.
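The procedure on the last two slides is easy to automate. Below is a minimal sketch of the test-of-significance approach using statsmodels and scipy; the data and the hypothesized value β2* = 0 are illustrative, not from the lecture:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Illustrative data (not from the lecture)
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3, size=n)

res = sm.OLS(y, sm.add_constant(x)).fit()      # two-variable regression

beta2_star = 0.0                               # hypothesized value under H0
t_stat = (res.params[1] - beta2_star) / res.bse[1]

alpha = 0.05                                   # chosen level of significance
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)  # two-sided critical value
print(f"t = {t_stat:.3f}, critical t = {t_crit:.3f}, "
      f"reject H0: {abs(t_stat) > t_crit}")
```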

  5. Choosing the Level of Significance • How should we choose the level of significance? • There is no general rule to follow. • It is customary to use 1%, 5%, or 10%. • Sometimes the choice can be made based on the cost of committing a type I error relative to that of committing a type II error. • You should choose a high level of significance if you suspect the test has low power.

  6. The P-Value • Instead of using an arbitrary level of significance, nowadays we use the p-value, which is also known as the exact level of significance or the marginal significance level • This is the lowest level of significance at which a given null hypothesis can be rejected • Note that for a given sample size, as |t| increases, the p-value decreases
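As a quick illustration, the p-value can be computed directly from the t statistic and its degrees of freedom; the numbers below are made up for the example:

```python
from scipy import stats

t_stat, df = 2.45, 30                        # illustrative values
p_value = 2 * stats.t.sf(abs(t_stat), df)    # two-sided: area in both tails
print(f"p-value = {p_value:.4f}")            # H0 is rejected at any level >= p
```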

  7. P-Value: Two Examples

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C           0.01738      0.00287       6.052519     0.000
X           0.21637      0.18839       1.148471     0.258

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -0.00020      7.16E-05     -2.866233     0.006
X           0.49379      0.00906       9.49734      0.000

  8. Testing Hypotheses in the Two-Variable Model: Analysis of Variance Approach • As we said earlier, there are three alternative approaches for testing a null hypothesis: • confidence interval approach • test of significance approach • analysis of variance approach • Having studied the test of significance approach, we now turn to the analysis of variance approach.

  9. Analysis of Variance Approach • Analysis of variance (ANOVA) means examining the various sums of squares in the relation TSS = ESS + RSS in the context of regression analysis. • In this approach, the first step is to determine the degrees of freedom of the above sums of squares. • In the two-variable model these are as follows: • TSS has n-1 degrees of freedom • RSS has n-2 degrees of freedom • ESS has 1 degree of freedom

  10. Analysis of Variance Approach • Next, we define the mean sum of squares associated with a given sum of squares as the ratio of that sum of squares to its degrees of freedom: • Mean total sum of squares = TSS/(n-1) • Mean residual sum of squares = RSS/(n-2) = σ̂² • Mean explained sum of squares = ESS/1 = ESS • A table containing this information is called an ANOVA table.

  11. Analysis of Variance Approach • We use the information in an ANOVA table to construct the following statistic, which is used for testing H0: β2 = 0 in the two-variable model: • F = ESS / [RSS/(n-2)] = ESS/σ̂² • In the two-variable CNLR model this statistic has an F distribution with 1 degree of freedom in the numerator and n-2 degrees of freedom in the denominator. • It can be used to test the statistical significance of the only slope coefficient in the bivariate model.

  12. Analysis of Variance Approach • Large values of F (i.e., large ESS relative to σ̂²) lead to the rejection of H0, while small values of F are consistent with H0. • Of course, the question remains: how large is large and how small is small? • As with the t test, the answer is: relative to the critical value of the test statistic (here F). • In fact, to apply this test, which is known as the F test, we follow the same procedure as with the t test.

  13. Analysis of Variance Approach • First, using sample data, we compute the F ratio. • Next, we choose a level of significance, α, and use the F table to find the critical F value with 1 and n-2 degrees of freedom. • Finally, we use the usual decision rule for rejecting or not rejecting the null hypothesis, i.e., we reject the null if the calculated F exceeds the critical F; otherwise we don't reject the null.

  14. Analysis of Variance Approach: An Example • Let's use the U.S. consumption function we estimated earlier, where β̂2 = 0.76, ESS = 4,598,500.9, and RSS = 6,107.3, to test H0: β2 = 0 against H1: β2 ≠ 0 at the 5% level. • Noting that this is a bivariate model (i.e., k = 2), we determine that ESS has k-1 = 1 degree of freedom and RSS has 32-2 = 30 degrees of freedom, so the F ratio is • F = 4,598,500.9/(6,107.3/30) = 22,588.55

  15. Analysis of Variance Approach: An Example • At the 5% level and with 1 and 30 degrees of freedom, the critical F value is 4.17. • Since the computed F is greater than the critical F, we reject the null in favor of the alternative. • Thus we conclude that at the 5% level our point estimate of β2, i.e., 0.76, is statistically significantly different from zero.
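The arithmetic of this example is easy to check in a few lines; here is a sketch that reproduces the F ratio and looks up the 5% critical value with scipy:

```python
from scipy import stats

ESS, RSS, n = 4_598_500.9, 6_107.3, 32       # figures from the example above

F = ESS / (RSS / (n - 2))                    # F = ESS / sigma-hat^2
F_crit = stats.f.ppf(0.95, dfn=1, dfd=n - 2)
print(f"F = {F:.2f}, critical F = {F_crit:.2f}")  # about 22588.55 vs 4.17
```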

  16. Analysis of Variance Approach: Some Remarks • In the two-variable model, this F test is only applicable to a zero null hypothesis. But, as we will see later on, in multiple regression variants of the F statistic can be used for testing a large variety of null hypotheses involving several regression coefficients. • The F test is a two-tailed test.

  17. Analysis of Variance Approach: Some Remarks • In the two-variable model, regardless of whether we use the t or the F test, the final decision (outcome) is the same. • This is because F(1, n-2) = [t(n-2)]²

  18. Analysis of Variance Approach: Some Remarks • It can be shown that • F = (n-2)[R²/(1-R²)] • From this, it follows that F → 0 as R² → 0, and F → ∞ as R² → 1. • In other words, R² and F move together • Thus we can use the F statistic to test the statistical significance of R², that is, to test H0: R² = 0 against H1: R² ≠ 0
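Both identities on the last two slides are easy to confirm numerically. A small sketch with illustrative data (the numbers are mine, not the lecture's):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 40
x = rng.normal(size=n)
y = 2.0 + 0.8 * x + rng.normal(size=n)

res = sm.OLS(y, sm.add_constant(x)).fit()

F_from_R2 = (n - 2) * res.rsquared / (1 - res.rsquared)
print(np.isclose(F_from_R2, res.fvalue))             # F = (n-2) R^2/(1-R^2)
print(np.isclose(res.tvalues[1] ** 2, res.fvalue))   # F(1, n-2) = t(n-2)^2
```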

  19. Introduction • In multiple regression we are sometimes concerned with the joint effect of the explanatory variables, in addition to their partial or individual effects. • This means that in multiple regression, we can test not only hypotheses that involve a single regression coefficient, but also hypotheses that involve several regression coefficients. • We begin with hypotheses that involve a single regression coefficient.

  20. Testing Hypotheses Involving a Single Partial Regression Coefficient • As in the two-variable regression model, we can use either the t test or the F test. • However, the F test for testing hypotheses on a single regression coefficient is somewhat different in the multiple regression model relative to the two-variable model. • In particular, F = ESS/σ̂², which we used in the two-variable model to test the statistical significance of the only slope coefficient, β2, can no longer be used in multiple regression to test the same hypothesis.

  21. Testing Hypotheses Involving a Single Partial Regression Coefficient • In multiple regression, the procedure for performing an F test of statistical significance of a single regression coefficient is a special case of the general F-testing procedure used to test a host of different hypotheses. • Let's see how this is so by studying the general F-testing procedure.

  22. The ANOVA Approach in the Multiple Regression Model • In the multiple regression model, the ANOVA approach, known as the Wald test, involves the same set of steps regardless of what form the null hypothesis takes. • The idea is to assume first that the null hypothesis is true and then that the alternative is true, and to determine which model, the one corresponding to the null or the one corresponding to the alternative, fits the data better.

  23. Steps in Wald Test 1. Assume the null hypothesis is true, and find out what the model would look like in this case. Call this the restricted model 2. Estimate the restricted model and save the RSS. Denote this RSSr 3. This time assume the alternative hypothesis is true, in which case the original model, which we call the full or unrestricted model, applies. Estimate this, obtain the RSS, and call it RSSu

  24. Steps in Wald Test • 4. Construct the following statistic: • (RSSr - RSSu)/m • F = --------------------- • RSSu/(n-k) • Here k is the number of parameters in the original (full or unrestricted) model including the intercept, and m is the difference in the number of coefficients in the full and restricted models (the number of restrictions). • Note that because RSSr ≥ RSSu, the above F ratio is a nonnegative number • In the multiple CNLR model the above ratio has an F distribution with m and n-k degrees of freedom

  25. Steps in Wald Test • 5. Compute the above Wald F statistic and compare it with the critical F value at the chosen level of significance. • The decision rule is as usual. • We can express the above F in terms of R² from the unrestricted and restricted models: • (R²u - R²r)/m • F = ---------------- • (1 - R²u)/(n - k)
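These five steps translate directly into code. A minimal sketch of the RSS form of the statistic (the helper name wald_f_test is my own, not from the lecture):

```python
from scipy import stats

def wald_f_test(rss_r, rss_u, m, n, k):
    """Wald F statistic from restricted and unrestricted RSS.

    m: number of restrictions; k: parameters in the unrestricted model
    (including the intercept). Returns F and its p-value under F(m, n-k).
    """
    F = ((rss_r - rss_u) / m) / (rss_u / (n - k))
    return F, stats.f.sf(F, m, n - k)
```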

  26. Applications of Wald Test • In using the Wald test, the main task is to find the restricted model. • Below I present the restricted model for testing a number of useful hypotheses in the context of the following quad-variate model, • Yt = β1 + β2X2t + β3X3t + β4X4t + ut • Note that this will be the unrestricted model regardless of the null hypothesis considered.

  27. Testing Statistical Significance of an Individual Regression Coefficient • H0: β2 = 0 vs. H1: β2 ≠ 0 • In this case the restricted model is as follows, • Yt = β1 + β3X3t + β4X4t + ut

  28. Testing a Non-Zero Joint Hypothesis • H0: β2 = β2* and β3 = β3* vs. H1: β2 ≠ β2* or β3 ≠ β3* • Here β2* and β3* are hypothesized (known) values of β2 and β3, respectively, e.g., 0 and 1 • In this case the restricted model is, • Yt = β1 + β2*X2t + β3*X3t + β4X4t + ut • or Yt - β2*X2t - β3*X3t = β1 + β4X4t + ut

  29. Testing Joint Significance of a Group of Coefficients • H0: β2 = β3 = 0 vs. H1: β2 ≠ 0 or β3 ≠ 0 • This is a special case of the previous test, where β2* and β3* are zero. • The restricted model is, • Yt = β1 + β4X4t + ut
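For instance, using the wald_f_test helper sketched after slide 25, the joint test H0: β2 = β3 = 0 in the quad-variate model looks like this (data simulated purely for illustration):

```python
import numpy as np
import statsmodels.api as sm

# wald_f_test is the helper sketched after slide 25
rng = np.random.default_rng(2)
n = 100
X2, X3, X4 = rng.normal(size=(3, n))
y = 1.0 + 0.5 * X2 + 0.0 * X3 + 1.2 * X4 + rng.normal(size=n)

X_u = sm.add_constant(np.column_stack([X2, X3, X4]))  # unrestricted model
X_r = sm.add_constant(X4)                             # restricted: X2, X3 dropped
rss_u = sm.OLS(y, X_u).fit().ssr
rss_r = sm.OLS(y, X_r).fit().ssr

F, p = wald_f_test(rss_r, rss_u, m=2, n=n, k=4)
print(f"F = {F:.2f}, p = {p:.4f}")
```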

  30. Granger Non-Causality Test • This is a useful application of the above test of significance of a group of coefficients. • I ask you to rely on your own notes and the text for this topic. • Warning: You are expected (polite for required) to study Section 17.14, Causality in Economics: The Granger Test, pp. 620-23 of Gujarati

  31. Testing the Overall Significance of the Model • H0: β2 = β3 = β4 = 0 vs. H1: at least one of β2, β3, β4 ≠ 0 • This amounts to testing H0: R² = 0 vs. H1: R² ≠ 0 • In this case, the restricted model is • Yt = β1 + ut • If you estimate such a model, you'd find Ŷt = β̂1 (which equals the sample mean of Y) • In practice, we don't estimate the above restricted model to test the overall significance of the model.

  32. Applications of Wald Test: Testing the Overall Significance of the Model • Instead, we use the F statistic we used for the same purpose in the two-variable model, namely, • ESS/(k-1) • F = --------------- • RSS/(n-k)
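statsmodels reports exactly this overall F statistic on every fitted OLS model, so in practice no restricted regression is needed. A quick sketch with simulated data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n, k = 60, 4
X = sm.add_constant(rng.normal(size=(n, k - 1)))
y = X @ np.array([1.0, 0.5, -0.3, 0.8]) + rng.normal(size=n)

res = sm.OLS(y, X).fit()
F = (res.ess / (k - 1)) / (res.ssr / (n - k))   # ESS/(k-1) over RSS/(n-k)
print(np.isclose(F, res.fvalue), res.f_pvalue)  # matches the reported overall F
```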

  33. Testing Linear Restrictions • H0: β2 + β3 = c versus H1: β2 + β3 ≠ c • where c is a known constant, e.g., 0, 1, 1/2, etc. • Find the restricted model by solving the null hypothesis for one of the parameters as a function of the other, e.g., β2 = c - β3 • Substitute this in the original model, • Yt = β1 + (c - β3)X2t + β3X3t + β4X4t + ut • or Yt - cX2t = β1 + β3(X3t - X2t) + β4X4t + ut

  34. Testing Linear Restrictions • Thus, in order to find the RSS associated with the restricted model, you should generate two variables, Yt - cX2t and X3t - X2t, and regress the former on the latter, a constant, and X4. • The above procedure is known as Restricted Least Squares (RLS). • Note that the restriction under H0 is linear in the parameters and holds as an equality. • A restriction such as β2 + β3 < c holds only as an inequality and cannot be handled by the F test.
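A minimal sketch of this restricted-least-squares recipe on simulated data (the variable names and the value of c are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(4)
n, c = 80, 1.0
X2, X3, X4 = rng.normal(size=(3, n))
y = 0.5 + 0.6 * X2 + 0.4 * X3 + 1.0 * X4 + rng.normal(size=n)  # beta2+beta3 = c

# Unrestricted model
rss_u = sm.OLS(y, sm.add_constant(np.column_stack([X2, X3, X4]))).fit().ssr

# Restricted model: regress (Y - c*X2) on a constant, (X3 - X2), and X4
rss_r = sm.OLS(y - c * X2,
               sm.add_constant(np.column_stack([X3 - X2, X4]))).fit().ssr

F = (rss_r - rss_u) / (rss_u / (n - 4))        # m = 1 restriction, k = 4
print(f"F = {F:.3f}, p = {stats.f.sf(F, 1, n - 4):.4f}")
```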

  35. Testing Equality of Two Regression Coefficients • H0: β2 = β3 vs. H1: β2 ≠ β3 • The restricted model is • Yt = β1 + β2X2t + β2X3t + β4X4t + ut • = β1 + β2(X2t + X3t) + β4X4t + ut

  36. Testing Stability of the Model • When we estimate a regression model, we assume implicitly that the regression coefficients are constant over time, that is, that the model is stable. • However, regime changes can cause structural changes in the model. • Thus, it is important to test the assumption of constancy or stability of the parameters of the regression model.

  37. Testing Stability of the Model • Let the model representing the period before the event in question (the first n1 observations) be... • Yt = λ1 + λ2X2t + λ3X3t + u1t, t = 1, 2, …, n1 • Let the model representing the period following the change (the remaining n2 observations) be... • Yt = γ1 + γ2X2t + γ3X3t + u2t, t = 1, 2, …, n2 • The null hypothesis is NO structural change, i.e., the models representing the two sub-periods are one and the same, H0: λ1 = γ1, λ2 = γ2, λ3 = γ3

  38. Applications of Wald Test • If H0 turns out to be true (i.e., if it is not rejected), we can estimate a single regression over the entire period by pooling the two sub-samples (using the full sample of n = n1 + n2 observations). • The null hypothesis is tested as follows: • 1. Estimate the model using the first sub-sample of n1 observations, and save the RSS. Call this RSS1. • 2. Estimate the model over the second sub-sample using n2 observations, find the RSS, and call it RSS2.

  39. Applications of Wald Test 3. The unrestricted RSS, which assumes H1 is true (i.e., assumes there is a break in the regression line), equals RSS1 + RSS2. 4. Estimate the model using all of the available observations, that is, the full sample of n = n1 + n2 observations. Obtain the RSS and denote it RSSr. This is the restricted RSS because estimating the model over the entire sample period is valid only if H0 is true, that is, if there is no break in the model.

  40. Applications of Wald Test • 5. Construct the following ratio: • (RSSr - RSSu)/k • F = --------------------- • RSSu/(n - 2k) • This has an F distribution with k and n-2k degrees of freedom. • The decision rule is as usual. • The above test is known as the Chow breakpoint test and is available in EViews.
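Steps 1-5 fit in one small function. Here is a sketch of the Chow breakpoint test on simulated data with a deliberate break (the function name chow_test is mine, not EViews'):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

def chow_test(y, X, n1):
    """Chow breakpoint F test; X must already include the constant.

    Splits the sample after observation n1 and tests H0: no break.
    """
    n, k = X.shape
    rss_r = sm.OLS(y, X).fit().ssr              # pooled (restricted) model
    rss_1 = sm.OLS(y[:n1], X[:n1]).fit().ssr    # first sub-sample
    rss_2 = sm.OLS(y[n1:], X[n1:]).fit().ssr    # second sub-sample
    rss_u = rss_1 + rss_2                       # unrestricted RSS
    F = ((rss_r - rss_u) / k) / (rss_u / (n - 2 * k))
    return F, stats.f.sf(F, k, n - 2 * k)

rng = np.random.default_rng(5)
x = rng.normal(size=100)
first_regime = np.arange(100) < 50              # regime change at t = 50
y = np.where(first_regime, 1 + 2 * x, 3 + 0.5 * x) + rng.normal(size=100)
print(chow_test(y, sm.add_constant(x), n1=50))  # large F, tiny p-value
```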

  41. Other Applications of the t Test • As simple as it is, the t test has many applications, and when used properly has high power. • So far, we have studied its use for testing zero and non-zero hypotheses on regression coefficients. • We will now see how it can be used for testing hypotheses involving more than one regression coefficient, which are typically tested using the F test. • We will also see how the t test can be used to test hypotheses on the simple correlation coefficient.

  42. Testing Linear Restrictions using the t Test • Consider the following trivariate model, • Yt = β1 + β2X2t + β3X3t + ut • Suppose you want to test H0: β2 + β3 = c versus H1: β2 + β3 ≠ c, where c is a known constant. • Rewrite the null hypothesis as, β2 + β3 - c = 0.

  43. Testing Linear Restrictions using the t Test • Construct the following t ratio, • t = (β̂2 + β̂3 - c) / √[Var(β̂2) + Var(β̂3) + 2Cov(β̂2, β̂3)] • This has a t distribution with n-3 degrees of freedom (the number of observations less the three estimated parameters). • The decision rule is as usual.
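A minimal sketch of this t ratio, pulling the variances and the covariance from the estimated covariance matrix of the coefficients (the data and c are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(6)
n, c = 60, 1.0
X2, X3 = rng.normal(size=(2, n))
y = 0.2 + 0.7 * X2 + 0.3 * X3 + rng.normal(size=n)   # true beta2 + beta3 = c

res = sm.OLS(y, sm.add_constant(np.column_stack([X2, X3]))).fit()
V = res.cov_params()                                  # estimated Var-Cov matrix

t_stat = (res.params[1] + res.params[2] - c) / np.sqrt(
    V[1, 1] + V[2, 2] + 2 * V[1, 2])
p = 2 * stats.t.sf(abs(t_stat), df=res.df_resid)      # df = n - 3 here
print(f"t = {t_stat:.3f}, p = {p:.4f}")
```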

  44. Testing Equality of Two Regression Coefficients using the t Test • Consider the following trivariate model, • Yt = β1 + β2X2t + β3X3t + ut • Suppose you want to test H0: β2 = β3 versus H1: β2 ≠ β3. • Write the null hypothesis as, β2 - β3 = 0.

  45. Testing Equality of Two Regression Coefficients using the t Test • Construct the following t ratio, • t = (β̂2 - β̂3) / √[Var(β̂2) + Var(β̂3) - 2Cov(β̂2, β̂3)] • This has a t distribution with n-3 degrees of freedom. • The decision rule is as usual.

  46. Testing Hypotheses on the Correlation Coefficient using the t Test • Recall the simple correlation coefficient between any two random variables is given by, r12 = S12/√(S11S22) • In the CNLR model, t = r12/SE(r12) ~ t(n-2) follows the t distribution with df = n-2. • Here, SE(r12) = √[(1 - r12²)/(n - 2)]
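A short sketch of this test on illustrative data; scipy.stats.pearsonr reports the same r and p-value, which makes a handy cross-check:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 30
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)

r = np.corrcoef(x, y)[0, 1]
t_stat = r / np.sqrt((1 - r**2) / (n - 2))      # t = r / SE(r)
p = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(f"r = {r:.3f}, t = {t_stat:.3f}, p = {p:.4f}")
print(stats.pearsonr(x, y))                     # same r and p-value
```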

  47. Testing Hypotheses on the Correlation Coefficient using the t Test • The above t statistic can be used to test a number of hypotheses about the correlation coefficient. • Some hypotheses of interest are: H0: r = 0 versus H1: r < 0 (one-tailed) H0: r = 0 versus H1: r > 0 (one-tailed) H0: r = 0 versus H1: r ≠ 0 (two-tailed) • The decision rule is as with any t test, both one-tailed and two-tailed.

  48. Practical Aspects of Hypothesis Testing Please study Section 5.8, pp. 129-134 of Gujarati

  49. Reporting Results of Regression Analysis • If there is only one equation, report it as follows: Ŷi = 91.1 + 20.5Xi (1.75)* (2.67)** *Significant at the 10% level (two tail) **Significant at the 1% level (one tail) • Indicate whether the numbers in parentheses are estimated standard errors, t ratios, or p values. • In the first two cases, the asterisks (* and **) would be needed, but not if you choose to report the p values, as long as you make that clear.

  50. Reporting Results of Regression Analysis • If the data are time series, report the estimation period and the frequency of the data, e.g., 1969-1988 for annual data, 1969.1-1988.4 for quarterly data, or 1969.01-1988.12 if the data are monthly. • It is also desirable to report the sample mean of the dependent variable (and perhaps those of the independent variables).
