Chapter 13. Statistics commonly used to test the regression model
This chapter introduces some commonly used test statistics. Tests for heteroscedasticity and autocorrelation in the random error term should be carried out first; the other tests rest on the assumption that the error term satisfies the classical hypotheses. When testing restrictions on the regression coefficients, unimportant variables should be deleted step by step while making sure that the random error term remains free of heteroscedasticity and autocorrelation. We first summarize the t and F statistics briefly. We then introduce about ten statistics: the F statistic and the Likelihood Ratio (LR) test, which are used to test linear restrictions on the regression coefficients; the Wald test and the Lagrange multiplier (LM) test, which are used to test linear and nonlinear restrictions on the regression coefficients; the Jarque-Bera (JB) test, which is used to test for normality; Akaike's information criterion (AIC), Schwarz's information criterion (SC) and the Hannan-Quinn criterion (HQ), which are used to determine the best lag length; the Granger causality test, which is used to test the causality or direction of influence between variables; the Chow test, which is used to examine the structural stability of a regression model; and so on.
13.1 Testing the overall significance of the sample regression: the F test
Take the multiple regression model as an example:
$$Y_t = \beta_0 + \beta_1 X_{t1} + \beta_2 X_{t2} + \dots + \beta_{k-1} X_{t,k-1} + u_t \tag{13-1}$$
The null hypothesis is H0: β1 = β2 = … = βk-1 = 0.
The alternative hypothesis is H1: β1, β2, …, βk-1 are not simultaneously equal to zero.
If H0 is true, then
$$F = \frac{ESS/(k-1)}{RSS/(T-k)} \sim F(k-1,\ T-k) \tag{13-2}$$
(RSS is the residual sum of squares; ESS is the explained sum of squares; T is the number of observations; k is the number of parameters, including the intercept.)
If F > Fα(k−1, T−k), reject H0; otherwise do not reject it. Here Fα(k−1, T−k) is the critical F value at the α level of significance with (k−1) numerator df and (T−k) denominator df.
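As a rough illustration of the overall F test, the sketch below computes the statistic in its usual form, F = [ESS/(k−1)] / [RSS/(T−k)], and also from R² = ESS/(ESS+RSS), which gives the same value. The function names and the numbers are hypothetical and are not taken from the chapter.

```python
def overall_f(ess, rss, k, T):
    """Overall F statistic: explained vs. residual variation per degree of freedom."""
    return (ess / (k - 1)) / (rss / (T - k))


def overall_f_from_r2(r2, k, T):
    """The same statistic written in terms of R^2 = ESS / (ESS + RSS)."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (T - k))


# Hypothetical sums of squares, k = 3 parameters, T = 23 observations.
ess, rss, k, T = 80.0, 20.0, 3, 23
f_stat = overall_f(ess, rss, k, T)
f_from_r2 = overall_f_from_r2(ess / (ess + rss), k, T)
```

The critical value Fα(k−1, T−k) still has to be read from an F table or a statistics package.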
13.2 Testing the individual regression coefficients: the t test
Take the multiple regression model as an example:
$$Y_t = \beta_0 + \beta_1 X_{t1} + \beta_2 X_{t2} + \dots + \beta_{k-1} X_{t,k-1} + u_t$$
If the F test does not reject the null hypothesis, testing stops. Otherwise we go on to the t test.
The null hypothesis is H0: βj = 0.
The alternative hypothesis is H1: βj ≠ 0 (j = 1, 2, …, k−1).
If H0 is true, then
$$t = \frac{\hat\beta_j}{s(\hat\beta_j)} \sim t(T-k), \quad j = 1, 2, \dots, k-1 \tag{13-3}$$
where $\hat\beta_j$ is the estimate of $\beta_j$ and $s(\hat\beta_j)$ is the sample standard deviation of $\hat\beta_j$.
If $|t| > t_{\alpha/2}(T-k)$, reject H0; otherwise do not reject it. α is the level of significance.
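A minimal sketch of the t test decision, assuming the statistic is the usual ratio of a coefficient estimate to its standard error and that the test is two-sided. The estimate and standard error are made-up values; 2.101 is the tabulated two-sided 5% critical value for 18 degrees of freedom.

```python
def t_ratio(beta_hat, se_beta_hat):
    """t statistic for H0: beta_j = 0."""
    return beta_hat / se_beta_hat


def reject_h0(t_stat, critical_value):
    """Two-sided decision rule: reject when |t| exceeds the critical value."""
    return abs(t_stat) > critical_value


# Hypothetical estimate and standard error (not from the chapter).
t_stat = t_ratio(0.42, 0.12)
decision = reject_h0(t_stat, 2.101)  # t_{0.025}(18) = 2.101
```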
13.3 The F-Test Approach: Restricted Least Squares
The F test provides a general method of testing hypotheses about one or more parameters, such as H0: β2 = β3.
Given the (k−1)-variable regression model:
$$Y_t = \beta_0 + \beta_1 X_{t1} + \beta_2 X_{t2} + \dots + \beta_{k-1} X_{t,k-1} + u_t \tag{13-4}$$
we test the hypothesis
H0: βk-m+1 = … = βk-1 = 0
versus
H1: βk-m+1, …, βk-1 are not simultaneously equal to zero.
If H0 is true, the regression model is:
$$Y_t = \beta_0 + \beta_1 X_{t1} + \beta_2 X_{t2} + \dots + \beta_{k-m} X_{t,k-m} + u_t \tag{13-5}$$
(13-4) is the unrestricted regression; (13-5) is known as restricted least squares (RLS).
If H0 is true, then
$$F = \frac{(RSS_r - RSS_u)/m}{RSS_u/(T-k)} \tag{13-6}$$
follows the F distribution with m and (T−k) df, where
$RSS_u$ is the RSS of the unrestricted regression (13-4);
$RSS_r$ is the RSS of the restricted regression (13-5);
m = number of linear restrictions;
k = number of parameters in the unrestricted regression;
T = number of observations.
If F > Fα(m, T−k), reject H0; otherwise do not reject it. α is the level of significance.
Note: the F test can only be used to test linear equality restrictions. When the test that some of the regression coefficients (m of them) are zero is extended to the test that all k−1 slope coefficients are zero, the F statistic of (13-6) becomes the F statistic of (13-2).
Now consider another situation. Given the regression model:
$$Y_t = \beta_0 + \beta_1 X_{t1} + \beta_2 X_{t2} + u_t \tag{13-7}$$
we test the hypothesis
H0: β1 + β2 = 1
versus
H1: β1 + β2 ≠ 1.
If H0 is true, the regression model is:
$$Y_t = \beta_0 + \beta_1 X_{t1} + (1-\beta_1) X_{t2} + u_t$$
or
$$Y_t - X_{t2} = \beta_0 + \beta_1 (X_{t1} - X_{t2}) + u_t \tag{13-8}$$
(13-7) is the unrestricted regression and (13-8) is the restricted regression.
The RSS of (13-7) is $RSS_u$ and the RSS of (13-8) is $RSS_r$; the F statistic then follows from (13-6) with m = 1.
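The substitution that turns the restriction β1 + β2 = 1 into the restricted regression (13-8) can be sketched as follows. This is only an illustration: the data are hypothetical and constructed so that the restriction holds exactly, and `simple_ols` is a helper written for this example, not a routine from the chapter.

```python
def simple_ols(x, y):
    """OLS of y on a constant and one regressor; returns (intercept, slope, rss)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, rss


# Hypothetical data generated with beta0 = 1, beta1 = 0.3, beta2 = 0.7 and no noise.
x1 = [1.0, 2.0, 4.0, 7.0, 11.0]
x2 = [3.0, 1.0, 5.0, 2.0, 6.0]
y = [1.0 + 0.3 * a + 0.7 * b for a, b in zip(x1, x2)]

# Restricted model: regress (Y - X2) on (X1 - X2).
y_star = [yi - b for yi, b in zip(y, x2)]
x_star = [a - b for a, b in zip(x1, x2)]
b0_r, b1_r, rss_r = simple_ols(x_star, y_star)
b2_r = 1.0 - b1_r  # implied by the restriction beta1 + beta2 = 1
```

With real data, `rss_r` from this regression and the RSS of the unrestricted regression would be plugged into the F statistic with one restriction.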
Example 13-1 The importance of the explanatory variables in a model of Chinese government bond issuance
First, we examine the characteristics of the series of Chinese government bond issuance (see Figure 13-1). The explanatory variables are GDPt, DEFt and REPAYt; the dependent variable is DEBTt. The data appear in Table 13-1. Based on the scatter diagrams, we set up the regression model:
$$DEBT_t = \beta_0 + \beta_1 GDP_t + \beta_2 DEF_t + \beta_3 REPAY_t + u_t$$
(DEBTt is the total amount of government bonds issued; GDPt is gross domestic product; DEFt is the fiscal deficit; REPAYt is the repayment of principal and interest.)
Figure 13-1 Chinese government bond issuance series (1980-2001)
The OLS regression result is given in (13-9).
• The correlation coefficient between DEBTt and GDPt is 0.9678, the largest entry in the correlation matrix of the four variables above. Should we therefore delete DEFt and REPAYt from the regression equation?
• We use the F test introduced in this chapter to answer this question.
H0: β2 = β3 = 0
The OLS regression result of the restricted regression is given in (13-10).
We know that m = 2 (m = number of linear restrictions) and T−k = 18. From (13-9) and (13-10), RSSu = 48460.78 and RSSr = 2942679. According to (13-6), we compute
$$F = \frac{(2942679 - 48460.78)/2}{48460.78/18} = 537.5$$
Because F = 537.5 > F0.05(2, 18) = 3.55, we reject H0.
Conclusion: we cannot delete DEFt and REPAYt from the regression equation.
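The arithmetic of this example can be checked with a few lines of Python. The sketch assumes the usual restricted-versus-unrestricted form of the F statistic, F = [(RSSr − RSSu)/m] / [RSSu/(T−k)], and uses the values quoted in the example; the function name is ours.

```python
def restricted_f(rss_r, rss_u, m, df_u):
    """F statistic for m linear restrictions; df_u = T - k of the unrestricted model."""
    return ((rss_r - rss_u) / m) / (rss_u / df_u)


# Values quoted in Example 13-1.
f_stat = restricted_f(rss_r=2942679.0, rss_u=48460.78, m=2, df_u=18)

# Compare with the 5% critical value F(2, 18) = 3.55 quoted in the example.
reject_h0 = f_stat > 3.55
```

The computed value rounds to 537.5, in line with the figure reported in the example.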
In EViews there are three ways to carry out the test above.
(1) Click View/Coefficient Tests/Wald Coefficient Restrictions in the output window of (13-9) and type c(3)=c(4)=0 in the dialog box (in EViews the regression coefficients are numbered starting from c(1)).
• The result is shown in Figure 13-3.
Figure 13-3 Result of the F test (β2 = β3 = 0) in EViews
(2) Click View/Coefficient Tests/Redundant Variables-Likelihood Ratio (which tests whether the model contains unimportant, redundant explanatory variables) in the output window of (13-9) and type DEF and REPAY in the dialog box.
• The result is shown in Figure 13-4.
Figure 13-4 Result of the F test (β2 = β3 = 0) in EViews
(3) Click View/Coefficient Tests/Omitted Variables-Likelihood Ratio (which tests whether explanatory variables are missing from the model) in the output window of the restricted regression (13-10) and type DEF and REPAY in the dialog box.
• The result is shown in Figure 13-5.
Figure 13-5 Result of the F test (β2 = β3 = 0) in EViews
13.4 Likelihood Ratio (LR) Test
The likelihood ratio test is based on the principle that, if the restriction is true, the maximized values of the log-likelihood function of the unrestricted and the restricted regression should be approximately equal.
$\log L(\hat\beta, \hat\sigma^2)$ is the maximized log-likelihood of the unrestricted regression; $\hat\beta$ and $\hat\sigma^2$ are the maximum likelihood estimates of $\beta$ and $\sigma^2$, respectively.
$\log L(\tilde\beta, \tilde\sigma^2)$ is the maximized log-likelihood of the restricted regression; $\tilde\beta$ and $\tilde\sigma^2$ are the maximum likelihood estimates of $\beta$ and $\sigma^2$, respectively.
By definition,
$$LR = -2\left[\log L(\tilde\beta, \tilde\sigma^2) - \log L(\hat\beta, \hat\sigma^2)\right] \tag{13-11}$$
If the null hypothesis (the restriction holds) is true, then asymptotically $LR \sim \chi^2(m)$, where m = number of restrictions.
If $LR > \chi^2_\alpha(m)$, reject H0 (the restriction holds); otherwise do not reject it. α is the level of significance.
• Example 13-2 Consider Example 13-1 again. We now use the LR test to test the hypothesis H0: β2 = β3 = 0.
The OLS regression result of the unrestricted regression is given in (13-12).
The OLS regression result of the restricted regression is given in (13-13).
According to (13-11), we compute LR = 90.34.
Because $LR = 90.34 > \chi^2_{0.05}(2) = 5.99$, we reject H0.
Conclusion: we cannot delete DEFt and REPAYt from the regression equation. This is the same conclusion as in the F test.
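The LR value can be reproduced from the residual sums of squares of Example 13-1, under two assumptions that are ours rather than the slides': the errors are normal, so that the maximized log-likelihood is log L = −(T/2)[1 + log(2π) + log(RSS/T)], and the sample has T = 22 annual observations (1980-2001, consistent with T−k = 18 and four parameters). Under these assumptions LR reduces to T·log(RSSr/RSSu).

```python
import math


def gaussian_loglik(rss, T):
    """Maximized log-likelihood of a linear regression with normal errors."""
    return -0.5 * T * (1.0 + math.log(2.0 * math.pi) + math.log(rss / T))


def lr_statistic(loglik_restricted, loglik_unrestricted):
    """LR = -2 * (restricted log-likelihood - unrestricted log-likelihood)."""
    return -2.0 * (loglik_restricted - loglik_unrestricted)


T = 22  # assumption: annual data, 1980-2001
rss_u, rss_r = 48460.78, 2942679.0  # values quoted in Example 13-1

lr = lr_statistic(gaussian_loglik(rss_r, T), gaussian_loglik(rss_u, T))
lr_shortcut = T * math.log(rss_r / rss_u)  # equivalent closed form
reject_h0 = lr > 5.99  # 5% critical value of chi-square(2), as quoted in the text
```

Both expressions round to 90.34, the value reported in the example.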
In EViews there are two ways to carry out the LR test.
(1) Click View/Coefficient Tests/Redundant Variables-Likelihood Ratio (which tests whether the model contains unimportant, redundant explanatory variables) in the output window of (13-12) and type DEF and REPAY in the dialog box. The result is shown in Figure 13-4. It reports LR = 90.34, the same as the computed value.
(2) Click View/Coefficient Tests/Omitted Variables-Likelihood Ratio (which tests whether explanatory variables are missing from the model) in the output window of (13-13) and type DEF and REPAY in the dialog box. The result is shown in Figure 13-5. It reports LR = 90.34, the same as the computed value.
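13.5 Wald Test
• The Wald test requires estimating only the unrestricted regression model. It is very useful when the restricted regression model is difficult to estimate.
Given the unrestricted regression model:
$$Y_t = \beta_1 X_{1t} + \beta_2 X_{2t} + \beta_3 X_{3t} + v_t \quad (13\text{-}4)$$
suppose we want to test the linear restriction β2 = β3. The restricted regression model is then:
$$Y_t = \beta_1 X_{1t} + \beta_2 (X_{2t} + X_{3t}) + v_t$$
or
$$Y_t = \beta_1 X_{1t} + \beta_3 (X_{2t} + X_{3t}) + v_t$$
Because the restricted estimates always satisfy $\tilde\beta_2 - \tilde\beta_3 = 0$, it is enough to check whether the unrestricted difference $\hat\beta_2 - \hat\beta_3$ is significantly different from zero, so the Wald test requires estimating only the unrestricted regression model (13-4). ($\tilde\beta_2$ and $\tilde\beta_3$ are the estimates of the regression coefficients in the restricted regression.)
Define the W statistic as:
$$W = \frac{(\hat\beta_2 - \hat\beta_3)^2}{\widehat{\mathrm{Var}}(\hat\beta_2 - \hat\beta_3)}$$
Under the null hypothesis, W asymptotically follows the $\chi^2(1)$ distribution.
The EViews procedure for the Wald test, applied to (13-20), is as follows: click View in the estimation results window, select Coefficient Tests, then Wald-Coefficient Restrictions, and enter C(2)/C(3)=0.5 in the dialog box. The output is shown in Figure 13-6, where Chi-square = 0.065787 is the value of the Wald statistic.
Figure 13-6 EViews output of the Wald test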
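For the linear restriction β2 = β3, the Wald statistic in its textbook form is the squared difference of the unrestricted estimates divided by the estimated variance of that difference, Var(b2) + Var(b3) − 2Cov(b2, b3). The sketch below assumes that form; the coefficient estimates and the variance and covariance entries are hypothetical placeholders for values that would come from the unrestricted OLS output.

```python
def wald_equal_coefficients(b2, b3, var_b2, var_b3, cov_b2_b3):
    """Wald statistic for H0: beta2 = beta3, using unrestricted estimates only."""
    diff = b2 - b3
    var_diff = var_b2 + var_b3 - 2.0 * cov_b2_b3
    return diff * diff / var_diff


# Hypothetical unrestricted estimates and covariance entries (not from the chapter).
w = wald_equal_coefficients(b2=0.9, b3=0.4, var_b2=0.04, var_b3=0.03, cov_b2_b3=0.01)

# One restriction, so compare with the 5% critical value of chi-square(1), 3.84.
reject_h0 = w > 3.84
```

A nonlinear restriction such as C(2)/C(3)=0.5 needs the delta method for the variance term, which this sketch does not cover.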
13.6 Lagrange multiplier (LM) statistic
The LM test requires estimating only the restricted model. It can test nonlinear restrictions as well as linear ones. The LM statistic is computed through an auxiliary regression in the following steps:
(1) Determine the dependent variable of the LM auxiliary regression. Estimate the restricted model by OLS, compute the residual series $\hat u_t$, and use it as the dependent variable of the auxiliary regression.
(2) Determine the explanatory variables of the LM auxiliary regression by the partial derivative method. For example, take the unrestricted model
$$Y_t = \beta_0 + \beta_1 X_{t1} + \dots + \beta_{k-1} X_{t,k-1} + u_t \quad (13\text{-}26)$$
and rewrite it as
$$u_t = Y_t - \beta_0 - \beta_1 X_{t1} - \dots - \beta_{k-1} X_{t,k-1}$$
The explanatory variables of the auxiliary regression are then given by
$$-\frac{\partial u_t}{\partial \beta_j}, \quad j = 0, 1, \dots, k-1$$
For the unrestricted model (13-26), the explanatory variables of the auxiliary regression are $1, X_{t1}, \dots, X_{t,k-1}$. The first explanatory variable, 1, means that a constant term is included in the auxiliary regression.
(3) Set up the LM auxiliary regression
$$\hat u_t = \alpha_0 + \alpha_1 X_{t1} + \dots + \alpha_{k-1} X_{t,k-1} + v_t$$
where $\hat u_t$ is the residual series obtained in step (1).
(4) Estimate the auxiliary regression by OLS and compute its coefficient of determination $R^2$.
(5) Using the $R^2$ from step (4), compute the LM statistic as
$$LM = T R^2$$
where T is the sample size of the regression. The value of LM computed in this way equals the value of LM defined by (13-23). Under the null hypothesis, LM asymptotically follows the $\chi^2(m)$ distribution, where m is the number of restrictions. The decision rule is:
If the LM computed from a large sample satisfies $LM \le \chi^2_\alpha(m)$, the null hypothesis is not rejected and the restriction holds.
If the LM computed from a large sample satisfies $LM > \chi^2_\alpha(m)$, the null hypothesis is rejected and the restriction does not hold.
α is the level of significance.
• Example 13-4 Continue with Example 13-3, the production function for the manufacturing industry in Taiwan, and use the LM statistic to test the coefficient restriction.
(1) Estimate the restricted model by OLS, compute the residual series $\hat u_t$, and use it as the dependent variable of the LM auxiliary regression.
(2) Determine the explanatory variables of the LM auxiliary regression by the partial derivative method, as above. For the unrestricted model (13-29), they are the constant 1 and the explanatory variables of (13-29). The first explanatory variable, 1, means that a constant term is included in the auxiliary regression.
(3) Set up the LM auxiliary regression, with the residual series from step (1) as the dependent variable.
(4) Estimate the auxiliary regression by OLS and compute its coefficient of determination.
(5) Using the result of step (4), compute the LM statistic as $LM = T R^2$.
The conclusion is that the null hypothesis is not substantiated.
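The last two steps of the procedure, computing LM from the auxiliary regression and applying the decision rule, can be sketched as below. The sketch assumes the usual form LM = T·R², with R² taken from the auxiliary regression of the restricted residuals; the sample size and R² are hypothetical numbers, not the values of Example 13-4.

```python
def lm_statistic(T, r2_aux):
    """LM statistic from the auxiliary regression: sample size times R-squared."""
    return T * r2_aux


def lm_reject(lm, chi2_critical):
    """Reject the restriction when LM exceeds the chi-square critical value."""
    return lm > chi2_critical


# Hypothetical auxiliary-regression results (not from the chapter).
T, r2_aux = 30, 0.05
lm = lm_statistic(T, r2_aux)

# One restriction: 5% critical value of chi-square(1) is 3.84.
reject_h0 = lm_reject(lm, 3.84)
```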
For linear regression models, the three statistics satisfy the following relation:
$$LM \le LR \le W$$
When the log-likelihood function contains only one regression parameter $\beta$, the LM, LR and W statistics can be illustrated by Figure 13-7, where $\hat\beta$ is the unrestricted estimate and $\tilde\beta$ is the restricted estimate. The LR statistic measures the vertical distance between the values of the log-likelihood function at $\hat\beta$ and $\tilde\beta$, the W statistic measures the horizontal distance between the two estimates, and the LM statistic measures the slope of the log-likelihood function at $\tilde\beta$. Because these three statistics follow the $\chi^2$ distribution only asymptotically, the F test is more reliable than these three tests when the sample is small and the restrictions are linear.
Figure 13-7 Graphical interpretation of the LR, W and LM statistics
13.7 Akaike information criterion (AIC), Schwarz criterion (SC) and Hannan-Quinn criterion (HQ)
To determine the maximum lag length of a dynamic distributed lag model, in addition to the F statistic already covered, we can use the Akaike information criterion (AIC), the Schwarz criterion (SC) and the Hannan-Quinn criterion (HQ). AIC was proposed by the Japanese statistician Hirotugu Akaike in 1973. It is defined as follows:
$$AIC = -\frac{2\log L}{T} + \frac{2k}{T}$$
where $\log L = -\frac{T}{2}\left[1 + \log(2\pi) + \log\left(\hat u'\hat u / T\right)\right]$ is the maximized log-likelihood, T is the sample size, $\hat u'\hat u$ is the residual sum of squares in vector form, and k is the maximum lag length of the variables in the model.
The first term on the right-hand side decreases as k increases, and the second term increases as k increases, so AIC attains a minimum as k varies. To determine the optimal lag length with AIC, increase the lag length step by step and choose the value at which AIC is smallest.
SC, also called the Bayesian information criterion (BIC), is defined as follows:
$$SC = -\frac{2\log L}{T} + \frac{k\log T}{T}$$
where log L, T and k are defined as before. Like AIC, SC attains a minimum as k varies, and it is used in the same way as AIC. This is the formula used by EViews, with the symbols defined as above.
The Hannan-Quinn criterion is:
$$HQ = -\frac{2\log L}{T} + \frac{2k\log(\log T)}{T}$$
where log L, T and k are defined as in (13-33).
Note: AIC, SC (BIC) and HQ are not always the best statistics for comparing model specifications, but they can be used to determine the maximum lag length k of an ADL model.
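Lag selection with the three criteria can be sketched as follows, assuming the per-observation forms reported by EViews: AIC = −2logL/T + 2k/T, SC = −2logL/T + k·log(T)/T and HQ = −2logL/T + 2k·log(log T)/T, with the Gaussian log-likelihood logL = −(T/2)[1 + log(2π) + log(RSS/T)]. The RSS values are hypothetical, k is taken to be the lag order as in the text (EViews itself counts estimated parameters), and T is held fixed across lags for simplicity.

```python
import math


def gaussian_loglik(rss, T):
    """Maximized log-likelihood of a regression with normal errors."""
    return -0.5 * T * (1.0 + math.log(2.0 * math.pi) + math.log(rss / T))


def aic(rss, T, k):
    return -2.0 * gaussian_loglik(rss, T) / T + 2.0 * k / T


def sc(rss, T, k):
    return -2.0 * gaussian_loglik(rss, T) / T + k * math.log(T) / T


def hq(rss, T, k):
    return -2.0 * gaussian_loglik(rss, T) / T + 2.0 * k * math.log(math.log(T)) / T


# Hypothetical residual sums of squares for AR models of order 0 to 4.
T = 100
rss_by_lag = {0: 200.0, 1: 120.0, 2: 100.0, 3: 99.5, 4: 99.4}

# For each criterion, pick the lag order with the smallest value.
criteria = {"AIC": aic, "SC": sc, "HQ": hq}
best_lag = {
    name: min(rss_by_lag, key=lambda k: crit(rss_by_lag[k], T, k))
    for name, crit in criteria.items()
}
```

In this made-up example the fit barely improves beyond two lags, so the penalty terms dominate and all three criteria select lag 2.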
Example 13-6 Figure 13-8 shows the SZt series (4.1/1999 to 10-15/2001). We determine the best lag length of an autoregressive (AR) model for this series using AIC, SC and HQ. First, regress SZt on a constant term only, and then estimate autoregressive models of order 1 to 4.
Next, compute AIC, SC and HQ for each estimated model from the regression results (Table 13-3). Because AIC, SC and HQ all attain their minimum at k = 3, the three statistics agree that the most reasonable model is the autoregressive model with 3 lags.
Figure 13-8 SZt series (4.1/1999 to 10-15/2001)
Table 13-3 Values of AIC, SC and HQ
Lag length (rows): 0-order autoregressive model; 1-order autoregressive model; 2-order autoregressive model; 3-order autoregressive model; 4-order autoregressive model
Note: the values of AIC and SC (BIC) can be found in the EViews output.
Figure 13-9 Values of AIC, SC (BIC) and HQ (y-axis) against lag length (x-axis); the AIC and SC lines approximately coincide
13.8 Testing for normality: the JB statistic
During model diagnostics, the JB statistic can be used to test whether the model errors follow a normal distribution. In fact, the JB statistic can be used to test whether any random variable follows a normal distribution. First recall the definitions of skewness and kurtosis. For a time series $y_t$ (t = 1, …, T), skewness is defined as
$$S = \frac{1}{T}\sum_{t=1}^{T}\frac{(y_t - \bar y)^3}{s^3}$$
where $y_t$ is the observed value, $\bar y$ is the sample mean, s is the standard deviation of $y_t$, and T is the sample size. The formula shows that if the distribution is symmetric about $\bar y$, the skewness is 0. So if $y_t$ follows a normal distribution, S = 0; if the distribution is right-skewed, S > 0; if it is left-skewed, S < 0.
Measurements of products from a production line generally follow a normal distribution. The length of human life generally follows a left-skewed distribution. Household income in China currently follows a right-skewed distribution.
Kurtosis K is defined as
$$K = \frac{1}{T}\sum_{t=1}^{T}\frac{(y_t - \bar y)^4}{s^4}$$
where $y_t$ is the observed value, $\bar y$ is the sample mean, s is the standard deviation of the sample, and T is the sample size. It can be shown that the kurtosis of the normal distribution is 3. If the tails of a distribution are thicker than those of the normal distribution, its kurtosis is K > 3; otherwise K < 3.
The JB statistic for testing normality is defined in terms of skewness and kurtosis as follows:
$$JB = \frac{T-n}{6}\left[S^2 + \frac{1}{4}(K-3)^2\right] \tag{13-38}$$
where T is the number of observations, S is the skewness and K is the kurtosis. If the series tested is a directly observed time series, take n = 0. If the series tested is the residual series of a regression model, n equals the number of explanatory variables in the regression model. Under the null hypothesis that the random variable is normally distributed, the JB statistic asymptotically follows the $\chi^2(2)$ distribution.
Decision rule:
If the JB computed from the sample satisfies $JB \le \chi^2_\alpha(2)$, do not reject the null hypothesis: the distribution is normal.
If the JB computed from the sample satisfies $JB > \chi^2_\alpha(2)$, reject the null hypothesis: the distribution is not normal.
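The skewness, kurtosis and JB computations can be sketched in a few lines. The sketch assumes JB = (T−n)/6 · [S² + (K−3)²/4] and, as an assumption of ours, uses the standard deviation with divisor T (the convention EViews documents for these moments); the text does not say which divisor it intends. The sample is a toy series chosen so the values are easy to verify by hand.

```python
def skewness_kurtosis(y):
    """Sample skewness S and kurtosis K based on central moments with divisor T."""
    T = len(y)
    mean = sum(y) / T
    m2 = sum((v - mean) ** 2 for v in y) / T
    m3 = sum((v - mean) ** 3 for v in y) / T
    m4 = sum((v - mean) ** 4 for v in y) / T
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def jarque_bera(y, n=0):
    """JB statistic; n = 0 for raw series, number of regressors for residuals."""
    T = len(y)
    S, K = skewness_kurtosis(y)
    return (T - n) / 6.0 * (S ** 2 + (K - 3.0) ** 2 / 4.0)


# Toy symmetric series: skewness is 0 and kurtosis is 1.7.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
S, K = skewness_kurtosis(sample)
jb = jarque_bera(sample)

# Compare with the 5% critical value of chi-square(2), 5.99.
reject_normality = jb > 5.99
```

With only five observations the asymptotic χ²(2) approximation is not meaningful; the example only checks the arithmetic.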
To carry out the JB normality test in EViews, click the View button in the series window and select Descriptive Statistics/Histogram and Stats, or click the Quick button and select Series Statistics/Histogram and Stats. This produces the output shown in Figure 13-11.
Figure 13-11 EViews output of the JB normality test
Example 13-8 Figure 13-11 shows the results for a series of 1000 values. Because the JB value is smaller than the critical value, it falls in the acceptance region of the null hypothesis. The same conclusion follows from the p-value: because the p-value exceeds the level of significance, the JB value falls in the acceptance region of the null hypothesis. We can therefore conclude that the distribution is normal.