Specification Error I
Aims and Learning Objectives
By the end of this session students should be able to:
• Understand the causes and consequences of specification error
• Analyse regression results for possible specification error
• Undertake appropriate remedies for specification error
Introduction
Before any equation can be estimated, it must be completely specified. Broadly speaking, specifying an econometric equation consists of the following:
• Choosing the “correct” explanatory variables
• Choosing the “correct” functional form
• Choosing the “correct” form of the error term
Specification error can arise in a number of ways:
(i) Omission of a relevant explanatory variable
(ii) Inclusion of an irrelevant explanatory variable
(iii) Adopting the wrong functional form
(iv) Endogenous explanatory variable(s)
Recap: Assumptions of the Multivariate Regression Model
A1. $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_k X_{ki} + U_i$
A2. $E(U_i) = 0$
A3. $Cov(U_i, X_{2i}) = Cov(U_i, X_{3i}) = \dots = Cov(U_i, X_{ki}) = 0$
A4. $Var(U_i) = \sigma^2$
A5. $Cov(U_i, U_j) = 0$ for $i \neq j$
A6.
A7. No exact collinearity or perfect multicollinearity among the explanatory variables
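Specification Error: Omission of a Relevant Explanatory Variable
“True Model”: $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + U_i$
Under-fitted Model (as estimated): $Y_i = b_1 + b_2 X_{2i} + e_i$
X3 is omitted from the under-fitted model.
In general, $E(b_2) \neq \beta_2$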
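WHY?
It can be shown that $E(b_2) = \beta_2 + \beta_3 b_{32}$
where $b_{32}$ is the slope from a regression of the omitted variable on the included variable: $\hat{X}_{3i} = b_{31} + b_{32} X_{2i}$
If we estimate the “true” model, then the coefficient on X2 measures the net effect of X2 on Y (since the influence of X3 is controlled for in the model).
When we omit a relevant variable (X3), b2 picks up both the net effect of X2 and the part of the effect of X3 on Y that operates through its association with X2 (we call this the indirect effect). As a result, b2 is a biased estimator of $\beta_2$.
If there is no relationship between X2 and X3, then $b_{32}$ is zero and there is no bias.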
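The decomposition of b2 into a net effect plus an indirect effect can be checked numerically. The following is a minimal sketch on simulated data, not part of the original lecture: the function name `omitted_variable_demo`, the sample size and the coefficient values are all assumptions chosen for illustration. It uses the deviation-from-mean formulas for OLS with one and two regressors and shows that, within a sample, the under-fitted slope equals the full-model slope on X2 plus the full-model slope on X3 times the slope from regressing X3 on X2.

```python
import random


def cross(a, b):
    """Sum of cross-products of deviations from the sample means."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))


def omitted_variable_demo(n=500, beta2=1.0, beta3=2.0, seed=0):
    """Simulate Y = 1 + beta2*X2 + beta3*X3 + U with X3 correlated with X2.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    x3 = [0.5 * v + rng.gauss(0, 1) for v in x2]  # X3 depends on X2
    y = [1.0 + beta2 * a + beta3 * b + rng.gauss(0, 1) for a, b in zip(x2, x3)]

    s22, s33, s23 = cross(x2, x2), cross(x3, x3), cross(x2, x3)
    s2y, s3y = cross(x2, y), cross(x3, y)

    # Full ("true") model: OLS slopes with two regressors.
    det = s22 * s33 - s23 ** 2
    beta2_hat = (s2y * s33 - s3y * s23) / det
    beta3_hat = (s3y * s22 - s2y * s23) / det

    # Under-fitted model: regress Y on X2 only.
    b2_short = s2y / s22
    # Auxiliary regression of the omitted X3 on the included X2.
    b32 = s23 / s22

    return {"beta2_hat": beta2_hat, "beta3_hat": beta3_hat,
            "b2_short": b2_short, "b32": b32}


result = omitted_variable_demo()
print("slope on X2, full model:        ", round(result["beta2_hat"], 3))
print("slope on X2, under-fitted model:", round(result["b2_short"], 3))
print("full slope + beta3_hat * b32:   ",
      round(result["beta2_hat"] + result["beta3_hat"] * result["b32"], 3))
```

With a positive effect of X3 on Y and a positive association between X2 and X3, as assumed here, the under-fitted slope overstates the effect of X2. Setting the 0.5 in the construction of `x3` to zero makes the two slopes agree apart from sampling noise.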
Example
Possible Omitted Variable Bias in a Wage Equation
Dependent Variable: Wage or Earnings
Explanatory Variable: Education
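Var(b2) will be smaller (in general)
Variance of $\hat{\beta}_2$ in the true model is $Var(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_{2i}^2 \,(1 - r_{23}^2)}$
Variance of b2 in the under-fitted model is $Var(b_2) = \dfrac{\sigma^2}{\sum x_{2i}^2}$
Here $x_{2i} = X_{2i} - \bar{X}_2$ and $r_{23}$ is the sample correlation between X2 and X3. Since $0 \le r_{23}^2 < 1$, $Var(b_2) \le Var(\hat{\beta}_2)$.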
Recap:
1. If the left-out variable is correlated with the included variable, the coefficient on the included variable is biased.
2. The variance of the coefficient on the included variable is generally smaller than in the true model.
In addition:
3. The coefficient estimator is also inconsistent: the bias does not disappear as the sample size gets bigger.
4. Confidence intervals and hypothesis tests may be misleading.
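Specification Error: Inclusion of an Irrelevant Explanatory Variable
“True Model”: $Y_i = \beta_1 + \beta_2 X_{2i} + U_i$
Over-fitted Model (as estimated): $Y_i = b_1 + b_2 X_{2i} + b_3 X_{3i} + e_i$
X3i is included in the over-fitted model even though it does not belong in the true model.
b2 is still unbiased, $E(b_2) = \beta_2$, and the usual estimator of Var(b2) remains valid. However, the estimates are now inefficient (variances are generally larger).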
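b2 is inefficient
Variance of $\hat{\beta}_2$ in the true model is $Var(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_{2i}^2}$
Variance of b2 in the over-fitted model is $Var(b_2) = \dfrac{\sigma^2}{\sum x_{2i}^2 \,(1 - r_{23}^2)}$
Here $x_{2i} = X_{2i} - \bar{X}_2$ and $r_{23}$ is the sample correlation between X2 and X3.
As a result, confidence intervals will be wider and we run the risk of not rejecting a false null hypothesis.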
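The size of the efficiency loss can be illustrated with a short calculation. This is a sketch under stated assumptions rather than the lecture's own material: it takes the standard two-regressor textbook expressions, $\sigma^2 / \sum x_{2i}^2$ when X3 is left out and $\sigma^2 / (\sum x_{2i}^2 (1 - r_{23}^2))$ when X3 is included, and applies them to simulated regressors. The function name `slope_variances`, the value of `sigma2` and the simulated data are hypothetical.

```python
import random


def slope_variances(x2, x3, sigma2):
    """Variance of the slope on X2 with and without X3 in the model.

    Uses the two-regressor textbook formulas; sigma2 is the (assumed known)
    error variance.
    """
    n = len(x2)
    mean2, mean3 = sum(x2) / n, sum(x3) / n
    s22 = sum((a - mean2) ** 2 for a in x2)
    s33 = sum((b - mean3) ** 2 for b in x3)
    s23 = sum((a - mean2) * (b - mean3) for a, b in zip(x2, x3))
    r23_squared = s23 ** 2 / (s22 * s33)  # squared sample correlation

    var_without_x3 = sigma2 / s22
    var_with_x3 = sigma2 / (s22 * (1.0 - r23_squared))
    return var_without_x3, var_with_x3, r23_squared


rng = random.Random(42)
x2 = [rng.gauss(0, 1) for _ in range(200)]
x3 = [0.8 * a + rng.gauss(0, 1) for a in x2]  # irrelevant but correlated with X2

var_without, var_with, r2 = slope_variances(x2, x3, sigma2=1.0)
print("r23 squared:               ", round(r2, 3))
print("variance ratio (with/without X3):", round(var_with / var_without, 3))
```

The ratio of the two variances equals $1 / (1 - r_{23}^2)$, so the cost of including an irrelevant variable grows with its correlation with X2 and vanishes when the two regressors are uncorrelated.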
Does this suggest, therefore, that it is better to include irrelevant variables than to exclude relevant ones?
No, because as well as a loss of efficiency of the estimators, including irrelevant variables will also result in:
• Loss of degrees of freedom
And may result in:
• Problems of multicollinearity (more on this in lecture 10)
Functional Form Mis-specification
Adopting an incorrect functional form
For example, if we estimate a linear model such as $Y_i = \beta_1 + \beta_2 X_{2i} + U_i$
but the true model is a log-linear model such as $\ln Y_i = \beta_1 + \beta_2 \ln X_{2i} + U_i$
then the mis-specification arises because we estimate the “wrong” functional form.
Mis-specification Tests
Mis-specification generally occurs when:
• We omit a relevant variable, or
• We include an irrelevant variable, or
• We use an incorrect functional form
In most circumstances we do not know what the “true” model is. How can we determine, therefore, whether the model we estimate is correctly specified?
Mis-specification Tests
Preliminary Analysis (Informal Tests)
• Choose variables based on economic theory (if possible)
• Observe the sign and significance of coefficients; what happens when an additional variable is added or deleted?
• Does adjusted $R^2$ increase when more variables are added?
• Look at the pattern of the residuals (if there are noticeable patterns then it is possible that the model has been mis-specified)
Ramsey’s RESET Test
A more formal test of mis-specification
The test adds proxy variables for whatever may be missing from the model.
RESET test: the proxies are based on the predicted value of Y
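Example
Suppose we estimate the following model (written here, for illustration, with a single regressor) and want to test for mis-specification: $Y_i = \beta_1 + \beta_2 X_{2i} + U_i$
The RESET test uses the predicted values $\hat{Y}_i$
and creates various powers of $\hat{Y}_i$, typically $\hat{Y}_i^2$ and $\hat{Y}_i^3$.
Adding these powers to the original model, we then estimate a new model: $Y_i = \beta_1 + \beta_2 X_{2i} + \gamma_1 \hat{Y}_i^2 + \gamma_2 \hat{Y}_i^3 + V_i$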
Example (continued)
Perform an F-test on the joint significance of the additional variables. The null hypothesis is that the coefficients on the added powers are all zero.
If the additional variables are significant: evidence of mis-specification
Cautionary Note
RESET is easy to apply but cannot tell us the reason for the mis-specification (i.e. omitted variable or functional form)
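The RESET steps can be sketched in a few lines of code. This is an illustrative implementation on simulated data, not the lecture's own example: the helper names `ols`, `rss` and `reset_f`, the quadratic data-generating process and the sample size are all assumptions. A linear model is fitted to data that are actually quadratic in X, the squared and cubed fitted values are added, and the usual F statistic for the added terms is computed as $F = \dfrac{(RSS_R - RSS_U)/m}{RSS_U/(n - k)}$, where m is the number of added terms and k the number of coefficients in the augmented model.

```python
import random


def ols(X, y):
    """OLS coefficients from the normal equations (X'X)b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def rss(X, y, b):
    """Residual sum of squares for coefficients b."""
    return sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
               for row, yi in zip(X, y))


def reset_f(x, y, powers=(2, 3)):
    """F statistic for adding powers of the fitted values to Y = b1 + b2*X."""
    X_restricted = [[1.0, xi] for xi in x]
    b_restricted = ols(X_restricted, y)
    y_hat = [b_restricted[0] + b_restricted[1] * xi for xi in x]

    # Augment the original regressors with powers of the fitted values.
    X_augmented = [row + [yh ** p for p in powers]
                   for row, yh in zip(X_restricted, y_hat)]
    b_augmented = ols(X_augmented, y)

    rss_r = rss(X_restricted, y, b_restricted)
    rss_u = rss(X_augmented, y, b_augmented)
    m = len(powers)                          # number of added terms
    df = len(y) - len(X_augmented[0])        # n minus coefficients in augmented model
    return ((rss_r - rss_u) / m) / (rss_u / df)


# Simulated data: the true relationship is quadratic (an assumption for this demo).
rng = random.Random(1)
x = [rng.uniform(0, 4) for _ in range(300)]
y = [1.0 + 0.5 * xi ** 2 + rng.gauss(0, 1) for xi in x]

f_stat = reset_f(x, y)
print("RESET F statistic:", round(f_stat, 2))
```

The statistic is compared with the critical value of the F distribution with m and n − k degrees of freedom; with two added terms and a large sample the 5% critical value is roughly 3. A large F here flags the mis-specification but, as the cautionary note says, does not reveal that the missing term is a squared regressor.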
Summary
In this lecture we have:
1. Started to look at regression models which violate the CLRM assumptions
2. Outlined the theoretical and practical consequences of under-fitting and over-fitting regression models and of choosing an incorrect functional form
3. Outlined a number of procedures for detecting possible specification error