Problems with Incorrect Functional Form • You cannot compare R² between two functional forms when the dependent variable differs (for example, Y versus lnY). • Why? TSS will be different. • An incorrect functional form may also fit well within sample but produce large forecast errors out of sample.
Linear Functional Form • Y = β0 + β1 X1 + β2 X2 + ε • Slope = β1 • Impact of X1 on Y is independent of the quantity of X2. • Elasticity = β1 * [X1 / Y], which varies across the sample.
Double-Log Functional Form • What if you wished to estimate the following model? • Y = β0 X1^β1 X2^β2 • To make this linear in the parameters, take logs of both sides: • lnY = ln β0 + β1 lnX1 + β2 lnX2 + ε • Slope = β1 = ΔlnY / ΔlnX1 = [ΔY / Y] / [ΔX1 / X1] • What is this? The elasticity, which is constant across the sample.
What is the slope in a double-log functional form? • Slope = β1 * (Y / X1) = [ΔY / Y] / [ΔX1 / X1] * (Y / X1) = ΔY / ΔX1 • Impact of X1 on Y depends upon the quantity of X2 (through Y). • In other words, the slope of X1 varies across the sample. • Why would this be a realistic property?
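To illustrate the double-log slides, here is a minimal sketch. It assumes made-up data generated from Y = 2 · X1^0.5 · X2^0.3 with a small multiplicative error; the data, the seed and the helper `slope_x1` are illustrative assumptions, not part of the slides. The fitted β1 is the constant elasticity, while the slope β1 · Y / X1 changes with X2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(1.0, 10.0, n)
x2 = rng.uniform(1.0, 10.0, n)
# Hypothetical data: Y = 2 * X1^0.5 * X2^0.3 with a small multiplicative error
y = 2.0 * x1**0.5 * x2**0.3 * np.exp(rng.normal(0.0, 0.05, n))

# Double-log regression: ln Y on a constant, ln X1 and ln X2
X = np.column_stack([np.ones(n), np.log(x1), np.log(x2)])
b0, b1, b2 = np.linalg.lstsq(X, np.log(y), rcond=None)[0]


def slope_x1(x1_val, x2_val):
    """Slope dY/dX1 = b1 * Y / X1, evaluated at the fitted Y."""
    y_hat = np.exp(b0) * x1_val**b1 * x2_val**b2
    return b1 * y_hat / x1_val


print("elasticity of Y with respect to X1:", round(float(b1), 3))
print("slope at X1=2, X2=2:", round(float(slope_x1(2.0, 2.0)), 3))
print("slope at X1=2, X2=8:", round(float(slope_x1(2.0, 8.0)), 3))
```

The elasticity is one number for the whole sample, but the two printed slopes differ because the fitted Y is larger when X2 is larger.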
Other Functional Forms • Semi-log functional form • Polynomial form • Inverse form • Know the equation and the meaning of β1 for each of these forms. • More specifically, know how to calculate the slope and the elasticity for each functional form.
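The slides do not write these forms out, so the sketch below uses the standard textbook single-regressor versions as an assumption: two semi-log variants (log on X, log on Y), a quadratic polynomial, and the inverse form. The function names are hypothetical. Each function returns Y, the slope dY/dX, and the elasticity (slope · X / Y) at a given X.

```python
import math


def lin_log(b0, b1, x):
    """Semi-log, log on X: Y = b0 + b1*ln(X). Slope = b1 / X."""
    y = b0 + b1 * math.log(x)
    slope = b1 / x
    return y, slope, slope * x / y


def log_lin(b0, b1, x):
    """Semi-log, log on Y: ln(Y) = b0 + b1*X. Slope = b1 * Y."""
    y = math.exp(b0 + b1 * x)
    slope = b1 * y
    return y, slope, slope * x / y


def quadratic(b0, b1, b2, x):
    """Polynomial: Y = b0 + b1*X + b2*X^2. Slope = b1 + 2*b2*X."""
    y = b0 + b1 * x + b2 * x**2
    slope = b1 + 2.0 * b2 * x
    return y, slope, slope * x / y


def inverse(b0, b1, x):
    """Inverse: Y = b0 + b1*(1/X). Slope = -b1 / X^2."""
    y = b0 + b1 / x
    slope = -b1 / x**2
    return y, slope, slope * x / y


# In the log-on-Y form the elasticity simplifies to b1 * X
print(log_lin(0.0, 0.1, 5.0))
```

In none of these forms is β1 alone the slope, and in all of them the slope and the elasticity change with X.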
Problems with Incorrect Functional Form • You cannot compare R² between two functional forms when the dependent variable differs (for example, Y versus lnY). • Why? TSS will be different. • An incorrect functional form may fit well within sample but produce large forecast errors out of sample. • It violates Classical Assumption I: the regression model is linear in the coefficients, is correctly specified, and has an additive error term.
Testing for Functional Form • The Quasi-R² • The Box-Cox Test • The MacKinnon, White, Davidson (MWD) Test
Quasi-R² • 1. Estimate the logged model and save the predicted values of the logged dependent variable (lnŶ). • 2. Take the anti-log of lnŶ to get predictions in levels. In Excel, the EXP function does this. • 3. Calculate a new RSS from the step 2 predictions and the actual Y. • 4. Calculate the quasi-R² as 1 - RSS / TSS, using the RSS from step 3 and the TSS of Y.
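The quasi-R² steps can be sketched as follows. The data (Y = 2 · X² with a small multiplicative error), the seed and the helper `fit` are illustrative assumptions. Because the quasi-R² is measured in levels of Y, it can be set beside the ordinary R² of the linear model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1.0, 10.0, n)
y = 2.0 * x**2 * np.exp(rng.normal(0.0, 0.05, n))  # hypothetical data


def fit(X, target):
    """OLS coefficients and fitted values via least squares."""
    beta = np.linalg.lstsq(X, target, rcond=None)[0]
    return beta, X @ beta


tss = np.sum((y - y.mean()) ** 2)  # TSS measured in levels of Y

# Linear model: ordinary R-squared
_, y_hat_lin = fit(np.column_stack([np.ones(n), x]), y)
r2_linear = 1.0 - np.sum((y - y_hat_lin) ** 2) / tss

# Step 1: estimate the logged model and keep the predicted ln Y
_, lny_hat = fit(np.column_stack([np.ones(n), np.log(x)]), np.log(y))
# Step 2: anti-log the predictions to get back to levels of Y
y_hat_log = np.exp(lny_hat)
# Step 3: RSS in levels of Y
rss_quasi = np.sum((y - y_hat_log) ** 2)
# Step 4: quasi-R-squared
quasi_r2 = 1.0 - rss_quasi / tss

print("linear R2:", round(float(r2_linear), 4))
print("quasi-R2 of the logged model:", round(float(quasi_r2), 4))
```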
The Box-Cox Test • 1. Calculate the geometric mean of the dependent variable. This is easy to do in Excel (the GEOMEAN function). • 2. Create a new dependent variable Y* = Yi / (geometric mean of Y). • 3. Re-estimate both forms of the model with the new dependent variable (Y* in the linear model, lnY* in the logged model). • 4. Compare the residual sums of squares. The form with the lower RSS is preferred.
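A minimal sketch of the Box-Cox comparison described above, again on made-up data (Y = 2 · X² with a small multiplicative error). The data and the helper `rss` are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1.0, 10.0, n)
y = 2.0 * x**2 * np.exp(rng.normal(0.0, 0.05, n))  # hypothetical data


def rss(X, target):
    """Residual sum of squares from an OLS fit."""
    beta = np.linalg.lstsq(X, target, rcond=None)[0]
    resid = target - X @ beta
    return float(resid @ resid)


# Geometric mean of Y, computed as exp(mean(ln Y))
geo_mean = np.exp(np.mean(np.log(y)))
# New dependent variable
y_star = y / geo_mean

# Re-estimate both forms with the rescaled dependent variable
rss_linear = rss(np.column_stack([np.ones(n), x]), y_star)
rss_log = rss(np.column_stack([np.ones(n), np.log(x)]), np.log(y_star))

preferred = "double-log" if rss_log < rss_linear else "linear"
print("RSS linear:", round(rss_linear, 3))
print("RSS double-log:", round(rss_log, 3))
print("preferred form:", preferred)
```

Dividing by the geometric mean puts the two residual sums of squares on a comparable scale, which is what makes the comparison meaningful.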
MWD Test • 1. Estimate the linear model and obtain the predicted Y values (call this Yf). • 2. Estimate the double-log model and obtain the predicted lnY values (call this lnf). • 3. Create Z1 = ln(Yf) - lnf. • 4. Regress Y on the X’s and Z1. Reject H0 (Y is a linear function of the independent variables) if Z1 is statistically significant by the usual t-test. • 5. Create Z2 = antilog(lnf) - Yf. • 6. Regress lnY on the logs of the X’s and Z2. Reject HA (the double-log model is best) if Z2 is statistically significant by the usual t-test.
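The MWD steps can be sketched with a small hand-rolled OLS routine. The data (Y = 5 · √X with a small multiplicative error, so the double-log form is the true one) and the helper `ols` are assumptions for illustration. Note that Z1 needs every linear prediction to be positive, since it takes a log.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1.0, 10.0, n)
y = 5.0 * np.sqrt(x) * np.exp(rng.normal(0.0, 0.02, n))  # hypothetical data


def ols(X, target):
    """Return OLS coefficients, t-statistics and fitted values."""
    beta = np.linalg.lstsq(X, target, rcond=None)[0]
    fitted = X @ beta
    resid = target - fitted
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se, fitted


const = np.ones(n)
# Steps 1 and 2: predictions from the linear and the double-log model
_, _, yf = ols(np.column_stack([const, x]), y)
_, _, lnf = ols(np.column_stack([const, np.log(x)]), np.log(y))

# Steps 3 and 4: add Z1 to the linear model and look at its t-statistic
z1 = np.log(yf) - lnf  # requires all linear predictions to be positive
_, t_lin, _ = ols(np.column_stack([const, x, z1]), y)
t_z1 = t_lin[-1]

# Steps 5 and 6: add Z2 to the double-log model and look at its t-statistic
z2 = np.exp(lnf) - yf
_, t_log, _ = ols(np.column_stack([const, np.log(x), z2]), np.log(y))
t_z2 = t_log[-1]

print("t-stat on Z1 (tests the linear model):", round(float(t_z1), 2))
print("t-stat on Z2 (tests the double-log model):", round(float(t_z2), 2))
```

A large |t| on Z1 is evidence against the linear form. The t-statistic on Z2 is read the same way against the double-log form.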
Intercept Dummies • What if you thought the season of the year affected your sales? • Your demand function would include three dummies (why three?) to test the impact of the seasons. • This type of dummy variable is called an intercept dummy, since it changes the constant term but not the slopes of the other independent variables.
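A minimal sketch of seasonal intercept dummies, using made-up quarterly sales data (the numbers, the seed and the choice of winter as the base season are assumptions). It also hints at the answer to "why three?": with a constant in the model, a fourth dummy would be perfectly collinear with the other columns.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 80
season = np.arange(n) % 4  # 0 = winter (base), 1 = spring, 2 = summer, 3 = fall
price = rng.uniform(5.0, 15.0, n)
shift = np.array([0.0, 10.0, 25.0, 5.0])  # hypothetical seasonal shifts
sales = 100.0 - 3.0 * price + shift[season] + rng.normal(0.0, 1.0, n)

# Three dummies: spring, summer, fall (winter is absorbed by the constant)
dummies = (season[:, None] == np.array([1, 2, 3])).astype(float)
X = np.column_stack([np.ones(n), price, dummies])
beta = np.linalg.lstsq(X, sales, rcond=None)[0]
print("constant, price slope, spring, summer, fall:", np.round(beta, 2))

# With all four dummies plus a constant, the columns are linearly dependent
all_four = (season[:, None] == np.arange(4)).astype(float)
X_trap = np.column_stack([np.ones(n), price, all_four])
rank_trap = np.linalg.matrix_rank(X_trap)
print("columns:", X_trap.shape[1], "rank:", rank_trap)
```

Each dummy coefficient shifts the intercept relative to the base season, while the price slope is the same in every season.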
Criteria for Choosing a Specification • Occam’s razor, or the principle of parsimony: a model should be kept as simple as possible. • Goodness of fit • Theoretical consistency • Predictive power: within sample vs. out of sample
If you leave out an important variable, a bias exists unless… • The true coefficient of the omitted variable is zero. • Or there is zero correlation between the omitted variable(s) and the independent variables in the model. • If neither condition holds, omitted variables will bias the coefficients in our model.
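A small simulation sketch of the two cases above. The true model is assumed to be Y = 1 + 2·X1 + 3·X2 + ε, with X2 built as delta·X1 plus noise; the model, the seed and the helper name are illustrative assumptions. When X2 is omitted, the slope on X1 drifts toward 2 + 3·delta.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000


def short_regression_slope(delta):
    """Slope on x1 when x2 is left out, where x2 = delta * x1 + noise."""
    x1 = rng.normal(0.0, 1.0, n)
    x2 = delta * x1 + rng.normal(0.0, 1.0, n)
    y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(0.0, 1.0, n)
    X = np.column_stack([np.ones(n), x1])  # x2 is omitted on purpose
    return np.linalg.lstsq(X, y, rcond=None)[0][1]


b_correlated = short_regression_slope(0.8)    # omitted variable correlated with x1
b_uncorrelated = short_regression_slope(0.0)  # omitted variable unrelated to x1

print("true slope on x1: 2.0")
print("estimate, x2 correlated with x1:", round(float(b_correlated), 3))
print("estimate, x2 uncorrelated with x1:", round(float(b_uncorrelated), 3))
```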
What to do? • Add the missing variable. • What if you do not know which variable is missing? In other words, what if you suspect something is left out – thus producing “strange” results – but you do not know what?
Irrelevant Variables • Including an irrelevant variable will: • Increase the standard errors of the estimated coefficients, thus reducing t-stats (think back to how standard errors are calculated). • Tend to reduce adjusted R² (it falls whenever the added variable’s t-stat is below 1 in absolute value). • It does not introduce bias in the estimated coefficients, but it does affect our interpretation of what we found.
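A simulation sketch of the standard-error effect. The true model is assumed to be Y = 1 + 2·X1 + ε, and Z is an irrelevant variable built to be correlated with X1; the data, the seed and the helper `ols_se` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(0.0, 1.0, n)
# Irrelevant variable: correlated with x1 but absent from the true model
z = 0.9 * x1 + np.sqrt(1.0 - 0.81) * rng.normal(0.0, 1.0, n)
y = 1.0 + 2.0 * x1 + rng.normal(0.0, 1.0, n)


def ols_se(X, target):
    """Return OLS coefficients and their standard errors."""
    beta = np.linalg.lstsq(X, target, rcond=None)[0]
    resid = target - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se


b_short, se_short = ols_se(np.column_stack([np.ones(n), x1]), y)
b_long, se_long = ols_se(np.column_stack([np.ones(n), x1, z]), y)

print("slope on x1 without z:", round(float(b_short[1]), 3), "se:", round(float(se_short[1]), 3))
print("slope on x1 with z:", round(float(b_long[1]), 3), "se:", round(float(se_long[1]), 3))
```

The estimate of the slope on X1 stays centered on the true value in both regressions. What changes is its precision.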
Four Important Specification Criteria • Theory: Is the variable’s place in the equation unambiguous and theoretically sound? • t-Test: Is the variable’s estimated coefficient significant in the expected direction? • Adjusted R²: Does the overall fit of the equation improve when the variable is added to the equation? • Bias: Do other variables’ coefficients change significantly when the variable is added to the equation?
Specification Searches: Other Issues • Good idea to rely on theory rather than statistical fit. • Good idea to minimize the number of equations estimated. • Bad idea to do sequential searches or estimate an undisclosed number of regressions before settling on a final choice. • Sensitivity Analysis: Are your results robust to alternative specifications? If not, maybe you’re not finding what you think you are finding.