Linear Functional Form

Linear Functional Form • Y = β0 + β1 X1 + β2 X2 + ε • Slope = β1 • Impact of X1 on Y is independent of the quantity of X2. • Elasticity = β1 * [X1/ Y]

Double-Log Functional Form • What if you wished to estimate the following model? • Y = β0 X1^β1 X2^β2 • To make this linear in the parameters, take the natural log of both sides: • lnY = β0* + β1 lnX1 + β2 lnX2 + ε, where β0* = ln β0 • Slope = β1 = ΔlnY / ΔlnX1 = [ΔY / Y] / [ΔX1 / X1] • What is this? The elasticity, which is constant across the sample.

What is the slope in a double-log functional form? • Slope = β1 * (Y/X1) = [ΔY / Y] / [ΔX1 / X1] * (Y/X1) = ΔY / ΔX1 • Because Y itself depends on X2, the impact of X1 on Y depends upon the quantity of X2. • In other words, the slope of X1 varies across the sample. • Why would this be a realistic property?
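The contrast between a constant elasticity and a varying slope can be checked numerically. The following is a minimal sketch, assuming synthetic data with made-up parameter values (2.0, 0.5, 0.3) and NumPy's least-squares solver in place of a regression package; none of these names or numbers come from the slides.

```python
import numpy as np

# Hypothetical data: Y = 2 * X1^0.5 * X2^0.3 with multiplicative noise.
rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(1.0, 10.0, n)
x2 = rng.uniform(1.0, 10.0, n)
y = 2.0 * x1**0.5 * x2**0.3 * np.exp(rng.normal(0.0, 0.05, n))

# Double-log regression: ln Y on a constant, ln X1 and ln X2.
X = np.column_stack([np.ones(n), np.log(x1), np.log(x2)])
beta, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
b0, b1, b2 = beta

# b1 is the elasticity of Y with respect to X1, the same for every observation.
elasticity_x1 = b1

# The slope dY/dX1 = b1 * (Y / X1) changes from observation to observation.
slope_x1 = b1 * y / x1

print(f"elasticity of Y w.r.t. X1: {elasticity_x1:.3f}")
print(f"slope dY/dX1 ranges from {slope_x1.min():.3f} to {slope_x1.max():.3f}")
```

The single estimated coefficient b1 serves as the elasticity everywhere, while the slope has to be recomputed at each (X1, Y) pair.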

Other Functional Forms • Semi-log functional form • Polynomial form • Inverse form • Know the equation and meaning of β1 for each of these forms. • More specifically, know the calculation of slope and elasticity for each functional form.
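For reference, the slopes and elasticities of these forms follow from differentiating each equation with respect to X. The summary below is a sketch under assumptions the slide does not state: a single regressor X, both common semi-log variants, and a quadratic as the polynomial.

```latex
\begin{aligned}
&\text{Semi-log (log of } Y\text{):} & \ln Y &= \beta_0 + \beta_1 X, &
  \frac{dY}{dX} &= \beta_1 Y, & \text{elasticity} &= \beta_1 X \\
&\text{Semi-log (log of } X\text{):} & Y &= \beta_0 + \beta_1 \ln X, &
  \frac{dY}{dX} &= \frac{\beta_1}{X}, & \text{elasticity} &= \frac{\beta_1}{Y} \\
&\text{Polynomial (quadratic):} & Y &= \beta_0 + \beta_1 X + \beta_2 X^2, &
  \frac{dY}{dX} &= \beta_1 + 2\beta_2 X, & \text{elasticity} &= (\beta_1 + 2\beta_2 X)\,\frac{X}{Y} \\
&\text{Inverse:} & Y &= \beta_0 + \beta_1 \frac{1}{X}, &
  \frac{dY}{dX} &= -\frac{\beta_1}{X^2}, & \text{elasticity} &= -\frac{\beta_1}{X Y}
\end{aligned}
```

In every case the elasticity is the slope multiplied by X/Y, which is the same rule used for the linear form.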

Problems with Incorrect Functional Form • You cannot compare R2 between two functional forms whose dependent variables differ (for example, Y versus lnY). • Why? TSS will be different. • An incorrect functional form may work within sample but have large forecast errors outside of sample. • Violation of Classical Assumption I: The regression model is linear in the coefficients, is correctly specified, and has an additive error term.

Testing for Functional Form • The Quasi-R2 • Box-Cox Test • The MacKinnon, White, Davidson Test (MWD)

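Quasi R2 • Step 1: Estimate the logged model and create a set of lnY^ (predicted values of the logged dependent variable). • Step 2: Transform lnY^ by taking the anti-log. In Excel, EXP is the function needed. • Step 3: Calculate a new RSS from the results of step 2, using the differences between the actual Y and the anti-logged predictions. • Step 4: Calculate the quasi-R2 from the RSS of step 3 and the TSS of the unlogged Y: quasi-R2 = 1 - RSS / TSS.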
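The quasi-R2 steps can be sketched in a few lines. This is an illustration only, assuming synthetic single-regressor data and a hypothetical helper `ols_fit`; the final formula used is quasi-R2 = 1 - RSS / TSS, with TSS taken from the unlogged Y so that the result is comparable with the R2 of the linear model.

```python
import numpy as np

def ols_fit(X, y):
    """Return OLS coefficients for y regressed on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Hypothetical data generated from a log-log relationship.
rng = np.random.default_rng(1)
n = 200
x = rng.uniform(1.0, 10.0, n)
y = 3.0 * x**0.7 * np.exp(rng.normal(0.0, 0.1, n))

# TSS of the unlogged dependent variable, shared by both measures.
tss = np.sum((y - y.mean()) ** 2)

# Ordinary R2 of the linear model, for comparison.
X_lin = np.column_stack([np.ones(n), x])
resid_lin = y - X_lin @ ols_fit(X_lin, y)
r2_linear = 1.0 - np.sum(resid_lin**2) / tss

# Estimate the logged model and predict ln Y.
X_log = np.column_stack([np.ones(n), np.log(x)])
ln_y_hat = X_log @ ols_fit(X_log, np.log(y))

# Take the anti-log of the predictions.
y_hat_from_log = np.exp(ln_y_hat)

# New RSS measured in units of Y, not ln Y.
rss_quasi = np.sum((y - y_hat_from_log) ** 2)

# Quasi-R2 on the same scale as the linear model's R2.
quasi_r2 = 1.0 - rss_quasi / tss

print(f"linear R2: {r2_linear:.4f}   quasi-R2 (log model): {quasi_r2:.4f}")
```

Both numbers now describe how well each model predicts Y itself, so they can be set side by side.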

The Box-Cox Test • Calculate the geometric mean of the dependent variable in the model. • This can easily be calculated in Excel. • Create a new dependent variable equal to Yi / Geometric Mean of Y. • Re-estimate both forms of the model with your new dependent variable (using its log in the logged form). • Compare the Residual Sum of Squares. The lowest value indicates the preferred functional form.
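A short sketch of this procedure follows, assuming synthetic single-regressor data and a hypothetical helper `rss_of_fit`. The logged form regresses the log of the rescaled dependent variable on the log of X, which is an interpretation of "both forms of the model" rather than something the slide spells out.

```python
import numpy as np

def rss_of_fit(X, y):
    """Residual sum of squares from an OLS regression of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

# Hypothetical data generated from a log-log relationship.
rng = np.random.default_rng(2)
n = 200
x = rng.uniform(1.0, 10.0, n)
y = 2.0 * x**1.5 * np.exp(rng.normal(0.0, 0.1, n))

# Geometric mean of Y, computed as exp(mean(ln Y)).
geo_mean = np.exp(np.mean(np.log(y)))

# Rescaled dependent variable.
y_star = y / geo_mean

# Re-estimate both forms with the rescaled dependent variable.
rss_linear = rss_of_fit(np.column_stack([np.ones(n), x]), y_star)
rss_log = rss_of_fit(np.column_stack([np.ones(n), np.log(x)]), np.log(y_star))

# The form with the lower RSS is preferred.
preferred = "double-log" if rss_log < rss_linear else "linear"
print(f"RSS linear: {rss_linear:.3f}   RSS log: {rss_log:.3f}   preferred: {preferred}")
```

Dividing by the geometric mean puts the two residual sums of squares on a comparable scale, which is what makes the direct comparison meaningful.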

MWD Test • Estimate the linear model and obtain the predicted Y values (call this Yf^). • Estimate the double-logged model and obtain the predicted lnY values (call this lnf^). • Create Z1 = ln(Yf^) - lnf^ • Regress Y on the X's and Z1. Reject H0 (Y is a linear function of the independent variables) if Z1 is statistically significant by the usual t-test. • Create Z2 = antilog(lnf^) - Yf^ • Regress the log of Y on the logs of the X's and Z2. Reject HA (the double-logged model is best) if Z2 is statistically significant by the usual t-test.
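The MWD steps can be traced in code. The sketch below assumes synthetic data with one regressor, a hypothetical helper `ols_with_t` that uses classical standard errors, and an approximate large-sample 5% critical value of 1.96; a real application would use the exact t critical value for its degrees of freedom.

```python
import numpy as np

def ols_with_t(X, y):
    """Return OLS coefficients and t-statistics (classical standard errors)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = (resid @ resid) / dof
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, beta / se

# Hypothetical data generated from a log-log relationship.
rng = np.random.default_rng(3)
n = 300
x = rng.uniform(1.0, 10.0, n)
y = 5.0 * x**0.5 * np.exp(rng.normal(0.0, 0.05, n))

ones = np.ones(n)
X_lin = np.column_stack([ones, x])
X_log = np.column_stack([ones, np.log(x)])

# Fitted values from the linear model (Yf^) and the double-log model (lnf^).
b_lin, _ = ols_with_t(X_lin, y)
yf_hat = X_lin @ b_lin
b_log, _ = ols_with_t(X_log, np.log(y))
lnf_hat = X_log @ b_log

# ln(Yf^) only exists when every linear fitted value is positive.
if np.any(yf_hat <= 0):
    raise ValueError("ln(Yf^) is undefined for non-positive fitted values")

# Z1 = ln(Yf^) - lnf^, added to the linear model.
z1 = np.log(yf_hat) - lnf_hat
_, t_lin = ols_with_t(np.column_stack([X_lin, z1]), y)
t_z1 = t_lin[-1]

# Z2 = antilog(lnf^) - Yf^, added to the double-log model.
z2 = np.exp(lnf_hat) - yf_hat
_, t_log = ols_with_t(np.column_stack([X_log, z2]), np.log(y))
t_z2 = t_log[-1]

CRIT = 1.96  # assumed large-sample two-sided 5% critical value
reject_linear = abs(t_z1) > CRIT
reject_log = abs(t_z2) > CRIT
print(f"t(Z1) = {t_z1:.2f}, reject linear form: {reject_linear}")
print(f"t(Z2) = {t_z2:.2f}, reject double-log form: {reject_log}")
```

The two auxiliary regressions are separate tests, so it is possible for both forms, or neither, to be rejected.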

Intercept Dummies • What if you thought the season of the year impacted your sales? • Your demand function would include three dummies (why three?) to test the impact of seasons. • This type of dummy variable is called an intercept dummy, since it changes the constant term but not the slopes of the other independent variables.
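A small sketch of seasonal intercept dummies follows, assuming a hypothetical sales equation with price as the only other regressor and winter as the omitted base season. The last lines illustrate one answer to "why three": adding a fourth dummy alongside the constant makes the columns perfectly collinear.

```python
import numpy as np

# Hypothetical demand data: season coded 0=winter, 1=spring, 2=summer, 3=fall.
rng = np.random.default_rng(4)
n = 400
price = rng.uniform(5.0, 15.0, n)
season = rng.integers(0, 4, n)

# Three intercept dummies; winter is the omitted base category.
spring = (season == 1).astype(float)
summer = (season == 2).astype(float)
fall = (season == 3).astype(float)

# Made-up seasonal shifts in the constant term; the price slope is the same in all seasons.
true_shift = np.array([0.0, 2.0, 5.0, 1.0])
sales = 50.0 - 2.0 * price + true_shift[season] + rng.normal(0.0, 1.0, n)

X = np.column_stack([np.ones(n), price, spring, summer, fall])
beta, *_ = np.linalg.lstsq(X, sales, rcond=None)
b_const, b_price, b_spring, b_summer, b_fall = beta

# Each dummy coefficient is that season's intercept relative to winter.
print(f"price slope: {b_price:.2f}")
print(f"spring: {b_spring:.2f}  summer: {b_summer:.2f}  fall: {b_fall:.2f}")

# With a constant AND all four dummies, the four dummies sum to the constant
# column, so the design matrix loses a rank (6 columns, rank 5).
winter = (season == 0).astype(float)
X_trap = np.column_stack([X, winter])
rank_trap = np.linalg.matrix_rank(X_trap)
print(f"columns: {X_trap.shape[1]}, rank: {rank_trap}")
```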

Slope Dummies • Interaction Term: an independent variable in a regression that is the product of two or more independent variables. • This can be used to see if a qualitative condition, which we would analyze with a dummy, impacts the slope of another independent variable.
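As an illustration, a slope dummy can be built by multiplying a dummy by a continuous regressor. The sketch below assumes a hypothetical `holiday` dummy and made-up coefficients; the slope of price is then b_price when the dummy is 0 and b_price + b_inter when it is 1.

```python
import numpy as np

# Hypothetical data: the price slope is steeper during holiday periods.
rng = np.random.default_rng(5)
n = 400
price = rng.uniform(5.0, 15.0, n)
holiday = rng.integers(0, 2, n).astype(float)
sales = (40.0 - 1.5 * price + 3.0 * holiday
         - 1.0 * holiday * price + rng.normal(0.0, 1.0, n))

# Regressors: constant, price, intercept dummy, and the interaction (slope dummy).
X = np.column_stack([np.ones(n), price, holiday, holiday * price])
beta, *_ = np.linalg.lstsq(X, sales, rcond=None)
b_const, b_price, b_holiday, b_inter = beta

# The interaction coefficient is the difference in price slopes between groups.
slope_non_holiday = b_price
slope_holiday = b_price + b_inter

print(f"price slope, non-holiday: {slope_non_holiday:.2f}")
print(f"price slope, holiday:     {slope_holiday:.2f}")
```

Including the intercept dummy together with the interaction lets both the constant and the slope differ between the two groups.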

Criteria for choosing a specification • Occam’s razor or the principle of parsimony - model should be kept as simple as possible. • Goodness of fit • Theoretical consistency • Predictive power: Within sample vs. Out of sample

If you leave out an important variable, a bias exists unless… • The true coefficient of the omitted variable is zero. • Or, there is zero correlation between the omitted variable(s) and the independent variables in the model. • If these conditions don’t hold, omitted variables will bias the coefficients in our model.
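Both conditions can be seen in a small simulation. The sketch below assumes synthetic data with made-up coefficients and a hypothetical helper `ols`. It relies on the standard result that the coefficient from the short regression equals the long-regression coefficient plus the omitted variable's coefficient times the slope from regressing the omitted variable on the included one, so the bias vanishes when either factor is zero.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients for y regressed on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Hypothetical data: x2 is correlated with x1 and truly belongs in the model.
rng = np.random.default_rng(6)
n = 500
x1 = rng.normal(0.0, 1.0, n)
x2 = 0.5 * x1 + rng.normal(0.0, 1.0, n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(0.0, 1.0, n)

ones = np.ones(n)
b_long = ols(np.column_stack([ones, x1, x2]), y)   # correctly specified
b_short = ols(np.column_stack([ones, x1]), y)      # x2 omitted
alpha = ols(np.column_stack([ones, x1]), x2)       # x2 regressed on x1

# Bias in the x1 coefficient = (coefficient on x2) * (slope of x2 on x1).
implied_bias = b_long[2] * alpha[1]

print(f"x1 coefficient, x2 included: {b_long[1]:.3f}")
print(f"x1 coefficient, x2 omitted:  {b_short[1]:.3f}")
print(f"implied bias:                {implied_bias:.3f}")
```

With these made-up values the short regression's x1 coefficient lands near 2 + 3 × 0.5 = 3.5 rather than the true 2.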

What to do? • Add the missing variable. • What if you do not know which variable is missing? In other words, what if you suspect something is left out – thus producing “strange” results – but you do not know what?

Irrelevant Variables • Including an irrelevant variable will: • Increase the standard errors of the estimated coefficients, thus reducing t-stats (think back to how standard errors are calculated). • Reduce adjusted R2. • It does not introduce bias in the estimated coefficients, but does impact our interpretation of what we found.
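The effect on standard errors can be illustrated with a simulation. The sketch below assumes synthetic data in which a hypothetical variable x3 has a true coefficient of zero but is highly correlated with x1; the helper `fit` and all numbers are illustrative. Adjusted R2 is reported but not guaranteed to fall in any single sample.

```python
import numpy as np

def fit(X, y):
    """Return OLS coefficients, classical standard errors and adjusted R2."""
    n_obs, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid
    tss = np.sum((y - y.mean()) ** 2)
    sigma2 = rss / (n_obs - k)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    adj_r2 = 1.0 - (rss / (n_obs - k)) / (tss / (n_obs - 1))
    return beta, se, adj_r2

# Hypothetical data: y depends on x1 only; x3 is irrelevant but correlated with x1.
rng = np.random.default_rng(7)
n = 400
x1 = rng.normal(0.0, 1.0, n)
x3 = 0.9 * x1 + np.sqrt(1.0 - 0.81) * rng.normal(0.0, 1.0, n)
y = 1.0 + 2.0 * x1 + rng.normal(0.0, 1.0, n)

ones = np.ones(n)
beta_without, se_without, adj_without = fit(np.column_stack([ones, x1]), y)
beta_with, se_with, adj_with = fit(np.column_stack([ones, x1, x3]), y)

print(f"x1 coefficient without x3: {beta_without[1]:.3f} (se {se_without[1]:.3f})")
print(f"x1 coefficient with x3:    {beta_with[1]:.3f} (se {se_with[1]:.3f})")
print(f"adjusted R2 without / with x3: {adj_without:.4f} / {adj_with:.4f}")
```

The x1 estimate stays close to its true value in both regressions, but its standard error is noticeably larger once the correlated irrelevant variable is included.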

Four Important Specification Criteria • Theory: Is the variable’s place in the equation unambiguous and theoretically sound? • t-Test: Is the variable’s estimated coefficient significant in the expected direction? • Adjusted R2: Does the overall fit of the equation improve when the variable is added to the equation? • Bias: Do other variables’ coefficients change significantly when the variable is added to the equation?

Specification Searches: Other Issues • Good idea to rely on theory rather than statistical fit. • Good idea to minimize the number of equations estimated. • Bad idea to do sequential searches or estimate an undisclosed number of regressions before settling on a final choice. • Sensitivity Analysis: Are your results robust to alternative specifications? If not, maybe you're not finding what you think you are finding.
