Economics 105: Statistics

Go over GH 24

Risks in Model Building
• Including irrelevant X’s
  • Increases complexity
  • Can reduce adjusted R²
  • Increases model variability across samples
• Omitting relevant X’s
  • Fails to capture fit
  • Can bias other estimated coefficients
    • Where omitted X is related to both other X’s and to the dependent variable (Y)

More Risks: Samples Can Mislead
• Remember: we are using sample data
• About 5% of the time (when testing at the 5% significance level), a sample will by chance produce betahat’s that pass classical hypothesis tests even though the true beta’s are zero
• Or the beta’s may truly be important, but the sample will by chance contain observations of X that yield betahat’s failing the statistical tests
• That’s why we rely on theory, prior hypotheses, and replication
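The 5% figure can be checked with a small simulation. The sketch below is illustrative only: the sample size, number of repetitions, and data-generating process are assumptions, not taken from the lecture. Y is generated with a true slope of zero, and the code counts how often the usual t test still calls the slope significant.

```python
import random

def slope_t_stat(x, y):
    """t statistic for the slope in a simple regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)              # estimate of the error variance
    return b1 / (s2 / sxx) ** 0.5   # slope divided by its standard error

def false_positive_rate(n_samples=2000, n=100, t_crit=1.984, seed=0):
    """Share of samples in which a truly zero slope looks 'significant'.

    t_crit is the approximate two-sided 5% critical value for a t
    distribution with n - 2 = 98 degrees of freedom (assumed value).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_samples):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [1.0 + rng.gauss(0, 1) for _ in range(n)]  # true slope is zero
        if abs(slope_t_stat(x, y)) > t_crit:
            rejections += 1
    return rejections / n_samples

rate = false_positive_rate()
print(f"share of samples with a 'significant' slope: {rate:.3f}")
```

The printed share should land near 0.05, which is the sense in which a single sample can mislead even when every assumption holds.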

Violations of Gauss-Markov (GM) Assumptions
Assumption → Violation
• “Well-specified model” (1) & (5) → wrong functional form; omitted relevant variable (or included irrelevant variable); errors in variables; sample selection bias; simultaneity bias
• Zero conditional mean of errors (2) → constant, nonzero mean due to systematically +/- measurement error in Y; can only be assessed theoretically
• Homoskedastic errors (3) → heteroskedastic errors
• No serial correlation in errors (4) → serial correlation in errors
I know! We can save the model, but not until Eco205. Holy endogeneity, Batman!

Multiple Regression Assumptions
(1) Linear function in the parameters, plus error
• Variation in Y is caused by $\varepsilon$, the error (as well as X)
(2) Zero conditional mean of errors
• Sources of error
  • Idiosyncratic, “white noise”
  • Measurement error on Y
  • Omitted relevant explanatory variables
• If (2) holds, we have exogenous explanatory vars
• If some Xj is correlated with the error term for some reason, then that Xj is an endogenous explanatory var

Multiple Regression Assumptions
(3) Homoskedasticity
(4) No autocorrelation
(5) Errors and the explanatory variables are uncorrelated
(6) Errors are i.i.d. normal

Multiple Regression Assumptions
(7) No perfect multicollinearity
• No explanatory variable is an exact linear function of other X’s
• [Figure: Venn diagram]
Other implicit assumptions
• Data are a random sample of n observations from the proper population
• n > K, and ideally n much greater than K
• The little xij’s are fixed numbers (the same in repeated samples), or they are realizations of random variables, Xij, that are independent of the error term, and then inference is done CONDITIONAL on the observed values of the xij’s
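One way to see what assumption (7) rules out is to check the column rank of the design matrix: if one explanatory variable is an exact linear function of the others, the rank falls below the number of columns and the OLS coefficients are not uniquely determined. The sketch below uses made-up integer data and a hypothetical helper, `matrix_rank`, written with exact fractions so that rounding cannot hide an exact dependence.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination in exact arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue  # this column depends linearly on earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Made-up data for illustration only.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
x3 = [a + 2 * b for a, b in zip(x1, x2)]  # exact linear function of x1 and x2

ok_design = [[1, a, b] for a, b in zip(x1, x2)]                 # constant, x1, x2
bad_design = [[1, a, b, c] for a, b, c in zip(x1, x2, x3)]      # adds x3

print("columns:", len(ok_design[0]), "rank:", matrix_rank(ok_design))
print("columns:", len(bad_design[0]), "rank:", matrix_rank(bad_design))
```

In the second design the rank is smaller than the number of columns, which is exactly the situation the assumption excludes. High but imperfect correlation among the X’s does not reduce the rank and does not violate (7).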

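Violation of Assumptions (1 & 5): well-specified model
Specification Bias
• True model is $Y = \beta_0 + \beta_1 X_1 + \varepsilon$ (A)
• But we run $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$ (B)
• Including an irrelevant variable
  • $\hat\beta_1$ is an unbiased estimator of $\beta_1$
  • $\mathrm{Var}(\hat\beta_1)$ is larger; less efficient
  • The estimator of $\sigma^2$, $s^2$, is unbiased
  • t & F tests are valid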
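A rough simulation can illustrate the claims about including an irrelevant variable. Everything in the sketch below is an assumption made for illustration (the coefficient values, the sample size, and an irrelevant X2 that is correlated with X1); it is not the lecture's example. Across repeated samples, the slope on X1 should average close to the true value whether or not X2 is included, but its sampling variance should be larger when X2 is included.

```python
import random
import statistics

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def slope_one_regressor(x, y):
    """Slope from regressing y on a constant and x."""
    dx, dy = demean(x), demean(y)
    return dot(dx, dy) / dot(dx, dx)

def slope_two_regressors(x1, x2, y):
    """Coefficient on x1 from regressing y on a constant, x1, and x2."""
    d1, d2, dy = demean(x1), demean(x2), demean(y)
    s11, s22, s12 = dot(d1, d1), dot(d2, d2), dot(d1, d2)
    return (s22 * dot(d1, dy) - s12 * dot(d2, dy)) / (s11 * s22 - s12 ** 2)

def simulate(reps=2000, n=50, beta1=2.0, seed=0):
    rng = random.Random(seed)
    model_a, model_b = [], []
    for _ in range(reps):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        x2 = [0.8 * a + rng.gauss(0, 0.6) for a in x1]      # correlated with x1
        y = [1.0 + beta1 * a + rng.gauss(0, 1) for a in x1]  # x2 plays no role in y
        model_a.append(slope_one_regressor(x1, y))           # correct model
        model_b.append(slope_two_regressors(x1, x2, y))      # irrelevant x2 included
    return model_a, model_b

model_a, model_b = simulate()
for name, est in (("without X2", model_a), ("with X2", model_b)):
    print(f"{name}: mean = {statistics.mean(est):.3f}, "
          f"variance = {statistics.variance(est):.4f}")
```

Both means should sit near the assumed true slope of 2, while the variance in the second line should be noticeably larger, which is the efficiency cost of the extra regressor.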

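Violation of Assumptions (1 & 5): well-specified model
Specification Bias
• True model is $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$ (C)
• But we run $Y = \beta_0 + \beta_1 X_1 + \varepsilon$ (D)
• Omitting a relevant variable
  • $\hat\beta_1$ is a biased estimator of $\beta_1$
  • $\mathrm{Var}(\hat\beta_1)$ is actually smaller; more efficient
  • The estimator of $\sigma^2$, $s^2$, is now biased
  • t & F tests are incorrect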
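The bias from omitting a relevant variable can also be seen in a simulation. The sketch below is a hedged illustration with assumed numbers (true coefficients of 2.0 on X1 and 1.5 on X2, with X2 correlated with X1); none of these values come from the lecture. It compares the average slope on X1 when X2 is omitted with the average when X2 is included.

```python
import random
import statistics

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def slope_short(x1, y):
    """Slope on x1 when x2 is omitted."""
    d1, dy = demean(x1), demean(y)
    return dot(d1, dy) / dot(d1, d1)

def slope_long(x1, x2, y):
    """Coefficient on x1 when x2 is included."""
    d1, d2, dy = demean(x1), demean(x2), demean(y)
    s11, s22, s12 = dot(d1, d1), dot(d2, d2), dot(d1, d2)
    return (s22 * dot(d1, dy) - s12 * dot(d2, dy)) / (s11 * s22 - s12 ** 2)

rng = random.Random(0)
beta1, beta2 = 2.0, 1.5          # assumed true coefficients
omitted, included = [], []
for _ in range(2000):
    x1 = [rng.gauss(0, 1) for _ in range(50)]
    x2 = [0.8 * a + rng.gauss(0, 0.6) for a in x1]   # x2 is related to x1
    y = [1.0 + beta1 * a + beta2 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    omitted.append(slope_short(x1, y))
    included.append(slope_long(x1, x2, y))

mean_omitted = statistics.mean(omitted)
mean_included = statistics.mean(included)
print(f"true beta1 = {beta1}")
print(f"mean estimate, X2 omitted:  {mean_omitted:.3f}")
print(f"mean estimate, X2 included: {mean_included:.3f}")
```

With X2 included, the average estimate should be close to 2. With X2 omitted, the average settles well above 2 and does not improve with more repetitions, which is what distinguishes bias from sampling noise.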

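Omitted Variable Bias
• When is $\hat\beta_1$ an unbiased estimator of $\beta_1$?
• $E[\hat\beta_1] = \beta_1 + \beta_2 b_{21}$
• $b_{21}$ is the slope coefficient from a regression of the EXCLUDED variable (here $X_2$) on the INCLUDED variable (here $X_1$)
• So $\hat\beta_1$ is unbiased only if $\beta_2 = 0$ (the excluded variable is irrelevant) or $b_{21} = 0$ (the excluded and included variables are uncorrelated)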
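The role of b21 can be verified numerically, because the relationship holds exactly within any one sample: the slope from the regression that omits X2 equals the coefficient on X1 from the full regression plus the coefficient on X2 times b21. The sketch below uses synthetic data with assumed coefficients and an arbitrary sample size of 64; it is not the lecture's country data.

```python
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

# Synthetic data (assumed values, for illustration only).
rng = random.Random(1)
n = 64
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]
y = [1.0 + 2.0 * a + 3.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

d1, d2, dy = demean(x1), demean(x2), demean(y)
s11, s22, s12 = dot(d1, d1), dot(d2, d2), dot(d1, d2)
s1y, s2y = dot(d1, dy), dot(d2, dy)

# Full regression of y on a constant, x1, and x2.
det = s11 * s22 - s12 ** 2
b1_long = (s22 * s1y - s12 * s2y) / det
b2_long = (s11 * s2y - s12 * s1y) / det

# Regression that omits x2.
b1_short = s1y / s11

# Regression of the EXCLUDED variable (x2) on the INCLUDED variable (x1).
b21 = s12 / s11

print(f"slope with x2 omitted:        {b1_short:.6f}")
print(f"b1_long + b2_long * b21:      {b1_long + b2_long * b21:.6f}")
```

The two printed numbers agree up to floating-point rounding. Taking expectations of this in-sample identity is what yields the bias term involving $\beta_2$ and $b_{21}$.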

Omitted Variable Bias
• Subscript c indexes 64 countries
• [Table: descriptive statistics]

Omitted Variable Bias … approximately equal
