Linear Regression with Multiple Regressors $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$ Definition • Estimation • Properties Economics 332 - 11
Outline • Omitted variable bias • Multiple regression and OLS • Measures of fit • Sampling distribution of the OLS estimator
Omitted Variable Bias (SW Section 6.1) The error u arises because of factors, or variables, that influence Y but are not included in the regression function. There are always omitted variables. Sometimes, the omission of those variables can lead to bias in the OLS estimator.
Omitted variable bias, ctd. The bias in the OLS estimator that occurs as a result of an omitted factor, or variable, is called omitted variable bias. For omitted variable bias to occur, the omitted variable "Z" must satisfy two conditions: The two conditions for omitted variable bias • Z is a determinant of Y (i.e., Z is part of u); and • Z is correlated with the regressor X (i.e., corr(Z, X) ≠ 0), so that $\rho_{Xu} \neq 0$ and the OLS estimator is biased and is not consistent. Both conditions must hold for the omission of Z to result in omitted variable bias.
Motivation for adding more independent variables ($x_i$) • Omitted Variable Bias • The assumption with the largest impact is SLR.4: $E(U|X) = 0$. • $U$ includes variables other than $X$ that affect $Y$; in other words, $U$ includes the variables that are omitted. • Assume that there is one omitted variable $Z$: • $U = \beta_2 Z + \varepsilon$
Motivation for adding more independent variables ($x_i$) Omitted Variable Bias - the case of no bias: if the omitted variable $Z$ is uncorrelated with the included regressor $X$ (or its true coefficient $\beta_2$ is zero), the OLS estimator of $\beta_1$ remains unbiased.
Motivation for adding more independent variables ($x_i$) Omitted Variable Bias: when $Z$ both determines $Y$ and is correlated with $X$, the short regression of $Y$ on $X$ alone yields $E(\hat{\beta}_1) = \beta_1 + \beta_2 \tilde{\delta}_1$, where $\tilde{\delta}_1$ is the slope from regressing $Z$ on $X$; the term $\beta_2 \tilde{\delta}_1$ is the omitted variable bias.
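Both cases can be checked numerically. Below is a minimal simulation sketch (not from the slides; all names are illustrative): the true model is $y = \beta_0 + \beta_1 x + \beta_2 z + \varepsilon$, but the short regression omits $z$. With $corr(x, z) = 0$ the slope estimate centers on $\beta_1$; otherwise it centers on $\beta_1 + \beta_2 \tilde{\delta}_1$.

```python
import numpy as np

# Illustrative simulation (not from the slides). True model:
# y = b0 + b1*x + b2*z + eps, but we regress y on x alone, omitting z.
rng = np.random.default_rng(42)
n, b0, b1, b2 = 100_000, 1.0, 2.0, 3.0

def short_regression_slope(corr_xz):
    # Draw x and z as standard normals with the chosen correlation.
    x = rng.normal(size=n)
    z = corr_xz * x + np.sqrt(1 - corr_xz**2) * rng.normal(size=n)
    y = b0 + b1 * x + b2 * z + rng.normal(size=n)
    # OLS slope of y on x (with intercept): cov(x, y) / var(x).
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

print(short_regression_slope(0.0))  # ~2.0: z uncorrelated with x, no bias
print(short_regression_slope(0.5))  # ~3.5 = b1 + b2*0.5: omitted variable bias
```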
Motivation for adding more independent variables ($x_i$) • How to deal with omitted variable bias? • We thus need regressions that have more than one regressor, so that would-be omitted variables can be included directly.
Example: Interpreting Multiple Regression • An equation explaining log(wage). • educ (years of education). • exper (years of labor market experience). • tenure (years with the current employer). • 526 observations on workers. • The coefficient .092 means that, holding exper and tenure fixed, another year of education is predicted to increase log(wage) by .092, which translates into an approximate 9.2 percent [100(.092)] increase in wage.
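A minimal sketch of how such a regression can be fit with statsmodels' formula API. The DataFrame here is synthetic stand-in data with hypothetical columns matching the example (wage, educ, exper, tenure); substitute the actual 526-worker dataset to reproduce the reported .092.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data mimicking the example's column names and size;
# replace with the real 526-worker sample to get the slide's estimates.
rng = np.random.default_rng(0)
n = 526
df = pd.DataFrame({
    "educ": rng.integers(8, 18, size=n),
    "exper": rng.integers(0, 40, size=n),
    "tenure": rng.integers(0, 20, size=n),
})
df["wage"] = np.exp(0.3 + 0.092 * df["educ"] + 0.004 * df["exper"]
                    + 0.022 * df["tenure"] + rng.normal(scale=0.4, size=n))

# Fit log(wage) on educ, exper, tenure.
results = smf.ols("np.log(wage) ~ educ + exper + tenure", data=df).fit()
print(results.params)
# Holding exper and tenure fixed, the educ coefficient is the predicted
# change in log(wage) per extra year of education; a value of .092 means
# roughly a 100*(.092) = 9.2% wage increase.
```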
OLS Fitted Values and Residuals The fitted value for observation $i$ is defined just as in the simple regression case: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \dots + \hat{\beta}_k x_{ki}$. The residual for observation $i$ is defined just as in the simple regression case: $\hat{u}_i = y_i - \hat{y}_i$.
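A plain-numpy sketch of these two definitions on synthetic data (all names illustrative):

```python
import numpy as np

# Sketch: fitted values and residuals for OLS, on synthetic data.
# X includes a column of ones for the intercept.
rng = np.random.default_rng(1)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS coefficients
y_hat = X @ beta_hat        # fitted value: yhat_i = b0 + b1*x1i + ... + bk*xki
u_hat = y - y_hat           # residual:     uhat_i = y_i - yhat_i

print(u_hat.sum())          # ~0: residuals sum to zero when X has a constant
```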
Goodness-of-Fit The $R^2$ is the fraction of the variance explained – same definition as in regression with a single regressor: $R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$, where $ESS = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$, $SSR = \sum_{i=1}^{n}\hat{u}_i^2$, and $TSS = \sum_{i=1}^{n}(y_i - \bar{y})^2$. The $R^2$ always increases when you add another regressor (adding a regressor can never raise $SSR$: OLS could always set the new coefficient to zero and do no worse) – a bit of a problem for a measure of "fit". The Over-fitting Problem of R-squared
Goodness-of-Fit The $\bar{R}^2$ (the "adjusted $R^2$") corrects this problem by "penalizing" you for including another regressor – the $\bar{R}^2$ does not necessarily increase when you add another regressor. Adjusted $R^2$: $\bar{R}^2 = 1 - \frac{n-1}{n-k-1}\,\frac{SSR}{TSS}$. Note that $\bar{R}^2 < R^2$; however, if $n$ is large the two will be very close. Adding more variables increases $k$ and may decrease $\bar{R}^2$.
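A sketch computing both measures from the sums of squares, on synthetic data (names illustrative):

```python
import numpy as np

# Sketch: R^2 and adjusted R^2 from the OLS sums of squares.
rng = np.random.default_rng(2)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -1.0, 0.0]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta_hat

TSS = np.sum((y - y.mean())**2)       # total sum of squares
SSR = np.sum(u_hat**2)                # sum of squared residuals
r2 = 1 - SSR / TSS                    # = ESS/TSS
adj_r2 = 1 - (SSR / (n - k - 1)) / (TSS / (n - 1))

print(r2, adj_r2)                     # adj_r2 < r2; close when n is large
```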
Variance of the OLS Estimators • Now we know that the sampling distribution of our estimator is centered around the true parameter • Want to think about how spread out this distribution is • Much easier to think about this variance under an additional assumption: homoskedasticity
Variance of OLS Estimators Assuming homoskedasticity, $Var(u|x) = \sigma^2$, which also implies that $Var(y|x) = \sigma^2$. Under this assumption, $Var(\hat{\beta}_j) = \frac{\sigma^2}{SST_j (1 - R_j^2)}$, where $SST_j = \sum_{i=1}^{n}(x_{ji} - \bar{x}_j)^2$ is the total sample variation in $x_j$ and $R_j^2$ is the $R^2$ from regressing $x_j$ on all other independent variables (including an intercept).
Estimating the Error Variance • We don't know what the error variance, $\sigma^2$, is, because we don't observe the errors, $u_i$ • What we observe are the residuals, $\hat{u}_i$ • We can use the residuals to form an estimate of the error variance
Error Variance Estimate (cont) • $\hat{\sigma}^2 = \frac{SSR}{n - k - 1} = \frac{\sum_{i=1}^{n}\hat{u}_i^2}{df}$ • $df = n - (k + 1)$, or $df = n - k - 1$ • $df$ (i.e., degrees of freedom) is the (number of observations) – (number of estimated parameters)
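A sketch of this estimator on synthetic data (names illustrative); with a reasonably large sample, $\hat{\sigma}^2$ should land near the true $\sigma^2$:

```python
import numpy as np

# Sketch: estimate the error variance from the residuals,
# dividing SSR by the degrees of freedom n - (k + 1).
rng = np.random.default_rng(3)
n, k, sigma = 500, 2, 1.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=sigma, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta_hat

sigma2_hat = np.sum(u_hat**2) / (n - k - 1)   # SSR / df
print(sigma2_hat, sigma**2)                   # estimate vs. true value 2.25
```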
Components of OLS Variances • The error variance: a larger $\sigma^2$ implies a larger variance for the OLS estimators • The total sample variation: a larger $SST_j$ implies a smaller variance for the estimators • Linear relationships among the independent variables: a larger $R_j^2$ implies a larger variance for the estimators
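These components can be verified numerically. The sketch below (synthetic data, illustrative names) computes $Var(\hat{\beta}_1) = \sigma^2 / (SST_1(1 - R_1^2))$ directly and checks it against the corresponding diagonal entry of $\sigma^2 (X'X)^{-1}$:

```python
import numpy as np

# Sketch: Var(beta_j) = sigma^2 / (SST_j * (1 - R_j^2)), computed directly
# and checked against the diagonal of sigma^2 * (X'X)^{-1}.
rng = np.random.default_rng(4)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)          # x2 correlated with x1
X = np.column_stack([np.ones(n), x1, x2])
sigma2 = 1.0                                # assume a known error variance

# R_1^2: R-squared from regressing x1 on the other regressors (const, x2).
Z = np.column_stack([np.ones(n), x2])
g, *_ = np.linalg.lstsq(Z, x1, rcond=None)
e1 = x1 - Z @ g
SST1 = np.sum((x1 - x1.mean())**2)
R1_sq = 1 - np.sum(e1**2) / SST1

var_b1 = sigma2 / (SST1 * (1 - R1_sq))      # component formula
var_matrix = sigma2 * np.linalg.inv(X.T @ X)
print(var_b1, var_matrix[1, 1])             # the two agree
```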
Irrelevant Variables and Omitted Variables • What happens if we include variables in our specification that don't belong? • Including an irrelevant variable does not bias the other parameter estimates: OLS remains unbiased (though it can increase their sampling variance) • What if we exclude a variable from our specification that does belong? • OLS will generally be biased
Omitted Variable Bias Summary • Two cases where the bias is zero (OLS unbiased): • $\beta_2 = 0$, that is, $x_2$ doesn't really belong in the model • $x_1$ and $x_2$ are uncorrelated in the sample • If the correlations of $x_2$ with $x_1$ and of $x_2$ with $y$ have the same sign, the bias will be positive • If the correlations of $x_2$ with $x_1$ and of $x_2$ with $y$ have opposite signs, the bias will be negative