Multiple Regression Analysis
y = β0 + β1x1 + β2x2 + … + βkxk + u
1. Estimation
Slides by K. Clark adapted from P. Anderson
Recap of Simple Regression
Assumptions:
• The population model is linear in parameters: y = β0 + β1x + u [SLR.1]
• There is a random sample of size n, {(xi, yi): i = 1, 2, …, n}, from the population model [SLR.2]
• Assume E(u|x) = 0 and thus E(ui|xi) = 0 [SLR.3]
• Assume there is variation in the xi [SLR.4]
• Var(u|x) = σ² [SLR.5]
Recap of Simple Regression
• Applying the least squares method yields estimators of β0 and β1
• R² = 1 − (SSR/SST) is a measure of goodness of fit
• Under SLR.1–SLR.4 the OLS estimators are unbiased…
• …and, with SLR.5, have variances which can be estimated from the sample…
Recap of Simple Regression
• … if we estimate σ² with SSR/(n − 2).
• Adding SLR.6, Normality, gives normally distributed errors and allows us to state that (β̂1 − β1)/se(β̂1) ~ t(n − 2), which is the basis of statistical inference.
• Alternative, useful, functional forms are possible.
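For reference, the least squares estimators the recap refers to are the standard simple regression results (Wooldridge, Chapter 2), written here in LaTeX:

\[
\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y}-\hat{\beta}_1\bar{x},
\qquad
\hat{\sigma}^2 = \frac{SSR}{n-2}.
\]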
Limitations of Simple Regression
• In simple regression we explicitly control for only a single explanatory variable.
• We deal with all other factors by assuming SLR.3 (zero conditional mean).
• e.g. wage = β0 + β1educ + β2exper + u
• Simple regression of wage on educ puts exper in u and assumes educ and u are independent.
• Simple regression therefore puts a lot of weight on conditional mean independence.
Multiple Regression Model
In the population we assume y = β0 + β1x1 + β2x2 + … + βkxk + u
• We are still explaining y.
• There are k explanatory variables.
• There are k + 1 parameters.
• k = 1 gets us back to simple regression.
Parallels with Simple Regression
• β0 is still the intercept
• β1 to βk are all called slope parameters
• u is still the error term (or disturbance)
• Still need to make a zero conditional mean assumption, so now assume that E(u|x1, x2, …, xk) = 0, or E(u|x) = 0
• Still minimizing the sum of squared residuals, however...
Some Important Notation
• xij is the i'th observation on the j'th explanatory variable
• e.g. x32 is the 3rd observation on explanatory variable 2
• Not such a problem when we use variable names, e.g. educ3
The First Order Conditions
• There are k + 1 first order conditions, solution of which is hard.
• A matrix approach is "easier" but beyond the scope of ES5611. See Wooldridge, Appendix E.
• In general each is a function of all the xj and the y.
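Purely as an illustration of the matrix approach (not required for the course), a minimal Mata sketch reproduces the OLS coefficients from the formula (X'X)⁻¹X'y. It assumes the wage3 data are already in memory with the variables wage, educ and exper used in the output below:

mata:
    y = st_data(., "wage")
    X = (st_data(., ("educ", "exper")), J(rows(y), 1, 1))   // regressors plus a column of ones
    b = invsym(cross(X, X)) * cross(X, y)                    // (X'X)^{-1} X'y
    b                                                        // display the coefficient vector
end

The resulting column vector should match the educ, exper and _cons coefficients reported by regress.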
Example
Consider the multiple regression model:
wage = β0 + β1educ + β2exper + u
Using the data in wage3.dta…
Example (Stata output) use http://www.ses.man.ac.uk/clark/es561/wage3 . regress wage educ exper Source | SS df MS Number of obs = 526 -------------+------------------------------ F( 2, 523) = 75.99 Model | 1612.2545 2 806.127251 Prob > F = 0.0000 Residual | 5548.15979 523 10.6083361 R-squared = 0.2252 -------------+------------------------------ Adj R-squared = 0.2222 Total | 7160.41429 525 13.6388844 Root MSE = 3.257 ------------------------------------------------------------------------------ wage | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- educ | .6442721 .0538061 11.97 0.000 .5385695 .7499746 exper | .0700954 .0109776 6.39 0.000 .0485297 .0916611 _cons | -3.390539 .7665661 -4.42 0.000 -4.896466 -1.884613 ------------------------------------------------------------------------------
Example - interpretation
The fitted regression line is:
predicted wage = −3.39 + 0.644·educ + 0.070·exper
• Holding experience fixed, a one-year increase in education increases the predicted hourly wage by about 64 cents.
• Holding education fixed, a one-year increase in experience increases the predicted hourly wage by about 7 cents.
• When education and experience are both zero, the predicted wage is −$3.39!
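These numbers can be recovered from the stored coefficient vector after regress; a small sketch (the profile of 12 years of education and 5 years of experience is purely illustrative):

regress wage educ exper
display _b[educ]                                  // .6442721, the education slope
display _b[exper]                                 // .0700954, the experience slope
display _b[_cons] + _b[educ]*12 + _b[exper]*5     // predicted hourly wage at educ = 12, exper = 5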
Assumptions for Unbiasedness
• The population model is linear in parameters: y = β0 + β1x1 + β2x2 + … + βkxk + u [MLR.1]
• {(xi1, xi2, …, xik, yi): i = 1, 2, …, n} is a random sample from the population model, so that yi = β0 + β1xi1 + β2xi2 + … + βkxik + ui [MLR.2]
• E(u|x1, x2, …, xk) = 0, implying that all of the explanatory variables are uncorrelated with the error [MLR.3]
• None of the x's is constant, and there are no exact linear relationships among them [MLR.4]
Unbiasedness of OLS
Under these assumptions:
• All of the OLS estimators of the parameters of the multiple regression model are unbiased estimators.
• This is not generally true if any one of MLR.1–MLR.4 is violated.
• Note MLR.4 is more involved than SLR.4 and rules out perfect multicollinearity.
• Is MLR.3 more plausible than SLR.3?
Too Many or Too Few Variables
• What happens if we include variables in our specification that don't belong?
• OLS estimators remain unbiased
• There will, however, be an impact on the variance of the estimators
• What if we exclude a variable from our specification that does belong?
• OLS will usually be biased
Omitted Variable Bias Summary
• Two cases where the bias is equal to zero:
• β2 = 0, that is, x2 doesn't really belong in the model
• x1 and x2 are uncorrelated in the sample
• If the correlations between x2, x1 and x2, y are the same sign, the bias will be positive
• If the correlations between x2, x1 and x2, y are the opposite sign, the bias will be negative
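These sign rules follow from the standard omitted variable bias expression (Wooldridge, Chapter 3). If the true model is y = β0 + β1x1 + β2x2 + u but we omit x2 and regress y on x1 alone, the resulting estimator satisfies

\[
E(\tilde{\beta}_1) = \beta_1 + \beta_2\,\tilde{\delta}_1 ,
\]

where \tilde{\delta}_1 is the slope from a regression of x2 on x1 in the sample.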
Omitted Variable Bias: Example
Suppose the model log(wage) = β0 + β1educ + β2abil + u satisfies MLR.1–MLR.4.
abil is typically hard to observe, so we estimate log(wage) = β0 + β1educ + u.
Since ability is likely to raise wages (β2 > 0) and to be positively correlated with education, the simple regression of log(wage) on educ will tend to overstate the return to education.
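A minimal simulation sketch of this mechanism, using entirely artificial data rather than wage3.dta (the coefficients 2 and 3 and the correlation between x1 and x2 are illustrative assumptions):

clear
set obs 1000
set seed 12345
generate x2 = rnormal()                   // the "ability"-type variable we will omit
generate x1 = 0.5*x2 + rnormal()          // included regressor, positively correlated with x2
generate y  = 1 + 2*x1 + 3*x2 + rnormal()
regress y x1                              // omits x2: the x1 slope is biased upwards (well above 2)
regress y x1 x2                           // includes x2: the x1 slope is close to 2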
The More General Case
• Technically, we can only sign the bias for the more general case if all of the included x's are uncorrelated
• Typically, then, we work through the bias assuming the x's are uncorrelated, as a useful guide even if this assumption is not strictly true
Variance of the OLS Estimators
Assume Var(u|x1, x2, …, xk) = σ² (homoskedasticity) [MLR.5]
Let x stand for (x1, x2, …, xk)
Assuming that Var(u|x) = σ² also implies that Var(y|x) = σ²
The 4 assumptions for unbiasedness, plus this homoskedasticity assumption, are known as the Gauss-Markov assumptions
Components of OLS Variances
• The error variance: a larger σ² implies a larger variance for the OLS estimators
• The total sample variation: a larger SSTj implies a smaller variance for the estimators
• Linear relationships among the independent variables: a larger Rj² implies a larger variance for the estimators (multicollinearity)
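For reference, these three components appear directly in the sampling variance of each slope estimator under the Gauss-Markov assumptions (Wooldridge, Chapter 3):

\[
\operatorname{Var}(\hat{\beta}_j) = \frac{\sigma^2}{SST_j\,(1 - R_j^2)}, \qquad j = 1, \dots, k,
\]

where SST_j = \sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2 is the total sample variation in x_j and R_j^2 is the R² from regressing x_j on all of the other explanatory variables.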
Misspecified Models (cont)
• While the variance of the estimator is smaller for the misspecified model, unless β2 = 0 the misspecified model is biased
• Corollary: including an extraneous or irrelevant variable cannot decrease the variance of the estimator
• As the sample size grows, the variance of each estimator shrinks to zero, making the variance difference less important
The Gauss-Markov Theorem
• Given our 5 Gauss-Markov Assumptions it can be shown that OLS is "BLUE"
• Best
• Linear
• Unbiased
• Estimator
• Thus, if the assumptions hold, use OLS
Estimating the Error Variance
• We estimate σ² with SSR/df, where df = n − (k + 1), or df = n − k − 1
• df (i.e. degrees of freedom) is the (number of observations) − (number of estimated parameters)
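Written out, the error variance estimator and the standard error it feeds into are the standard ones from Wooldridge:

\[
\hat{\sigma}^2 = \frac{SSR}{n-k-1} = \frac{\sum_{i=1}^{n}\hat{u}_i^{\,2}}{n-k-1},
\qquad
\operatorname{se}(\hat{\beta}_j) = \frac{\hat{\sigma}}{\sqrt{SST_j\,(1-R_j^2)}} .
\]

In the Stata output above, Root MSE = 3.257 is σ̂ = √(5548.16/523) = √10.61 ≈ 3.257.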
Goodness-of-Fit
• R² can also be used in the multiple regression context.
• R² = SSE/SST = 1 − SSR/SST
• 0 ≤ R² ≤ 1
• R² has the same interpretation - the proportion of the variation in y explained by the independent (x) variables
More about R-squared
• R² can never decrease when another independent variable is added to a regression, and usually will increase
• This is because SSR is non-increasing in k
• Because R² will usually increase with the number of independent variables, it is not a good way to compare models
Adjusted R-Squared (Section 6.3)
• An alternative measure of goodness of fit is sometimes used: the adjusted R², defined as 1 − [SSR/(n − k − 1)] / [SST/(n − 1)]
• The adjusted R² takes into account the number of variables in a model, and may decrease when a variable is added
Adjusted R-Squared (cont)
• It's easy to see that the adjusted R² is just 1 − (1 − R²)(n − 1)/(n − k − 1); Stata will give you both R² and adj-R²
• You can compare the fit of 2 models (with the same y) by comparing the adj-R²
• You cannot use the adj-R² to compare models with different y's (e.g. y vs. ln(y))
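A small sketch of reproducing both measures from the stored results of the earlier regression:

quietly regress wage educ exper
display e(r2)                                              // 0.2252
display e(r2_a)                                            // 0.2222
display 1 - (1 - e(r2))*(e(N) - 1)/(e(N) - e(df_m) - 1)    // rebuilds adj-R² from R²; e(df_m) = k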
Stata output again use http://www.ses.man.ac.uk/clark/es561/wage3 . regress wage educ exper Source | SS df MS Number of obs = 526 -------------+------------------------------ F( 2, 523) = 75.99 Model | 1612.2545 2 806.127251 Prob > F = 0.0000 Residual | 5548.15979 523 10.6083361 R-squared = 0.2252 -------------+------------------------------ Adj R-squared = 0.2222 Total | 7160.41429 525 13.6388844 Root MSE = 3.257 ------------------------------------------------------------------------------ wage | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- educ | .6442721 .0538061 11.97 0.000 .5385695 .7499746 exper | .0700954 .0109776 6.39 0.000 .0485297 .0916611 _cons | -3.390539 .7665661 -4.42 0.000 -4.896466 -1.884613 ------------------------------------------------------------------------------
Summary: Multiple Regression
• Many of the principles are the same as in simple regression
• Functional form results are the same
• Need to be aware of the role of the assumptions
• Have only focused on estimation; consider inference in the next lecture
Next Time
• Remember: no lectures or classes in Reading Week (week beginning 1st November)
• Read Wooldridge!
• Next topic is inference in (multiple) regression (Chapter 4)