Chapter Eight. Multiple Regression: Estimation and Hypothesis Testing
Three Variable Model • Any individual Y value can be expressed as the sum of a systematic (deterministic) component and a nonsystematic (random) component • B2 and B3 are partial regression coefficients
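The slide's original equation is not reproduced here; in the standard notation consistent with these bullets, the three-variable population regression model can be written as

Yi = B1 + B2X2i + B3X3i + ui

where B1 + B2X2i + B3X3i = E(Yi | X2i, X3i) is the systematic component and ui is the random component.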
Partial slope or regression coefficients • For example, B2 measures the change in the mean value of Y per unit change in X2, holding X3 constant. • This reflects the partial effect of one explanatory variable on the mean value of the dependent variable when the values of the other explanatory variables are held constant. • Regression can isolate the effect of each X variable on Y from all the other X variables.
Assumptions of the Multiple LRM • The regression model is linear in the parameters • X2 and X3 are uncorrelated with u (always true for nonstochastic X’s) • E(ui) = 0 • Homoscedasticity: var(ui) = σ2 • No autocorrelation: cov(ui, uj) = 0, i ≠ j • No exact collinearity between X2 and X3 • For hypothesis testing: ui~ N(0, σ2)
Multicollinearity • Two variables are collinear if one variable is an exact linear function of the other • X2i = 3 + 2X3i or X2i = 4X3i • In this case, a model with two explanatory variables collapses to one with a single explanatory variable, since X2 and X3 are not independent • the individual effects of X2 and X3 cannot be isolated • B2 and B3 cannot be estimated • Multicollinearity refers to collinearity among the explanatory variables in models with more than two explanatory variables • Perfect collinearity is rare, but high or near-perfect collinearity is common.
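A minimal numerical sketch (hypothetical data; numpy assumed available) of why exact collinearity such as X2i = 3 + 2X3i makes B2 and B3 inestimable: the design matrix loses a rank, so the normal equations have no unique solution.

import numpy as np

rng = np.random.default_rng(0)
n = 50
x3 = rng.normal(size=n)
x2 = 3 + 2 * x3                                   # perfect collinearity: X2 is an exact linear function of X3
y = 1 + 0.5 * x2 + 0.8 * x3 + rng.normal(size=n)  # hypothetical dependent variable

X = np.column_stack([np.ones(n), x2, x3])         # design matrix with intercept

# The design matrix has rank 2 rather than 3: one column is redundant,
# so X'X is singular and the normal equations have no unique solution.
print("rank of X:", np.linalg.matrix_rank(X))               # 2, not 3
print("condition number of X'X:", np.linalg.cond(X.T @ X))  # enormous

# lstsq still returns *a* least-squares solution, but it is only one of
# infinitely many; the separate effects of X2 and X3 cannot be isolated.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print("one of infinitely many solutions:", b)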
OLS Estimators • Choose values for unknown parameters so as to minimize the RSS • As in the two-variable case, calculus and some algebra yield the formulas for the intercept and slope parameters • Equations for b2 and b3 are symmetric with common denominators
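The closed-form expressions referred to above are not reproduced on the slide; in deviation form (lowercase yi, x2i, x3i denote deviations from the sample means), the standard textbook formulas are

b2 = [ (Σyix2i)(Σx3i²) − (Σyix3i)(Σx2ix3i) ] / [ (Σx2i²)(Σx3i²) − (Σx2ix3i)² ]
b3 = [ (Σyix3i)(Σx2i²) − (Σyix2i)(Σx2ix3i) ] / [ (Σx2i²)(Σx3i²) − (Σx2ix3i)² ]
b1 = Ȳ − b2X̄2 − b3X̄3

Note the symmetry: b3 is obtained from b2 by interchanging the subscripts 2 and 3, and both share the same denominator.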
Properties of OLS Estimators • OLS estimators for the multiple linear regression model are BLUE • Linear • Unbiased • Efficient: minimum variance among linear, unbiased estimators
Goodness of Fit, R2, and Hypothesis Testing • Multiple Coefficient of Determination, R2 • TSS = ESS + RSS • R2 = ESS/TSS or 1 − (RSS/TSS) • If ui ~ N(0, σ2), then the bi are normally distributed with means Bi, i = 1, 2, 3 • t = (bi − Bi)/se(bi) ~ t(n − 3) • Example: Antique Clock Auction
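A short numpy sketch of how R², the standard errors, and the t ratios are computed. The data below are simulated stand-ins; the actual antique clock auction data are in the textbook and not reproduced here.

import numpy as np

rng = np.random.default_rng(1)
n, k = 32, 3                                      # n observations, k parameters (intercept + 2 slopes)
age = rng.uniform(100, 200, n)                    # hypothetical clock ages
bidders = rng.integers(5, 16, n).astype(float)    # hypothetical numbers of bidders
price = 100 + 12 * age + 85 * bidders + rng.normal(0, 200, n)

X = np.column_stack([np.ones(n), age, bidders])
b = np.linalg.solve(X.T @ X, X.T @ price)         # OLS estimates

resid = price - X @ b
rss = resid @ resid                               # residual sum of squares
tss = ((price - price.mean()) ** 2).sum()         # total sum of squares
ess = tss - rss                                   # explained sum of squares
r2 = ess / tss                                    # = 1 - RSS/TSS

sigma2 = rss / (n - k)                            # estimated error variance
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
t_stats = b / se                                  # t ratios for H0: Bi = 0, distributed t(n - k)

print("R^2:", r2)
print("t statistics:", t_stats)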
Testing B2 = B3 = 0, or R2 = 0 • H0: B2 = B3 = 0 equivalent to H0: R2 = 0 • Test of the overall significance of the multiple regression • Degrees of freedom • TSS: n – 1 always • RSS: n – k • ESS: k – 1 • F = (ESS/d.f.)/(RSS/d.f.) ~ F(k-1),(n-k) • (Variance explained by X2, X3)/(unexplained variance) • See Tables 8-1 and 8-2 (Antique Clock Example).
Table 8-1 ANOVA table for the three-variable regression.
Table 8-2 ANOVA table for the clock auction price example.
Relationship between F and R2 • F and R2 are directly related • R2 = 0, F = 0 • Larger R2, larger F • R2 = 1, F is infinite • H0: B2 = B3 = 0 equivalent to H0: R2 = 0 • See Table 8-3.
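The algebraic link (the standard identity behind Table 8-3) is

F = [ R²/(k − 1) ] / [ (1 − R²)/(n − k) ]

so F is an increasing function of R²: F = 0 when R² = 0, and F grows without bound as R² approaches 1.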
Table 8-3 ANOVA table in terms of R2.
Specification Bias • Suppose we ran a regression of antique clock auction prices against age and number of bidders separately. • How could we compare these regressions to the multiple regression using both age and number of bidders as explanatory variables? • Since both age and number of bidders contribute significantly to the explanation of prices (by the t and F tests), the one- and two-variable models summarized in Table 8-4 (those that omit age, number of bidders, or both) are misspecified.
Table 8-4 A comparison of four models of antique clock auction prices.
Comparing R2 Values: The Adjusted R2 • Note two properties of R2 • R2 values from separate regressions on the same dependent variable but with different explanatory variables do not take into account the different degrees of freedom (k – 1, n – k, etc.) • R2 never decreases (and typically increases) whenever more explanatory variables are added to a regression equation. • Solution: The Adjusted R2 • Adj. R2 = 1 – {(1 – R2)[(n – 1)/(n – k)]}
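A minimal helper (with hypothetical values) showing the adjustment in code:

def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k)."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

# The same unadjusted R^2 is penalized more heavily when more parameters are used
print(adjusted_r2(0.90, n=32, k=3))   # about 0.893
print(adjusted_r2(0.90, n=32, k=6))   # about 0.881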
Properties of the Adjusted R2 • If k > 1, adj. R2 < unadj. R2 • Unadjusted R2 is never negative, but the adjusted R2 can become negative (when R2 is very small). • The adj. R2 can be compared across regressions with the same dependent variable. • It is common practice to add explanatory variables as long as the adj. R2 increases.
A Problem with Adj. R2 • The adj. R2 increases as additional explanatory variables are added if |t| > 1 for the null hypothesis that the last added variable's coefficient = 0. See Table 8-4. • Note that the square of the t-value for the age coefficient in row 2 is approximately the F-value. • This is because the square of a t variable with k d.f. is distributed as F(1, k) • As explanatory variables are added, both adj. and unadj. R2 increase. • Should both age and number of bidders be added? Yes! To decide, use the F-test.
Restricted Least Squares • Use Table 8-4 to test whether both age and number of bidders should be added. • Call the regression in row 1 the restricted regression • Call the regression in row 4 the unrestricted regression • Calculate F as shown below, where m is the number of restrictions, n is the number of observations, and k is the number of parameters in the unrestricted regression.
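The formula itself is not reproduced on the slide; the standard restricted least squares F statistic, in either RSS or R² form, is

F = [ (RSSr − RSSur)/m ] / [ RSSur/(n − k) ]
  = [ (R²ur − R²r)/m ] / [ (1 − R²ur)/(n − k) ]   ~ F(m, n − k)

where the subscripts r and ur denote the restricted and unrestricted regressions, respectively.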
Restricted Least Squares • Use the results in Table 8-4 to calculate F • P(F > 117) << 0.01 • Using both age and number of bidders as explanatory variables adds significantly to the explanatory power of the regression.
How to decide whether to add explanatory variables • Single variable (one variable at a time), add if • t-test is significant (H0: coefficient = 0) • Adj. R2 increases • For groups of variables • Restricted least squares F-test • "Unrestricted" regression has more explanatory variables • "Restricted" regression has fewer explanatory variables • "Number of restrictions" m is the difference in the number of parameters (coefficients) in the two equations
Example from ElectricExcel2.xls • Calculate F • P(F(3, 31) > 0.86) > 0.25 • Not significant even at the 10% level • Adding the 3 explanatory variables does not significantly increase the explanatory power
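A quick way to check the quoted tail probability (assuming scipy is available; the F value of 0.86 and the degrees of freedom are taken from the slide):

from scipy import stats

# P(F(3, 31) > 0.86): the survival function of the F distribution
p_value = stats.f.sf(0.86, dfn=3, dfd=31)
print(p_value)   # well above 0.25, so the added variables are not jointly significant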