Multiple Regression 2 Model Selection
A Forward Selection Heuristic
1. Regress Y on each of the k potential X variables, one at a time.
2. Determine the best single-variable model.
3. Regress Y on the best variable together with each of the remaining k-1 variables.
4. Determine the best model that includes the previous best variable and one new variable.
5. If the adjusted R² declines, the standard error of the regression increases, the t-statistic of the best variable is insignificant, or the coefficients are theoretically inconsistent, STOP and use the previous best model.
Repeat steps 3-5, adding one variable at a time, until stopped or until the all-variable model has been reached.
The idea behind Forward Selection
• If the adjusted R² declines when an additional variable is added, then the added value of the variable does not outweigh its modeling cost.
• If the standard error of the regression increases, then the additional variable has not improved estimation.
• If the t-statistic of one of the variables is insignificant, then there may be too many variables.
• If the coefficients are inconsistent with theory, this may indicate multicollinearity effects.
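The forward selection loop can be sketched in a few lines of Python. This is a minimal illustration, not the procedure from the slides in full: it uses only the adjusted R² stopping rule (the standard error, t-statistic, and theory checks are left to the analyst), it fits each model with a small hand-written least-squares helper, and the names `solve`, `adjusted_r2`, `forward_select`, and the data `x1`, `x2`, `y` are all hypothetical.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def adjusted_r2(columns, y):
    """Adjusted R^2 of a least-squares fit of y on an intercept plus columns."""
    n = len(y)
    rows = [[1.0] + [c[i] for c in columns] for i in range(n)]
    p = len(rows[0])  # number of coefficients, intercept included
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = solve(xtx, xty)
    sse = sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - (sse / (n - p)) / (sst / (n - 1))


def forward_select(candidates, y):
    """Add one variable at a time while the adjusted R^2 keeps rising."""
    chosen, best_score = [], float("-inf")
    remaining = list(candidates)
    while remaining:
        # Try each remaining variable alongside the ones already chosen.
        scores = {name: adjusted_r2([candidates[c] for c in chosen + [name]], y)
                  for name in remaining}
        best_name = max(scores, key=scores.get)
        if scores[best_name] <= best_score:
            break  # adjusted R^2 declined: keep the previous best model
        chosen.append(best_name)
        remaining.remove(best_name)
        best_score = scores[best_name]
    return chosen


# Hypothetical data: y depends on x1 only; x2 is unrelated to y.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1, -3, 3, -1, -1, 3, -3, 1]
y = [5.5, 6.5, 8.5, 11.5, 13.5, 14.5, 16.5, 19.5]
selected = forward_select({"x1": x1, "x2": x2}, y)
```

In this made-up data set, adding `x2` leaves the sum of squared errors unchanged but uses up a degree of freedom, so the adjusted R² falls and the loop stops with `x1` alone.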
The Backward Elimination Heuristic
1. Regress Y on all k potential X variables.
2. Use t-tests to determine which X is least significant.
3. If this X does not meet some minimum level of significance, remove it from the model.
4. Regress Y on the remaining set of k-1 X variables.
Repeat steps 2-4 until all remaining Xs meet the minimum level of significance.
Use Tests One at a Time
The tests should be used one at a time, because each t-test assumes the other variables stay in the model.
• T1 can tell you to drop X1 and keep X2-X6.
• T2 can tell you to drop X2 and keep X1 and X3-X6.
• Together, they don’t necessarily tell you to drop both and keep X3-X6.
Multiple Regression 1
The idea behind Backward Elimination
If the t-statistic of an X is not significant, we can remove that X and simplify the model while still maintaining the model’s high R².
Typical stopping rule: continue until all Xs meet some target “significance level to stay” (often .10 or .15 to keep more Xs).
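Backward elimination can be sketched the same way. The version below is an illustration under stated assumptions: instead of a significance level it takes a cutoff on |t| (`t_crit`, a hypothetical parameter that the analyst would look up for the chosen significance level and degrees of freedom), and it drops one variable per pass, in line with using the tests one at a time. The helper names and the data are made up for the example.

```python
import math


def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def t_statistics(columns, y):
    """t-statistics (intercept first) for a least-squares fit of y on columns."""
    n = len(y)
    rows = [[1.0] + [c[i] for c in columns] for i in range(n)]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = solve(xtx, xty)
    sse = sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    mse = sse / (n - p)
    stats = []
    for j in range(p):
        # The j-th diagonal entry of (X'X)^-1 comes from solving X'X z = e_j.
        unit = [1.0 if i == j else 0.0 for i in range(p)]
        variance = mse * solve(xtx, unit)[j]
        stats.append(beta[j] / math.sqrt(variance))
    return stats


def backward_eliminate(candidates, y, t_crit):
    """Drop the least significant X, one per pass, until all meet the cutoff."""
    kept = list(candidates)
    while kept:
        stats = t_statistics([candidates[name] for name in kept], y)
        # stats[0] belongs to the intercept, so slope j sits at index j + 1.
        weakest = min(range(len(kept)), key=lambda j: abs(stats[j + 1]))
        if abs(stats[weakest + 1]) >= t_crit:
            break  # every remaining X meets the cutoff
        del kept[weakest]
    return kept


# Hypothetical data: y depends on x1 only; x2 is unrelated to y.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1, -3, 3, -1, -1, 3, -3, 1]
y = [5.5, 6.5, 8.5, 11.5, 13.5, 14.5, 16.5, 19.5]
kept = backward_eliminate({"x1": x1, "x2": x2}, y, t_crit=2.0)
```

A fixed |t| cutoff is only a stand-in for the “significance level to stay”; a statistics package would report the p-value for each t-test directly.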
Concordance
• The forward and backward heuristics may or may not result in the same final model. Generally, however, the resulting models should be quite similar.
• Backward elimination requires that you start with a model that includes all possible explanatory variables. Software limits can get in the way: for example, Excel will only conduct regression for up to 16 variables.
Multi-collinearity
• When using many variables in a regression, some of the explanatory variables may be highly correlated with other explanatory variables. In the extreme case, when two of the variables are exactly linearly related, the multiple regression fails because the coefficient estimates are no longer uniquely determined.
• Simple indicators are a failure of the F-test; an increase in the standard error; an insignificant t-statistic for a previously significant variable; and theoretically inconsistent coefficients.
• Recall also that when using a categorical variable, one of the categories must be “left out”; otherwise the indicator columns add up to the intercept column, which is an exact linear relationship.
VIF as a measure of multi-collinearity
• The variance inflation factors (VIFs) should be calculated after reaching a supposed stopping point in a multiple regression selection method.
• A VIF is calculated for each independent variable by regressing that independent variable on the other independent variables and taking $\mathrm{VIF} = 1/(1 - R^2)$, where $R^2$ comes from that regression.
• A simple rule of thumb is that the VIFs should be less than 4.
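As a rough sketch of the VIF calculation, the Python below regresses each predictor on the others and applies 1 / (1 - R²). The function names `solve`, `r_squared`, and `vifs`, and the two small predictors `a` and `b`, are assumptions made for the example; a statistics package would normally do this step.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """R^2 of a least-squares fit of y on an intercept plus the given columns."""
    n = len(y)
    rows = [[1.0] + [c[i] for c in columns] for i in range(n)]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = solve(xtx, xty)
    sse = sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


def vifs(predictors):
    """Map each predictor name to its VIF, 1 / (1 - R^2) against the others."""
    result = {}
    for name, values in predictors.items():
        others = [v for other, v in predictors.items() if other != name]
        result[name] = 1.0 / (1.0 - r_squared(others, values))
    return result


# Hypothetical predictors whose correlation is 0.8, so R^2 = 0.64.
a = [1, 2, 3, 4]
b = [1, 3, 2, 4]
vif_values = vifs({"a": a, "b": b})
flagged = [name for name, v in vif_values.items() if v >= 4]
```

Here both VIFs equal 1 / (1 - 0.64), which is below the rule-of-thumb limit of 4, so `flagged` stays empty.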
Subsets of Variables
• The forward and backward heuristics rely on adding or deleting one variable at a time.
• It is, however, possible to evaluate the statistical significance of including a set of variables by constructing the partial F-statistic.
The “full” and “reduced” models
• Suppose there are r variables in the group.
• Define the full model to be the one with all Xs (all k predictors).
• Define the reduced model to be the one with the group left out (it has k-r variables).
Multiple regression 5 -- The partial F test
Partial F Statistic
• Look at the increase in the sum of squared errors, $\mathrm{SSE}_{\mathrm{Reduced}} - \mathrm{SSE}_{\mathrm{Full}}$, to see how much of the explained variation is lost.
• Divide this by r, the number of variables in the group.
• Put this in ratio to the MSE of the full model.
• This is called the partial F statistic.
Partial F Statistic
Under the null hypothesis that the coefficients of all r variables in the group are zero, the partial F statistic has an F distribution with r numerator and (n-k-1) denominator degrees of freedom.
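Written as a single formula, the steps for the partial F statistic amount to the following. The subscripted symbols are notation chosen here for illustration; the second identity is the usual definition of the full model's mean squared error, which matches the (n-k-1) denominator degrees of freedom.

```latex
F_{\text{partial}}
  = \frac{(\mathrm{SSE}_{\text{Reduced}} - \mathrm{SSE}_{\text{Full}})/r}
         {\mathrm{MSE}_{\text{Full}}},
\qquad
\mathrm{MSE}_{\text{Full}} = \frac{\mathrm{SSE}_{\text{Full}}}{n-k-1}
```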
Two regression runs
[Regression output for the two runs: Full and Reduced]
The Partial F for 4 variables
H0: all four coefficients in the group are zero
H1: at least one variable coefficient in the group is useful
$$F = \frac{(889.042 - 765.939)/4}{9.456} = \frac{30.776}{9.456} = 3.255$$
The correct F distribution to test against has 4 numerator and 81 denominator degrees of freedom. The value for a (4,60) distribution is 2.53 at a significance level of .05 and 3.65 at a significance level of .01. Judged against these tabled values, 3.255 is significant at the .05 level but not at the .01 level.
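The arithmetic of this example can be checked with a short Python sketch. The function name `partial_f` is hypothetical; the numbers are the ones on the slide, and the denominator degrees of freedom are recovered as SSE of the full model divided by its MSE.

```python
def partial_f(sse_reduced, sse_full, r, mse_full):
    """Partial F: the per-variable increase in SSE relative to the full MSE."""
    return ((sse_reduced - sse_full) / r) / mse_full


# Values taken from the slide's full and reduced regression runs.
f_stat = partial_f(sse_reduced=889.042, sse_full=765.939, r=4, mse_full=9.456)

# SSE_full / MSE_full gives the denominator degrees of freedom (n - k - 1).
df_denominator = 765.939 / 9.456

# Tabled (4, 60) critical values quoted on the slide.
significant_at_05 = f_stat > 2.53
significant_at_01 = f_stat > 3.65
```

The (4,60) values are used only because they are the ones quoted; a table or package with 81 denominator degrees of freedom would give slightly smaller cutoffs.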
Extensions
• Two lines, different slopes
• More than two categories
• Multicategory, multislope
Multiple Regression 4: Indicator Variables
Fit two lines with different slopes
• Recall that using the Executive variable alone created a salary model with two lines having different intercepts.
• Adding the variable Alpha Experience resulted in a model also having two lines with different intercepts.
• But what if there is an interaction effect between Executive status and Alpha experience?
Create two new variables.
• The Executive status variable has two categories: 0 and 1.
• Create two variables from Alpha experience so that
  • when Executive = 0, the first variable retains the Alpha value; otherwise it equals 0.
  • when Executive = 1, the second variable retains the Alpha value; otherwise it equals 0.
• Using three variables (Executive status and the two Alpha variables) results in a model with two lines having different intercepts and different slopes, capturing a simple interaction effect among the variables.
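Building the two Alpha variables is a one-line transformation each. The sketch below assumes the data sit in two parallel Python lists; the function name `split_alpha_by_executive` and the four sample rows are made up for illustration.

```python
def split_alpha_by_executive(executive, alpha):
    """Return (alpha_non_exec, alpha_exec): Alpha kept only for the matching group."""
    alpha_non_exec = [a if e == 0 else 0 for e, a in zip(executive, alpha)]
    alpha_exec = [a if e == 1 else 0 for e, a in zip(executive, alpha)]
    return alpha_non_exec, alpha_exec


# Hypothetical rows: Executive status (0 or 1) and years of Alpha experience.
executive = [0, 1, 0, 1]
alpha = [3, 5, 7, 2]
alpha_non_exec, alpha_exec = split_alpha_by_executive(executive, alpha)
```

Regressing salary on Executive, `alpha_non_exec`, and `alpha_exec` then gives each group its own slope: the coefficient on `alpha_non_exec` is the slope of the non-executive line and the coefficient on `alpha_exec` is the slope of the executive line, while the Executive coefficient shifts the intercept.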
Executive Status variable
Executive Status and Alpha Experience
Executive Status and Alpha Experience with Interaction