Model Selection and Estimation in Regression with Grouped Variables
Remember…
• Consider fitting this simple model: Y = X1β1 + X2β2 + X3β3 + ε, with arbitrary explanatory variables X1, X2, X3 and continuous Y.
• If we want to determine whether X1, X2, X3 are predictive of Y, we need to take into account the groups of variables derived from X1, X2, X3 (e.g., polynomial terms).
• 2nd Example: ANOVA (the dummy variables of a factor form the groups)
Remember…
• Group LARS proceeds in two steps:
1) A solution path indexed by a tuning parameter λ is built. (The solution path is just a "path" of how the estimated coefficients move in space as a function of λ.)
2) The final model is selected on the solution path by some "minimal risk" criterion.
Notation
• Model form: Y = X1β1 + X2β2 + … + XJβJ + ε
• Assume we have J factors/groups of variables
• Y is (n x 1)
• ε ~ MVN(0, σ²In)
• pj is the number of variables in group j
• Xj is the (n x pj) design matrix for group j
• βj is the (pj x 1) coefficient vector for group j
• Each Xj is centered and orthonormalized, and Y is centered.
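A minimal sketch (in Python with numpy; not from the paper) of the preprocessing this slide assumes: Y is centered, and each group matrix Xj is centered and orthonormalized. QR decomposition is one convenient way to orthonormalize; the group sizes below are illustrative.

import numpy as np

def preprocess(Y, X_groups):
    """Center Y; center and orthonormalize each (n x p_j) group matrix X_j."""
    Y = Y - Y.mean()
    out = []
    for Xj in X_groups:
        Xj = Xj - Xj.mean(axis=0)    # center each column
        Q, _ = np.linalg.qr(Xj)      # orthonormal columns: Q'Q = I_{p_j}
        out.append(Q)
    return Y, out

rng = np.random.default_rng(0)
X_groups = [rng.normal(size=(100, p)) for p in (3, 3, 2)]  # J = 3 toy groups
Y, X_groups = preprocess(rng.normal(size=100), X_groups)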
Remember…
Group LARS Solution Path Algorithm (Refresher):
1) Compute the current "most correlated set" (A) by adding in the factor that maximizes the "correlation" between the current residual and the factor (accounting for factor size).
2) Move the coefficient vector (β) in the direction of the projection of the current residual onto the factors in (A).
3) Continue down this path until a new factor (outside (A)) has the same correlation as the factors in (A). Add that new factor into (A).
4) Repeat steps 2-3 until no more factors can be added to (A).
(Note: the solution path is piecewise linear, so it is computationally efficient!)
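The sketch below approximates this algorithm with many tiny fixed-size steps (a stagewise approximation, written for readability); the actual Group LARS implementation exploits the piecewise-linear path to jump exactly from breakpoint to breakpoint. It assumes the centered/orthonormalized groups from the preprocessing sketch above.

import numpy as np

def group_lars_approx(Y, X_groups, step=1e-3, tol=1e-4, max_iter=50000):
    p = [Xj.shape[1] for Xj in X_groups]
    betas = [np.zeros(pj) for pj in p]
    r = Y.astype(float).copy()
    for _ in range(max_iter):
        # Step 1: size-adjusted "correlation" of each factor with the residual.
        corr = np.array([np.sum((Xj.T @ r) ** 2) / pj
                         for Xj, pj in zip(X_groups, p)])
        if corr.max() < tol:
            break  # residual is (numerically) uncorrelated with every factor
        # Most correlated set (A): factors whose correlation ties the maximum.
        A = [j for j in range(len(p)) if corr[j] >= corr.max() - tol]
        # Step 2: move beta toward the projection of r onto the factors in (A).
        XA = np.hstack([X_groups[j] for j in A])
        d = np.linalg.lstsq(XA, r, rcond=None)[0]
        r -= step * (XA @ d)
        offset = 0
        for j in A:
            betas[j] += step * d[offset:offset + p[j]]
            offset += p[j]
    return betas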
Cp Criterion (How to Select a Final Model)
• In Gaussian regression problems, an unbiased estimate of the "true risk" is Cp(μ̂) = ||Y − μ̂||² / σ² − n + 2·df, where μ̂ = Xβ̂.
• When the full design matrix X is orthonormal, it can be shown that an unbiased estimate for "df" is df̂ = Σj I(||β̂j|| > 0) + Σj (||β̂j|| / ||Xj'Y||)(pj − 1).
• Note the orthonormal Group LARS solution is β̂j = (1 − λ√pj / ||Xj'Y||)+ · Xj'Y.
Degree-of-Freedom Calculation (Intuition)
• When the full design matrix X is orthonormal, it can be shown that an unbiased estimate for "df" is df̂ = Σj I(||β̂j|| > 0) + Σj (||β̂j|| / ||Xj'Y||)(pj − 1).
• Note the orthonormal Group LARS solution is β̂j = (1 − λ√pj / ||Xj'Y||)+ · Xj'Y: each selected factor contributes one full degree of freedom, plus a share of its remaining pj − 1 degrees of freedom proportional to how far its coefficients have moved toward the least-squares fit.
• The general formula for "df" replaces Xj'Y with the full least-squares estimate: df̂ = Σj I(||β̂j|| > 0) + Σj (||β̂j|| / ||β̂jLS||)(pj − 1).
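As a sketch, the formulas on this slide translate directly into code. Here beta_hats are the per-group coefficient vectors at one point on the solution path, beta_ls the full least-squares ones, and sigma2 an estimate of σ² (e.g., from the full model); this illustrates the formulas, not the authors' implementation.

import numpy as np

def df_estimate(beta_hats, beta_ls):
    """General df formula: 1 per selected factor, plus a shrinkage-weighted
    share of each factor's remaining p_j - 1 degrees of freedom."""
    df = 0.0
    for bj, bj_ls in zip(beta_hats, beta_ls):
        norm, norm_ls = np.linalg.norm(bj), np.linalg.norm(bj_ls)
        df += float(norm > 0)
        if norm_ls > 0:
            df += (norm / norm_ls) * (len(bj) - 1)
    return df

def cp(Y, mu_hat, sigma2, df):
    """Cp = ||Y - mu_hat||^2 / sigma^2 - n + 2 df; minimize along the path."""
    return np.sum((Y - mu_hat) ** 2) / sigma2 - len(Y) + 2.0 * df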
Real Dataset Example
• Famous birthweight dataset from Hosmer & Lemeshow.
• Y = baby birthweight; 2 continuous predictors (age and weight of the mother), 6 categorical predictors.
• For continuous predictors, use 3rd-order polynomials as the "factors".
• For categorical predictors, use dummy variables, excluding the final level.
• 75%/25% train/test split.
• Methods compared: Group LARS, backward stepwise (ordinary LARS isn't possible here).
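A sketch of how this grouped design could be assembled; the column names in the commented usage are the conventional birthwt variable names and serve only as hypothetical placeholders here.

import numpy as np
import pandas as pd

def build_factors(df, continuous, categorical):
    groups = []
    for col in continuous:    # 3rd-order polynomial factor: x, x^2, x^3
        x = df[col].to_numpy(float)
        groups.append(np.column_stack([x, x ** 2, x ** 3]))
    for col in categorical:   # dummy factor with the final level excluded
        dummies = pd.get_dummies(df[col]).to_numpy(float)
        groups.append(dummies[:, :-1])
    return groups

# Hypothetical usage with placeholder column names:
# groups = build_factors(bwt, ["age", "lwt"],
#                        ["race", "smoke", "ptl", "ht", "ui", "ftv"])
# idx = np.random.default_rng(0).permutation(len(bwt))
# cut = int(0.75 * len(bwt))   # 75%/25% train/test split
# train, test = idx[:cut], idx[cut:]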
Real Dataset Example
[Plot: Cp along the Group LARS solution path; the selected model is the one attaining the minimal Cp.]
Real Dataset Example
• Factors selected:
• Group LARS: all factors except Number of Physician Visits During the First Trimester.
• Backward stepwise: all factors except Number of Physician Visits During the First Trimester and Mother's Weight.
Simulation Example #1
• 17 random variables Z1, Z2, …, Z16, W were independently drawn from a Normal(0, 1).
• Xi = (Zi + W) / √2
• Y = X3³ + X3² + X3 + (1/3)·X6³ − X6² + (2/3)·X6 + ε
• ε ~ N(0, 2²)
• Each simulation has 100 observations; 200 simulations were run.
• Methods compared: Group LARS, LARS, least squares, backward stepwise.
• All 3rd-order main effects are considered (each Xi enters as the factor {Xi, Xi², Xi³}).
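A direct sketch of this data-generating process (note the index shift in code: X3 is column 2, X6 is column 5):

import numpy as np

def simulate_example1(n=100, rng=None):
    rng = rng or np.random.default_rng()
    Z = rng.normal(size=(n, 16))
    W = rng.normal(size=(n, 1))
    X = (Z + W) / np.sqrt(2.0)            # each X_i is still Normal(0, 1)
    x3, x6 = X[:, 2], X[:, 5]
    eps = rng.normal(scale=2.0, size=n)   # epsilon ~ N(0, 2^2)
    Y = x3**3 + x3**2 + x3 + (1/3)*x6**3 - x6**2 + (2/3)*x6 + eps
    return X, Y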
Simulation Example #2
• 20 random variables X1, X2, …, X20 were generated as in Example #1.
• X11, X12, …, X20 are trichotomized as 0, 1, or 2 according to whether they fall below the 33rd percentile of a Normal(0, 1), above the 66th percentile, or in between.
• Y = X3³ + X3² + X3 + (1/3)·X6³ − X6² + (2/3)·X6 + 2·I(X11 = 0) + I(X11 = 1) + ε
• ε ~ N(0, 2²)
• Each simulation has 100 observations; 200 simulations were run.
• Methods compared: Group LARS, LARS, least squares, backward stepwise.
• All 3rd-order main effects and categorical factors are considered.
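And the second design, again as a sketch; scipy's norm.ppf supplies the Normal(0, 1) cutoffs, and the coding follows the slide (0 below the lower cutoff, 1 above the upper, 2 in between):

import numpy as np
from scipy.stats import norm

def simulate_example2(n=100, rng=None):
    rng = rng or np.random.default_rng()
    Z = rng.normal(size=(n, 20))
    W = rng.normal(size=(n, 1))
    X = (Z + W) / np.sqrt(2.0)
    lo, hi = norm.ppf(1/3), norm.ppf(2/3)   # ~33rd / ~66th percentiles
    for i in range(10, 20):                 # trichotomize X11, ..., X20
        X[:, i] = np.where(X[:, i] < lo, 0.0,
                           np.where(X[:, i] > hi, 1.0, 2.0))
    x3, x6, x11 = X[:, 2], X[:, 5], X[:, 10]
    eps = rng.normal(scale=2.0, size=n)
    Y = (x3**3 + x3**2 + x3 + (1/3)*x6**3 - x6**2 + (2/3)*x6
         + 2.0*(x11 == 0) + 1.0*(x11 == 1) + eps)
    return X, Y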
Conclusion
• Group LARS provides an improvement over traditional backward stepwise selection + OLS, but it still over-selects factors.
• In the simulations, stepwise selection tends to under-select factors relative to Group LARS and performs more poorly.
• Simulation #1 suggests ordinary LARS over-selects factors because it enters individual derived variables into the model rather than whole factors.
• Group LARS is also computationally efficient, thanks to its piecewise-linear solution path algorithm.
• ||Xj'r||² / pj is the "correlation" between factor j and the current residual r; it can pull in a factor even when only a couple of its derived inputs are predictive and the rest are redundant.