Model Selection and Estimation in Regression with Grouped Variables
Remember…..
• Consider fitting the simple grouped model $Y = X_1\beta_1 + X_2\beta_2 + X_3\beta_3 + \varepsilon$, where each $X_j$ collects the derived variables for factor $j$, with arbitrary explanatory variables X1, X2, X3 and continuous Y.
• If we want to determine whether X1, X2, X3 are predictive of Y, we need to take into account the groups of variables derived from X1, X2, X3 (e.g., polynomial terms or dummy variables).
• 2nd example: ANOVA, where the dummy variables of a factor form the groups.
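To make the grouping concrete, here is a minimal sketch in Python, assuming each continuous factor is represented by a 3rd-order polynomial expansion (the same convention used in the examples later in this deck):

```python
import numpy as np

def polynomial_group(x, degree=3):
    """Derived variables for one continuous factor: x, x^2, ..., x^degree."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

rng = np.random.default_rng(0)
n = 100
X1, X2, X3 = rng.normal(size=(3, n))

# Each factor contributes a whole group of derived columns to the design.
groups = [polynomial_group(x) for x in (X1, X2, X3)]
X = np.hstack(groups)                        # n x 9 full design matrix
group_sizes = [g.shape[1] for g in groups]   # p_j = 3 for each group
```

Selection is then done at the level of these groups, not the individual columns.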
Remember…..
• Group LARS proceeds in two steps:
1) A solution path indexed by a tuning parameter λ is built. (The solution path is just a "path" of how the estimated coefficients move in space as a function of λ.)
2) The final model is selected on the solution path by some "minimal risk" criterion.
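A minimal sketch of this two-step structure; `group_lars_path` and `cp_criterion` are hypothetical stand-ins for the path algorithm and risk criterion described on the following slides:

```python
import numpy as np

def select_model(Y, X, group_sizes, lambdas, group_lars_path, cp_criterion):
    """Two-step Group LARS: build the solution path, then pick minimal risk."""
    # Step 1: one coefficient vector per lambda value along the path.
    path = [group_lars_path(Y, X, group_sizes, lam) for lam in lambdas]
    # Step 2: choose the point on the path minimizing the risk criterion.
    risks = [cp_criterion(Y, X, beta, group_sizes) for beta in path]
    return path[int(np.argmin(risks))]
```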
Notation
• Model form: $Y = \sum_{j=1}^{J} X_j \beta_j + \varepsilon$
• Assume we have J factors/groups of variables
• Y is (n x 1)
• ε ~ MVN(0, σ²I)
• p_j is the number of variables in group j
• X_j is the (n x p_j) design matrix for group j
• β_j is the coefficient vector for group j
• Each X_j is centered/orthonormalized and Y is centered.
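A minimal sketch of the preprocessing step, assuming a thin QR decomposition is used for the within-group orthonormalization so that $X_j^\top X_j = I$ (scaling conventions vary; some treatments normalize to $nI$ instead):

```python
import numpy as np

def center_and_orthonormalize(groups, Y):
    """Center Y; center each group's columns and orthonormalize them."""
    Yc = Y - Y.mean()
    out = []
    for Xj in groups:
        Xj = Xj - Xj.mean(axis=0)    # center each column
        Q, _ = np.linalg.qr(Xj)      # thin QR: Q has orthonormal columns
        out.append(Q)
    return out, Yc
```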
Remember….. Group LARS Solution Path Algorithm (Refresher):
1) Compute the current "most correlated set" A by adding the factor that maximizes the "correlation" between the current residual and the factor (accounting for factor size).
2) Move the coefficient vector β in the direction of the projection of the current residual onto the factors in A.
3) Continue along this direction until a factor outside A attains the same correlation as the factors in A; add that new factor to A.
4) Repeat steps 2-3 until no more factors can be added to A.
(Note: the solution path is piecewise linear, so it is computationally efficient!)
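A minimal sketch of the group "correlation" measure $\|X_j^\top r\|^2/p_j$ that drives steps 1 and 3, with each `Xj` assumed centered and orthonormalized as above:

```python
import numpy as np

def group_correlation(Xj, r):
    """Group LARS 'correlation' between factor j and residual r:
    ||Xj' r||^2 / p_j, which accounts for the factor's size."""
    s = Xj.T @ r
    return float(s @ s) / Xj.shape[1]

def most_correlated_factor(groups, r):
    """Index of the factor that enters the active set A next."""
    corrs = [group_correlation(Xj, r) for Xj in groups]
    return int(np.argmax(corrs)), corrs
```

In step 3, the same quantity determines when an inactive factor has "caught up" with the active set.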
Cp Criterion (How to Select a Final Model)
• In Gaussian regression problems, an unbiased estimate of the "true risk" is $C_p(\hat\mu) = \|Y - \hat\mu\|^2/\sigma^2 - n + 2\,df$, where $df = \sum_{i=1}^{n} \operatorname{cov}(\hat\mu_i, Y_i)/\sigma^2$.
• The final model is the point on the solution path with minimal Cp, so we need an estimate of df (next slide).
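A minimal sketch of the Cp computation, assuming σ² is known or pre-estimated (e.g., from the full least-squares fit):

```python
import numpy as np

def cp_criterion(Y, mu_hat, sigma2, df):
    """Unbiased risk estimate: ||Y - mu_hat||^2 / sigma^2 - n + 2*df."""
    rss = float(np.sum((Y - mu_hat) ** 2))
    return rss / sigma2 - len(Y) + 2.0 * df
```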
Degree-of-Freedom Calculation (Intuition)
• When the full design matrix X is orthonormal, it can be shown that an unbiased estimate of df is $\widehat{df} = \sum_j I(\|\hat\beta_j\| > 0) + \sum_j \frac{\|\hat\beta_j\|}{\|\hat\beta_j^{LS}\|}\,(p_j - 1)$, where $\hat\beta_j^{LS}$ is the least-squares estimate for group j.
• Note that the orthonormal Group LARS solution is $\hat\beta_j = \left(1 - \frac{\lambda\sqrt{p_j}}{\|X_j^\top Y\|}\right)_+ X_j^\top Y$.
• The same expression is then used as the approximate df formula in the general (non-orthonormal) case.
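A minimal sketch of this df approximation, assuming the fitted and least-squares coefficients are available as per-group vectors:

```python
import numpy as np

def df_estimate(beta_groups, beta_ls_groups):
    """Approximate degrees of freedom:
    sum_j I(||b_j|| > 0) + sum_j (||b_j|| / ||b_j_LS||) * (p_j - 1)."""
    df = 0.0
    for b, b_ls in zip(beta_groups, beta_ls_groups):
        nb, nls = np.linalg.norm(b), np.linalg.norm(b_ls)
        if nb > 0:
            df += 1.0 + (nb / nls) * (len(b) - 1)
    return df
```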
Real Dataset Example
• The famous birthweight dataset from Hosmer & Lemeshow.
• Y = baby birthweight; 2 continuous predictors (age and weight of the mother), 6 categorical predictors.
• For continuous predictors, 3rd-order polynomials form the "factors".
• For categorical predictors, dummy variables (excluding the last level, which serves as the reference) form the factors.
• 75%/25% train/test split.
• Methods compared: Group LARS, Backward Stepwise (ordinary LARS is not applicable here, since selection must happen at the factor level).
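A sketch of the factor construction, assuming the variable names from R's `MASS::birthwt` version of this dataset and a hypothetical CSV export (the slide does not specify the exact encoding used):

```python
import numpy as np
import pandas as pd

df = pd.read_csv("birthwt.csv")              # hypothetical file path
continuous = ["age", "lwt"]                  # mother's age and weight
categorical = ["race", "smoke", "ptl", "ht", "ui", "ftv"]

groups = []
for col in continuous:
    x = df[col].to_numpy(dtype=float)
    groups.append(np.column_stack([x, x**2, x**3]))      # 3rd-order polynomial factor
for col in categorical:
    d = pd.get_dummies(df[col])
    groups.append(d.iloc[:, :-1].to_numpy(dtype=float))  # drop last level (reference)

Y = df["bwt"].to_numpy(dtype=float)          # baby birthweight
```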
Real Dataset Example
[Plot: Cp along the Group LARS solution path; the final model is chosen at the minimal Cp.]
Real Dataset Example
• Factors selected:
– Group LARS: all factors except Number of Physician Visits during the First Trimester.
– Backward Stepwise: all factors except Number of Physician Visits during the First Trimester and Mother's Weight.
Simulation Example #1
• 17 random variables Z1, Z2, …, Z16, W were independently drawn from a Normal(0, 1).
• $X_i = (Z_i + W)/\sqrt{2}$
• $Y = X_3^3 + X_3^2 + X_3 + \tfrac{1}{3}X_6^3 - X_6^2 + \tfrac{2}{3}X_6 + \varepsilon$
• ε ~ N(0, 2²)
• Each simulated dataset has 100 observations; 200 simulations.
• Methods compared: Group LARS, LARS, Least Squares, Backward Stepwise.
• All 3rd-order main effects are considered.
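A minimal sketch of this data-generating process (the shared W term makes the predictors pairwise correlated):

```python
import numpy as np

def simulate_example1(n=100, seed=None):
    """Generate one dataset following the Example #1 design."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, 16))
    W = rng.normal(size=(n, 1))
    X = (Z + W) / np.sqrt(2.0)       # pairwise-correlated predictors
    x3, x6 = X[:, 2], X[:, 5]        # X3 and X6 (0-indexed columns)
    eps = rng.normal(scale=2.0, size=n)
    Y = x3**3 + x3**2 + x3 + x6**3 / 3.0 - x6**2 + (2.0 / 3.0) * x6 + eps
    return X, Y
```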
Simulation Example #2
• 20 random variables X1, X2, …, X20 were generated as in Example #1.
• X11, X12, …, X20 are trichotomized: coded 0 if smaller than the 33rd percentile of a Normal(0, 1), 1 if larger than the 66th percentile, and 2 if in between.
• $Y = X_3^3 + X_3^2 + X_3 + \tfrac{1}{3}X_6^3 - X_6^2 + \tfrac{2}{3}X_6 + 2\,I(X_{11} = 0) + I(X_{11} = 1) + \varepsilon$
• ε ~ N(0, 2²)
• Each simulated dataset has 100 observations; 200 simulations.
• Methods compared: Group LARS, LARS, Least Squares, Backward Stepwise.
• All 3rd-order main effects and categorical factors are considered.
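A minimal sketch of the Example #2 design, with the trichotomization implemented against the N(0, 1) percentiles:

```python
import numpy as np
from scipy.stats import norm

def simulate_example2(n=100, seed=None):
    """Generate one dataset following the Example #2 design."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, 20))
    W = rng.normal(size=(n, 1))
    X = (Z + W) / np.sqrt(2.0)

    # Trichotomize X11..X20 against N(0,1) percentiles:
    # 0 if below the 33rd, 1 if above the 66th, 2 in between.
    lo, hi = norm.ppf(1.0 / 3.0), norm.ppf(2.0 / 3.0)
    C = np.where(X[:, 10:] < lo, 0, np.where(X[:, 10:] > hi, 1, 2))

    x3, x6, x11 = X[:, 2], X[:, 5], C[:, 0]
    eps = rng.normal(scale=2.0, size=n)
    Y = (x3**3 + x3**2 + x3 + x6**3 / 3.0 - x6**2 + (2.0 / 3.0) * x6
         + 2.0 * (x11 == 0) + 1.0 * (x11 == 1) + eps)
    return X[:, :10], C, Y
```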
Conclusion
• Group LARS improves on traditional backward stepwise selection + OLS, but it still over-selects factors.
• In the simulations, stepwise selection tends to under-select factors relative to Group LARS, and it performs more poorly.
• Simulation #1 suggests that LARS over-selects factors because it enters individual derived variables into the model rather than whole factors.
• Group LARS is also computationally efficient thanks to its piecewise-linear solution path algorithm.
• $\|X_j^\top r\|^2/p_j$ is the formula for the "correlation" between a factor j and the current residual r; because it averages over the whole group, a factor may be selected even when only a couple of its derived inputs are predictive and the rest are redundant.