Bolasso: Model Consistent Lasso Estimation through the Bootstrap (Bach, ICML 2008). Presented by Sheehan Khan to the "Beyond Lasso" reading group, April 9, 2009.
Outline • Follows the structure of the paper • Define Lasso • Comments on scaling the penalty • Bootstrapping/Bolasso • Results • A few afterthoughts on the paper • Synopsis of the 2009 extended tech report • Discussion
Problem formulation • Standard Lasso formulation: $\hat{w} = \arg\min_{w \in \mathbb{R}^p} \frac{1}{2n}\|Y - Xw\|_2^2 + \mu_n \|w\|_1$ • New notation (consistent with ICML08) • Response vector $Y \in \mathbb{R}^n$ (n samples) • Design matrix $X \in \mathbb{R}^{n \times p}$ (n samples × p features) • Generative model $Y = X\mathbf{w} + \varepsilon$ with i.i.d. noise
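Not from the paper's own code: a minimal sketch of this formulation using scikit-learn, whose `Lasso` minimizes $\frac{1}{2n}\|Y - Xw\|_2^2 + \alpha\|w\|_1$, so `alpha` plays the role of μn. The noise level and seed are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 1000, 16                     # sample/feature sizes used in the experiments
w_true = np.zeros(p)
w_true[:8] = rng.normal(size=8)     # first 8 features active
X = rng.normal(size=(n, p))
Y = X @ w_true + 0.5 * rng.normal(size=n)   # generative model Y = Xw + noise

mu_n = 1.0 / np.sqrt(n)             # penalty on the critical n^{-1/2} scale
lasso = Lasso(alpha=mu_n, fit_intercept=False).fit(X, Y)
print("estimated support:", np.flatnonzero(lasso.coef_))
```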
How should we set μn? • The paper distinguishes five mutually exclusive regimes for $\mu_n$ • $\mu_n \to \infty$ implies $\hat{w} \to 0$: the penalty dominates the fit • $\mu_n \to \mu_0 > 0$ means we minimize the limiting objective $\frac{1}{2}(w-\mathbf{w})^\top Q (w-\mathbf{w}) + \mu_0\|w\|_1$ (with $Q = \mathbb{E}[xx^\top]$), which implies $\hat{w}$ is not a consistent estimate of $\mathbf{w}$
How should we set μn? • $\mu_n \to 0$ slower than $n^{-1/2}$: model consistency requires the condition $\|Q_{J^c J} Q_{JJ}^{-1} \operatorname{sign}(\mathbf{w}_J)\|_\infty \le 1$, where $J$ denotes the active set • $\mu_n \to 0$ faster than $n^{-1/2}$: we lose the sparsifying effect of the $\ell_1$ penalty • We saw similar arguments in Adaptive Lasso
How should we set μn? • In the remaining regime, $\mu_n = \mu_0 n^{-1/2}$, we can state: • Prop 1: every sign pattern $s$ with $s_J = \operatorname{sign}(\mathbf{w}_J)$ is selected with strictly positive limiting probability • Prop 2: the probability of selecting a pattern that disagrees with $\operatorname{sign}(\mathbf{w}_J)$ on $J$ vanishes • *Dependence on Q omitted in the body of the paper but appears in the appendix
So what? • Props 1 & 2 tell us that asymptotically: • We have positive probability of selecting the active features • We have vanishing probability of missing active features • We may or may not get additional non-active features, depending on the dataset • With many independent datasets, the features common to all selections must be exactly the active set
Bootstrap • In practice we do not get many datasets • We can use m bootstrap replications of the given set • For now we use bootstrap pairs; later we will use centered residuals
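A sketch of Bolasso with bootstrap pairs, under the same scikit-learn assumption as the earlier snippet; m = 128 and the hard intersection mirror the paper's experiments, while `threshold` is included to allow the Bolasso-S soft intersection mentioned later.

```python
import numpy as np
from sklearn.linear_model import Lasso

def bolasso_pairs(X, Y, m=128, mu_n=None, threshold=1.0, seed=0):
    """Intersect lasso supports over m bootstrap-pairs replications.

    threshold=1.0 is the hard intersection; threshold=0.9 gives the
    Bolasso-S soft intersection (keep features in >= 90% of supports)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if mu_n is None:
        mu_n = 1.0 / np.sqrt(n)      # critical scaling mu_0 * n^{-1/2}
    counts = np.zeros(p)
    for _ in range(m):
        idx = rng.integers(0, n, size=n)   # resample (x_i, y_i) pairs with replacement
        coef = Lasso(alpha=mu_n, fit_intercept=False).fit(X[idx], Y[idx]).coef_
        counts += (coef != 0)
    return np.flatnonzero(counts >= threshold * m)
```

A final unregularized least-squares fit restricted to the returned columns then gives the Bolasso coefficients.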
Asymptotic Error • Prop 3: with $\mu_n = \mu_0 n^{-1/2}$ and $m$ bootstrap replications, the probability that Bolasso does not exactly select the active set $J$ is bounded above by $m A_1 e^{-A_2 n} + A_3 \frac{\log n}{\sqrt{n}} + A_4 \frac{\log m}{m}$ for positive constants $A_1,\dots,A_4$ • Can be tightened if the lasso sign-consistency condition holds
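Note the bound is not monotone in m: the $m A_1 e^{-A_2 n}$ term grows with m while the $\log(m)/m$ term shrinks. Plugging in purely hypothetical constants (the paper only asserts $A_1,\dots,A_4 > 0$) makes the trade-off concrete:

```python
import numpy as np

# Hypothetical constants -- the paper only guarantees they are positive.
A1, A2, A3, A4 = 1.0, 0.01, 1.0, 1.0
n = 1000

def prop3_bound(m):
    return (m * A1 * np.exp(-A2 * n)        # grows linearly in m
            + A3 * np.log(n) / np.sqrt(n)   # independent of m
            + A4 * np.log(m) / m)           # shrinks in m

for m in (2, 8, 32, 128, 512):
    print(f"m={m:4d}  bound={prop3_bound(m):.4f}")
```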
Results on Synthetic Data • 1000 samples • 16 features (first 8 active) • Averaged over 256 datasets • [Figure: probability of selecting each variable vs. regularization; lasso (black) vs. bolasso (red, m = 128), and bolasso for m = 2, 4, 8, …, 256]
Results on Synthetic Data • 64 features (8 active) • Error is the squared distance between sparsity pattern vectors, averaged over 32 datasets • [Figure legend: lasso (black), bolasso (green), forward greedy (magenta), thresholded LS (red), adaptive lasso (blue)]
Results on Synthetic Data • 64 samples • 32 features (8 active) • Bolasso-S uses a soft intersection: keep features appearing in at least 90% of the bootstrap supports • [Table: prediction MSE; the reported value of 1.24 looks questionable to the presenter]
Results on UCI data • [Table: prediction MSE on UCI datasets]
Some thoughts • Why do they compare bolasso variable-selection error to lasso, forward greedy, thresholded LS, and adaptive lasso, but then compare mean squared prediction error to lasso, ridge, and bagging? • All these results use low-dimensional data; we are interested in large numbers of features • This is considered in the 2009 tech report • Based on the plots it seems best to use m as large as possible (in contrast to Prop 3) • Is there any insight into the size of the positive constants, which have a huge impact? • Based on the results it seems we really want to use bolasso on problems where we know this bound to be loose
2009 Tech Report • Main extensions • Fills in the math details omitted previously • Discusses bootstrapping pairs vs. residuals • Proves both consistent for low-dimensional data • Shows empirical results favouring residuals in high-dimensional data • New upper and lower bounds for selecting active components in low-dimensional data • Proposes a similar method for high dimensions: run lasso with a high regularization parameter, then bootstrap within the supports (see the sketch below) • Discusses implementation details
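A loose sketch of the two-stage high-dimensional recipe from that bullet (screen with a heavily regularized lasso, then bootstrap within the surviving support). The screening penalty `mu_big` is a made-up knob and `bolasso_pairs` is the function from the earlier sketch; the tech report's exact procedure may differ.

```python
import numpy as np
from sklearn.linear_model import Lasso

def bolasso_highdim(X, Y, mu_big=0.5, m=128):
    # Stage 1: a large penalty screens the p features down to a small support.
    screen = np.flatnonzero(
        Lasso(alpha=mu_big, fit_intercept=False).fit(X, Y).coef_)
    # Stage 2: run the bootstrap intersection within the screened columns only.
    kept = bolasso_pairs(X[:, screen], Y, m=m)   # defined in the earlier sketch
    return screen[kept]
```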
Bootstrap Recap • Previously we sampled uniformly from the given dataset with replacement to generate each bootstrap set • Done in parallel • Bootstrapping can also be done sequentially • We saw this when reviewing Boosting
Bootstrap Residuals • Compute residuals from the lasso fit on the current dataset: $e_i = y_i - x_i^\top \hat{w}$ • Center the residuals: $\tilde{e}_i = e_i - \frac{1}{n}\sum_j e_j$ • Create a new dataset from the pairs $(x_i,\ x_i^\top \hat{w} + \tilde{e}_{\sigma(i)})$, where $\sigma$ samples indices uniformly with replacement
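A sketch of one centered-residual replication following the three bullets above (same scikit-learn and no-intercept assumptions as before):

```python
import numpy as np
from sklearn.linear_model import Lasso

def residual_bootstrap_dataset(X, Y, mu_n, rng):
    """Build one bootstrap dataset by resampling centered lasso residuals."""
    w_hat = Lasso(alpha=mu_n, fit_intercept=False).fit(X, Y).coef_
    e = Y - X @ w_hat                    # residuals of the current lasso fit
    e_centered = e - e.mean()            # centered residuals
    sigma = rng.integers(0, len(Y), size=len(Y))   # resample indices with replacement
    Y_star = X @ w_hat + e_centered[sigma]         # new responses; X stays fixed
    return X, Y_star
```

Repeating this m times and intersecting the lasso supports fitted on the resulting datasets gives the residual-bootstrap variant of Bolasso.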
Synthetic Results in High Dimensional Data • 64 samples, 128 features (8 active)
Varying Replications in High-Dimensional Data • [Figure legend: lasso (black), bolasso with m = {2, 4, 8, 16, 32, 64, 128, 256} (red), m = 512 (blue)]
The End • Thanks for your attention and participation • Questions/Discussion???