Sparse, Flexible and Efficient Modeling using L1-Regularization Saharon Rosset and Ji Zhu
Contents • Idea • Algorithm • Results
Introduction Setting: • Implicit dependency on training data • Linear model (→ use φ-functions) • Model:
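The model formula itself did not survive extraction. From the slide's own terms (a linear model over basis functions φ), it was presumably the standard basis expansion

$$f(x) = \sum_j w_j\,\phi_j(x)$$

with weights w learned from the training data.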
Introduction Problem: How to choose the regularization weight λ? Answer: Find ŵ(λ) for all λ ∈ [0, ∞) • Can this be done efficiently (time, memory)? • Yes, if we impose restrictions on ŵ(λ)
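Spelled out, the problem behind the whole talk (L is the empirical loss on the training set, J the penalty, matching the setup of the accompanying paper):

$$\hat{w}(\lambda) = \arg\min_{w}\; L(w) + \lambda\, J(w), \qquad \lambda \in [0, \infty)$$

so "finding ŵ(λ) for all λ" means tracing the entire solution path rather than solving one problem per candidate λ.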
Restrictions ŵ(λ) shall be piecewise linear • What impact on L(w) and J(w)? • Can we still solve real-world problems?
Restrictions ∂ŵ(λ)/∂λ must be piecewise constant • L(w) quadratic in w • J(w) linear in w
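One reasoning step the slide compresses (the symbols H, b, s are mine: Hessian and gradient offset of a quadratic loss, and the fixed gradient of a linear penalty on a segment with constant sign pattern): the stationarity condition

$$\nabla L(\hat w) + \lambda\, \nabla J(\hat w) = 0, \qquad \nabla L(w) = Hw + b, \quad \nabla J(w) = s$$

gives ŵ(λ) = −H⁻¹(b + λs) between events, i.e. a path that is linear in λ on every segment and therefore piecewise linear overall.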
Quadratic Loss Functions • square loss in regression • hinge loss for classification (→ SVM)
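In the usual notation (note the hinge loss is piecewise linear, and hence a degenerate piecewise quadratic):

$$L_{\mathrm{sq}}(y, f) = (y - f)^2, \qquad L_{\mathrm{hinge}}(y, f) = (1 - y\,f)_+$$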
Linear Penalty Functions • Sparseness property
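The penalty in question is the L1 norm, whose kink at zero is what forces exact zeros in ŵ(λ):

$$J(w) = \|w\|_1 = \sum_j |w_j|$$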
Bet on Sparseness • 50 samples with 300 independent Gaussian variables • 1st row: 3 non-zero variables • 2nd row: 30 non-zero variables • 3rd row: 300 non-zero variables (rows refer to the results figure, not reproduced here)
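A minimal Python sketch of this experiment, not the original code; the regularization strengths are arbitrary placeholders (in the figure they would be tuned), but it reproduces the qualitative "bet on sparseness": L1 wins when few coefficients matter, L2 when all of them do.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n, p = 50, 300                          # 50 samples, 300 Gaussian variables
X = rng.standard_normal((n, p))
X_test = rng.standard_normal((1000, p))

for k in (3, 30, 300):                  # non-zero coefficients per row of the figure
    w_true = np.zeros(p)
    w_true[:k] = rng.standard_normal(k)
    y = X @ w_true + rng.standard_normal(n)
    for model in (Lasso(alpha=0.1), Ridge(alpha=1.0)):   # alphas are placeholders
        mse = np.mean((model.fit(X, y).predict(X_test) - X_test @ w_true) ** 2)
        print(f"k={k:3d}  {type(model).__name__:5s}  test MSE = {mse:.2f}")
```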
"Linear Toolbox" a(r), b(r) and c(r): piecewise constant coefficients • Regression • Classification
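The formulas under the two headings were lost in extraction. A plausible reconstruction, consistent with "piecewise constant coefficients": write the loss as a piecewise quadratic in a residual r, where r differs between the two tasks,

$$L(r) = a(r)\, r^2 + b(r)\, r + c(r), \qquad r = y - f(x) \ \text{(regression)}, \qquad r = y\, f(x) \ \text{(classification)}$$

With a, b, c piecewise constant this template covers both the square loss (a ≡ 1) and hinge-type losses (a ≡ 0 on the margin-violating piece).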
Algorithm Initialization • start at t=0 → w=0 • determine set of non-zero components • starting direction
Algorithm Loop follow the direction until one of the following happens: • addition of a new component • vanishing of a non-zero component • hit of a “knot” (discontinuity of a(r), b(r), c(r))
Algorithm Loop • direction update • stopping criterion (a sketch of the complete loop follows below)
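A minimal, self-contained sketch of this loop for the concrete LASSO instance (square loss, L1 penalty), not the authors' code: with a purely quadratic loss, a(r), b(r), c(r) have no knots, so only the first two events occur. The function name and numerical tolerances are my own.

```python
import numpy as np

def lasso_path_homotopy(X, y, lam_min=1e-6, max_steps=500):
    """Trace the piecewise-linear solution path of
    min_w 0.5 * ||y - X w||^2 + lam * ||w||_1
    from lam_max (all-zero solution) down to lam_min."""
    n, p = X.shape
    w = np.zeros(p)
    c = X.T @ y                       # correlations with the current residual
    lam = float(np.max(np.abs(c)))    # smallest lam whose solution is w = 0
    active = [int(np.argmax(np.abs(c)))]
    path = [(lam, w.copy())]

    for _ in range(max_steps):
        A = np.array(active)
        s = np.sign(c[A])             # signs of the active correlations
        # Direction: change of active coefficients per unit decrease of lam.
        d = np.linalg.solve(X[:, A].T @ X[:, A], s)
        a = X.T @ (X[:, A] @ d)       # induced change rate of every correlation

        deltas, events = [lam - lam_min], [("end", -1)]
        for j in range(p):            # event 1: inactive |c_j| catches up with lam
            if j in active:
                continue
            for delta in ((lam - c[j]) / (1 - a[j]),
                          (lam + c[j]) / (1 + a[j])):
                if delta > 1e-12:
                    deltas.append(delta)
                    events.append(("add", j))
        for idx, j in enumerate(active):  # event 2: active coefficient hits zero
            if abs(d[idx]) > 1e-12 and -w[j] / d[idx] > 1e-12:
                deltas.append(-w[j] / d[idx])
                events.append(("drop", j))

        k = int(np.argmin(deltas))        # nearest event wins
        delta, (kind, j) = deltas[k], events[k]
        w[A] += delta * d                 # move along the linear segment
        c -= delta * a
        lam -= delta
        path.append((lam, w.copy()))
        if kind == "add":
            active.append(j)
        elif kind == "drop":
            active.remove(j)
            w[j] = 0.0
        else:
            break
    return path

# Example: a few breakpoints of the path on toy data.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 20))
y = X[:, 0] - 2 * X[:, 3] + 0.1 * rng.standard_normal(50)
for lam, w in lasso_path_homotopy(X, y)[:5]:
    print(f"lam = {lam:8.3f}   non-zeros = {np.count_nonzero(w)}")
```

Between events the coefficients and correlations both move linearly in λ, so each step only needs to find the nearest breakpoint in closed form; this is what makes computing the whole path as cheap as a handful of single-λ fits.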
NIPS Results General procedure • pre-selection (univariate t-statistic) • Algorithm loss function: Huberized hinge loss • Find best λ* based on validation dataset
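For completeness, the Huberized hinge loss as a function of the margin r = y·f(x), in the parameterization with a knot t < 1 (quadratic between the knot and the margin, linear below the knot, so it stays piecewise quadratic and differentiable):

$$L(r) = \begin{cases} 0, & r > 1 \\ (1 - r)^2, & t < r \le 1 \\ (1 - t)^2 + 2(1 - t)(t - r), & r \le t \end{cases}$$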
NIPS Results Dexter Dataset • m=300, n=20,000, pre-selection: n=1152 • linear pieces of ŵ(λ): 452 • Optimum at λ* (→ 120 non-zero components)
NIPS Results Not very happy with the results → work with the original variables → simple linear model → L1 regularization for feature selection
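A sketch of that recipe with today's tooling, not the authors' code; the data here is synthetic and the regularization strength C is a placeholder to be chosen on validation data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 2000))           # stand-in for the original variables
y = np.where(X[:, 0] - X[:, 1] > 0, 1, -1)     # synthetic binary labels

# L1-penalized linear model: non-zero coefficients double as selected features.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
selected = np.flatnonzero(clf.coef_[0])
print(f"{selected.size} of {X.shape[1]} features kept")
```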
Conclusion • theory ↔ practice • limited to linear classifiers • other extensions: Regularization Path for the SVM (L2)