
Sparse, Flexible and Efficient Modeling using L1-Regularization


Presentation Transcript


  1. Sparse, Flexible and Efficient Modeling using L1-Regularization. Saharon Rosset and Ji Zhu

  2. Contents • Idea • Algorithm • Results

  3. Part 1: Idea

  4. Introduction Setting: • Implicit dependency on training data • Linear model (→ use basis functions φj) • Model:
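As a sketch of the notation, assuming the standard linear-in-basis-functions setup of Rosset and Zhu, the model is

    f(x) = \sum_j w_j \phi_j(x)

where the basis functions \phi_j are evaluated on the training data, which is where the implicit dependency on the training data comes from.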

  5. Introduction Problem: How to choose the regularization weight λ? Answer: find the solution ŵ(λ) for all λ ∈ [0, ∞) • Can this be done efficiently (time, memory)? • Yes, if we impose restrictions on the solution path ŵ(λ)

  6. Restrictions ŵ(λ) shall be piecewise linear in λ • What impact on L(w) and J(w)? • Can we still solve real-world problems?

  7. Restrictions ∂ŵ(λ)/∂λ must be piecewise constant • L(w) quadratic in w • J(w) linear in w
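For the squared-loss / L1 case this piecewise-linear path is exactly the lasso path, which can already be traced with off-the-shelf tools. A minimal sketch using scikit-learn's lars_path, with illustrative data and settings rather than the authors' code:

    # Trace the piecewise-linear L1 path for squared loss (the lasso case).
    import numpy as np
    from sklearn.linear_model import lars_path

    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 300))        # 50 samples, 300 variables
    w_true = np.zeros(300)
    w_true[:3] = [2.0, -1.5, 1.0]             # sparse ground truth
    y = X @ w_true + 0.1 * rng.standard_normal(50)

    # alphas: regularization values at the "knots" where the path changes direction;
    # coefs[:, k]: coefficient vector at knot k (the path is linear between knots).
    alphas, active, coefs = lars_path(X, y, method="lasso")
    print("linear pieces:", len(alphas) - 1)
    print("non-zero coefficients at the end of the path:", np.sum(coefs[:, -1] != 0))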

  8. Quadratic Loss Functions • square loss in regression • hinge loss for classification (→ SVM)
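Both losses fit in a couple of lines; a small illustration (the function names are ours, not the authors'):

    import numpy as np

    def squared_loss(y, f):
        """Squared error for regression: (y - f)^2."""
        return (y - f) ** 2

    def hinge_loss(y, f):
        """Hinge loss on the margin y*f, for classification with labels in {-1, +1}."""
        return np.maximum(0.0, 1.0 - y * f)

The NIPS experiments later use a Huberized variant of the hinge (slide 18).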

  9. Linear Penalty Functions • Sparseness property

  10. Bet on Sparseness • 50 samples with 300 independent Gaussian variables • Row 1: 3 non-zero variables • Row 2: 30 non-zero variables • Row 3: 300 non-zero variables
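One way to set up this experiment, as a sketch only: the alpha values and the Lasso-vs-Ridge comparison below are illustrative assumptions, not the slide's exact protocol.

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.default_rng(0)
    n, p = 50, 300
    X_test = rng.standard_normal((1000, p))   # held-out points for evaluating the fit

    for k in (3, 30, 300):                    # number of truly non-zero variables per row
        X = rng.standard_normal((n, p))
        w_true = np.zeros(p)
        w_true[:k] = rng.standard_normal(k)
        y = X @ w_true + rng.standard_normal(n)

        for model in (Lasso(alpha=0.1), Ridge(alpha=1.0)):
            model.fit(X, y)
            mse = np.mean((model.predict(X_test) - X_test @ w_true) ** 2)
            print(f"k={k:3d}  {type(model).__name__:5s}  test MSE vs. signal = {mse:.2f}")

The sparser the true model, the larger the advantage of the L1 (sparse) fit, which is the "bet on sparseness" the slide illustrates.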

  11. Part 2: Algorithm

  12. "Linear Toolbox" • a(r), b(r) and c(r): piecewise-constant coefficients • Regression • Classification
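What this most likely refers to (an assumption based on the accompanying paper) is that each admissible loss is written as a piecewise quadratic in a scalar argument r,

    \ell(r) = a(r)\, r^2 + b(r)\, r + c(r),

with r = y - f(x) (the residual) in regression and r = y\, f(x) (the margin) in classification; because a, b and c are piecewise constant, every linear piece of the path can be computed in closed form.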

  13. Optimization Problem
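Assuming the L1-regularized formulation used throughout the talk, the problem solved for every value of the regularization weight is

    \hat{w}(\lambda) = \arg\min_w \sum_{i=1}^n L\big(y_i, \sum_j w_j \phi_j(x_i)\big) + \lambda \lVert w \rVert_1, \quad \lambda \in [0, \infty),

or equivalently the constrained form with \lVert w \rVert_1 \le t; this bound t is the path parameter used on the algorithm slides, with t = 0 corresponding to w = 0.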

  14. Algorithm Initialization • start at t=0 → w=0 • determine the set of non-zero components • starting direction

  15. Algorithm Loop follow the direction until one of the following happens: • a new component is added • a non-zero component vanishes • a "knot" is hit (a discontinuity of a(r), b(r), c(r))

  16. Algorithm Loop • direction update • stopping criterion
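A compact sketch of this loop in illustrative Python; `direction` and `next_event` are hypothetical problem-specific helpers supplied by the caller, not functions from the authors' implementation:

    import numpy as np

    def l1_path(p, direction, next_event, t_max, tol=1e-12):
        """Sketch of the path-following loop from slides 14-16.

        direction(w)     -> derivative dw/dt, constant on each linear piece
        next_event(w, d) -> step length until the next event: a new component
                            enters, an active component vanishes, or a "knot"
                            of a(r), b(r), c(r) is crossed
        """
        w = np.zeros(p)                   # initialization: t = 0  =>  w = 0
        t, path = 0.0, [(0.0, w.copy())]
        while t < t_max:                  # stopping criterion
            d = direction(w)              # direction update
            step = next_event(w, d)       # follow the straight piece to the event
            if step < tol:
                break
            w = w + step * d
            t += step
            path.append((t, w.copy()))    # record one knot of the piecewise path
        return path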

  17. Part 3: Results

  18. NIPS Results General procedure • pre-selection (univariate t-statistic) • Algorithm loss function: Huberized hinge loss • Find best λ* based on validation dataset
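A minimal sketch of the pre-selection step, keeping the columns with the largest univariate two-sample t-statistics (illustrative code, not the script used for the NIPS submission):

    import numpy as np

    def preselect_by_t_statistic(X, y, keep):
        """Return indices of the `keep` columns of X with the largest
        |two-sample t-statistic| between the classes (y in {-1, +1})."""
        pos, neg = X[y == 1], X[y == -1]
        mean_diff = pos.mean(axis=0) - neg.mean(axis=0)
        pooled_se = np.sqrt(pos.var(axis=0, ddof=1) / len(pos)
                            + neg.var(axis=0, ddof=1) / len(neg))
        t = np.abs(mean_diff / (pooled_se + 1e-12))   # avoid division by zero
        return np.argsort(t)[::-1][:keep]

On Dexter (next slide) this kind of filter reduces the 20'000 raw features to 1152 before the path algorithm is run.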

  19. NIPS Results Dexter Dataset • m=300, n=20'000, pre-selection: n=1152 • linear pieces of ŵ(λ): 452 • Optimum at λ* (→ 120 non-zero components)

  20. NIPS Results Not very happy with the results → working with the original variables → simple linear model → L1 regularization for feature selection

  21. Conclusion • theory ↔ practice • limited to linear classifiers • other extensions: Regularization Path for the SVM (L2)
