
A multilevel iterated-shrinkage approach to l1 penalized least-squares



Presentation Transcript


  1. A multilevel iterated-shrinkage approach to l1 penalized least-squares. Eran Treister and Irad Yavneh, Computer Science, Technion (with thanks to Michael Elad).

  2. Part I: Background. Sparse Representation of Signals; Applications; Problem Definition; Existing Methods.

  3. Example: Image Denoising. y = f + v: the noisy signal y is the clean signal f plus additive noise v; denoising aims to recover f from y.

  4. Example: Image Denoising, y = f + v. • Many denoising algorithms minimize an objective that combines a measurement (fidelity) term, relating the estimate to the noisy data y, with a regularization term expressing a "prior" on the signal.

  5. Sparse Representation Modeling. f = Ax: the signal f is represented by only a few columns of the dictionary (matrix) A. • The matrix A is redundant (# columns > # rows), and the representation x is sparse.

  6. Sparse Representation Modeling. The support is the set of columns that comprise the signal: S = supp{x} = {i : xi ≠ 0}. Then f = AS xS, where xS is the support sub-vector and AS is the support sub-matrix.

  7. Denoising by sparse representation. Reconstruct the clean signal f from the noisy measurement y = Ax + v = f + v (noisy signal = image + additive noise).

  8. Applications • De-noising. • De-blurring. • In-painting. • De-mosaicing. • Computed Tomography. • Image scale-up & super-resolution. • And more…

  9. Formulation 1 • The straightforward way to formulate sparse representation is by constrained minimization: minimize ||x||0 subject to ||Ax − y||2 ≤ ε. • The problem is not convex and may have many local minima. • Solutions are approximated by "greedy algorithms".

  10. Formulation 2 - Basis Pursuit • A relaxation of the previous problem: minimize ||x||1 subject to ||Ax − y||2 ≤ ε. • ||x||1 is the l1 norm; the minimizer x̂ is typically sparse. • The problem is convex and has a convex set of "equivalent" solutions.

  11. Alternative Formulation 2: l1 penalized least-squares • F(x) = ½||Ax − y||² + μ||x||1. • F(x) is convex. • A bigger μ yields a sparser minimizer. • The gradient is discontinuous wherever xi = 0, so general-purpose optimization tools struggle.
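As a concrete reference point, here is a minimal NumPy sketch (not from the slides; the name `objective` and its arguments are illustrative) that evaluates F(x):

```python
import numpy as np

def objective(A, y, x, mu):
    """l1 penalized least-squares: F(x) = 0.5*||A x - y||^2 + mu*||x||_1."""
    r = A @ x - y
    return 0.5 * r @ r + mu * np.abs(x).sum()
```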

  12. Iterated Shrinkage Methods • Bound-Optimization and EM [Figueiredo & Nowak '03]. • Surrogate-Separable-Function (SSF) [Daubechies et al. '04]. • Parallel-Coordinate-Descent (PCD) [Elad '05], [Matalon et al. '06]. • IRLS-based algorithm [Adeyemi & Davies '06]. • Gradient Projection Sparse Reconstruction (GPSR) [Figueiredo et al. '07]. • Sparse Reconstruction by separable approx. (SpaRSA) [Wright et al. '09].

  13. Iterated Shrinkage • Coordinate Descent (CD): updates each scalar variable in turn so as to minimize the objective. • Parallel Coordinate Descent (PCD): applies the CD update simultaneously to all variables; based on the projection of the residual, A^T(Ax − y). • PCD + (nonlinear) Conjugate Gradient (CG-PCD) [Zibulevsky & Elad '10]: uses two consecutive PCD steps to calculate the next one.
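The sketch below illustrates the generic building blocks of this family: the scalar soft-thresholding (shrinkage) operator, one SSF/ISTA-style step, and one coordinate-descent sweep. It is a simplified illustration, not the exact algorithms cited above; the CD sweep assumes unit-norm columns, and PCD additionally rescales and line-searches the parallel update.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrinkage: the proximal map of t*|.|, applied elementwise."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista_step(A, y, x, mu, L):
    """One SSF/ISTA-style iterated-shrinkage step.
    L should bound the largest eigenvalue of A^T A (e.g. ||A||_2^2)."""
    grad = A.T @ (A @ x - y)              # projection of the residual
    return soft_threshold(x - grad / L, mu / L)

def cd_sweep(A, y, x, mu):
    """One coordinate-descent sweep: each x_i is minimized exactly in turn
    (assumes the columns of A have unit l2 norm)."""
    r = y - A @ x                         # running residual
    for i in range(A.shape[1]):
        a = A[:, i]
        r = r + a * x[i]                  # remove column i's contribution
        x[i] = soft_threshold(a @ r, mu)  # 1-D shrinkage update
        r = r - a * x[i]                  # restore the residual
    return x
```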

  14. Part II: A Multilevel Algorithm for Sparse Representation

  15. The main idea: • Use existing iterated shrinkage methods (as “relaxations”). • Improve the current approximation by using a reduced (lower-level) dictionary.

  16. The main idea: Reducing the dimension of A • The solution is sparse – most columns will not end up in the support! • At each stage: • Many columns are highly unlikely to contribute to the minimizer. • Such columns can be temporarily dropped – resulting in a smaller problem.

  17. Reducing the dimension of A • C: the lower-level subset of columns.

  18. Reducing the problem • Fine-level problem: minimize F(x) = ½||Ax − y||² + μ||x||1. • Assume we have a prolongation P that satisfies x = P xc; substituting for x gives: minimize over xc ½||A P xc − y||² + μ||P xc||1.

  19. Reducing the problem • By our simple choice of P (zero-padding over the dropped columns), ||P xc||1 = ||xc||1. • Also, C is chosen so that supp{x} ⊆ C, so the current approximation x is representable on the lower level. • Then, by setting Ac = AP, we get a reduced problem Fc(xc) = ½||Ac xc − y||² + μ||xc||1 that is of smaller size (fewer unknowns)!
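A minimal sketch of this reduction, assuming P is the zero-padding prolongation over a chosen column subset C (the function names are illustrative, not from the slides):

```python
import numpy as np

def restrict(A, x, C):
    """Lower-level problem data: keep only the columns in C (A_c = A P)."""
    return A[:, C], x[C]       # assumes supp(x) is contained in C

def prolong(x_c, C, m):
    """Prolongation P: pad the coarse solution with zeros on dropped columns."""
    x = np.zeros(m)
    x[C] = x_c
    return x
```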

  20. The choice of C – likelihood to enter the support • The residual is defined by r = y − Ax. • A column is likely to enter the support if it has a high inner product with r (a greedy approach). • "Likely" columns: those currently not in the support which have the largest such likelihood.

  21. Lower-level dictionary: choosing mc = m/2 columns.
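A sketch of this selection rule (illustrative names; it keeps the current support and adds the out-of-support columns most correlated with the residual):

```python
import numpy as np

def choose_coarse_columns(A, y, x, m_c):
    """Pick m_c columns for the lower level: the current support plus the
    columns with the largest |a_i^T r|, where r = y - A x is the residual."""
    r = y - A @ x
    score = np.abs(A.T @ r)
    score[x != 0] = np.inf            # columns in the support are always kept
    return np.sort(np.argsort(-score)[:m_c])
```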

  22. The multilevel cycle, repeated iteratively until convergence.
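A self-contained and much simplified sketch of one such cycle, assuming an ISTA-style relaxation and halving the number of columns at each level; the actual relaxation scheme, lowest-level solver, and cycle parameters follow the paper and are not reproduced here:

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def relax(A, y, x, mu, nu):
    """nu iterated-shrinkage (ISTA-style) sweeps on the current level."""
    L = np.linalg.norm(A, 2) ** 2                    # Lipschitz bound on A^T A
    for _ in range(nu):
        x = soft_threshold(x - A.T @ (A @ x - y) / L, mu / L)
    return x

def ml_cycle(A, y, x, mu, nu=1, min_cols=32):
    """One multilevel cycle: choose C, recurse on the reduced dictionary,
    prolong the coarse result back, then relax on the current level."""
    m = A.shape[1]
    if m > min_cols:
        r = y - A @ x
        score = np.abs(A.T @ r)
        score[x != 0] = np.inf                       # keep the current support
        C = np.sort(np.argsort(-score)[: m // 2])    # m_c = m/2 columns
        x_c = ml_cycle(A[:, C], y, x[C], mu, nu, min_cols)
        x = np.zeros(m)
        x[C] = x_c                                   # prolong back (zero-pad)
    return relax(A, y, x, mu, nu)
```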

  23. Theoretical Properties • Inter-level correspondence: • Direct-Solution (two-level):

  24. Theoretical Properties • No stagnation (two-level): • Complementary roles:

  25. Theoretical Properties • Monotonicity: • Convergence: assuming that Relax(x) reduces F(x) proportionally to the square of its gradient.

  26. Theoretical Properties • C-selection guarantee • Assume the columns of A are normalized. • x – the current approximation; x̂ – the solution. • C – chosen using x, with |C| > |supp{x}|.

  27. Initialization • When starting from a zero initial guess, relaxations tend to initially generate supports that are too rich. • V-cycle efficiency might be hampered. • We therefore adopt a "full multigrid" (FMG) algorithm:

  28. Numerical Results: synthetic denoising experiment • Experiments with various dictionaries A of size n×m. • n = 1024, m = 4096. • Initial support S – randomly chosen, of size 0.1n. • xS – a random vector ~ N(0, I). • f = AS xS. • Addition of noise v ~ N(0, σ²I), with σ = 0.02. • y = f + v.
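A sketch of how such a test problem can be generated (columns are normalized here, as assumed in the theoretical properties; the slides do not state the exact normalization used in the experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, sigma = 1024, 4096, 0.02

A = rng.standard_normal((n, m))                 # Experiment 1: dense Gaussian A
A /= np.linalg.norm(A, axis=0)                  # normalize columns (assumption)

S = rng.choice(m, size=int(0.1 * n), replace=False)   # random support, |S| = 0.1n
x_true = np.zeros(m)
x_true[S] = rng.standard_normal(S.size)         # x_S ~ N(0, I)

f = A @ x_true                                  # clean signal f = A_S x_S
y = f + sigma * rng.standard_normal(n)          # noisy measurement y = f + v
```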

  29. Numerical Results • Stopping criterion [Loris 2009]. • One-level methods: • CD+ – CD with line search. • PCD; CG – nonlinear CG with PCD [Zibulevsky & Elad 2010]. • SpaRSA [Wright et al. '09]. • ML – the multilevel method: • ML-CD – multilevel framework with CD+ as the shrinkage iteration. • ML-CG – multilevel framework with CG as the shrinkage iteration.

  30. Experiment 1: Random Normal • A – a random dense n×m matrix, with Aij ~ N(0,1).

  31. Experiment 2: Random ± 1

  32. Experiment 3: ill-conditioned • A – a random dense n×m matrix, with Aij ~ N(0,1). • The singular values are manipulated so that A becomes ill-conditioned [Loris 2009, Zibulevsky & Elad 2010].

  33. Experiment 3: ill-conditioned

  34. Experiment 4: Similar columns • A = [B | C], where B is random and C is a perturbed rank-1 matrix.

  35. Conclusions & Future work • A new multilevel approach developed. • Exploits the sparsity of the solution. • Accelerates existing iterated shrinkage methods. • Future work: • Improvements: • Faster lowest-level solutions. • More suitable iterated shrinkage schemes. • Handling non-sparse solutions (different priors). • A multilevel method for fast-operator dictionaries.

  36. Next step: Covariance Selection • Given a few random vectors, we wish to estimate the inverse of the covariance, Σ⁻¹, assuming it is sparse. • From probability theory:

  37. Problem formulation • Maximum likelihood (ML) estimation – we maximize the likelihood of the observed samples. • Likeliest mean: the sample mean, μ̂ = (1/K) Σk xk. • Likeliest covariance: the sample covariance, S = (1/K) Σk (xk − μ̂)(xk − μ̂)ᵀ.

  38. Problem formulation • Setting the gradient of J to zero yields Θ = S⁻¹, the inverse of the sample covariance. • However, K << n, so the sample matrix X is of low rank and the problem is ill-posed. Introducing l1 regularization: minimize −log det(Θ) + tr(SΘ) + λ||Θ||1. • λ > 0: the minimizer is sparse and positive definite.
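Assuming the standard form of this regularized objective, J(Θ) = −log det Θ + tr(SΘ) + λ||Θ||1 with S the sample covariance (the slide describes it only qualitatively), a minimal evaluation sketch looks like this:

```python
import numpy as np

def covsel_objective(Theta, S, lam):
    """Regularized negative log-likelihood for sparse inverse-covariance
    estimation: -log det(Theta) + tr(S Theta) + lam * ||Theta||_1 (elementwise)."""
    sign, logdet = np.linalg.slogdet(Theta)
    if sign <= 0:
        return np.inf                     # Theta must be positive definite
    return -logdet + np.trace(S @ Theta) + lam * np.abs(Theta).sum()
```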

  39. Our direction • Via Newton steps we can obtain a series of l1-regularized least-squares problems; only a few steps are needed. • Current "state of the art": • CD on the Newton steps; formulating a step costs O(n³). • If supp(Θ) is restricted to O(m), each Newton problem can be solved in O(mn·#it). • First few steps: O(n³·#it). Last few steps: O(n²·#it). • Our direction: • ML-CD with line search: all steps O(n²·#it), with fewer iterations. • Hopefully: quasi-Newton steps, with formulation in O(n²).
