A multilevel iterated-shrinkage approach to l1 penalized least-squares
Eran Treister and Irad Yavneh
Computer Science, Technion (with thanks to Michael Elad)
Part I: Background
• Sparse Representation of Signals
• Applications
• Problem Definition
• Existing Methods
Example: Image Denoising
• y = f + v: the noisy signal y is the clean signal f plus additive noise v.
• Denoising: recover f from y.
Example: Image Denoising, y = f + v
• Many denoising algorithms minimize a functional of the form
  min_f ½||f − y||² + μ·R(f),
  where the first term enforces the relation to the measurement y and the second, R(f), is a regularization term encoding a “prior” on the signal.
Sparse Representation Modeling
• f = Ax: the signal f is generated by the dictionary (matrix) A acting on the sparse representation x.
• The signal f is represented by only a few columns of A.
• The matrix A is redundant (# columns > # rows).
Sparse Representation Modeling
• The support is the set of columns that comprise the signal: S = supp{x} = {i : xi ≠ 0}.
• f = AS xS, where xS is the support sub-vector of x and AS is the support sub-matrix of A.
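A minimal NumPy sketch of this notation (the sizes, seed, and support below are illustrative, not taken from the slides): it builds a redundant dictionary, a sparse x, and checks that Ax coincides with AS xS.

import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 20                             # redundant: more columns than rows
A = rng.standard_normal((n, m))          # dictionary (matrix)

x = np.zeros(m)
x[[2, 7, 15]] = rng.standard_normal(3)   # sparse representation

S = np.flatnonzero(x)                    # support: S = supp{x} = {i : x_i != 0}
f = A @ x                                # signal
assert np.allclose(f, A[:, S] @ x[S])    # f = A_S x_S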
Denoising by sparse representation
• Reconstruct the clean signal f from the noisy measurement y.
• y = Ax + v: noisy signal = image (f = Ax) + additive noise v.
Applications
• De-noising.
• De-blurring.
• In-painting.
• De-mosaicing.
• Computed Tomography.
• Image scale-up & super-resolution.
• And more…
Formulation 1
• The straightforward way to formulate sparse representation is by constrained minimization:
  min_x ||x||0 subject to ||Ax − y||2 ≤ ε,
  where ||x||0 counts the nonzeros of x.
• The problem is not convex and may have many local minima.
• The solution is approximated by “greedy algorithms”.
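One well-known greedy algorithm for this problem is Orthogonal Matching Pursuit; the slides do not name a specific method, so the following NumPy sketch only illustrates the greedy idea: repeatedly add the column most correlated with the residual and re-fit on the current support.

import numpy as np

def omp(A, y, k):
    """Greedy sketch: approximate min ||Ax - y||_2 subject to ||x||_0 <= k."""
    m = A.shape[1]
    support, x = [], np.zeros(m)
    r = y.copy()
    for _ in range(k):
        i = int(np.argmax(np.abs(A.T @ r)))              # column most correlated with residual
        if i not in support:
            support.append(i)
        xs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x = np.zeros(m)
        x[support] = xs                                  # least-squares fit on the support
        r = y - A @ x                                    # update the residual
    return x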
Formulation 2 – Basis Pursuit
• A relaxation of the previous problem:
  min_x ||x||1 subject to ||Ax − y||2 ≤ ε.
• ||x||1 is the l1 norm; the minimizer x̂ is typically sparse.
• The problem is convex and has a convex set of “equivalent” solutions.
Alternative Formulation 2 – l1 penalized least-squares
  min_x F(x) = ½||Ax − y||² + μ||x||1
• F(x) is convex.
• Bigger μ → sparser minimizer.
• The gradient is discontinuous at points where xi = 0, so general-purpose optimization tools struggle.
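A small NumPy sketch of this objective (function names are mine), together with the scalar soft-thresholding/shrinkage operator that the iterated-shrinkage methods below are built on:

import numpy as np

def F(x, A, y, mu):
    """l1 penalized least-squares objective: 0.5*||Ax - y||^2 + mu*||x||_1."""
    r = A @ x - y
    return 0.5 * (r @ r) + mu * np.abs(x).sum()

def soft_threshold(z, t):
    """Shrinkage: elementwise minimizer of 0.5*(x - z)^2 + t*|x|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)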
Iterated Shrinkage Methods
• Bound-Optimization and EM [Figueiredo & Nowak ’03].
• Surrogate-Separable-Function (SSF) [Daubechies et al. ’04].
• Parallel-Coordinate-Descent (PCD) [Elad ’05], [Matalon et al. ’06].
• IRLS-based algorithm [Adeyemi & Davies ’06].
• Gradient Projection Sparse Reconstruction (GPSR) [Figueiredo et al. ’07].
• Sparse Reconstruction by Separable Approximation (SpaRSA) [Wright et al. ’09].
Iterated Shrinkage
• Coordinate Descent (CD): updates each scalar variable in turn so as to minimize the objective.
• Parallel Coordinate Descent (PCD): applies the CD update simultaneously to all variables; based on the projection of the residual, AT(Ax − y).
• PCD + (non-linear) Conjugate Gradient (CG-PCD) [Zibulevsky & Elad ’10]: uses two consecutive PCD steps to calculate the next one.
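A hedged NumPy sketch of two of these relaxations (the exact variants, step sizes, and line searches used in the talk may differ): an SSF/ISTA-type step with a constant c ≥ ||A||², and one coordinate-descent sweep, assuming the columns of A are normalized.

import numpy as np

def ssf_step(x, A, y, mu, c):
    """One SSF/ISTA-type iterated-shrinkage step; c should satisfy c >= ||A||_2^2."""
    z = x - A.T @ (A @ x - y) / c                # gradient step on the smooth part
    return np.sign(z) * np.maximum(np.abs(z) - mu / c, 0.0)

def cd_sweep(x, A, y, mu):
    """One coordinate-descent sweep, assuming normalized columns (||a_i||_2 = 1)."""
    r = y - A @ x
    for i in range(A.shape[1]):
        r += A[:, i] * x[i]                      # remove column i's current contribution
        z = A[:, i] @ r                          # exact scalar minimizer before shrinkage
        x[i] = np.sign(z) * max(abs(z) - mu, 0.0)
        r -= A[:, i] * x[i]                      # restore the updated contribution
    return x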
The main idea:
• Use existing iterated shrinkage methods as “relaxations”.
• Improve the current approximation by using a reduced (lower-level) dictionary.
The main idea: reducing the dimension of A
• The solution is sparse – most columns will not end up in the support!
• At each stage:
  • Many columns are highly unlikely to contribute to the minimizer.
  • Such columns can be temporarily dropped, resulting in a smaller problem.
Reducing the dimension of A
• C: the lower-level subset of columns.
Reducing the problem
• Fine-level problem: min_x F(x) = ½||Ax − y||² + μ||x||1.
• Assume we have a prolongation P that maps a lower-level vector xc to the fine level, x = P xc; substituting for x gives a problem in xc only.
Reducing the problem
• By our simple choice of P (it selects the columns in C), ||P xc||1 = ||xc||1.
• Also, we choose the coarse variable xc as the restriction of the current x to C, so that P xc = x (the current support is kept in C).
• Then, by setting Ac = AP, we get a reduced problem of exactly the same form but of smaller size (fewer unknowns):
  min_xc ½||Ac xc − y||² + μ||xc||1.
The choice of C – likelihood to enter the support
• The residual is defined by r = Ax − y.
• A column is likely to enter the support if it has a high inner product with r (a greedy approach).
• “Likely” columns: columns that are currently not in the support which have the largest such likelihood; C consists of the current support plus the likeliest columns (see the sketch below).
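A schematic NumPy sketch of this coarsening step (variable names and the size of the added set are illustrative): rank the columns outside the current support by |aiT r|, keep the support plus the most likely columns as C, and restrict the dictionary to Ac = A[:, C] (i.e., Ac = AP for a column-selection P).

import numpy as np

def coarsen(A, x, y, n_extra):
    """Select the lower-level column set C and form the reduced dictionary A_c = A P."""
    r = A @ x - y                                    # residual
    support = np.flatnonzero(x)
    likelihood = np.abs(A.T @ r)                     # greedy likelihood to enter the support
    likelihood[support] = -np.inf                    # support columns are kept regardless
    extra = np.argsort(likelihood)[::-1][:n_extra]   # likeliest columns outside the support
    C = np.union1d(support, extra)
    return C, A[:, C], x[C]                          # C, reduced dictionary, coarse initial guess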
Theoretical Properties
• Inter-level correspondence.
• Direct solution (two-level).
• No stagnation (two-level).
• Complementary roles (relaxation vs. coarse-level correction).
• Monotonicity (the objective F does not increase).
• Convergence, assuming that Relax(x) reduces F(x) proportionally to the square of its gradient.
Theoretical Properties
• C-selection guarantee:
  • Assume the columns of A are normalized.
  • x is the current approximation; x̂ is the solution.
  • C is chosen using x, with |C| > |supp{x}|.
Initialization
• When starting from a zero initial guess, relaxations tend to initially generate supports that are too rich.
• V-cycle efficiency might be hampered.
• We therefore adopt a “full multigrid” (FMG) type initialization: the problem is first treated on small (low-level) dictionaries, and the result is used as the initial guess for progressively larger ones.
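A high-level sketch of how such a multilevel cycle could be organized (my schematic reading of the approach, not the authors' code; relax stands for any iterated-shrinkage method such as the cd_sweep sketched earlier, coarsen is the helper above, and the level-size parameters are illustrative):

import numpy as np

def ml_cycle(A, y, x, mu, relax, n_extra, min_cols=64, nu=1):
    """One multilevel cycle: relax, recurse on a reduced dictionary, prolong, relax."""
    m = A.shape[1]
    if m <= min_cols or n_extra == 0:
        for _ in range(20):                       # "solve" the lowest level by many relaxations
            x = relax(x, A, y, mu)
        return x
    for _ in range(nu):                           # pre-relaxation on the current level
        x = relax(x, A, y, mu)
    C, A_c, x_c = coarsen(A, x, y, n_extra)       # reduced (lower-level) problem
    x_c = ml_cycle(A_c, y, x_c, mu, relax, n_extra // 2, min_cols, nu)
    x = np.zeros(m)
    x[C] = x_c                                    # prolong: x = P x_c (zero outside C)
    for _ in range(nu):                           # post-relaxation on the current level
        x = relax(x, A, y, mu)
    return x

For example, x = ml_cycle(A, y, np.zeros(A.shape[1]), mu, cd_sweep, n_extra=A.shape[1] // 2) would run one cycle from the zero vector.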
Numerical Results – synthetic denoising experiment
• Experiments with various dictionaries A of size n×m, with n = 1024, m = 4096.
• Initial support S: randomly chosen, of size 0.1·n.
• xS: random vector ~ N(0, I).
• f = AS xS.
• Additive noise v ~ N(0, σ²I) with σ = 0.02.
• y = f + v.
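This setup translates directly to NumPy (a sketch; the column normalization is my assumption, and the random dense dictionary corresponds to Experiment 1 below):

import numpy as np

rng = np.random.default_rng(42)
n, m = 1024, 4096
A = rng.standard_normal((n, m))                  # dense dictionary, A_ij ~ N(0,1)
A /= np.linalg.norm(A, axis=0)                   # normalized columns (assumption)

S = rng.choice(m, size=n // 10, replace=False)   # random support of size 0.1*n
x_true = np.zeros(m)
x_true[S] = rng.standard_normal(S.size)          # x_S ~ N(0, I)

f = A @ x_true                                   # clean signal f = A_S x_S
sigma = 0.02
y = f + sigma * rng.standard_normal(n)           # y = f + v, v ~ N(0, sigma^2 I)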
Numerical Results
• Stopping criterion as in [Loris 2009].
• One-level methods:
  • CD+ – CD with line search.
  • PCD, CG – non-linear CG with PCD [Zibulevsky & Elad 2010].
  • SpaRSA [Wright et al. ’09].
• ML – multilevel method:
  • ML-CD – multilevel framework with CD+ as the shrinkage iteration.
  • ML-CG – multilevel framework with CG as the shrinkage iteration.
Experiment 1: Random Normal
• A – a random dense n×m matrix, Ai,j ~ N(0,1).
Experiment 3: Ill-conditioned
• A – a random dense n×m matrix, Ai,j ~ N(0,1).
• The singular values are manipulated so that A becomes ill-conditioned [Loris 2009, Zibulevsky & Elad 2010].
Experiment 4: Similar columns
• A = [B | C]. B – random; C – a perturbed rank-1 matrix.
Conclusions & Future work
• A new multilevel approach was developed: it exploits the sparsity of the solution and accelerates existing iterated shrinkage methods.
• Future work:
  • Improvements: faster lowest-level solutions; more suitable iterated shrinkage schemes.
  • Handling non-sparse solutions (different priors).
  • A multilevel method for fast-operator dictionaries.
Next step: Covariance Selection
• Given a few sample vectors drawn from a multivariate Gaussian, we wish to estimate the inverse of the covariance, Σ⁻¹, assuming it is sparse.
• From probability theory: for Gaussian vectors, (Σ⁻¹)ij = 0 exactly when variables i and j are conditionally independent given all the others – which is what makes a sparse Σ⁻¹ a natural model.
Problem formulation
• Maximum likelihood (ML) estimation: we maximize the (log-)likelihood of the K samples.
• Likeliest mean: the sample mean, μ̂ = (1/K) Σk xk.
• Likeliest covariance: the sample covariance, Σ̂ = (1/K) Σk (xk − μ̂)(xk − μ̂)T.
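In code, these ML estimates are simply the sample mean and the (1/K-normalized) sample covariance; a minimal sketch:

import numpy as np

def ml_estimates(X):
    """X is a K x n matrix whose rows are the K sample vectors."""
    K = X.shape[0]
    mu_hat = X.mean(axis=0)                      # likeliest mean
    Xc = X - mu_hat                              # centered samples
    sigma_hat = (Xc.T @ Xc) / K                  # likeliest covariance (ML normalization 1/K)
    return mu_hat, sigma_hat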
Problem formulation
• Writing J(Θ) for the negative log-likelihood (up to constants), J(Θ) = −log det(Θ) + trace(Σ̂Θ), setting its gradient to zero yields Θ = Σ̂⁻¹.
• However, K << n, so the data matrix X is of low rank and Σ̂ is singular – the problem is ill-posed. We therefore introduce regularization:
  min_{Θ ≻ 0} −log det(Θ) + trace(Σ̂Θ) + λ||Θ||1.
• For λ > 0 the minimizer is sparse and positive definite.
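This regularized objective is the standard sparse inverse-covariance (graphical-lasso type) functional; a minimal evaluation sketch, where Sigma_hat is the sample covariance from above and Theta a candidate positive-definite matrix:

import numpy as np

def sparse_invcov_objective(Theta, Sigma_hat, lam):
    """-log det(Theta) + trace(Sigma_hat @ Theta) + lam * ||Theta||_1."""
    sign, logdet = np.linalg.slogdet(Theta)
    if sign <= 0:
        return np.inf                            # only positive-definite Theta are feasible
    return -logdet + np.trace(Sigma_hat @ Theta) + lam * np.abs(Theta).sum()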
Our direction
• Via Newton steps we obtain a series of l1 regularized least-squares problems; only a few steps are needed.
• Current “state of the art”: CD on the Newton steps.
  • Formulating a step: O(n³).
  • If supp(Θ) is restricted to O(m), each Newton problem can be solved in O(mn·#it).
  • First few steps: O(n³·#it); last few steps: O(n²·#it).
• Our direction:
  • ML-CD + line search: all steps in O(n²·#it), with fewer #it.
  • Hopefully: quasi-Newton steps, with formulation in O(n²).