Learning Measurement Matrices for Redundant Dictionaries. Richard Baraniuk (Rice University), Chinmay Hegde (MIT), Aswin Sankaranarayanan (CMU)
Sparse Recovery • Sparsity rocks, etc. • Previous talk focused mainly on signal inference (ex: classification, NN search) • This talk focuses on signal recovery
Compressive Sensing • Sensing via randomized dimensionality reduction: y = Φx, where x is a sparse signal with K nonzero entries and Φ is a random M x N matrix with M << N • Recovery: solve an ill-posed inverse problem by exploiting the geometric structure of sparse/compressible signals
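A purely illustrative numerical sketch of this pipeline; the sizes, the Gaussian Φ, and the choice of OMP as the recovery solver are my assumptions, not specifics from the talk:

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
N, M, K = 256, 64, 5               # ambient dim, measurements, sparsity (illustrative)

# K-sparse signal
x = np.zeros(N)
x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)

# Random Gaussian measurement matrix; M << N compressive measurements
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
y = Phi @ x

# Recover by exploiting sparsity (greedy OMP here; l1 minimization also works)
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=K, fit_intercept=False).fit(Phi, y)
print("relative recovery error:", np.linalg.norm(omp.coef_ - x) / np.linalg.norm(x))
```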
General Sparsifying Bases • Gaussian measurements are incoherent with any fixed orthonormal basis (with high probability) • Ex: signals sparse in the frequency domain
Sparse Modeling: Approach 1 • Step 1: Choose a signal model with structure • e.g. bandlimited, smooth with r vanishing moments, etc. • Step 2: Analytically design a sparsifying basis/frame that exploits this structure • e.g. DCT, wavelets, Gabor, etc. [Figure: example atoms from the DCT, wavelet, and Gabor bases; the trailing "?" marks signal classes with no obvious analytic basis]
Sparse Modeling: Approach 2 • Learn the sparsifying basis/frame from training data • Problem formulation: given a large number of training signals, design a dictionary D that simultaneously sparsifies the training data • Called sparse coding / dictionary learning
Dictionaries • Dictionary: an N x Q matrix D whose columns are used as basis functions for the data • Convention: assume columns are unit-norm • More columns than rows (Q > N), so the dictionary is redundant / overcomplete
Dictionary Learning • Rich vein of theoretical and algorithmic work: Olshausen and Field ['97], Lewicki and Sejnowski ['00], Elad ['06], Sapiro ['08] • Typical formulation: given training data X = [x_1, ..., x_T], solve min_{D,A} ||X - DA||_F^2 subject to each column of A being K-sparse • Several efficient algorithms, ex: K-SVD
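A minimal sketch of this formulation using scikit-learn's generic dictionary learner. K-SVD itself is not in scikit-learn, so this uses the library's alternating-minimization solver; the random training matrix is a toy stand-in for real training signals:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 64))     # 500 training signals of dimension N=64 (toy data)

# Learn an overcomplete dictionary (Q=128 atoms) with K=3-sparse codes
learner = DictionaryLearning(n_components=128, fit_algorithm="cd",
                             transform_algorithm="omp",
                             transform_n_nonzero_coefs=3,
                             max_iter=10, random_state=0)
codes = learner.fit_transform(X)       # rows: sparse coefficient vectors
D = learner.components_.T              # columns: unit-norm atoms, shape (64, 128)
print(D.shape, np.linalg.norm(X - codes @ learner.components_))
```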
Dictionary Learning • Successfully applied to denoising, deblurring, inpainting, demosaicking, super-resolution, … • State-of-the-art results in many of these problems Aharon and Elad ‘06
Dictionary Coherence • Suppose that the learned dictionary is normalized to have unit ℓ2-norm columns: ||d_i||_2 = 1 • The mutual coherence of D is defined as μ(D) = max_{i ≠ j} |⟨d_i, d_j⟩| • Geometrically, μ(D) represents the cosine of the minimum angle between the columns of D; smaller is better • Crucial parameter in analysis as well as practice (line of work starting with Tropp [04])
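The definition translates directly into a few lines of numpy; this helper is a straightforward sketch, not code from the talk:

```python
import numpy as np

def mutual_coherence(D):
    """mu(D) = max_{i != j} |<d_i, d_j>| for a dictionary with unit-norm columns."""
    D = D / np.linalg.norm(D, axis=0)   # enforce unit l2-norm columns
    G = np.abs(D.T @ D)                 # absolute Gram matrix of inner products
    np.fill_diagonal(G, 0.0)            # ignore self inner products
    return G.max()

# Example: coherence of a random 64x128 dictionary
rng = np.random.default_rng(0)
print(mutual_coherence(rng.standard_normal((64, 128))))
```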
Dictionaries and CS • Can extend CS to work with non-orthonormal, redundant dictionaries • Coherence of ΦD determines recovery success Rauhut et al. [08], Candes et al. [10] • Fortunately, a random Φ ("holographic basis") guarantees low coherence
Geometric Intuition • Columns of D: points on the unit sphere • Coherence: cosine of the minimum angle between the vectors • J-L Lemma: Random projections approximately preserve angles between vectors
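A quick empirical check of this intuition (all dimensions illustrative): project unit-norm atoms with a random Gaussian matrix and compare pairwise inner products before and after.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, Q = 256, 64, 50
D = rng.standard_normal((N, Q))
D /= np.linalg.norm(D, axis=0)                   # points on the unit sphere

Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # J-L style random projection
PD = Phi @ D
PD /= np.linalg.norm(PD, axis=0)

# Compare pairwise inner products (cosines of angles) before and after projection
G, Gp = D.T @ D, PD.T @ PD
iu = np.triu_indices(Q, k=1)
print("max angle distortion:", np.abs(G[iu] - Gp[iu]).max())
```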
Q: Can we do better than random projections for dictionary-based CS? Q restated: For a given dictionary D, find the best CS measurement matrix Φ
Optimization Approach • Assume that a good dictionary D has been provided • Goal: learn the best Φ for this particular D • As before, want the "shortest" matrix Φ (fewest rows) such that the coherence of ΦD is at most some target parameter μ • To avoid degeneracies caused by a simple scaling, also want that Φ does not shrink the columns of D much: ||Φd_i||_2 ≈ 1 for all i
A NuMax-like Framework • Convert quadratic constraints in Φ into linear constraints in P = ΦᵀΦ (via the "lifting trick"): ⟨Φd_i, Φd_j⟩ = d_iᵀ P d_j • Use a nuclear-norm relaxation of the rank • Simplified problem: minimize ||P||_* subject to |d_iᵀ P d_j| ≤ μ for i ≠ j, d_iᵀ P d_i = 1, P ⪰ 0
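A direct, if unscalable, way to prototype the lifted problem is a generic SDP solver. Below is a sketch with cvxpy under my reading of the constraint set (off-diagonal coherence bounds plus unit column norms); the talk's ADMM solver is the practical route:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
N, Q = 16, 24                          # tiny sizes: generic SDP solvers scale poorly
D = rng.standard_normal((N, Q))
D /= np.linalg.norm(D, axis=0)

# Coherence budget slightly looser than mu(D), so that P = I is feasible
G = np.abs(D.T @ D); np.fill_diagonal(G, 0.0)
mu = 1.05 * G.max()

P = cp.Variable((N, N), PSD=True)      # lifted variable, P = Phi^T Phi
cons = []
for i in range(Q):
    cons.append(D[:, i] @ P @ D[:, i] == 1)                # preserve column norms
    for j in range(i + 1, Q):
        cons.append(cp.abs(D[:, i] @ P @ D[:, j]) <= mu)   # coherence constraints

prob = cp.Problem(cp.Minimize(cp.normNuc(P)), cons)        # nuclear-norm rank surrogate
prob.solve()

# Factor P to recover a measurement matrix Phi with rank(P) rows
w, V = np.linalg.eigh(P.value)
keep = w > 1e-6
Phi = (np.sqrt(w[keep]) * V[:, keep]).T
print("measurements M =", keep.sum())
```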
Algorithm: "NuMax-Dict" • Alternating Direction Method of Multipliers (ADMM): • solve for P using spectral thresholding • solve for L using least-squares • solve for q using "squishing" (see the sketch below) • Convergence rate depends on the size of the dictionary (since #constraints = O(Q^2)) [HSYB12]
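For intuition, two of the three sub-steps have simple closed forms. This is a sketch with my own variable names; the dual-variable and penalty bookkeeping of a full ADMM implementation is omitted.

```python
import numpy as np

def squish(q, mu):
    """q-update: project the off-diagonal constraint values
    q_ij = <Phi d_i, Phi d_j> onto the coherence interval [-mu, mu]."""
    return np.clip(q, -mu, mu)

def l_update(A, b):
    """L-update: an unconstrained least-squares solve tying the lifted variable
    to the current constraint targets, vec(L) = argmin ||A v - b||_2."""
    return np.linalg.lstsq(A, b, rcond=None)[0]

# The P-update is spectral thresholding (the prox of the nuclear norm);
# a symmetrized variant is sketched after the next slide.
```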
NuMax vs. NuMax-Dict • Same intuition, trick, algorithm, etc. • Key enabler: coherence is intrinsically a quadratic function of the data • Key difference: the (linearized) constraints are no longer symmetric • We have constraints of the form ⟨P, d_i d_jᵀ⟩ ≤ μ, and d_i d_jᵀ is not a symmetric matrix • This might result in intermediate P estimates having complex eigenvalues, so the notion of spectral thresholding needs to be slightly modified (sketched below)
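One natural reading of that modification (an assumption on my part, not the talk's exact recipe) is to project the iterate back onto the symmetric matrices before thresholding its eigenvalues:

```python
import numpy as np

def spectral_threshold_sym(P, tau):
    """Eigenvalue soft-thresholding after symmetrizing P."""
    S = (P + P.T) / 2.0                      # nearest symmetric matrix in Frobenius norm
    w, V = np.linalg.eigh(S)                 # symmetric input guarantees real eigenvalues
    return (V * np.maximum(w - tau, 0.0)) @ V.T
```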
Expt1: Synthetic Dictionary • Generic dictionary: random with unit-norm columns • Dictionary size: 64x128 • We construct different measurement matrices: • Random • NuMax-Dict • Algorithm by Elad [06] • Algorithm by Duarte-Carvajalino & Sapiro [08] • We generate K=3 sparse signals with Gaussian amplitudes, add 30dB measurement noise • Recovery using OMP • Measure recovery SNR, plot as a function of M
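A skeletal version of this experiment for the random-Φ baseline; the learned measurement matrices would simply be swapped in for Phi. Dictionary size, K, and the noise level match the slide, but seeds, trial counts, and the noise bookkeeping are mine:

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
N, Q, K, M, snr_db = 64, 128, 3, 24, 30.0

D = rng.standard_normal((N, Q)); D /= np.linalg.norm(D, axis=0)  # generic dictionary

def recovery_snr(Phi, trials=100):
    err = []
    for _ in range(trials):
        a = np.zeros(Q); a[rng.choice(Q, K, replace=False)] = rng.standard_normal(K)
        x = D @ a                                    # K-sparse in D
        y = Phi @ x
        noise = rng.standard_normal(y.shape)         # scale noise to 30dB measurement SNR
        y += noise * np.linalg.norm(y) / (np.linalg.norm(noise) * 10**(snr_db / 20))
        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=K, fit_intercept=False)
        x_hat = D @ omp.fit(Phi @ D, y).coef_
        err.append(np.sum((x - x_hat)**2) / np.sum(x**2))
    return -10 * np.log10(np.mean(err))

Phi = rng.standard_normal((M, N)) / np.sqrt(M)       # random baseline
print(f"M={M}: recovery SNR = {recovery_snr(Phi):.1f} dB")
```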
Expt2: Practical Dictionaries • 2x overcomplete DCT dictionary, same parameters • 2x overcomplete dictionary learned on 8x8 patches of a real-world image (Barbara) using K-SVD • Recovery using OMP
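For reference, one common recipe for a 2x overcomplete DCT dictionary (oversampled DCT-II atoms, as in the K-SVD literature); the talk's exact construction is not specified, so treat the details as assumptions:

```python
import numpy as np

def overcomplete_dct(N=64, Q=128):
    """Q > N DCT-like atoms of length N (2x overcomplete for Q = 2N)."""
    i, k = np.arange(N)[:, None], np.arange(Q)[None, :]
    D = np.cos(np.pi * k * (i + 0.5) / Q)   # oversampled DCT-II atoms
    D[:, 1:] -= D[:, 1:].mean(axis=0)       # remove the DC component of non-constant atoms
    return D / np.linalg.norm(D, axis=0)    # unit-norm columns

D = overcomplete_dct()
print(D.shape)   # (64, 128)
```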
Analysis • Exact problem seems to be hard to analyze • But, as in NuMax, can provide analytical bounds in the special case where the measurement matrix is further constrained to be orthonormal
Orthogonal Sensing of Dictionary-Sparse Signals • Given a dictionary D, find the orthonormal measurement matrix Φ that provides the best possible coherence of ΦD • From a geometric perspective, ortho-projections cannot improve coherence, so necessarily μ(ΦD) ≥ μ(D)
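This is easy to probe numerically; a small check with a random orthonormal Φ drawn via scipy's ortho_group (sizes illustrative):

```python
import numpy as np
from scipy.stats import ortho_group

def coherence(D):
    D = D / np.linalg.norm(D, axis=0)
    G = np.abs(D.T @ D); np.fill_diagonal(G, 0.0)
    return G.max()

rng = np.random.default_rng(0)
N, M, Q = 64, 32, 128
D = rng.standard_normal((N, Q)); D /= np.linalg.norm(D, axis=0)

Phi = ortho_group.rvs(N, random_state=0)[:M]     # M orthonormal rows
print(coherence(D), "<=", coherence(Phi @ D))    # projected dictionary is more coherent
```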
Semidefinite Relaxation • The usual trick: Lifting and trace-norm relaxation
Theoretical Result • Theorem: For any given redundant dictionary D, denote its mutual coherence by μ(D). Denote the optimum of the (nonconvex) M-measurement problem as μ*. Then there exists a method to produce a rank-2M orthonormal matrix Φ such that the coherence of ΦD is at most μ*, i.e., μ(ΦD) ≤ μ* • We can obtain close-to-optimal performance, but pay a price of a factor of 2 in the number of measurements
Conclusions • NuMax-Dict performance comparable to the best existing algorithms • Principled convex optimization framework • Efficient ADMM-type algorithm that exploits the rank-1 structure of the problem • Upshot: possible to incorporate other structure into the measurement matrix, such as positivity, sparsity, etc.
Open Question • The above framework assumes a two-step approach: first construct a redundant dictionary (analytically or from data), then construct a measurement matrix for it • Given a large number of training signals, how do we efficiently solve jointly for both the dictionary and the sensing matrix? (Approach introduced in Duarte-Carvajalino & Sapiro [08])