Sparse Coding and Automatic Relevance Determination for Multi-way models

Sparse Coding and Automatic Relevance Determination for Multi-way models Morten Mørup DTU Informatics Intelligent Signal Processing Technical University of Denmark Joint work with Lars Kai Hansen DTU Informatics Intelligent Signal Processing Technical University of Denmark Sparse’09 8 April 2009

Multi-way modeling has become an important tool for Unsupervised Learning and are in frequent use today in a variety of fields including Vector: 1-way array/ 1st order tensor, Matrix: 2-way array/ 2nd order tensor, 3-way array/3rd order tensor • Psychometrics (Subject x Task x Time) • Chemometrics (Sample x Emission x Absorption) • NeuroImaging (Channel x Time x Trial) • Textmining (Term x Document x Time/User) • Signal Processing (ICA, i.e. diagonalization of Cummulants) Sparse’09 8 April 2009

I x JK Matrix J x KI Matrix K x IJ Matrix Some Tensor algebra XI x J x K • Matricizing X  X(n) • N-mode multiplication i.e. for matrix multiplication CA=A x1C , ABT=A x2B X(1) X(2) X(3) Sparse’09 8 April 2009

Tensor Decomposition The two most commonly used tensor decomposition models are the CandeComp/PARAFAC (CP) model and the Tucker model Sparse’09 8 April 2009

CP-model unique • Tucker model not unique • What constitutes an adequate number of components, i.e. determining J for CP and J1,J2,…,JN for Tucker is an open problem, particularly difficult for the Tucker model as the number of components are specified for each modality separately Sparse’09 8 April 2009

Agenda • To use sparse coding to Simplify the Tucker core forming a unique representation as well as enable interpolation between the Tucker (full core) and CP (diagonal core) model • To use sparse coding to turn off excess components in the CP and Tucker model and thereby select the model order. • To tune the pruning strength from data by Automatic Relevance Determination (ARD). Sparse’09 8 April 2009

Nature, 1996 Sparse Coding Bruno A. Olshausen David J. Field Preserve Information Preserve Sparsity (Simplicity) Tradeoff parameter Sparse’09 8 April 2009

Sparse Coding (cont.) Image patch 1 to N … Patch image Vectorize patches  A corresponds to Gaborlike features resemblingsimple cell behavior inV1 of the brain! Sparse coding attempts to both learn dictionary and encoding at the same time (This problem is not convex!) Sparse’09 8 April 2009

Automatic Relevance Determination (ARD) • Automatic Relevance Determination (ARD) is a hierarchical Bayesian approach widely used for model selection • In ARD hyper-parameters explicitly represents the relevance of different features by defining the range of variation. (i.e., Range of variation0  Feature removed) Sparse’09 8 April 2009

A Bayesian formulation of the Lasso / BPD problem Sparse’09 8 April 2009

ARD in reality a L0-norm optimization scheme. As such ARD based on Laplace prior corresponds to L0-norm optimization by re-weighted L1-norm L0 norm by re-weighted L2 follows by imposing Gaussian prior instead of Laplace Sparse’09 8 April 2009

Sparse Tucker decomposition by ARD Brakes into standard Lasso/BPD sub-problems of the form Sparse’09 8 April 2009

Solving efficiently the Lasso/BPD sub-problems Notice, in general the alternating sub-problems for Tucker estimation have J<I! Sparse’09 8 April 2009

The Tucker ARD Algorithm Sparse’09 8 April 2009

Results Fluorescene data: emission x excitation x samples Tucker(10,10,10) models were fitted to the data, given are below the extracted cores Synthetic Tucker(3,4,5) Tucker(3,6,4) Tucker(?,4,4) Tucker(4,4,4) Tucker(3,3,3) Sparse’09 8 April 2009

Sparse Coding in fact 2-way case of Tucker model • The presented ARD framework extracts a 20 component model when trained on the natural image data of (Olshausen and Field, Nature 1996) Image patch 1 to N … Patch image Vectorize patches  Sparse’09 8 April 2009

Conclusion • Imposing sparseness on the core enable to interpolate between Tucker and CP model • ARD based on MAP estimation form a simple framework to tune the pruning in sparse coding models giving the model order. • ARD framework especially useful for the Tucker model where the order is specified for each mode separately which makes exhaustive evaluation of all potential models expensive. • ARD-estimation based on MAP is closely related to L0 norm estimation based on reweighted L1 and L2 norm. Thus, ARD form a principled framework for learning the sparsity pattern in CS. Sparse’09 8 April 2009

http://mmds.imm.dtu.dk Sparse’09 8 April 2009

Sparse Coding and Automatic Relevance Determination for Multi-way models

Sparse Coding and Automatic Relevance Determination for Multi-way models

Presentation Transcript

Group Sparse Coding

Image classification by sparse coding

Predictive Automatic Relevance Determination by Expectation Propagation

Automatic Determination of May

Automatic Image Annotation and Retrieval using Cross-Media Relevance Models

Sparse Coding and the Homotopy Algorithm

Sparse Coding for Image and Video Understanding

Sparse Coding for Specification Mining and Error Localization

Hearing loss and sparse coding

Differentiable Sparse Coding

Sparse Coding and Its Extensions for Visual Recognition

Relevance Models [draft]

Sparse Coding in Sparse Winner networks

Performance Models for Evaluation and Automatic Tuning of Symmetric Sparse Matrix-Vector Multiply

ASP.NET coding models

Relevance Models and Answer Granularity for Question Answering

Automatic Image Annotation and Retreval using Cross-Media Relevance Models

Predictive Automatic Relevance Determination by Expectation Propagation

Relevance-Based Language Models

Configuring Automatic Account Determination

Accounting Determination and Automatic Postings

Online Learning for Matrix Factorization and Sparse Coding