
Multi-label Prediction via Sparse Infinite CCA


Presentation Transcript


  1. Multi-label Prediction via Sparse Infinite CCA. Piyush Rai and Hal Daumé III, NIPS 2009. Presented by Lingbo Li, ECE, Duke University, July 16th, 2010. Note: all tables and figures are taken from the original paper.

  2. Outline • Canonical Correlation Analysis • CCA • Probabilistic CCA • Infinite Canonical Correlation Analysis Model • The Indian Buffet Process • The Infinite CCA Model • Inference • Multitask Learning using Infinite CCA • Fully supervised setting • Semi-supervised setting • Experiments • Infinite CCA results on synthetic data • Infinite CCA applied to multi-label prediction • Conclusion

  3. Canonical Correlation Analysis • For paired variables x ∈ R^{D_x} and y ∈ R^{D_y}, CCA seeks linear projections u and v so that u^T x and v^T y are maximally correlated in the projection space. • The correlation coefficient between the two variables in the embedded space is ρ = u^T Σ_xy v / sqrt((u^T Σ_xx u)(v^T Σ_yy v)). • CCA can be posed as the constrained optimization problem: maximize u^T Σ_xy v subject to u^T Σ_xx u = 1 and v^T Σ_yy v = 1. • Here Σ_xx and Σ_yy denote the within-view covariance matrices of the data samples, and Σ_xy the cross-covariance matrix between x and y.
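As a concrete illustration of the constrained optimization above, here is a minimal NumPy sketch that solves classical CCA via the standard whitening trick: whiten each view with a Cholesky factor of its covariance and take the SVD of the whitened cross-covariance, whose singular values are the canonical correlations. Function and parameter names (`cca`, `n_components`, `reg`) are illustrative, and the small ridge term is added only for numerical stability.

```python
import numpy as np

def cca(X, Y, n_components=2, reg=1e-6):
    """Classical CCA via the SVD of the whitened cross-covariance."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    N = X.shape[0]
    Sxx = Xc.T @ Xc / N + reg * np.eye(X.shape[1])   # within-view covariances
    Syy = Yc.T @ Yc / N + reg * np.eye(Y.shape[1])   # (small ridge for stability)
    Sxy = Xc.T @ Yc / N                              # cross-covariance
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # Whiten both views: K = Lx^{-1} Sxy Ly^{-T}; its singular values are
    # the canonical correlations.
    K = np.linalg.solve(Lx, np.linalg.solve(Ly, Sxy.T).T)
    A, corrs, Bt = np.linalg.svd(K)
    U = np.linalg.solve(Lx.T, A[:, :n_components])     # x-side projections u
    V = np.linalg.solve(Ly.T, Bt.T[:, :n_components])  # y-side projections v
    return U, V, corrs[:n_components]
```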

  4. Probabilistic CCA • Let z ∈ R^K with K ≤ min(D_x, D_y), and consider the following latent variable model: z ~ N(0, I_K), x | z ~ N(W_x z + μ_x, Ψ_x), y | z ~ N(W_y z + μ_y, Ψ_y). • We can also write x = W_x z + μ_x + ε_x and y = W_y z + μ_y + ε_y, where ε_x ~ N(0, Ψ_x) and ε_y ~ N(0, Ψ_y). • The latent variable z is shared between x and y.
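A short sketch of drawing data from this generative model makes the role of the shared latent variable concrete. The parameter values below (loading matrices, means, and diagonal noise covariances) are arbitrary choices for illustration; the probabilistic CCA model allows general positive semi-definite Ψ_x and Ψ_y.

```python
import numpy as np

def sample_pcca(N, Dx=5, Dy=3, K=2, seed=0):
    """Draw N samples from the probabilistic CCA generative model:
    z ~ N(0, I_K), x | z ~ N(Wx z + mu_x, Psi_x), y | z ~ N(Wy z + mu_y, Psi_y).
    Parameters are drawn arbitrarily just to have a concrete model."""
    rng = np.random.default_rng(seed)
    Wx = rng.normal(size=(Dx, K))
    Wy = rng.normal(size=(Dy, K))
    mu_x, mu_y = rng.normal(size=Dx), rng.normal(size=Dy)
    Psi_x = 0.1 * np.eye(Dx)          # diagonal noise for simplicity
    Psi_y = 0.1 * np.eye(Dy)
    Z = rng.normal(size=(N, K))       # shared latent variables
    X = Z @ Wx.T + mu_x + rng.multivariate_normal(np.zeros(Dx), Psi_x, size=N)
    Y = Z @ Wy.T + mu_y + rng.multivariate_normal(np.zeros(Dy), Psi_y, size=N)
    return X, Y, Z
```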

  5. Probabilistic CCA • Probabilistic interpretation of CCA • Maximum likelihood approach for parameter estimation • The number of canonical correlation components is fixed in advance • The projection matrix is not sparse • Use the IBP as a prior on binary matrices with a countably infinite number of columns • Posterior inference determines which subset of latent features is responsible for each observation • The IBP ensures that the matrices are sparse

  6. Indian Buffet Process • Given an N × D matrix X of N observations, each with D features, the latent feature model can be expressed as X = ZA + E, where Z is an N × K binary matrix indicating which latent features each observation possesses. • IBP interpretation (customers = observations, dishes = latent features): • The first customer tries Poisson(α) dishes • The nth customer tries: • each previously-tasted dish k with probability m_k / n, where m_k is the number of previous customers who tried dish k • Poisson(α / n) completely new dishes
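The culinary metaphor translates directly into a short simulation. The sketch below draws a binary matrix from the IBP exactly as described above; `sample_ibp` and its arguments are illustrative names.

```python
import numpy as np

def sample_ibp(N, alpha, rng=None):
    """Simulate the Indian Buffet Process culinary metaphor.

    Customer 1 tries Poisson(alpha) dishes; customer n tries each
    previously-tasted dish k with probability m_k / n and then
    Poisson(alpha / n) brand-new dishes. Returns an N x K binary matrix."""
    rng = rng or np.random.default_rng()
    Z = np.zeros((0, 0), dtype=int)
    for n in range(1, N + 1):
        m = Z.sum(axis=0)                              # dish popularity counts m_k
        old = (rng.random(Z.shape[1]) < m / n).astype(int)
        k_new = rng.poisson(alpha / n)                 # brand-new dishes for customer n
        row = np.concatenate([old, np.ones(k_new, dtype=int)])
        Z = np.pad(Z, ((0, 0), (0, k_new)))            # grow the matrix for new dishes
        Z = np.vstack([Z, row])
    return Z
```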

  7. Infinite CCA Model • Impose an IBP prior on the binary matrix B so that the dimensionality of the latent space associated with the data can be automatically determined from an unbounded number of candidate dimensions. • Represent the projection matrix as the elementwise product B ⊙ V, where B is binary (with IBP prior) and V is real-valued. • The two random vectors x and y can then be modeled as x = W_x z + ε_x and y = W_y z + ε_y. • z is shared between x and y, and ε_x, ε_y are noise.

  8. Infinite CCA Model • Let t = [x; y] denote the stacked (D_x + D_y)-dimensional observation. • The full model can be written as t_n = (B ⊙ V) z_n + ε_n, with z_n ~ N(0, I), Gaussian noise ε_n, B drawn from the IBP, and the entries of V given a Gaussian prior. • The graphical model structure is shown in the figure in the original paper.
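Putting the pieces together, here is a hedged generative sketch of the stacked model: a binary sparsity pattern B drawn from the IBP (reusing the `sample_ibp` routine above), real-valued weights V, a shared latent Z, and Gaussian noise. The isotropic noise, the standard-normal prior on V, and the default dimensionalities are simplifying assumptions for illustration, not the paper's exact specification.

```python
import numpy as np

def sample_infinite_cca(N, Dx=25, Dy=10, alpha=2.0, sigma=0.1, rng=None):
    """Generative sketch of the infinite CCA model on the stacked vector
    t = [x; y]: the loading matrix is the elementwise product of a binary
    matrix B (IBP prior, simulated with sample_ibp, rows indexing the
    D = Dx + Dy observed dimensions) and a real-valued matrix V; the
    latent z is shared across the two views."""
    rng = rng or np.random.default_rng()
    D = Dx + Dy
    B = sample_ibp(D, alpha, rng)                  # D x K binary sparsity pattern
    K = B.shape[1]
    V = rng.normal(size=(D, K))                    # real-valued loading weights
    W = B * V                                      # sparse loading matrix
    Z = rng.normal(size=(N, K))                    # shared latent variables
    T = Z @ W.T + sigma * rng.normal(size=(N, D))  # observed stacked data
    X, Y = T[:, :Dx], T[:, Dx:]
    return X, Y, B, V, Z
```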

  9. Inference • Sample B • Sample existing dishes: for each entry, P(b_dk = 1 | rest) ∝ (m_{-d,k} / D) × likelihood and P(b_dk = 0 | rest) ∝ (1 − m_{-d,k} / D) × likelihood, where m_{-d,k} counts how many other rows use dish k. • Sample new dishes: use an M-H step. Propose k_new ~ Poisson(α / D) new columns, together with the corresponding entries of V. Accept the proposal with acceptance probability min(1, r), where r is the ratio of the likelihoods under the proposed and current states (the prior terms cancel because the proposal is drawn from the prior).
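The following sketch shows the shape of this sampling step for one row of B: a Gibbs update for existing dishes and a Metropolis-Hastings birth move for new dishes proposed from the prior, so the acceptance ratio reduces to a likelihood ratio. The `log_lik(B, V)` argument is a stand-in for the model's actual likelihood, and the code is a generic IBP-style sampler rather than the paper's exact implementation.

```python
import numpy as np

def resample_B_row(d, B, V, alpha, log_lik, rng=None):
    """One sweep over row d of the binary matrix B.

    log_lik(B, V) is assumed to return the model log-likelihood for a
    candidate (B, V); it stands in for the collapsed likelihood of the
    infinite CCA model."""
    rng = rng or np.random.default_rng()
    D = B.shape[0]
    # Gibbs step for existing dishes.
    for k in range(B.shape[1]):
        m = B[:, k].sum() - B[d, k]        # popularity excluding row d
        if m == 0:
            continue                       # columns used only by row d are left to
                                           # birth/death moves in a fuller sampler
        log_p = np.empty(2)
        for val, prior in ((0, 1 - m / D), (1, m / D)):
            B[d, k] = val
            log_p[val] = np.log(prior) + log_lik(B, V)
        p1 = 1.0 / (1.0 + np.exp(log_p[0] - log_p[1]))
        B[d, k] = int(rng.random() < p1)
    # M-H birth move for brand-new dishes, proposed from the prior.
    k_new = rng.poisson(alpha / D)
    if k_new > 0:
        B_prop = np.hstack([B, np.zeros((D, k_new), dtype=B.dtype)])
        B_prop[d, -k_new:] = 1
        V_prop = np.hstack([V, rng.normal(size=(D, k_new))])
        if np.log(rng.random()) < log_lik(B_prop, V_prop) - log_lik(B, V):
            return B_prop, V_prop          # proposal accepted
    return B, V
```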

  10. Inference • Sample V: conditioned on B and Z, the active (nonzero) entries of V have closed-form Gaussian conditional posteriors (conjugate linear-Gaussian updates). • Sample Z: conditioned on B ⊙ V, each latent vector z_n also has a Gaussian conditional posterior.
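For the shared latent variables, the conditional posterior is the standard conjugate linear-Gaussian update; a sketch for z (assuming isotropic noise with standard deviation `sigma` and loading matrix W = B ⊙ V) is shown below. The entries of V admit an analogous Gaussian conditional.

```python
import numpy as np

def resample_Z(T, B, V, sigma, rng=None):
    """Sample the shared latent variables z_n from their Gaussian conditional
    posterior under t_n = (B * V) z_n + e_n, e_n ~ N(0, sigma^2 I), z_n ~ N(0, I):
        cov  = (I + W^T W / sigma^2)^{-1}
        mean = cov W^T t_n / sigma^2."""
    rng = rng or np.random.default_rng()
    W = B * V
    K = W.shape[1]
    cov = np.linalg.inv(np.eye(K) + W.T @ W / sigma**2)
    means = T @ W @ cov.T / sigma**2          # (N, K) matrix of posterior means
    chol = np.linalg.cholesky(cov)
    return means + rng.normal(size=means.shape) @ chol.T
```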

  11. Multitask Learning using Infinite CCA • Consider the setting where each example is associated with multiple labels; predicting each label is one task. • Motivation: borrow information across tasks. • Apply the infinite CCA model to capture label correlations. • Learn better predictive features by projecting the data to a subspace guided by the label information: • the cross-covariance matrix Σ_xy captures input-output correlations • the label covariance matrix Σ_yy captures label correlations

  12. Multitask Learning using Infinite CCA • Fully supervised setting (Model-1): given labeled data, the model learns the task parameters in the subspace, then predicts labels in the original D-dimensional space by inflating the parameters back to D dimensions with the projection matrix. • Semi-supervised setting (Model-2): learn the embeddings for both training and test data, so that training and testing both take place in the K-dimensional subspace.
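The sketch below illustrates the flavor of the fully supervised pipeline (Model-1): learn a projection by correlating inputs with the label matrix, fit one linear predictor per label in the subspace, and apply the same projection at test time. It reuses the classical `cca` routine from the earlier slide as a stand-in for the infinite CCA posterior and uses a simple least-squares predictor; with linear predictors, learning in the subspace and inflating the parameters back to D dimensions (as Model-1 does) yield the same decisions.

```python
import numpy as np

def multilabel_model1(X_train, Y_train, X_test, K=4):
    """Sketch of the fully supervised setting (Model-1): project inputs into a
    K-dimensional subspace learned by correlating inputs with the label matrix,
    train one least-squares predictor per label there, and predict test labels
    with the same projection."""
    U, _, _ = cca(X_train, Y_train.astype(float), n_components=K)
    Ztr = (X_train - X_train.mean(axis=0)) @ U       # embed training inputs
    Zte = (X_test - X_train.mean(axis=0)) @ U        # embed test inputs
    # One ridge-like least-squares predictor per label, learned in the subspace.
    W = np.linalg.solve(Ztr.T @ Ztr + 1e-6 * np.eye(K), Ztr.T @ Y_train)
    return ((Zte @ W) > 0.5).astype(int)             # thresholded label predictions
```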

  13. Experiments (I) • Generate two datasets of dimensionalities 25 and 10, each having 100 samples. • Ground truth: 4 correlation components, with 63% sparsity in the true projection matrix. • Classical CCA found 8 components with significant correlations, while infinite CCA correctly discovered exactly 4 components. • Classical CCA infers a projection matrix with no exactly zero entries; if small values are thresholded to zero, the recovered sparsity is about 25%. • Infinite CCA infers a projection matrix with 57% exactly zero entries, rising to 62% after thresholding very small values.
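The slide does not give the exact generation procedure, so the snippet below shows one plausible recipe matching the stated setup: 100 samples, view dimensionalities 25 and 10, 4 shared components, and a true projection matrix that is roughly 63% zeros. It reuses the `cca` routine defined earlier to inspect how many components carry significant correlation.

```python
import numpy as np

rng = np.random.default_rng(42)

# One plausible way to build data matching the synthetic setup on this slide.
N, Dx, Dy, K_true = 100, 25, 10, 4
W_true = rng.normal(size=(Dx + Dy, K_true))
W_true[rng.random(W_true.shape) < 0.63] = 0.0   # zero out ~63% of the entries
Z_true = rng.normal(size=(N, K_true))
T = Z_true @ W_true.T + 0.1 * rng.normal(size=(N, Dx + Dy))
X, Y = T[:, :Dx], T[:, Dx:]

# Classical CCA has no exactly zero entries in its projection matrices; a
# sparse model would instead recover the zero pattern of W_true.
U, V, corrs = cca(X, Y, n_components=min(Dx, Dy))
print("canonical correlations:", np.round(corrs, 2))
```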

  14. Experiments (II) • Use two real-world multi-label datasets (Yeast and Scene) from the UCI repository. • The Yeast dataset consists of 1500 training and 917 testing examples, each with 103 features and 14 possible labels. • The Scene dataset consists of 1211 training and 1196 testing examples, each with 294 features and 6 possible labels. • The compared models and the prediction results are given in the tables of the original paper.

  15. Conclusion • Presents a nonparametric Bayesian model for the CCA problem. • Automatically selects the number of correlation components and captures the sparsity pattern. • Can deal with missing data. • Addresses the multi-label learning problem.
