Multi-label Prediction via Sparse Infinite CCA
Piyush Rai and Hal Daumé III, NIPS 2009
Presented by Lingbo Li, ECE, Duke University, July 16th, 2010
Note: all tables and figures are taken from the original paper
Outline
• Canonical Correlation Analysis
  • CCA
  • Probabilistic CCA
• Infinite Canonical Correlation Analysis Model
  • The Indian Buffet Process
  • The Infinite CCA Model
  • Inference
• Multitask Learning using Infinite CCA
  • Fully supervised setting
  • Semi-supervised setting
• Experiments
  • Infinite CCA results on synthetic data
  • Infinite CCA applied to multi-label prediction
• Conclusion
Canonical Correlation Analysis
• For paired variables $x \in \mathbb{R}^{D_1}$ and $y \in \mathbb{R}^{D_2}$, CCA seeks linear projections $u$ and $v$ so that the projected variables $u^\top x$ and $v^\top y$ are maximally correlated.
• The correlation coefficient between the two variables in the embedded space is given by
  $\rho = \dfrac{u^\top \Sigma_{xy} v}{\sqrt{(u^\top \Sigma_{xx} u)(v^\top \Sigma_{yy} v)}}$
• CCA can be posed as a constrained optimization problem:
  $\max_{u,v}\; u^\top \Sigma_{xy} v$ subject to $u^\top \Sigma_{xx} u = 1$ and $v^\top \Sigma_{yy} v = 1$
• Here $\Sigma_{xx}$, $\Sigma_{yy}$, and $\Sigma_{xy}$ denote the covariance and cross-covariance matrices of the data samples $\{x_n\}$ and $\{y_n\}$ (a numpy sketch of the solution follows below).
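As a concrete companion to the optimization above, here is a minimal numpy sketch that solves classical CCA by whitening the covariances and taking an SVD. The function name cca and the small ridge term reg are illustrative choices of mine, not part of the paper.

    import numpy as np

    def cca(X, Y, k=2, reg=1e-6):
        """Classical CCA: the canonical correlations are the singular values
        of Sxx^{-1/2} Sxy Syy^{-1/2}; the directions come from mapping the
        singular vectors back through the whitening transforms."""
        n = X.shape[0]
        Xc, Yc = X - X.mean(0), Y - Y.mean(0)
        Sxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])   # ridge for stability
        Syy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
        Sxy = Xc.T @ Yc / n
        Lx, Ly = np.linalg.cholesky(Sxx), np.linalg.cholesky(Syy)
        M = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T   # whitened cross-covariance
        A, rho, Bt = np.linalg.svd(M)
        U = np.linalg.solve(Lx.T, A[:, :k])   # u maximizes u' Sxy v s.t. u' Sxx u = 1
        V = np.linalg.solve(Ly.T, Bt[:k].T)
        return U, V, rho[:k]                  # projections and canonical correlations

The returned rho values are exactly the correlation coefficients of the slide's formula, evaluated at the optimal u and v.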
Probabilistic CCA
• Let $z \in \mathbb{R}^K$ with $K \le \min(D_1, D_2)$, and consider the following latent variable model:
  $z \sim \mathcal{N}(0, I_K)$, $x \mid z \sim \mathcal{N}(W_x z + \mu_x, \Psi_x)$, $y \mid z \sim \mathcal{N}(W_y z + \mu_y, \Psi_y)$
• We can also write $x = W_x z + \epsilon_x$ and $y = W_y z + \epsilon_y$, where $\epsilon_x \sim \mathcal{N}(0, \Psi_x)$ and $\epsilon_y \sim \mathcal{N}(0, \Psi_y)$
• The latent variable z is shared between x and y (a generative sketch follows below)
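A minimal generative sketch of this latent variable model, assuming zero means for brevity; Wx, Wy, Psi_x, and Psi_y are caller-supplied parameters, not values from the paper.

    import numpy as np

    def sample_pcca(n, Wx, Wy, Psi_x, Psi_y, seed=None):
        """Draw n paired samples from probabilistic CCA:
        z ~ N(0, I_K); x|z ~ N(Wx z, Psi_x); y|z ~ N(Wy z, Psi_y).
        The shared latent z is what couples the two views."""
        rng = np.random.default_rng(seed)
        K = Wx.shape[1]
        Z = rng.standard_normal((n, K))    # shared latent variables
        X = Z @ Wx.T + rng.multivariate_normal(np.zeros(len(Psi_x)), Psi_x, n)
        Y = Z @ Wy.T + rng.multivariate_normal(np.zeros(len(Psi_y)), Psi_y, n)
        return X, Y, Z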
Probabilistic CCA
• Gives a probabilistic interpretation of CCA, with maximum-likelihood parameter estimation
• Limitations: the number of canonical correlation components must be fixed in advance, and the projection matrix is not sparse
• Remedy: use the IBP as a prior on binary matrices with countably infinite columns
  • Posterior inference determines the subset of latent features responsible for each observation
  • The IBP prior ensures that the matrices are sparse
Indian Buffet Process
• Given an $N \times D$ matrix $X$ of observations, each with $D$ features, the latent feature model can be expressed as $X = ZA + \epsilon$, where $Z$ is an $N \times K$ binary matrix indicating which latent features each observation possesses and $A$ is a $K \times D$ feature matrix.
• IBP interpretation (culinary metaphor):
  • The first customer tries $\text{Poisson}(\alpha)$ dishes
  • The nth customer tries:
    • each previously-tasted dish k with probability $m_k / n$, where $m_k$ is the number of previous customers who tried dish k
    • $\text{Poisson}(\alpha / n)$ completely new dishes
  (a forward sampler sketch follows below)
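The culinary metaphor translates directly into a forward sampler. A minimal sketch, with the function name and numpy usage my own:

    import numpy as np

    def sample_ibp(n, alpha, seed=None):
        """Simulate the Indian Buffet Process: a binary matrix with n rows
        (customers) and a random, data-determined number of columns (dishes)."""
        rng = np.random.default_rng(seed)
        Z = np.zeros((n, 0), dtype=int)
        for i in range(n):                    # customer i+1 enters the buffet
            m = Z.sum(axis=0)                 # dish popularity counts m_k
            Z[i, :] = rng.random(Z.shape[1]) < m / (i + 1)   # taste dish k w.p. m_k/(i+1)
            k_new = rng.poisson(alpha / (i + 1))             # Poisson(alpha/(i+1)) new dishes
            if k_new:
                Z = np.hstack([Z, np.zeros((n, k_new), dtype=int)])
                Z[i, -k_new:] = 1
        return Z

Running sample_ibp(10, 2.0) yields a sparse binary matrix whose column count varies from draw to draw, which is exactly the property the infinite CCA model exploits.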
Infinite CCA Model
• Impose an IBP prior on the binary matrix B so that the dimensionality K of the latent space associated with z can be determined automatically from an unbounded number of candidate features.
• Represent the projection matrix as $W = B \odot V$, where B is the binary IBP matrix and V holds real-valued weights.
• The two random vectors x and y can be modeled as $x = W_x z + \epsilon_x$ and $y = W_y z + \epsilon_y$.
• z is shared between x and y; $\epsilon_x$ and $\epsilon_y$ are noise.
Infinite CCA Model
• Let $t_n = [x_n; y_n]$ stack the two views, with combined projection matrix $W = [W_x; W_y] = B \odot V$.
• The full model can be written as
  $B \sim \text{IBP}(\alpha)$, $v_{dk} \sim \mathcal{N}(0, \sigma_v^2)$, $z_n \sim \mathcal{N}(0, I_K)$, $t_n \mid z_n \sim \mathcal{N}((B \odot V) z_n, \Psi)$
• The graphical model structure is given in the original paper (figure not reproduced here); a generative sketch follows below.
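Putting the pieces together, a generative sketch of the model as reconstructed above. The W = B ⊙ V masking follows the slides, but the isotropic noise and the hyperparameter defaults are simplifying assumptions rather than the paper's exact specification; sample_ibp refers to the IBP sketch given earlier.

    import numpy as np

    def sample_infinite_cca(n, D1, D2, alpha=2.0, sigma_v=1.0, sigma_n=0.1, seed=None):
        """Forward-sample the (reconstructed) infinite CCA model: draw an IBP
        mask B over the stacked dimensions D = D1 + D2, Gaussian weights V,
        shared latents z, and emit t = [x; y] = (B * V) z + noise."""
        rng = np.random.default_rng(seed)
        D = D1 + D2
        B = sample_ibp(D, alpha, seed=rng)    # binary mask; K chosen by the IBP
        K = B.shape[1]
        V = rng.normal(0.0, sigma_v, size=(D, K))
        W = B * V                             # sparse projection matrix W = B ⊙ V
        Z = rng.standard_normal((n, K))       # latents shared by both views
        T = Z @ W.T + rng.normal(0.0, sigma_n, size=(n, D))
        return T[:, :D1], T[:, D1:], W, Z     # X view, Y view, loadings, latents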
Inference
• Sample B
  • Existing dishes: Gibbs-sample each entry $b_{dk}$; the IBP prior contributes weight proportional to the count $m_{-d,k}$ of other rows possessing feature k
  • New dishes: use an M-H step. Propose the number of new dishes from the prior, and accept the proposal with an acceptance probability given by the likelihood ratio of the proposed and current states (a generic sketch of this step follows below)
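For the new-dishes move, a generic sketch of an M-H step of this kind. The log_lik callable and the Poisson-from-the-prior proposal are illustrative assumptions; the paper's exact proposal and acceptance ratio may differ.

    import numpy as np

    def mh_new_dishes(log_lik, k_cur, alpha, D, rng):
        """One M-H update of the number of brand-new dishes for a row of B.
        Proposing k* from the IBP prior Poisson(alpha/D) makes the prior
        cancel, leaving a likelihood ratio in the acceptance probability."""
        k_prop = rng.poisson(alpha / D)
        log_accept = log_lik(k_prop) - log_lik(k_cur)
        if np.log(rng.random()) < min(0.0, log_accept):
            return k_prop    # accepted
        return k_cur         # rejected: keep the current state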
Inference
• Sample V: by conjugacy, each entry $v_{dk}$ has a Gaussian conditional posterior
• Sample Z: each latent vector $z_n$ likewise has a Gaussian conditional posterior
Multitask Learning using Infinite CCA
• Each example is associated with multiple labels; predicting each label is one task.
• Motivation: borrow information across tasks.
• Apply the infinite CCA model to capture label correlations.
• Learn better predictive features by projecting the data to a subspace directed by label information:
  • cross-covariance matrix $\Sigma_{xy}$: input-output correlation
  • label covariance matrix $\Sigma_{yy}$: label correlation
Multitask Learning using Infinite CCA
• Fully supervised setting (Model-1): given labeled data $\{(x_n, y_n)\}$, learn the task parameters in the K-dimensional subspace. Predict labels in the original D-dimensional space by inflating the parameters back to D dimensions with the projection matrix (a sketch follows below).
• Semi-supervised setting (Model-2): learn the embeddings for both training and test data, so that training and testing both take place in the K-dimensional subspace.
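A minimal sketch of Model-1 as described above, using ridge regression as a stand-in for the per-task learner (the paper's actual per-task classifier may differ); W is assumed to be the x-view block of the learned D × K projection matrix.

    import numpy as np

    def model1_predict(X_train, Y_train, X_test, W, lam=1.0):
        """Model-1 sketch: fit one ridge regressor per label in the K-dim
        subspace Z = X W, then inflate the parameters back to D dimensions
        so that test inputs are scored in the original space."""
        Z = X_train @ W                                   # n x K embeddings
        K = W.shape[1]
        # joint ridge solution for all labels: (Z'Z + lam I)^{-1} Z'Y
        theta_K = np.linalg.solve(Z.T @ Z + lam * np.eye(K), Z.T @ Y_train)
        theta_D = W @ theta_K          # inflate: x' theta_D = (W' x)' theta_K
        return X_test @ theta_D        # real-valued score per label

The inflation step works because the subspace predictor and the inflated predictor give identical scores: $x^\top (W \theta_K) = (W^\top x)^\top \theta_K$.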
Experiments (I)
• Generate two datasets of dimensionalities 25 and 10, each with 100 samples.
• Ground truth: 4 correlation components, with 63% sparsity in the true projection matrix.
• Classical CCA found 8 components with significant correlations, while infinite CCA correctly discovered exactly 4 components.
• Classical CCA infers a projection matrix with no exactly-zero entries; thresholding small values to zero uncovers only about 25% sparsity.
• Infinite CCA infers a projection matrix with 57% zero entries, rising to 62% after thresholding very small values.
Experiments (II)
• Use two real-world multi-label datasets (Yeast and Scene) from the UCI repository.
• The Yeast dataset consists of 1500 training and 917 test examples, each with 103 features and a 14-dimensional label vector.
• The Scene dataset consists of 1211 training and 1196 test examples, each with 294 features and a 6-dimensional label vector.
• The compared models and results are tabulated in the original paper (table not reproduced here).
Conclusion
• Presents a nonparametric Bayesian model for the CCA problem;
• Automatically selects the number of correlation components and captures the sparsity pattern;
• Can handle missing data;
• Solves the multi-label learning problem.