
Multi-label Prediction via Sparse Infinite CCA


Presentation Transcript


  1. Multi-label Prediction via Sparse Infinite CCA. Piyush Rai and Hal Daumé III, NIPS 2009. Presented by Lingbo Li, ECE, Duke University, July 16th, 2010. Note: all tables and figures are taken from the original paper.

  2. Outline • Canonical Correlation Analysis • CCA • Probabilistic CCA • Infinite Canonical Correlation Analysis Model • The Indian Buffet Process • The Infinite CCA Model • Inference • Multitask Learning using Infinite CCA • Fully supervised setting • Semi-supervised setting • Experiments • Infinite CCA results on synthetic data • Infinite CCA applied to multi-label prediction • Conclusion

  3. Canonical Correlation Analysis • For paired variables x ∈ R^{D_x} and y ∈ R^{D_y}, CCA seeks linear projections u and v so that u^T x and v^T y are maximally correlated in the projection space. • The correlation coefficient between the two variables in the embedded space is ρ = u^T Σ_xy v / sqrt((u^T Σ_xx u)(v^T Σ_yy v)). • CCA can be posed as the constrained optimization problem: maximize u^T Σ_xy v subject to u^T Σ_xx u = 1 and v^T Σ_yy v = 1. • Here Σ_xx and Σ_yy denote the within-view covariance matrices of the data samples, and Σ_xy the cross-covariance matrix between x and y.
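As a concrete illustration of the constrained optimization above, here is a minimal NumPy sketch that solves classical CCA via the standard whitening trick: whiten each view with a Cholesky factor of its covariance and take the SVD of the whitened cross-covariance, whose singular values are the canonical correlations. Function and parameter names (`cca`, `n_components`, `reg`) are illustrative, and the small ridge term is added only for numerical stability.

```python
import numpy as np

def cca(X, Y, n_components=2, reg=1e-6):
    """Classical CCA via the SVD of the whitened cross-covariance."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    N = X.shape[0]
    Sxx = Xc.T @ Xc / N + reg * np.eye(X.shape[1])   # within-view covariances
    Syy = Yc.T @ Yc / N + reg * np.eye(Y.shape[1])   # (small ridge for stability)
    Sxy = Xc.T @ Yc / N                              # cross-covariance
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # Whiten both views: K = Lx^{-1} Sxy Ly^{-T}; its singular values are
    # the canonical correlations.
    K = np.linalg.solve(Lx, np.linalg.solve(Ly, Sxy.T).T)
    A, corrs, Bt = np.linalg.svd(K)
    U = np.linalg.solve(Lx.T, A[:, :n_components])     # x-side projections u
    V = np.linalg.solve(Ly.T, Bt.T[:, :n_components])  # y-side projections v
    return U, V, corrs[:n_components]
```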

  4. Probabilistic CCA • Let z ∈ R^K with K ≤ min(D_x, D_y), and consider the following latent variable model: z ~ N(0, I_K), x | z ~ N(W_x z + μ_x, Ψ_x), y | z ~ N(W_y z + μ_y, Ψ_y). • We can also write x = W_x z + μ_x + ε_x and y = W_y z + μ_y + ε_y, where ε_x ~ N(0, Ψ_x) and ε_y ~ N(0, Ψ_y). • The latent variable z is shared between x and y.
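A short sketch of drawing data from this generative model makes the role of the shared latent variable concrete. The parameter values below (loading matrices, means, and diagonal noise covariances) are arbitrary choices for illustration; the probabilistic CCA model allows general positive semi-definite Ψ_x and Ψ_y.

```python
import numpy as np

def sample_pcca(N, Dx=5, Dy=3, K=2, seed=0):
    """Draw N samples from the probabilistic CCA generative model:
    z ~ N(0, I_K), x | z ~ N(Wx z + mu_x, Psi_x), y | z ~ N(Wy z + mu_y, Psi_y).
    Parameters are drawn arbitrarily just to have a concrete model."""
    rng = np.random.default_rng(seed)
    Wx = rng.normal(size=(Dx, K))
    Wy = rng.normal(size=(Dy, K))
    mu_x, mu_y = rng.normal(size=Dx), rng.normal(size=Dy)
    Psi_x = 0.1 * np.eye(Dx)          # diagonal noise for simplicity
    Psi_y = 0.1 * np.eye(Dy)
    Z = rng.normal(size=(N, K))       # shared latent variables
    X = Z @ Wx.T + mu_x + rng.multivariate_normal(np.zeros(Dx), Psi_x, size=N)
    Y = Z @ Wy.T + mu_y + rng.multivariate_normal(np.zeros(Dy), Psi_y, size=N)
    return X, Y, Z
```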

  5. Probabilistic CCA • Probabilistic interpretation of CCA • Maximum likelihood approach for parameter estimation • The number of canonical correlation components is fixed in advance • The projection matrix is not sparse • Use the IBP as a prior on binary matrices with a countably infinite number of columns • Posterior inference determines which subset of latent features is responsible for each observation • The IBP ensures that the matrices are sparse

  6. Indian Buffet Process • Given an N × D matrix X of N observations, each with D features, the latent feature model can be expressed as X = ZA + E, where Z is an N × K binary matrix indicating which latent features each observation possesses. • IBP interpretation (customers = observations, dishes = latent features): • The first customer tries Poisson(α) dishes • The nth customer tries: • each previously-tasted dish k with probability m_k / n, where m_k is the number of previous customers who tried dish k • Poisson(α / n) completely new dishes
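The culinary metaphor translates directly into a short simulation. The sketch below draws a binary matrix from the IBP exactly as described above; `sample_ibp` and its arguments are illustrative names.

```python
import numpy as np

def sample_ibp(N, alpha, rng=None):
    """Simulate the Indian Buffet Process culinary metaphor.

    Customer 1 tries Poisson(alpha) dishes; customer n tries each
    previously-tasted dish k with probability m_k / n and then
    Poisson(alpha / n) brand-new dishes. Returns an N x K binary matrix."""
    rng = rng or np.random.default_rng()
    Z = np.zeros((0, 0), dtype=int)
    for n in range(1, N + 1):
        m = Z.sum(axis=0)                              # dish popularity counts m_k
        old = (rng.random(Z.shape[1]) < m / n).astype(int)
        k_new = rng.poisson(alpha / n)                 # brand-new dishes for customer n
        row = np.concatenate([old, np.ones(k_new, dtype=int)])
        Z = np.pad(Z, ((0, 0), (0, k_new)))            # grow the matrix for new dishes
        Z = np.vstack([Z, row])
    return Z
```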

  7. Infinite CCA Model • Impose an IBP prior on the binary matrix B so that the dimensionality of the latent space associated with the data can be automatically determined from an unbounded number of candidate dimensions. • Represent the projection matrix as the elementwise product B ⊙ V, where B is binary (with IBP prior) and V is real-valued. • The two random vectors x and y can then be modeled as x = W_x z + ε_x and y = W_y z + ε_y. • z is shared between x and y, and ε_x, ε_y are noise.

  8. Infinite CCA Model • Let t = [x; y] denote the stacked (D_x + D_y)-dimensional observation. • The full model can be written as t_n = (B ⊙ V) z_n + ε_n, with z_n ~ N(0, I), Gaussian noise ε_n, B drawn from the IBP, and the entries of V given a Gaussian prior. • The graphical model structure is shown in the figure in the original paper.
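Putting the pieces together, here is a hedged generative sketch of the stacked model: a binary sparsity pattern B drawn from the IBP (reusing the `sample_ibp` routine above), real-valued weights V, a shared latent Z, and Gaussian noise. The isotropic noise, the standard-normal prior on V, and the default dimensionalities are simplifying assumptions for illustration, not the paper's exact specification.

```python
import numpy as np

def sample_infinite_cca(N, Dx=25, Dy=10, alpha=2.0, sigma=0.1, rng=None):
    """Generative sketch of the infinite CCA model on the stacked vector
    t = [x; y]: the loading matrix is the elementwise product of a binary
    matrix B (IBP prior, simulated with sample_ibp, rows indexing the
    D = Dx + Dy observed dimensions) and a real-valued matrix V; the
    latent z is shared across the two views."""
    rng = rng or np.random.default_rng()
    D = Dx + Dy
    B = sample_ibp(D, alpha, rng)                  # D x K binary sparsity pattern
    K = B.shape[1]
    V = rng.normal(size=(D, K))                    # real-valued loading weights
    W = B * V                                      # sparse loading matrix
    Z = rng.normal(size=(N, K))                    # shared latent variables
    T = Z @ W.T + sigma * rng.normal(size=(N, D))  # observed stacked data
    X, Y = T[:, :Dx], T[:, Dx:]
    return X, Y, B, V, Z
```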

  9. Inference • Sample B • Sample existing dishes: for each entry, P(b_dk = 1 | rest) ∝ (m_{-d,k} / D) × likelihood and P(b_dk = 0 | rest) ∝ (1 − m_{-d,k} / D) × likelihood, where m_{-d,k} counts how many other rows use dish k. • Sample new dishes: use an M-H step. Propose k_new ~ Poisson(α / D) new columns, together with the corresponding entries of V. Accept the proposal with acceptance probability min(1, r), where r is the ratio of the likelihoods under the proposed and current states (the prior terms cancel because the proposal is drawn from the prior).
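The following sketch shows the shape of this sampling step for one row of B: a Gibbs update for existing dishes and a Metropolis-Hastings birth move for new dishes proposed from the prior, so the acceptance ratio reduces to a likelihood ratio. The `log_lik(B, V)` argument is a stand-in for the model's actual likelihood, and the code is a generic IBP-style sampler rather than the paper's exact implementation.

```python
import numpy as np

def resample_B_row(d, B, V, alpha, log_lik, rng=None):
    """One sweep over row d of the binary matrix B.

    log_lik(B, V) is assumed to return the model log-likelihood for a
    candidate (B, V); it stands in for the collapsed likelihood of the
    infinite CCA model."""
    rng = rng or np.random.default_rng()
    D = B.shape[0]
    # Gibbs step for existing dishes.
    for k in range(B.shape[1]):
        m = B[:, k].sum() - B[d, k]        # popularity excluding row d
        if m == 0:
            continue                       # columns used only by row d are left to
                                           # birth/death moves in a fuller sampler
        log_p = np.empty(2)
        for val, prior in ((0, 1 - m / D), (1, m / D)):
            B[d, k] = val
            log_p[val] = np.log(prior) + log_lik(B, V)
        p1 = 1.0 / (1.0 + np.exp(log_p[0] - log_p[1]))
        B[d, k] = int(rng.random() < p1)
    # M-H birth move for brand-new dishes, proposed from the prior.
    k_new = rng.poisson(alpha / D)
    if k_new > 0:
        B_prop = np.hstack([B, np.zeros((D, k_new), dtype=B.dtype)])
        B_prop[d, -k_new:] = 1
        V_prop = np.hstack([V, rng.normal(size=(D, k_new))])
        if np.log(rng.random()) < log_lik(B_prop, V_prop) - log_lik(B, V):
            return B_prop, V_prop          # proposal accepted
    return B, V
```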

  10. Inference • Sample V: conditioned on B and Z, the active (nonzero) entries of V have closed-form Gaussian conditional posteriors (conjugate linear-Gaussian updates). • Sample Z: conditioned on B ⊙ V, each latent vector z_n also has a Gaussian conditional posterior.
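For the shared latent variables, the conditional posterior is the standard conjugate linear-Gaussian update; a sketch for z (assuming isotropic noise with standard deviation `sigma` and loading matrix W = B ⊙ V) is shown below. The entries of V admit an analogous Gaussian conditional.

```python
import numpy as np

def resample_Z(T, B, V, sigma, rng=None):
    """Sample the shared latent variables z_n from their Gaussian conditional
    posterior under t_n = (B * V) z_n + e_n, e_n ~ N(0, sigma^2 I), z_n ~ N(0, I):
        cov  = (I + W^T W / sigma^2)^{-1}
        mean = cov W^T t_n / sigma^2."""
    rng = rng or np.random.default_rng()
    W = B * V
    K = W.shape[1]
    cov = np.linalg.inv(np.eye(K) + W.T @ W / sigma**2)
    means = T @ W @ cov.T / sigma**2          # (N, K) matrix of posterior means
    chol = np.linalg.cholesky(cov)
    return means + rng.normal(size=means.shape) @ chol.T
```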

  11. Multitask Learning using Infinite CCA • Consider the setting where each example is associated with multiple labels; predicting each label is one task. • Motivation: borrow information across tasks. • Apply the infinite CCA model to capture label correlations. • Learn better predictive features by projecting the data to a subspace guided by the label information: • the cross-covariance matrix Σ_xy captures input-output correlations • the label covariance matrix Σ_yy captures label correlations

  12. Multitask Learning using Infinite CCA • Fully supervised setting (Model-1): given labeled data, the model learns the task parameters in the subspace, then predicts labels in the original D-dimensional space by inflating the parameters back to D dimensions with the projection matrix. • Semi-supervised setting (Model-2): learn the embeddings for both training and test data, so that training and testing both take place in the K-dimensional subspace.
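The sketch below illustrates the flavor of the fully supervised pipeline (Model-1): learn a projection by correlating inputs with the label matrix, fit one linear predictor per label in the subspace, and apply the same projection at test time. It reuses the classical `cca` routine from the earlier slide as a stand-in for the infinite CCA posterior and uses a simple least-squares predictor; with linear predictors, learning in the subspace and inflating the parameters back to D dimensions (as Model-1 does) yield the same decisions.

```python
import numpy as np

def multilabel_model1(X_train, Y_train, X_test, K=4):
    """Sketch of the fully supervised setting (Model-1): project inputs into a
    K-dimensional subspace learned by correlating inputs with the label matrix,
    train one least-squares predictor per label there, and predict test labels
    with the same projection."""
    U, _, _ = cca(X_train, Y_train.astype(float), n_components=K)
    Ztr = (X_train - X_train.mean(axis=0)) @ U       # embed training inputs
    Zte = (X_test - X_train.mean(axis=0)) @ U        # embed test inputs
    # One ridge-like least-squares predictor per label, learned in the subspace.
    W = np.linalg.solve(Ztr.T @ Ztr + 1e-6 * np.eye(K), Ztr.T @ Y_train)
    return ((Zte @ W) > 0.5).astype(int)             # thresholded label predictions
```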

  13. Experiments (I) • Generate two datasets of dimensionalities 25 and 10, each having 100 samples. • Ground truth: 4 correlation components, with 63% sparsity in the true projection matrix. • Classical CCA found 8 components with significant correlations, while infinite CCA correctly discovered exactly 4 components. • Classical CCA infers a projection matrix with no exactly zero entries; if small values are thresholded to zero, the recovered sparsity is about 25%. • Infinite CCA infers a projection matrix with 57% exactly zero entries, rising to 62% after thresholding very small values.
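The slide does not give the exact generation procedure, so the snippet below shows one plausible recipe matching the stated setup: 100 samples, view dimensionalities 25 and 10, 4 shared components, and a true projection matrix that is roughly 63% zeros. It reuses the `cca` routine defined earlier to inspect how many components carry significant correlation.

```python
import numpy as np

rng = np.random.default_rng(42)

# One plausible way to build data matching the synthetic setup on this slide.
N, Dx, Dy, K_true = 100, 25, 10, 4
W_true = rng.normal(size=(Dx + Dy, K_true))
W_true[rng.random(W_true.shape) < 0.63] = 0.0   # zero out ~63% of the entries
Z_true = rng.normal(size=(N, K_true))
T = Z_true @ W_true.T + 0.1 * rng.normal(size=(N, Dx + Dy))
X, Y = T[:, :Dx], T[:, Dx:]

# Classical CCA has no exactly zero entries in its projection matrices; a
# sparse model would instead recover the zero pattern of W_true.
U, V, corrs = cca(X, Y, n_components=min(Dx, Dy))
print("canonical correlations:", np.round(corrs, 2))
```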

  14. Experiments (II) • Use two real-world multi-label datasets (Yeast and Scene) from the UCI repository. • The Yeast dataset consists of 1500 training and 917 testing examples, each with 103 features and 14 possible labels. • The Scene dataset consists of 1211 training and 1196 testing examples, each with 294 features and 6 possible labels. • The compared models and the prediction results are given in the tables of the original paper.

  15. Conclusion • Presents a nonparametric Bayesian model for the CCA problem. • Automatically selects the number of correlation components and captures the sparsity pattern. • Can deal with missing data. • Addresses the multi-label learning problem.
