An Infinite Factor Model Hierarchy Via a Noisy-Or Mechanism Aaron C. Courville, Douglas Eck and Yoshua Bengio NIPS 2009 Presented by Lingbo Li ECE, Duke University May 21, 2010 Note: all tables and figures taken from the original paper
Outline • Motivations • Latent Factor Modeling • A Hierarchy of Latent Features Via a Noisy-Or Mechanism • Inference • Experiments • Conclusions
Motivations • The Indian Buffet Process (IBP) provides a factorial representation of data. • Music tag data (Last.fm) used to organize playlists, e.g. RADIOHEAD: alternative, rock, alternative rock, indie, electronic, britpop, british, and indie rock. • In the IBP, latent features are independent across object instances. • Dependencies between latent factors capture feature co-occurrence, e.g. 'alternative' co-occurs with 'indie' far more often than with 'classical'. • Goal: extend infinite latent factor models to two unbounded layers of factors, where 'upper-layer factors express correlations between lower-layer factors via a noisy-or mechanism.'
Latent Factor Modeling • $N$ objects $y_1, \dots, y_N$; model parameters $\theta$; binary feature variables $z_{nk}$. • Feature $k$ is active for object $n$ when $z_{nk} = 1$ and inactive when $z_{nk} = 0$. • The model is summarized as $\mu_k \mid \alpha \sim \mathrm{Beta}(\alpha/K,\, 1)$, $z_{nk} \mid \mu_k \sim \mathrm{Bernoulli}(\mu_k)$. • Given $\mu$, the $z_{nk}$ are mutually independent.
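A minimal sketch of the finite beta-Bernoulli model on this slide (Python/numpy; the function name and constants are mine, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_finite_feature_matrix(N, K, alpha):
    """Finite beta-Bernoulli latent feature model:
    mu_k ~ Beta(alpha/K, 1), z_nk ~ Bernoulli(mu_k)."""
    mu = rng.beta(alpha / K, 1.0, size=K)   # per-feature activation probabilities
    Z = rng.random((N, K)) < mu             # z_nk mutually independent given mu
    return mu, Z.astype(int)

mu, Z = sample_finite_feature_matrix(N=10, K=50, alpha=2.0)
print(Z.sum(axis=1))  # number of active features per object
```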
Latent Factor Modeling • As $K \to \infty$, the IBP obtains the distribution of an unbounded binary feature matrix by marginalizing out $\mu$. • Stick-breaking construction for the IBP: the feature probabilities are expressed in strictly decreasing order as $\mu_{(k)} = \nu_k\, \mu_{(k-1)} = \prod_{j=1}^{k} \nu_j$, with $\nu_j \sim \mathrm{Beta}(\alpha, 1)$.
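A sketch of the stick-breaking construction (a direct transcription of the formula above; the truncation level is mine, added just to make the loop finite):

```python
import numpy as np

rng = np.random.default_rng(0)

def ibp_stick_breaking(alpha, K_trunc):
    """Stick-breaking weights for the IBP: nu_j ~ Beta(alpha, 1),
    mu_(k) = prod_{j<=k} nu_j, a strictly decreasing sequence."""
    nu = rng.beta(alpha, 1.0, size=K_trunc)
    return np.cumprod(nu)

mu = ibp_stick_breaking(alpha=3.0, K_trunc=20)
# mu[0] >= mu[1] >= ... : feature probabilities decay toward zero
```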
A Hierarchy of Latent Features Via a Noisy-Or Mechanism. Extend to two layers of binary latent features: • an upper-layer binary latent feature matrix $U$ with elements $u_{nj}$; • a lower-layer binary latent feature matrix $Z$ with elements $z_{nk}$; • a weight matrix $W$ connecting every element of $U$ to every element of $Z$, where $w_{jk} \in [0, 1]$. • The active $u_{nj}$ can be interpreted as the possible causes of the activation of the individual $z_{nk}$: $P(z_{nk} = 1 \mid u_n, W) = 1 - \prod_j (1 - w_{jk})^{u_{nj}}$.
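A sketch of the noisy-or activation probability for one object (notation and dimensions are mine): each active upper-layer feature independently "tries" to switch on each lower-layer feature, and the lower feature stays off only if every attempt fails.

```python
import numpy as np

def noisy_or_prob(u, W):
    """P(z_k = 1 | u, W) = 1 - prod_j (1 - w_jk)^(u_j)."""
    # (1 - W) raised elementwise to u broadcasts over lower features k
    return 1.0 - np.prod((1.0 - W) ** u[:, None], axis=0)

u = np.array([1, 0, 1])                 # two active upper-layer features
W = np.array([[0.8, 0.1],
              [0.5, 0.9],
              [0.2, 0.3]])              # w_jk: J=3 upper x K=2 lower
print(noisy_or_prob(u, W))              # [1 - 0.2*0.8, 1 - 0.9*0.7] = [0.84, 0.37]
```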
A Hierarchy of Latent Features Via a Noisy-Or Mechanism • Define an additional random matrix of Bernoulli trial outcomes: trials from inactive upper-layer features ($u_{nj} = 0$) are always failures; trials from active upper-layer features ($u_{nj} = 1$) are failures with probability $1 - w_{jk}$. • For each $z_{nk}$: $z_{nk} = 0$ if all trials fail; otherwise the trials are run in sequence, and at trial $j$: Success → $z_{nk} = 1$, no further trials; Failure → move on to trial $j+1$.
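A sketch of this sequential-trial view for a single lower-layer feature (variable names mine); the first success both activates $z_{nk}$ and identifies which upper-layer feature caused it, which is what makes the blocked Gibbs updates tractable:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_z_by_trials(u, w_col):
    """Sequential trials for one lower-layer feature k:
    inactive upper features always fail; an active feature j
    succeeds with prob w_jk. First success sets z_k = 1 and
    stops; if every trial fails, z_k = 0."""
    for j, active in enumerate(u):
        if active and rng.random() < w_col[j]:
            return 1, j        # success at trial j, no further trials
    return 0, None             # all trials failed

u = np.array([1, 0, 1])
w_col = np.array([0.8, 0.5, 0.2])  # weights into one lower-layer feature
z, cause = sample_z_by_trials(u, w_col)
```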
A Hierarchy of Latent Features Via a Noisy-Or Mechanism • Posterior distributions for the model parameters depend on simple counts: the number of times upper-layer feature $j$ is active; the number of times the $j$-th trial was a success for lower-layer feature $k$; and the number of times the $j$-th trial was a failure for feature $k$ despite $u_{nj}$ being active. • With a Beta prior on each $w_{jk}$, these counts yield conjugate Beta posteriors, and $W$ can be integrated out.
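The conjugate update is just Beta bookkeeping over those counts; a sketch (the prior hyperparameters a, b are my placeholders, not values from the paper):

```python
def weight_posterior(a, b, n_success, n_fail_active):
    """Conjugate Beta update for a noisy-or weight w_jk:
    successes of trial j for feature k count toward a,
    failures while u_j was active count toward b."""
    return a + n_success, b + n_fail_active

# e.g. prior Beta(1, 1), 7 successes, 3 active-but-failed trials
a_post, b_post = weight_posterior(1.0, 1.0, 7, 3)
posterior_mean = a_post / (a_post + b_post)   # 8/12 ~ 0.67
```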
Inference • Based on blocked Gibbs sampling and the IBP semi-ordered slice sampler. • Semi-ordered slice sampling of the upper-layer IBP. • Semi-ordered slice sampling of the lower-layer factor model. • An efficient extended blocked Gibbs sampler over the entire model, without approximation.
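A rough sketch of the slice-sampling idea that keeps the unbounded feature set finite at each sweep: draw a slice level below the smallest active stick weight, then instantiate new sticks only until they fall under the slice. This is a simplification; for brevity it extends sticks from the stick-breaking prior, whereas the exact conditional in the paper also carries a data term.

```python
import numpy as np

rng = np.random.default_rng(0)

def adaptive_truncation(mu_active_min, alpha, rng):
    """Slice trick: s ~ Uniform(0, smallest active mu); extend the
    decreasing stick-breaking sequence until mu_(k) < s. Only
    finitely many features remain above the slice."""
    s = rng.uniform(0.0, mu_active_min)
    mus, mu = [], mu_active_min
    while True:
        mu *= rng.beta(alpha, 1.0)   # next (smaller) stick weight
        if mu < s:
            break
        mus.append(mu)
    return s, mus

s, new_sticks = adaptive_truncation(mu_active_min=0.3, alpha=3.0, rng=rng)
```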
Experiments (I) • MNIST dataset: 1000 images of the digit 3, preprocessed by projecting onto the first 64 PCA components. • 500 examples are used for training and the remaining 500 for testing. • Each data object is modeled with a linear-Gaussian likelihood on its PCA coefficients. • Random noise (std = 0.5) is added to the preprocessed test set; the task is to recover the noise-free version.
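A sketch of the preprocessing and corruption steps as described on this slide (PCA done here via plain numpy SVD; function and variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

def preprocess_and_corrupt(X_train, X_test, n_components=64, noise_std=0.5):
    """Project digit images onto the first 64 PCA components fit on
    the training set, then corrupt the test projections with
    Gaussian noise of std 0.5."""
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    P = Vt[:n_components].T                  # (pixels x 64) projection
    train_p = (X_train - mean) @ P
    test_p = (X_test - mean) @ P
    noisy_test = test_p + rng.normal(0.0, noise_std, size=test_p.shape)
    return train_p, test_p, noisy_test
```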
Experiments (II) • Music tags: tags and tag frequencies extracted from the social music website Last.fm (http://www.last.fm/) using the Audioscrobbler web service. • Dataset: 1000 artists with a vocabulary of 100 tags, for a total of 312,134 counts. • Goal: reduce the noisy collection of tags to a sparse representation for each artist. • Tag counts are modeled with a binomial likelihood, where $C$ is the limit on the number of counts achievable; $C = 100$.
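A sketch of a binomial tag-count likelihood of this kind, with C = 100; the logistic link tying success probabilities to the binary features is my placeholder, not necessarily the paper's exact parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_tag_counts(Z, Theta, C=100):
    """Tag counts x_nt ~ Binomial(C, p_nt), where p_nt depends on
    the artist's active binary features via a logistic link (assumed)."""
    p = 1.0 / (1.0 + np.exp(-(Z @ Theta)))
    return rng.binomial(C, p)

Z = (rng.random((5, 8)) < 0.2).astype(float)   # 5 artists, 8 binary features
Theta = rng.normal(-2.0, 1.0, size=(8, 10))    # feature-to-tag weights
X = sample_tag_counts(Z, Theta)                # 5 x 10 count matrix
```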
Experiments (II) • Both layers are sparse. • Most features at the upper layer use one to three tags; most features at the lower layer cover a broader range of tags. • The original paper tabulates the tags associated with the two most probable factors at the upper layer.
Experiments (II) • Comparison among a generalized linear model, the IBP, and the two-layer noisy-or IFM. • Test data contain 600 artist-tag collections with 90% of the tags held out as missing; the missing tags are imputed from the remaining 10%. • The generalized latent linear model serves as the baseline. • Both the IBP and the noisy-or models perform better than the generalized latent linear model.
Conclusions • A Bayesian nonparametric version of the noisy-or mechanism. • Extends infinite latent factor models to two (or more) unbounded layers of factors. • Efficient inference via a Gibbs sampling procedure. • Performance compared against the standard IBP construction.